
Sequential Generalized Likelihood Ratio Tests for Online Item Monitoring

Published online by Cambridge University Press:  01 January 2025

Hyeon-Ah Kang*
Affiliation:
University of Texas at Austin
*Correspondence should be made to Hyeon-Ah Kang, University of Texas at Austin, Austin, USA. Email: hkang@austin.utexas.edu

Abstract

The study presents statistical procedures that monitor the functioning of items over time. We propose generalized likelihood ratio tests that surveil multiple item parameters and implement them with various sampling techniques to perform continuous or intermittent monitoring. The procedures examine the stability of item parameters across time and signal compromise as soon as they identify a significant parameter shift. The performance of the monitoring procedures was validated using simulated and real-assessment data. The empirical evaluation suggests that the proposed procedures perform well in identifying parameter drift: they showed satisfactory detection power and gave timely signals while keeping error rates reasonably low. The procedures also outperformed existing methods. These findings suggest that multivariate parametric monitoring can provide an efficient and powerful control tool for maintaining the quality of items. The procedures allow joint monitoring of multiple item parameters and achieve sufficient power through likelihood-ratio tests. Based on the findings from the empirical experimentation, we suggest some practical strategies for performing online item monitoring.
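To make the idea of sequential likelihood-ratio monitoring concrete, the sketch below illustrates a generic sequential GLR change-point test on a simple Gaussian stream. This is a minimal, hypothetical illustration of the general technique, not the paper's item-response formulation: the in-control mean `mu0`, the known variance, and the alarm `threshold` are all assumptions chosen for the example. At each time point the statistic maximizes the log-likelihood ratio over all candidate change points and raises an alarm the first time it crosses the threshold, analogous to signaling item compromise once a significant parameter shift appears.

```python
import numpy as np

def glr_detect(x, mu0=0.0, sigma=1.0, threshold=10.0):
    """Sequential GLR test for a mean shift in a Gaussian stream.

    At each time n, maximize the log-likelihood ratio of a
    post-change mean (estimated by MLE) over candidate change
    points k < n; alarm when the maximum exceeds `threshold`.
    Returns the 1-based alarm time, or None if no alarm.
    """
    x = np.asarray(x, dtype=float)
    for n in range(1, len(x) + 1):
        best = 0.0
        for k in range(n):  # candidate change point (0-based)
            s = x[k:n].sum() - mu0 * (n - k)
            # log-LR with the post-change mean profiled out
            stat = s * s / (2.0 * sigma**2 * (n - k))
            best = max(best, stat)
        if best > threshold:
            return n
    return None

rng = np.random.default_rng(1)
stream = np.concatenate([
    rng.normal(0.0, 1.0, 50),   # in-control segment
    rng.normal(1.5, 1.0, 50),   # parameter shift at time 51
])
alarm = glr_detect(stream)
```

With these settings the alarm typically fires shortly after the shift at time 51; the threshold trades off false-alarm rate against detection delay, which mirrors the error-rate/timeliness trade-off discussed in the abstract.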

Type
Theory and Methods
Copyright
Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society

