
Sequential Detection of Compromised Items Using Response Times in Computerized Adaptive Testing

Published online by Cambridge University Press:  01 January 2025

Edison M. Choe*
Affiliation:
Graduate Management Admission Council® (GMAC®)
Jinming Zhang
Affiliation:
University of Illinois at Urbana-Champaign
Hua-Hua Chang
Affiliation:
University of Illinois at Urbana-Champaign
*Correspondence should be made to Edison M. Choe, Graduate Management Admission Council® (GMAC®), 11921 Freedom Drive, Suite 300, Reston, VA 20190, USA. Email: echoe@gmac.com

Abstract

Item compromise persists in undermining the integrity of testing, even in secure administrations of computerized adaptive testing (CAT) with sophisticated item exposure controls. In ongoing efforts to tackle this perennial security issue in CAT, two recent studies investigated sequential procedures for detecting compromised items, in which a significant increase in the proportion of correct responses for each item in the pool is monitored in real time using moving averages. In addition to actual responses, response times are valuable information with tremendous potential to reveal items that may have been leaked. Specifically, examinees who have preknowledge of an item would likely respond to it more quickly than those who do not. Therefore, the current study proposes several augmented methods for the detection of compromised items, all involving simultaneous monitoring of changes in both the proportion correct and average response time for every item using various moving average strategies. Simulation results with an operational item pool indicate that, compared to the analysis of responses alone, utilizing response times can afford marked improvements in detection power with fewer false positives.
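The core idea described in the abstract, monitoring each item's windowed proportion correct together with its windowed average response time, can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the class name `ItemMonitor`, the window size, and the flagging thresholds (`p_delta`, `t_delta`) are placeholder assumptions, whereas the paper derives calibrated critical values from the item response and response-time models.

```python
from collections import deque


class ItemMonitor:
    """Simplified sketch of moving-average item monitoring.

    For one item, keep a sliding window of the most recent scored
    responses (1 = correct, 0 = incorrect) and log response times.
    Flag the item when the windowed proportion correct rises above
    its baseline by more than p_delta AND the windowed mean log
    response time falls below its baseline by more than t_delta.
    All thresholds here are illustrative placeholders.
    """

    def __init__(self, baseline_p, baseline_log_rt, window=50,
                 p_delta=0.15, t_delta=0.30):
        self.baseline_p = baseline_p            # expected proportion correct
        self.baseline_log_rt = baseline_log_rt  # expected mean log response time
        self.window = window
        self.p_delta = p_delta
        self.t_delta = t_delta
        self.responses = deque(maxlen=window)
        self.log_rts = deque(maxlen=window)

    def update(self, correct, log_rt):
        """Record one administration; return True if the item is flagged."""
        self.responses.append(correct)
        self.log_rts.append(log_rt)
        if len(self.responses) < self.window:
            return False  # wait until the moving-average window fills
        p_hat = sum(self.responses) / len(self.responses)
        mean_log_rt = sum(self.log_rts) / len(self.log_rts)
        # Flag only when BOTH signals move in the direction expected
        # under item preknowledge: easier (higher p) and faster (lower RT).
        return (p_hat - self.baseline_p > self.p_delta
                and self.baseline_log_rt - mean_log_rt > self.t_delta)
```

Requiring both signals to shift jointly reflects the abstract's rationale for why response times help: a run of correct answers alone may reflect able examinees, but correct answers given unusually quickly are more consistent with preknowledge, which is why the joint rule tends to reduce false positives.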

Type
Original Paper
Copyright
Copyright © The Psychometric Society 2017

