
Sequential Detection of Compromised Items Using Response Times in Computerized Adaptive Testing

Published online by Cambridge University Press:  01 January 2025

Edison M. Choe*
Affiliation:
Graduate Management Admission Council® (GMAC®)
Jinming Zhang
Affiliation:
University of Illinois at Urbana-Champaign
Hua-Hua Chang
Affiliation:
University of Illinois at Urbana-Champaign
*Correspondence should be made to Edison M. Choe, Graduate Management Admission Council® (GMAC®), 11921 Freedom Drive, Suite 300, Reston, VA 20190, USA. Email: echoe@gmac.com

Abstract

Item compromise persists in undermining the integrity of testing, even in secure administrations of computerized adaptive testing (CAT) with sophisticated item exposure controls. In ongoing efforts to tackle this perennial security issue in CAT, two recent studies investigated sequential procedures for detecting compromised items, in which a significant increase in the proportion of correct responses for each item in the pool is monitored in real time using moving averages. In addition to actual responses, response times are valuable information with tremendous potential to reveal items that may have been leaked. Specifically, examinees who have preknowledge of an item would likely respond to it more quickly than those who do not. Therefore, the current study proposes several augmented methods for the detection of compromised items, all involving simultaneous monitoring of changes in both the proportion correct and average response time for every item using various moving average strategies. Simulation results with an operational item pool indicate that, compared to the analysis of responses alone, utilizing response times can afford marked improvements in detection power with fewer false positives.
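The core idea described in the abstract, monitoring each item's windowed proportion correct together with its windowed average response time, can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the class name `ItemMonitor`, the window size, and the flagging thresholds (`p_delta`, `t_delta`) are placeholder assumptions, whereas the paper derives calibrated critical values from the item response and response-time models.

```python
from collections import deque


class ItemMonitor:
    """Simplified sketch of moving-average item monitoring.

    For one item, keep a sliding window of the most recent scored
    responses (1 = correct, 0 = incorrect) and log response times.
    Flag the item when the windowed proportion correct rises above
    its baseline by more than p_delta AND the windowed mean log
    response time falls below its baseline by more than t_delta.
    All thresholds here are illustrative placeholders.
    """

    def __init__(self, baseline_p, baseline_log_rt, window=50,
                 p_delta=0.15, t_delta=0.30):
        self.baseline_p = baseline_p            # expected proportion correct
        self.baseline_log_rt = baseline_log_rt  # expected mean log response time
        self.window = window
        self.p_delta = p_delta
        self.t_delta = t_delta
        self.responses = deque(maxlen=window)
        self.log_rts = deque(maxlen=window)

    def update(self, correct, log_rt):
        """Record one administration; return True if the item is flagged."""
        self.responses.append(correct)
        self.log_rts.append(log_rt)
        if len(self.responses) < self.window:
            return False  # wait until the moving-average window fills
        p_hat = sum(self.responses) / len(self.responses)
        mean_log_rt = sum(self.log_rts) / len(self.log_rts)
        # Flag only when BOTH signals move in the direction expected
        # under item preknowledge: easier (higher p) and faster (lower RT).
        return (p_hat - self.baseline_p > self.p_delta
                and self.baseline_log_rt - mean_log_rt > self.t_delta)
```

Requiring both signals to shift jointly reflects the abstract's rationale for why response times help: a run of correct answers alone may reflect able examinees, but correct answers given unusually quickly are more consistent with preknowledge, which is why the joint rule tends to reduce false positives.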

Type
Original Paper
Copyright
Copyright © The Psychometric Society 2017

