
Statistical Foundations for Computerized Adaptive Testing with Response Revision

Published online by Cambridge University Press:  01 January 2025

Shiyu Wang* (University of Georgia)
Georgios Fellouris (University of Illinois at Urbana-Champaign)
Hua-Hua Chang (Purdue University)

*Correspondence should be made to Shiyu Wang, University of Georgia, Athens, USA. Email: swang44@uga.edu; https://coe.uga.edu/directory/profiles/swang44

Abstract

The compatibility of computerized adaptive testing (CAT) with response revision has been debated in psychometrics for many years. The problem is to give test takers opportunities to change their answers during the test, while discouraging deceptive test-taking strategies and preserving the statistical efficiency of traditional CAT. The estimation approach proposed in Wang et al. (Stat Sin 27(4):1987–2010, 2017), based on the nominal response model, allows test takers to provide more than one answer to each item during the test, all of which contribute to the interim and final ability estimation. This approach is reformulated here, extended to a larger class of polytomous and dichotomous item response theory models, and investigated with simulation studies under different test-taking strategies.
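For intuition, the sketch below (Python, not from the paper) illustrates the basic idea of letting every recorded answer, including revisions, enter a nominal-response-model likelihood for the ability estimate. The item parameters, the helper names `nrm_probs` and `mle_theta`, and the simplifying treatment of a revised answer as an additional response to the same item are illustrative assumptions, not the authors' exact formulation, which models the revision process itself.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nrm_probs(theta, a, c):
    """Category probabilities under the nominal response model:
    P_k(theta) proportional to exp(a_k * theta + c_k)."""
    z = a * theta + c
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def log_likelihood(theta, responses, item_params):
    """Joint log-likelihood over ALL recorded responses.
    `responses` is a list of (item_id, category) pairs; an item that was
    revised simply appears more than once (a simplifying assumption)."""
    ll = 0.0
    for item_id, k in responses:
        a, c = item_params[item_id]
        ll += np.log(nrm_probs(theta, a, c)[k])
    return ll

def mle_theta(responses, item_params, bounds=(-4.0, 4.0)):
    """Ability estimate that maximizes the joint likelihood of every answer given."""
    res = minimize_scalar(lambda t: -log_likelihood(t, responses, item_params),
                          bounds=bounds, method="bounded")
    return res.x

# Hypothetical 3-category items: slope and intercept vectors per item.
item_params = {
    0: (np.array([0.0, 0.8, 1.6]), np.array([0.0, -0.2, -1.0])),
    1: (np.array([0.0, 1.0, 2.0]), np.array([0.0,  0.1, -0.8])),
}
# Item 0 was answered twice (original answer, then a revision); both contribute.
responses = [(0, 1), (0, 2), (1, 2)]
print(round(mle_theta(responses, item_params), 3))
```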

Type
Original Paper
Copyright
Copyright © 2019 The Psychometric Society


References

Al-Hamly, M., Coombe, C. (2005). To change or not to change: Investigating the value of MCQ answer changing for Gulf Arab students. Language Testing, 22(4), 509–531.
Barton, M. A., Lord, F. M. (1981). An upper asymptote for the three-parameter logistic item-response model. ETS Research Report Series, 1981(1), i–8.
Benjamin, L. T. Jr., Cavell, T. A., Shallenberger, W. R. III (1984). Staying with initial answers on objective tests: Is it a myth? Teaching of Psychology, 11(3), 133–141.
Billingsley, P. (2008). Probability and measure. Hoboken: Wiley.
Bock, R. D., Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.
Bock, R. D., Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35(2), 179–197.
Davey, T., Fan, M. (2000). Specific information item selection for adaptive testing. In Annual Meeting of the National Council on Measurement in Education, New Orleans, LA.
Green, B. F., Bock, R. D., Humphreys, L. G., Linn, R. L., Reckase, M. D. (1984). Technical guidelines for assessing computerized adaptive tests. Journal of Educational Measurement, 21(4), 347–360.
Han, K. T. (2013). Item pocket method to allow response review and change in computerized adaptive testing. Applied Psychological Measurement, 37(4), 259–275.
Han, K. T. (2015). Happy CAT: Options to allow test takers to review and change responses in CAT. In International Association of Computerized Adaptive Testing.
Jeon, M., De Boeck, P., van der Linden, W. (2017). Modeling answer change behavior: An application of a generalized item response tree model. Journal of Educational and Behavioral Statistics, 42(4), 467–490.
Kingsbury, G. (1996). Item review and adaptive testing. In Annual Meeting of the National Council on Measurement in Education, New York, NY.
Kruger, J., Wirtz, D., Miller, D. T. (2005). Counterfactual thinking and the first instinct fallacy. Journal of Personality and Social Psychology, 88(5), 725.
Linn, R. L., Rock, D. A., Cleary, T. A. (1969). The development and evaluation of several programmed testing methods. Educational and Psychological Measurement, 29(1), 129–146.
Liu, O. L., Bridgeman, B., Gu, L., Xu, J., Kong, N. (2015). Investigation of response changes in the GRE revised general test. Educational and Psychological Measurement, 75(6), 1002–1020.
Lord, F. M. (1971). Robbins-Monro procedures for tailored testing. Educational and Psychological Measurement, 31(1), 3–31.
Luecht, R. M., Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35(3), 229–249.
Luecht, R. M., Sireci, S. G. (2011). A review of models for computer-based testing. Research Report 2011–12. College Board.
Pagni, S. E., Bak, A. G., Eisen, S. E., Murphy, J. L., Finkelman, M. D., Kugel, G. (2017). The benefit of a switch: Answer-changing on multiple-choice exams by first-year dental students. Journal of Dental Education, 81(1), 110–115.
Papanastasiou, E. C., Reckase, M. D. (2007). A "rearrangement procedure" for scoring adaptive tests with review options. International Journal of Testing, 7(4), 387–407.
Passos, V. L., Berger, M. P., Tan, F. E. (2007). Test design optimization in CAT early stage with the nominal response model. Applied Psychological Measurement, 31(3), 213–232.
Stocking, M. L. (1997). Revising item responses in computerized adaptive tests: A comparison of three models. Applied Psychological Measurement, 21(2), 129–142.
van der Linden, W. J., Jeon, M. (2012). Modeling answer changes on test items. Journal of Educational and Behavioral Statistics, 37(1), 180–199.
van der Linden, W. J., Jeon, M., Ferrara, S. (2011). A paradox in the study of the benefits of test-item review. Journal of Educational Measurement, 48(4), 380–398.
Vispoel, W. P. (1998). Reviewing and changing answers on computer-adaptive and self-adaptive vocabulary tests. Journal of Educational Measurement, 35(4), 328–345.
Vispoel, W. P., Hendrickson, A. B., Bleiler, T. (2000). Limiting answer review and change on computerized adaptive vocabulary tests: Psychometric and attitudinal results. Journal of Educational Measurement, 37(1), 21–38.
Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12(1), 15–20.
Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., Mislevy, R. J. (2000). Computerized adaptive testing: A primer. Abingdon: Routledge.
Wainer, H., Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24(3), 185–201.
Wang, S., Fellouris, G., Chang, H.-H. (2017). Computerized adaptive testing that allows for response revision: Design and asymptotic theory. Statistica Sinica, 27(4), 1987–2010.
Williams, D. (1991). Probability with martingales. Cambridge: Cambridge University Press.
Wise, S. L. (1996). A critical analysis of the arguments for and against item review in computerized adaptive testing. In Annual Meeting of the National Council on Measurement in Education (NCME), volume 1996.
Wise, S. L., Finney, S. J., Enders, C. K., Freeman, S. A., Severance, D. D. (1999). Examinee judgments of changes in item difficulty: Implications for item review in computerized adaptive testing. Applied Measurement in Education, 12(2), 185–198.
Yen, Y.-C., Ho, R.-G., Liao, W.-W., Chen, L.-J. (2012). Reducing the impact of inappropriate items on reviewable computerized adaptive testing. Educational Technology & Society, 15(2), 231–243.