Psychometrics Behind Computerized Adaptive Testing

Hua-Hua Chang

doi:10.1007/s11336-014-9401-5

Published online by Cambridge University Press: 01 January 2025

Hua-Hua Chang*
Affiliation: University of Illinois at Urbana-Champaign
* Requests for reprints should be sent to Hua-Hua Chang, University of Illinois at Urbana-Champaign, 430 Psychology Building, 630 E. Daniel Street, M/C 716, Champaign, IL 61820, USA. E-mail: hhchang@illinois.edu

Abstract

This paper surveys 18 years of progress that my colleagues, students (both former and current), and I have made in a prominent research area in psychometrics: computerized adaptive testing (CAT). We start with a historical review of the establishment of a large-sample foundation for CAT. It is worth noting that the asymptotic results were derived under the framework of martingale theory, a highly theoretical branch of probability theory that may seem unrelated to educational and psychological testing. In addition, we address a number of issues that emerged from large-scale implementation and show how theoretical work can help solve these problems. Finally, we propose that CAT technology can be very useful in supporting individualized instruction on a mass scale. We show that even paper-and-pencil tests can be made adaptive to support classroom teaching.

Keywords

computerized adaptive testing; multidimensional CAT; sequential design; martingale theory; a-stratified item selection; response time; constraint management; CD-CAT

Type: Original Paper
Information: Psychometrika, Volume 80, Issue 1, March 2015, pp. 1–20

DOI: https://doi.org/10.1007/s11336-014-9401-5
Copyright: Copyright © 2013 The Psychometric Society

Footnotes

This article is based on the Presidential Address Hua-Hua Chang gave on June 25, 2013 at the 78th Annual Meeting of the Psychometric Society held in Arnhem, the Netherlands.
