
Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores: Theory and Applications

Published online by Cambridge University Press: 01 January 2025

Lihua Yao*
Affiliation: Defense Manpower Data Center, Monterey Bay
*Requests for reprints should be sent to Lihua Yao, Defense Manpower Data Center, Monterey Bay, 400 Gigling Rd., Seaside, CA 93955-6771, USA. E-mail: Lihua.Yao@osd.pentagon.mil

Abstract

Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability, or reduce test length, compared with unidimensional CAT or with paper-and-pencil tests. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation, varying the structure of the item pools, the population distribution of the simulees, the number of items selected, and the content area. Four existing procedures, Volume (Segall in Psychometrika 61:331–354, 1996), Kullback–Leibler information (Veldkamp & van der Linden in Psychometrika 67:575–588, 2002), minimizing the error variance of a linear combination (van der Linden in J. Educ. Behav. Stat. 24:398–412, 1999), and Minimum Angle (Reckase in Multidimensional item response theory, Springer, New York, 2009), are compared with a new procedure, minimizing the error variance of the composite score with an optimized weight, proposed for the first time in this study. The intent is to find an item selection procedure that yields higher precision for both the domain and composite abilities and a higher percentage of items used from the pool. The comparison is performed by examining absolute bias, correlation, test reliability, time used, and item usage. Three item pools are used, with item parameters estimated from live CAT data. Results show that Volume and Minimum Angle perform similarly, balancing information across all content areas, while the other three procedures perform similarly, with high precision for both domain and overall scores when the required number of items is selected for each domain. The new item selection procedure has the highest percentage of item usage. Moreover, for the overall score it produces results similar to, or better than, those from the method that selects items favoring the general dimension under the general model (Segall in Psychometrika 66:79–97, 2001), whereas the general-dimension method has low precision for the domain scores. In addition to the simulation study, the mathematical theory for certain procedures is derived and is confirmed by the simulation results.
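As a minimal sketch of two of the criteria compared above (not the author's implementation), the code below assumes a compensatory multidimensional 2PL model, in which an item with discrimination vector a and difficulty b contributes Fisher information P(1 − P)·aaᵀ at ability θ. Segall's (1996) Volume criterion selects the candidate item that maximizes the determinant of the updated posterior information matrix; van der Linden's (1999) criterion selects the item that minimizes the error variance wᵀVw of a weighted composite wᵀθ. All function and variable names here are illustrative.

```python
import numpy as np

def m2pl_prob(theta, a, b):
    """P(correct) under a compensatory multidimensional 2PL model."""
    return 1.0 / (1.0 + np.exp(-(a @ theta - b)))

def item_information(theta, a, b):
    """Fisher information contribution of one item: P(1 - P) * a a^T."""
    p = m2pl_prob(theta, a, b)
    return p * (1.0 - p) * np.outer(a, a)

def posterior_information(theta, a_params, b_params, administered, prior_inv):
    """Prior precision plus information from all administered items."""
    info = prior_inv.copy()
    for i in administered:
        info += item_information(theta, a_params[i], b_params[i])
    return info

def select_volume(theta_hat, a_params, b_params, administered, prior_inv):
    """Volume criterion (Segall, 1996): maximize the determinant of the
    updated posterior information matrix over the unused items."""
    base = posterior_information(theta_hat, a_params, b_params,
                                 administered, prior_inv)
    candidates = [j for j in range(len(b_params)) if j not in administered]
    dets = [np.linalg.det(base + item_information(theta_hat, a_params[j],
                                                  b_params[j]))
            for j in candidates]
    return candidates[int(np.argmax(dets))]

def select_min_error_variance(theta_hat, a_params, b_params, administered,
                              prior_inv, w):
    """Minimum error-variance criterion (van der Linden, 1999): minimize
    w^T V w, where V is the inverse of the updated information matrix and
    w holds the weights of the composite score w^T theta."""
    base = posterior_information(theta_hat, a_params, b_params,
                                 administered, prior_inv)
    candidates = [j for j in range(len(b_params)) if j not in administered]
    variances = [w @ np.linalg.inv(base + item_information(theta_hat,
                                                           a_params[j],
                                                           b_params[j])) @ w
                 for j in candidates]
    return candidates[int(np.argmin(variances))]

# Toy usage: a 100-item pool measuring 3 domains, standard normal prior.
rng = np.random.default_rng(0)
a_params = rng.uniform(0.5, 2.0, size=(100, 3))
b_params = rng.normal(size=100)
prior_inv = np.eye(3)        # inverse of an identity prior covariance
theta_hat = np.zeros(3)      # current ability estimate
print(select_volume(theta_hat, a_params, b_params, set(), prior_inv))
print(select_min_error_variance(theta_hat, a_params, b_params, set(),
                                prior_inv, np.array([0.4, 0.3, 0.3])))
```

Per the abstract, the new procedure proposed in the paper differs from the second criterion in that the weight vector is optimized during selection rather than fixed in advance; that optimization is not reproduced in this sketch.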

Type: Original Paper
Copyright: © 2012 The Psychometric Society


References

Chang, H.-H., & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213–229.
Cheng, Y., & Chang, H.-H. (2009). The maximum priority index method for severely constrained item selection in computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 62, 369–383.
de la Torre, J., & Hong, Y. (2010). Parameter estimation with small sample size: A higher-order IRT approach. Applied Psychological Measurement, 34, 267–285.
Haberman, S. J., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75, 331–354.
Lee, Y.-H., Ip, E. H., & Fuh, C. D. (2008). A strategy for controlling item exposure in multidimensional computerized adaptive testing. Educational and Psychological Measurement, 68, 215–232.
Li, Y. H., & Schafer, W. D. (2005). Trait parameter recovery using multidimensional computerized adaptive testing in reading and mathematics. Applied Psychological Measurement, 29, 3–25.
Luecht, R. M. (1996). Multidimensional computerized adaptive testing in a certification or licensure context. Applied Psychological Measurement, 20, 389–404.
Luecht, R. M., & Miller, T. R. (1992). Unidimensional calibrations and interpretations of composite traits for multidimensional tests. Applied Psychological Measurement, 16, 279–293.
Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74, 273–296.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36.
Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
Reckase, M. D., & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15, 361–373.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331–354.
Segall, D. O. (2001). General ability measurement: An application of multidimensional item response theory. Psychometrika, 66, 79–97.
van der Linden, W. J. (1999). Multidimensional adaptive testing with a minimum error-variance criterion. Journal of Educational and Behavioral Statistics, 24, 398–412.
Veldkamp, B. P., & van der Linden, W. J. (2002). Multidimensional adaptive testing with constraints on test content. Psychometrika, 67, 575–588.
Wang, C., Chang, H.-H., & Boughton, K. A. (2011). Kullback–Leibler information and its applications in multi-dimensional adaptive testing. Psychometrika, 76, 13–39.
Yao, L. (2003). BMIRT: Bayesian multivariate item response theory [Computer software]. Monterey, CA: Defense Manpower Data Center.
Yao, L. (2010a). Reporting valid and reliable overall scores and domain scores. Journal of Educational Measurement, 47, 339–360.
Yao, L. (2010b). Multidimensional ability estimation: Bayesian or non-Bayesian. Unpublished manuscript.
Yao, L. (2011). simuMCAT: Simulation of multidimensional computer adaptive testing [Computer software]. Monterey, CA: Defense Manpower Data Center.
Yao, L., & Boughton, K. A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31, 83–105.
Yao, L., & Boughton, K. A. (2009). Multidimensional linking for tests containing polytomous items. Journal of Educational Measurement, 46, 177–197.
Yao, L., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30, 469–492.