
A New Online Calibration Method for Multidimensional Computerized Adaptive Testing

Published online by Cambridge University Press:  01 January 2025

Ping Chen* (Beijing Normal University)
Chun Wang (University of Minnesota)

*Correspondence should be made to Ping Chen, National Innovation Center for Assessment of Basic Education Quality, Beijing Normal University, No. 19, Xin Jie Kou Wai Street, Hai Dian District, Beijing 100875, China. Email: pchen@bnu.edu.cn

Abstract

Multidimensional-Method A (M-Method A) has been proposed as an efficient and effective online calibration method for multidimensional computerized adaptive testing (MCAT) (Chen & Xin, Paper presented at the 78th Meeting of the Psychometric Society, Arnhem, The Netherlands, 2013). However, M-Method A rests on a key assumption: it treats person parameter estimates as their true values, so it may yield erroneous item calibration when those estimates contain non-ignorable measurement error. To improve on M-Method A, this paper proposes a new MCAT online calibration method, namely, the full functional MLE-M-Method A (FFMLE-M-Method A). The new method combines the full functional MLE (Jones & Jin in Psychometrika 59:59–75, 1994; Stefanski & Carroll in Annals of Statistics 13:1335–1351, 1985) with the original M-Method A to correct for the estimation error in the ability vector that would otherwise degrade the precision of item calibration. Two correction schemes are also proposed for implementing the new method. A simulation study showed that the new method produced more accurate item parameter estimates than the original M-Method A under almost all conditions.
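The core problem the abstract describes — calibrating a pretest item while treating fallible ability estimates as error-free — can be illustrated with a simple sketch. This is not the paper's FFMLE-M-Method A: it is a generic, unidimensional 2PL example with a hypothetical setup (normal ability prior, known standard error of the ability estimates), and it uses a structural correction that integrates the item likelihood over the posterior of ability given its noisy estimate, rather than the full functional MLE. It shows the classic attenuation of the discrimination parameter under naive calibration and how accounting for the estimation error removes most of that bias.

```python
# Illustrative sketch only (NOT the paper's method): naive vs. error-corrected
# calibration of a single 2PL pretest item from fallible ability estimates.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n, se = 3000, 0.5                 # examinees; assumed known SE of theta-hat
a_true, b_true = 1.5, 0.3         # true item discrimination and difficulty
theta = rng.normal(0, 1, n)       # true abilities (standard normal prior)
theta_hat = theta + rng.normal(0, se, n)   # fallible ability estimates
p_true = 1 / (1 + np.exp(-a_true * (theta - b_true)))
y = rng.binomial(1, p_true)       # observed item responses

def nll_naive(par):
    # Treats theta_hat as the true ability (M-Method A's assumption).
    a, b = par
    p = 1 / (1 + np.exp(-a * (theta_hat - b)))
    p = np.clip(p, 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Gauss-Hermite quadrature over theta | theta_hat ~ N(theta_hat/(1+se^2),
# se^2/(1+se^2)), the posterior under the normal prior and error model.
z, w = np.polynomial.hermite_e.hermegauss(21)
w = w / w.sum()
post_mean = theta_hat / (1 + se**2)
post_sd = se / np.sqrt(1 + se**2)

def nll_corrected(par):
    # Marginal likelihood: integrates out the ability estimation error.
    a, b = par
    t = post_mean[:, None] + post_sd * z[None, :]   # plausible true abilities
    p = 1 / (1 + np.exp(-a * (t - b)))
    lik = np.where(y[:, None] == 1, p, 1 - p) @ w
    return -np.sum(np.log(np.clip(lik, 1e-300, None)))

naive = minimize(nll_naive, x0=[1.0, 0.0], method="Nelder-Mead").x
corr = minimize(nll_corrected, x0=[1.0, 0.0], method="Nelder-Mead").x
print("true (a, b):", (a_true, b_true))
print("naive      :", np.round(naive, 3))   # a is attenuated toward 0
print("corrected  :", np.round(corr, 3))    # a recovered much more closely
```

With noise of this magnitude the naive discrimination estimate is pulled well below its true value, while the corrected likelihood recovers it to within sampling error — the same qualitative effect that motivates replacing M-Method A's plug-in treatment of ability estimates with an error-aware likelihood.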

Type: Original Paper
Copyright © 2015 The Psychometric Society


Footnotes

Both authors made equal contributions to the paper, and the order of authorship is alphabetical.

References

Adams, R., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23.
Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Dekker.
Ban, J.-C., Hanson, B. H., Wang, T. Y., Yi, Q., & Harris, D. J. (2001). A comparative study of on-line pretest item-calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38, 191–212.
Ban, J.-C., Hanson, B. H., Yi, Q., & Harris, D. J. (2002). Data sparseness and online pretest item calibration/scaling methods in CAT (ACT Research Report 02-01). Iowa City, IA: ACT, Inc. Available at http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/19/da/e9.pdf
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 379–479). Reading, MA: Addison-Wesley.
Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models: A modern perspective (2nd ed.). London: Chapman and Hall.
Chang, H. H., & Stout, W. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58, 37–52.
Chen, P., Xin, T., Wang, C., & Chang, H. H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77, 201–222.
Chen, P., & Xin, T. (2013, July). Developing online calibration methods for multidimensional computerized adaptive testing. Paper presented at the 78th Meeting of the Psychometric Society, Arnhem, The Netherlands.
Cheng, Y., & Yuan, K. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75, 280–291.
Debeer, D., Buchholz, J., Hartig, J., & Janssen, R. (2014). Student, school, and country differences in sustained test-taking effort in the 2009 PISA reading assessment. Journal of Educational and Behavioral Statistics, 39, 502–523.
Debeer, D., & Janssen, R. (2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50, 164–185.
Eggen, T. J. H. M., & Verhelst, N. D. (2011). Item calibration in incomplete testing designs. Psicologica, 32, 107–132.
Folk, V. G., & Golub-Smith, M. (1996, April). Calibration of on-line pretest data using BILOG. Paper presented at the annual meeting of the National Council on Measurement in Education, New York.
Haberman, S. J., & von Davier, A. A. (2014). Considerations on parameter estimation, scoring, and linking in multistage testing. In D. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications (pp. 229–248). Boca Raton, FL: CRC Press.
Haberman, S. J. (2009). Linking parameter estimates derived from an item response model through separate calibrations (Research Report RR-09-40). Princeton, NJ: Educational Testing Service.
Hartig, J., & Höhler, J. (2008). Representation of competencies in multidimensional IRT models with within-item and between-item multidimensionality. Journal of Psychology, 216, 89–101.
Hecht, M., Weirich, S., Siegle, T., & Frey, A. (2015). Modeling booklet effects for nonequivalent group designs in large-scale assessment. Educational and Psychological Measurement.
Hsu, Y., Thompson, T. D., & Chen, W. (1998, April). CAT item calibration. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Jones, D. H., & Jin, Z. Y. (1994). Optimal sequential designs for on-line item estimation. Psychometrika, 59, 59–75.
Lehmann, E. L., & Casella, G. (1998). Theory of point estimation (2nd ed.). New York: Springer.
Lien, D.-H. D. (1985). Moments of truncated bivariate log-normal distributions. Economics Letters, 19, 243–247.
Lord, F. M. (1971). Tailored testing, an application of stochastic approximation. Journal of the American Statistical Association, 66, 707–711.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.
Mislevy, R. J., & Chang, H. (2000). Does adaptive testing violate local independence? Psychometrika, 65, 149–156.
Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74, 273–296.
Newman, M. E. J., & Barkema, G. T. (1999). Monte Carlo methods in statistical physics. Oxford: Clarendon Press.
Parshall, C. G. (1998, September). Item development and pretesting in a computer-based testing environment. Paper presented at the colloquium Computer-Based Testing: Building the Foundation for Future Assessments, Philadelphia, PA.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes: The art of scientific computing (3rd ed.). New York: Cambridge University Press.
Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331–354.
Segall, D. O. (2001). General ability measurement: An application of multidimensional item response theory. Psychometrika, 66, 79–97.
Segall, D. O. (2003, April). Calibrating CAT pools and online pretest items using MCMC methods. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Stefanski, L. A., & Carroll, R. J. (1985). Covariate measurement error in logistic regression. Annals of Statistics, 13, 1335–1351.
Stocking, M. L. (1988). Scale drift in on-line calibration (Research Report 88-28). Princeton, NJ: Educational Testing Service.
van der Linden, W. J., & Ren, H. (2014). Optimal Bayesian adaptive design for test-item calibration. Psychometrika. doi:10.1007/s11336-013-9391-8
Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 65–102). Hillsdale, NJ: Erlbaum.
Wang, C. (2014a). On latent trait estimation in multidimensional compensatory item response models. Psychometrika. doi:10.1007/s11336-013-9399-0
Wang, C. (2014b). Improving measurement precision of hierarchical latent traits using adaptive testing. Journal of Educational and Behavioral Statistics, 39, 452–477.
Wang, C., & Chang, H. H. (2011). Item selection in multidimensional computerized adaptive testing—Gaining information from different angles. Psychometrika, 76, 363–384.
Wang, C., & Chang, H. H. (2012, April). Reducing bias in MIRT trait estimation. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.
Wang, C., Chang, H. H., & Boughton, K. A. (2011). Kullback–Leibler information and its applications in multi-dimensional adaptive testing. Psychometrika, 76, 13–39.
Wang, C., Chang, H. H., & Boughton, K. A. (2013). Deriving stopping rules for multidimensional computerized adaptive testing. Applied Psychological Measurement, 37, 99–122.
Yao, L. H. (2013). Comparing the performance of five multidimensional CAT selection procedures with different stopping rules. Applied Psychological Measurement, 37, 3–23.
Yao, L. H., Pommerich, M., & Segall, D. O. (2014). Using multidimensional CAT to administer a short, yet precise, screening test. Applied Psychological Measurement, 38, 614–631.