Optimal Online Calibration Designs for Item Replenishment in Adaptive Testing

Published online by Cambridge University Press:  01 January 2025

Yinhong He
Affiliation:
Nanjing University of Information Science and Technology; Beijing Normal University
Ping Chen*
Affiliation:
Beijing Normal University
* Correspondence should be made to Ping Chen, Collaborative Innovation Center of Assessment Toward Basic Education Quality, Beijing Normal University, No. 19, Xin Jie Kou Wai Street, Hai Dian District, Beijing 100875, China. Email: pchen@bnu.edu.cn

Abstract

Maintaining the item bank is essential for the continued operation of adaptive tests. Calibrating new items online provides an efficient way to replenish the operational item bank. In this study, a new optimal design for online calibration (referred to as D-c) is proposed by incorporating the idea of the original D-optimal design into the reformed D-optimal design proposed by van der Linden and Ren (Psychometrika 80:263–288, 2015) (denoted as the D-VR design). Because the design criteria depend on the unknown item parameters of the new items, Bayesian versions of the locally optimal designs (e.g., D-c and D-VR) are put forward by adding prior information about the new items. In simulating the locally optimal designs, five calibration sample sizes were used to obtain different levels of estimation precision for the initial item parameters, and two approaches were used to obtain the prior distributions for the Bayesian optimal designs. Results showed that the D-c design performed well and retired a smaller number of new items than the D-VR design at almost all examinee sample sizes; the Bayesian version of D-c using the prior obtained from the operational items worked better than the version using the default priors in BILOG-MG and PARSCALE; and Bayesian optimal designs generally outperformed locally optimal designs when the initial item parameters of the new items were poorly estimated.

Type
Original Paper
Copyright
Copyright © 2019 The Psychometric Society

Footnotes

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Ali, U. S., & Chang, H.-H. (2014). An item-driven adaptive design for calibrating pretest items (Research Report No. RR-14-38). Princeton, NJ: ETS.
Ban, J. C., Hanson, B. A., Wang, T. Y., Yi, Q., & Harris, D. J. (2001). A comparative study of on-line pretest item—Calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38, 191–212.
Berger, M. P. F. (1992). Sequential sampling designs for the two-parameter item response theory model. Psychometrika, 57, 521–538.
Berger, M. P. F. (1994). D-optimal sequential sampling designs for item response theory models. Journal of Educational Statistics, 19, 43–56.
Berger, M. P. F., King, C. Y. J., & Wong, W. K. (2000). Minimax D-optimal designs for item response theory models. Psychometrika, 65, 377–390.
Birnbaum, A. (1968). Some latent ability models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Boston: Addison-Wesley.
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in microcomputer environment. Applied Psychological Measurement, 6, 431–444.
Buyske, S. (1998). Optimal design for item calibration in computerized adaptive testing: The 2PL case. In N. Flournoy et al. (Eds.), New developments and applications in experimental design (Lecture Notes-Monograph Series). Hayward, CA: Institute of Mathematical Statistics.
Buyske, S. (2005). Optimal design in educational testing. In M. P. F. Berger & W. K. Wong (Eds.), Applied optimal designs. West Sussex: Wiley.
Chang, Y. C. I., & Lu, H. Y. (2010). Online calibration via variable length computerized adaptive testing. Psychometrika, 75, 140–157.
Chen, P. (2017). A comparative study of online item calibration methods in multidimensional computerized adaptive testing. Journal of Educational and Behavioral Statistics, 42, 559–590.
Chen, P., & Wang, C. (2016). A new online calibration method for multidimensional computerized adaptive testing. Psychometrika, 81, 674–701.
Chen, P., Wang, C., Xin, T., & Chang, H.-H. (2017). Developing new online calibration methods for multidimensional computerized adaptive testing. British Journal of Mathematical and Statistical Psychology, 70, 81–117.
Chen, P., Xin, T., Wang, C., & Chang, H.-H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77, 201–222.
Cheng, Y., Patton, J. M., & Shao, C. (2015). A-stratified computerized adaptive testing in the presence of calibration error. Educational and Psychological Measurement, 75, 260–283.
Cheng, Y., & Yuan, K. H. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75, 280–291.
He, Y., Chen, P., Li, Y., & Zhang, S. (2017). A new online calibration method based on Lord’s bias-correction. Applied Psychological Measurement, 41, 456–471.
He, Y., Chen, P., & Li, Y. (2019). New efficient and practicable adaptive designs for calibrating items online. Applied Psychological Measurement. https://doi.org/10.1177/0146621618824854
Jones, D. H., & Jin, Z. (1994). Optimal sequential designs for on-line item estimation. Psychometrika, 59, 59–75.
Kang, H. A. (2016). Likelihood estimation for jointly analyzing item responses and response times (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, Champaign, IL.
Kim, S. (2006). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43, 355–381.
Kingsbury, G. G. (2009). Adaptive item calibration: A process for estimating item parameters within a computerized adaptive test. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Lu, H. Y. (2014). Application of optimal designs to item calibration. PLoS ONE, 9(9), e106747.
Mathew, T., & Sinha, B. K. (2001). Optimal designs for binary data under logistic regression. Journal of Statistical Planning and Inference, 93, 295–307.
Minkin, S. (1987). Optimal designs for binary data. Journal of the American Statistical Association, 82, 1098–1103.
Ren, H., van der Linden, W. J., & Diao, Q. (2017). Continuous online item calibration: Parameter recovery and item utilization. Psychometrika, 82, 498–522.
Stocking, M. L. (1988). Scale drift in on-line calibration (Research Report 88-28). Princeton, NJ: ETS.
Stocking, M. L. (1990). Specifying optimum examinees for item parameter estimation in item response theory. Psychometrika, 55, 461–475.
Tsutakawa, R. K., & Johnson, J. C. (1990). The effect of uncertainty of item parameter estimation on ability estimates. Psychometrika, 55, 371–390.
van der Linden, W. J., & Ren, H. (2015). Optimal Bayesian adaptive design for test item calibration. Psychometrika, 80, 263–288.
Wainer, H., & Mislevy, R. J. (1990). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computerized adaptive testing: A primer (Chap. 4, pp. 65–102). Hillsdale, NJ: Erlbaum.
Wingersky, M., & Lord, F. M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8, 347–364.
Zheng, Y. (2014). New methods of online calibration for item bank replenishment (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, Champaign, IL.
Zheng, Y. (2016). Online calibration of polytomous items under the generalized partial credit model. Applied Psychological Measurement, 40, 434–450.
Zheng, Y., & Chang, H.-H. (2017). A comparison of five methods for pretest item selection in online calibration. International Journal of Quantitative Research in Education, 4, 133–158.