Optimal Bayesian Adaptive Design for Test-Item Calibration

Published online by Cambridge University Press: 01 January 2025

Wim J. van der Linden*
Affiliation: CTB/McGraw-Hill
Hao Ren
Affiliation: CTB/McGraw-Hill
* Requests for reprints should be sent to Wim J. van der Linden, CTB/McGraw-Hill, 20 Ryan Ranch Road, Monterey, CA 93940, USA. E-mail: wim_vanderlinden@ctb.com

Abstract

An optimal adaptive design for test-item calibration based on Bayesian optimality criteria is presented. The design adapts the choice of field-test items to the examinees taking an operational adaptive test using both the information in the posterior distributions of their ability parameters and the current posterior distributions of the field-test parameters. Different criteria of optimality based on the two types of posterior distributions are possible. The design can be implemented using an MCMC scheme with alternating stages of sampling from the posterior distributions of the test takers’ ability parameters and the parameters of the field-test items while reusing samples from earlier posterior distributions of the other parameters. Results from a simulation study demonstrated the feasibility of the proposed MCMC implementation for operational item calibration. A comparison of performances for different optimality criteria showed faster calibration of substantial numbers of items for the criterion of D-optimality relative to A-optimality, a special case of c-optimality, and random assignment of items to the test takers.
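
To make the idea of a posterior-based D-optimality criterion concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-parameter logistic (2PL) response model, which the abstract does not specify, and it assumes the criterion is the Monte Carlo average gain in the determinant of an item's information matrix over posterior draws. All names (`p_2pl`, `item_information`, `d_gain`, `select_item`, the `draws` and `accumulated` keys) are hypothetical and chosen for illustration only.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2x2 Fisher information about (a, b) from one response at ability theta."""
    p = p_2pl(theta, a, b)
    w = p * (1.0 - p)
    d = theta - b  # derivative of the logit a*(theta - b) with respect to a
    # The derivative with respect to b is -a, which gives the outer product below.
    return [[w * d * d, -w * a * d],
            [-w * a * d, w * a * a]]


def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def d_gain(theta_draws, item_draws, accumulated):
    """Average increase in det(information) over all pairs of posterior draws.

    theta_draws: draws from the test taker's ability posterior.
    item_draws:  (a, b) draws from the field-test item's current posterior.
    accumulated: 2x2 information already collected for this item (hypothetical input).
    """
    base = det2(accumulated)
    total = 0.0
    for theta in theta_draws:
        for a, b in item_draws:
            info = item_information(theta, a, b)
            updated = [[accumulated[i][j] + info[i][j] for j in range(2)]
                       for i in range(2)]
            total += det2(updated) - base
    return total / (len(theta_draws) * len(item_draws))


def select_item(theta_draws, candidates):
    """Return the index of the candidate field-test item with the largest gain."""
    gains = [d_gain(theta_draws, c["draws"], c["accumulated"]) for c in candidates]
    return max(range(len(gains)), key=gains.__getitem__)


# Toy usage: one ability draw, two candidate items with identical prior information.
identity = [[1.0, 0.0], [0.0, 1.0]]
theta_draws = [0.0]
candidates = [
    {"draws": [(1.0, 0.0)], "accumulated": identity},  # difficulty matches ability
    {"draws": [(1.0, 5.0)], "accumulated": identity},  # far too difficult
]
best = select_item(theta_draws, candidates)
```

In this toy setup the first item is selected, because a response from a test taker whose ability is near the item's difficulty adds more to the determinant than a response to an item that is far too hard. An A-optimality variant would, under the same assumptions, replace the determinant gain with the reduction in the trace of the inverse information matrix.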

Type
Original Paper
Copyright
Copyright © 2013 The Psychometric Society
