
A New Concurrent Calibration Method for Nonequivalent Group Design under Nonrandom Assignment

Published online by Cambridge University Press:  01 January 2025

Kei Miyazaki
Affiliation:
Department of Cognitive and Behavioral Science, The University of Tokyo
Takahiro Hoshino*
Affiliation:
Graduate School of Economics, Nagoya University
Shin-ichi Mayekawa
Affiliation:
Graduate School of Decision Science and Technology, Tokyo Institute of Technology
Kazuo Shigemasu
Affiliation:
Department of Cognitive and Behavioral Science, The University of Tokyo
* Requests for reprints should be sent to Takahiro Hoshino, Graduate School of Economics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan. E-mail: bayesian@jasmine.ocn.ne.jp

Abstract

This study proposes a new item parameter linking method for the common-item nonequivalent groups design in item response theory (IRT). Previous studies assumed that examinees are randomly assigned to one of the test forms. However, examinees can frequently select their own test forms, and tests often differ according to examinees’ abilities. In such cases, concurrent calibration or multiple group IRT modeling that does not model test form selection behavior can yield severely biased results. We propose a model wherein test form selection behavior depends on test scores and use a Monte Carlo expectation maximization (MCEM) algorithm for estimation. This method provided adequate estimates of testing parameters.
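The nonrandom assignment described above can be illustrated with a minimal simulation sketch. It is not the authors' model or their MCEM estimator: the 2PL common items, the logistic selection rule, and all names and parameter values (`simulate`, `slope`, the item difficulties) are assumptions made here for illustration only. It shows only that when the choice of form depends on the common-item score, the two groups end up with different ability distributions.

```python
import math
import random


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n_examinees=2000, n_common=10, slope=0.8, seed=0):
    """Simulate examinees who choose a form based on their common-item score.

    All settings are hypothetical and chosen only for illustration.
    Returns a dict mapping form name to the list of true abilities
    of the examinees who chose that form.
    """
    rng = random.Random(seed)
    # Assumed 2PL common items: discrimination a = 1, difficulties spread over [-1, 1].
    items = [(1.0, -1.0 + 2.0 * j / (n_common - 1)) for j in range(n_common)]
    thetas = {"easy": [], "hard": []}
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # latent ability
        # Number-correct score on the common items under the 2PL model.
        score = sum(rng.random() < logistic(a * (theta - b)) for a, b in items)
        # Assumed selection rule: higher scorers are more likely to pick the hard form.
        p_hard = logistic(slope * (score - n_common / 2))
        form = "hard" if rng.random() < p_hard else "easy"
        thetas[form].append(theta)
    return thetas


groups = simulate()
mean_easy = sum(groups["easy"]) / len(groups["easy"])
mean_hard = sum(groups["hard"]) / len(groups["hard"])
print(f"mean ability, easy form: {mean_easy:.2f}")
print(f"mean ability, hard form: {mean_hard:.2f}")
```

In this sketch the examinees who take the hard form have a higher mean ability than those who take the easy form, so the two groups are not equivalent and the assignment cannot be treated as random.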

Type
Theory and Methods
Copyright
Copyright © 2008 The Psychometric Society

