
Correction for Item Response Theory Latent Trait Measurement Error in Linear Mixed Effects Models

Published online by Cambridge University Press:  01 January 2025

Chun Wang*, University of Washington
Gongjun Xu, University of Michigan
Xue Zhang, Northeast Normal University

*Correspondence should be made to Chun Wang, Measurement and Statistics, College of Education, University of Washington, 312E Miller Hall, Box 353600, Seattle, WA 98195-3600, USA. Email: wang4066@uw.edu

Abstract

When latent variables are used as outcomes in regression analysis, a common way to address the measurement error that would otherwise be ignored is to take a multilevel perspective on item response theory (IRT) modeling. Although recent computational advances allow efficient and accurate estimation of multilevel IRT models, we argue that a two-stage divide-and-conquer strategy still has unique advantages. Within the two-stage framework, three methods that account for heteroscedastic measurement error in the dependent variable during stage II analysis are introduced: closed-form marginal maximum likelihood estimation (MLE), the expectation-maximization (EM) algorithm, and the moment estimation method. These are compared with naïve two-stage estimation and one-stage Markov chain Monte Carlo (MCMC) estimation. A simulation study compares the five methods in terms of model parameter recovery and standard error estimation. The pros and cons of each method are discussed to provide guidelines for practitioners. Finally, a real data example using the National Education Longitudinal Study of 1988 (NELS:88) illustrates the application of the methods.
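
To make the measurement-error issue concrete, the following is a minimal sketch and not the estimators developed in the article. It assumes a single-level linear regression rather than a linear mixed effects model, simulated data, and stage I error variances that are known exactly; the function names `simulate` and `naive_and_corrected` are hypothetical. Under these assumptions, regressing the error-prone trait estimates on a covariate leaves the slope essentially unbiased but inflates the residual variance by roughly the average error variance, and a simple moment-style correction subtracts that average.

```python
import numpy as np


def simulate(n=20000, beta=(0.5, 1.0), sigma2=1.0, seed=0):
    """Simulate a true latent outcome and an error-prone stage I estimate of it."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # True latent trait follows a simple linear regression on x.
    theta = beta[0] + beta[1] * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    # Heteroscedastic error variances, assumed known from stage I.
    se2 = rng.uniform(0.1, 0.6, size=n)
    theta_hat = theta + rng.normal(scale=np.sqrt(se2))
    return x, theta_hat, se2


def naive_and_corrected(x, theta_hat, se2):
    """Return OLS coefficients, the naive residual variance, and a
    moment-corrected residual variance (naive minus mean error variance)."""
    X = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(X, theta_hat, rcond=None)
    resid = theta_hat - X @ beta_hat
    naive_var = resid @ resid / (len(x) - X.shape[1])
    corrected_var = naive_var - se2.mean()
    return beta_hat, naive_var, corrected_var


if __name__ == "__main__":
    x, theta_hat, se2 = simulate()
    beta_hat, naive_var, corrected_var = naive_and_corrected(x, theta_hat, se2)
    print("coefficients:", beta_hat)
    print("naive residual variance:", naive_var)
    print("corrected residual variance:", corrected_var)
```

In this simplified setting only the variance estimate needs adjusting. In a mixed effects model the same inflation would be expected to affect the variance components, which is the setting the stage II corrections described in the abstract are aimed at.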

Type
Original Research
Copyright
Copyright © 2019 The Psychometric Society

