Bootstrap-Calibrated Interval Estimates for Latent Variable Scores in Item Response Theory

Yang Liu; Ji Seung Yang

doi:10.1007/s11336-017-9582-9

Published online by Cambridge University Press: 01 January 2025

Yang Liu*: Department of Human Development and Quantitative Methodology, University of Maryland
Ji Seung Yang: Department of Human Development and Quantitative Methodology, University of Maryland
*: Correspondence should be made to Yang Liu, Department of Human Development and Quantitative Methodology, University of Maryland, 1230B Benjamin Building, College Park, MD 20742 USA. Email: yliu87@umd.edu

Abstract

In most item response theory applications, model parameters need to be first calibrated from sample data. Latent variable (LV) scores calculated using estimated parameters are thus subject to sampling error inherited from the calibration stage. In this article, we propose a resampling-based method, namely bootstrap calibration (BC), to reduce the impact of the carryover sampling error on the interval estimates of LV scores. BC modifies the quantile of the plug-in posterior, i.e., the posterior distribution of the LV evaluated at the estimated model parameters, to better match the corresponding quantile of the true posterior, i.e., the posterior distribution evaluated at the true model parameters, over repeated sampling of calibration data. Furthermore, to achieve better coverage of the fixed true LV score, we explore the use of BC in conjunction with Jeffreys’ prior. We investigate the finite-sample performance of BC via Monte Carlo simulations and apply it to two empirical data examples.
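The calibration idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the Rasch model, the standard normal prior, the crude moment-based estimator `estimate_b`, the grid approximation of the posterior, and all function names are assumptions made for illustration. The sketch treats the estimated item parameters as the truth in a parametric bootstrap, re-estimates them on each simulated calibration sample, and then searches for the adjusted level at which the plug-in posterior quantile attains, on average, the nominal probability under the reference posterior.

```python
import numpy as np

GRID = np.linspace(-6.0, 6.0, 1201)        # grid for the latent variable
PRIOR = np.exp(-0.5 * GRID**2)
PRIOR /= PRIOR.sum()                        # discretized N(0, 1) prior weights
B_GRID = np.linspace(-5.0, 5.0, 401)        # candidate item difficulties


def irf(theta, b):
    """Rasch response probabilities, shape (len(theta), len(b))."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))


# Marginal probability of a correct response for each candidate difficulty.
MARG = PRIOR @ irf(GRID, B_GRID)            # decreasing in B_GRID


def simulate(b, n, rng):
    """Draw n response patterns from the Rasch model with difficulties b."""
    theta = rng.standard_normal(n)
    return (rng.random((n, b.size)) < irf(theta, b)).astype(int)


def estimate_b(data):
    """Assumed stand-in estimator: match each item's observed proportion."""
    p = np.clip(data.mean(axis=0), 0.01, 0.99)
    return np.interp(p, MARG[::-1], B_GRID[::-1])


def posterior_cdf(y, b):
    """Posterior CDF of the latent variable on GRID given pattern y."""
    p = irf(GRID, b)
    like = np.prod(np.where(y == 1, p, 1.0 - p), axis=1)
    post = like * PRIOR
    return np.cumsum(post) / post.sum()


def quantile(cdf, level):
    """Smallest grid point whose CDF value reaches the given level."""
    return GRID[min(np.searchsorted(cdf, level), GRID.size - 1)]


def calibrate_levels(y, b_hat, n, alphas, n_boot, rng):
    """Adjusted levels whose plug-in quantiles have average coverage alphas."""
    ref_cdf = posterior_cdf(y, b_hat)       # "true" posterior in the bootstrap world
    levels = np.linspace(0.001, 0.999, 999)
    cover = np.zeros_like(levels)
    for _ in range(n_boot):
        b_star = estimate_b(simulate(b_hat, n, rng))
        idx = np.searchsorted(posterior_cdf(y, b_star), levels)
        idx = np.minimum(idx, GRID.size - 1)
        cover += ref_cdf[idx] / n_boot      # reference probability below each plug-in quantile
    return np.array([levels[np.argmin(np.abs(cover - a))] for a in alphas])


rng = np.random.default_rng(0)
b_true = np.linspace(-1.5, 1.5, 10)         # hypothetical true difficulties
n_calib = 300
b_hat = estimate_b(simulate(b_true, n_calib, rng))

y = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])    # response pattern to score
lo_level, hi_level = calibrate_levels(y, b_hat, n_calib, [0.025, 0.975], 200, rng)

cdf_hat = posterior_cdf(y, b_hat)
interval = (quantile(cdf_hat, lo_level), quantile(cdf_hat, hi_level))
print("adjusted levels:", lo_level, hi_level)
print("calibrated interval:", interval)
```

In this toy setup the same bootstrap replicates serve both tails, so the lower and upper adjusted levels are mutually consistent. A real application would replace `estimate_b` with the calibration procedure actually used, for example marginal maximum likelihood.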

Keywords

item response theory; scoring; predictive inference; bootstrap

Information

Type: Original Paper
Information: Psychometrika, Volume 83, Issue 2, June 2018, pp. 333–354

DOI: https://doi.org/10.1007/s11336-017-9582-9
Copyright: Copyright © 2017 The Psychometric Society
