
A Multivariate Multilevel Approach to the Modeling of Accuracy and Speed of Test Takers

Published online by Cambridge University Press:  01 January 2025

R. H. Klein Entink*
Affiliation:
University of Twente
J.-P. Fox
Affiliation:
University of Twente
W. J. van der Linden
Affiliation:
University of Twente
* Requests for reprints should be sent to R.H. Klein Entink, Department of Research Methodology, Measurement and Data Analysis, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. E-mail: r.h.kleinentink@gw.utwente.nl

Abstract


Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel regression structure which allows the incorporation of covariates to explain the variance in speed and accuracy between individuals and groups of test takers. A Bayesian approach with Markov chain Monte Carlo (MCMC) computation enables straightforward estimation of all model parameters. Model-specific implementations of a Bayes factor (BF) and deviance information criterion (DIC) for model selection are proposed which are easily calculated as byproducts of the MCMC computation. Results from both simulation studies and real-data examples are given to illustrate several novel analyses possible with this modeling framework.
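To illustrate what it means for a DIC to be a byproduct of MCMC output, the following is a minimal sketch and not the model-specific implementation proposed in the article. It uses the generic definition DIC = D̄ + p_D, with p_D = D̄ − D(θ̄), applied to a toy normal model with known standard deviation. The function names, the toy data, and the stand-in posterior draws (sampled directly from the known posterior instead of from a Markov chain) are all assumptions made for illustration.

```python
import numpy as np


def normal_deviance(y, mu, sigma):
    """Deviance D = -2 * log-likelihood for i.i.d. normal observations."""
    n = y.size
    log_lik = (-0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
               - np.sum((y - mu) ** 2) / (2.0 * sigma ** 2))
    return -2.0 * log_lik


def dic_from_draws(y, mu_draws, sigma_draws):
    """Return (DIC, p_D, mean deviance) computed from stored posterior draws."""
    # Deviance evaluated at every stored draw.
    deviances = np.array([normal_deviance(y, m, s)
                          for m, s in zip(mu_draws, sigma_draws)])
    d_bar = deviances.mean()
    # Deviance evaluated at the posterior means of the parameters.
    d_hat = normal_deviance(y, mu_draws.mean(), sigma_draws.mean())
    p_d = d_bar - d_hat          # effective number of parameters
    return d_bar + p_d, p_d, d_bar


# Toy illustration (hypothetical data, not from the article).
rng = np.random.default_rng(0)
y = rng.normal(loc=0.5, scale=1.0, size=200)

# Stand-in for MCMC output: with sigma fixed at 1 and a flat prior on mu,
# the posterior of mu is N(mean(y), 1/n), so we sample from it directly.
n_draws = 5000
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=n_draws)
sigma_draws = np.full(n_draws, 1.0)

dic, p_d, d_bar = dic_from_draws(y, mu_draws, sigma_draws)
print(f"DIC = {dic:.2f}, p_D = {p_d:.2f}, mean deviance = {d_bar:.2f}")
```

In this toy setting only one parameter is free, so p_D should come out close to 1. In a real analysis the draws would come from the sampler itself, and the deviance function would be that of the fitted model.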

Type
Theory and Methods
Creative Commons
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Copyright
Copyright © 2008 The Author(s)

Footnotes

The authors thank Steven Wise, James Madison University, and Pere Joan Ferrando, Universitat Rovira i Virgili, for generously making available their data sets for the empirical examples in this paper.
