
Random Item IRT Models

Published online by Cambridge University Press: 01 January 2025

Paul De Boeck*
Affiliation: K.U.Leuven
*Requests for reprints should be sent to Paul De Boeck, K.U.Leuven, Leuven, Belgium. E-mail: paul.deboeck@psy.kuleuven.be

Abstract

It is common practice in IRT to treat items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used, and these are commonly of the fixed type, although exceptions occur. The present article shows that random item parameters make sense theoretically, and that in practice the random-item approach is promising for handling several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to differential item functioning (DIF). Corresponding to these three issues, the article comprises three parts. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and in terms of its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of item difficulties in terms of item properties need not be perfect. Finally, two further models are presented: the random item profile (RIP) model and the random item mixture (RIM) model. In the RIP, DIF is not considered a discrete phenomenon, and a robust regression approach based on the RIP difficulties yields quite good DIF identification results. In the RIM, no prior anchor sets are defined; instead a latent DIF class of items is used, so that anchoring is realized a posteriori, based on the item mixture. Both approaches are shown to be promising for the identification of DIF.
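For concreteness, here is a minimal sketch of the first two models named in the abstract, written in standard IRT notation; the symbols (theta, beta, X, eta) are this sketch's conventions, not quoted from the article. For person p and item i, the Rasch model with fixed persons and random items is

$$\Pr(Y_{pi} = 1 \mid \theta_p, \beta_i) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}, \qquad \beta_i \sim N(\mu_\beta, \sigma_\beta^2),$$

with the person parameters \(\theta_p\) treated as fixed effects. The linear logistic test model with an error term relaxes the LLTM's exact decomposition of item difficulty into item properties by adding a residual:

$$\beta_i = \sum_{k=1}^{K} \eta_k X_{ik} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2),$$

where the \(X_{ik}\) are item properties and the \(\eta_k\) their regression weights, so that the properties need not explain the difficulties perfectly.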

Type
Presidential Address
Copyright
Copyright © 2008 The Psychometric Society

