Modeling and Testing Differential Item Functioning in Unidimensional Binary Item Response Models with a Single Continuous Covariate: A Functional Data Analysis Approach

Yang Liu; Brooke E. Magnus; David Thissen

doi:10.1007/s11336-015-9473-x

Published online by Cambridge University Press: 01 January 2025

Affiliations:
Yang Liu*: University of California, Merced
Brooke E. Magnus: The University of North Carolina at Chapel Hill
David Thissen: The University of North Carolina at Chapel Hill
* Correspondence should be made to Yang Liu, School of Social Sciences, Humanities and Arts, University of California, Merced, 5200 North Lake Rd, Merced, CA 95343, USA. Email: yliu85@ucmerced.edu

Abstract

Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric form and thus not sufficiently flexible to describe complex interactions among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.
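To give a concrete sense of the permutation logic mentioned in the abstract, the following is a minimal sketch and not the authors' procedure. It makes several simplifying assumptions that are not in the article: an observed matching score (`rest_score`) stands in for the latent variable, the test statistic is a simple within-stratum cross-product rather than anything based on the fitted spline model, and the function name `permutation_dif_test` and its parameters are hypothetical.

```python
import numpy as np


def permutation_dif_test(item, rest_score, covariate, n_perm=999, seed=0):
    """Stratified permutation test (illustrative only, not the paper's test).

    Tests whether a binary item is associated with a continuous covariate
    among respondents who share the same matching score.
    """
    rng = np.random.default_rng(seed)
    item = np.asarray(item, dtype=float)
    rest_score = np.asarray(rest_score)
    covariate = np.asarray(covariate, dtype=float)

    # Index sets of respondents with identical matching scores.
    strata = [np.flatnonzero(rest_score == s) for s in np.unique(rest_score)]

    def statistic(x):
        # Sum over strata of |centered item responses . centered covariate|.
        total = 0.0
        for idx in strata:
            y_c = item[idx] - item[idx].mean()
            x_c = x[idx] - x[idx].mean()
            total += abs(np.dot(y_c, x_c))
        return total

    observed = statistic(covariate)

    exceed = 0
    for _ in range(n_perm):
        permuted = covariate.copy()
        # Shuffle the covariate only within each stratum, so its relation
        # to the matching score is left intact under the null.
        for idx in strata:
            permuted[idx] = rng.permutation(covariate[idx])
        if statistic(permuted) >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Shuffling within strata, rather than across the whole sample, is the design choice that matters here: it removes any item-covariate dependence beyond the matching variable while preserving group-level differences in the trait proxy, which mirrors the definition of DIF given in the abstract.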

Keywords

item response theory; differential item functioning; functional data analysis; smoothing spline; penalized maximum likelihood; permutation test

Information

Type: Article
Information: Psychometrika, Volume 81, Issue 2, June 2016, pp. 371–398

DOI: https://doi.org/10.1007/s11336-015-9473-x
Copyright: Copyright © 2015 The Psychometric Society
