
Model-Based Measures for Detecting and Quantifying Response Bias

Published online by Cambridge University Press: 01 January 2025

R. Philip Chalmers
Affiliation: The University of Georgia
Correspondence should be made to R. Philip Chalmers, Department of Educational Psychology, The University of Georgia, 323 Aderhold Hall, Athens, GA 30602, USA. Email: rphilip.chalmers@gmail.com

Abstract

This paper proposes a model-based family of detection and quantification statistics for evaluating response bias in item bundles of any size. Compensatory (CDRF) and non-compensatory (NCDRF) response bias measures are proposed, along with their sample realizations and large-sample variability when models are fitted using multiple-group estimation. Based on their underlying connection to item response theory estimation methodology, it is argued that these new statistics provide a powerful and flexible approach to studying response bias in categorical response data, beyond the methods that have previously appeared in the literature. To evaluate their practical utility, CDRF and NCDRF are compared to the closely related SIBTEST family of statistics and to likelihood-based detection methods through a series of Monte Carlo simulations. Results indicate that the new statistics provide better effect size estimates of marginal response bias than the SIBTEST family, are competitive with a selection of likelihood-based methods when studying item-level bias, and perform best when studying differential bundle and test bias.
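To give a concrete sense of the quantities the abstract describes, the following is a minimal sketch, not drawn from the paper or its supplementary material, of signed (compensatory) and unsigned (non-compensatory) differential response functioning statistics for a two-item bundle of 2PL items. It assumes item parameters are already available from a multiple-group fit and that the focal-group latent density is normal; all names and parameter values are illustrative, and the sketch omits the sampling-variability machinery the paper develops.

```python
# Minimal sketch: signed (compensatory) and unsigned (non-compensatory)
# differential response functioning for a bundle of 2PL items. All names,
# parameter values, and the normal focal density are illustrative
# assumptions, not the paper's implementation.
import numpy as np

def p_2pl(theta, a, d):
    # 2PL response probability: P(y = 1 | theta) = 1 / (1 + exp(-(a*theta + d)))
    return 1.0 / (1.0 + np.exp(-(np.outer(theta, a) + d)))

def drf(a_ref, d_ref, a_foc, d_foc, mean_foc=0.0, sd_foc=1.0,
        npts=201, lim=6.0):
    # Expected bundle scores under each group's item parameters,
    # marginalized over a normal focal-group latent density.
    theta = np.linspace(-lim, lim, npts)
    T_ref = p_2pl(theta, a_ref, d_ref).sum(axis=1)
    T_foc = p_2pl(theta, a_foc, d_foc).sum(axis=1)
    w = np.exp(-0.5 * ((theta - mean_foc) / sd_foc) ** 2)
    w /= w.sum()                              # renormalize over the grid
    diff = T_ref - T_foc
    cdrf = float(np.sum(diff * w))            # signed: opposite-direction bias cancels
    ncdrf = float(np.sum(np.abs(diff) * w))   # unsigned: no cancellation
    return cdrf, ncdrf

# Hypothetical two-item bundle whose intercepts are shifted for the focal group
a = np.array([1.2, 0.9])
d = np.array([0.0, -0.5])
print(drf(a, d, a, d - 0.5))  # nonzero values flag bundle-level response bias
```

In practice the item parameters and the focal-group mean and variance would come from the multiple-group estimation the abstract describes, and the paper's large-sample variability results (or resampling) would supply standard errors and confidence intervals; this sketch computes only the population-level quantities for fixed parameters.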

Type
Original Paper
Copyright
Copyright © The Psychometric Society 2018


Footnotes

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11336-018-9626-9) contains supplementary material, which is available to authorized users.

Supplementary material: Chalmers supplementary material (File, 141 KB)