
Differential Item Functioning Analyses of the Patient-Reported Outcomes Measurement Information System (PROMIS®) Measures: Methods, Challenges, Advances, and Future Directions

Published online by Cambridge University Press:  01 January 2025

Jeanne A. Teresi*
Affiliation:
Columbia University Stroud Center; Hebrew Home at Riverdale, RiverSpring Health; Weill Cornell Medical Center; New York State Psychiatric Institute
Chun Wang
Affiliation:
University of Washington College of Education
Marjorie Kleinman
Affiliation:
New York State Psychiatric Institute
Richard N. Jones
Affiliation:
Brown University
David J. Weiss
Affiliation:
University of Minnesota
* Correspondence should be addressed to Jeanne A. Teresi, Columbia University Stroud Center, New York, NY, USA. Email: teresimeas@aol.com; jat61@cumc.columbia.edu

Abstract

Several methods used to examine differential item functioning (DIF) in Patient-Reported Outcomes Measurement Information System (PROMIS®) measures are presented, including effect size estimation. A summary is provided of factors that may affect DIF detection and of challenges encountered in PROMIS DIF analyses, e.g., anchor item selection. An issue in PROMIS was the potential for inadequately modeled multidimensionality to result in false DIF detection.

Section 1 presents the unidimensional models used by most PROMIS investigators for DIF detection, as well as their multidimensional expansions. Section 2 is an illustration that builds on previous unidimensional analyses of depression and anxiety short-forms to examine DIF detection using a multidimensional item response theory (MIRT) model. The Item Response Theory-Log-likelihood Ratio Test (IRT-LRT) method was used for a real-data illustration with gender as the grouping variable. The IRT-LRT method is a flexible approach for handling group differences in trait distributions, known as impact in the DIF literature, and was studied with both real data and simulations to compare its performance within the unidimensional IRT (UIRT) and MIRT contexts. Additionally, different effect size measures were compared for the data presented in Section 2.

A finding from the real-data illustration was that the IRT-LRT method flagged more items within a MIRT context than within a UIRT context. The simulations provided some evidence that while the unidimensional and multidimensional approaches were similar in terms of Type I error rates, power for DIF detection was greater for the multidimensional approach. The effect size measures presented in Section 1 and applied in Section 2 varied in terms of estimation methods, choice of density function, methods of equating, and anchor item selection. Despite these differences, there was considerable consistency in results, especially for the items showing the largest values. Future work is needed to examine DIF detection in the context of polytomous, multidimensional data. PROMIS standards included incorporation of effect size measures in determining salient DIF; integrated methods for examining effect sizes within IRT-based DIF detection procedures are still in early stages of development.
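The core logic of the IRT-LRT approach can be illustrated with a minimal, hypothetical Python sketch (not the authors' code or any PROMIS software): for a single dichotomous item, fit a 2PL-style logistic item model once with parameters constrained equal across groups and once with them freed per group, then compare fits with a likelihood-ratio chi-square. The latent trait is treated as known here purely to keep the sketch simple; in practice it is estimated jointly, typically with anchor items fixed across groups.

```python
# Hypothetical IRT-LRT sketch for one dichotomous item with simulated DIF.
# The trait (theta) is treated as known for simplicity; real analyses
# estimate it jointly and rely on anchor items to identify the metric.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(size=n)            # latent trait scores
group = rng.integers(0, 2, size=n)    # 0 = reference, 1 = focal

# Simulate responses with uniform DIF: the item is harder for the focal group.
a_true, b_ref, b_focal = 1.2, 0.0, 0.6
b_true = np.where(group == 1, b_focal, b_ref)
p_true = 1.0 / (1.0 + np.exp(-(a_true * theta - b_true)))
y = rng.binomial(1, p_true)

def neg_loglik(params, th, resp):
    """Negative log-likelihood of a 2PL-style logistic item model."""
    a, b = params
    eta = a * th - b
    # log P(y|eta) = y*eta - log(1 + exp(eta)), computed stably
    return -(resp * eta - np.logaddexp(0.0, eta)).sum()

# Constrained model: one (a, b) pair for both groups.
fit_c = minimize(neg_loglik, x0=[1.0, 0.0], args=(theta, y))
# Free model: separate (a, b) per group; group log-likelihoods add.
fit_0 = minimize(neg_loglik, x0=[1.0, 0.0],
                 args=(theta[group == 0], y[group == 0]))
fit_1 = minimize(neg_loglik, x0=[1.0, 0.0],
                 args=(theta[group == 1], y[group == 1]))

# G^2 = 2 * (logLik_free - logLik_constrained); .fun stores negative logLik.
g2 = 2.0 * (fit_c.fun - (fit_0.fun + fit_1.fun))
df = 2                          # two parameters freed across groups
p_value = chi2.sf(g2, df)
print(f"G2 = {g2:.2f}, df = {df}, p = {p_value:.4g}")
```

With the difficulty shift built into the simulation, the test should flag the item; in a free-baseline application of this idea, each studied item is tested this way against a model in which only the anchors are constrained.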

Type: Application Reviews and Case Studies
Copyright © 2021 The Psychometric Society


References

Ackerman, T. A.(1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective.Journal of Educational Measurement,29,6791CrossRefGoogle Scholar
Ankenmann, R. D.,Witt, E. A., &Dunbar, S. B.(1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning.Journal of Educational Measurement,36,277300CrossRefGoogle Scholar
Baker, F. B.(1995). EQUATE 2.1: Computer program for equating two metrics in item response theory,Madison:University of Wisconsin, Laboratory of Experimental Design.Google Scholar
Bauer, D.,Belzak, W., &Cole, V.(2019). Simplifying the assessment of measurement invariance over multiple background variables: Using regularized moderated nonlinear factor analysis to detect differential item functioning.Structural Equation Modeling A: Multidisciplinary Journal,Google ScholarPubMed
Belzak, W., &Bauer, D.(2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning.Psychological Methods,CrossRefGoogle ScholarPubMed
Benjamini, Y., &Hochberg, Y.(1995). Controlling for the false discovery rate: A practical and powerful approach to multiple testing.Journal of the Royal Statistical Society, Series B,57,289300CrossRefGoogle Scholar
Bjorner, J. B.,Rose, M.,Gandek, B.,Stone, A. A.,Junghaenel, D. U., &Ware, J. E.(2014). Difference in method of administration did not significantly impact item response: An IRT-based analysis from the Patient-Reported Outcomes Measurement Information System (PROMIS) initiative.Quality of Life Research,23,217227CrossRefGoogle Scholar
Bolt, D. M.(2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods.Applied Measurement in Education,15,113141CrossRefGoogle Scholar
Boorsboom, D.(2006). Commentary: When does measurement invariance matter?.Medical Care,44(11),S17681CrossRefGoogle Scholar
Boorsboom, D.,Mellenbergh, G. J., &van Heerdon, J.(2002). Different kinds of DIF: A distinction between absolute and relative forms of measurement invariance and bias.Applied Psychological Measurement,26,433450CrossRefGoogle Scholar
Bulut, O., &Suh, Y.(2017). Detecting multidimensional differential item functioning with the multiple indicators multiple causes model, the item response theory likelihood ratio test, and logistic regression.Frontiers in Education,2,51CrossRefGoogle Scholar
Byrne, B. M.,Shavelson, R. J., &Muthén, B. O.(1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance.Psychological Bulletin,105,456566CrossRefGoogle Scholar
Cai, L.(2008). SEM of another flavour: Two new applications of the supplemented EM algorithm.British Journal of Mathematical and Statistical Psychology,61,309329CrossRefGoogle ScholarPubMed
Cai, L.(2013). FlexMIRT version 2: Flexible multilevel multidimensional item analysis and test scoring [Computer software],Chapel Hill, NC:Vector Psychometric Group.Google Scholar
Cai, L.Thissen, D. & du Toit, S. H. C. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT Modeling [Computer software]. Lincolnwood, IL: Scientific Software International Inc.Google Scholar
Candell, G. L., &Drasgow, F.(1988). An iterative procedure for linking metrics and assessing item bias in item response theory.Applied Psychological Measurement,12,253260CrossRefGoogle Scholar
Carle, A. C.,Cella, D.,Cai, L.,Choi, S. W.,Crane, P. K.,Curtis, S. M., &Hays, R.(2011). Advancing PROMIS’s methodology: Results of the third PROMIS Psychometric Summit.Expert Review of Pharmacoeconomics & Outcome Research,11(6),677684CrossRefGoogle ScholarPubMed
Cella, D.Yount, S.Rothrock, N.Gershon, R., Cook, K., Reeve, B., Ader, D., Fries, J. F., Bruce, B., & Rose, M., on behalf of the PROMIS Cooperative Group. (2007). The patient-reported outcomes measurement information system (PROMIS): Progress of an NIH roadmap cooperative group during its first two years. Medical Care, 45(5 Suppl 1), S3–S11.CrossRefGoogle Scholar
Chalmers, R. P.(2012). mirt: A multidimensional item response theory package for the R environment.Journal of statistical software,48(6),129CrossRefGoogle Scholar
Chalmers, R. P.(2016). A differential response functioning framework for understanding item, bundle, and test bias. Doctoral Dissertation, York University, Toronto, Ontario. https://pdfs.semanticscholar.orgGoogle Scholar
Chalmers, R. P.(2018). Model-based measures for detecting and quantifying response bias.Psychometrika,83,696732CrossRefGoogle ScholarPubMed
Chalmers, R. P.,Counsell, A., &Flora, D. B.(2016). It might not make a big DIF: Improved differential test functioning statistics that account for sampling variability.Educational and Psychological Measurement,76,114140CrossRefGoogle Scholar
Chang, Y-W,Hsu, N-J, &Tsai, R-C(2017). Unifying differential item functioning in factor analysis for categorical data under a discretization of a normal variant.Psychometrika,82(2),382406CrossRefGoogle Scholar
Chen, J-H,Chen, C-T, &Shih, C-L(2013). Improving the control of type I error rate in assessing differential item functioning for hierarchical generalized linear models when impact is present.Applied Psychological Measurement,38,1836CrossRefGoogle Scholar
Cheng, C.-P.,Chen, C-C, &Shih, C-L(2020). An exploratory strategy to identify and define sources of differential item functioning.Applied Psychological Measurement,4,548560Google Scholar
Cheng, Y.,Shao, C., &Lathrop, Q. N.(2016). The mediated MIMIC model for understanding the underlying mechanisms of DIF.Educational and Psychological Measurement,76(1),4363CrossRefGoogle ScholarPubMed
Cheung, G. W., &Rensvold, R. B.(2003). Evaluating goodness-of-fit indexes for testing measurement invariance.Structural Equation Modeling,9,233255CrossRefGoogle Scholar
Choi, S. W.,Gibbons, L. E., &Crane, P. K.(2011). lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations.Journal of Statistical Software,39(8),130CrossRefGoogle Scholar
Choi, S. W.,Reise, S. P.,Pilkonis, P. A.,Hays, R. D., &Cella, D.(2010). Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms.Quality of Life Research,19,125136CrossRefGoogle ScholarPubMed
Clauser, B. E.,Mazor, K. M., &Hambleton, R. K.(1993). The effects of purification of the matching criterion on the identification of DIF using the Mantel–Haenszel procedure.Applied Measurement in Education,6,269279CrossRefGoogle Scholar
Cohen, A. S.,Kim, S-H, &Baker, F. B.(1993). Detection of differential item functioning in the graded response model.Applied Psychological Measurement,17,335350CrossRefGoogle Scholar
Cohen, P.,Cohen, J.,Teresi, J.,Marchi, P., &Velez, N.(1990). Problems in the measurement of latent variables in structural equation causal models.Applied Psychological Measurement,14(2),183196CrossRefGoogle Scholar
Crane, P. K.,Gibbons, L. E.,Jolley, L., &van Belle, G.(2006). Differential item functioning analysis with ordinal logistic regression techniques: Difdetect and difwithpar.Medical Care,44,S115S123CrossRefGoogle ScholarPubMed
Crane, P. K.,Gibbons, L. E.,Ocepek-Welikson, K.,Cook, K.,Cella, D., &Teresi, J. A.(2007). A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression.Quality of Life Research,16,6984CrossRefGoogle ScholarPubMed
Crane, P. K.,van Belle, G., &Larson, E. B.(2004). Test bias in a cognitive test: Differential item functioning in the CASI.Statistics in Medicine,23,241256CrossRefGoogle Scholar
Culpepper, S. A.,Aguinis, H.,Kern, J. L., &Millsap, R.(2019). High-stakes testing case study: A latent variable approach for assessing measurement and prediction invariance.Psychometrika,84,285309CrossRefGoogle ScholarPubMed
DeMars, C. E.(2010). Type 1 error inflation for detecting DIF in the presence of impact.Educational and Psychological Measurement,70,961972CrossRefGoogle Scholar
DeMars, C. E.(2015). Modeling DIF for simulations: Continuous or categorical secondary trait?.Psychological Test and Assessment Modeling,57,279300Google Scholar
Edelen, M.,Stucky, B., &Chandra, A.(2015). Quantifying “problematic” DIF within an IRT framework: Application to a cancer stigma index.Quality of Life Research,24,95103CrossRefGoogle ScholarPubMed
Egberink, I. JL.,Meijer, R. R., &Tendeiro, J. N.(2015). Investigating measurement invariance in computer-based personality testing: The impact of using anchor items on effect size indices.Educational and Psychological Measurement,75,126145CrossRefGoogle ScholarPubMed
Finch, H.(2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel–Haenszel, SIBTEST and the IRT likelihood ratio test.Applied Psychological Measurement,29,278295CrossRefGoogle Scholar
Fleer, P. F. (1993). A Monte Carlo assessment of a new measure of item and test bias (p. 2266, Vol. 54, No. 04B), Illinois Institute of Technology, Dissertation Abstracts International.Google Scholar
Flowers, C. P.,Oshima, T. C., &Raju, N. S.(1999). A description and demonstration of the polytomous DFIT framework.Applied Psychological Measurement,23, 309332CrossRefGoogle Scholar
Furlow, C. F.,Ross, T. R., &Gagné, P.(2009). The impact of multidimensionality on the detection of differential bundle functioning using simultaneous item bias test.Applied Psychological Measurement,33(6),441464CrossRefGoogle Scholar
Gelin, M. N., &Zumbo, B. D.(2003). Differential item functioning results may change depending on how an item is scored: An illustration with the center for epidemiologic studies depression scale.Educational and Psychological Measurement,63(1),6574CrossRefGoogle Scholar
González-Betanzos, F., &Abad, F. J.(2012). The effects of purification and the evaluation of differential item functioning with the likelihood ratio test.Methodology: European Journal of Research Methods for the Behavioral and Social Sciences,8,130145CrossRefGoogle Scholar
Gómez-Benito, J.,Dolores-Hidalgo, M., &Zumbo, B. D.(2013). Effectiveness of combining statistical tests and effect sizes when using logistic discriminant function regression to detect differential item functioning for polytomous items.Educational and Psychological Measurement,73,875897CrossRefGoogle Scholar
Gregorich, S. E.(2006). Do self-report instruments allow meaningful comparisons across diverse population groups?: Testing measurement invariance using the confirmatory factor analysis framework.Medical Care,44(11),S78S94CrossRefGoogle Scholar
Hambleton, R. K.,Swaminathan, H., &Rogers, H. J.(1991). Fundamentals of item response theory,Newbury Park, California:Sage Publications Inc..Google Scholar
Herrel, F. E. (2009). Design; design package. R package version 2:3.0. Retrieved from http://CRANR-project.org/package=DesignGoogle Scholar
Hidalgo, M. D.,Gomez-Benito, J.,Zumbo, B. D.Binary logistic regression analysis for detecting differential item functioning: Effectiveness of R2\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\text{R}^{{2}}$$\end{document} and delta log odds ratio effect size measures(2014). Educational and Psychological Measurement,74,927949CrossRefGoogle Scholar
Houts, C. R., &Cai, L.(2013). FlexMIRT user’s manual version 2: Flexible multilevel multidimensional item analysis and test scoring,Chapel Hill, NC:Vector Psychometric Group.Google Scholar
Jensen, R. E.,Moinpour, C. M.,Keegan, T. H. M.,Cress, R. D.,Wu, X-C,Paddock, L. A., &Potosky, A. L.(2016). The Measuring Your Health Study: Leveraging community-based cancer registry recruitment to establish a large, diverse cohort of cancer survivors for analyses of measurement equivalence and validity of thepatient-reported Outcomes Measurement Information System®(PROMIS®) short form items.Psychological Test and Assessment Modeling,58(1),99117Google Scholar
Jensen, R. E.,King-Kallimanis, B. L.,Sexton, E.,Reeve, B. B.,Moinpour, C. M.,Potosky, A. L.,Teresi, J. A.Measurement properties of the PROMIS®\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$^{\textregistered }$$\end{document} Sleep Disturbance short form in a large, ethnically diverse cancer cohort(2016). Psychological Test and Assessment Modeling,58(2),353370Google Scholar
Jin, K. Y.,Chen, H. F., &Wang, W. C.(2018). Using odds ratios to detect differential item functioning.Applied Psychological Measurement,42,613629CrossRefGoogle ScholarPubMed
Jodoin, M. G., &Gierl, M. J.(2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection.Applied Measurement in Education,14,329349CrossRefGoogle Scholar
Jones, R. N.(2006). Identification of measurement differences between English and Spanish language versions for the Mini-Mental State Examination: Detecting differential item functioning using MIMIC modeling.Medical Care,44 11 Suppl 3S124S133CrossRefGoogle ScholarPubMed
Jones, R. N.(2019). Differential item functioning and its relevance to epidemiology.Current Epidemiology Reports,CrossRefGoogle ScholarPubMed
Jones, R. N.,Tommet, D.,Ramirez, M.,Jensen, R. E.,Teresi, J. A.Differential item functioning in Patient Reported Outcomes Measurement Information System (PROMIS®\documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$^{\textregistered }$$\end{document}) Physical Functioning short forms: Analyses across ethnically diverse groups(2016). Psychological Test and Assessment Modeling,58(2),371402Google Scholar
Jöreskog, K. G.(1971). Simultaneous factor analysis in several populations.Psychometrika,36(4),408426CrossRefGoogle Scholar
Jöreskog, K., &Goldberger, A.(1975). Estimation of a model of multiple indicators and multiple causes of a single latent variable.Journal of the American Statistical Association,10,631639Google Scholar
Jöreskog, K. G., &Moustaki, I.(2001). Factor analysis of ordinal variables: A comparison of three approaches.Multivariate Behavioral Research,36(3),347387CrossRefGoogle ScholarPubMed
Jöreskog, K., &Sorbom, D.(1996). LISREL8: Analysis of linear structural relationships: Users Reference Guide,Lincolnwood:Scientific Software International Inc..Google Scholar
Junker, B. W.(1991). Essential independence and likelihood-based ability estimation for polytomous items.Psychometrika,56,255278CrossRefGoogle Scholar
Kahraman, N.,DeBoeck, P., &Janssen, R.(2009). Modeling DIF in complex response data using test design strategies.International Journal of Testing,8,151166CrossRefGoogle Scholar
Kim, E. S., &Yoon, M.(2011). Testing measurement invariance: A comparison of multiple group categorical CFA and IRT.Structural Equation Modeling,18,212228CrossRefGoogle Scholar
Kim, E. S.,Yoon, M., &Lee, T.(2012). Testing measurement invariance using MIMIC: Likelihood ratio test with a critical value adjustment.Educational and Psychological Measurement,72,469492CrossRefGoogle Scholar
Kim, S-H, &Cohen, A. S.(1998). Detection of differential item functioning under the graded response model with the likelihood ratio test.Applied Psychological Measurement,22,345355CrossRefGoogle Scholar
Kim, S-H,Cohen, A. S.,Alagoz, C., &Kim, S.(2007). DIF detection and effect size measures for polytomously scored items.Journal of Educational Measurement,44(2),93116CrossRefGoogle Scholar
Kleinman, M., &Teresi, J. A.(2016). Differential item functioning magnitude and impact measures from item response theory models.Psychological Test and Assessment Modeling,58,7998Google ScholarPubMed
Kopf, J.,Zeileis, A., &Stobl, C.(2015). A framework for anchor methods and an iterative forward approach for DIF detection.Applied Psychological Measurement,39,83103CrossRefGoogle Scholar
Kopf, J.,Zeileis, A., &Stobl, C.(2015). Anchor selection strategies for DIF analysis: Review, assessment and new approaches.Educational and Psychological Measurement,75,2256CrossRefGoogle ScholarPubMed
Langer, M. M.(2008). A re-examination of Lord’s Wald test for differential item functioning using item response theory and modern error estimation (Doctoral dissertation, University of North Carolina at Chapel Hill library). http://search.lib.unc.edu/search?R=UNCb5878458.Google Scholar
Lee, S.,Bulut, O., &Suh, Y.(2017). Multidimensional extension of multiple indicators multiple causes models to detect DIF.Educational and Psychological Measurement,77(4),545569CrossRefGoogle ScholarPubMed
Li, Y.,Brooks, G. P., &Johanson, G. A.(2012). Item discrimination and Type I error in the detection of differential item functioning.Educational and Psychological Measurement,72,847861CrossRefGoogle Scholar
Liu, Y.,Magnus, B. E., &Thissen, D.(2016). Modeling and testing differential item functioning in unidimensional binary item response models with a single continuous covariate: A functional data analysis approach.Psychometrika,81,371398CrossRefGoogle ScholarPubMed
Lopez Rivas, G. E.,Stark, S., &Chernyshenko, O. S.(2009). The effects of referent item parameters on differential item functioning detection using the free baseline likelihood ratio test.Applied Psychological Measurement,33,251265CrossRefGoogle Scholar
Lord, F. M.(1980). Applications of item response theory to practical testing problems,Hillsdale, NJ:Lawrence Erlbaum.Google Scholar
Lord, F. M., Novick, M. R., & (with contributions by A. Birnbaum). (1968). Statistical theories of mental test scores. Reading Massachusetts: Addison-Wesley Publishing Company Inc.Google Scholar
Mazor, K. M.,Hambleton, R. K., &Clauser, B. E.(1998). Multidimensional DIF analyses: The effects of matching on unidimensional subtest scores.Applied Psychological Measurement,22,357367CrossRefGoogle Scholar
McDonald, R. P.(2000). A basis for multidimensional item response theory.Applied Psychological Measurement,24,99114CrossRefGoogle Scholar
Meade, A. W., &Lautenschlager, G. J.(2004). A comparison of IRT and CFA methodologies for establishing measurement equivalence.Organizational Research Methods,7,361388CrossRefGoogle Scholar
Meade, A.,Lautenschlager, G., &Johnson, E.(2007). A Monte Carlo examination of the sensitivity of the differential functioning of items and tests framework for tests of measurement invariance with Likert data.Applied Psychological Measurement,31,430455CrossRefGoogle Scholar
Meade, A. W., &Wright, N. A.(2012). Solving the measurement invariance anchor item problem in item response theory.Journal of Applied Psychology,97,10161031CrossRefGoogle ScholarPubMed
Mellenbergh, G. J.(1989). Item bias and item response theory.International Journal of Educational Research,13,127143CrossRefGoogle Scholar
Mellenbergh, G. J.(1994). Generalized linear item response theory.Psychological Bulletin,115,302307CrossRefGoogle Scholar
Meredith, W.(1964). Notes on factorial invariance.Psychometrika,29,177185CrossRefGoogle Scholar
Meredith, W.(1993). Measurement invariance, factor analysis and factorial invariance.Psychometrika,58,525543CrossRefGoogle Scholar
Meredith, W., &Teresi, J. A.(2006). An essay on measurement and factorial invariance.Medical Care,44 Suppl 3S69S77CrossRefGoogle ScholarPubMed
Millsap, R. E., &Everson, H. T.(1993). Methodology review: Statistical approaches for assessing measurement bias.Applied Psychological Measurement,17,297334CrossRefGoogle Scholar
Mislevy, R. J.(1986). Bayes modal estimation in item response models.Psychometrika,51,177195CrossRefGoogle Scholar
Montoya, A. K., &Jeon, M.(2020). MIMIC models for uniform and nonuniform DIF as moderated mediation models.Applied Psychological Measurement,44(2),118136CrossRefGoogle ScholarPubMed
Mukherjee, S.,Gibbons, L. E.,Kristjansson, E., &Crane, P. K.(2013). Extension of an iterative hybrid ordinal logistic regression/item response theory approach to detect and account for differential item functioning in longitudinal data.Psychological Test and Assessment Modeling,55(2),127147Google ScholarPubMed
Muthén, B. O.(1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators.Psychometrika,49,115132CrossRefGoogle Scholar
Muthén, B.(1989). Latent variable modeling in heterogeneous populations. Meetings of Psychometric Society (1989, Los Angeles, California and Leuven, Belgium).Psychometrika,54(4),557585CrossRefGoogle Scholar
Muthén, B. O.(2002). Beyond SEM: General latent variable modeling.Behaviormetrika,29,81117CrossRefGoogle Scholar
Muthén, B., &Asparouhov, T.(2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus (p 16),Los Angeles:University of California.Google Scholar
Muthén, L. K.&Muthén, B. O.(1998–2019). M-PLUS Users Guide. Sixth Edition. Los Angeles, California: Authors Muthén and Muthén.Google Scholar
Muthén, B.,du Toit, S.H.C. & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished Technical Report. Available at https://www.statmodel.com/wlscv.shtml.Google Scholar
Narayanan, P., &Swaminathan, H.(1996). Identification of items that show nonuniform DIF.Applied Psychological Measurement,20,257274CrossRefGoogle Scholar
Oort, E. J.(1998). Simulation study of item bias detection with restricted factor analysis.Structural Equation Modeling,5,107124CrossRefGoogle Scholar
Orlando-Edelen, M.,Stuckey, B. D., &Chandra, A.(2015). Quantifying ‘problematic’ DIF within an IRT framework: Application to a cancer stigma index.Quality of Life Research,24,95103CrossRefGoogle Scholar
Orlando-Edelen, M.,Thissen, D.,Teresi, J. A.,Kleinman, M., &Ocepek-Welikson, K.(2006). Identification of differential item functioning using item response theory ad the likelihood-based model comparison approach: Applications to the Mini-Mental State Examination.Medical Care,44,S134S142CrossRefGoogle Scholar
Oshima, T. C.Kushubar, S.Scott, J. C.&Raju, N. S.Langer, M. M. (2008). A re-examination of Lord’s Wald test for differential item functioning using item response theory and modern error estimation (Doctoral dissertation, University of North Carolina at Chapel Hill library). http://search.lib.unc.edu/search?R=UNCb5878458.Google Scholar
Lee, S.,Bulut, O., &Suh, Y.(2017). Multidimensional extension of multiple indicators multiple causes models to detect DIF.Educational and Psychological Measurement,77(4),545569CrossRefGoogle ScholarPubMed
Li, Y.,Brooks, G. P., &Johanson, G. A.(2012). Item discrimination and Type I error in the detection of differential item functioning.Educational and Psychological Measurement,72,847861CrossRefGoogle Scholar
Liu, Y.,Magnus, B. E., &Thissen, D.(2016). Modeling and testing differential item functioning in unidimensional binary item response models with a single continuous covariate: A functional data analysis approach.Psychometrika,81,371398CrossRefGoogle ScholarPubMed
Lopez Rivas, G. E.,Stark, S., &Chernyshenko, O. S.(2009). The effects of referent item parameters on differential item functioning detection using the free baseline likelihood ratio test.Applied Psychological Measurement,33,251265CrossRefGoogle Scholar
Lord, F. M.(1980). Applications of item response theory to practical testing problems,Hillsdale, NJ:Lawrence Erlbaum.Google Scholar
Lord, F. M., Novick, M. R., & (with contributions by A. Birnbaum). (1968). Statistical theories of mental test scores. Reading Massachusetts: Addison-Wesley Publishing Company Inc.Google Scholar
Mazor, K. M.,Hambleton, R. K., &Clauser, B. E.(1998). Multidimensional DIF analyses: The effects of matching on unidimensional subtest scores.Applied Psychological Measurement,22,357367CrossRefGoogle Scholar
McDonald, R. P.(2000). A basis for multidimensional item response theory.Applied Psychological Measurement,24,99114CrossRefGoogle Scholar
Meade, A. W., &Lautenschlager, G. J.(2004). A comparison of IRT and CFA methodologies for establishing measurement equivalence.Organizational Research Methods,7,361388CrossRefGoogle Scholar
Meade, A.,Lautenschlager, G., &Johnson, E.(2007). A Monte Carlo examination of the sensitivity of the differential functioning of items and tests framework for tests of measurement invariance with Likert data.Applied Psychological Measurement,31,430455CrossRefGoogle Scholar
Meade, A. W., &Wright, N. A.(2012). Solving the measurement invariance anchor item problem in item response theory.Journal of Applied Psychology,97,10161031CrossRefGoogle ScholarPubMed
Mellenbergh, G. J.(1989). Item bias and item response theory.International Journal of Educational Research,13,127143CrossRefGoogle Scholar
Mellenbergh, G. J.(1994). Generalized linear item response theory.Psychological Bulletin,115,302307CrossRefGoogle Scholar
Meredith, W.(1964). Notes on factorial invariance.Psychometrika,29,177185CrossRefGoogle Scholar
Meredith, W.(1993). Measurement invariance, factor analysis and factorial invariance.Psychometrika,58,525543CrossRefGoogle Scholar
Meredith, W., &Teresi, J. A.(2006). An essay on measurement and factorial invariance.Medical Care,44 Suppl 3S69S77CrossRefGoogle ScholarPubMed
Millsap, R. E., &Everson, H. T.(1993). Methodology review: Statistical approaches for assessing measurement bias.Applied Psychological Measurement,17,297334CrossRefGoogle Scholar
Mislevy, R. J.(1986). Bayes modal estimation in item response models.Psychometrika,51,177195CrossRefGoogle Scholar
Montoya, A. K., &Jeon, M.(2020). MIMIC models for uniform and nonuniform DIF as moderated mediation models.Applied Psychological Measurement,44(2),118136CrossRefGoogle ScholarPubMed
Mukherjee, S.,Gibbons, L. E.,Kristjansson, E., &Crane, P. K.(2013). Extension of an iterative hybrid ordinal logistic regression/item response theory approach to detect and account for differential item functioning in longitudinal data.Psychological Test and Assessment Modeling,55(2),127147Google ScholarPubMed
Muthén, B. O.(1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators.Psychometrika,49,115132CrossRefGoogle Scholar
Muthén, B.(1989). Latent variable modeling in heterogeneous populations. Meetings of Psychometric Society (1989, Los Angeles, California and Leuven, Belgium).Psychometrika,54(4),557585CrossRefGoogle Scholar
Muthén, B. O.(2002). Beyond SEM: General latent variable modeling.Behaviormetrika,29,81117CrossRefGoogle Scholar
Muthén, B., &Asparouhov, T.(2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus (p 16),Los Angeles:University of California.Google Scholar
Muthén, L. K.& Muthén, B. O.(1998–2019). M-PLUS Users Guide. Sixth Edition. Los Angeles, California: Authors Muthén and Muthén.Google Scholar
Muthén, B. du Toit, S.H.C.& Spisic, D.(1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished Technical Report. Available at https://www.statmodel.com/wlscv.shtml.Google Scholar
Narayanan, P., &Swaminathan, H.(1996). Identification of items that show nonuniform DIF.Applied Psychological Measurement,20,257274CrossRefGoogle Scholar
Oort, F. J. (1998). Simulation study of item bias detection with restricted factor analysis. Structural Equation Modeling, 5, 107–124.
Orlando-Edelen, M., Stucky, B. D., & Chandra, A. (2015). Quantifying ‘problematic’ DIF within an IRT framework: Application to a cancer stigma index. Quality of Life Research, 24, 95–103.
Orlando-Edelen, M., Thissen, D., Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2006). Identification of differential item functioning using item response theory and the likelihood-based model comparison approach: Applications to the Mini-Mental State Examination. Medical Care, 44, S134–S142.
Oshima, T. C., Kushubar, S., Scott, J. C., & Raju, N. S. (2009). DFIT8 for Windows user's manual: Differential functioning of items and tests. St. Paul, MN: Assessment Systems Corporation.
Oshima, T. C., Raju, N. S., & Nanda, A. O. (2006). A new method for assessing the statistical significance of the differential functioning of items and tests (DFIT) framework. Journal of Educational Measurement, 43, 1–17.
Paz, S. H., Spritzer, K. L., Morales, L., & Hays, R. D. (2013). Evaluation of the Patient-Reported Outcomes Information System (PROMIS) Spanish-language physical functioning items. Quality of Life Research, 22, 1819–1830.
Pilkonis, P. A., Choi, S. W., Reise, S. P., Stover, A. M., Riley, W. T., & Cella, D. (2011). Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS): Depression, anxiety and anger. Assessment, 18, 263–283.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495–502.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197–207.
Raju, N. S. (1999). DFITP5: A Fortran program for calculating dichotomous DIF/DTF [Computer program]. Chicago: Illinois Institute of Technology.
Raju, N. S., Fortmann-Johnson, K. A., Kim, W., Morris, S. B., Nering, M. L., & Oshima, T. C. (2009). The item parameter replication method for detecting differential functioning in the polytomous DFIT framework. Applied Psychological Measurement, 33, 133–147.
Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87, 517–528.
Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353–368.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danmarks Paedagogiske Institut (Danish Institute of Educational Research).
Raykov, T., Marcoulides, G. A., Menold, N., & Harrison, M. (2019). Revisiting the bi-factor model: Can mixture modeling help assess its applicability? Structural Equation Modeling, 26, 110–118.
Reckase, M. D., & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15, 361–373.
Reeve, B. B., Hays, R. D., Bjorner, J. B., Cook, K. F., Crane, P. K., Teresi, J. A., & Cella, D. (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the Patient-Reported Outcome Measurement Information System (PROMIS). Medical Care, 45(5 Suppl. 1), S22–S31.
Reeve, B. B., & Teresi, J. A. (2016). Overview to the two-part series: Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short forms. Psychological Test and Assessment Modeling, 58(1), 31–35.
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696.
Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552–566.
Rikis, D. R. J., & Oshima, T. C. (2017). Effect of purification procedures on DIF analysis in IRTPRO. Educational and Psychological Measurement, 77, 415–428.
Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17, 1–25.
Rizopoulos, D. (2009). ltm: Latent trait models under IRT. http://cran.r-project.org/web/packages/ltm/index.html.
Rouquette, A., Hardouin, J.-B., Vanhaesebrouck, A., Sébille, V., & Coste, J. (2019). Differential item functioning (DIF) in composite health measurement scale: Recommendations for characterizing DIF with meaningful consequences within the Rasch model framework. PLoS ONE, 14(4), e0215073.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, 100–114.
Schalet, B. D., Pilkonis, P. A., Yu, L., Dodds, N., Johnston, K. L., Yount, S., Riley, W., & Cella, D. (2016). Clinical validity of PROMIS depression, anxiety and anger across diverse clinical groups. Journal of Clinical Epidemiology, 73, 119–127.
Setodji, C. M., Reise, S. P., Morales, L. S., Fongwa, M. N., & Hays, R. D. (2011). Differential item functioning by survey language among older Hispanics enrolled in Medicare managed care: A new method for anchor item selection. Medical Care, 49, 461–468.
Seybert, J., & Stark, S. (2012). Iterative linking with the differential functioning of items and tests (DFIT) method: Comparison of testwide and item parameter replication (IPR) critical values. Applied Psychological Measurement, 36, 494–515.
Shealy, R. T., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
Shih, C.-L., Liu, T.-H., & Wang, W.-C. (2014). Controlling Type I error rates in assessing DIF for logistic regression method with SIBTEST regression correction procedure and DIF-free-then-DIF strategy. Educational and Psychological Measurement, 74, 1018–1048.
Shih, C.-L., & Wang, W.-C. (2009). Differential item functioning detection using multiple indicators, multiple causes method with a pure short anchor. Applied Psychological Measurement, 33, 184–199.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2004). Examining the effects of differential item (functioning and differential) test functioning on selection decisions: When are statistically significant effects practically important? Journal of Applied Psychology, 89, 497–508.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292–1306.
Steinberg, L., & Thissen, D. (2006). Using effect sizes for research reporting: Examples using item response theory to analyze differential item functioning. Psychological Methods, 11, 402–415.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.
Stout, W. F. (1987). A nonparametric approach for assessing latent trait dimensionality. Psychometrika, 52, 589–617.
Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensional assessment and ability estimation. Psychometrika, 55, 293–326.
Stout, W., Li, H., Nandakumar, R., & Bolt, D. (1997). MULTISIB—A procedure to investigate DIF when a test is intentionally multidimensional. Applied Psychological Measurement, 21, 195–213.
Strobl, C., Kopf, J., & Zeileis, A. (2015). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80, 289–316.
Suh, Y., & Cho, S.-J. (2014). Chi-square difference tests for detecting differential functioning in a multidimensional IRT model: A Monte Carlo study. Applied Psychological Measurement, 38(5), 359–375.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Taple, B. J., Griffith, J. W., & Wolf, M. S. (2019). Interview administration of PROMIS depression and anxiety short forms. Health Literacy Research and Practice, 6, e196–e204.
Teresi, J. A. (2006). Different approaches to differential item functioning in health applications: Advantages, disadvantages and some neglected topics. Medical Care, 44(Suppl. 11), S152–S170.
Teresi, J. A. (2019). Applying and acting on DIF. Moderator at the 2019 PROMIS Psychometric Summit, Northwestern University, Chicago, IL.
Teresi, J. A., & Jones, R. N. (2013). Bias in psychological assessment and other measures. In K. F. Geisinger (Ed.), APA handbook of testing and assessment in psychology: Vol. 1. Test theory and testing and assessment in industrial and organizational psychology (pp. 139–164). Washington, DC: American Psychological Association.
Teresi, J. A., & Jones, R. N. (2016). Methodological issues in examining measurement equivalence in patient reported outcomes measures: Methods overview to the two-part series, "Measurement Equivalence of the Patient Reported Outcomes Measurement Information System (PROMIS) Short Form Measures". Psychological Test and Assessment Modeling, 58(1), 37–78.
Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2000). Modern psychometric methods for detection of differential item functioning: Application to cognitive assessment measures. Statistics in Medicine, 19, 1651–1683.
Teresi, J. A., Ocepek-Welikson, K., Kleinman, M., Cook, K. F., Crane, P. K., Gibbons, L. E., & Cella, D. (2007). Evaluating measurement equivalence using the item response theory log-likelihood ratio (IRTLR) method to assess differential item functioning (DIF): Applications (with illustrations) to measures of physical functioning ability and general distress. Quality of Life Research, 16, 43–68.
Teresi, J., Ocepek-Welikson, K., Kleinman, M., Eimicke, J. E., Crane, P. K., Jones, R. N., Lai, J. S., Choi, S. W., Hays, R. D., Reeve, B. B., Reise, S. P., Pilkonis, P. A., & Cella, D. (2009). Analysis of differential item functioning in the depression item bank from the Patient Reported Outcome Measurement Information System (PROMIS): An item response theory approach. Psychology Science Quarterly, 51(2), 148–180.
Teresi, J. A., Ocepek-Welikson, K., Kleinman, M., Ramirez, M., & Kim, G. (2016). Psychometric properties and performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) depression short forms in ethnically diverse groups. Psychological Test and Assessment Modeling, 58(1), 141–181.
Teresi, J. A., Ocepek-Welikson, K., Kleinman, M., Ramirez, M., & Kim, G. (2016). Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) anxiety short forms in ethnically diverse groups. Psychological Test and Assessment Modeling, 58(1), 183–219.
Teresi, J. A., Ramirez, M., Jones, R. N., Choi, S., & Crane, P. K. (2012). Modifying measures based on differential item functioning (DIF) impact analyses. Journal of Aging & Health, 24(6), 1044–1076.
Teresi, J. A., & Reeve, B. B. (2016). Epilogue to the two-part series: Measurement equivalence of the Patient Reported Outcomes Measurement Information System (PROMIS) short forms. Psychological Test and Assessment Modeling, 58(2), 423–433.
Thissen, D. (2001). IRTLRDIF v.2.0b: Software for the computation of the statistics involved in item response theory likelihood ratio tests for differential item functioning. Unpublished manual, L. L. Thurstone Psychometric Laboratory, University of North Carolina at Chapel Hill.
Thissen, D. (1991). MULTILOG™ user's guide: Multiple, categorical item analysis and test scoring using item response theory. Chicago: Scientific Software Inc.
Thissen, D., Steinberg, L., & Kuang, D. (2002). Quick and easy implementation of the Benjamini–Hochberg procedure for controlling the false discovery rate in multiple comparisons. Journal of Educational and Behavioral Statistics, 27, 77–83.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147–169). Hillsdale, NJ: Lawrence Erlbaum Associates.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices and recommendations for organizational research. Organizational Research Methods, 3(1), 4–70.
Wainer, H. (1993). Model-based standardization measurement of an item's differential impact. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 123–135). Hillsdale, NJ: Lawrence Erlbaum.
Wang, T., Strobl, C., Zeileis, A., & Merkle, E. C. (2018). Score-based test of differential item functioning via pairwise maximum likelihood estimation. Psychometrika, 83, 132–135.
Wang, W. (2004). Effects of anchor item methods on detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221–261.
Wang, W.-C., & Shih, C.-L. (2010). MIMIC methods for assessing differential item functioning in polytomous items. Applied Psychological Measurement, 34, 166–180.
Wang, W.-C., Shih, C.-L., & Sun, G.-W. (2012). The DIF-free-then-DIF strategy for the assessment of differential item functioning (DIF). Educational and Psychological Measurement, 72, 687–708.
Wang, W.-C., Shih, C.-L., & Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69, 713–731.
Wang, W. C., & Yeh, Y. L. (2003). Effects of anchor item methods on differential item functioning detection with likelihood ratio test. Applied Psychological Measurement, 27, 479–498.
Wang, M., & Woods, C. M. (2017). Anchor selection using the Wald test anchor-all-test-all procedure. Applied Psychological Measurement, 41, 17–29.
Woods, C. M. (2009). Empirical selection of anchors for tests of differential item functioning. Applied Psychological Measurement, 33, 42–57.
Woods, C. M. (2009). Evaluation of MIMIC-model methods for DIF testing with comparison of two group analysis. Multivariate Behavioral Research, 44, 1–27.
Woods, C. M. (2011). DIF testing for ordinal items with Poly-SIBTEST, the Mantel and GMH tests and IRTLRDIF when the latent distribution is nonnormal for both groups. Applied Psychological Measurement, 35, 145–164.
Woods, C. M., Cai, L., & Wang, M. (2013). The Langer-improved Wald test for DIF testing with multiple groups: Evaluation and comparison to two-group IRT. Educational and Psychological Measurement, 73, 532–547.
Woods, C. M., & Grimm, K. J. (2011). Testing for nonuniform differential item functioning with multiple indicator multiple cause models. Applied Psychological Measurement, 35, 339–361.
Woods, C. M., & Harpole, J. (2015). How item residual heterogeneity affects tests for differential item functioning. Applied Psychological Measurement, 39, 251–263.
Yost, K. J., Eton, D. T., Garcia, S. F., & Cella, D. (2011). Minimally important differences were estimated for six PROMIS cancer scales in advanced-stage cancer patients. Journal of Clinical Epidemiology, 64(5), 507–516.
Yu, Q., Medeiros, K. L., Wu, X., & Jensen, R. E. (2018). Nonlinear predictive models for multiple mediation analysis with an application to explore ethnic disparities in anxiety and depression among cancer survivors. Psychometrika, 83, 991–1006.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense. Retrieved from http://www.educ.ubc.ca/faculty/zumbo/DIF/index.html.
Zwitser, R. J., Glaser, S. F., & Maris, G. (2017). Monitoring countries in a changing world: A new look at DIF in international surveys. Psychometrika, 82(1), 210–232.