Invariance in Measurement and Prediction Revisited

Published online by Cambridge University Press: 01 January 2025

Roger E. Millsap*
Affiliation: Arizona State University
* Requests for reprints should be sent to Roger E. Millsap, Department of Psychology, Box 871104, Arizona State University, Tempe, AZ 85287-1104, USA. E-mail: millsap@asu.edu

Abstract

Borsboom (Psychometrika, 71:425–440, 2006) noted that recent work on measurement invariance (MI) and predictive invariance (PI) has had little impact on the practice of measurement in psychology. To understand this contention, the definitions of MI and PI are reviewed, followed by results on the consistency between the two forms of invariance in the general case. The special parametric cases of factor analysis (strict factorial invariance) and linear regression analyses (strong regression invariance) are then described, along with findings on the inconsistency between the two forms of invariance in this context. Two numerical examples of inconsistency are reviewed in detail. The impact of violations of MI on accuracy of selection is illustrated. Finally, reasons for the slow dissemination of work on invariance are discussed, and the prospects for altering this situation are weighed.
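
For orientation, the two forms of invariance named in the abstract can be written compactly. The sketch below uses notation that is common in the invariance literature but is an assumption here, since the abstract itself gives no formulas: X denotes the measured variables, W the latent variables they are intended to measure, V a group-membership indicator, Y a criterion, and Z the predictor(s). The parametric forms in the last two lines are likewise an illustrative reading of "strict factorial invariance" and "strong regression invariance", not a quotation from the paper.

```latex
% Measurement invariance (MI): given the latent variable W,
% the distribution of the measures X does not depend on group V.
P(X \mid W, V) = P(X \mid W)

% Predictive invariance (PI): given the predictor Z,
% the distribution of the criterion Y does not depend on group V.
P(Y \mid Z, V) = P(Y \mid Z)

% Assumed parametric sketch, group index k:
% factor model X = \tau_k + \Lambda_k W + u, with \mathrm{Cov}(u) = \Theta_k.
% Strict factorial invariance: intercepts, loadings, and unique
% variances are the same in every group.
\tau_k = \tau, \qquad \Lambda_k = \Lambda, \qquad \Theta_k = \Theta

% Linear regression Y = b_{0k} + b_{1k} Z + e, with \mathrm{Var}(e) = \sigma^2_{e,k}.
% Strong regression invariance: intercept, slope, and residual
% variance are the same in every group.
b_{0k} = b_0, \qquad b_{1k} = b_1, \qquad \sigma^2_{e,k} = \sigma^2_e
```

Read this way, the inconsistency discussed in the abstract concerns whether the conditions in the third line and those in the fourth line can hold together when groups differ in their latent distributions.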

Type
Presidential Address
Copyright
Copyright © 2007 The Psychometric Society

Footnotes

This paper is based on the Presidential Address given at the International Meeting of the Psychometric Society in Tokyo, Japan, on July 11, 2007. This research was supported by National Institute of Mental Health grants 1P30 MH 068685-01A1 and RO1 MH64707-01.

References

Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67–91.
Ahmavaara, Y. (1954). The mathematical theory of factorial invariance under selection. Psychometrika, 19, 27–38.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education Joint Committee on Standards for Educational and Psychological Testing (1999). Standards for educational and psychological testing, Washington: AERA.
Birnbaum, M.H. (1979). Procedures for the detection and correction of salary inequities. In Pezzullo, T.R., Brittingham, B.E. (Eds.), Salary equity (pp. 121–144). Lexington: Lexington Books.
Bloxom, B. (1972). Alternative approaches to factorial invariance. Psychometrika, 37, 425–440.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425–440.
Bridgeman, M.H., Lewis, C. (1996). Gender differences in college mathematics grades and SAT-M scores: A reanalysis of Wainer and Steinberg. Journal of Educational Measurement, 33, 257–270.
Brown, C.H., Liao, J. (1999). Principles for designing randomized preventive trials in mental health: An emerging developmental epidemiology paradigm. American Journal of Community Psychology, 27, 673–710.
Byrne, B.M. (1994). Testing for factorial validity, replication, and invariance of a measuring instrument: A paradigmatic application based on the Maslach Burnout Inventory. Multivariate Behavioral Research, 29, 289–311.
Clark, L.E. (2006). When a psychometric advance falls in the forest. Psychometrika, 71, 447–450.
Cleary, T.A. (1968). Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 5, 115–124.
Drasgow, F., Probst, T.A. (2004). The psychometrics of adaptation: Evaluating measurement equivalence across languages and cultures. In Hambleton, R.K., Merenda, P.F., Spielberger, C.D. (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 265–296). Hillsdale: Lawrence Erlbaum.
Gottfredson, L.S. (1994). The science and politics of race-norming. American Psychologist, 49, 955–963.
Hambleton, R.K., Merenda, P.F., Spielberger, C.D. (2006). Adapting educational and psychological tests for cross-cultural assessment, Hillsdale: Lawrence Erlbaum.
Hartigan, J.A., Wigdor, A.K. (1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery, Washington: National Academy Press.
Hofer, S.M., Horn, J.L., Eber, H.W. (1997). A robust five-factor structure of the 16PF: Strong evidence from independent rotation and confirmatory factorial invariance procedures. Personality and Individual Differences, 23, 247–269.
Horn, J.L., McArdle, J.J. (1992). A practical guide to measurement invariance in research on aging. Experimental Aging Research, 18, 117–144.
Humphreys, L.G. (1952). Individual differences. Annual Review of Psychology, 3, 131–150.
Humphreys, L.G. (1986). An analysis and evaluation of test and item bias in the prediction context. Psychological Bulletin, 71, 327–333.
Hunter, J.E., Schmidt, F.L. (2000). Racial and gender bias in ability and achievement tests: Resolving the apparent paradox. Psychology, Public Policy, and Law, 6, 151–158.
Jensen, A.R. (1980). Bias in mental testing, New York: Free Press.
Kok, F. (1988). Item bias and test multidimensionality. In Langeheine, R., Rost, J. (Eds.), Latent trait and latent models (pp. 263–275). New York: Plenum.
Krakowski, M., Czobor, P. (2004). Gender differences in violent behaviors: Relationship to clinical symptoms and psychosocial factors. American Journal of Psychiatry, 161, 459–465.
Lehmann, E.L. (1986). Testing statistical hypotheses, New York: Wiley.
Linn, R.L. (1984). Selection bias: Multiple meanings. Journal of Educational Measurement, 21, 33–47.
Linn, R.L., Werts, C.E. (1971). Considerations for studies of test bias. Journal of Educational Measurement, 8, 1–4.
Lord, F.M. (1980). Applications of item response theory to practical testing problems, Hillsdale: Lawrence Erlbaum.
Mellenbergh, G.J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
Meredith, W. (1964). Notes on factorial invariance. Psychometrika, 29, 177–185.
Meredith, W. (1964). Rotation to achieve factorial invariance. Psychometrika, 29, 187–206.
Meredith, W. (1993). Measurement invariance, factor analysis, and factorial invariance. Psychometrika, 58, 525–543.
Meredith, W., Millsap, R.E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57, 289–311.
Millsap, R.E. (1995). Measurement invariance, predictive invariance, and the duality paradox. Multivariate Behavioral Research, 30, 577–605.
Millsap, R.E. (1997). Invariance in measurement and prediction: Their relationship in the single-factor case. Psychological Methods, 2, 248–260.
Millsap, R.E. (1998). Group differences in regression intercepts: Implications for factorial invariance. Multivariate Behavioral Research, 33, 403–424.
Millsap, R.E., Hartog, S.B. (1988). Alpha, beta, and gamma change in evaluation research: A structural equation approach. Journal of Applied Psychology, 73, 574–584.
Millsap, R.E., Kwok, O.M. (2004). Evaluating the impact of partial factorial invariance on selection in two populations. Psychological Methods, 9, 93–115.
Millsap, R.E., Meredith, W. (1992). Inferential conditions in the statistical detection of measurement bias. Applied Psychological Measurement, 16, 389–402.
Neisser, U., Boodoo, G., Bourchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., Halpern, D.F., Loehlin, J.C., Perloff, R., Sternberg, R.J., Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.
Pentz, M.A., Chou, C. (1994). Measurement invariance in longitudinal clinical research assuming change from development and intervention. Journal of Consulting and Clinical Psychology, 62, 450–462.
Potthoff, R.F. (1966). Statistical aspects of the problem of biases in psychological tests (Institute of Statistics Mimeo Series No. 479). Chapel Hill, NC: Department of Statistics, University of North Carolina.
Riordan, C.R., Richardson, H.A., Schaffer, B.S., Vandenberg, R.J. (2001). Alpha, beta, and gamma change: A review of past research with recommendations for new directions. In Neider, L.L., Schriesheim, C. (Eds.), Equivalence in measurement (pp. 51–98). Greenwich: Information Age Publishing.
Sackett, P.R., Wilk, S.L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49, 929–954.
Sackett, P.R., Schmitt, N., Ellington, J.E., Kabin, M.B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative action world. American Psychologist, 56, 302–318.
Schmidt, F.L., Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of over 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schmidt, F.L., Pearlman, K., Hunter, J.E. (1980). The validity and fairness of employment and educational tests for Hispanic Americans: A review and analysis. Personnel Psychology, 33, 705–724.
Shealy, R., Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
Society for Industrial/Organizational Psychology (2003). Principles for the application and use of personnel selection procedures, Bowling Green: Society for Industrial Organizational Psychology.
Stout, W. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Thissen, D., Steinberg, L., Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In Wainer, H., Braun, H.I. (Eds.), Test validity (pp. 147–169). Hillsdale: Lawrence Erlbaum.
Thomson, G.H., Lederman, W. (1939). The influence of multivariate selection on the factorial analysis of ability. British Journal of Psychology, 29, 288–305.
Thurstone, L.L. (1947). Multiple factor analysis, Chicago: University of Chicago Press.
Zwick, R. (1990). When do item response function and Mantel–Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185–197.