Commentary on Coefficient Alpha: A Cautionary Tale

Published online by Cambridge University Press:  01 January 2025

Samuel B. Green*
Affiliation:
Arizona State University
Yanyun Yang
Affiliation:
Florida State University
* Requests for reprints should be sent to Samuel B. Green, Arizona State University, P.O. Box 870611, Tempe, AZ 85287-0611, USA. E-mail: samgreen@asu.edu

Abstract

The general use of coefficient alpha to assess reliability should be discouraged on a number of grounds. The assumptions underlying coefficient alpha are unlikely to hold in practice, and violation of these assumptions can result in nontrivial negative or positive bias. Structural equation modeling was discussed as an informative process both to assess the assumptions underlying coefficient alpha and to estimate reliability.
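The direction of the bias described above can be illustrated with a small numerical sketch. It is not the authors' structural equation modeling procedure. It assumes a one-factor model with unit factor variance, works with population covariance matrices rather than sample data, and uses hypothetical loadings, error variances, and function names chosen only for illustration. Reliability is taken here as true-score variance divided by the variance of the summed score.

```python
def coefficient_alpha(cov):
    """Coefficient alpha from a k x k item covariance matrix (list of lists)."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)     # variance of the summed score
    item_var = sum(cov[i][i] for i in range(k))  # sum of the item variances
    return (k / (k - 1)) * (1 - item_var / total_var)


def one_factor_cov(loadings, error_vars):
    """Covariance matrix implied by a one-factor model (assumed: unit factor
    variance, uncorrelated errors)."""
    k = len(loadings)
    return [[loadings[i] * loadings[j] + (error_vars[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]


def reliability_of_sum(loadings, cov):
    """True-score variance of the sum divided by the total variance of the sum."""
    true_var = sum(loadings) ** 2
    total_var = sum(sum(row) for row in cov)
    return true_var / total_var


# Case 1 (hypothetical values): equal loadings, uncorrelated errors.
equal_loadings = [0.7, 0.7, 0.7, 0.7]
cov_equal = one_factor_cov(equal_loadings, [0.51] * 4)
alpha_equal = coefficient_alpha(cov_equal)
rel_equal = reliability_of_sum(equal_loadings, cov_equal)

# Case 2 (hypothetical values): unequal loadings, uncorrelated errors.
unequal_loadings = [0.9, 0.7, 0.5, 0.3]
cov_unequal = one_factor_cov(unequal_loadings, [0.19, 0.51, 0.75, 0.91])
alpha_unequal = coefficient_alpha(cov_unequal)
rel_unequal = reliability_of_sum(unequal_loadings, cov_unequal)

# Case 3 (hypothetical values): equal loadings, but the errors of the first
# two items share a positive covariance of 0.2.
cov_correlated = [row[:] for row in cov_equal]
cov_correlated[0][1] += 0.2
cov_correlated[1][0] += 0.2
alpha_correlated = coefficient_alpha(cov_correlated)
rel_correlated = reliability_of_sum(equal_loadings, cov_correlated)

print("equal loadings:    alpha - reliability =", alpha_equal - rel_equal)
print("unequal loadings:  alpha - reliability =", alpha_unequal - rel_unequal)
print("correlated errors: alpha - reliability =", alpha_correlated - rel_correlated)
```

Under these assumptions, alpha matches the reliability of the summed score when loadings are equal and errors are uncorrelated, falls below it when loadings differ, and exceeds it when two item errors are positively correlated.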

Type
Theory and Methods
Copyright
Copyright © 2008 The Psychometric Society
