A Unified Approach to Exploratory Factor Analysis with Missing Data, Nonnormal Data, and in the Presence of Outliers

Published online by Cambridge University Press:  01 January 2025

Ke-Hai Yuan*
Affiliation:
University of Notre Dame
Linda L. Marshall
Affiliation:
University of North Texas
Peter M. Bentler
Affiliation:
University of California, Los Angeles
* Requests for reprints should be sent to Ke-Hai Yuan, Laboratory for Social Research, 919 Flanner Hall, University of Notre Dame, Notre Dame IN 46556. E-Mail: kyuan@nd.edu

Abstract

Factor analysis is regularly used for analyzing survey data. Missing data, data with outliers and consequently nonnormal data are very common for data obtained through questionnaires. Based on covariance matrix estimates for such nonstandard samples, a unified approach for factor analysis is developed. By generalizing the approach of maximum likelihood under constraints, statistical properties of the estimates for factor loadings and error variances are obtained. A rescaled Bartlett-corrected statistic is proposed for evaluating the number of factors. Equivariance and invariance of parameter estimates and their standard errors for canonical, varimax, and normalized varimax rotations are discussed. Numerical results illustrate the sensitivity of classical methods and advantages of the proposed procedures.

Type
Articles
Copyright
Copyright © 2002 The Psychometric Society

Footnotes

This project was supported by a University of North Texas Faculty Research Grant, Grant #R49/CCR610528 for Disease Control and Prevention from the National Center for Injury Prevention and Control, and Grant DA01070 from the National Institute on Drug Abuse. The results do not necessarily represent the official view of the funding agencies. The authors are grateful to three reviewers for suggestions that improved the presentation of this paper.
