
Selecting the Number of Classes Under Latent Class Regression: A Factor Analytic Analogue

Published online by Cambridge University Press: 01 January 2025

Guan-Hua Huang*
Affiliation:
National Chiao Tung University
* Request for reprints should be sent to Guan-Hua Huang, Institute of Statistics, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 300, Taiwan. E-mail: ghuang@stat.nctu.edu.tw. Webpage: http://www.stat.nctu.edu.tw/subhtml/source/teachers/ghuang/

Abstract

Recently, the regression extension of latent class analysis (RLCA) has received much attention in medical research. The basic RLCA model summarizes shared features of multiple measured indicators as an underlying categorical variable and incorporates covariate information in modeling both latent class membership and the indicators themselves. To reduce complexity and enhance interpretability, one usually fixes the number of classes in a given RLCA model. Often, goodness-of-fit methods that compare various estimated models are used to select the number of classes. In this paper, we propose a new method that is based on an analogous method used in factor analysis and does not require repeated fitting. The method synthesizes two ideas that apply to many settings other than ours: a connection between latent class models and factor analysis, and techniques of covariate marginalization and elimination. A Monte Carlo simulation study evaluates the behavior of the selection procedure and compares it with alternative approaches. Data from a study of how measured visual impairments affect older persons’ functioning are used for illustration.
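
As a rough illustration of the repeated-fitting approach that the abstract contrasts with the proposed method, the sketch below compares already-fitted models with different numbers of classes using the standard AIC and BIC formulas. The abstract does not say which goodness-of-fit criteria are meant, so the choice of AIC and BIC is an assumption, and the function names, log-likelihoods, parameter counts, and sample size are all hypothetical values chosen for illustration only.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2k (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_num_classes(fits, n_obs, criterion="bic"):
    """Pick the number of classes with the smallest criterion value.

    fits maps a candidate number of classes to a tuple
    (maximized log-likelihood, number of free parameters).
    Each entry requires a separate model fit, which is the cost
    that a method without repeated fitting would avoid.
    """
    scores = {}
    for n_classes, (loglik, n_params) in fits.items():
        if criterion == "bic":
            scores[n_classes] = bic(loglik, n_params, n_obs)
        else:
            scores[n_classes] = aic(loglik, n_params)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fit summaries (not results from the paper).
fits = {
    1: (-2500.0, 5),
    2: (-2300.0, 11),
    3: (-2290.0, 17),
    4: (-2287.0, 23),
}

best_bic, bic_scores = select_num_classes(fits, n_obs=500, criterion="bic")
best_aic, aic_scores = select_num_classes(fits, n_obs=500, criterion="aic")
print("BIC selects", best_bic, "classes; AIC selects", best_aic, "classes")
```

With these made-up inputs the two criteria disagree, which is one reason the choice of criterion matters when the number of classes is selected by comparing fitted models.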

Type
Original Paper
Copyright
Copyright © 2005 The Psychometric Society

Footnotes

This work was supported by National Institute on Aging (NIA) Program Project P01-AG-10184-03. The author wishes to thank Dr. Karen Bandeen-Roche for her stimulating comments and helpful discussions, and Drs. Gary Rubin and Sheila West for kindly making the Salisbury Eye Evaluation data available.
