Optimization-Based Model Fitting for Latent Class and Latent Profile Analyses

Published online by Cambridge University Press: 01 January 2025

Guan-Hua Huang*
Affiliation:
Institute of Statistics, National Chiao Tung University, Taiwan
Su-Mei Wang
Affiliation:
Institute of Statistics, National Chiao Tung University, Taiwan
Chung-Chu Hsu
Affiliation:
Institute of Statistics, National Chiao Tung University, Taiwan
* Requests for reprints should be sent to Guan-Hua Huang, Institute of Statistics, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 30010, Taiwan. E-mail: ghuang@stat.nctu.edu.tw

Abstract

Statisticians typically estimate the parameters of latent class and latent profile models using the Expectation-Maximization (EM) algorithm. This paper proposes an alternative two-stage approach to model fitting. The first stage uses modified k-means and hierarchical clustering algorithms to identify the latent classes that best satisfy the conditional independence assumption underlying the latent variable model. The second stage then fits the mixture model, treating class membership as known. The proposed approach is theoretically justifiable, directly checks the conditional independence assumption, and converges much faster than the full likelihood approach when analyzing high-dimensional data. This paper also develops a new classification rule based on latent variable models. The proposed classification procedure reduces the dimensionality of the measured data and explicitly recognizes the heterogeneous nature of complex diseases, which makes it well suited to analyzing high-throughput genomic data. Simulation studies and a real data analysis demonstrate the advantages of the proposed method.
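
The two-stage idea in the abstract can be illustrated with a minimal Python sketch for binary items. This is not the authors' procedure: stage 1 here uses plain Lloyd's k-means as a stand-in, because the modifications that target conditional independence are not described on this page. The function names (`assign`, `kmeans`, `fit_given_classes`), the initial centers, and the toy data are all hypothetical. Stage 2 shows why treating class membership as known is cheap: the class prevalences and item response probabilities reduce to simple within-class proportions, with no EM iterations.

```python
def assign(data, centers):
    """Assign each row to its nearest center (squared Euclidean distance)."""
    labels = []
    for row in data:
        dists = [sum((x - c) ** 2 for x, c in zip(row, center))
                 for center in centers]
        labels.append(dists.index(min(dists)))
    return labels


def kmeans(data, centers, n_iter=20):
    """Stage 1 stand-in: plain Lloyd's k-means from given initial centers.

    Assumption: the paper's modified k-means is NOT reproduced here.
    """
    centers = [list(c) for c in centers]
    labels = assign(data, centers)
    for _ in range(n_iter):
        for k in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(data, centers)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels


def fit_given_classes(data, labels, n_classes):
    """Stage 2: closed-form estimates with class labels treated as known."""
    n = len(data)
    prevalence, item_probs = [], []
    for k in range(n_classes):
        members = [row for row, lab in zip(data, labels) if lab == k]
        prevalence.append(len(members) / n)
        # Per-item probability of a positive response within class k.
        item_probs.append(
            [sum(col) / len(members) for col in zip(*members)]
            if members else None
        )
    return prevalence, item_probs


# Hypothetical toy data: 6 respondents answering 3 binary items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
]
labels = kmeans(data, centers=[data[0], data[3]])
prevalence, item_probs = fit_given_classes(data, labels, n_classes=2)
print(labels, prevalence, item_probs)
```

For continuous indicators (latent profile analysis), the stage 2 proportions would be replaced by within-class means and variances; that variant is likewise an assumption about how such a sketch would extend, not a description of the paper's method.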

Type
Original Paper
Copyright
Copyright © 2011 The Psychometric Society
