Global Convergence of the EM Algorithm for Unconstrained Latent Variable Models with Categorical Indicators

Published online by Cambridge University Press: 01 January 2025

Alexander Weissman*
Affiliation: Psychometric Research, Law School Admission Council
* Requests for reprints should be sent to Alexander Weissman, Psychometric Research, Law School Admission Council, 662 Penn Street, Box 40, Newtown, PA 18940, USA. E-mail: aweissman@lsac.org

Abstract

Convergence of the expectation-maximization (EM) algorithm to a global optimum of the marginal log likelihood function for unconstrained latent variable models with categorical indicators is presented. The sufficient conditions under which global convergence of the EM algorithm is attainable are provided in an information-theoretic context by interpreting the EM algorithm as alternating minimization of the Kullback–Leibler divergence between two convex sets. It is shown that these conditions are satisfied by an unconstrained latent class model, yielding an optimal bound against which more highly constrained models may be compared.
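
The idea in the abstract can be illustrated with a small numerical sketch. It is not the paper's own derivation or code. It assumes, purely for illustration, that "unconstrained" means each latent class has a free categorical distribution over whole response patterns; the function names, the toy counts, and the starting values are all hypothetical. Under that assumption the marginal log likelihood equals the saturated log likelihood minus N times the Kullback–Leibler divergence from the observed pattern proportions to the model's marginal distribution, so the saturated value is the bound EM is trying to reach.

```python
import math


def em_step(counts, class_weights, pattern_probs):
    """One EM update for a mixture of free categorical distributions.

    counts[x]            observed frequency of response pattern x
    class_weights[c]     current P(class = c), assumed strictly positive
    pattern_probs[c][x]  current P(pattern = x | class = c), strictly positive
    """
    n_classes = len(class_weights)
    n_patterns = len(counts)
    total = sum(counts)

    # E-step: posterior P(class = c | pattern = x) under the current parameters.
    posterior = []
    for x in range(n_patterns):
        joint = [class_weights[c] * pattern_probs[c][x] for c in range(n_classes)]
        norm = sum(joint)
        posterior.append([j / norm for j in joint])

    # M-step: expected counts per (class, pattern), then renormalise.
    expected = [[counts[x] * posterior[x][c] for x in range(n_patterns)]
                for c in range(n_classes)]
    new_weights = [sum(row) / total for row in expected]
    new_probs = [[e / sum(row) for e in row] for row in expected]
    return new_weights, new_probs


def marginal_log_likelihood(counts, class_weights, pattern_probs):
    """Log likelihood of the observed counts after summing out the class."""
    ll = 0.0
    for x, n in enumerate(counts):
        if n > 0:
            p = sum(w * probs[x] for w, probs in zip(class_weights, pattern_probs))
            ll += n * math.log(p)
    return ll


def saturated_log_likelihood(counts):
    """Log likelihood when the model reproduces the observed proportions."""
    total = sum(counts)
    return sum(n * math.log(n / total) for n in counts if n > 0)


def run_em(counts, class_weights, pattern_probs, n_iter=5):
    """Run EM and record the marginal log likelihood after every step."""
    history = [marginal_log_likelihood(counts, class_weights, pattern_probs)]
    for _ in range(n_iter):
        class_weights, pattern_probs = em_step(counts, class_weights, pattern_probs)
        history.append(marginal_log_likelihood(counts, class_weights, pattern_probs))
    return class_weights, pattern_probs, history


# Hypothetical toy data: four response patterns, two latent classes.
toy_counts = [30, 10, 25, 35]
start_weights = [0.4, 0.6]
start_probs = [[0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]]

weights, probs, history = run_em(toy_counts, start_weights, start_probs)
gap = saturated_log_likelihood(toy_counts) - history[-1]
print("log likelihood per iteration:", history)
print("gap to saturated bound:", gap)
```

In this simplified setting the log likelihood never decreases and the gap to the saturated bound closes from any strictly positive starting point. Models that place further constraints on the class-conditional distributions would generally sit below that bound, which is the sense in which the bound can serve as a benchmark.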

Type: Original Paper
Copyright © The Psychometric Society
