Cluster Correspondence Analysis

Published online by Cambridge University Press: 01 January 2025

M. van de Velden* (Erasmus University Rotterdam)
A. Iodice D’Enza (Università di Cassino e del Lazio Meridionale)
F. Palumbo (Università degli Studi di Napoli Federico II)

* Correspondence should be made to M. van de Velden, Econometric Institute, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands. Email: vandevelden@ese.eur.nl

Abstract

A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between-variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided, and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.

Type: Original Paper
Copyright © 2016 The Psychometric Society

Footnotes

Electronic supplementary material: The online version of this article (doi:10.1007/s11336-016-9514-0) contains supplementary material, which is available to authorized users.

Supplementary material: van de Velden et al. supplementary material (File, 1.2 MB)