Clustering N Objects into K Groups under Optimal Scaling of Variables

Published online by Cambridge University Press: 01 January 2025

Stef van Buuren*, Department of Psychonomy, University of Utrecht
Willem J. Heiser, Department of Data Theory, University of Leiden

*Requests for reprints should be sent to Stef van Buuren, Department of Psychonomy, Trans II 17.25, Heidelberglaan 2, 3584 CS Utrecht, THE NETHERLANDS.

Abstract

We propose a method to reduce many categorical variables to one variable with k categories, or stated otherwise, to classify n objects into k groups. Objects are measured on a set of nominal, ordinal or numerical variables or any mix of these, and they are represented as n points in p-dimensional Euclidean space. Our approach starts from homogeneity analysis, also called multiple correspondence analysis; its essential feature is that each object point is restricted to lie at one of only k locations. It follows that these k locations must equal the centroids of the objects in the corresponding groups, which amounts to a sum of squared distances clustering criterion. The problem is not only to estimate the group allocation, but also to obtain an optimal transformation of the data matrix. An alternating least squares algorithm and an example are given.
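The abstract states the key step without showing it. Take the usual homogeneity-analysis loss (1/m) Σ_j ||X - G_j Y_j||², where m is the number of variables, G_j the indicator matrix of variable j and Y_j its category quantifications. With the quantifications held fixed, this loss equals ||X - Z||² plus a constant, where Z = (1/m) Σ_j G_j Y_j. Restricting the rows of X to k locations therefore turns the X-update into a sum-of-squares (k-means) problem on Z, and the optimal locations are the group centroids. The minimal Python/NumPy sketch below illustrates that alternation under stated assumptions:

- All variables are nominal and coded as integers 0, 1, 2, ...; ordinal and numerical scaling are omitted.
- The function names (`cluster_categorical`, `quantify`, `kmeans_step`, `normalize`) are hypothetical.
- The centering-plus-orthonormalization step is an assumed normalization. As a result the iteration is a heuristic that is not guaranteed to decrease the loss monotonically.

It is not the authors' published algorithm.

```python
import numpy as np


def group_means(Z, labels, k):
    """Centroids of the rows of Z per label (zeros for empty groups) and group sizes."""
    counts = np.bincount(labels, minlength=k)
    sums = np.zeros((k, Z.shape[1]))
    np.add.at(sums, labels, Z)
    return sums / np.maximum(counts, 1)[:, None], counts


def quantify(codes, X):
    """Nominal optimal scaling: a category's quantification is the mean score of its objects.

    Returns G_j Y_j, i.e. each object's row replaced by its category's quantification.
    """
    Y, _ = group_means(X, codes, codes.max() + 1)
    return Y[codes]


def kmeans_step(Z, labels, k, max_iter=100):
    """Lloyd iterations for the sum-of-squared-distances criterion on the rows of Z."""
    for _ in range(max_iter):
        C, counts = group_means(Z, labels, k)
        d2 = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        d2[:, counts == 0] = np.inf  # empty groups attract no objects
        new = d2.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels


def normalize(X):
    """Center X and rescale so X'X/n is diagonal with entries 1 (0 if a dimension collapsed).

    Only a common shift and a right-multiplication are applied, so objects that
    share a location still share one afterwards.
    """
    X = X - X.mean(axis=0)
    lam, V = np.linalg.eigh(X.T @ X / X.shape[0])
    scale = np.where(lam > 1e-12, 1.0 / np.sqrt(np.maximum(lam, 1e-12)), 0.0)
    return X @ V @ np.diag(scale)


def cluster_categorical(data, k, p=2, n_iter=100, seed=0):
    """Hypothetical ALS sketch: nominal quantification alternated with k-means on object scores."""
    data = np.asarray(data, dtype=int)
    n, m = data.shape
    rng = np.random.default_rng(seed)
    X = normalize(rng.standard_normal((n, p)))
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iter):
        # (1) quantify every variable given the current object points X
        Z = sum(quantify(data[:, j], X) for j in range(m)) / m
        # (2) restrict object points to k locations: k-means on Z, locations = centroids
        labels = kmeans_step(Z, labels, k)
        C, _ = group_means(Z, labels, k)
        # (3) assumed normalization, to avoid the trivial solution with all points equal
        X_new = normalize(C[labels])
        converged = np.allclose(X_new, X)
        X = X_new
        if converged:
            break
    return labels, X
```

In this sketch the unrestricted scores Z depend only on an object's response profile, so objects with identical profiles always end up in the same group. As with any k-means-type local search, the final partition can depend on `seed`.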

Type: Original Paper
Copyright © 1989 The Psychometric Society

Footnotes

The authors thank Eveline Kroezen and Teije Euverman for their comments on a previous draft of this paper.
