An Algorithm for Generating Artificial Test Clusters

Published online by Cambridge University Press:  01 January 2025

Glenn W. Milligan*
Affiliation:
Faculty of Management Sciences, The Ohio State University
* Requests for reprints and program listings should be sent to Glenn W. Milligan, Faculty of Management Sciences, 301 Hagerty Hall, The Ohio State University, Columbus, OH 43210.

Abstract

An algorithm for generating artificial data sets that contain distinct nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research conducted on clustering methods or statistics. The algorithm generates data sets that contain 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in a 4-, 6-, or 8-dimensional space. Three different patterns for assigning the points to the clusters are provided. One pattern assigns the points equally to the clusters, while the remaining two patterns produce clusters of unequal sizes. Finally, a number of methods for introducing error into the data have been incorporated in the algorithm.
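The abstract gives no implementation detail, so the following Python sketch is only an illustration of the general idea and should not be read as the published algorithm. It assumes that clusters are kept apart by placing them in disjoint intervals on the first dimension, that points are drawn uniformly, and that the two unequal-size patterns give one cluster a 10% or a 60% share of the points. The function names, the `gap` parameter, and those shares are all hypothetical. The error-introduction methods mentioned in the abstract are not modeled.

```python
import random


def cluster_sizes(n_points, n_clusters, pattern="equal"):
    """Split n_points across n_clusters under one of three size patterns.

    Assumption: the 10% and 60% shares are illustrative only; the abstract
    says just that one pattern is equal and two are unequal.
    """
    if n_clusters == 1:
        return [n_points]
    if pattern == "equal":
        first = n_points // n_clusters
    elif pattern == "one_small":
        first = round(0.10 * n_points)
    elif pattern == "one_large":
        first = round(0.60 * n_points)
    else:
        raise ValueError(f"unknown pattern: {pattern!r}")
    # Share the remaining points as evenly as possible among the other clusters.
    base, extra = divmod(n_points - first, n_clusters - 1)
    return [first] + [base + (1 if i < extra else 0) for i in range(n_clusters - 1)]


def generate_clusters(n_points=50, n_clusters=3, n_dims=4,
                      pattern="equal", gap=1.0, seed=None):
    """Return (points, labels) for clusters that cannot overlap.

    Assumption: cluster k occupies the interval [k*(1+gap), k*(1+gap)+1]
    on the first dimension, so a positive gap keeps clusters disjoint.
    All other coordinates are uniform on [0, 1].
    """
    rng = random.Random(seed)
    points, labels = [], []
    for k, size in enumerate(cluster_sizes(n_points, n_clusters, pattern)):
        lo = k * (1.0 + gap)
        for _ in range(size):
            first = rng.uniform(lo, lo + 1.0)
            rest = [rng.uniform(0.0, 1.0) for _ in range(n_dims - 1)]
            points.append([first] + rest)
            labels.append(k)
    return points, labels


if __name__ == "__main__":
    pts, labs = generate_clusters(n_points=50, n_clusters=5, n_dims=6,
                                  pattern="one_large", seed=1)
    print(cluster_sizes(50, 5, "one_large"), len(pts), len(pts[0]))
```

Separating clusters on a single dimension is the simplest way to guarantee non-overlap in this sketch; a Monte Carlo study would normally also vary which dimensions carry the separation and add controlled error to the coordinates.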

Type
Computational Psychometrics
Copyright
Copyright © 1985 The Psychometric Society
