An Algorithm for Generating Artificial Test Clusters

Glenn W. Milligan

doi:10.1007/BF02294153

Published online by Cambridge University Press: 01 January 2025

Affiliation: Faculty of Management Sciences, The Ohio State University
Requests for reprints and program listings should be sent to Glenn W. Milligan, Faculty of Management Sciences, 301 Hagerty Hall, The Ohio State University, Columbus, OH 43210.

Abstract

An algorithm for generating artificial data sets that contain distinct, nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research conducted on clustering methods or statistics. It generates data sets that contain 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in a 4-, 6-, or 8-dimensional space. Three different patterns for assigning the points to the clusters are provided: one pattern assigns the points equally to the clusters, while the remaining two produce clusters of unequal sizes. Finally, several methods for introducing error into the data have been incorporated in the algorithm.
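The abstract does not spell out the generation steps, so the following Python sketch only illustrates the general idea under stated assumptions; it is not Milligan's published procedure. It guarantees nonoverlap by laying the clusters end to end, with a positive gap, on the first dimension, and it draws each coordinate from a normal distribution truncated at ±1.5 standard deviations (an assumption). The range of cluster lengths (10 to 40 units), the gap rule, the 10% and 60% fractions used for the two unequal-size patterns, and the additive Gaussian error model are all hypothetical choices, as are the function names `generate_clusters`, `cluster_sizes`, and `truncated_normal`.

```python
import random

def truncated_normal(rng, mean, sd, half_width):
    """Draw from N(mean, sd^2) restricted to [mean - half_width, mean + half_width] by rejection."""
    while True:
        x = rng.gauss(mean, sd)
        if abs(x - mean) <= half_width:
            return x

def cluster_sizes(n_points, k, pattern):
    """Split n_points among k clusters under one of three hypothetical size patterns."""
    if k == 1:
        return [n_points]
    if pattern == "equal":
        base, extra = divmod(n_points, k)
        return [base + (1 if i < extra else 0) for i in range(k)]
    if pattern in ("one_small", "one_large"):
        # Assumed fractions: the first cluster gets 10% ("one_small") or 60% ("one_large").
        frac = 0.10 if pattern == "one_small" else 0.60
        first = max(1, round(frac * n_points))
        base, extra = divmod(n_points - first, k - 1)
        return [first] + [base + (1 if i < extra else 0) for i in range(k - 1)]
    raise ValueError(f"unknown pattern: {pattern}")

def generate_clusters(n_points=100, k=3, dims=4, pattern="equal", noise_sd=0.0, seed=None):
    """Return (points, labels) for k clusters that cannot overlap on the first dimension."""
    if not 1 <= k <= 5:
        raise ValueError("k must be between 1 and 5")
    rng = random.Random(seed)
    sizes = cluster_sizes(n_points, k, pattern)

    # Per cluster and dimension: a center and an sd. Points are truncated at +/- 1.5 sd,
    # so each cluster spans an interval of length 3 * sd in every dimension.
    centers = [[0.0] * dims for _ in range(k)]
    sds = [[0.0] * dims for _ in range(k)]
    cursor = 0.0
    for c in range(k):
        for j in range(dims):
            length = rng.uniform(10.0, 40.0)  # hypothetical range of cluster lengths
            sds[c][j] = length / 3.0
            if j == 0:
                # Place clusters end to end on dimension 0 with a positive gap,
                # so the truncated intervals are disjoint.
                centers[c][j] = cursor + length / 2.0
                cursor += length + rng.uniform(0.25, 0.75) * length  # hypothetical gap
            else:
                centers[c][j] = rng.uniform(0.0, 40.0)  # other dimensions may overlap

    points, labels = [], []
    for c, size in enumerate(sizes):
        for _ in range(size):
            p = [truncated_normal(rng, centers[c][j], sds[c][j], 1.5 * sds[c][j])
                 for j in range(dims)]
            if noise_sd > 0:
                # One simple error model (an assumption): additive Gaussian noise on every coordinate.
                p = [x + rng.gauss(0.0, noise_sd) for x in p]
            points.append(p)
            labels.append(c)
    return points, labels
```

For example, `generate_clusters(n_points=60, k=3, dims=8, pattern="one_small", seed=42)` returns 60 eight-dimensional points with their true cluster labels, which a clustering method under test can then try to recover. Separating the clusters on a single dimension keeps the nonoverlap guarantee easy to verify while leaving the remaining dimensions free to overlap.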

Keywords

Classification; Monte Carlo methods; numerical taxonomy

Type: Computational Psychometrics
Information: Psychometrika, Volume 50, Issue 1, March 1985, pp. 123–127

DOI: https://doi.org/10.1007/BF02294153
Copyright: Copyright © 1985 The Psychometric Society
