Imputation of Missing Categorical Data by Maximizing Internal Consistency

Stef van Buuren; Jan L. A. van Rijckevorsel

doi:10.1007/BF02294420

Published online by Cambridge University Press: 01 January 2025

Stef van Buuren*: Department of Statistics, TNO Institute of Preventive Health Care, Leiden
Jan L. A. van Rijckevorsel: Department of Statistics, TNO Institute of Preventive Health Care, Leiden
*: Requests for reprints should be sent to Stef van Buuren, TNO Institute of Preventive Health Care, PO Box 124, 2300 AC Leiden, THE NETHERLANDS. Email: buuren@nipg.tno.nl.

Abstract

This paper suggests a method to supplant missing categorical data by “reasonable” replacements. These replacements will maximize the consistency of the completed data as measured by Guttman's squared correlation ratio. The text outlines a solution of the optimization problem, describes relationships with the relevant psychometric theory, and studies some properties of the method in detail. The main result is that the average correlation should be at least 0.50 before the method becomes practical. At that point, the technique gives reasonable results up to 10–15% missing data.
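The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a tiny data set, tries every possible fill of the missing cells by brute force, and scores each completed data set by the largest average squared correlation ratio attainable over centered object scores (found here by plain power iteration). The function names `group_means`, `consistency`, and `impute`, and the use of `None` as the missing-value marker, are hypothetical choices made for this example.

```python
from itertools import product


def center(x):
    """Subtract the mean so that scores sum to zero."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]


def group_means(x, labels):
    """Replace each score by the mean score of its category."""
    sums, counts = {}, {}
    for xi, lab in zip(x, labels):
        sums[lab] = sums.get(lab, 0.0) + xi
        counts[lab] = counts.get(lab, 0) + 1
    return [sums[lab] / counts[lab] for lab in labels]


def consistency(columns, iters=500):
    """Largest average squared correlation ratio over centered scores.

    Power iteration on the averaged category-mean operator. Assumption:
    the fixed start vector is not orthogonal to the dominant solution.
    """
    n = len(columns[0])
    x = center([float(i) for i in range(n)])
    lam = 0.0
    for _ in range(iters):
        y = [0.0] * n
        for col in columns:
            for i, v in enumerate(group_means(x, col)):
                y[i] += v / len(columns)
        y = center(y)
        norm = sum(v * v for v in y) ** 0.5
        if norm < 1e-12:  # no between-category variation at all
            return 0.0
        # Rayleigh quotient: between-category share of the score variance
        lam = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        x = [v / norm for v in y]
    return lam


def impute(rows, missing=None):
    """Fill missing cells with the observed categories that maximize
    consistency. Exhaustive search, so only feasible for a few holes;
    assumes every variable has at least one observed value."""
    columns = [list(c) for c in zip(*rows)]
    holes = [(j, i) for j, col in enumerate(columns)
             for i, v in enumerate(col) if v is missing]
    options = [sorted({v for v in columns[j] if v is not missing})
               for j, _ in holes]
    best_score, best_cols = -1.0, None
    for fill in product(*options):
        trial = [col[:] for col in columns]
        for (j, i), v in zip(holes, fill):
            trial[j][i] = v
        score = consistency(trial)
        if score > best_score:
            best_score, best_cols = score, trial
    return [list(r) for r in zip(*best_cols)], best_score


rows = [["a", "a"], ["a", "a"], ["b", "b"], ["b", None]]
completed, score = impute(rows)
print(completed[3], round(score, 3))
```

In this toy example the two variables agree on every observed row, so the fill that keeps them in agreement gives the highest consistency. Exhaustive search grows exponentially with the number of missing cells, which is why a practical method needs an iterative solution of the kind the paper outlines.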

Keywords

missing data; correlation ratio; optimal scaling

Type: Original Paper
Information: Psychometrika, Volume 57, Issue 4, December 1992, pp. 567–580

DOI: https://doi.org/10.1007/BF02294420
Copyright: Copyright © 1992 The Psychometric Society

References

Dear, R. E. (1959). A principal component missing data method for multiple regression models (SP-86). Santa Monica, CA: System Development Corporation.

Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53, 789–798.

Gifi, A. (1990). Nonlinear multivariate analysis. Chichester: Wiley.

Gleason, T. C., & Staelin, R. (1975). A proposal for handling missing data. Psychometrika, 40, 229–252.

Greenacre, M. J. (1984). Theory and applications of correspondence analysis. New York: Academic Press.

Guttman, L. et al. (1941). The quantification of a class of attributes: A theory and method of scale construction. In Horst, P. et al. (Eds.), The prediction of personal adjustment (pp. 319–348). New York: Social Science Research Council.

Hartigan, J. A. (1975). Clustering algorithms. New York: Wiley.

Hartley, H. O., & Hocking, R. R. (1971). The analysis of incomplete data. Biometrics, 27, 783–808.

Kalton, G., & Kasprzyk, D. (1982). Imputing for missing survey responses. Proceedings of the Section of Survey Research Methods, 1982 (pp. 22–23). Alexandria, VA: American Statistical Association.

Little, R. J. A., & Rubin, D. B. (1990). The analysis of social science data with missing values. In Fox, J., & Scott Long, T. (Eds.), Modern methods of data analysis (pp. 374–409). London: Sage.

Madow, W. G., Olkin, I., & Rubin, D. B. (1983). Incomplete data in sample surveys. New York: Academic Press.

Meulman, J. (1982). Homogeneity analysis of incomplete data. Leiden: DSWO Press.

Milligan, G. W. (1980). An examination of the effect of six types of error perturbation of fifteen clustering algorithms. Psychometrika, 45, 325–342.

Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.

Nishisato, S., & Ahn, H. (in press). When not to analyze data: Decision making on missing responses in dual scaling. Annals of Operations Research.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Rubin, D. B. (1991). EM and beyond. Psychometrika, 56, 241–254.

Scheibler, D., & Schneider, W. (1985). Monte Carlo tests of the accuracy of cluster analysis algorithms. Multivariate Behavioral Research, 20, 283–304.

Späth, H. (1985). Cluster dissection and analysis. Chichester: Ellis Horwood.

Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–550.

van Buuren, S., & Heiser, W. J. (1989). Clustering n objects into k groups under optimal scaling of variables. Psychometrika, 54, 699–706.

van Buuren, S., & van Rijckevorsel, J. L. A. (1992). Data augmentation and optimal scaling. In Steyer, R., Wender, K. F., & Widaman, K. F. (Eds.), Psychometric Methodology. Proceedings of the 7th European Meeting of the Psychometric Society in Trier (pp. 80–84). Stuttgart and New York: Gustav Fischer Verlag.

van der Heijden, P. G. M., & Escofier, B. (1989). Multiple correspondence analysis with missing data. Unpublished manuscript, University of Leiden, Department of Psychometrics and Research Methods.

van Rijckevorsel, J. L. A., & de Leeuw, J. (1992). Some results about the importance of knot selection in nonlinear multivariate analysis. Statistica Applicata: Italian Journal of Applied Statistics, 4.
