Identifying Variables Responsible for Data not Missing at Random

Published online by Cambridge University Press:  01 January 2025

Ke-Hai Yuan*
Affiliation: University of Notre Dame

* Requests for reprints should be sent to Ke-Hai Yuan, Department of Psychology, University of Notre Dame, Notre Dame, IN 46556, USA. E-mail: kyuan@nd.edu

Abstract

When data are not missing at random (NMAR), the maximum likelihood (ML) procedure will not generate consistent parameter estimates unless the missing data mechanism is correctly modeled. Understanding the NMAR mechanism in a data set would allow one to make better use of the ML methodology. A survey or questionnaire may contain many items; certain items may be responsible for NMAR values in other items. The paper develops statistical procedures to identify the responsible items. By comparing ML estimates (MLEs), statistics are developed to test whether the MLEs change when items are excluded. The items whose exclusion causes a significant change in the MLEs are responsible for the NMAR mechanism. The normal distribution is used for obtaining the MLEs; a sandwich-type covariance matrix is used to account for distributional violations. The class of nonnormal distributions within which the procedure is valid is provided. Both saturated and structural models are considered. Effect sizes are also defined and studied. The results indicate that more missing data in a sample does not necessarily imply more significant test statistics, because the effect sizes may be smaller. Knowing the true population means and covariances, or the parameter values in structural equation models, may not make things easier either.
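The core idea in the abstract, comparing normal-theory MLEs with and without a candidate item, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions and is not the paper's procedure: it uses a plain EM algorithm for the saturated normal model, a made-up three-variable population, and a made-up missingness rule, and it does not compute the test statistics, sandwich-type covariance matrix, or effect sizes developed in the paper. The function name `em_normal_mle` and all simulation settings are hypothetical.

```python
import numpy as np

def em_normal_mle(X, n_iter=500, tol=1e-8):
    """Normal-theory MLE of mean and covariance from data with NaNs, via EM."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    patterns = np.unique(miss, axis=0)      # distinct missing-data patterns
    mu = np.nanmean(X, axis=0)              # starting values from observed cases
    sigma = np.diag(np.nanvar(X, axis=0))
    filled = np.where(miss, 0.0, X)         # missing cells are overwritten below
    for _ in range(n_iter):
        extra = np.zeros((p, p))            # summed conditional covariances
        for m in patterns:
            if not m.any():
                continue                    # complete cases need no E-step
            rows = np.all(miss == m, axis=1)
            o = ~m
            if o.any():
                # Regression of missing on observed variables.
                coef = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
                pred = mu[m] + (X[np.ix_(rows, o)] - mu[o]) @ coef.T
                cond = sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
            else:
                pred = np.tile(mu[m], (rows.sum(), 1))
                cond = sigma[np.ix_(m, m)]
            filled[np.ix_(rows, m)] = pred
            extra[np.ix_(m, m)] += rows.sum() * cond
        # M-step: update mean and covariance from expected sufficient statistics.
        new_mu = filled.mean(axis=0)
        centered = filled - new_mu
        new_sigma = (centered.T @ centered + extra) / n
        done = (np.max(np.abs(new_mu - mu)) < tol
                and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma

# Hypothetical population: columns are y1, y2, x, all with mean 0,
# variance 1, and pairwise correlation 0.5.
rng = np.random.default_rng(0)
n = 4000
cov = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
data = rng.multivariate_normal(np.zeros(3), cov, size=n)

# Assumed mechanism: y2 is missing whenever x is large, so x is the item
# "responsible" for the missing values in y2.
data[data[:, 2] > 0.5, 1] = np.nan

mu_full, _ = em_normal_mle(data)             # x included
mu_reduced, _ = em_normal_mle(data[:, :2])   # x excluded

print("MLE of mean(y2) with x included:", mu_full[1])
print("MLE of mean(y2) with x excluded:", mu_reduced[1])
```

In this setup, including x makes the missingness depend only on an observed variable, so the MLE of the y2 mean should stay near its true value of 0; dropping x leaves the missingness dependent on something outside the analysis, and the estimate shifts downward. A formal decision about whether such a shift is significant would require the test statistics described in the paper, which this sketch does not attempt.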

Type: Theory and Methods
Copyright: © 2008 The Psychometric Society

Footnotes

The research was supported by NSF grant DMS04-37167 and the James McKeen Cattell Fund.
