Published online by Cambridge University Press: 01 January 2025
When data are not missing at random (NMAR), the maximum likelihood (ML) procedure will not generate consistent parameter estimates unless the missing data mechanism is correctly modeled. Understanding the NMAR mechanism in a data set allows one to make better use of the ML methodology. A survey or questionnaire may contain many items, and certain items may be responsible for NMAR values in other items. This paper develops statistical procedures to identify the responsible items. By comparing ML estimates (MLEs), statistics are developed to test whether the MLEs change when items are excluded. The items that cause a significant change in the MLEs are responsible for the NMAR mechanism. The normal distribution is used to obtain the MLEs; a sandwich-type covariance matrix is used to account for distributional violations. The class of nonnormal distributions within which the procedure is valid is provided. Both saturated and structural models are considered. Effect sizes are also defined and studied. The results indicate that having more missing data in a sample does not necessarily imply more significant test statistics, because the effect sizes may be smaller. Knowing the true population means and covariances, or the parameter values in structural equation models, may not make things easier either.
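The idea of comparing MLEs with and without an item can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the paper's test procedure: it assumes two standard normal items `x` and `y` with correlation `rho`, a hypothetical missingness rule (`y` is missing whenever `x > 0.5`), and the standard closed-form normal-theory MLE of the mean of `y` when `x` is completely observed. It compares point estimates only and does not compute the test statistics or the sandwich-type covariance matrix. All function and variable names are illustrative.

```python
import numpy as np

def mle_mean_y(x, y):
    """Normal-theory MLE of E[y] when x is complete and y has NaNs.

    Uses the factored-likelihood closed form for a bivariate normal
    with a monotone missing pattern: regress y on x among complete
    cases, then adjust the observed mean of y toward the full-sample
    mean of x.
    """
    obs = ~np.isnan(y)
    x_obs, y_obs = x[obs], y[obs]
    # ML slope of y on x among complete cases (both moments use ddof=0)
    beta = np.cov(x_obs, y_obs, bias=True)[0, 1] / np.var(x_obs)
    return y_obs.mean() + beta * (x.mean() - x_obs.mean())

def simulate(n=20000, rho=0.6, seed=0):
    """Draw (x, y) with correlation rho; delete y when x > 0.5 (assumed rule)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    y_missing = np.where(x > 0.5, np.nan, y)
    return x, y_missing

x, y = simulate()

# Item x excluded: the MLE of E[y] is the mean of the observed y values.
# Missingness then depends on a variable outside the analysis.
mean_excluding_x = np.nanmean(y)

# Item x included: missingness depends only on an observed, modeled item.
mean_including_x = mle_mean_y(x, y)

print(f"MLE of E[y], x excluded: {mean_excluding_x:.3f}")
print(f"MLE of E[y], x included: {mean_including_x:.3f}")
print(f"change in MLE:           {mean_including_x - mean_excluding_x:.3f}")
```

The true mean of `y` in this simulation is 0. The estimate that includes `x` should land close to 0, while the estimate that excludes `x` should be noticeably negative, so the change in the MLE points to `x` as the item responsible for the missingness in `y`.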
The research was supported by NSF grant DMS04-37167 and the James McKeen Cattell Fund.