Using Multiple Imputation with GEE with Non-monotone Missing Longitudinal Binary Outcomes

Published online by Cambridge University Press: 01 January 2025

Stuart R. Lipsitz*, Brigham and Women’s Hospital and Ariadne Labs
Garrett M. Fitzmaurice, McLean Hospital
Roger D. Weiss, McLean Hospital

* Correspondence should be made to Stuart R. Lipsitz, Division of General Internal Medicine, Brigham and Women’s Hospital and Ariadne Labs, 1620 Tremont St. 3rd Floor, BC3 002D, Boston, MA 02120-1613, USA. Email: slipsitz@bwh.harvard.edu

Abstract

This paper considers multiple imputation (MI) approaches for handling non-monotone missing longitudinal binary responses when estimating the parameters of a marginal model using generalized estimating equations (GEE). GEE has been shown to yield consistent estimates of the regression parameters of a marginal model when data are missing completely at random (MCAR). However, when data are missing at random (MAR), the GEE estimates may not be consistent; the MI approaches proposed in this paper minimize bias under MAR. The first MI approach is based on a multivariate normal distribution, with pairwise products among the binary outcomes added to the multivariate normal vector. Although the multivariate normal does not impute values of exactly 0 or 1 for the missing binary responses, we suggest not rounding the imputed values, because, as discussed by Horton et al. (Am Stat 57:229–232, 2003), rounding could increase bias. The second MI approach is the fully conditional specification (FCS) approach, in which a logistic regression model is specified for each outcome given the outcomes at the other time points and the covariates. Typically, only the main effects of the outcomes at the other times are included as predictors in FCS; we explore whether bias can be reduced by also including pairwise interactions of the outcomes at the other time points. In a study of asymptotic bias with non-monotone missing data, the proposed MI approaches are also compared with GEE without imputation. Finally, the proposed methods are illustrated using data from a longitudinal clinical trial comparing four psychosocial treatments in the National Institute on Drug Abuse Collaborative Cocaine Treatment Study, in which patients’ cocaine use was recorded monthly over the 6 months of treatment.
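
The abstract describes the two imputation models only in words. The Python sketch below is not the authors' code. Under stated assumptions, it illustrates three bookkeeping steps. The first builds the outcome vector augmented with pairwise products, as used in the multivariate normal (MVN) approach. The second builds the predictor row for one FCS logistic model, with optional pairwise interactions of the other outcomes. The third pools per-imputation GEE estimates with Rubin's (1987) rules. All function names are hypothetical. The imputation draws and the GEE fits themselves are not shown; they would come from dedicated software such as the R packages mice and gee listed in the references.

```python
import math
from itertools import combinations
from statistics import mean, variance


def augment_with_pairwise_products(y):
    """Append y_j * y_k (j < k) to the binary outcome vector y.

    Missing values are None; a product is missing if either factor is.
    In the MVN approach, this augmented vector (plus covariates) would be
    imputed jointly; imputed values are kept unrounded, not forced to 0/1.
    """
    products = [None if a is None or b is None else a * b
                for a, b in combinations(y, 2)]
    return list(y) + products


def fcs_predictors(y, t, covariates, interactions=True):
    """Predictor row for the FCS logistic model of the outcome at time t.

    Order: main effects of the outcomes at the other times, then (optionally)
    their pairwise interactions, then the covariates. Assumes y is fully
    filled in, i.e. the current completed state within a chained-equations sweep.
    """
    others = [v for s, v in enumerate(y) if s != t]
    row = list(others)
    if interactions:
        row += [a * b for a, b in combinations(others, 2)]
    return row + list(covariates)


def rubin_combine(estimates, variances):
    """Rubin's rules for one scalar parameter over m >= 2 imputations.

    Returns (pooled estimate, total variance, standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance, m - 1 denominator
    t_var = w + (1 + 1 / m) * b    # total variance
    return q_bar, t_var, math.sqrt(t_var)
```

For example, take hypothetical monthly outcomes `[1, None, 0, 1]`, with month 2 missing. For this input, `augment_with_pairwise_products` returns `[1, None, 0, 1, None, 0, 1, None, None, 0]`. Every product involving the missing month is itself missing, so under the MVN approach it would be imputed together with that month. Setting `interactions=False` in `fcs_predictors` gives the usual main-effects-only FCS model that the abstract takes as its baseline.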

Type: Theory and Methods
Copyright © 2020 The Psychometric Society


Footnotes

Electronic supplementary material: The online version contains supplementary material available at https://doi.org/10.1007/s11336-020-09729-y.

References

Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 158–168). Stanford: Stanford University Press.
Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8, 17–36.
Beunckens, C., Sotto, C., & Molenberghs, G. (2008). A simulation study comparing weighted estimating equations with multiple imputation based estimating equations for longitudinal binary data. Computational Statistics & Data Analysis, 52, 1533–1548.
Carey, V. J., Lumley, T., & Ripley, B. D. (2012). gee: Generalized estimation equation solver. R package version 4.13-18. http://CRAN.R-project.org/package=gee.
Carey, V., Zeger, S. L., & Diggle, P. J. (1993). Modelling multivariate binary data with alternating logistic regressions. Biometrika, 80, 517–526.
Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. New York: Wiley.
Crits-Christoph, P. (1999). Psychosocial treatments for cocaine dependence: National Institute on Drug Abuse Collaborative Cocaine Treatment Study. Archives of General Psychiatry, 56, 493–502.
Enders, C. K. (2010). Applied missing data analysis. New York: The Guilford Press.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996). Markov chain Monte Carlo in practice. New York: Chapman & Hall.
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213.
Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55, 244–254.
Horton, N. J., Parzen, M., & Lipsitz, S. R. (2003). A potential for bias when rounding in multiple imputation. The American Statistician, 57, 229–232.
Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis. Upper Saddle River, NJ: Prentice Hall.
Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305–315.
Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Lipsitz, S. R., Fitzmaurice, G. M., Orav, E. J., & Laird, N. M. (1994). Performance of generalized estimating equations in practical situations. Biometrics, 50, 270–278.
Lipsitz, S. R., Laird, N. M., & Harrington, D. P. (1992). A three-stage estimator for studies with repeated and possibly missing binary outcomes. Applied Statistics, 41, 203–213.
Lipsitz, S. R., Molenberghs, G., Fitzmaurice, G. M., & Ibrahim, J. (2000). GEE with Gaussian estimation of the correlations when data are incomplete. Biometrics, 56, 528–536.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Liu, M., Taylor, J. M., & Belin, T. R. (2000). Multiple imputation and posterior simulation for multivariate missing data in longitudinal studies. Biometrics, 56, 1157–1163.
Paik, M. (1997). The generalized estimating equation approach when data are not missing completely at random. Journal of the American Statistical Association, 92, 1320–1329.
Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106–121.
Rotnitzky, A., & Wypij, D. (1994). A note on the bias of estimators with missing data. Biometrics, 50, 1163–1170.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1978). Multiple imputations in sample surveys: A phenomenological Bayesian approach to nonresponse. In Proceedings of the International Statistical Institute, Manila (pp. 517–532).
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B., & Schenker, N. (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. Journal of the American Statistical Association, 81, 366–374.
SAS Institute Inc. (2020). SAS/STAT software, version 9.4. Cary, NC. http://www.sas.com/.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.
Scheuren, F. (2005). Multiple imputation: How it began and continues. The American Statistician, 59, 315–319.
Tchetgen, E., Wang, L., & Sun, B. (2017). Discrete choice models for nonmonotone nonignorable missing data: Identification and inference. Unpublished manuscript. Archived as arXiv:1607.02631v3 [stat.ME].
Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16, 219–242.
Van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45, 1–68.
Zellner, A., & Rossi, P. E. (1984). Bayesian analysis of dichotomous quantal response models. Journal of Econometrics, 25, 365–393.
Supplementary material: Lipsitz et al. supplementary material (file, 40.2 KB)