
Robust Machine Learning for Treatment Effects in Multilevel Observational Studies Under Cluster-level Unmeasured Confounding

Published online by Cambridge University Press: 01 January 2025

Youmi Suk* (University of Virginia)
Hyunseung Kang (University of Wisconsin-Madison)

* Correspondence should be made to Youmi Suk, School of Data Science, University of Virginia, 31 Bonnycastle Dr, Charlottesville, VA 22903, USA. Email: eub6uw@virginia.edu

Abstract

Recently, machine learning (ML) methods have been used in causal inference to estimate treatment effects while reducing concerns about model misspecification. However, many ML methods require that all confounders be measured in order to estimate treatment effects consistently. In this paper, we propose a family of ML methods that estimate treatment effects in the presence of cluster-level unmeasured confounders, a type of unmeasured confounder that is shared within each cluster and is common in multilevel observational studies. We show through simulation studies that our proposed methods are robust to bias from unmeasured cluster-level confounders in a variety of multilevel observational studies. Using our methods, we also examine the effect of taking an algebra course on math achievement scores in the Early Childhood Longitudinal Study, a multilevel observational educational study. The proposed methods are available in the CURobustML R package.
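
To make the idea of a cluster-level unmeasured confounder concrete, the following is a minimal simulation sketch. It is not the authors' method and does not use the CURobustML package. It assumes a hypothetical data-generating process (a linear outcome with a constant effect `tau`, and a cluster-level variable `u` that drives both treatment and outcome) and uses plain within-cluster demeaning, a standard fixed-effects device, in place of the ML estimators the paper proposes. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def simulate(n_clusters=200, cluster_size=20, tau=2.0, seed=0):
    """Simulate clustered data with an unmeasured cluster-level confounder u.

    Assumed (hypothetical) model: u_j ~ N(0, 1) is shared by all units in
    cluster j, raises the probability of treatment, and raises the outcome.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_clusters):
        u = rng.gauss(0.0, 1.0)                # cluster-level confounder
        p = 1.0 / (1.0 + math.exp(-u))         # treatment probability depends on u
        for _ in range(cluster_size):
            a = 1 if rng.random() < p else 0   # binary treatment
            y = tau * a + 2.0 * u + rng.gauss(0.0, 1.0)
            rows.append((j, a, y))             # note: u itself is not recorded
    return rows


def naive_difference(rows):
    """Difference in mean outcomes between treated and control units."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def within_cluster_estimate(rows):
    """Slope of cluster-demeaned outcome on cluster-demeaned treatment.

    Demeaning removes anything constant within a cluster, including u.
    """
    by_cluster = {}
    for j, a, y in rows:
        by_cluster.setdefault(j, []).append((a, y))
    num = 0.0
    den = 0.0
    for obs in by_cluster.values():
        a_bar = sum(a for a, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for a, y in obs:
            num += (a - a_bar) * (y - y_bar)
            den += (a - a_bar) ** 2
    return num / den


rows = simulate()
naive = naive_difference(rows)
within = within_cluster_estimate(rows)
print(f"true effect: 2.0, naive: {naive:.2f}, within-cluster: {within:.2f}")
```

Under these assumptions, the naive comparison overstates the effect because treated units come disproportionately from clusters with high `u`, while the within-cluster estimate stays close to the true value. The demeaning step relies on the linear, constant-effect model assumed here.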

Type: Theory and Methods
Copyright © 2021 The Psychometric Society

Footnotes

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/s11336-021-09805-x.
