Sparse Estimation and Uncertainty with Application to Subgroup Analysis

Published online by Cambridge University Press: 22 February 2017

Marc Ratkovic*
Affiliation:
Assistant Professor, Department of Politics, Princeton University, Princeton NJ 08544, USA. Email: ratkovic@princeton.edu, http://www.princeton.edu/~ratkovic
Dustin Tingley
Affiliation:
Professor of Government, Harvard University, USA. Email: dtingley@gov.harvard.edu, http://scholar.harvard.edu/dtingley

Abstract

We introduce a Bayesian method, LASSOplus, that unifies recent contributions in the sparse modeling literatures, while substantially extending pre-existing estimators in terms of both performance and flexibility. Unlike existing Bayesian variable selection methods, LASSOplus both selects and estimates effects while returning estimated confidence intervals for discovered effects. Furthermore, we show how LASSOplus easily extends to modeling repeated observations and permits a simple Bonferroni correction to control coverage on confidence intervals among discovered effects. We situate LASSOplus in the literature on how to estimate subgroup effects, a topic that often leads to a proliferation of estimation parameters. We also offer a simple preprocessing step that draws on recent theoretical work to estimate higher-order effects that can be interpreted independently of their lower-order terms. A simulation study illustrates the method’s performance relative to several existing variable selection methods. In addition, we apply LASSOplus to an existing study on public support for climate treaties to illustrate the method’s ability to discover substantive and relevant effects. Software implementing the method is publicly available in the R package sparsereg.

Type
Articles
Copyright
Copyright © The Author(s) 2017. Published by Cambridge University Press on behalf of the Society for Political Methodology. 

Footnotes

Authors’ note: We are grateful to Neal Beck, Scott de Marchi, In Song Kim, John Londregan, Luke Miratrix, Michael Peress, Jasjeet Sekhon, Yuki Shiraito, Brandon Stewart, and Susan Athey for helpful comments on an earlier draft. Earlier versions were presented at the 2015 Summer Methods Meeting, Harvard IQSS Applied Statistics Workshop, Princeton Political Methodology Colloquium, DARPA/ISAT Conference “What If? Machine Learning for Causal Inference,” and EITM 2016. We are also grateful to two anonymous reviewers for detailed feedback on an earlier version. All mistakes are the authors’ own. Replication data are available at Ratkovic and Tingley 2016.

Contributing Editor: R. Michael Alvarez

References

Albert, James H., and Chib, Siddhartha. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88:669-679.
Alhamzawi, Rahim, Yu, Keming, and Benoit, Dries F. 2012. Bayesian adaptive Lasso quantile regression. Statistical Modelling 12(3):279-297.
Armagan, Artin, Dunson, David B., and Lee, Jaeyong. 2013. Generalized double Pareto shrinkage. Statistica Sinica 23:119-143.
Bechtel, Michael M., and Scheve, Kenneth F. 2013. Mass support for global climate agreements depends on institutional design. Proceedings of the National Academy of Sciences 110(34):13763-13768.
Belloni, A., Chen, D., Chernozhukov, V., and Hansen, C. 2012. Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80(6):2369-2429, doi:10.3982/ECTA9626.
Belloni, Alexandre, and Chernozhukov, Victor. 2013. Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2):521-547.
Belloni, Alexandre, Chernozhukov, Victor, and Hansen, Christian. 2011. Inference for high-dimensional sparse econometric models. CeMMAP Working Papers CWP41/11, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Benjamini, Yoav, and Yekutieli, Daniel. 2005. False discovery rate-adjusted multiple confidence intervals for selected parameters. Journal of the American Statistical Association 100(469):71-93.
Berger, J. O., and Bernardo, J. M. 1989. Estimating a product of means: Bayesian analysis with reference priors. Journal of the American Statistical Association 84:200-207.
Berger, James O. 2006. The case for objective Bayesian analysis. Bayesian Analysis 1(3):385-402.
Berger, James O., Bernardo, Jose M., and Sun, Dongchu. 2009. The formal definition of reference priors. The Annals of Statistics 37(2):905-938.
Berger, James O., Wang, Xiaojing, and Shen, Lei. 2015. A Bayesian approach to subgroup identification. Journal of Biopharmaceutical Statistics 24(1):110-129.
Berk, Richard, Brown, Lawrence, Buja, Andreas, Zhang, Kai, and Zhao, Linda. 2013. Valid post-selection inference. Annals of Statistics 41(2):802-837.
Bernardo, J. M. 1979. Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society Series B 41:113-147.
Bernardo, Jose M. 2005. Reference analysis. ed. Dey, D. K. and Rao, C. R. Handbook of statistics. Elsevier.
Berry, Donald. 1990. Subgroup analysis. Biometrics 46(4):1227-1230.
Bhadra, Anindya, Datta, Jyotishka, Polson, Nicholas G., and Willard, Brandon. 2015. The Horseshoe+ estimator of ultra-sparse signals. Working Paper.
Bhattacharya, Anirban, Pati, Debdeep, Pillai, Natesh S., and Dunson, David B. 2015. Dirichlet-Laplace priors for optimal shrinkage. Journal of the American Statistical Association 110:1479-1490.
Bickel, Peter, Ritov, Ya’acov, and Tsybakov, Alexandre. 2009. Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics 37(4):1705-1732.
Buhlmann, Peter, and van de Geer, Sara. 2013. Statistics for high-dimensional data. Berlin: Springer.
Candes, E., and Tao, T. 2007. The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Annals of Statistics 35:2313-2404.
Candes, Emmanuel J. 2006. Modern statistical estimation via oracle inequalities. Acta Numerica 15:1-69.
Carvalho, C., Polson, N., and Scott, J. 2010. The Horseshoe estimator for sparse signals. Biometrika 97:465-480.
Chatterjee, A., and Lahiri, S. N. 2011. Bootstrapping lasso estimators. Journal of the American Statistical Association 106(494):608-625.
Chatterjee, Sourav. 2014. Assumptionless consistency of the LASSO. arXiv:1303.5817v5.
Chernozhukov, Victor, Fernández-Val, Iván, and Melly, Blaise. 2013. Inference on counterfactual distributions. Econometrica 81(6):2205-2268.
Datta, Jyotishka, and Ghosh, Jayanta K. 2013. Asymptotic properties of Bayes risk for the Horseshoe prior. Bayesian Analysis 8(1):111-132.
Donoho, David L., and Johnstone, Iain M. 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3):425-455.
Efron, Bradley. 2015. Frequentist accuracy of Bayesian estimates. Journal of the Royal Statistical Society Series B 77(3):617-646.
Esarey, Justin, and Sumner, Jane Lawrence. 2015. Marginal effects in interaction models: Determining and controlling the false positive rate. Working Paper.
Fan, Jianqing, and Peng, Heng. 2004. Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics 32(3):928-961.
Fan, Jianqing, and Li, Runze. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):1348-1360.
Figueiredo, Mario. 2004. Lecture Notes on the EM Algorithm. Lecture notes. Instituto de Telecomunicacoes, Instituto Superior Tecnico.
Foster, J. C., Taylor, J. M., and Ruberg, S. J. 2011. Subgroup identification from randomized clinical trial data. Statistics in Medicine 30:2867-2880.
Gelman, Andrew. 2006. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis 1(3):515-534.
Gelman, Andrew, Jakulin, Aleks, Pittau, Maria Grazia, and Su, Yu-Sung. 2008. A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2(4):1360-1383.
Gelman, Andrew, and Hill, Jennifer. 2007. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Gelman, Andrew, Carlin, John B., Stern, Hal S., Dunson, David B., Vehtari, Aki, and Rubin, Donald B. 2014. Bayesian data analysis. Text in statistical science series. Boca Raton, FL: CRC Press.
Gill, Jeff. 2014. Bayesian methods: A social and behavioral sciences approach. 3rd ed. CRC Press.
Gillen, B., Montero, S., Moon, H. R., and Shum, M. 2016. BLP-Lasso for aggregate discrete choice models applied to elections with rich demographic covariates. Working Paper.
Green, Donald P., and Kern, Holger L. 2012. Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly 76:491-511.
Griffin, J. E., and Brown, P. J. 2010. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 5(1):171-188.
Griffin, J. E., and Brown, P. J. 2012. Structuring shrinkage: Some correlated priors for regression. Biometrika 99(2):481-487.
Grimmer, Justin, Messing, Solomon, and Westwood, Sean. Forthcoming. Estimating heterogeneous treatment effects and the effects of heterogeneous treatments with ensemble methods. Political Analysis.
Hahn, P. Richard, and Carvalho, Carlos M. 2015. Decoupling shrinkage and selection in Bayesian linear models: A posterior summary perspective. Journal of the American Statistical Association 110(509):435-448.
Hainmueller, Jens, and Hazlett, Chad. 2013. Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22:143-169.
Hainmueller, Jens, Hopkins, Daniel J., and Yamamoto, Teppei. 2014. Causal inference in conjoint analysis: Understanding multidimensional choices via stated preference experiments. Political Analysis 22(1):1-30.
Hans, Chris. 2009. Bayesian lasso regression. Biometrika 96(4):835-845.
Harding, Matthew, and Lamarche, Carlos. 2016. Penalized quantile regression with semiparametric correlated effects: An application with heterogeneous preferences. Journal of Applied Econometrics, doi:10.1002/jae.2520.
Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome. 2010. The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
Imai, Kosuke, and Strauss, Aaron. 2011. Estimation of heterogeneous treatment effects from randomized experiments, with application to the optimal planning of the get-out-the-vote campaign. Political Analysis 19(1):1-19.
Imai, Kosuke, and Ratkovic, Marc. 2013. Estimating treatment effect heterogeneity in randomized program evaluation. The Annals of Applied Statistics 7(1):443-470.
Jackman, Simon. 2009. Bayesian analysis for the social sciences. West Sussex, UK: Wiley.
Jaynes, E. T. 1982. On the rationale of maximum-entropy methods. Proceedings of the IEEE 70:939-952.
Kang, Jian, and Guo, Jian. 2009. Self-adaptive Lasso and its Bayesian estimation. Working Paper.
Kenkel, Brenton, and Signorino, Curtis. 2012. A method for flexible functional form estimation: Bootstrapped basis regression with variable selection. Working Paper.
Kyung, Minjung, Gill, Jeff, Ghosh, Malay, and Casella, George. 2010. Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis 5(2):369-412.
Leeb, Hannes, and Potscher, Benedikt. 2008. Sparse estimators and the oracle property, or the return of Hodges’ estimator. Journal of Econometrics 142:201-211.
Leeb, Hannes, Potscher, Benedikt, and Ewald, Karl. 2015. On various confidence intervals post-model-selection. Statistical Science 30(2):216-227.
Leng, Chenlei, Tran, Minh-Ngoc, and Nott, David. 2014. Bayesian adaptive LASSO. Annals of the Institute of Statistical Mathematics 66(2):221-244.
Lipkovich, I., Dmitrienko, A., Denne, J., and Enas, G. 2011. Subgroup identification based on differential effect search—A recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in Medicine 30:2601-2621.
Liu, H., and Yu, B. 2013. Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression. Electronic Journal of Statistics 7:3124-3169.
Lockhart, Richard, Taylor, Jonathan, Tibshirani, Ryan J., and Tibshirani, Robert. 2014. A significance test for the lasso. The Annals of Statistics 42(2):413-468.
Loh, Wei-Yin, He, Xu, and Man, Michael. 2015. A regression tree approach to identifying subgroups with differential treatment effects. Statistics in Medicine 34:1818-1833.
Minnier, Jessica, Tian, Lu, and Cai, Tianxi. 2011. A perturbation method for inference on regularized regression estimates. Journal of the American Statistical Association 106:1371-1382.
Mitchell, T. J., and Beauchamp, J. J. 1988. Bayesian variable selection in linear regression. Journal of the American Statistical Association 83(404):1023-1032.
O’Hara, R. B., and Sillanpää, M. J. 2009. A review of Bayesian variable selection methods: What, how and which. Bayesian Analysis 4(1):85-118.
Park, Trevor, and Casella, George. 2008. The Bayesian lasso. Journal of the American Statistical Association 103(482):681-686.
Polson, Nicholas, and Scott, James. 2012. Local shrinkage rules, Levy processes and regularized regression. Journal of the Royal Statistical Society, Series B 74(2):287-311.
Potscher, Benedikt, and Leeb, Hannes. 2009. On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding. Journal of Multivariate Analysis 100(9):2065-2082.
Ratkovic, Marc, and Tingley, Dustin. 2016. Replication data for: Sparse estimation and uncertainty with application to subgroup analysis. doi:10.7910/DVN/RNMB1Q, Harvard Dataverse, September 6, 2016.
Stewart, Brandon M. Latent factor regressions for the social sciences. Working Paper.
Strezhnev, Anton, Hainmueller, Jens, Hopkins, Daniel, and Yamamoto, Teppei. 2014. cjoint: AMCE estimator for conjoint experiments. R package version 1.0.3.
Su, Xiaogang, Tsai, Chih-Ling, Wang, Hansheng, Nickerson, David M., and Li, Bogong. 2009. Subgroup analysis via recursive partitioning. Journal of Machine Learning Research 10:141-158.
Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58:267-288.
Tierney, Luke. 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 22(4):1701-1728.
Tingley, Dustin, and Tomz, Michael. 2013. Conditional cooperation and climate change. Comparative Political Studies, p. 0010414013509571.
Wager, S., and Athey, S. 2015. Estimation and inference of heterogeneous treatment effects using random forests. Working paper.
West, M. 1987. On scale mixtures of normal distributions. Biometrika 74:646-648.
Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101(476):1418-1429.
Supplementary material: File

Ratkovic and Tingley supplementary material