Messy Data, Robust Inference? Navigating Obstacles to Inference with bigKRLS

Published online by Cambridge University Press: 26 September 2018

Pete Mohanty*
Affiliation:
Stanford University, Statistics, Sequoia Hall, 390 Serra Mall, Stanford, CA 94305, USA. Email: pmohanty@stanford.edu
Robert Shaffer
Affiliation:
Department of Government, The University of Texas at Austin, Batts Hall 2.116, Austin, TX 78712-1704, USA. Email: rbshaffer@utexas.edu

Abstract

Complex models are of increasing interest to social scientists. Researchers interested in prediction generally favor flexible, robust approaches, while those interested in causation are often interested in modeling nuanced treatment structures and confounding relationships. Unfortunately, estimators of complex models often scale poorly, especially if they seek to maintain interpretability. In this paper, we present an example of such a conundrum and show how optimization can alleviate the worst of these concerns. Specifically, we introduce bigKRLS, which offers a variety of statistical and computational improvements to the Hainmueller and Hazlett (2013) Kernel-Regularized Least Squares (KRLS) approach. As part of our improvements, we decrease the estimator’s single-core runtime by 50% and reduce the estimator’s peak memory usage by an order of magnitude. We also improve uncertainty estimates for the model’s average marginal effect estimates—which we test both in simulation and in practice—and introduce new visual and statistical tools designed to assist with inference under the model. We further demonstrate the value of our improvements through an analysis of the 2016 presidential election, an analysis that would have been impractical or even infeasible for many users with existing software.

Type
Articles
Copyright
Copyright © The Author(s) 2018. Published by Cambridge University Press on behalf of the Society for Political Methodology. 

Footnotes

Authors’ note: Authors, who have contributed equally to this project, are listed alphabetically. This project has benefited immensely from feedback at the Stanford University Statistics Seminar, April 18, 2017; useR! 2016, hosted by Stanford University; American Political Science Association 2016; the International Methods Colloquium, hosted by Justin Esarey on November 11, 2016; the Stevens Institute of Technology on February 27, 2017; and the Bay Area R Users Group Official Meetups, hosted by Treasure Data (May 2016), Santa Clara University (October 2016), and GRAIL (June 2017). Thanks in particular to Susan Holmes, Joseph Rickert, Stefan Wager, Stephen Jessee, Christopher Wlezien, Trevor Hastie, Christian Fong, Luke Sonnet, Chad Hazlett, Kristyn Karl, Jacob Berman, Jonathan Katz, Gaurav Sood, Maraam Dwidar, and anonymous reviewers for additional comments (mistakes, of course, are ours). Pete Mohanty thanks Stanford University’s Vice Provost for Undergraduate Education for research leave. For replication materials, see Mohanty and Shaffer (2018).

Contributing Editor: Jonathan N. Katz

References

Beck, A., and Ben-Tal, A. 2006. On the solution of the Tikhonov regularization of the total least squares problem. Journal of Optimization 17:98–118.
Boutsidis, C., Mahoney, M. W., and Drineas, P. 2009. An improved approximation algorithm for the column subset selection problem. Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, pp. 968–977.
Breiman, L. 2001. Random forests. Machine Learning 45(1):5–32.
Case, A., and Deaton, A. 2015. Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century. Proceedings of the National Academy of Sciences 112(49):15078–15083.
Chipman, H. A., George, E. I., and McCulloch, R. E., et al. 2010. BART: Bayesian additive regression trees. The Annals of Applied Statistics 4(1):266–298.
Demmel, J. W. 1997. Applied numerical linear algebra. Philadelphia, PA: SIAM.
Diaconis, P., Goel, S., and Holmes, S. 2008. Horseshoes in multidimensional scaling and local kernel methods. The Annals of Applied Statistics 2:777–807.
El Karoui, N. 2010. The spectrum of kernel random matrices. The Annals of Statistics 38(1):1–50.
Eleftheriadis, S., Rudovic, O., and Pantic, M. 2015. Discriminative shared Gaussian processes for multiview and view-invariant facial expression recognition. IEEE Transactions on Image Processing 24(1):189–204.
Ferwerda, J., Hainmueller, J., and Hazlett, C. 2017. Kernel-based regularized least squares in R (KRLS) and Stata (krls). Journal of Statistical Software, Articles 79(3):1–26.
Gill, J. 1999. The insignificance of null hypothesis significance testing. Political Research Quarterly 52(3):647–674.
Gu, C., Jeon, Y., and Lin, Y. 2013. Nonparametric density estimation in high-dimensions. Statistica Sinica 23(3):1131–1153.
Guo, J. 2016. Death predicts whether people vote for Donald Trump. Washington Post. Available at: https://www.washingtonpost.com/news/wonk/wp/2016/03/04/death-predicts-whether-people-vote-for-donald-trump/?utm_term=.7d2dd542d4cd.
Hainmueller, J., and Hazlett, C. 2013. Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis 22(2):143–168.
Hastie, T., Tibshirani, R., and Friedman, J. 2008. The elements of statistical learning. 2nd edn. New York: Springer.
Hastie, T., Tibshirani, R., and Wainwright, M. 2015. Statistical learning with sparsity: The LASSO and generalizations. New York: CRC Press.
Hazlett, C. 2016. Kernel balancing: A flexible non-parametric weighting procedure for estimating causal effects. Available at: https://arxiv.org/abs/1605.00155.
Homrighausen, D., and McDonald, D. J. 2016. On the Nyström and column-sampling methods for the approximate principal components analysis of large datasets. Journal of Computational and Graphical Statistics 25(2):344–362.
Imai, K., Lo, J., and Olmsted, J. 2016. Fast estimation of ideal points with massive data. American Political Science Review 110(4):631–656.
Jackman, S. 2009. Bayesian analysis for the social sciences. West Sussex: John Wiley & Sons.
Keele, L. 2015. The statistics of causal inference: A view from political methodology. Political Analysis 23:313–335.
Mohanty, P., and Shaffer, R. 2018. Replication data for: Messy data, Robust inference? Navigating obstacles to inference with bigKRLS. https://doi.org/10.7910/DVN/CYYLOK, Harvard Dataverse, V1.
Monnat, S. M. 2016. Deaths of despair and support for Trump in the 2016 presidential election, Pennsylvania State University Department of Agricultural Economic Research Brief. Available at: https://aese.psu.edu/directory/smm67/Election.16.pdf.
Papadimitriou, C. H. 2003. Computational complexity. In Encyclopedia of computer science. Chichester, UK: John Wiley and Sons Ltd, pp. 260–265.
Ratkovic, M., and Tingley, D. 2017. Sparse estimation and uncertainty with application to subgroup analysis. Political Analysis 25(1):1–40.
Rifkin, R., Yeo, G., and Poggio, T., et al. 2003. Regularized least-squares classification. NATO Science Series Sub Series III Computer and Systems Sciences 190:131–154.
Rifkin, R. M., and Lippert, R. A. 2007. Notes on regularized least squares. Computer Science and Artificial Intelligence Laboratory Technical Report.
Siegel, Z. 2016. Is the opioid crisis partly to blame for President Trump? Slate Magazine. Available at: http://www.slate.com/articles/health_and_science/medical_examiner/2016/12/the_trump_heroin_connection_is_still_unclear.html.
Taylor, J., and Tibshirani, R. J. 2015. Statistical learning and selective inference. Proceedings of the National Academy of Sciences 112:7629–7634.
Tibshirani, R. 1996. Regression shrinkage and selection via the LASSO. Journal of Royal Statistical Society 58:267–288.
Tibshirani, R. J., and Rosset, S. 2016. Excess optimism: How biased is the apparent error of an estimator tuned by SURE? Preprint, arXiv:1612.09415.
Wahba, G. 1983. Bayesian “confidence intervals”. Journal of the Royal Statistical Society 45(1):133–150.
Witten, I. H., Frank, E., and Hall, M. A. 2011. Data mining: Practical machine learning tools and techniques. 3rd edn. Burlington, MA: Elsevier.
Yu, K., Xu, W., and Gong, Y. 2009. Deep learning with kernel regularization for visual recognition. In Advances in neural information processing systems, ed. Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L. Red Hook, NY: Curran Associates, pp. 1889–1896.
Zhang, Z., Dai, G., and Jordan, M. I. 2011. Bayesian generalized kernel mixed models. Journal of Machine Learning Research 12:111–139.
Supplementary material: File

Mohanty and Shaffer supplementary material
