Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach

Jens Hainmueller; Chad Hazlett

doi:10.1093/pan/mpt019

Published online by Cambridge University Press: 04 January 2017

Jens Hainmueller (corresponding author): Department of Political Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139. e-mail: jhainm@mit.edu
Chad Hazlett: Department of Political Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139. e-mail: hazlett@mit.edu

Abstract

We propose the use of Kernel Regularized Least Squares (KRLS) for social science modeling and inference problems. KRLS borrows from machine learning methods designed to solve regression and classification problems without relying on linearity or additivity assumptions. The method constructs a flexible hypothesis space that uses kernels as radial basis functions and finds the best-fitting surface in this space by minimizing a complexity-penalized least squares problem. We argue that the method is well-suited for social science inquiry because it avoids strong parametric assumptions, yet allows interpretation in ways analogous to generalized linear models while also permitting more complex interpretation to examine nonlinearities, interactions, and heterogeneous effects. We also extend the method in several directions to make it more effective for social inquiry, by (1) deriving estimators for the pointwise marginal effects and their variances, (2) establishing unbiasedness, consistency, and asymptotic normality of the KRLS estimator under fairly general conditions, (3) proposing a simple automated rule for choosing the kernel bandwidth, and (4) providing companion software. We illustrate the use of the method through simulations and empirical examples.
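
As a rough illustration of the estimator described in the abstract, the following sketch fits a regularized least squares surface with a Gaussian kernel and computes pointwise marginal effects as the analytic derivative of the fitted surface. It is a minimal sketch rather than the authors' companion software: the function names, the fixed penalty `lam`, and the choice of setting `sigma2` to the number of predictors are assumptions made for this example, and the variance estimators mentioned in the abstract are not covered.

```python
import numpy as np

def gaussian_kernel(A, B, sigma2):
    """k(a, b) = exp(-||a - b||^2 / sigma2) for every pair of rows in A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def krls_fit(X, y, lam, sigma2):
    """Minimize ||y - K c||^2 + lam * c' K c, whose solution solves (K + lam I) c = y."""
    K = gaussian_kernel(X, X, sigma2)
    c = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return c, K

def pointwise_marginal_effects(X, c, K, sigma2):
    """Derivative of f(x) = sum_i c_i k(x, x_i) with respect to each predictor,
    evaluated at every observed point x_j."""
    diff = X[:, None, :] - X[None, :, :]  # diff[j, i, d] = x_j[d] - x_i[d]
    return (-2.0 / sigma2) * np.einsum("i,ji,jid->jd", c, K, diff)

# Hypothetical data: one nonlinear and one quadratic predictor plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the predictors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

sigma2 = X.shape[1]  # assumed bandwidth for this sketch
c, K = krls_fit(X, y, lam=0.5, sigma2=sigma2)  # lam fixed by hand, not tuned
effects = pointwise_marginal_effects(X, c, K, sigma2)

print("average marginal effects:", effects.mean(axis=0))
```

Each row of `effects` holds the estimated marginal effects at one observation, so averaging over rows gives a summary comparable to a regression coefficient, while the spread across rows hints at nonlinearity or heterogeneity. In practice the penalty would be chosen by a data-driven rule rather than set by hand.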

Type: Research Article
Information: Political Analysis, Volume 22, Issue 2, Spring 2014, pp. 143–168

DOI: https://doi.org/10.1093/pan/mpt019
Copyright: Copyright © The Author 2013. Published by Oxford University Press on behalf of the Society for Political Methodology

Footnotes

Authors' note: The authors are listed in alphabetical order and contributed equally. We thank Jeremy Ferwerda, Dominik Hangartner, Danny Hidalgo, Gary King, Lorenzo Rosasco, Marc Ratkovic, Teppei Yamamoto, our anonymous reviewers, the editors, and participants in seminars at NYU, MIT, the Midwest Political Science Conference, and the European Political Science Association Conference for helpful comments. Companion software written by the authors to implement the methods proposed in this article in R, Matlab, and Stata can be downloaded from the authors' Web pages. Replication materials are available in the Political Analysis Dataverse at http://dvn.iq.harvard.edu/dvn/dv/pan. The usual disclaimer applies. Supplementary materials for this article are available on the Political Analysis Web site.

