Scaling Data from Multiple Sources

Published online by Cambridge University Press:  23 November 2020

Ted Enamorado*
Affiliation:
Assistant Professor, Department of Political Science, Washington University in St. Louis, St. Louis, MO 63130, USA. Email: ted@wustl.edu, URL: http://www.tedenamorado.com
Gabriel López-Moctezuma
Affiliation:
Assistant Professor, Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA. Email: glmoctezuma@caltech.edu, URL: http://glmoctezuma.com
Marc Ratkovic
Affiliation:
Assistant Professor, Department of Politics, Princeton University, Princeton, NJ 08544, USA. Email: ratkovic@princeton.edu, URL: http://www.princeton.edu/~ratkovic
*Corresponding author: Ted Enamorado

Abstract

We introduce a method for scaling two datasets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and its efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves on existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives, while recovering the words most associated with each senator’s location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.
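
To make the distinction between a shared factor and dataset-specific factors concrete, the following is a minimal simulation sketch. It is not the estimator proposed in the article: it substitutes a simple singular value decomposition of the cross-covariance between two datasets, in the spirit of classical inter-battery factor analysis, and it omits covariates and resampling. All sizes, loadings, noise levels, and variable names are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 2000, 20, 30  # hypothetical: n units, p1 and p2 columns

# Latent factors: z is common to both datasets; u1 and u2 are
# idiosyncratic to dataset 1 and dataset 2, respectively.
z = rng.standard_normal(n)
u1 = rng.standard_normal(n)
u2 = rng.standard_normal(n)

# Hypothetical loadings for the shared (w) and specific (v) factors.
w1, v1 = rng.standard_normal(p1), rng.standard_normal(p1)
w2, v2 = rng.standard_normal(p2), rng.standard_normal(p2)

# Each dataset = shared part + dataset-specific part + noise.
X1 = np.outer(z, w1) + np.outer(u1, v1) + 0.5 * rng.standard_normal((n, p1))
X2 = np.outer(z, w2) + np.outer(u2, v2) + 0.5 * rng.standard_normal((n, p2))

# Center the columns of each dataset.
X1c = X1 - X1.mean(axis=0)
X2c = X2 - X2.mean(axis=0)

# Because u1 and u2 are independent, the cross-covariance between the
# two datasets is driven (up to sampling noise) by the shared factor.
C = X1c.T @ X2c / n
U, s, Vt = np.linalg.svd(C, full_matrices=False)

# Shared score: project each dataset on its leading singular vector.
shared_score = X1c @ U[:, 0] + X2c @ Vt[0]
corr_shared = abs(np.corrcoef(shared_score, z)[0, 1])

# Dataset-1-specific score: remove the shared score from X1 by least
# squares, then take the leading principal component of the residual.
beta = X1c.T @ shared_score / (shared_score @ shared_score)
R1 = X1c - np.outer(shared_score, beta)
_, _, Vt1 = np.linalg.svd(R1, full_matrices=False)
specific_score = R1 @ Vt1[0]
corr_specific = abs(np.corrcoef(specific_score, u1)[0, 1])

print("|corr(shared score, z)|     =", round(corr_shared, 3))
print("|corr(specific score, u1)|  =", round(corr_specific, 3))
```

Absolute correlations are reported because the sign of a latent dimension is not identified. In this toy setup the two datasets play the roles that votes and speech play in the application, but real vote and word-count data are discrete and would call for a model suited to those outcome types.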

Type
Article
Copyright
© The Author(s) 2020. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by Betsy Sinclair
