Unsupervised Machine Learning for Clustering in Political and Social Research

Published online by Cambridge University Press: 15 December 2020

Philip D. Waggoner (University of Chicago)

Summary

In the age of data-driven problem-solving, applying sophisticated computational tools to explain substantive phenomena is a valuable skill. Yet applying such tools presupposes an understanding of the data, its structure, and the patterns that influence the broader research program. This Element offers researchers and teachers an introduction to clustering, a prominent class of unsupervised machine learning for exploring and understanding latent, non-random structure in data. It covers a suite of widely used clustering techniques and provides R code and real data so that readers can engage directly with the concepts. After setting the stage for clustering, it details agglomerative hierarchical clustering, k-means clustering, and Gaussian mixture models, and, at a higher level, fuzzy C-means clustering, DBSCAN, and partitioning around medoids (k-medoids) clustering.
Type: Element
Online ISBN: 9781108883955
Publisher: Cambridge University Press
Print publication: 28 January 2021
