
What's in a Name? A Method for Extracting Information about Ethnicity from Names

Published online by Cambridge University Press:  04 January 2017

J. Andrew Harris*
Affiliation:
Political Science, New York University - Abu Dhabi, e-mail: andy.harris@nyu.edu

Abstract

Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data, if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names which requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.

Type
Articles
Creative Commons
This is an Open-Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author 2015. Published by Oxford University Press on behalf of the Society for Political Methodology

Footnotes

Author's note: The author is grateful for comments from Andy Eggers, Arthur Spirling, Rachel Gould, Ben Ansell, Bernard Grofman, Gary King, Ken Benoit, Dominik Hangartner, Geoffrey Evans, and Lucy Barnes. Two anonymous reviewers provided excellent comments that resulted in a significantly improved manuscript. The author gratefully acknowledges his time at Nuffield College, Oxford University, as Postdoctoral Prize Research Fellow, during which much of this work was written. Computation for the research was carried out on the High Performance Computing resources at New York University–Abu Dhabi, with the enthusiastic support of Muataz Al-Barwani and Benoit Marchand. The replication archive for this article is available at the Political Analysis Dataverse as Harris (2014). Supplementary materials for this article are available on the Political Analysis Web site.

References

Ambekar, A., Ward, C., Mohammed, J., Male, S., and Skiena, S. 2009. Name-ethnicity classification from open sources. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 49–58. KDD '09, New York, NY, USA: ACM.
Anderson, D., and Lochery, E. 2008. Violence and exodus in Kenya's Rift Valley, 2008: Predictable and preventable. Journal of Eastern African Studies 2(2): 328–43.
Brown, S., and Sriram, C. L. 2012. The big fish won't fry themselves: Criminal accountability for post-election violence in Kenya. African Affairs 111(443): 244–60.
Byrne, K., and O’Malley, E. 2012. What's in a name? Using surnames as data for party research. Party Politics 19(6): 985–97.
Coldman, A. J., Braun, T., and Gallagher, R. P. 1988. The classification of ethnic status using name information. Journal of Epidemiology and Community Health 42:390–95.
Cook, R. D. 1977. Detection of influential observation in linear regression. Technometrics 19(1): 15–18.
Electoral Commission of Kenya. October 2007. Register of electors. Kuresoi Constituency.
Enos, R. D. 2011. What tearing down public housing projects teaches us about the effect of racial threat on political participation. Working Paper, Department of Government, Harvard University.
Enos, R. D. 2012. Testing the elusive: A field experiment on racial threat. Working Paper, Harvard University.
Enos, R. D. 2015. Forthcoming. What the demolition of public housing teaches us about the impact of racial threat on political behavior. American Journal of Political Science.
Goldfarb, D., and Idnani, A. 1982. Dual and primal-dual methods for solving strictly convex quadratic programs. In Numerical Analysis, ed. Hennart, J. P., 226–39. Berlin: Springer-Verlag.
Goldfarb, D., and Idnani, A. 1983. A numerically stable dual method for solving strictly convex quadratic programs. Mathematical Programming 27:1–33.
Greiner, D. J. 2007. Ecological inference in voting rights act disputes: Where are we now, and where do we want to be? Jurimetrics 47:115–67.
Greiner, D. J., and Quinn, K. M. 2009. R x C ecological inference: Bounds, correlations, flexibility, and transparency of assumptions. Journal of the Royal Statistical Society, Series A 172(1): 67–81.
Grofman, B., and Garcia, J. 2014. Using Spanish surname to estimate Hispanic voting population in voting rights litigation: A model of context effects. Election Law Journal 13(3): 375–93.
Harris, J. A. 2014. Replication data for: What's in a name? A method for extracting information about ethnicity from names. Dataverse Network, doi:10.7910/DVN/27691 (v1).
He, H., and Garcia, E. 2009. Learning from imbalanced data. Knowledge and Data Engineering, IEEE Transactions 21(9): 1263–84.
Hopkins, D. J. 2010. Politicized places: Explaining where and when immigrants provoke local opposition. American Political Science Review 104(1): 40–60.
Hopkins, D., and King, G. 2010. A method of automated nonparametric content analysis for social science. American Journal of Political Science 54(1): 229–47.
Interim Independent Electoral Commission. July 2010. Voter's register. Kuresoi Constituency.
Kasara, K. 2013. Separate and suspicious: Local social and political context and ethnic tolerance in Kenya. Journal of Politics 75(4): 921–36.
King, G., and Lu, Y. 2008. Verbal autopsy methods with multiple causes of death. Statistical Science 23(1): 78–91.
Klopp, J., and Kamungi, P. 2007. Violence and elections: Will Kenya collapse? World Policy Journal 24(4): 11–18.
Mateos, P. 2007. A review of name-based ethnicity classification methods and their potential in population studies. Population, Space, and Place 13(4): 243–63.
Mateos, P. 2011. Uncertain segregation: The challenge of defining and measuring ethnicity in segregation studies. Built Environment 37(2): 226–38.
Mueller, S. D. 2014. Kenya and the International Criminal Court: Politics, the election, and the law. Journal of Eastern African Studies 8(1): 25–42.
NCSBOE. 2012a. Voter statistics file.
NCSBOE. 2012b. Voting history file.
Rosenwaike, I. 1994. Surname analysis as a means of estimating minority elderly: An application using Asian surnames. Research on Aging 16(2): 212–27.
Sun, Y., Wong, A. K. C., and Kamel, M. S. 2009. Classification of imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence 23(4): 687–719.
Susewind, R. 2015. What's in a name? Probabilistic inference of religious community from South Asian names. Field Methods 27(3): 1–14.
Treeratpituk, P., and Giles, C. L. 2012. Name-ethnicity classification and ethnicity-sensitive name matching. Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.
Turlach, B. A., and Weingessel, A. 2013. quadprog: Functions to solve quadratic programming problems.
UNSD. Ethnicity: A review of data collection and dissemination. Technical report, United Nations Statistics Division, Demographic and Social Statistics Branch, Social and Housing Statistics Section.
Waki, J. P. 2008. Report of the Commission of Inquiry into Post Election Violence. Nairobi, Kenya: Government Printer.
Word, D., Coleman, C., Nunziata, R., and Kominski, R. n.d.a. Data accompanying “Demographic Aspects of Surnames from Census 2000.” U.S. Census Bureau, Washington, DC.
Word, D., Coleman, C., Nunziata, R., and Kominski, R. n.d.b. Demographic aspects of surnames from Census 2000. Technical report, U.S. Census Bureau, Washington, DC.
Supplementary material: Harris supplementary material (Appendix), PDF, 233.1 KB