Digital accents, homogeneity-by-design, and the evolving social science of written language

Published online by Cambridge University Press: 13 June 2025

AJ Alvero*
Affiliation:
Center for Data Science for Enterprise and Society, Cornell University, Ithaca, NY, USA
Quentin Sedlacek
Affiliation:
Department of Teaching & Learning, Annette Caldwell Simmons School of Education & Human Development, Southern Methodist University, Dallas, TX, USA
Maricela León
Affiliation:
Department of Teaching & Learning, Annette Caldwell Simmons School of Education & Human Development, Southern Methodist University, Dallas, TX, USA
Courtney Peña
Affiliation:
Stanford University School of Medicine, Stanford, CA, USA
* Corresponding author: AJ Alvero; Email: ajalvero@cornell.edu

Abstract

Human language is increasingly written rather than just spoken, primarily due to the proliferation of digital technology in modern life. This trend has enabled the creation of generative artificial intelligence (AI) trained on corpora containing trillions of words extracted from text on the internet. However, current language theory inadequately addresses digital text communication’s unique characteristics and constraints. This paper systematically analyzes and synthesizes existing literature to map the theoretical landscape of digitized language. The evidence demonstrates that, parallel to spoken language, features of written communication are frequently correlated with the socially constructed demographic identities of writers, a phenomenon we refer to as “digital accents.” This conceptualization raises complex ontological questions about the nature of digital text and its relationship to social identity. The same line of questioning, in conjunction with recent research, shows how generative AI systematically fails to capture the breadth of expression observed in human writing, an outcome we call “homogeneity-by-design.” By approaching text-based language from this theoretical framework while acknowledging its inherent limitations, social scientists studying language can strengthen their critical analysis of AI systems and contribute meaningful insights to their development and improvement.

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press.
