Digital accents, homogeneity-by-design, and the evolving social science of written language

AJ Alvero; Quentin Sedlacek; Maricela León; Courtney Peña

doi:10.1017/S0267190525000042

Published online by Cambridge University Press: 13 June 2025

AJ Alvero*: Center for Data Science for Enterprise and Society, Cornell University, Ithaca, NY, USA
Quentin Sedlacek: Department of Teaching & Learning, Annette Caldwell Simmons School of Education & Human Development, Southern Methodist University, Dallas, TX, USA
Maricela León: Department of Teaching & Learning, Annette Caldwell Simmons School of Education & Human Development, Southern Methodist University, Dallas, TX, USA
Courtney Peña: Stanford University School of Medicine, Stanford, CA, USA
* Corresponding author: AJ Alvero; Email: ajalvero@cornell.edu

Abstract

Human language is increasingly written rather than just spoken, primarily due to the proliferation of digital technology in modern life. This trend has enabled the creation of generative artificial intelligence (AI) trained on corpora containing trillions of words extracted from text on the internet. However, current language theory inadequately addresses digital text communication’s unique characteristics and constraints. This paper systematically analyzes and synthesizes existing literature to map the theoretical landscape of digitized language. The evidence demonstrates that, parallel to spoken language, features of written communication are frequently correlated with the socially constructed demographic identities of writers, a phenomenon we refer to as “digital accents.” This conceptualization raises complex ontological questions about the nature of digital text and its relationship to social identity. The same line of questioning, in conjunction with recent research, shows how generative AI systematically fails to capture the breadth of expression observed in human writing, an outcome we call “homogeneity-by-design.” By approaching text-based language from this theoretical framework while acknowledging its inherent limitations, social scientists studying language can strengthen their critical analysis of AI systems and contribute meaningful insights to their development and improvement.

Keywords

large language models computational text analysis AI homogenization sociolinguistics sociology

Type: Research Article
Information: Annual Review of Applied Linguistics, First View, pp. 1–19

DOI: https://doi.org/10.1017/S0267190525000042
Copyright: © The Author(s), 2025. Published by Cambridge University Press.
