Published online by Cambridge University Press: 13 June 2025
Human language is increasingly written rather than just spoken, primarily due to the proliferation of digital technology in modern life. This trend has enabled the creation of generative artificial intelligence (AI) trained on corpora containing trillions of words extracted from text on the internet. However, current language theory inadequately addresses digital text communication’s unique characteristics and constraints. This paper systematically analyzes and synthesizes existing literature to map the theoretical landscape of digitized language. The evidence demonstrates that, parallel to spoken language, features of written communication are frequently correlated with the socially constructed demographic identities of writers, a phenomenon we refer to as “digital accents.” This conceptualization raises complex ontological questions about the nature of digital text and its relationship to social identity. The same line of questioning, in conjunction with recent research, shows how generative AI systematically fails to capture the breadth of expression observed in human writing, an outcome we call “homogeneity-by-design.” By approaching text-based language from this theoretical framework while acknowledging its inherent limitations, social scientists studying language can strengthen their critical analysis of AI systems and contribute meaningful insights to their development and improvement.