Published online by Cambridge University Press: 19 February 2018
This article presents preliminary findings from a multi-year, multi-disciplinary text analysis project using a corpus of ancient and medieval Chinese works that totals over five million characters and spans the earliest received texts through the Song dynasty. It describes “distant reading” methods in the humanities and the authors’ corpus; introduces topic-modeling procedures; answers questions about the authors’ data; discusses complementary relationships between machine learning and human expertise; explains the topics in Analects, Mencius, and Xunzi that set each of those texts apart from the other two; and explains the topics shared by all three texts. The authors’ results confirm many scholarly opinions derived from close-reading methods, suggest a reappraisal of the semantic content that Xunzi shares with Analects, and yield several actionable research questions for traditional scholarship. The aim of this article is to initiate a new conversation about the implications of machine learning for the study of Asian texts.