Introduction
The pedagogy of ancient languages has long been distinct from that of modern languages, primarily due to one significant challenge: the absence of living native speakers. This has led, over the past 200 years, to the teaching of ancient languages being heavily focused on the study of grammar, vocabulary, and translation, an approach often referred to as the ‘Grammar-Translation’ (GT) method. The GT method has been shown to yield mediocre results (e.g. August and Shanahan, 2010; Krashen, 1982; Walter, 2008 on modern languages and Koutropoulos, 2011 on ancient languages) in comparison to approaches based on comprehensible input or communication. Comprehensible Input (CI) emphasises the importance of understanding language input that is slightly above the current level of the learner (Krashen, 1982), while communication-focused methods concentrate on language usage and interaction (Canale and Swain, 1980; Nation, 2007; Swain, 1985). While applying CI theory to ancient languages is relatively straightforward due to its focus on understanding language input, implementing a Communicative Approach (CA) presents a significant challenge. This challenge arises primarily from the absence of living native speakers, which deprives learners and teachers of a source of valid and real-time feedback during communication in the target language. To mitigate this issue, we explore the use of Artificial Intelligence (AI) to partially simulate interaction with a ‘native’ speaker, thereby facilitating communication. The integration of AI into secondary school education, particularly for the teaching of ancient languages, remains a relatively untapped field. Recent literature has seen a few studies and proposals beginning to explore this area.
For instance, Ross (2023) discussed a number of theoretical applications for ChatGPT (OpenAI, 2022), with an emphasis on its usability as a virtual tutor. Additionally, Díaz-Sánchez and Chapinal-Heras (2023) have proposed using Midjourney – an image generator – to assist students during translation. In prior work, we introduced several practical applications of AI for teaching Latin to high school students. We leveraged AI-based image generators, primarily Midjourney, to help students visualise texts and to illustrate abstract grammatical concepts (Cavaleri, 2022). Furthermore, we demonstrated how the integration of a Latinity classifier – a custom-trained ELECTRA model – with a Latin text generator – a custom-trained GPT-2 – could help students engage with various language concepts (Cavaleri, 2021, 2022).
In this paper, we present the outcomes of integrating AI-based tools into the Latin classroom. Our experiment featured written communicative activities with real-time feedback, aiming to make writing in Latin more engaging and game-like for students – who were not accustomed to producing output in Latin. The importance of incorporating output activities in the process of acquiring a language is well-documented (Erlam et al., 2021; Nation, 2007, 2009; Swain, 1985, 2005). The gamification aspect of the activities, provided by the AI, primarily served as a facilitating and motivating factor. Motivation is a key element in language acquisition, since it acts as a driving force sustaining learner engagement and perseverance, especially when confronted with challenging tasks (e.g. Pike, 2015). To align our activities with the syllabi of our collaborating teachers, we focused on Ciceronian idioms. Cicero, a prominent first-century BC Roman lawyer, author, and statesman, is renowned for his speeches, and curricula are often geared towards his Latin. Idioms are mostly language-, sometimes author-specific expressions that cannot be directly translated into another language.
Our focus on Ciceronian idioms, or ‘Ciceronianity’ – a term we use to describe the degree to which language use reflects Cicero's style – was driven by two main reasons: the significant presence of Cicero in current syllabi and the skill set developed from the study of idioms, which is readily transferable to a GT approach – the prevalent method in public schools. For instance, scholars like Swain (1985), Irujo (1986), Lazar (1996) and Lewis (1997) have highlighted that focusing on idioms or ‘chunks’ deepens understanding of the language and its cultural context, enhancing language comprehension, increasing learners' capacity to absorb linguistic input and thereby expanding their exposure to the language (see also Boers and Lindstromberg, 2008).
With this in mind, students were asked not only to write in Latin but to incorporate Ciceronian idioms. They were supported by a lightweight AI-based tool that we developed and made accessible on the web at https://latin-ia.hepl.ch, the Ciceronianity classifier. Given a piece of text, this tool outputs a ‘Ciceronianity score’ that the students tried to incrementally improve through trial and error. As mentioned above, the goal of this scoring system was to make the process of composing Ciceronian Latin more engaging and game-like for students, while also assisting them in building their own ‘toolbox’ of Ciceronian idioms.
Method
Experimental setting
The experiment was conducted over a period of three months in public secondary schools located in the French-speaking part of Switzerland, where Latin is offered as an optional subject. Participants were divided into five distinct groups, consisting of a total of three Latin teachers and 26 students. Teachers were recruited on a voluntary basis and were paid for their participation. The groups represented various stages of Latin proficiency, ensuring a comprehensive analysis across different learning levels. Specifically, our participant pool included six students who were in their 6th year of Latin studies, seven students in their 5th year, ten in their 3rd year and, lastly, three in their 2nd year of Latin. This diversity allowed us to assess the impact of AI-based tools across a spectrum of language competency levels (see Table 1).
The experiment consisted of six unique activities. The introductory activity lasted 1 hour and 30 minutes, while the subsequent activities typically lasted less than 45 minutes each. The teachers could choose and adapt their activities, depending on their pedagogical context. Each activity was conducted either by the participating teacher or by the researcher and concluded with a written report from the teacher. This report included their own observations, the students' observations as well as the students' output. This data was then used as a basis to evaluate the effects on the students' motivation, immersion in the task, as well as knowledge and usage of idioms.
Pedagogical aspects
The central objective of the activities designed for this study was to enhance learners' proficiency in Ciceronian Latin, with the broader goal of improving their overall mastery of Latin idioms through communication. Thus, activities were primarily geared towards output. Students had access to a Ciceronianity classifier, which, by providing a real-time ‘Ciceronianity score’, assisted them in ‘Ciceronianizing’ their compositions. To implement the experimental framework, we presented participating teachers with a variety of activities, most of which were tailored to their needs and goals. For instance, activities might include excerpts either previously encountered or written by the author the class was working through, sentence structures that had recently posed problems for the students, or any specific subject of interest, be it cultural or linguistic. For example, a group working through Suetonius' description of Domitian was tasked with reformulating Suetonius' words as if Cicero were accusing the emperor in front of the Senate. Other groups were tasked with composing descriptions of Catiline or Verres by mimicking a sentence structure that had been problematic in a recent test. Some groups also worked on excerpts from later periods (e.g., Copernicus, Marco Polo).
The first activity served as an introduction to essential concepts, setting the stage for subsequent activities. These foundational concepts included:
Cicero's distinctive style, notably in his speeches, covered in two steps: first, identifying characteristics of Cicero's style in isolation; and second, distinguishing it from both non-Ciceronian and non-idiomatic Latin, mirroring the classifier's function. Both steps involved pinpointing elements that either lowered or increased the likelihood of a text being authored by Cicero.
The nature and untranslatability of idioms. In this context, the presence of Latin idioms in a text was used as evidence of a higher Ciceronianity.
Characteristics of non-idiomatic Latin, exemplified by non-Latin idioms found in texts generated by ChatGPT or written by contemporary authors. Here, the presence of English idioms in a Latin text was used as evidence of a low Ciceronianity.
The Ciceronianity classifier, initially used to highlight wording, phrasing or formulas it deemed Ciceronian or not, and subsequently, to provide feedback to students when composing in Latin.
In practice, we introduced these concepts in three steps:
1. First, participants were tasked with reading an excerpt of a speech by Cicero (the Pro Cluentio in our case) and highlighting at least three elements that they deemed different from usual prose. The teacher did not present Ciceronian idioms nor correct the participants' answers, but ensured that the following four characteristics were at least mentioned:
a. The Latin text is a speech.
b. The speaker is presenting facts in a biased manner: e.g. ‘nubit genero socrus nullis auspicibus, nullis auctoribus, funestis ominibus omnium.’ [‘And so mother-in-law marries son-in-law, with none to bless, none to sanction the union, and amid nought but general foreboding.’] (Hodge, 1927).
c. The speaker seeks to turn judges against the other side: e.g. ‘tum vero illa egregia ac praeclara mater palam exsultare laetitia, triumphare gaudio coepit, victrix filiae non libidinis.’ [‘Then does this exemplary, this illustrious mother make open display of her delight, revelling and rejoicing in her triumph not over her lust but over her daughter.’] (Hodge, 1927).
d. The speaker seeks to garner the judges' favour: e.g., by positioning themselves as the virtuous party: ‘nonne timuisse, si minus vim deorum hominumque famam, at illam ipsam noctem facesque illas nuptiales, non limen cubiculi, non cubile filiae, non parietes denique ipsos, superiorum testes nuptiarum?’ [‘To think that she did not quail, if not before the vengeance of Heaven, or the scandal among men, at least before the night itself with its wedding torches, the threshold of the bridal chamber, her daughter's bridal bed, or even the walls themselves which had witnessed that other union.’] (Hodge, 1927).
2. Secondly, participants were provided with five Latin sentences expressing a specific idea, one authored by Cicero and another by ChatGPT. The three remaining sentences contained both Ciceronian and non-idiomatic Latin elements. Participants were then tasked with evaluating the Ciceronianity of each sentence according to the previously highlighted elements and identifying those written by Cicero and by ChatGPT. We ensured that the only sentence featuring an untranslatable Latin idiom – that is, one that cannot be transposed into French – was Cicero's sentence. The participants were explicitly made aware of this.
3. Thirdly, the classifier's evaluations of the above sentences were compared with those of the participants and discussed. This allowed participants to assess the accuracy of the classifier and to pinpoint elements that caused the Ciceronianity score to either increase or decrease.
After gaining a better understanding of Cicero's style, Latin idioms, and the classifier, participants received a writing prompt and were tasked with reformulating it in order to ‘fool’ the classifier – that is, to make it believe that they were Cicero himself. This approach, echoing the methodology of our previous experiments with Latin text generators (Cavaleri, 2021, 2022), positioned the classifier not as an infallible entity, but rather as a fallible learner of the language. The students were explicitly made aware that, like other AIs, this one sometimes makes mistakes.
The subsequent activities were primarily focused on applying the concepts described above and Ciceronianizing both authentic and non-authentic Latin sentences. The primary goal was to achieve the highest possible Ciceronianity score through trial and error, with the aid of the Ciceronianity classifier. Accordingly, the writing exercises were conducted directly from Latin to Latin, minimising the use of the students' primary language as much as possible.
The Latin sentences (or prompts) provided to the students for these activities fell into two categories:
Non-authentic prompts: These included non-idiomatic or less idiomatic Latin sentences, usually created by ChatGPT or by contemporary authors. Activities using non-authentic prompts focused on highlighting the differences between non-idiomatic and idiomatic Latin.
Authentic prompts: These were usually excerpts from Latin authors. Activities using authentic prompts focused on highlighting the differences between idiomatic Latin and Ciceronian Latin.
This variety of prompts enabled students to experience Ciceronianity from different angles (Marton and Tsui, 2004). Regardless of the prompt type, students, in groups of two to four, were tasked with Ciceronianizing it. For every activity, students were provided with a list of synonyms for the words contained in the writing prompt, along with sample sentences and, if necessary, indications of whether Cicero used them or not. Any additional vocabulary was supplied orally, and the use of dictionaries was discouraged to maintain immersion in the task and time-efficiency. In addition, students had access to laptops and to the Ciceronianity classifier hosted on https://latin-ia.hepl.ch.
The classifier
The Ciceronianity classifier, used for obtaining a Ciceronianity score, is a lightweight, browser-based neural network developed and trained for the purpose of this experiment. Its architecture is inspired by basic sentiment classifiers, with the main difference being that it incorporates a transformer-based embedding layer. This embedding layer was adapted from Google's Universal Sentence Encoder (Cer et al., 2018). We trained a WordPiece tokenizer (Wu et al., 2016) on a hand-assembled Latin corpus to encode Latin inputs before feeding them to the embedding layer. The Universal Sentence Encoder was adapted accordingly. Following the embedding layer, we utilised a dense multilayer perceptron (see footnote 1). The model was trained to differentiate authentic Ciceronian inputs from authentic non-Ciceronian inputs. As such, the classifier is currently only capable of distinguishing between Ciceronian and non-Ciceronian Latin – not between Latin and other languages. This limitation is inherent to the current model.
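As a rough illustration of the pipeline described above (subword tokenisation, embedding, a dense network, and a single output score), the following sketch may help. It is not the trained model: the vocabulary, the two-dimensional embeddings, the weights, and the function names `wordpiece_tokenize`, `embed` and `ciceronianity_score` are all hypothetical, and a simple average of subword vectors stands in for the transformer-based embedding layer.

```python
import math
import re

# Hypothetical toy vocabulary; the real tokenizer was trained on a Latin corpus.
VOCAB = {"[UNK]", "iudic", "##es", "verr", "malus", "est", "nostis"}

# Hypothetical 2-dimensional embeddings, one per subword piece.
EMBEDDINGS = {
    "iudic": [1.0, 0.5],
    "##es": [0.2, 0.1],
    "verr": [0.1, 0.3],
    "malus": [-0.6, 0.0],
    "est": [-0.2, -0.1],
    "nostis": [0.8, 0.4],
    "[UNK]": [0.0, 0.0],
}

# Hypothetical weights of a tiny dense network (one hidden layer, one output).
W1 = [[2.0, 1.0], [-1.0, 0.5]]
B1 = [0.0, 0.0]
W2 = [1.5, -0.5]
B2 = 0.0


def wordpiece_tokenize(word, vocab=VOCAB):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no split found: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def embed(text):
    """Average the subword vectors (a stand-in for a sentence encoder)."""
    words = re.findall(r"[a-z]+", text.lower())
    tokens = [p for w in words for p in wordpiece_tokenize(w)]
    if not tokens:
        return [0.0, 0.0]
    sums = [0.0, 0.0]
    for token in tokens:
        for i, value in enumerate(EMBEDDINGS[token]):
            sums[i] += value
    return [s / len(tokens) for s in sums]


def ciceronianity_score(text):
    """Return a score between 0 and 1 from the toy dense network."""
    x = embed(text)
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU
        for row, b in zip(W1, B1)
    ]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


plain = ciceronianity_score("Verres malus est")
styled = ciceronianity_score("Verres nostis, iudices")
print(f"plain: {plain:.1%}  styled: {styled:.1%}")
```

With these made-up weights, the sentence containing an address to the judges scores higher than the plain statement, which mirrors the kind of feedback students received; the actual scores depend entirely on the trained model.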
Results
Observations
The incorporation of artificial intelligence and the real-time feedback it provided significantly impacted students' motivation levels. As summarised by one teacher: ‘[Students] were captivated by the activity, their curiosity was piqued, and their reflections were stimulated by the AI.’ The use of the classifier spurred students to actively explore, question, and debate different formulations, and experiment with various word combinations in pursuit of a higher Ciceronianity score. This fostered a naturally interactive environment, which often evolved into a class-wide collaborative atmosphere. For example, as one teacher noted, ‘a group of students came up with the idea of [reformulating a statement] as a question, which allowed other groups to make progress (i.e. by increasing their score).’ This collaborative environment motivated students to experiment with various word combinations, synonyms, and phrases, and to incorporate Ciceronian elements, such as addresses to judges, senators, and the Roman people, invocations to the gods, exclamations, and other rhetorical devices. For instance, instead of writing a simple statement like ‘Verres malus est […]’ [‘Verres is evil…’], students were encouraged to adopt a more Ciceronian approach, such as ‘Verres mores pessimos, impuros, improbos nostis, iudices, […]’ [‘You know of Verres' most evil, impure, and dishonest character, judges, …’]. By emulating Cicero, students aimed not just to convey information but to persuade and potentially manipulate their hypothetical audience, focusing on presenting facts in a biased or incriminating manner (see Table 2).
Table 3 presents an example that illustrates the trial-and-error process students used to increasingly mimic Cicero's style, experimenting with Latin constructions without fear of making mistakes.
Feedback
Overall, the students' feedback on the activities was predominantly positive. The engaging and enjoyable nature of the activities was especially appreciated. The majority of students acknowledged the value of focusing specifically on idioms for language acquisition, although a few of them questioned the benefits of practising output in Latin.
The teachers' feedback was also positive. The fun aspect of the activities, stemming from the scoring system and real-time feedback, was particularly noted for its positive impact on student motivation. In addition, the teachers valued the emphasis on stylistic and idiomatic aspects of Latin, recognising its importance in enhancing students' comprehension skills. For example, one participating teacher observed the following regarding the usage of infinitive clauses:
[The] task of composing an infinitive clause in Latin held a more prominent utility in the students’ minds than when they simply had to translate sentences from French into Latin that required an infinitive clause. Through this exercise, the utility of an infinitive clause was better understood.
However, the activities were sometimes found to be too difficult, especially for groups with fewer than three years of Latin experience. Nevertheless, in these instances, teachers acknowledged that students were still capable of achieving high Ciceronianity scores (exceeding 70%). Moreover, occasional concerns arose regarding the classifier's reliability, as it is designed solely to distinguish between Ciceronian and non-Ciceronian Latin, and not to detect mistakes in Latin.
Discussion
As noted above, the introductory activity's main objective was to familiarise students with the theoretical concepts necessary for subsequent output activities, as well as to provide an initial exposure to Ciceronian composition. During this phase, students primarily focused on identifying elements from Cicero's style to emulate. Following the introduction, participants engaged in a series of practical activities designed to help them incorporate these stylistic elements into their own compositions. A key component of our experimental design was providing each participant with access to an online Ciceronianity classifier. This allowed for real-time, personalised feedback on their output. During the output-centred activities, students attempted to re-use Ciceronian phrases and experimented with various synonyms, word orders, and phrasing to determine, through trial and error, which elements would boost their Ciceronianity score. This real-time scoring system was designed to introduce a game-like element to the exercises, thereby making the process of producing Latin more engaging and accessible. It also aimed to reduce mental barriers to writing in Latin – a new experience for the students – and to promote Latin-to-Latin work, without relying on the language of schooling.
Through a process of experimentation, trial and error, and discussion, students were prompted to read, research, and internalise Ciceronian idioms, ultimately compiling a ‘toolbox’ of Ciceronian expressions. For example, they learned that choosing a particular wording might add +0.1% to their score, a specific phrase could add +2%, and so forth. This toolbox concept was especially pertinent in the second part of the experiment, aligning with class readings and addressing translation challenges, primarily targeting comprehension difficulties. However, it also found other applications beyond the experiment's context, as one teacher noted: ‘I refer to [the ‘Ciceronianity score’] occasionally to highlight an idiomatic expression or a Ciceronian stylistic effect in our readings.’ This activity, differing from the usual types of activities encountered in the Latin class, thus served as a memory anchor for the concepts and notions addressed when using the AI.
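The bookkeeping behind such a ‘toolbox’ can be sketched in a few lines. The attempts and scores below are invented placeholders, and `build_toolbox` is a hypothetical helper rather than part of the tool used in the experiment; the sketch only shows how score differences between successive attempts could be recorded.

```python
def build_toolbox(attempts):
    """Collect the changes that raised the score.

    `attempts` is a chronological list of (change_description, score_percent)
    pairs whose first entry is the unmodified prompt. The result lists each
    helpful change with its gain in percentage points, largest gain first.
    """
    toolbox = []
    for (_, previous), (change, current) in zip(attempts, attempts[1:]):
        gain = round(current - previous, 2)
        if gain > 0:
            toolbox.append((change, gain))
    return sorted(toolbox, key=lambda entry: entry[1], reverse=True)


# Invented example data: each entry is one submission to the classifier.
attempts = [
    ("unmodified prompt", 41.0),
    ("added an address to the judges", 43.0),
    ("swapped one synonym", 43.1),
    ("changed the word order", 42.5),
]

for change, gain in build_toolbox(attempts):
    print(f"+{gain} points: {change}")
```

Changes that lowered the score (here, the word-order change) are simply left out, which matches the idea of keeping only expressions worth reusing.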
Overall, the activities heightened students' awareness of Cicero's stylistic features and enhanced their mastery of Ciceronian, and by extension, Latin idioms. The format of the activities, coupled with the use of a classifier providing real-time scores, facilitated students' engagement in the activities and, importantly, promoted Latin-to-Latin output exercises.
In terms of difficulty, the activities were ideally suited for students with prior experience in Ciceronian texts, or those who had recently studied a text by Cicero, enabling a more in-depth analysis in the reformulation exercises. Nevertheless, even students without this background managed to achieve high scores (above 70%). It is noteworthy that the length of the prompts, rather than their origin, tended to increase the exercise's difficulty level. Hence, a composition exercise was not inherently more challenging than reformulating a text by a classical author or one generated by ChatGPT, given similar prompt lengths.
In addition to aiding students in acquiring Latin, these activities enabled interaction with a simple AI, showcasing issues commonly found in available AIs, yet more conspicuously. For instance, students noted that the classifier could make mistakes and was unable to provide explanations for its output, a concern of significant relevance in various contexts where AI is being used (e.g., judiciary evaluations, oversight and control, benefit allocation). This situation can serve as a foundation to explore the concept of performativity with the students – that the AI, through its widespread usage, essentially comes to define what it is supposed to measure. In our case, this amounts to asking whether the Ciceronianity of a text exists independently of this AI's framework and whether the scores issued by the AI should be accepted without scrutiny. It also opens up critical discussions on who holds the authority to qualify a sentence as idiomatic in the absence of native speakers. Such discussions illustrate how the study of ancient languages can be used to think about complex contemporary issues.
Conclusion
In this paper, we presented an AI classifier that we developed and trained for pedagogical purposes to distinguish between Latin texts authored by Cicero and texts by other Latin authors. Given a Latin sentence (or prompt), this classifier outputs a ‘Ciceronianity score’ that can be interpreted as a confidence level that the text was authored by Cicero. Along with the classifier, we developed a pedagogical scenario that uses this tool to familiarise Latin students with the concepts of idioms, particularly Ciceronian idioms. After being exposed to the specifics of Cicero's style, the students were given Latin sentences and tasked with increasing their Ciceronianity as determined by the classifier. This activity proved very motivating, and both teachers and students frequently became engrossed in the challenge of trying to outsmart the Ciceronianity classifier and achieve higher scores. The concept of scoring and the immediate feedback gamified the process of writing in Latin. This significantly improved the learning environment by promoting output, interactivity, and exploration. It also facilitated the practical usage of vocabulary, grammar, and idioms, ultimately boosting students' confidence in using Latin. Classroom activities were consistently marked by deep learner immersion and high levels of interactivity. As students progressed through the exercises, they developed their own collection of Ciceronian idioms to incorporate into their compositions. This ‘toolbox’ enabled them to familiarise themselves with Ciceronian idioms and increase their confidence when writing in Latin.
In conclusion, we believe that employing a Ciceronianity classifier for real-time feedback during output exercises offers significant benefits. These activities not only allow learners to engage more deeply with language elements but also create a conducive learning environment by reducing the perceived difficulty of output tasks and encouraging Latin-to-Latin work. Consequently, by simplifying the process of output production – that is, facilitating communication – this approach has the potential to help align Latin pedagogy more closely with contemporary language teaching methodologies. In addition, this activity can be used to address important questions about artificial intelligence, its use, reliability and performativity. Looking ahead, we plan to expand the classifier's corpus to include languages other than Latin. This enhancement would address the current decline in accuracy when dealing with non-Latin texts, as the classifier has so far been trained only on authentic Latin. Further research is also necessary to explore whether the model can be utilised or adapted for other applications, such as spell checking, grammar checking, syntax checking on a per-sentence-token basis, and author attribution or identification within the domain of Latin.
Acknowledgments
We extend our sincere gratitude to teachers Amélie Noël, Cendrine Chavan and Samuel Junod for their invaluable participation and contribution to this project. Their expertise and insights were pivotal to the success of our research.