1. Introduction
The proliferation of artificial intelligence (AI) in natural language processing has led to the development of intelligent chatbots capable of comprehending and generating human language (Caldarini, Jaf & McGarry, 2022). The popularity of devices such as laptops, wearable technology, and smart home gadgets has expanded the use of readily accessible technologies such as chatbots, embodied agents, and conversational agents like Alexa and Siri (Kukulska-Hulme & Lee, 2020). In their earlier versions, chatbots communicated with users in a textual mode through keyword matching (Jia, 2009). In the 1980s, however, chatbots with speech recognition and text-to-speech capabilities were developed (Godwin-Jones, 2023). The combination of automatic speech recognition and text-to-speech technologies in AI-powered chatbots has resulted in systems capable of conversing naturally and interactively with learners (Belda-Medina & Kokošková, 2023). Although both text- and voice-based modalities have been employed for learning purposes, each has been used in different settings. Voice-based chatbots have mainly been used in language classes to provide aural input and opportunities for oral interaction. Learners are increasingly adopting voice-based chatbots because voice-based interaction is closer to face-to-face interaction than text-based interaction and offers certain benefits over it; namely, it exposes learners to paralinguistic elements such as stress, intonation, and other suprasegmental features (Rassaei, 2023).
Thus, the adoption of voice-based AI chatbots in English language learning has surged in recent years (Hwang, Guo, Hoang, Chang & Wu, 2022; K.-A. Lee & Lim, 2023; S. Lee & Jeon, 2022; Park, 2022; Timpe-Laughlin, Sydorenko & Daurio, 2022; Wu, Lam, Kong & Wong, 2023), and their use is becoming more widespread in line with the extensive use of portable computer-assisted language learning (CALL) and mobile-assisted language learning (MALL) devices (Kukulska-Hulme & Lee, 2020). However, more recent and comprehensive evidence is still needed on implementing voice-based AI chatbots in language classes, as new voice-based, AI-supported chatbot tools are being introduced almost weekly with little or no prior research or piloting in educational contexts. Educators in particular need continuous input, support, and guidance to keep up with this dynamic field and with the use of chatbots, and systematic, up-to-date research can provide this. Specifically, demonstrating how voice-based AI chatbots are used in various countries and at different institutional levels can provide English language teachers and future researchers with guidelines and insights into the practical characteristics of AI chatbots for particular groups of learners. English language teachers also need more guidance on task, material, and activity design for integrating voice-based AI chatbots into their classes.
In addition, even though voice-based AI chatbots are gaining popularity with a general audience, they still need improvement in context- and culture-specific discourse formation; therefore, more research in this area can inform the development of pedagogically sound, data-driven voice-based chatbots designed primarily for language classes. This study aims to present a clear picture of the theoretical grounds of voice-based AI chatbot research conducted in many countries worldwide and to illuminate learning and affective outcomes in relation to the specific characteristics of AI chatbots employed in various educational institutions.
Thus, a meta-synthesis of the studies mentioned above that center on voice-based AI chatbots might provide insights into what these technologies offer for English language learning at both micro and macro levels, by analyzing each study in detail and by providing a holistic picture that synthesizes the studies carried out in recent years. To date, several reviews of AI chatbots have been conducted (Huang, Hew & Fryer, 2022; Ji, Han & Ko, 2023; Zhai & Wibowo, 2022), but these reviews have different focal points. Huang et al. (2022) analyzed 25 studies from 2008 to 2020 based on chatbots’ technological, pedagogical, and social affordances for language learning in first-language and foreign-language learning contexts. Zhai and Wibowo (2022) explored the empathy, human, and cultural dimensions of using chatbots in language learning. Moreover, Ji et al.’s (2023) review of 24 studies focused on the role of collaboration with teachers in AI chatbot–supported language classes. In addition, two systematic reviews focusing specifically on voice-based AI chatbots in language learning have recently been published. Jeon, Lee and Choe (2023) analyzed studies in terms of goal orientation, embodiment, and multimodality. Jeon, Lee and Choi (2023) explored general research trends and the role of voice-based chatbots (i.e. as feedback providers) as language learning resources. The present meta-synthesis aims to extend this line of investigation by exploring other dimensions and synthesizing the findings of the reviewed studies on voice-based chatbots in terms of linguistic and affective factors, theoretical frameworks, technological implementations, and pedagogical implications.
In line with the aims of this study, the following research questions were generated.
1. What theoretical backgrounds are employed in AI chatbot–integrated studies in the English language learning domain published between 2010 and 2024?
2. How can the voice-based AI chatbot research published between 2010 and 2024 be characterized in terms of methodology?
3. What technologies and/or chatbots are used in the research published between 2010 and 2024?
4. What strengths and challenges were reported in the studies published between 2010 and 2024?
5. What pedagogical implementations were prevalent in the studies published between 2010 and 2024?
2. Methodology
The meta-synthesis presented in this paper employed the PRISMA framework (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), a protocol developed by Moher, Liberati, Tetzlaff, Altman and the PRISMA Group (2009) to provide authors with explicit criteria for transparent and effective reporting. The methodology of this systematic meta-synthesis involved three main steps: searching for articles, selecting articles, and qualitatively coding data from the selected articles.
2.1 Search strategy
Following the guidelines of the PRISMA framework for a qualitative systematic review, a rigorous search strategy was adopted. The key terms “chatbot,” “conversational agent,” “conversational system,” “dialogue-based,” “conversational dialog agent,” “human-computer dialog,” “language learning,” “English learning,” “foreign language,” “second language,” “L2,” “EFL (English as a Foreign Language),” “ESL (English as a Second Language),” “language acquisition,” “CALL,” and “MALL” were combined using the Boolean operators “AND” and “OR.” In some studies, researchers created web-based AI chatbots (Ayedoun, Hayashi & Seta, 2020; El Shazly, 2021), whereas others developed app-integrated AI chatbots (Belda-Medina & Calvo-Ferrer, 2022). Since researchers have used a variety of concepts to describe the use of AI chatbots depending on their digital format, we included both “CALL” and “MALL” as keywords in our database search. All keywords were searched in journal names, abstracts, and full texts in databases including Web of Science, ERIC, Scopus, Elsevier, EBSCOhost, and Springer to retrieve articles, book chapters, and conference proceedings matching our selection criteria. In addition to the articles retrieved from these databases, articles found manually in the reference sections of the retrieved articles were also included. During the writing and revision of this paper, some of the reviewed studies (Chen, Yang & Lai, 2023; M.-H. Hsu, Chen & Yu, 2023; Jeon, 2024), which had initially been published online with a DOI and an early-access date, were later assigned to journal issues; we updated their volume and issue numbers and publication dates accordingly.
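The keyword list above can be turned into a database query string. The sketch below is a minimal illustration, assuming the terms were split into two OR-groups (chatbot terms and language-learning terms) joined by AND; the paper does not report its exact query strings, and field tags and syntax differ across databases such as Scopus and Web of Science. The helper names `or_group` and `build_query` are hypothetical.

```python
# Assumed grouping: the paper lists its keywords but not the exact query
# string, so this two-group split is an illustrative assumption.
CHATBOT_TERMS = [
    "chatbot",
    "conversational agent",
    "conversational system",
    "dialogue-based",
    "conversational dialog agent",
    "human-computer dialog",
]
LEARNING_TERMS = [
    "language learning",
    "English learning",
    "foreign language",
    "second language",
    "L2",
    "EFL",
    "ESL",
    "language acquisition",
    "CALL",
    "MALL",
]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """Join OR-groups with AND, so a record must match one term from every group."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(CHATBOT_TERMS, LEARNING_TERMS)
print(query)
```

Quoting each term keeps multi-word phrases such as “conversational agent” together as exact phrases, and the AND between groups means a retrieved record must mention at least one chatbot term and at least one language-learning term.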
The studies selected for review are marked with an asterisk in the reference list. The full list of selected studies is given in the supplementary materials.
2.2 Selection criteria for the studies
Conceptual and theoretical papers, articles written in languages other than English, studies outside the English language teaching field, and studies that did not report the experiences of chatbot users were excluded. The search focused mainly on work concerning voice-based AI chatbots and English language learning. Under AI chatbots, we also included studies using intelligent personal assistants (IPAs) such as Google Assistant and Alexa, because IPAs are also classified as closed-domain, retrieval-based AI chatbots (H. Kim, Yang, Shin & Lee, 2022). Although this meta-synthesis is based on chatbot users’ English language learning experiences, feasibility studies using state-of-the-art technologies to create this much-needed technology were also included, as one aim of the study was to explore these tools’ technical qualities.
This study’s focus on the time frame from 2010 to 2024 derives from our intention to concentrate on studies using voice recognition chatbots with the latest AI, automatic speech recognition (ASR), and natural language processing technologies. The literature review showed that although the development of voice-based AI chatbots dates back to the 1980s, research on voice-based chatbots in students’ English language learning did not surge until after 2016; we could identify only one study per year between 2010 and 2012. Table 1 shows the number of papers reviewed and analyzed by journal name and year of publication (see supplementary materials).
2.3 Data extraction and analysis of the studies selected
Using PRISMA guidelines, an article search was carried out, and 57 articles were selected based on our selection criteria. Figure 1 outlines the selection of studies following PRISMA guidelines (Moher et al., 2009). Constructivist grounded theory (CGT; Thornberg & Charmaz, 2014) was used to analyze the findings and research trends in the studies. The main results of the AI chatbot studies were first extracted as raw data. In the initial coding phase, the data were read line by line and coded using the constant comparison method, comparing data with data, data with codes, and codes with codes to identify any emerging discrepancies or congruities. Subsequently, in the focused coding stage, the major findings of the reviewed studies were identified and synthesized by interpreting the emergent codes.
Treating each article as a unit of analysis, we analyzed all the articles comprehensively and iteratively. First, we read reviews of chatbots in general (e.g. Pérez, Daradoumis & Puig, 2020) to develop a deductive coding scheme. Based on these previous reviews, we constructed an initial coding framework covering methodology, settings, and participants. Through data-driven inductive coding, four further categories were constructed: theoretical frameworks, strengths and challenges, technologies, and pedagogical implications. These dimensions were deemed particularly important to this study, as examining them can provide insights into the technical-pedagogical aspects of the voice-based AI chatbots used in past studies. The findings can help clarify the implications of using such chatbots with particular student profiles in various educational settings.
This study followed CGT in the data analysis because it supports an interpretive approach with a cyclical analysis process based on constant comparison of data. This process enabled us to co-construct meaning during the analysis phase rather than test an a priori hypothesis. The selected studies were systematically synthesized through a meta-synthesis. We adopted a meta-synthesis approach because we aimed to reach a novel, unifying understanding of the findings that was more substantial and profound than any single study could offer, and to gain a broader understanding of the use of voice-based AI chatbots in English language learning (Finfgeld, 2003; Saini & Shlonsky, 2012). These unified findings are summarized in an analysis table (Table 2; see supplementary materials). After two researchers reached consensus on the data extraction form, one researcher extracted the relevant information from the reviewed studies. The validity of this process was ensured by having another researcher also extract data from the reviewed studies.
3. Findings
3.1 Theoretical backgrounds in the reviewed studies
The first category is theoretical backgrounds, which comprise the theories employed in chatbot-assisted language learning research (see Table 3 in supplementary materials). The theoretical frameworks underpinning AI chatbot research targeting English language learners at K–12 and higher education levels fall into linguistic, cognitive, social, and technological-pedagogical theories. The linguistic category covers language learning theories (e.g. the input hypothesis). The cognitive category comprises cognitive learning theories such as the interest development model. The social category refers to social learning theories such as constructivist theory. Finally, theories explaining general technological frameworks applied in the studies, such as community of inquiry (CoI), were categorized as technological-pedagogical.
As displayed in Figure 2, linguistic frameworks were the most frequently adopted theoretical backgrounds (n = 21), followed by technological-pedagogical (n = 11), cognitive (n = 4), and social (n = 2) theories. More specifically, the interaction hypothesis (n = 8), willingness to communicate (WTC) (n = 4), autonomous second language learning (n = 3), foreign language anxiety (FLA) (n = 3), and skill acquisition theory (n = 3) were the theories most frequently used to ground the studies.
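The category frequencies can be cross-checked against the number of studies that reported a theoretical grounding. The sketch below assumes that each grounded study was coded under exactly one category, which the paper does not state explicitly; under that assumption, with 19 of the 57 studies lacking a framework, the four category counts should sum to 38, and each category can be expressed as a share of the grounded studies.

```python
from collections import Counter

# Category frequencies as reported for theoretical backgrounds.
category_counts = Counter({
    "linguistic": 21,
    "technological-pedagogical": 11,
    "cognitive": 4,
    "social": 2,
})

TOTAL_STUDIES = 57  # studies in the meta-synthesis
NO_FRAMEWORK = 19   # studies reporting no theoretical grounding

# Assumption: each grounded study is coded under exactly one category.
grounded = TOTAL_STUDIES - NO_FRAMEWORK
assert sum(category_counts.values()) == grounded

# Rank categories by frequency and show each as a share of grounded studies.
for category, n in category_counts.most_common():
    print(f"{category:<26} n = {n:>2}  ({n / grounded:.0%})")
```

If a study could be coded under several categories, the sum would instead exceed 38, so the check is only meaningful under the single-category assumption.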
Based on Long’s (1996) interaction hypothesis, studies sought to discover whether chatbots would increase interaction in the L2 (second language) by offering favorable conditions for the negotiation of meaning, interaction, input, and output (Dizon, 2017, 2020). The reviewed studies also drew on the claim that technology use might help increase learners’ WTC, defined as “a readiness to enter into discourse at a particular time with a specific person or persons, using an L2” (MacIntyre, Clément, Dörnyei & Noels, 1998: 547), and learners were found to demonstrate higher expected WTC after talking to a chatbot (Ayedoun, Hayashi & Seta, 2019a; Ayedoun et al., 2020; Tai & Chen, 2023). Other studies based their arguments on Gardner and Miller’s (1999) self-access language learning framework, claiming that learners’ autonomy and independence can be promoted through the use of chatbots as supplementary language learning materials (Dizon & Tang, 2020; Moussalli & Cardoso, 2020).
Another salient theory used by researchers was the interest development model (Hidi & Renninger, 2006). Following this theory, researchers investigated whether AI chatbots could trigger students’ interest in language learning (Fryer, Ainley, Thompson, Gibson & Sherlock, 2017; Fryer, Nakao & Thompson, 2019; Fryer, Thompson, Nakao, Howarth & Gallacher, 2020). Drawing on self-determination theory, researchers also explored whether chatbots would foster language learners’ feelings of autonomy, competence, and relatedness (Annamalai et al., 2023; Jeon, 2022; Morton & Jack, 2010; Ryan & Deci, 2017). Moreover, based on the theory of FLA, researchers examined the impact of using chatbots on learning anxiety (Dizon, 2017; Horwitz, Horwitz & Cope, 1986; Tai & Chen, 2023) and on the meditation and relaxation levels of language learners (L. Hsu, 2022).
The investigation of chatbot usage has also been carried out under the principles of skill acquisition theory (Lyster & Sato, 2013), suggesting that chatbots could be constructive tools for language learners thanks to their potential to provide structured oral practice (Sydorenko, Smits, Evanini & Ramanarayanan, 2019). Studies also explored the potential of AI-powered chatbots to foster oral interaction by raising awareness of specific L2 pragmatic structures, based on Schmidt’s (1995) noticing hypothesis (Timpe-Laughlin & Dombi, 2020). Ericsson, Sofkova Hashemi and Lundin (2023) grounded their study in sociocultural theory, arguing that interaction with AI chatbots might provide instances of social interaction through human–computer dialogue (Lantolf, Thorne & Poehner, 2014). Based on constructivist theory in technological learning settings (Jonassen, Davidson, Collins, Campbell & Haag, 1995), Park (2022) identified the role of a chatbot as a learning tool promoting student-centered and scaffolded learning for geographically remote learners.
Besides the linguistic and social theories, learners’ perspectives on using chatbots for language learning, concerning factors such as user engagement and linguistic and technological components, were explored through the technology acceptance model (TAM; Davis, 1989) and the chatbot–human interaction satisfaction model (CHISM) (Belda-Medina & Calvo-Ferrer, 2022). A related theory, the theory of planned behavior by Fishbein and Ajzen (1975), was adopted by Wu et al. (2023) to analyze the relationship between learners’ behavioral intention to use chatbot applications and their beliefs, such as the usefulness of voice-based AI chatbots for learning.
The computers as social actors (CASA) theory by Nass, Moon, Fogg, Reeves and Dryer (1995) has also been adopted to explore how AI chatbots affect language learners’ learning motivation. Researchers aimed to observe whether AI chatbots described as having human-like properties would create a feeling of social presence among learners (Ebadi & Amini, 2024). Epley, Waytz and Cacioppo’s (2007) three-factor theory of anthropomorphism, consisting of the factors “elicited agent knowledge, effectance, and sociality,” which refer to knowledge about agents, motivation to interact with them, and intention to communicate with them, respectively, was also used to analyze whether AI chatbots were described with human-like characteristics (S. Lee & Jeon, 2022: 3). Adopting Gibson’s (1986) affordance theory, researchers also analyzed language learners’ experiences of using voice-based AI chatbots in terms of their technological, pedagogical, and social contributions to language learning (Jeon, 2024). Furthermore, the CoI framework by Garrison and Arbaugh (2007) has been used to observe the effect of social, cognitive, and teaching presence on the social dynamics, learning outcomes, and learning performance of language learners after interacting with voice-based AI chatbots (Wang et al., 2023; Wang, Pang, Wallace, Wang & Chen, 2022). However, 19 studies were not grounded in any theoretical framework, indicating that many studies still lack a clear-cut theoretical basis.
3.2 Methodological trends in the reviewed studies
The second category is methodology, consisting of four variables: design features, countries, education level, and sample size. The 57 studies involved a total of 4,062 participants. The 23 studies using an experimental methodology collected data from 1,408 participants. Two studies used sequential explanatory mixed methods with 432 participants. Ten feasibility studies with 246 participants were also conducted, and the five case studies comprised 57 participants. The remaining studies did not explicitly specify their methodologies; they involved 1,674 participants. Undergraduate and graduate students formed the largest group of participants (49%, n = 1,984), while high school students constituted 23% (n = 947) and primary school students 23% (n = 951). The remaining studies did not specify the education level of their participants.
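The reported education-level percentages follow directly from the raw counts. The sketch below recomputes them as whole-number shares of the 4,062 participants and, assuming the three groups do not overlap, derives by subtraction the number of participants whose education level was not specified. The helper name `share` is illustrative.

```python
TOTAL_PARTICIPANTS = 4062

# Participants by education level, as reported in the methodological trends.
by_level = {
    "undergraduate and graduate": 1984,
    "high school": 947,
    "primary school": 951,
}


def share(n, total=TOTAL_PARTICIPANTS):
    """Return n as a whole-number percentage of total."""
    return round(100 * n / total)


for level, n in by_level.items():
    print(f"{level}: {n} ({share(n)}%)")

# Assumes the three groups are disjoint, so the remainder is "not specified".
unspecified = TOTAL_PARTICIPANTS - sum(by_level.values())
print(f"level not specified: {unspecified} ({share(unspecified)}%)")
```

Rounding to whole percentages reproduces the 49%, 23%, and 23% figures; the remainder of 180 participants (about 4%) corresponds to studies that did not report education level.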
The studies included in this meta-synthesis were conducted in various countries. Japan hosted the largest number of studies (13 of 57). In terms of participants, Korea contributed the most (n = 986), followed by Japan (n = 604), China (n = 528), and Taiwan (n = 447). Figure 3 displays the distribution of participants in the reviewed studies by country.
Moreover, the reviewed articles were spread across the publication years, with the number of publications rising from 2016 onward. Figure 4 shows the trend in voice-based AI chatbot research in English language teaching over the years.
3.3 The types of technologies and/or chatbots used in the studies analyzed
The next category of analysis is the types of voice-based AI technologies, which comprised three codes: general-audience chatbots (e.g. Alexa, Siri, and Cortana), specific-purpose chatbots created for language learners, and AI chatbots embedded in virtual learning environments. Twenty-seven types of chatbots were identified (see Figure 5).
The results showed that these chatbots had been adopted to support English language learning. The most common practice among the synthesized studies was to employ existing open-access or commercial applications created for native speakers or L2 learners; of these, Alexa and Google Assistant were used the most. Two of the reviewed studies also employed Cleverbot, an open-access chatbot (Fryer et al., 2017, 2019), which is an updated version of Jabberwacky, a Loebner Prize winner for producing the output most similar to human conversation.
Another trend in the synthesized studies was the development of custom AI chatbots by the researchers, who then tested their effectiveness with language learners. However, these chatbots were created for academic purposes and were not publicly available. For instance, Ayedoun et al. (2019a, 2019b, 2020) designed DiMaCA, a dialogue management model consisting of dialogue flow management and strategies management modules, to create a chatbot named Peter that used affective backchannels and communication strategies. Sydorenko et al. (2019) also created a voice-based HALEF application targeting students of English as a second language.
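To make the two-module design more concrete, the following is a hypothetical sketch of how a dialogue flow module and a strategy module might divide the work in a conversational practice chatbot. It is not DiMaCA's published implementation: the class and function names, the scripted (question, rephrasing) pairs, the five-second silence threshold, and the response phrases are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical sketch: the two modules mirror the paper's description of a
# flow module plus a strategy module, but every rule, threshold, and phrase
# below is an illustrative assumption, not DiMaCA's actual design.


@dataclass
class DialogueFlowManager:
    """Walks through scripted (question, simpler rephrasing) pairs in order."""
    script: List[Tuple[str, str]]
    index: int = 0

    def current(self) -> Optional[Tuple[str, str]]:
        return self.script[self.index] if self.index < len(self.script) else None

    def advance(self) -> Optional[Tuple[str, str]]:
        self.index += 1
        return self.current()


@dataclass
class StrategyManager:
    """Picks a supportive move from a simple learner-state signal."""
    silence_limit: float = 5.0  # seconds of silence before helping (assumed value)

    def choose(self, utterance: str, silence_seconds: float) -> str:
        if utterance:
            return "backchannel"  # affective backchannel, then move on
        if silence_seconds >= self.silence_limit:
            return "rephrase"     # communication strategy: simplify the question
        return "wait"             # give the learner more time


def respond(flow: DialogueFlowManager, strategies: StrategyManager,
            utterance: str, silence_seconds: float) -> Optional[str]:
    """Combine both modules into the chatbot's next turn (None = keep listening)."""
    move = strategies.choose(utterance, silence_seconds)
    if move == "backchannel":
        step = flow.advance()
        return f"I see! {step[0]}" if step else "I see! Thanks for talking with me."
    if move == "rephrase" and flow.current():
        return f"Let me ask differently: {flow.current()[1]}"
    return None
```

For example, with a two-question script, a learner who stays silent past the threshold hears the simpler rephrasing of the current question, while a reply such as “I am from Japan.” triggers a backchannel followed by the next scripted question. Separating the two modules lets the conversation script change without touching the supportive strategies, and vice versa.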
The results also showed that chatbots aimed at lower-proficiency learners were rare in the reviewed studies. In one study, TextEvaluator was created for third- and fifth-grade English language learners in the USA (Forsyth et al., 2019); another chatbot, SPELL, was created for junior high school students in China (Morton & Jack, 2010); and a third was designed for elementary school learners in China (Wang et al., 2023). Additionally, in a few studies, chatbots were embedded in virtual learning environments (Hassani, Nahvi & Ahmadi, 2016; Morton, Gunson & Jack, 2012; Park, 2022), yet the integration of virtual reality technology into AI chatbots was relatively uncommon among the reviewed studies.
3.4 Strengths and challenges in the use of chatbots in the reviewed studies
Strengths and challenges were determined by analyzing the results and discussion sections of the reviewed articles; they are illustrated in Table 4 (see supplementary materials). The most commonly reported strengths were developing language skills (n = 27), enhancing affective factors (n = 27), facilitating communicative competence (n = 14), and increasing pragmatic awareness and competence (n = 10). Our synthesis indicates that interaction with voice-based AI chatbots can substantially support the development of learners’ linguistic skills during English language learning. Significant gains in speaking scores (Dizon, 2020; El Shazly, 2021; D.-E. Han, 2020; M.-H. Hsu et al., 2023; Hwang et al., 2022; Moussalli & Cardoso, 2021; Park, 2022; C. T.-Y. Yang, Lai & Chen, 2022) and listening scores (N.-Y. Kim, 2018; Tai & Chen, 2022) were reported after learners negotiated meaning with chatbots.
The analyzed studies also demonstrated positive effects on affective factors when English language learners used chatbots: higher attention and meditation levels, reduced anxiety, and higher WTC and motivation after interacting with voice-based AI chatbots (K.-A. Lee & Lim, 2023; Tai & Chen, 2023; C. T.-Y. Yang et al., 2022). For instance, L. Hsu (2022) conducted a neuroscientific experiment with a chatbot developed for the study, measuring EFL learners’ brainwave activity to analyze their attention (the level of mental “focus”) and meditation (calmness/relaxation) while they practiced speaking in three conditions (face-to-face, virtual human-to-human, and human-to-AI chatbot) to explore the interlocutor effect. Meditation levels were highest in the human-to-chatbot condition, suggesting that chatbots may help reduce English language learning anxiety.
The synthesized studies also reported challenges in using chatbot systems. The most commonly reported problems were the low intelligibility of learner utterances due to speech recognition problems (n = 15), the unnaturalness of AI chatbot–human interaction (n = 9), and a lack of explicit corrective feedback (n = 5). These challenges stemmed from various issues. According to the analyzed studies, learners appeared less motivated to converse with chatbots than with human partners owing to novelty effects and unnatural speaking conditions (Ericsson, Lundin & Sofkova Hashemi, 2023; Fryer et al., 2017). For instance, Belda-Medina and Calvo-Ferrer (2022) found only moderate interest in using AI chatbots over a longer period. Furthermore, Çakmak (2022) and El Shazly (2021) identified increased anxiety after interaction with AI chatbots, which they attributed to factors such as an educational context that could trigger worries about failing the course, students’ self-consciousness or language ego when exposed to error correction, and chatbots’ lack of empathy and complex reasoning.
3.5 Pedagogical implementations in the studies analyzed
The last category pertained to pedagogical implementations, which concern guiding and helping learners to use AI chatbots as conversational partners in language learning, through activities such as training and scaffolding and through designing conversational scenarios suited to this purpose. Chatbots have substantial potential for teaching languages, yet more guidance on classroom implementation is needed. Therefore, how researchers used these tools in classroom settings is relevant to this study, as learners need guidance to use chatbots effectively for language learning purposes. Our analysis of pedagogical practices in the reviewed studies showed that two strategies were used to guide AI chatbot users: (1) providing structured or semi-structured prompts and (2) providing linguistic strategies.
First, regarding prompts, the reviewed studies implemented several strategies to direct learners in using AI chatbots for English language learning. Teachers provided structured and semi-structured prompts to guide learners during conversations. Students were given topics, questions, and hints in question-and-reply stems (Ayedoun et al., 2019a, 2020; Fryer et al., 2017, 2019, 2020; L. Hsu, 2022; Johnson, 2019; Tai & Chen, 2023). Furthermore, while interacting with chatbots such as Google Home Hub and Alexa, learners were given pre-established questions and language commands such as “Play me a song/joke/story” and were instructed to ask general-knowledge questions, for example about the weather, the temperature, and the capitals of countries. In this way, students could learn to produce appropriate commands during conversations (Chen et al., 2023; Dizon, 2017; Moussalli & Cardoso, 2020).
The synthesized studies also reported that students were given linguistic strategies in the pre-task phase to familiarize them with the chatbots before use. To sustain a dialogue, learners were encouraged to reflect on the misunderstood aspects of their speech, to modify their output by speaking more precisely and slowly, and to alter their pronunciation of words (Dizon, 2020). They were also encouraged to ask for the definition, spelling, synonym, or translation of words so that they could notice gaps in their linguistic knowledge when communication broke down (Chen et al., 2023; Dizon, 2017; Moussalli & Cardoso, 2020).
Another theme regarding the pedagogy of AI chatbots in English language learning was the design methodology of pedagogical activities in the reviewed studies. Our review showed that researchers drew on the principles of Ellis’s (2003) task-based language teaching, designing speaking scenarios as tasks for language practice with AI chatbots (Ayedoun et al., 2019a, 2020; Li, Chang & Wu, 2020; Sydorenko, Daurio & Thorne, 2018; Sydorenko et al., 2019). To illustrate, in studies using HALEF, users practiced speaking English in a coffee shop simulation in which the chatbot and the user interacted to complete an order (Sydorenko et al., 2018, 2019; Timpe-Laughlin & Dombi, 2020). Similarly, students talked to an animated character in task-based simulation modules in Enskill English, with tasks such as asking for directions and buying train tickets (Ericsson, Lundin & Sofkova Hashemi, 2023; Ericsson, Sofkova Hashemi & Lundin, 2023; Johnson, 2019). Thus, through partially scripted task-based or non-task-based dialogue activities (Ayedoun et al., 2019a, 2020; Johnson, 2019; Sydorenko et al., 2019), students engaged in two-way communication with voice-based AI chatbots.
4. Discussion and implications
This study analyzed research on integrating voice-based AI chatbots into English language learning. It aimed to identify past research trends and suggest directions for future research on voice-based AI chatbots in English language learning.
4.1 An overview of the theoretical foundations of AI chatbot–integrated research in EFL/ESL contexts
The first research question concerned the theoretical foundations of the reviewed studies. Most language learning activities involving voice-based AI chatbots were guided by linguistic frameworks, possibly because researchers aimed to ground their studies in these frameworks to clarify the relationship between the characteristics of chatbot-supported language learning activities and learning outcomes.
Our findings showed that several studies in AI chatbot–integrated language learning research were grounded in the interaction hypothesis (Long, 1996). This finding is unsurprising, as learners, especially in EFL settings, often receive insufficient input, and chatbots provide opportunities for exposure to input in the foreign language (H. Yang, Kim, Lee & Shin, 2022). This study also found that other theories related to language development were applied in AI chatbot–integrated language research. Namely, WTC, interest development, and FLA theories proved useful for exploring chatbot-integrated language learning. The reviewed studies showed that in human–chatbot conditions, learners experienced comparatively less anxiety about speaking in the L2 (El Shazly, 2021; Tai & Chen, 2023). This finding implies that teachers can employ AI chatbots to help learners overcome their anxiety about speaking in a foreign language. In the reviewed studies, the higher WTC and interest levels resulting from interaction with chatbots suggest these tools’ potential as motivators and sources of interest for language learners. Furthermore, the findings indicate that self-regulated and autonomous learning theories can inform and guide chatbot-integrated language learning. According to self-regulated learning theories, learners benefit most from the learning process when they can regulate the content and manner of learning and actively participate in learning activities (Reinders & White, 2016). Accordingly, AI chatbots have the potential to promote learner autonomy by providing ample opportunities for language practice without requiring a human speaking partner.
Social learning theories have also been applied as theoretical frames in voice-based AI chatbot research. Based on these findings, one aim of adopting chatbot technology may be to create dialogical spaces for interaction, with chatbots serving as mediating tools for language learning. Teaching professionals can also draw on technological-pedagogical theories, such as the CASA and CoI frameworks, to plan the language teaching process. The findings of this study imply that voice-based AI chatbots can serve as human-like language partners, as learners tend to attribute anthropomorphic characteristics to these tools. Language learners are likely to benefit from using AI chatbots, as these tools positively affect learning engagement, language mastery, and constructive learning experiences.
All in all, various theories have been put forward to explain the role of AI chatbots in L2 learning. However, only some studies were grounded in a solid theoretical background; 19 studies did not build on any explicit theoretical framework. This might indicate these researchers’ tendency to focus on other aspects of chatbot-supported language learning, such as intervention effects (Hwang et al., 2022; N.-Y. Kim, 2018) and chatbot design features (Chen et al., 2023; K.-A. Lee & Lim, 2023). Without a concrete theoretical background, it is difficult to determine which variables affect AI chatbot–supported language learning (AI CSLL) and how they can be incorporated into learning activities or curricula. Therefore, future voice-based AI chatbot research should be grounded in explicit theories. At the same time, the wide range of frameworks identified in the reviewed studies may also show that scholars are still working out how chatbot-supported learning and teaching should be theorized, suggesting that voice-based chatbot-supported language learning is still at an early stage of development.
4.2 The need for more varied methodologies and contexts in research on speech-recognition chatbots
The second research question concerned research methodologies, settings, and participants. Regarding educational level, we found that most research on AI chatbot use for English language learning was conducted in higher education settings. This result could be attributed to the growth of online and computer-assisted learning in higher education. It could also reflect the accessibility of undergraduate students to researchers and the convenience of conducting chatbot studies with this group of learners. In addition, undergraduate students may demonstrate more autonomous learning behaviors, which matters because using chatbots for language learning may require self-regulation to cope with the demands of this novel technology in and out of class. Primary and junior high school participants were comparatively rare. This result points to a need for further studies with younger language learners, especially at the primary level, to explore the needs of this underrepresented population.
Furthermore, mixed methods were mainly adopted to capture stakeholders’ opinions in the English language field and to demonstrate the learning outcomes of using voice-based AI chatbots. The prevalence of quantitative and mixed methods in the reviewed studies may stem from the impetus to comprehensively analyze the effects of AI chatbot technology. Studies employing more varied methods are therefore needed to investigate the impacts of voice-based AI chatbots in English language learning.
Regarding geographical distribution, the studies were conducted primarily in Asian contexts. The adoption of voice-based AI chatbots to support English language learning has gained more attention from researchers in certain Asian countries than elsewhere, which could be attributed to the rapid incorporation of technology into language classes in these regions. Studies covering other geographical areas might yield different results. Teachers and researchers in other parts of the world, with different educational trends, could conduct research in this field to provide comparable data.
4.3 The ever-changing background of voice-based AI chatbot research
The third and fourth research questions concerned the technologies and chatbots adopted and the strengths and challenges experienced by English language learners. The findings showed that researchers predominantly had the target learner groups use publicly available, general-audience chatbots such as Alexa and Siri. By contrast, few studies created voice-based AI chatbots specifically for English language teaching purposes. This result may point to a need for more guidance for language teaching professionals on designing specific-purpose AI chatbots.
The studies also addressed various strengths and challenges of using voice-based AI chatbots for English language learning. Our meta-synthesis echoes N.-Y. Kim, Cha and Kim’s (2019) review in that the studies demonstrated the effectiveness of AI chatbots in facilitating conversation, boosting interaction and negotiation of meaning, and enhancing communicative competence, motivation, and engagement in English language learning tasks. On the other hand, the results of this meta-synthesis align with Huang et al.’s (2022) and N.-Y. Kim et al.’s (2019) reviews, which identify the narrow scope of the knowledge base, the unnatural voice of AI chatbots, and chatbots’ inability to comprehend users’ language input as prevalent limitations. These technical problems may persist because, although ASR and text-to-speech technologies have improved, specific issues such as inaccurate speech recognition and the inability to interpret emotion conveyed through rhythm, intonation, and pitch remain unresolved (Belda-Medina & Kokošková, 2023).
These findings suggest a need for more advanced voice-based AI chatbots that address diverse language learner profiles. In this regard, one suggestion for application developers is to train chatbots on non-native speaker data, as most current chatbots are trained on native-speaker corpora, which limits the ASR system’s ability to recognize non-native utterances (Chen et al., 2023). Our findings also point to a need to improve the interactional capabilities of current voice-based AI chatbots. Another suggestion is to develop dedicated speech synthesis rather than relying on the synthesis built into mobile operating systems. Other suggested features include a distinctive voice that matches the chatbot’s personality and appearance, as well as gestures and expressions (Yuan, 2023).
Based on the strengths and challenges reported in AI chatbot–assisted language learning studies, another area for development is error correction by the AI chatbot. Because personalized learning is a crucial goal for AI chatbot developers, future AI chatbot technologies should incorporate error correction and immediate feedback systems that enhance the personalization of the learning experience (Yuan, 2023). In short, AI chatbot development still has considerable room for improvement.
Nevertheless, although chatbots still need many additional features before they fully meet language learners’ aims, the increasing number of studies in the last few years reflects recent attention in the field and the particular benefits chatbots provide, suggesting that interest will continue to grow in the coming years.
4.4 Pedagogical implications in AI chatbot–integrated research
This study also identified perspectives on pedagogical implementation in voice-based AI chatbot research. The findings point to the positive impact of educators providing strategies before learners use the chatbot. The scaffolded learning support principle is one approach that could help language learners participate in tasks conducted with voice-based AI chatbots. This principle involves demonstrating and modeling communication with chatbots and giving learners cues, such as meaningful negotiation strategies, to prevent breakdowns in interaction (J. Han & Lee, 2024). Providing structured or semi-structured prompts or language commands as pre-task activities could equip learners with the prior knowledge and strategies needed to engage fully in chatbot-assisted activities.
Designing learning activities around task-based scenarios was another common and effective practice in the reviewed studies. This finding suggests that AI chatbots can be incorporated into language tasks to provide learners with authentic communication and instant feedback, contributing to L2 development (Ellis, 2003; Xiao, Zhao, Sha, Yang & Warschauer, 2023). Using voice-based AI chatbots in task-based activities can help language learners develop authentic language use and practice real-life interaction skills (J. Han & Lee, 2024). Teaching approaches and curricula also need to evolve to integrate AI chatbots effectively into language education. One approach could be to include learning modules on various types of chatbots (e.g. mobile or web-based) in the curriculum. This could give language learners and teachers a deeper understanding of these tools and their possible uses in language education (Belda-Medina & Kokošková, 2023).
In addition, to function as good conversation partners, chatbots need to be able to converse on a broader range of topics. Recent AI developments that could contribute in this respect include large language models such as Google’s LaMDA and OpenAI’s GPT-4. These models promise to combine open-ended interaction with features that could enhance learners’ language learning gains and motivation (Godwin-Jones, 2022). Further studies on voice-based chatbots could examine such language models in different language learning settings and with different student profiles.
5. Conclusion
The results of this meta-synthesis have demonstrated that chatbots have gained considerable attention in English language learning research, and they corroborate the positive impacts of employing voice-based chatbots on English language learning. Several implications for teachers and researchers follow from these findings. Teachers and researchers should reevaluate the role of voice-based chatbots in English language learning, as they are a potentially valuable means of developing listening and speaking skills, vocabulary knowledge, and pragmatic and oral communicative competence.
The meta-synthesis findings suggest that the international literature on voice-based AI chatbot use in the English language field needs more varied methods, theoretical frameworks, and research trends. Our findings may contribute to knowledge on this subject by identifying these challenges, as well as potential affordances, for future studies. Practitioners and researchers can use the current findings to further the practical uses of dynamically evolving voice-based chatbots in language classes.
Supplementary material
To view supplementary material referred to in this article, please visit https://doi.org/10.1017/S0958344024000168
Ethical statement and competing interests
This study did not involve any human or animal participants, and thus no ethical approval was required. All the studies included in this meta-synthesis research can be accessed online. The authors declare no competing interests. The authors also declare no use of generative AI.
About the authors
Fatma Şeyma Koç is currently a research assistant in the Department of English Language Teaching at Akdeniz University, Türkiye. She holds a PhD in English Language Teaching from Middle East Technical University. Her research interests include teaching language skills via instructional technology in English as a foreign language classes.
Perihan Savaş is a professor of English as a foreign language (EFL) at Middle East Technical University, Türkiye. She received her PhD from the University of Florida. Her scholarly interests include integrating technology into EFL curriculum, mobile-assisted language learning (MALL), AI in EFL, and teacher training/faculty support in online education.