1. Introduction
Technological and pedagogical changes are transforming language instruction. In this dynamic context, generative artificial intelligence (GenAI) in education, especially in informal language learning (Godwin-Jones, 2022), has become essential to improve instructional pedagogies and learning outcomes (Labadze, Grigolia & Machaidze, 2023). Considering the holistic learning ecology (Brown, 2000; Luckin, 2008), GenAI in formal instruction provides personalised learning opportunities that strengthen learner motivation and personal agency and can provide authentic, social-like interactions in extended or virtual reality for language learning beyond coded responses (Shadiev, Sun & Huang, 2019). GenAI technology has demonstrated unparalleled capabilities in providing detailed feedback compared with human teachers (Ai, 2017) and providing real-time translation for content and language integrated learning (CLIL; Liu & Chen, 2023). However, studies have also found that only some students can benefit from using GenAI technology in learning (e.g. Niloy, Akter, Sultana, Sultana & Rahman, 2024; Ou, Stöhr & Malmström, 2024). From the language and technology perspectives, moderators such as students’ English proficiency (Liu & Chen, 2023) and digital literacy (Goldenthal, Park, Liu, Mieczkowski & Hancock, 2021) significantly influence the experimental results.
Furthermore, from the humanistic perspective, studies have found that language teachers at all levels of education lack the willingness and capability to utilise GenAI-based language learning tools in classes (Godwin-Jones, 2023; Ou et al., 2024; Yang, Kim, Lee & Shin, 2022).
Multiple frameworks have been developed to assist instructors in improving digital literacy and incorporating technology into education (Li & Lan, 2022; Ng, 2012). This paper adopts the framework of technological pedagogical and content knowledge (TPACK) due to its comprehensiveness: it incorporates teachers’ technological literacy, pedagogical capabilities, and content knowledge (Koh & Chai, 2014). Existing studies using TPACK have reported that teachers who are not native to digital technologies commonly have difficulties in using technology for content and language teaching (Koh & Chai, 2014; Miguel-Revilla, Martínez-Ferreira & Sánchez-Agustí, 2020; Tondeur, Scherer, Siddiq & Baran, 2017), difficulties that stem largely from their insufficient technological knowledge for integrating digital technology into language teaching and learning (Celik, 2023).
Second language acquisition (SLA) involves more than just formal education. From the viewpoint of holistic learning ecology (Brown, 2000; Lai, Liu, Hu, Benson & Lyu, 2022; Luckin, 2008), SLA depends not only on formal language instruction but also on informal language learning practices after classes (Lee, 2019a). With GenAI, such after-class practices include a range of informal digital learning of English (IDLE) activities (Liu & Ma, 2024) that permeate the daily lives of English as a foreign language (EFL) learners (Liu, Darvin & Ma, 2024a). Therefore, employing GenAI-mediated IDLE practices for English learning among students could overcome the resistance of educators to using GenAI and develop a more holistic technological integration into education that fits the directional requirements of global policies (Alghamdi & Holland, 2020; Lai & Jin, 2021).
In GenAI-mediated IDLE practices, GenAI is particularly beneficial for English speaking (Chen, 2024; Yang et al., 2022). Compared with the traditional classroom, which limits EFL learners’ in-class practice and interaction opportunities due to the large class sizes and limited class hours (Chen, 2024), GenAI can provide personalised feedback (Escalante, Pack & Barrett, 2023) and act as an authentic conversational partner (Yang et al., 2022) to increase interaction frequency (Belda-Medina & Calvo-Ferrer, 2022). When students engage in English conversations with GenAI in extracurricular situations (a category of IDLE activities; see Section 2.1), researchers have found an improved willingness to communicate (Tai & Chen, 2023) based on reduced anxiety in speaking English (Kim & Su, 2024). Moreover, this approach enhances self-regulation during out-of-class learning (García Botero, Botero Restrepo, Zhu & Questier, 2021) by promoting metacognitive learning strategies (Saadati, Zeki & Vatankhah Barenji, 2023). Therefore, GenAI-mediated IDLE activities may be able to generate better outcomes than conventional teacher-led oral English learning, thus allowing teachers to incorporate digital technology into SLA for better learning outcomes regardless of their technical competence. Furthermore, past studies have suggested that IDLE practices influenced by others (e.g. teachers) could promote extramural IDLE with autonomy (Zhang & Liu, 2022, 2023) through enjoyment (Liu, Zhang & Zhang, 2024b). Based on this view, we proposed the following research questions for our mixed-methods study:
1. Do GenAI-mediated IDLE practices improve college students’ English speaking proficiency?
2. Do GenAI-mediated IDLE practices for speaking yield better post-test results than teacher-led speaking courses?
3. In the opinion of students, what factors contribute to the changes in their speaking results?
4. Do students who practise GenAI-mediated IDLE continue to perform such activities after the experiment? Why?
Answers to the research questions should lead to significant theoretical development and practical application. Theoretically, the research could expand the TPACK framework to incorporate out-of-class learning into the holistic learning ecology (Brown, 2000; Luckin, 2008). Acknowledging the fundamental process of observation and imitation in language learning (Bandura, 2014), we also highlight the need for a research focus on this learning process and the factors that influence it. Practically, our research could provide insights regarding a pragmatic method to integrate technology into college EFL education so that teachers can adapt to technologies for educational purposes without being constrained by their technological knowledge and literacy in and out of class. Regarding the terminology of AI and GenAI, we acknowledge that AI and GenAI have been used interchangeably in past literature, with “AI” being used to refer to ChatGPT, Copilot, Gemini and more, all of which are GenAI (e.g. Belda-Medina & Calvo-Ferrer, 2022; Liu et al., 2024a).
2. Literature review
2.1 IDLE and GenAI
Originating from out-of-class autonomous learning, IDLE has emerged as a crucial research concept of computer-assisted language learning. This concept addresses a research gap in English learning and technology usage that happens autonomously outside of the classroom (Soyoof, Reynolds, Vazquez-Calvo & McLay, 2023). Based on Benson’s (2011) four dimensions of out-of-class learning, Lee and Dressman (2018) identified IDLE as “self-directed, informal digital English learning independent of formal contexts” (p. 436). Under this definition, IDLE has been classified as “extracurricular” and “extramural”, based on the closeness between IDLE activities and formal education (Lee, 2019b), as well as “receptive” and “productive”, based on the materialistic nature of the IDLE activities (Lee & Drajati, 2019).
In the literature, researchers have mostly considered GenAI usage to be an IDLE practice. Past quantitative studies have found correlations between an individual’s perception towards using technology for English learning and college students’ GenAI usage as an IDLE practice (Liu & Ma, 2024). Moreover, factors such as peer support and enjoyment could influence students’ GenAI usage behaviour (Liu et al., 2024b). From the qualitative perspective, Liu et al. (2024a) suggested that when GenAI mediates IDLE practices, Chinese EFL students could seek guidance from technology; moreover, they self-reported that GenAI and teachers/tutors provided similar usefulness for EFL learning. There have been similar findings in other cultural backgrounds (e.g. Lee & Drajati, 2019; Ou et al., 2024). For example, a large-scale qualitative investigation into Northern European students’ GenAI usage detailed students’ view of such technology as “my teacher” (Ou et al., 2024: 6) because they rely on GenAI for knowledge consultation, demonstrating consistency in GenAI usage behaviour across cultures. However, this finding does not suggest that language teachers can be replaced; rather, it accentuates the significance of GenAI in IDLE practices for EFL learners, especially in speaking, where GenAI can be used as a conversational partner (Liu et al., 2024a; Yang et al., 2022).
Although GenAI’s application in foreign language education has been investigated quantitatively and qualitatively (Liu & Ma, 2024; Ou et al., 2024), this endeavour has been confined to the IDLE discipline. Since GenAI has the potential to transform education both in classes and out of classes (Meniado, 2023), the question of how to bridge IDLE and teacher-involved education remains largely unanswered. This study, by using an experimental design that alleviates teachers’ inadequacies identified by the TPACK framework, could provide an alternative to facilitate EFL speaking acquisition.
2.2 Holistic learning ecology and GenAI
A learning ecology (Brown, 2000) is a holistic and adaptive system comprising rich resources, activities, and learning practices under formal and informal learning scenarios (Brown, 2000; Luckin, 2008). Such practices are particularly sensitive to technological advancements because technology enriches resources and interactions within learning practices (Brown, 2000; Lai et al., 2022; Lai, Zhu & Gong, 2015; Luckin, 2008). GenAI provides students with a personalised conversational partner for practice and feedback when learning oral English (Ai, 2017; Yang et al., 2022) and a simulated culturally sensitive environment that provides relatively authentic interactions (Shadiev, Wang, Chen, Gayevskaya & Borisov, 2024) that are otherwise hard to find in a foreign country.
From the humanistic perspective, GenAI technology motivates students to conduct autonomous IDLE practice (Lai et al., 2022; Tai, 2024a, 2024b). Through the simulated conversational environment, students who adopt this technology feel more motivated to engage in the conversations (Yang et al., 2022), leading to deep language learning (Wang, Su & Yu, 2020). In a large-scale qualitative text analysis, Ou et al. (2024) found that students treat GenAI as a significant source of information, inspiration, and teaching, which bestows an identity of “my teacher” (Ou et al., 2024: 6) onto GenAI tools. This finding further stresses the significant role of GenAI in the holistic learning ecology.
2.3 TPACK and GenAI
TPACK (Koehler, Mishra & Cain, 2013) provides a sound theoretical framework for understanding how teachers integrate technology, pedagogy, and content knowledge to support student learning (Sun, Ma, Zeng, Han & Jin, 2023). It emphasises the dynamic interplay between these three domains (Dong, Chai, Sang, Koh & Tsai, 2015) and highlights the importance of teachers’ ability to effectively integrate technological tools and resources into students’ language learning practices while maintaining a focus on pedagogical goals and content (Saubern, Henderson, Heinrich & Redmond, 2020).
In the TPACK framework, the domain of technological knowledge refers to understanding how different technologies can be used effectively in various educational settings (Greene & Jones, 2020). It contains three elements: knowledge of existing technologies (knowing the capabilities and limitations of existing technology for teaching and learning), skills in technology use (proficiency in using technological tools), and awareness of emerging technologies (keeping up to date with technology advancements; Adipat, 2021; Haleem, Javaid & Singh, 2022). Teachers are typically aware of GenAI’s potential in language teaching and learning (Jiang, Jong, Lau, Chai & Wu, 2021; Ong & Annamalai, 2024) but have technical difficulties when integrating GenAI into education (Ong & Annamalai, 2024; Zhang, Zou, Cheng & Xie, 2022). Hence, teachers show a low commitment and capability to integrate technology into EFL education (Ping, 2022), despite knowing the multifaceted benefits of GenAI in SLA (Calvo & Hartle, 2024; Godwin-Jones, 2023). We addressed this issue by examining the effectiveness of GenAI-mediated IDLE practices in overcoming such difficulties.
2.4 Social cognitive theory and GenAI
Foreign language acquisition is multifaceted, and several theories have been developed to explain the acquisition process from different perspectives. Krashen’s (1992) input hypothesis focuses on the learning inputs and stresses the necessity of i+1 input in SLA for effective language learning. Moreover, Swain’s (2005) output hypothesis emphasises the significance of output practices in language learning that extend beyond suitable learning input. Social cognitive theory (SCT) transcends the discourse of input and output by focusing on how learners use and practise with the learning materials (Bandura, 2014); therefore, we adopted it as the theoretical framework for this study.
SCT (Bandura, 1986) emphasises the interaction between individuals, their behaviour, and the environment in the process of learning and development (Bandura, 2014). According to this theory, individuals are not passive recipients of information; rather, they actively engage in the learning process by setting goals, monitoring their progress, and adjusting their behaviour based on feedback and reinforcement (Ibrahim, Clark, Reese & Shingles, 2020; Liu, Huang & Wang, 2014).
SCT emphasises the significance of modelling and imitation in language learning (Chen, 2014; Deng, Wang & Xu, 2022). EFL researchers have found that learners imitate language structures, pronunciation, and communication strategies through observation from and practice with authentic and authoritative sources (LaScotte, Meyers & Tarone, 2021; Li & Somlak, 2019; Sasaki & Takeuchi, 2010). These observations and practices are rooted in the students’ self-efficacy (Zhou, Chiu, Dong & Zhou, 2023) and individuals’ belief in their ability to succeed in specific tasks (Bandura, 2014). GenAI can promote self-efficacy in various ways (Tseng, Chen & Lin, 2023; Zhou et al., 2023). From the technological perspective, Liu, Hou, Tu, Wang and Hwang (2023) suggested that immediate and personalised feedback facilitates EFL students’ writing exercises and promotes their self-efficacy. From the humanistic perspective, Chang, Hwang and Gau (2022) argued that the students’ general positive perception of GenAI technology, such as convenience in obtaining information and interest in using such technology, can enhance students’ self-efficacy and academic performance.
2.5 SCT and TPACK
Using GenAI tools in language education can promote self-efficacy from both technological and humanistic perspectives (Liu et al., 2023; Ou et al., 2024), which in turn enhances the observation and imitation behaviours that influence SLA and academic performance (Bandura, 1986, 2014; Zhou et al., 2023). Moreover, TPACK includes the skills that teachers should possess to integrate technology effectively to impart knowledge and stimulate learning (Greene & Jones, 2020; Sun et al., 2023). Therefore, SCT could provide theoretical insights into TPACK from the perspective of the holistic learning ecology. Given that the purpose of education is to provoke learning (Robinson & Aronica, 2019), teachers’ ability to use technology in education only partly constitutes the holistic learning ecology. Out-of-class autonomous learning of English (Lai et al., 2015), called IDLE (Lee, 2019a), is also an essential component. It could utilise the technological knowledge of the digital native students and be carried out regardless of whether the teacher has limited technological knowledge (Ong & Annamalai, 2024). Thus, using SCT to investigate the effectiveness of GenAI-mediated IDLE practices to account for the challenging demand of teachers’ technological knowledge in the TPACK framework could represent a significant step towards a more comprehensive theoretical understanding of the holistic learning ecology.
3. Methodology
To examine the effect of GenAI activities on EFL learners’ oral English proficiency levels and IDLE practices, we conducted an explanatory mixed-methods study comprising an experimental study supplemented with two rounds of follow-up qualitative interviews to explain the quantitative findings and to evaluate the behavioural sustainability. The experimental study used a pre- and post-test design and lasted 10 weeks. The pre- and post-tests adopted the International English Language Testing System (IELTS) speaking band descriptors for grading because of the communication-oriented nature of the IELTS speaking grading rubrics (Nakatsuhara, Inoue & Taylor, 2021).
3.1 Participants
This research initially included 48 undergraduate EFL students aged 18–21 years from a STEM-oriented institution in mainland China, divided into two groups of 24. One student dropped out of the experimental group owing to illness, leaving 24 students in the control group and 23 students in the experimental group. Among the 47 participants, 31 were men and 16 were women, which corresponds with the gender distribution at tech-oriented universities in China (Tencent Education, 2021). Based on the pre-test, there was no significant difference (t = −1.88, df = 45, p = 0.851) in English oral proficiency between the control group (M = 5.188, SD = 0.548) and the experimental group (M = 5.217, SD = 0.540). We recruited the participants against one inclusion criterion: each participant had to have previous experience with GenAI, to reduce mastery bias in the experiment (Ahn, Bong & Kim, 2017). Advertisements were posted in the university building designated for IELTS study to invite participation. We also encouraged the participants to refer others to the study. The participants received a complete experiment description and signed an informed consent form before starting.
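As a rough check on the baseline comparison above, the sketch below recomputes an independent-samples t statistic from the rounded summary statistics reported in this section. It assumes a pooled-variance (Student's) test, which is consistent with df = 45 but is not stated in the text; the function name `pooled_t_from_summary` is illustrative and is not taken from the SPSS analysis.

```python
import math


def pooled_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's independent-samples t statistic from summary statistics.

    Assumes equal variances (pooled), so df = n_a + n_b - 2.
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Rounded pre-test values as reported above: control group first,
# experimental group second.
t_stat, df = pooled_t_from_summary(5.188, 0.548, 24, 5.217, 0.540, 23)
print(f"t = {t_stat:.3f}, df = {df}")
```

With these rounded inputs the statistic comes out at roughly −0.18 with df = 45, a magnitude in line with the reported non-significant p value. Because the inputs are rounded, the result can differ slightly from a value computed on the raw scores.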
3.2 Experimental procedures
The 10-week study comprised a 2-hour session for each group each week. We divided the participants randomly into the experimental and control groups. Two experienced IELTS teachers who had scored 8.5 and 9 on the IELTS oral examination graded the pre- and post-tests, based on the IELTS speaking rubrics for pronunciation, fluency, grammar, and lexical resource (for detailed information, please refer to the “IELTS Speaking Band Descriptors” at https://ielts.org/). Before the start of the first week, the participants in the experimental group were trained on how to interact with the virtual companion (友伴) named Lucy and the digital English interpreter (英语翻译官) in iFlytek Spark (讯飞星火), a Chinese GenAI tool for academic purposes that individuals can interact with in English. We chose this virtual companion because it can generate communicative questions and responses for students to practise writing and speaking and provide feedback and sample answers that are personalised to each student’s input. Moreover, the students were taught how to prompt Lucy to practise speaking, to ask for feedback, and to get sample answers when they were stuck. The training – which consisted of a brief demonstration, a student practice, and a technical consultation – lasted about 30 minutes on the first day of the experiment. Although the virtual companion may sometimes ask non-IELTS questions, the students in the experimental group had a printed question bank from which they could ask Lucy for sample answers and for feedback on their own answers. Because SCT describes modelling and imitation as the main routes to SLA (Chen, 2014; Deng et al., 2022), we suggested that the students choose whichever feedback forms they preferred to model and imitate as a part of their IDLE practices. The typical interaction with Lucy for multimodal practice and feedback is shown in Figure 1.
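The random division into two groups mentioned at the start of this subsection can be illustrated with a short sketch. The text does not describe how the randomisation was carried out, so everything here is an assumption: the participant identifiers, the seed, and the `assign_groups` helper are hypothetical and serve only to show one reproducible way of splitting 48 participants into two groups of 24.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Shuffle the participants and split them into two equal halves."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical identifiers for the 48 initially recruited students.
participants = [f"P{i:02d}" for i in range(1, 49)]
experimental, control = assign_groups(participants, seed=2024)
print(len(experimental), len(control))
```

Recording the seed alongside the group lists would let a later reader confirm that the allocation was not adjusted after the fact.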
As the control group interacted with an impartial, experienced IELTS teacher who scored 9 on the IELTS oral examination, this group received no training in GenAI use. However, the students in the control group were encouraged to repeat the teacher’s modifications of their answers in class.
During the 10-week experiment, the two groups gathered in two separate self-study rooms. The control group interacted with an IELTS teacher who did not grade the pre- or post-test. This teacher asked the students authentic IELTS oral examination questions, invited the participants to answer, and gave feedback on the answers. On the other hand, the experimental group interacted with the virtual companion, Lucy. There was no teacher present for the self-study sessions of the experimental group; the first author only took attendance at the beginning and the end of each session.
The pre- and post-tests were administered in Weeks 0 and 11, respectively. The tests simulated the IELTS oral examination, in which one examiner asks questions and records the answers from each participant. For the pre- and post-tests, questions were randomly selected from the question banks. Of note, no student was asked the same questions in both tests. The examiner also wrote comments and graded the exam. Subsequently, an additional examiner played the recordings, double-checked the comments and, if needed, adjusted the grades. The inter-rater reliability was 0.93. Moreover, neither examiner took part in teaching the control group or had any previous relationship with the participants. Figure 2 shows the experimental procedure.
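The text reports an inter-rater reliability of 0.93 without naming the statistic. As a minimal sketch, assuming a Pearson correlation between the two examiners' band scores (one common choice, not confirmed by the source), agreement could be computed as follows. The scores below are invented for illustration and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical IELTS band scores from two examiners for six recordings.
examiner_1 = [5.0, 5.5, 6.0, 4.5, 6.5, 5.5]
examiner_2 = [5.0, 5.5, 6.5, 4.5, 6.0, 5.5]
agreement = pearson_r(examiner_1, examiner_2)
print(f"r = {agreement:.2f}")
```

Other reliability indices, such as an intraclass correlation or a weighted kappa, treat systematic differences between raters differently, so the choice of statistic affects how a value such as 0.93 should be read.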
3.3 Data collection and analysis
Apart from the quantitative data obtained from the pre-tests and post-tests in Weeks 0 and 11, we collected qualitative data by interviewing the participants in two rounds. First, during Week 12, we interviewed seven control group participants and six experimental group participants who volunteered (a total of six women). Second, during Week 14, we interviewed the 23 students in the experimental group to address RQ4. We conducted highly flexible semi-structured interviews (Brinkmann, 2020) to maximise the range of answers that students could give (Brinkmann, 2020; Green, Camilli & Elmore, 2012), thus facilitating qualitative data extraction. To encourage the participants to provide as much information as possible, we conducted the interviews in the participants’ first language and subsequently translated their responses into English.
We analysed the quantitative data from the pre- and post-tests by calculating descriptive and inferential statistics with SPSS Statistics 28 to examine differences in speaking proficiency between the experimental and control groups. We used NVivo 12 to perform thematic analysis of the qualitative data and to identify recurring patterns and themes related to students’ experiences and beliefs as justifications behind the quantitative findings. Specifically, we followed the five-step guidance of Braun and Clarke (2006) – data familiarisation, manual coding, thematic identification, theme reviews, and naming – to ensure the coherence, consistency, and presentation of the identified themes (Nowell, Norris, White & Moules, 2017).
4. Findings
4.1 Learning performance within the groups
As shown in Table 1, the pre- and post-test comparison indicated a significant improvement (p < .01) in oral proficiency in the control and experimental groups.
** p < .01.
4.2 Learning performance between the groups
When we compared the post-test results between the groups, we found that the experimental group had better oral proficiency (Table 2), even though both groups showed similar oral proficiency before the experiment (t = −1.88, df = 45, p = .85). The post-test independent-samples t-test result indicated a significant difference (p < .05) between the control and experimental groups.
Note. CI = confidence interval.
4.3 GenAI promotes learning performance through technological uniqueness
The interviews provide justification for GenAI’s effects on English oral proficiency from technological and humanistic perspectives. From the technological viewpoint, one of the significant benefits GenAI offers is the number of practice opportunities it provides to students. In the first round of interviews, all 13 interviewees mentioned this point:
I wished the class size was a bit smaller, as I only had opportunities to answer about 4 to 5 questions in each session for her to correct my wrongs. (I4, control group)
With iFlytek, the entire two hours is mine. It’s like getting individual tuition without paying for anything. (I1, experimental group)
The students in the control group had fewer practice opportunities due to the class size, while the experimental group had many more opportunities. From the SCT perspective, students in the experimental group had more opportunities to observe and imitate proper English usage (in the first interview, 12 of the 13 interviewees held this opinion), which contributed to the improved English learning performance:
When the teacher explained new vocabulary to a classmate which I didn’t know about, I’d write it down and try to use it in my talk. (I2, control group)
You can ask for suggested answers or ways to develop your own answers from the virtual companion. And if you cannot understand it, it provides the texts in writing as well as in speaking so you can read them out loud. You can also mimic the intonations of the virtual companion, which is highly beneficial for my oral speaking. (I7, experimental group)
According to I7, the personalised feedback as well as the multimodality of the GenAI responses benefited SLA by enhancing modelling and imitation. Ten of the other interviewees agreed with this point. Furthermore, based on the qualitative data analysis, the multimodality feature of this particular GenAI tool may be especially helpful to students who had low English proficiency before the experiment. A comparison between I7 (who scored 4.5 on the pre-test) and I12 (who scored 6.0 on the pre-test) underscores this view:
When I listen to my teachers in high school, I often needed a very long time to think about what she said and that made me not be able to follow up. But the GenAI can give me enough time to read and understand the content, and it gives the audio for the content as well so that I can model on it. I think this has helped my speaking. (I7, experimental group)
It [the virtual companion] has given me a lot of suggestions regarding my answers. But when I have sought advice on using more complex sentence structures to answer some questions, the GenAI cannot provide many useful suggestions. Sometimes, it just explains why my answers are good and that is it. (I12, experimental group)
Based on the qualitative data, we found that the GenAI-based virtual companion technology can improve English oral proficiency by creating more opportunities to practise speaking and by providing personalised feedback for students to hear and imitate. Moreover, such benefits may be greater for students with low English proficiency, who need help to develop and deliver their answers in English, than for those with above-average English proficiency, who need help to construct more complex sentences when speaking.
4.4 GenAI promotes learning performance through humanistic perceptions
Another theme that emerged from the qualitative data is that GenAI benefits students’ self-efficacy and learner agency by improving their willingness to communicate (mentioned by 18 of the 23 interviewed participants in the experimental group) and avoiding unconscious teacher bias (mentioned by 13 of the 23 interviewed participants in the experimental group), both of which enhance English oral proficiency. This benefit may be especially valuable for students with disadvantages related to their language proficiency or personality:
I feel like I didn’t get enough chances to talk in class as the other students did. I think I’m being ignored because of my poor English skills. … The other student, a tall boy who’s good at English, was asked a lot. I didn’t even have half of his practice opportunities. (I8, control group)
I’m a little introverted and I’m not good at English. So, when speaking in English to others, I would be nervous. … I feel like I’m being judged. … With an AI tool, I felt less nervous and could speak for more. … So I practised and improved. (I11, experimental group)
I8 may have been at a disadvantage due to the teacher’s perception bias, which resulted in fewer learning opportunities, whereas I11 was encouraged by the GenAI tool because the technology improved their willingness to communicate. Hence, I8 was in a disadvantaged learning position while I11 was not. Therefore, we suggest that although specific personal characteristics may not necessarily benefit SLA, the influence of these characteristics could be mitigated by using GenAI as a mediator of IDLE practices.
4.5 GenAI alone is not enjoyable enough to foster extramural IDLE
During Week 14, seven of the 23 interviewed participants in the experimental group reported engaging in extramural IDLE with GenAI to some degree. However, these seven participants struggled to sustain it, resulting in the abandonment of activities outside of their usual routine:
When I was using GenAI at the dorm, it was easy to be disturbed by others and it was easy to disturb them. When I tried to find a room for self-study, it was difficult to find an entire classroom that has no one else in it. So, eventually, I dropped it before the experiment ended. (I7, experimental group)
The relinquishment of such behaviours may be caused by changes in the learning environment (mentioned by five of the seven participants who reported using GenAI after the experiment) as well as a lack of enjoyment (mentioned by five of the seven participants who reported using GenAI after the experiment). In general, the participants in the experimental group found GenAI to be useful or practical, but not necessarily enjoyable:
The instant feedback and the plentiful practice opportunities can help me to improve my oral speaking for sure, but it’s a bit boring to study with it. … Because the content is not exciting and the feedback modality is not interactive enough. (I7, experimental group)
Yes, I felt less nervous when talking to Lucy. But I didn’t enjoy it. I actually found it to be tiresome because I had to control every conversation and sometimes Lucy couldn’t understand my need and I had to think about different prompts to get what I needed, which is unlikely to happen with teachers. (I11, experimental group)
5. Discussion
5.1 The quantitative findings
We found that GenAI could improve EFL college students’ English oral proficiency. This result corresponds with previous findings that GenAI technology used as a conversational partner can benefit EFL students’ language learning (e.g. Belda-Medina & Calvo-Ferrer, 2022; Liu et al., 2024a; Yang et al., 2022). As suggested by Yang et al. (2022), GenAI chatbots could facilitate students’ language learning in informal settings by enhancing their understanding and ability to complete the tasks and, subsequently, improving their ability to use language correctly in future exams. The responses and feedback, which are based on large academic corpora and presented in the form of natural language, align with SCT, which stresses the significance of modelling and imitation of appropriate language usage while learning a language (Chen, 2014; Deng et al., 2022).
Our study provides empirical evidence that GenAI-mediated IDLE practices can lead to significantly better outcomes in English oral proficiency than learning in traditional teacher-centred classrooms. This finding challenges the role of teachers in Education 5.0, which focuses on “learner-centredness” (Meniado, 2023: 467) supported by “human-machine interaction technologies” (Meniado, 2023: 466). Technological advancements such as GenAI can add value and effectiveness to improve learning (Ong & Annamalai, 2024) and have the potential to revolutionise “the L2 teaching-learning ecosystem” (Meniado, 2023: 471), introducing new policies, theoretical conceptualisations, and pragmatic practices in this new era of education (Ng et al., 2023). Our findings advocate for increased adoption of IDLE practices with GenAI technology.
5.2 The qualitative findings
We have demonstrated how GenAI promotes learning from a technological perspective, through more learning opportunities and personalised feedback. By increasing students’ chances to speak and by generating responses to their learning needs, GenAI strengthens the bond between modelling and imitation, underscoring the significance of SCT in SLA. In addition, GenAI may be especially beneficial for disadvantaged learners by providing constructive suggestions on their answers and reducing negative aspects such as L2 anxiety and teacher perception bias.
The benefits of GenAI in providing constructive suggestions and enhancing learning motivation have been discussed previously (Chiu, 2023; Li & Kim, 2024). Teacher bias is a widely reported phenomenon (Copur-Gencturk, Cimpian, Lubienski & Thacker, 2020; Denessen, Hornstra, van den Bergh & Bijlstra, 2022; Dian & Triventi, 2021; Umansky & Dumont, 2021). However, it is not yet clear how GenAI can counteract or prevent teacher bias to improve students’ learning. As suggested by Starck, Riddle, Sinclair and Warikoo (2020), “teachers are people too” (p. 273). Indeed, educators are also subject to perception biases such as those related to colour (Copur-Gencturk et al., 2020), weight (Dian & Triventi, 2021), and social stereotypes (Denessen et al., 2022). In practice, there are many factors for teachers to consider, such as students’ flow of experience in teacher-centred education (Ateş & Garzón, 2022; Wagner, Holenstein, Wepf & Ruch, 2020). Therefore, even assuming every teacher possesses strong teacher agency, their teaching may not be equally beneficial to every student. From the perspective of achieving educational equality, we suggest that GenAI-mediated IDLE practices could promote equity for disadvantaged English learners to improve their oral proficiency. This step towards learner-centred SLA, facilitated by GenAI technologies in informal learning settings, supports the holistic learning ecology of second language education.
Fully autonomous extramural IDLE practices are integral to achieving the holistic learning ecology for EFL learners. That said, our findings indicate that the use of GenAI as a conversational partner for IDLE practices may not provide students with adequate enjoyment, leading to insufficient intrinsic motivation (Deci & Ryan, 2012; Liu et al., 2024b) for interest-based learning with GenAI. Although students find it useful to have GenAI as a conversational partner in EFL learning, these extramural IDLE practices may not be suitable for different learning environments or provide human likeness in the interactions. These issues need to be addressed before GenAI can be applied to attain the holistic learning ecology.
6. Conclusion and suggestions for further research
According to the TPACK framework, teachers’ adequate technological knowledge is essential to benefit students’ learning with digital technologies. Moreover, although the SLA processes involve multifaceted factors such as cognitive and social skill development and the integration of linguistic knowledge with cultural context, observing and imitating the language’s proper usage has been argued to be one of the fundamental practices within SLA (Bandura, 2014). Therefore, it would be possible to present observation and imitation opportunities to learners using technological means without the influence of teachers’ technological knowledge by providing GenAI-mediated oral IDLE practices. Our quantitative findings support this theoretical view. We found that GenAI technology could represent an advantageous alternative for students to practise speaking English and could yield a significant proficiency improvement between the pre- and post-tests (RQ1) and between the control and experimental groups (RQ2).
Based on the qualitative data, such improvements may derive from the technological and humanistic perspectives (RQ3). Technologically, GenAI tools provide more practice opportunities and personalised feedback tailored to the learner’s needs. Humanistically, such technology could advance educational equity by preventing student characteristics from negatively interacting with and influencing the learning resources and environments. We recommend wider adoption of learner-centred GenAI-facilitated SLA in informal settings to achieve the holistic learning ecology. However, our qualitative results also suggest that students are not likely to continue such actions in the long run (RQ4). The experimental group participants generally found the use of GenAI in their self-study sessions to be helpful, but not enjoyable. Hence, they may be less willing to continue using GenAI as a conversational partner to improve their English fluency.
Under the guidance of SCT as the theoretical framework, we analysed EFL learning as a practice of observation and imitation of authentic materials and practice dialogues. Acknowledging the multifaceted influences on SLA, it would be useful to investigate how GenAI contributes to input and output materials such as personalised feedback to gain a more holistic comprehension of technology and language acquisition. Moreover, we recommend further investigation regarding the potential strategies that could help transform extracurricular IDLE into extramural IDLE to gain a more holistic understanding of the dynamics of these activities.
Ethical statement and competing interests
This research is supported by the JC_AI research fund (Project number: 02186), funded by the Education University of Hong Kong and the Hong Kong University of Science and Technology, Hong Kong, China. All participants took part voluntarily, and anonymity was ensured with codes assigned to each of the participants. Furthermore, the authors declare no competing interests. The authors declare no use of generative AI.
About the authors
Ellen Yue Zhang is an assistant professor in the Department of English Language Education at the Education University of Hong Kong. Her research interests include L2 motivation, identity and investment, CALL, IDLE, and critical pedagogies. She has published in Computer Assisted Language Learning; Journal of Multilingual and Multicultural Development; TESOL Quarterly; System; ReCALL; Journal of Language, Identity, and Education; Language Awareness; and Chinese Journal of ESP.
Mingyue Michelle Gu is a professor and the dean of the Graduate School at the Education University of Hong Kong. Her research interests include E-medium instruction in higher education, multilingualism and mobility, family language policy, and identity and digital literacies studies, and she has published widely in these fields. She is listed as one of the world’s top 2% scientists by Stanford University (2022).
Lihang Guan is a PhD student in the Department of English Language Education at the Education University of Hong Kong. His research interests include computer-assisted language learning (CALL), informal digital learning of English, and AI in education.