Introduction
With the increasing integration of artificial intelligence (AI) in education, second language (L2) learning is undergoing a significant transformation. Generative AI (GenAI), in particular, is poised to revolutionize the learner experience, building upon existing technologies like machine translation that are already widely used. Unlike earlier technologies that primarily focused on translation or rote memorization, GenAI offers a more dynamic and interactive learning environment. Its ability to generate human-like text, provide personalized feedback, and create immersive learning scenarios has the potential to significantly enhance language acquisition. This shift in language learning calls for a deeper understanding of student engagement and its role in the learning process. Previous research by Huang and Mizumoto (2024a, 2024b, 2024c) demonstrated that using GenAI with appropriate measures not only supports and maintains learners’ motivation but also improves writing self-efficacy. Additionally, several studies have shown that technology use positively influences engagement (e.g., Chen et al., 2009; Katyara et al., 2023; Maričić & Lavicza, 2024; Schindler et al., 2017), which is a key driver of language learning (Iwaniec & Khaled, 2024). This underscores the importance of investigating GenAI’s role in the language learning classroom further. While previous research has explored the impact of technology on engagement and the relationship between motivation and AI use, few studies have examined the connection between motivation, engagement, and AI utilization in the context of L2 learning.
Given that engagement is closely tied to motivation (Iwaniec & Khaled, 2024), this empirical study aimed to explore the intricate relationship between motivation, engagement, and AI utilization, with a specific focus on AI-generated written feedback (WF), in an English as a Foreign Language (EFL) context in Japan.
Literature review
L2 Motivational Self System
The L2 Motivational Self System (L2MSS) was introduced by Dörnyei in 2009 as a comprehensive framework to better understand language learning motivation (Boo et al., 2015), expanding on previous theories like Gardner’s (1985) integrative/instrumental dichotomy. This model is particularly relevant in contexts like EFL, where students often lack meaningful contact with native speakers. The L2MSS is divided into three distinct constructs, namely the Ideal L2 Self (IL2), the Ought-to L2 Self (OL2), and the L2 Learning Experience (L2LE). These constructs allow for deeper insight into how learners’ future-oriented goals and direct learning experiences affect their motivation.
At the heart of the L2MSS lies the IL2, a concept derived from the idea of possible selves (Markus & Nurius, 1986). This construct refers to a learner’s vision of the person they aspire to become by mastering the language, often tied to professional and personal aspirations. Studies have demonstrated that a strong IL2 often correlates with higher motivation and improved language proficiency (e.g., Al‐Hoorie, 2018; Dunn & Iwaniec, 2021). Promotion-focused goals, where learners are driven by positive outcomes, align with this concept, suggesting that students with a well-developed IL2 tend to put in more effort to achieve their goals. However, some recent studies suggest that this correlation may not be universal (Takahashi & Im, 2020), indicating the need for further research to fully understand this construct’s impact on language learning.
The OL2, on the other hand, represents a learner’s sense of obligation to learn the language to meet external expectations, such as avoiding failure or fulfilling societal responsibilities (Dörnyei & Taguchi, 2009). In contrast to the IL2, this construct is more prevention-focused, where learners are motivated by a desire to avoid negative consequences. While traditionally seen as less impactful than the IL2, studies in Asian contexts, such as Japan (Suzuki, 2014), suggest that the OL2 can still hold significant motivational power, particularly in educational settings where English is a compulsory subject. Researchers continue to debate whether this construct should be further divided into internally and externally motivated components.
The L2LE shifts the focus from future-oriented goals to the immediate experiences learners have while engaging with the language. This construct encompasses various factors, including interactions with teachers and peers, classroom environment, and study methods. Although the L2LE is recognized as an important motivational factor (e.g., Lamb, 2012; Takahashi & Im, 2020; Teimouri, 2016), it has received less attention compared to the other constructs. A recent study by Huang et al. (2024) demonstrated the potency of the L2LE within the L2MSS and its influence on IL2 and OL2. They concluded that L2LE, encompassing the immediate learning environment, should serve as the strongest predictor for intended effort and proficiency. Research indicates that the quality of students’ learning experiences plays a critical role in their motivation to continue learning. Despite its potential importance, the L2LE remains underexplored, and its broad scope has prompted calls for more detailed studies to break down its components and better understand its influence on language learning.
Researchers investigating the L2MSS often focus on the IL2 and OL2 and their connections to intended effort, as seen in studies like Yashima et al. (2017) and Papi (2010). While these constructs are strong predictors of intended effort, their influence on actual achievement is less certain. Fewer studies, however, have examined the role of the L2LE in relation to these constructs. Some research suggests a strong relationship between the L2LE and both the IL2 and OL2, while others, such as Csizér and Kormos (2009), report weaker associations. Yashima et al. (2017) explored the impact of L2LE, redefined as communication orientation and grammar-translation orientation, and found positive influences on both IL2 and OL2, highlighting the importance of the L2LE, particularly in the Japanese context.
Engagement
In second language acquisition (SLA) research, learning engagement is typically categorized into three main dimensions: affective engagement (AE), behavioral engagement (BE), and cognitive engagement (CE). AE relates to students’ emotional reactions and attitudes toward learning and feedback. In the context of WF, researchers found that students generally react well to feedback that is detailed and balanced, without overwhelming corrections, which boosts their motivation and self-efficacy (Ene & Yao, 2021; Purnomo & Pahlevi, 2021). However, the type of feedback affects engagement, with feedback focused on form often receiving more positive responses than content-focused feedback (Fan & Xu, 2020). The tone and delivery of feedback are important in influencing emotional responses and maintaining engagement (Yu et al., 2020). Additionally, language proficiency plays a role in how students engage, as high- and low-proficiency learners show different reactions, particularly to localized feedback (Cheng & Liu, 2022). In EFL settings, students may respond better to feedback from native English-speaking teachers compared to non-native teachers (Ene & Yao, 2021). AE is closely connected to both CE and BE, forming a complex, dynamic relationship (Cheng & Liu, 2022; Fan & Xu, 2020).
BE refers to the observable actions and efforts that learners take in response to WF, commonly measured by the extent and quality of their revisions (Fu et al., 2024). Although revisions are a primary indicator, CE and AE may still occur even if feedback does not lead to immediate changes (Fu et al., 2024). Proficiency also impacts how students engage, with higher proficiency learners showing greater cognitive involvement in response to teacher feedback (Cheng & Liu, 2022). In peer feedback situations, BE tends to be most evident, followed by CE and AE (Farsani & Aghamohammadi, 2021). Activities that require deeper cognitive processing generally lead to a higher uptake of written corrective feedback (WCF) during revisions, although results can differ depending on the type of error (Park & Ahn, 2022). Factors such as the quality of feedback, proficiency level, and well-structured peer review activities also significantly affect engagement (Fu et al., 2024). These insights point to the intricate and interconnected nature of BE, CE, and AE in L2 acquisition and EFL contexts.
CE refers to the mental processes learners utilize when engaging with WF. This includes understanding, analyzing, and applying feedback, which encourages reflection on language use and fosters deeper cognitive processing, potentially leading to better writing outcomes. Activities that stimulate deeper CE are associated with a higher uptake of WCF during revisions (Park & Ahn, 2022). Research shows that CE is most common with teacher feedback, while BE is more typical in peer feedback scenarios (Farsani & Aghamohammadi, 2021). Language proficiency also shapes how learners engage, with differences in how low- and high-proficiency learners handle localized feedback, though both groups engage similarly with global feedback (Cheng & Liu, 2022). Additionally, self-regulation abilities influence engagement, with stronger self-regulators exhibiting higher levels of CE, BE, and AE compared to their less-skilled peers (Yang & Zhang, 2023). These findings emphasize the complexity of student engagement with WCF and the need to tailor feedback to maximize cognitive involvement and support writing improvement.
Research has shown that the three factors of engagement (affective, behavioral, and cognitive) are closely linked and inseparable. AE leads to increased BE and CE, as demonstrated by Ebadi et al. (2024), indicating that when learners experience positive emotions, they are more likely to actively participate and invest mental effort in language learning. BE is also closely tied to AE and CE, as shown in Cheng and Zhang’s (2024b) study, which found that students who actively participated were more likely to develop positive attitudes and engage in deeper cognitive processing of language input. CE, in turn, influences both AE and BE. Pearson’s (2024) systematic review concluded that learners who are cognitively engaged tend to experience more positive emotions and show greater participation. In summary, these three engagement factors are intertwined: enhancing one boosts the others. Therefore, it is essential for language teachers to create environments that foster all three aspects of engagement.
Motivation and engagement
The constructs of motivation and engagement are closely intertwined, forming a dynamic and reciprocal relationship. Motivation is often considered a broad concept encompassing interest and engagement (Renninger et al., 2018). Engagement, in contrast, has been described as “motivation-plus,” acting as an extension of motivation and its critical application (Mercer & Dörnyei, 2020). Research underscores their interplay: motivation drives higher engagement, which in turn enhances motivation, suggesting a cyclical relationship (Martin et al., 2017; Sulis, 2020). This reciprocal relationship has been observed even before the integration of GenAI into education. In the context of language learning, motivated students tend to show higher levels of engagement, and engaged students, in turn, experience greater motivation (Abdollahzadeh et al., 2022). Studies have consistently found that motivation fosters greater engagement and improved performance in foreign language learning (Kanellopoulou & Giannakoulopoulos, 2020; Noels et al., 2019). Additionally, engagement, particularly CE, has been identified as a predictor of motivation among EFL learners (Ghelichli et al., 2020).
A key question in understanding motivation and engagement is their sequence: do they influence each other, and if so, which comes first? Some researchers argue that motivation precedes engagement. Reeve (2012) posits that students’ internal motivational resources enhance classroom engagement. Similarly, Reeve et al. (2004) found that teachers trained in autonomy-supportive practices exhibited behaviors that increased students’ engagement. Anderman and Patrick (2012) propose that motivation, through goals, precedes various types of engagement, such as cognitive (e.g., self-regulation), emotional (e.g., positive feelings about school), and behavioral (e.g., effort). Other researchers make distinctions between will (motivation) and skill (engagement) (Cleary & Zimmerman, 2012; Covington, 2000). Schunk and Mullen (2012) describe engagement as the manifestation of motivation, while Voelkl (2012) sees engagement as an intermediary between motivation and achievement. Pekrun and Linnenbrink-Garcia (2012) similarly suggest that engagement mediates the relationship between emotion and achievement, and Ainley (2012) argues that motivation (via interest) leads to achievement through engagement.
Previous empirical studies further clarify this relationship. Froiland and Worrell (2016) demonstrated that intrinsic motivation positively influences academic performance indirectly through classroom engagement. Their findings were replicated among African American and Latino students. In a longitudinal study, Froiland and Davison (2016) showed that intrinsic motivation in mathematics predicted engagement (e.g., enrollment in advanced mathematics courses), which subsequently led to higher achievement. These findings collectively suggest that motivation serves as the driving force behind engagement, though Reschly and Christenson (2012, p. 14) caution that “motivation is necessary but not sufficient for engagement.” Hence, in this study, we acknowledge the intricate relationship between motivation and engagement, which some interpret as cyclical. However, we emphasize the predictive power of motivation for engagement, particularly with the newly introduced GenAI, which, as an intermediary, can lead to higher engagement but is not necessarily required for engagement to promote motivation.
Motivation influences engagement in several ways. For example, L2LE, the strongest predictor in the L2MSS, encompasses the immediate learning environment. One high school study by Shernoff et al. (2014), using the experience sampling method, demonstrated that the learning environment can predict engagement, although this study focused on five different subject areas. Similarly, research by Saeed and Zyngier (2012) on grade five and six students and Liu et al. (2024) on sophomore university students both provided solid evidence of motivation influencing engagement. Therefore, fostering students’ motivation is crucial in promoting sustained engagement and successful language acquisition (Noels et al., 2019). As shown by Huang and Mizumoto (2024b), the use of GenAI can help sustain and enhance students’ motivation. Consequently, even with the introduction of GenAI in the classroom, students’ engagement is likely to increase as well.
Motivation and GenAI utilization
Although the use of GenAI in EFL classrooms is still in its early stages, several studies have already shown a positive correlation between AI use and student motivation. One study demonstrated that AI-mediated context positively influenced students’ motivation, though its effect on learning performance was inconclusive (Leong et al., 2024). Similarly, another study found that AI-mediated instruction not only improved English learning motivation but also enhanced self-regulated learning and learning outcomes (Yang, 2024). Both studies focused on how AI affects students’ motivation, rather than the other way around. Zheng et al. (2024), using the Unified Theory of Acceptance and Use of Technology 2, found that motivation is a strong predictor of behavioral intention, which ultimately leads to actual technology usage. Lai et al. (2023) also observed that intrinsic motivation influences behavioral intention in the Technology Acceptance Model after the classroom adoption of ChatGPT. Similarly, Huang and Mizumoto (2024a) concluded that OL2 motivation significantly affects AI usage in language learning, although they emphasized the need for further investigation into other constructs within the L2MSS that may have a similar impact.
GenAI and engagement in language learning
ChatGPT has shown promise in educational settings, benefiting both students and teachers. A recent systematic review by Lo et al. (2024) on student engagement with ChatGPT found mixed results regarding AE, with positive aspects like satisfaction and interest, alongside negative ones like disappointment and anxiety. There is limited yet significant evidence of BE, including instances of both engagement and disengagement, with some use cases involving academic dishonesty. CE was noted, but it was relatively weak, with improvements in understanding but a decline in critical thinking. The review covered not only language learning but also other educational areas. An intervention study by Wang and Xue (2024) focusing on Chinese EFL students using AI-powered chatbots reported improvements in all three areas of engagement. The AI chatbots played a key role in enhancing these engagements. Another study by Rad et al. (2023) explored the use of an AI tool in L2 writing classes and found that students significantly improved their writing outcomes, engagement, and feedback literacy. Two recent studies (Huang & Teng, 2025; Teng & Huang, 2025) examined the effect of ChatGPT utilization on engagement in China and Japan, concluding that all three factors increased following its use. These findings suggest the need for further research to explore how AI tools like ChatGPT can effectively support student engagement in language learning.
Research question
Previous research indicates that motivation influences the use of technology, which in turn affects student engagement. Applying the principle of mediation, we can infer that motivation impacts the use of AI technology and indirectly affects student engagement through its usage. We also acknowledge the versatility of GenAI, which includes capabilities for generating images, videos, and music. However, this study only utilized the text-based functions of ChatGPT; therefore, the actual AI usage in this study is defined as the use of the ChatGPT chatbot. Figure 1 summarizes these previous findings and illustrates the hypothetical pathways for this study. This empirical study aimed to validate these findings by addressing the following research question:

Figure 1. Hypothetical pathways.
RQ: Does GenAI usage mediate the relationship between motivation and engagement?
Material and methods
Participants
The study involved 174 second-year students from a private university in the Kansai region of Japan, with 79 males and 95 females. All participants, selected through convenience sampling, were enrolled in a required advanced reading and writing course. Although their English proficiency was not formally evaluated, it was assumed to range from A2 to B2 on the Common European Framework of Reference (CEFR), based on previous research by Aizawa et al. (2020). Additionally, all participants had completed at least eight years of compulsory English education in primary and secondary school, as well as one year of mandatory English instruction during their first year of university. While all students completed the course and required assignments, including pre- and post-writing, six students did not complete the survey. Consequently, only 168 students were included in the Structural Equation Modeling (SEM) analysis.
Instruments
In this study, a set of 37 questions, accessible on OSF (https://osf.io/3rs8t/), was adapted from the work of Taguchi et al. (2009), Liu and Ma (2023), and Cheng and Zhang (2024a). Taguchi et al.’s (2009) L2MSS questionnaire, originally developed for Japanese participants, is fully available in Japanese in Dörnyei (2003). These questions evaluated three core aspects of the L2MSS: IL2 (5 items), OL2 (4 items), and L2LE (4 items). Liu and Ma’s (2023) survey was designed to measure two constructs of the Technology Acceptance Model (TAM): Behavioral Intention (3 items) and Actual Usage (6 items). For clarity, Actual Usage refers to the Actual AI Usage, which will be used hereafter. Cheng and Zhang’s (2024a) instrument was modified to assess three dimensions of engagement after using GenAI: affective (4 items), behavioral (6 items), and cognitive (6 items).
The questionnaire was developed in Japanese to guarantee precise linguistic comprehension and ensure cultural relevance. A pilot test was conducted with a small group of participants to confirm the clarity and appropriateness of the questions. Based on their feedback, necessary revisions were made to refine the questionnaire. Additionally, back-translation methods were employed to verify the accuracy of the Japanese content against the original English version. These steps ensured that the questionnaire accurately captured the intended constructs without linguistic ambiguity. To ensure a diverse order and uphold the integrity of the survey, the university’s online learning management system randomized the sequence of the questions so that each participant received them in a different order. However, all participants completed the full set of 37 questions. If a survey was incomplete, a prompt was triggered, reminding participants to finish it. The survey was administered in Japanese, and responses were recorded using a 6-point Likert scale, where 1 represented “strongly disagree” and 6 represented “strongly agree.”
Writing instruction
Following the same collection method as Huang and Mizumoto (2024a, 2024b, 2024c), students used GenAI, specifically ChatGPT, during two writing workshops throughout the semester, each lasting two weeks. These workshops were integrated into the regular academic semester and were taught by three different teachers, encompassing a total of 10 distinct classes. All participating teachers underwent standardized training on the integration of GenAI in writing workshops. This training included structured modules covering the functionality of GenAI tools, guidelines for workshop implementation, and best practices for facilitating discussions on AI-assisted writing. To ensure consistency, pre-training and post-training assessments were conducted, confirming the uniformity of instructional delivery and minimizing variations across teachers.
In groups of five, students collaboratively crafted essays consisting of five paragraphs: an introduction, three body paragraphs, and a conclusion. Each student handled a specific paragraph, with essay topics aligned with class readings. Week one of each workshop focused on creating flow charts, outlines, and drafts. Unfinished sections became homework assignments. In week two, students brought their typed paragraphs to class for ChatGPT feedback, guided by carefully crafted prompts (see Huang, 2023). These three distinct prompts were specifically designed to review different sections of an essay (introduction, body paragraphs, and conclusion) against detailed criteria. The criteria for the introduction include the hook, background information, thesis statement, organization, and clarity. For the body paragraphs, the criteria are topic sentence, development/support, organization/structure, transitions, analysis/critical thinking, and clarity/coherence. For the conclusion, the criteria encompass the restatement of the thesis, summary of key points, closing thought, connection to the introduction, and concluding sentence. The prompts provided feedback for improvement without rewriting the content. To avoid over-reliance on GenAI and ensure accuracy, students were required to submit the feedback they received from ChatGPT via the university’s online learning management system for evaluation by the teachers. Additionally, students had to document any changes on paper to demonstrate their work and prevent copy-pasting. Final group essays and revised individual drafts were submitted at the end of week two.
Data analysis
We used R (version 4.3.3) to analyze the collected data. We began with basic descriptive statistics to capture the dataset’s central tendencies and variability. We then tested the assumption of multivariate normality using the Henze–Zirkler test, which was not satisfied. Consequently, we adopted weighted least squares mean and variance adjusted (WLSMV) estimation for robust parameter estimates (Beauducel & Herzberg, 2006). Further, we evaluated model fit indices to ensure that our statistical models, estimated with SEM, align with our theoretical frameworks. This step was crucial to affirming the adequacy of our models in explaining the observed data. Next, we examined the relationships between constructs to analyze the dataset’s structural relationships.
For reproducibility and transparency, we have made the data and R code accessible on OSF (https://osf.io/3rs8t/).
Results
Descriptive statistics
To illustrate data trends, we examined the distribution of all eight constructs. As shown in Table 1, the mean scores for the constructs ranged from 3.50 to 4.48, with standard deviations between 0.85 and 1.37. Skewness values ranged from −0.55 to −0.05, and kurtosis values ranged from −0.96 to 0.53. Based on Kline’s (2016) recommended cut-off values of ±3.0 for skewness and ±8.0 for kurtosis, the constructs displayed approximately normal univariate distributions. Additionally, Cronbach’s alpha values ranged from 0.77 to 0.91, indicating a high level of internal consistency and reliability of the measurements. Although the Cronbach’s alpha values for Behavioral Intention and BE were the lowest at 0.77, they still demonstrated internal consistency, as argued by Dörnyei and Taguchi (2009, p. 95).
Table 1. Descriptive statistics

Note: The abbreviations used in the data table are as follows: IL2 (Ideal L2 Self), OL2 (Ought-to L2 Self), L2LE (L2 Learning Experience), BI (Behavioral Intention), AU (Actual AI Usage), AE (Affective Engagement), BE (Behavioral Engagement), and CE (Cognitive Engagement)
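As a rough illustration of how internal-consistency values such as those in Table 1 can be obtained, the following Python sketch computes Cronbach’s alpha from item-level responses. It is not the authors’ analysis, which was run in R; the function name cronbach_alpha and the demo responses are hypothetical and are not drawn from the study’s dataset.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 items on a 1-6 Likert scale.
demo = [
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [6, 5, 6, 6],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
alpha = cronbach_alpha(demo)
print(f"alpha = {alpha:.2f}")
```

The sketch uses sample variances (n − 1 in the denominator) for both the items and the summed score, which is the usual convention; alpha rises as the items covary more strongly relative to their individual variances.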
Correlation coefficients
Figure 2 showcases the Pearson correlation coefficients (r) among the variables. The observed correlation coefficients display considerable strength, indicating a strong level of interrelatedness within the dataset. This finding suggests the data is highly suitable for multivariate analyses, such as SEM.

Figure 2. Histograms and correlation coefficients.
Goodness-of-fit indices
The goodness-of-fit indices shown in Table 2 demonstrate a robust model fit. Given the data did not meet the multivariate normality assumption, we used a robust estimation method. Consequently, we report robust fit indices, detailed in the supplementary material on OSF (https://osf.io/3rs8t/). The RMSEA and SRMR values of 0.048 and 0.069, respectively, are below the recommended thresholds of RMSEA < 0.07 and SRMR < 0.08, indicating good agreement between the model’s predictions and the observed data (Hooper et al., 2008). Similarly, the CFI and TLI values of 0.981 and 0.979, respectively, approaching 1, suggest a good fit (Hooper et al., 2008). Overall, these goodness-of-fit indices support a solid model fit, aligning with our research objectives and providing a reliable basis for our conclusions.
Table 2. Goodness-of-fit indices

Note: RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; CFI = comparative fit index; TLI = Tucker–Lewis index.
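The comparison of the reported fit indices against their cut-offs can be made explicit with a short Python sketch. The RMSEA and SRMR thresholds are those quoted in this section; the 0.95 floor for CFI and TLI is an assumption added here for illustration (a commonly used convention), since the text only describes these values as approaching 1. The names reported, cutoffs, and check_fit are hypothetical.

```python
# Fit indices as reported in this section.
reported = {"RMSEA": 0.048, "SRMR": 0.069, "CFI": 0.981, "TLI": 0.979}

# ("max", x): value must be below x; ("min", x): value must be at least x.
# RMSEA and SRMR limits come from the text; the 0.95 floor for CFI/TLI
# is an ASSUMPTION for illustration, not a threshold stated in the paper.
cutoffs = {
    "RMSEA": ("max", 0.07),
    "SRMR": ("max", 0.08),
    "CFI": ("min", 0.95),
    "TLI": ("min", 0.95),
}


def check_fit(values, rules):
    """Return {index_name: True/False} for each cut-off rule."""
    result = {}
    for name, (kind, limit) in rules.items():
        value = values[name]
        result[name] = value < limit if kind == "max" else value >= limit
    return result


fit_ok = check_fit(reported, cutoffs)
for name, ok in fit_ok.items():
    print(f"{name}: {reported[name]:.3f} -> {'meets' if ok else 'misses'} cut-off")
```

Keeping the thresholds in a single table makes it easy to swap in stricter or more lenient conventions without touching the comparison logic.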
SEM
The SEM results (Figure 3) supported the influence of motivation on GenAI usage in the classroom, with usage further impacting students’ engagement. The L2LE has a moderate effect on Actual AI Usage (β = .33). Additionally, L2LE indirectly affects Actual AI Usage through OL2 and Behavioral Intention. The diagram also shows that Actual AI Usage strongly influences all three types of engagement (affective, behavioral, and cognitive), with standardized coefficients of β = .80, β = .83, and β = .80, respectively. To improve the model fit, error variances were adjusted due to item similarities. Items 3 and 4 both address the speaker’s ability to speak English, while Items 8 and 9 reflect the speaker’s motivation to study English driven by parental expectations. Items 12 and 13 convey a positive attitude towards learning English, and Items 20 and 21 emphasize the use of ChatGPT to improve English language skills. Furthermore, the three engagement factors showed covariance (.63, .99, and .65), underscoring the interrelatedness of these aspects of engagement.

Figure 3. Path diagram of structural equation modeling results.
Discussion
Based on the SEM results, GenAI usage mediates the relationship between motivation and engagement, with motivation reaching usage both directly and indirectly. The motivational construct L2LE directly influences Actual AI Usage (β = .33), aligning with previous research indicating a positive correlation between GenAI usage and motivation (e.g., Leong et al., Reference Leong, Pataranutaporn, Danry, Perteneder, Mao and Maes2024; Yang, Reference Yang2024). Indirectly, motivation affects Actual AI Usage through the OL2 construct and Behavioral Intention. The pathway from L2LE to OL2 is strong (β = .81), consistent with Yashima et al.’s (Reference Yashima, Nishida and Mizumoto2017) findings. OL2 then influences the Behavioral Intention construct (β = .59), in line with previous research (e.g., Lai et al., Reference Lai, Cheung and Chan2023; Zheng et al., Reference Zheng, Wang, Liu and Jiang2024). This finding deviates from Huang and Mizumoto’s (Reference Huang and Mizumoto2024a) previous results, in which OL2 influenced Actual AI Usage; however, the present study overcomes the limitation of a small sample size. Furthermore, Behavioral Intention influences Actual AI Usage (β = .59), following the same pathway as the original technology acceptance model and as verified in Huang and Mizumoto’s (Reference Huang and Mizumoto2024a) study. Taken together, these paths indicate that motivation influences Actual AI Usage. Actual AI Usage, in turn, influences all three factors of engagement, with strong effects on AE (β = .80), BE (β = .83), and CE (β = .80). These findings align with previous research (e.g., Lo et al., Reference Lo, Hew and Jong2024; Rad et al., Reference Rad, Alipour and Jafarpour2023; Wang & Xue, Reference Wang and Xue2024), where AI utilization in classrooms positively affects all three factors of engagement.
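To make the direct and indirect routes concrete, the following sketch applies the standard path-tracing rule, under which an indirect effect is the product of the standardized coefficients along a chain. It assumes that L2LE → OL2 → Behavioral Intention → Actual AI Usage is the only indirect route from L2LE to Actual AI Usage in the model; the resulting indirect and total effects are illustrative arithmetic and are not values reported in this study.

```python
from math import prod

# Standardized path coefficients as reported in the text.
direct = 0.33                # L2LE -> Actual AI Usage
chain = [0.81, 0.59, 0.59]   # L2LE -> OL2 -> Behavioral Intention -> Actual AI Usage

# Path-tracing rule: multiply coefficients along the chain.
# Assumption: no other indirect routes exist between L2LE and Actual AI Usage.
indirect = prod(chain)
total = direct + indirect

print(f"indirect effect: {indirect:.3f}")
print(f"total effect:    {total:.3f}")
```

Under this assumption the indirect route contributes roughly .28, which is comparable in size to the direct path of .33.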
Furthermore, the SEM pathways illustrate that all three factors are interrelated, with AE showing a covariance of .63 with BE and .65 with CE. Additionally, BE demonstrated a covariance of .99 with CE. These findings are also consistent with previous research (e.g., Cheng & Zhang, Reference Cheng and Zhang2024b; Ebadi et al., Reference Ebadi, Zandi and Ajabshir2024; Pearson, Reference Pearson2024), which suggests that the three constructs are inseparable; an increase in one results in an increase in the others. It is also worth mentioning that even after the introduction of GenAI usage in the classroom, the intercorrelation among the three factors of engagement remains intact. Thus, based on the empirical evidence of this study, Actual AI Usage in the classroom appears to mediate the relationship between students’ motivation and engagement.
L2LE is considered the most crucial motivational factor of the L2MSS (e.g., Lamb, Reference Lamb2012; Takahashi & Im, Reference Takahashi and Im2020; Teimouri, Reference Teimouri2016) and includes various elements such as classroom interactions, environment, and learning methodology; the introduction of GenAI fits naturally within this construct. Not only students but also teachers interact with GenAI in the classroom environment, necessitating a new teaching pedagogy that incorporates this technology. Despite concerns from the research community about overreliance and plagiarism, these issues can be mitigated if educators are well trained and informed. This empirical study demonstrates that these obstacles can be overcome through carefully crafted prompts. Falling back on traditional paper-and-pencil methods to avoid GenAI would be akin to an ostrich burying its head in the sand. Rather than viewing GenAI as a can of worms, educators should leverage it to enrich students’ L2LE. Instead of being seen as an adversary, GenAI should be embraced as a companion (Teng, Reference Teng2024).
Writing in SLA poses significant challenges for EFL learners, particularly when it comes to receiving feedback from teachers, which can also place an extra burden on educators. Feedback is essential for students to improve their writing skills, yet the traditional approach can be taxing for both parties. This study demonstrated how GenAI can assist teachers by reducing their workload, while ensuring that every student still receives effective feedback. Students might feel more comfortable receiving feedback at their own pace, which can reduce anxiety. The results showed an improvement in writing coherence. This finding highlights the effectiveness of GenAI in enhancing writing coherence, creating a win-win situation for both students and teachers. By implementing GenAI more widely, students can receive valuable feedback without placing additional strain on teachers. Furthermore, teachers can craft prompts that focus on other aspects of writing, such as accuracy, complexity, or fluency, enhancing the learning experience. In essence, integrating GenAI into the feedback process can significantly benefit EFL learners and educators alike.
Despite these positive outcomes, this study has limitations. First, convenience sampling was used, limiting the generalizability of the findings to other populations. The results may not apply to students of different ages, locations, or institutions. Second, this study measured students’ writing performance solely in terms of cohesion; other important aspects, such as accuracy, complexity, and fluency, were not considered. Third, the study was conducted over a single semester, a relatively short period. It is possible that the observed effects of GenAI on motivation, engagement, and writing performance might not persist over a longer timeframe. Finally, the study focused specifically on ChatGPT, a single GenAI tool, so the findings may not extend to other GenAI tools, such as Google’s Gemini or Microsoft’s Copilot.
To address these limitations, future studies should aim for more diverse and representative sampling across different age groups, geographic locations, and educational institutions to enhance generalizability. Expanding the scope of writing performance measures to include aspects like accuracy, complexity, and fluency would provide a fuller picture of students’ development. Conducting longitudinal research over multiple semesters could reveal whether the effects of GenAI on motivation, engagement, and performance are sustained over time. Additionally, including multiple GenAI tools in the study, such as Google’s Gemini and Microsoft’s Copilot, would allow for comparative analysis, offering insights into the broader applicability of the findings across various platforms.
Conclusion
This empirical study found that GenAI tools such as ChatGPT can positively influence motivation and engagement in EFL learning when thoughtfully integrated into language classrooms. Findings indicate that AI usage supports affective, behavioral, and cognitive engagement, reinforcing the connection between motivation and active participation. The results highlight the potential of AI to create a more interactive and stimulating learning environment, encouraging students to become more involved in their studies. Future studies could expand on this research by examining diverse AI tools and additional writing metrics over longer periods, providing deeper insight into AI’s educational role. Pedagogical implications of integrating GenAI include a need for teacher training, especially in designing effective prompts to maximize AI’s benefits (Kohnke et al., Reference Kohnke, Moorhouse and Zou2023). With such training, educators can harness GenAI’s advantages, enhancing learning experiences for EFL students and creating more engaging classrooms.