Highlights
-
• We examined the impact of simultaneous interpreting (SI) on working memory (WM).
-
• Unlike a single-language task, SI prevented practice-driven WM boosts.
-
• This effect emerged for multimodal input, adjusting for bilingual experience factors.
-
• SI did not have a distinct effect on inhibitory or fluency outcomes.
-
• Such findings inform cognitive models of bilingualism and SI training guidelines.
1. Introduction
A distinct trait of bilingualism is the possibility to engage in simultaneous interpreting (SI), the immediate rendition of oral messages from one language into another (García et al., Reference García, Muñoz and Kogan2019). This activity is proposed to greatly tax working memory (WM), but causal designs are wanting, especially including other tasks and cognitive functions (Timarová, Reference Timarová2008). A fertile ground is thus laid to examine the real-time impact of specific bilingual skills on domain-general systems. Here, we tackle this gap via a cognitive exertion paradigm, comparing WM (and other executive) outcomes before and after SI, relative to a control language processing task.
SI ranks among the most demanding cognitive activities enabled by bilingualism (García, Reference García2014; García et al., Reference García, Muñoz and Kogan2019; Hervais-Adelman et al., Reference Hervais-Adelman, Moser-Mercer, Michel and Golestani2015). It requires perceiving, accessing and recalling source-language information to reformulate it in a different language as new input segments are processed (Ahrens, Reference Ahrens, Schwieter and Ferreira2017; Chernov, Reference Chernov2004; Gerver, Reference Gerver and Brislin1976; Paradis, Reference Paradis1994, Reference Paradis2009). Such operations occur iteratively, with input typically delivered at >120 words per minute (Chernov, Reference Chernov2004; Gerver, Reference Gerver1975), ear-voice spans lasting 2–4 seconds (Anderson, Reference Anderson, Lambert and Moser-Mercer1994; Gerver, Reference Gerver and Brislin1976) and input–output overlaps accounting for 70% of total processing time (Chernov, Reference Chernov, Lambert and Moser-Mercer1994). These challenges are such that, to prevent task disengagement, reduced output quality and early burnout, professionals are recommended not to perform SI for more than 30 minutes at a time (AAIC, 2025; García, Reference García2019; Moser-Mercer et al., Reference Moser-Mercer, Künzli and Korac1998).
Unsurprisingly, the activity entails diverse cognitive costs. SI increases stress responses such as cortisol (Moser-Mercer et al., Reference Moser-Mercer, Künzli and Korac1998) and heart rate (Rojo López et al., Reference Rojo López, Foulquié, López and Martínez Sánchez2021), especially under heightened time pressure (Korpal, Reference Korpal2016). Also, relative to single-language tasks, like text listening and shadowing, SI involves greater self-reported mental workload (Boos, Reference Boos, Kobi, Elmer and Jäncke2022), objective cognitive exertion markers, including increased pupil dilation (Hyönä et al., Reference Hyönä, Tommola and Alaja1995; Seeber & Kerzel, Reference Seeber and Kerzel2011), and activation along cortico-subcortical executive brain regions (Hervais-Adelman et al., Reference Hervais-Adelman, Moser-Mercer and Golestani2011, Reference Hervais-Adelman, Moser-Mercer, Michel and Golestani2015). As advanced by several models (Gile, Reference Gile1999; Mizuno, Reference Mizuno2005; Seeber, Reference Seeber2011), SI would place distinct demands on WM, a system supporting the transient storage and combination of information as other processes unfold (Hervais-Adelman & Babcock, Reference Hervais-Adelman and Babcock2020).
Indeed, SI entails increased pupil dilation when processing WM-taxing stimuli – for example, asymmetrical structures such as verb-final constructions, as found in Japanese or German (Seeber & Kerzel, Reference Seeber and Kerzel2011). Also, relative to single-language tasks, SI involves higher theta power (a neural marker of WM demands) (Boos, Reference Boos, Kobi, Elmer and Jäncke2022) and prefrontal activation peaks during input–output overlaps (i.e., instances that maximally tax WM) (Hervais-Adelman et al., Reference Hervais-Adelman, Moser-Mercer, Michel and Golestani2015). Moreover, SI quality correlates with WM outcomes (Bae & Jeong, Reference Bae and Jeong2021; Injoque-Ricle et al., Reference Injoque-Ricle, Barreyro, Formoso and Jaichenco2015; Mellinger & Hanson, Reference Mellinger and Hanson2019), and sustained practice is consistently linked to advantages in this domain (as SI continuously requires recalling and integrating input while concurrent processes are deployed) (Chernov, Reference Chernov2004). Such demands seem to peak in the L1–L2 direction, which, relative to the L2–L1 direction, involves greater modulation of WM-relevant brain mechanisms, such as inferior frontal activation (García, Reference García2019; Rinne et al., Reference Rinne, Tommola, Laine, Krause, Schmidt, Kaasinen and Sunnari2000; Tommola et al., Reference Tommola, Laine, Sunnari and Rinne2000), temporo-frontal connectivity (Zheng, et al., Reference Zheng, Báez, Su, Xiang, Weis, Ibáñez and García2020) and frontal theta oscillations (Pérez et al., Reference Pérez, Hesse, Dottori, Birba, Amoruso, Martorell Caro, Ibáñez and García2022).
Of note, WM taxation during SI may depend on cognitive demands. Unlike untrained multilinguals, interpreters show training effects in highly taxing (multimodal n-back) WM tasks but not in simpler (single-domain) tasks (Morales et al., Reference Morales, Padilla, Gómez-Ariza and Bajo2015). Suggestively, no systematic expertise-related advantages are observed in less critically taxed domains, like inhibition (as both languages need to be simultaneously active rather than alternatively suppressed) and verbal fluency (since SI does not require producing arbitrary sequences of decontextualized words) (García et al., Reference García, Muñoz and Kogan2019).
SI, then, would distinctly tax WM. However, evidence for this claim remains limited. Relevant studies employ cognitively ambiguous markers (e.g., pupil dilation) (Seeber & Kerzel, Reference Seeber and Kerzel2011), WM tests done in the absence of actual SI (Boos, Reference Boos, Kobi, Elmer and Jäncke2022; García et al., Reference García, Muñoz and Kogan2019) or task-related neural measures failing to isolate or manipulate WM (Boos, Reference Boos, Kobi, Elmer and Jäncke2022; Hervais-Adelman et al., Reference Hervais-Adelman, Moser-Mercer, Michel and Golestani2015). Moreover, causal evidence is scant and limited to comparisons of outcomes before and after months-long training programs. SI practice seems to boost WM performance (Ünlü & Şimşek, Reference Ünlü and Şimşek2018) and to lower the engagement of WM-relevant brain regions during the task (Hervais-Adelman et al., Reference Hervais-Adelman, Moser-Mercer, Michel and Golestani2015), suggesting that this domain is systematically recruited by practicing interpreters. Conversely, other cognitive functions, such as inhibitory skills (Van de Putte et al., Reference Van de Putte, De Baene, García-Pentón, Woumans, Dijkgraaf and Duyck2018), do not exhibit changes before and after SI training, indicating that they are less systematically recruited by this activity. Overall, the real-time impact of SI on WM remains poorly understood, precluding insights into how particular domain-general skills are recruited during specific bilingual processes.
Such gaps can be tackled via cognitive exertion paradigms. In these paradigms, specific domains are assessed before and after a given task, on the assumption that those more critically engaged will be drained and hence impaired upon post-task testing. This approach has revealed critical links between particular tasks and circumscribed cognitive functions, by revealing exertion of inhibitory capacity during emotional suppression (Wessel et al., Reference Wessel, Huntjens and Verwoerd2010), learning capacity during extended periods of instruction (Chen et al., Reference Chen, Castro-Alonso, Paas and Sweller2018), self-regulatory resources during high attentional exertion (Converse & DeShon, Reference Converse and DeShon2009) and positive emotion mechanisms during overly enthusiastic behavior (Maranges et al., Reference Maranges, Brandon, Schmeichel and Baumeister2017). More directly, a study on phonological interference effects during SI found that number recall dropped when digit lists were simultaneously translated (as opposed to simply heard or repeated in the same language) before being uttered (Darò & Fabbro, Reference Darò and Fabbro1994). Similar designs could reveal the causal impact of SI on WM, as the need to transiently store, recall and integrate input information during output production would exhaust relevant mnesic mechanisms and render them suboptimally available for further WM tasks.
Against this background, we conducted a cognitive exertion study on SI. We asked SI trainees to perform a validated WM task (involving unimodal and multimodal stimuli) before and after (a) a 10-minute-long L1–L2 SI session or (b) a control text comprehension (TC) task. To test for specificity, participants also completed tasks tapping other SI-relevant domains (viz., inhibitory control and context-free vocabulary navigation, via Stroop and verbal fluency tasks). We hypothesized that L1–L2 SI would distinctly reduce WM scores, as opposed to less critical mechanisms, such as inhibitory and fluency resources. We further predicted that such an effect would be specific to SI, compared to TC. Finally, we anticipated that the effect would be uninfluenced by inter-individual variability in bilingualism-related variables (age of L2 appropriation,Footnote 1 years of use of L2, L2 proficiency, SI competence). Strategically, the use of a high-demand (multimodal integration–recall) and a low-demand (unimodal recall-only) condition allowed testing whether predicted effects were effort-dependent. With this approach, we aim to inform models linking bilingual abilities to specific executive domains.
2. Method
2.1. Participants
We recruited 55 right-handed Chinese-English bilinguals enrolled in the Master’s in Translation and Interpreting at the University of Electronic Science and Technology of China (UESTC). They all had normal vision and hearing and reported no neurological impairment. Participation was compensated with course credits. Participants were randomly assigned to the L1–L2 SI or the TC task. Upon removal of five participants due to poor task compliance or withdrawal, the final sample of 50 (Figure 1A) involved 26 SI participants and 24 TC participants – reaching a power of 1 (Supplementary Material 1). Demographic and language profile data for group matching were collected via the Translation and Interpreting Competence Questionnaire (TICQ), a validated (Schaeffer et al., Reference Schaeffer, Huepe, Hansen-Schirra, Hofmann, Muñoz, Kogan, Herrera, Ibáñez and García2020) and previously reported (Chou et al., Reference Chou, Hu, Muñoz and García2021; Jacob et al., Reference Jacob, Schaeffer, Oster and Hansen-Schirra2024) online self-rating instrument gathering quantitative and qualitative data on demographics, language history and translation/interpreting competence. To favor between-group matching, SI and TC participants were pooled from the same master’s courses. TICQ data showed that both groups were matched for sex, age and education, as well as age of L2 appropriation, years of L2 use, L2 proficiency, SI competence and weekly dedication to SI (Table 1). Importantly, all participants showed upper-intermediate levels of L2 proficiency on the TICQ (on a scale of 0–100), consistent with the incidental observation that they each possessed at least one language proficiency certification (Test for English Majors, level 4 or higher; the China Accreditation Test for Translators and Interpreters, level 3; or the International English Language Testing System). All participants signed written informed consent. The study protocol was approved by the institutional ethics committee and adhered to the Declaration of Helsinki.

Figure 1. Study design. (A) Fifty Chinese-English bilinguals were randomly assigned to the SI group (n = 26) and to the TC group (n = 24), both matched for demographic and linguistic variables. (B) The experiment involved a pre-exertion phase (including the WM assessment and control tasks), an exertion phase (involving an SI task for the SI group and a TC task for the TC group) and a post-exertion phase (featuring the same initial tasks in reverse order). SI = simultaneous interpreting, TC = text comprehension.
Table 1. Participants’ demographic and linguistic profiles

Note: Data are shown as mean (SD). Statistical comparisons were performed via chi-squared test for sex and via ANOVA for every other variable. SI = simultaneous interpreting, TC = text comprehension.
2.2. Study design
The study consisted of three phases (Figure 1B). First, in the pre-exertion phase, participants completed three control tasks (Stroop, semantic fluency, phonemic fluency) and then performed the WM task. Second, in the exertion phase, the SI group performed an L1–L2 SI task, while the TC group completed a TC task. Finally, in the post-exertion phase, participants completed the same tasks of the pre-exertion phase, in reverse order. Across phases, participants sat at a desk in a dimly lit room, facing a desktop computer, with no external distractions. All instructions were delivered orally in L1 (Chinese). Overall, the experiment lasted roughly 2 hours.
2.2.1. Pre-exertion phase
Working memory task. WM was assessed via a widely validated task (Baddeley et al., Reference Baddeley, Allen and Hitch2011; Birba et al., Reference Birba, Hesse, Sedeño, Mikulan, García, Ávalos, Adolfi, Legaz, Bekinschtein and Zimerman2017; Brockmole et al., Reference Brockmole, Parra, Sala and Logie2008) displaying stimulus arrays on a computer screen (Figure 2). Validated in diverse populations (Brockmole et al., Reference Brockmole, Parra, Sala and Logie2008; Parra et al., Reference Parra, Abrahams, Logie, Méndez, Lopera and Della Sala2009, Reference Parra, Della Sala, Logie and Morcom2014), this task involves high WM exertion under multimodal integration conditions. Importantly, its non-verbal nature circumvents language-proficiency confounds and priming effects triggered by the SI text. The task included two conditions, tapping on (a) recall and (b) integration–recall skills. In the ‘recall-only’ condition, participants viewed initial arrays of black shapes and, after a delay, indicated whether a test array featured the same shapes. In the ‘integration–recall condition’, initial shapes were presented in different colors and, after a delay, participants indicated whether the test array presented the same shape–color combinations. A total of 8 six-sided random polygons and eight colors were utilized for the tasks. Stimuli were presented with a standard 17-inch PC monitor with a resolution of 1920*1080 pixels, each occupying 1° of visual angle and positioned within a 10° area. Participants were seated at a viewing distance of 70 cm from the screen.

Figure 2. Trial structure in the WM task. The task involves an initial array of items, followed by a retention period and then a test array. Stimuli in the test array may or may not match those of the initial array. The top row illustrates the ‘recall-only’ condition, tapping on unimodal WM retrieval by manipulating black shapes. The bottom row exemplifies the ‘integration–recall’ condition, assessing multimodal WM retrieval by manipulating both the shapes and their color.
Each trial commenced with a fixation cross displayed for 500 ms, followed by a random interval lasting between 400 and 600 ms. Initial three-item arrays were then presented for 500 ms, during which participants had to memorize the items. This was followed by a 900-ms retention interval, with no stimuli. A three-item test array appeared next for 500 ms, and participants were required to determine whether these stimuli matched those in the initial display by pressing ‘1’ for ‘match’ or ‘2’ for ‘non-match’ (with key assignments counterbalanced across trials). The next trial began only after participants made a response. Half the trials were ‘match trials’ (with initial and test trials being identical), while the other half were ‘non-match trials’ (with different initial and test trials).
In the ‘recall-only’ condition, ‘non-match trials’ were created by replacing two black shapes from the initial array with new black test shapes, requiring participants to rely solely on shape encoding to detect changes. In the ‘integration–recall’ condition, ‘non-match trials’ were created by swapping the colors of two initial items, requiring participants to integrate shape and color information to detect the change. No shape or color was repeated within a single array. Conditions were counterbalanced and trial order was randomized across participants. The task lasts roughly 20 minutes.
Inhibitory control task. Inhibitory control was evaluated via a color–word Stroop task, amply validated in diverse populations, bilinguals with SI experience (Aparicio et al., Reference Aparicio, Heidlmayr and Isel2017; MacPherson, Reference MacPherson2019; Peng et al., Reference Peng, Gao and Mao2017). Facing a computer screen, participants were shown color words (‘red’, ‘yellow’, ‘blue’, ‘green’) typed in different ink colors, and they had to indicate the name of the ink as quickly as possible by pressing a given key (‘f’ for red, ‘j’ for yellow, ‘v’ for blue, ‘n’ for green). The task included 16 combinations of words and ink colors, used to create 40 congruent trials (color words written in their corresponding color) and 60 incongruent trials (color words written in a different color), with each congruent combination shown ten times and each incongruent combination shown five times.
Each trial started with a fixation cross displayed for 500 ms, followed by a stimulus word presented for 1,500 ms, during which participants had to respond. A random interval between 500 and 1,000 ms preceded the following trial. All words were presented in the participants’ L1. Trials were randomly distributed across congruent and incongruent categories. In the pre-exertion phase, the task began with a practice block, featuring four congruent and 12 incongruent trials. Stimulus administration and response recording were handled via E-Prime 3.0 (Psychology Software Tools, Pittsburgh, PA). Altogether, the task lasted roughly 4 minutes.
Context-free vocabulary navigation tasks. Context-free vocabulary navigation skills were evaluated via one-minute verbal fluency tasks, previously reported in SI research (Dottori et al., Reference Dottori, Hesse, Santilli, Vilas, Caro, Fraiman, Sedeño, Ibáñez and García2020; Santilli et al., Reference Santilli, Vilas, Mikulan, Caro, Muñoz, Sedeño, Ibáñez and García2019). The semantic fluency task had participants name as many exemplars of a category as possible, for 60 seconds, first in their L1 and then in their L2. To mitigate practice and priming effects, we counterbalanced the categories ‘animals’ and ‘fruits’ between the pre- and post-exertion phases. Within each phase, participants were presented with the same category for both Chinese and English. As to the phonemic fluency task, participants were asked to generate as many words as possible beginning with specific phonemes, orally in English and in written form in Chinese. The order of languages was counterbalanced between the pre- and post-exertion phases, as were the phonemes used for English (/f/, /s/) and for Chinese (/k/, /t/). Following previous studies, we adopted the written modality for Chinese to circumvent issues related to tonal variations of specific characters (e.g., /ma/ means ‘mother’ with tone 1, ‘bother’ with tone 2, ‘horse’ with tone 3, ‘curse’ with tone 4) (Eng et al., Reference Eng, Vonk, Salzberger and Yoo2019). All responses were audio-recorded in .mp3 format, transcribed by one examiner and subsequently verified by a second examiner to ensure accuracy (a third examiner was summoned to settle the very few instances yielding disagreement). Following validated procedures (Nichols et al., Reference Nichols, Szoeke, Vollset, Abbasi, Abd-Allah, Abdela, Aichour, Akinyemi, Alahdab, Asgedom, Awasthi, Barker-Collo, Baune, Béjot, Belachew, Bennett, Biadgo, Bijani, Bin Sayeed and Murray2019; Selnes, Reference Selnes1991), responses were excluded as invalid if they consisted of either proper names, or repetitions, derivations or affixations of previous responses. In each language, semantic fluency is scored by counting the number of correct, unique words produced within the semantic category in the given time frame. Phonemic fluency scores are calculated by counting the number of correct, non-repeated words in the given time frame.
2.2.2. Exertion phase
The exertion phase involved a specific naturalistic language processing task for each group: SI for the SI group (as our target condition) and TC for the TC group (as our control condition). The stimulus for both tasks consisted of the first 10 minutes of former Singapore Prime Minister Lee Hsien Loong’s speech during the 2022 National Day Rally (https://www.pmo.gov.sg/Newsroom/National-Day-Rally-2022-Chinese). The material comprised 1,977 Chinese characters, delivered at a rate of 3.3 characters per second, and it was presented binaurally via a stereo headset after participants adjusted the volume to their preferred level. As in previous works (Bae & Jeong, Reference Bae and Jeong2021; Boos et al., Reference Boos, Kobi, Elmer and Jäncke2022; Tzou et al., Reference Tzou, Eslami, Chen and Vaid2012), no prior information was provided on the topic – a scenario that mirrored participants’ training conditions and is often encountered in professional settings (Chang et al., Reference Chang, Wu and Kuo2018). Note-taking with pen and paper was allowed during the task.
The SI group was instructed to listen to the speech and simultaneously interpret it into English, preserving as much information as possible while producing idiomatic L2 renditions. The use of the L1–L2 direction was strategic to enhance ecological validity, as (i) it is predominant in professional SI across China and (ii) participants were more acquainted with it than with L2–L1 SI in their training program. Output was recorded through the headset’s built-in microphone and saved as .mp3 files on Xima 3100 Digital Integrated Teaching System V2.0. SI quality was evaluated according to standard criteria, focusing on content, form and delivery (Hamidi & Pöchhacker, Reference Hamidi and Pöchhacker2007; Macías, Reference Macías2006; Pöchhacker, Reference Pöchhacker2001). The TC group was asked to listen attentively so as to respond to five open-ended questions presented in a printed survey at the end of the recording. Questions were formulated in L2 (English) and answered in writing. The use of L2 for this task increased task demands (creating more stringent conditions to test the selectivity of predicted SI-related exertion effects), while favoring comparability with the SI task by mirroring its dual-language nature.
2.2.3. Post-exertion phase
The post-exertion phase was identical to the pre-exertion phase, except that tasks were presented in reverse order, with the WM task first, followed by the Stroop and fluency tasks. Through this sequencing, the WM task was always performed immediately before and immediately after the corresponding exertion task, preventing interference from other tests.
2.3. Statistical analysis
Across tasks, accuracy and response time analyses were performed upon removing outlier values at ±2.5 standard deviations from the corresponding group’s mean. First, WM task accuracy was analyzed using linear mixed-effects models separately for each ‘condition’ (recall-only, integration–recall), with ‘phase’ (pre-exertion, post-depletion) and ‘group’ (SI, TC) as fixed effects, with a random intercept for subjects. To examine whether group differences were driven by variability in WM performance, we first conducted Levene’s tests comparing variances across groups (SI versus TC) and phases (pre-exertion versus post-exertion) for each WM condition (integration–recall and recall-only). In addition, we computed individual change scores (subtraction between the post- and pre-exertion phase outcomes) and visualized their distributions. Group effects on change scores were further assessed using quantile regression at τ = .25, .50 and .75, corresponding to the lower, median and upper parts of the distribution. This approach allowed us to test whether SI selectively influenced participants showing small, typical or large practice-related gains.
Second, for the Stroop task, linear mixed-effects models were separately run on the congruent and incongruent conditions, targeting accuracy and response times (the latter, considering correct responses only). The models included ‘phase’ (pre-exertion, post-exertion) and ‘group’ (SI, TC) as fixed effects and ‘subject’ as a random effect. Additionally, we compared congruent and incongruent trials, for accuracy and response time, using a mixed-effects model with ‘condition’ (congruent, incongruent), ‘phase’ (pre-exertion, post-exertion) and ‘group’ (SI, TC) as fixed effects and ‘subject’ as a random intercept.
Third, fluency tasks’ outcomes were analyzed via linear mixed-effects models, with ‘phase’ (pre-exertion versus post-exertion), ‘language’ (L1 versus L2) and ‘group’ (SI versus TC) as fixed effects, along with their interactions, and a random intercept for ‘subjects’ to account for repeated measures. Across tasks, ANOVA was used to assess the significance of main effects and interactions. To test for robustness despite inter-individual variability in bilingualism-related factors, all analyses were re-run while covarying for age of L2 appropriation, years of L2 use and L2 proficiency (including relevant macroskills), as well as SI competence in the L1–L2 and L2–L1 directions. Post hoc analyses for significant interactions were conducted via Tukey’s HSD tests. Alpha values were set at 0.05. Outlier-adjusted models were also fitted to confirm the robustness of the results. All analyses were performed on R 4.1.1 (Team, 2013).
3. Results
3.1. Working memory
Analysis of accuracy in the integration–recall condition revealed no significant main effects of ‘group’ (F (1,46) = 1.89, p = .17) or ‘phase’ (F (1,46) = .49, p = .48), but a significant interaction between ‘phase’ and ‘group’ (F (1,46) = 4.8387, p = .02) (Figure 3A). Post hoc comparisons (MSE = 0.002, df = 1, 46) showed that, relative to the TC group, the SI group performed similarly in the pre-exertion phase (p = 0.24) and worse in the post-exertion phase (p = .02). This interaction remained significant upon controlling for age of L2 appropriation (F (1,44.89) = 4.74, p = .034), years of L2 use (F (1,44.50) = 4.69, p = .036), global L2 proficiency (F (1,44.84) = 4.75, p = .034), L2 listening proficiency (F (1,44.87) = 4.76, p = .034), L2 speaking proficiency (F (1,44.79) = 4.75, p = .034), L2 reading proficiency (F (1,44.89) = 4.74, p = .035), L1–L2 SI competence (F (1,44.86) = 4.75, p = .035) and L2–L1 SI competence (F (1,44.86) = 4.75, p = .035). The same was true for the poorer performance of the SI group compared with the TC group in the post-exertion phase (p = .02), upon adjusting for age of L2 appropriation, years of L2 use, L1–L2 SI competence and L2–L1 SI competence. Also, integration–recall accuracy improved from the pre- to the post-exertion phase for the TC group (M pre = 0.75 [0.10], M post = 0.80 [0.11], p = .01), but not for the SI group (M pre = 0.70 [0.13], M post = 0.70 [0.14], p = .71). In the integrated model, with ‘condition’ (congruent, incongruent) as a fixed effect, we found a significant main effect of ‘phase’ (F (1,84.68) = 8.13, p = .005), indicating that participants were more accurate in the second session. There were no other significant effects or interactions, including the three-way interaction (‘condition’ × ‘phase’ × ‘group’: p = .98). Response times yielded a significant main effect of ‘condition’ (F (1,81.22) = 6.46, p = .013), with slower responses in incongruent compared to congruent trials, and a significant main effect of ‘phase’ (F (1,82.40) = 79.15, p < .001), reflecting overall faster responses in the second session. No other effects or interactions were significant (all p-values > .42).

Figure 3. Key results. (A) WM task. Accuracy analyses on the integration–recall condition revealed similar performance between groups in the pre-exertion phase and worse performance for the SI than for the TC group in the post-exertion phase. Also, performance improved between sessions for the TC group but not for the SI group. No significant accuracy effects emerged in the recall-only condition. (B) Inhibitory control task. No significant differences were found between groups or phases for either accuracy or response time. (C) Context-free vocabulary navigation tasks. Semantic (top inset) and phonemic (bottom inset) fluency results revealed significant main effects of phase, with significant increases from pre- to post-exertion. Red and blue dots represent SI and TC group participants, respectively. SI = simultaneous interpreting, TC = text comprehension.
Levene’s tests indicated no significant differences in variance between groups or phases for either WM condition (all p-values > .09). Quantile regression analyses of change scores revealed no group differences at lower or median quantiles (τ = .25–.50, all 95% CIs including 0). However, a significant negative effect of SI emerged at the upper quartile (τ = .75: β = −0.05, 95% CI [−0.087, −0.025]), indicating that the largest practice-related gains observed in controls were attenuated in SI. Complete results and graphical displays are reported in Supplementary Material 2.
3.2. Inhibitory control
Accuracy analyses on the congruent condition revealed no significant main effects or interactions (all p-values > .11). In the incongruent condition, there was a non-significant effect of ‘phase’ (F (1,28) = 3.29, p = .08), alongside a non-significant effect of ‘group’ and a non-significant ‘phase-by-group’ interaction (both p-values > .83) (Figure 3B, top inset). Response time analyses on the congruent condition showed a non-significant effect of ‘phase’ (F (1, 28) = 3.47, p = .08), together with a non-significant effect of ‘group’ and a non-significant ‘phase-by-group’ interaction (both p-values > .66). The incongruent condition revealed no significant effect (all p-values > .30) (Figure 3B, bottom inset). In the integrated model, with ‘condition’ (congruent, incongruent) as a fixed effect, we found a significant main effect of phase (F (1, 84.68) = 8.13, p = .005), indicating that participants were more accurate in the second session. There were no other significant effects or interactions, including the three-way interaction (‘condition’ × ‘phase’ × ‘group’: p = .98). For response time, the model revealed a significant main effect of condition (F (1, 81.22) = 6.46, p = .013), with slower responses in incongruent compared to congruent trials, and a significant main effect of phase (F (1, 82.40) = 79.15, p < .001), reflecting overall faster responses in the second session. No other effects or interactions were significant (all p-values > .42).
3.3. Context-free vocabulary navigation
Semantic fluency results (Figure 3C, top inset) revealed significant main effects of ‘language’ (F (1,126) = 191.56, p < .001) and ‘phase’ (F (1,126) = 7.13, p = .009), the latter revealing a significant increase in semantic fluency from the pre- to the post-exertion phase. Phonemic fluency outcomes (Figure 3C, bottom inset) showed a significant main effect of ‘phase’ (F (1,126) = 9.13, p = .003), typified by a significant increase in phonemic fluency performance from the pre- to the post-exertion phase. No other main or interaction effects reached significance in either fluency task (all p-values > .24).
4. Discussion
This study investigated the impact of SI on WM via a cognitive exertion paradigm. We found that SI, as opposed to TC, prevented practice-driven boosting of integration–recall outcomes, even adjusting for key aspects of bilingual experience. No such effects were observed on inhibitory and fluency performance. These results shed light on the cognitive demands of SI as a distinct form of bilingual processing.
Prior works have shown that WM outcomes are associated with SI experience (García et al., Reference García, Muñoz and Kogan2019) as well as interpreting quality (Bae & Jeong Reference Bae and Jeong2021; Injoque-Ricle et al., Reference Injoque-Ricle, Barreyro, Formoso and Jaichenco2015; Mellinger & Hanson, Reference Mellinger and Hanson2019), and relevant electrophysiological signatures are distinctly modulated during SI (Boos, Reference Boos, Kobi, Elmer and Jäncke2022). Our pre/post study extends such findings, indicating that WM resources are markedly affected after 10 minutes of this activity.
Crucially, post-task WM performance differed between groups. Integration–recall outcomes improved after TC. This likely reflects well-established practice and repetition effects in WM tasks, manifested as measurable gains through increased familiarity, greater encoding efficiency or more strategic resource allocation (Chein & Morrison, Reference Chein and Morrison2010; Jonides et al., Reference Jonides, Lewis, Nee, Lustig, Berman and Moore2008). Conversely, no such improvement emerged after SI, suggesting that this task hindered the consolidation or reactivation of learning-related mechanisms. This aligns with resource depletion models, which posit that high-load tasks temporarily exhaust executive resources, impairing subsequent performance or learning (Chen & Kalyuga, Reference Chen, Kalyuga, Dunlosky and Rawson2020; Just & Carpenter, Reference Just and Carpenter1992).
In the present design, the selective absence of WM gains after SI could reflect a combination of (1) acute resource depletion during SI and (2) interference with consolidation processes due to the multitasking demands of interpreting (Christoffels et al., Reference Christoffels, de Groot and Kroll2006). Additionally, the greater variability observed in WM performance within the SI group may reflect inter-individual differences in cognitive resilience or interpreting strategies, further supporting the view that SI imposes heterogeneous cognitive loads that differentially affect WM outcomes (Dong et al., Reference Dong, Liu and Cai2018). Importantly, these group differences were not explained by increased variability: variance did not differ between groups or phases, and quantile analyses indicated that only the control group showed practice-related gains at the upper tail of the integration–recall distribution. Thus, SI did not enhance heterogeneity per se, but rather prevented the emergence of high-gain learners typically observed after practice.
Again, such a disruption emerged only in the integration–recall condition, which, unlike the recall-only condition, requires binding different processes (viz., shape and color recognition) (Baddeley et al., Reference Baddeley, Allen and Hitch2011; Birba et al., Reference Birba, Hesse, Sedeño, Mikulan, García, Ávalos, Adolfi, Legaz, Bekinschtein and Zimerman2017; Brockmole et al., Reference Brockmole, Parra, Sala and Logie2008). Suggestively, and as corroborated in the TC group, integration–recall mechanisms are more susceptible to training and interference effects than recall-only mechanisms (Allen et al., Reference Allen, Baddeley and Hitch2006; Parra et al., Reference Parra, Abrahams, Logie, Méndez, Lopera and Della Sala2009) – a finding that highlights the selectivity of SI-related WM effects for high-load conditions. Compatibly, cognitive advantages associated with sustained SI practice are systematic for WM tasks that involve concurrent cognitive operations but inconsistent for tasks that tap on unimodal retrieval (García et al., Reference García, Muñoz and Kogan2019). For example, interpreters show training effects in WM tasks involving both audio and visual stimuli, but not under a single-modality condition (Morales et al., Reference Morales, Padilla, Gómez-Ariza and Bajo2015). This selectivity might reflect the critical role of resource management and task scheduling during SI, two aspects that are distinctly taxed during multimodal WM processes (Mizuno, Reference Mizuno2005).
The above pattern survived covariation with age of L2 appropriation, years of L2 use, L2 proficiency and SI competence. This is noteworthy, as WM performance in bilinguals is often influenced by such factors (Köpke & Nespoulous, Reference Köpke and Nespoulous2006; Lukasik et al., Reference Lukasik, Lehtonen, Soveri, Waris, Jylkkä and Laine2018; Miyake & Friedman, Reference Miyake, Friedman, Healy and Bourne2013; Tzou et al., Reference Tzou, Eslami, Chen and Vaid2012). Thus, contrary to other WM dynamics in this population, SI-driven WM exertion seems consistent beyond inter-individual variability in bilingual experience. Moreover, such WM exertion effect was not observed in the TC group, mirroring evidence that other SI-related effects on executive mechanisms are absent in single-language tasks, like text listening and shadowing (Boos, Reference Boos, Kobi, Elmer and Jäncke2022; Hervais-Adelman et al., Reference Hervais-Adelman, Moser-Mercer and Golestani2011, Reference Hervais-Adelman, Moser-Mercer, Michel and Golestani2015; Hyönä et al., Reference Hyönä, Tommola and Alaja1995). Since the WM task was identical (in stimuli, structure and duration) for both groups, the observed draining of WM resources would be a differential consequence of SI rather than a manifestation of sustained discourse processing (or prior WM taxing) at large.
Furthermore, exertion effects during L1–L2 SI seem distinctly operative in WM functions, as they were not observed on inhibitory or fluency outcomes. Similarly, while SI experience is consistently associated with WM advantages, no systematic enhancement is observed on either of those domains (Aparicio et al., Reference Aparicio, Heidlmayr and Isel2017; García et al., Reference García, Muñoz and Kogan2019; Morales et al., Reference Morales, Padilla, Gómez-Ariza and Bajo2015; Stavrakaki et al., Reference Stavrakaki, Megari, Kosmidis, Apostolidou and Takou2012; Van de Putte et al., Reference Van de Putte, De Baene, García-Pentón, Woumans, Dijkgraaf and Duyck2018). For example, despite inducing neuroanatomical changes, SI training yielded null effects on a Simon task – a gold-standard measure of response inhibition (Van de Putte et al., Reference Van de Putte, De Baene, García-Pentón, Woumans, Dijkgraaf and Duyck2018). As proposed elsewhere (García et al., Reference García, Muñoz and Kogan2019), this might be so because concomitant processes during SI (e.g., source-language comprehension and target-language production) need to be active in parallel rather than alternately suppressed. Still, null effects on inhibition and fluency may be partly driven by our design. Indeed, both tests, unlike the WM task, had demands other than SI between their pre- and post-exertion versions. Though further research is needed to rule out this confound, our findings suggest that SI does not uniformly impact domain-general processes at large.
Incidentally, fluency results revealed better outcomes in L1 (Chinese) than in L2 (English). This pattern might reflect two phenomena. First, in Asian languages, animal fluency outcomes evince disproportionately high rates of zodiac animals, a list that is rotely learned and quickly produced across L1 speakers (Sung et al., Reference Sung, Shin, Scimeca, Li and Kiran2025). Second, word retrieval outcomes are generally lower in L2 than in L1, likely due to less entrenchment in the former (French & Jacquet, Reference French and Jacquet2004) – a difference that may override immediate L1–L2 priming effects. However, fluency results might be influenced by the fixed administration of the L1 before L2 condition, inviting further research with strategic manipulations.
Our study carries theoretical implications. Links between SI and WM are acknowledged in different accounts, but mainly supported by cross-sectional studies (García et al., Reference García, Muñoz and Kogan2019). Rooted in a pre/post design, our results suggest that SI may be causally involved in the exhaustion of WM resources, offering mechanistic constraints for cognitive modeling. More particularly, our findings support the view that SI effects are typified by ‘demand-based domain-specificity’ (García et al., Reference García, Muñoz and Kogan2019: 736). While this claim was first advanced by reference to SI-induced benefits (García, Reference García2014), the abolition of practice-related WM effects shows that even SI-induced disadvantages are also confined to functions distinctly recruited during the task. This extends the interpreter advantage hypothesis, suggesting that the selective impact of SI on its central domains may manifest not only as long-term gains but also as immediate decline. Accordingly, accounts of SI-related effects could be fruitfully expanded by detailing how immediate resource depletion progressively leads to diachronic boosts – e.g., through entrenchment or activation threshold adaptations.
This work is not without limitations. First, though adequately powered and similar to that of previous works (e.g., Babcock & Vallesi, Reference Babcock and Vallesi2017; Becker et al., Reference Becker, Schubert, Strobach, Gallinat and Kühn2016; Čeňková et al., Reference Čeňková, Meylaerts, Hertog, Szmalec and Duyck2014; Tzou et al., Reference Tzou, Eslami, Chen and Vaid2012), our sample size was modest. As in previous bilingualism studies (e.g., Bialystok et al., Reference Bialystok, Craik, Klein and Viswanathan2004; Calabria et al., Reference Calabria, Hernández, Branzi and Costa2012; Blanco-Elorrieta & Pylkkänen, Reference Blanco-Elorrieta and Pylkkänen2017; Declerck et al., Reference Declerck, Thoma, Koch and Philipp2015), this was mitigated via linear mixed-effects models with a simple structure and within-subject data, which enhances statistical sensitivity by reducing inter-individual variability (Charness et al., Reference Charness, Gneezy and Kuhn2012). Nonetheless, future studies should aim to replicate these findings with larger and more diverse samples. Second, the sample consisted solely of interpreting students. Given that cognitive outcomes may vary widely between trainees and professionals (García et al., Reference García, Muñoz and Kogan2019), future works should test whether similar exertion effects occur across both populations. Third, L2 proficiency and SI competence levels were based solely on self-report data. Though widely used (Hulstijn, Reference Hulstijn2012) and predictive of objective language outcomes (Langdon et al., Reference Langdon, Wiig and Nielsen2005; Marian et al., Reference Marian, Blumenfeld and Kaushanskaya2007; Gollan et al., Reference Gollan, Weissberger, Runnqvist, Montoya and Cera2012), these could be prone to self-image and desirability biases. Future works should complement such measures with formal test results as inclusion criteria. Fourth, participants received no topic information prior to SI. While this mirrors previous works (Bae & Jeong, Reference Bae and Jeong2021; Boos et al. Reference Boos, Kobi, Elmer and Jäncke2022; Tzou et al. Reference Tzou, Eslami, Chen and Vaid2012), frequent professional scenarios (Chang et al. Reference Chang, Wu and Kuo2018) and participants’ experience during formal training, it deviates from standard, recommended practice, inviting studies on how pre-task briefing might influence WM demands.
Fifth, while our WM task has been explicitly used to operationalize recall skills (Brockmole et al., Reference Brockmole, Parra, Sala and Logie2008), these are variously intertwined with recognition abilities, especially in the present paradigm. Future works could tackle this issue through tasks that more clearly demarcate both domains, disentangling the possible contributions of recognition skills to present results. Sixth, although the use of colors as stimuli minimizes semantic associations, their nameability raises the possibility of verbal coding strategies (Baddeley, Reference Baddeley2003). While our data do not suggest that participants systematically engaged in such strategies, this possibility cannot be fully excluded, and thus, present effects should be interpreted as reflecting domain-general, multimodal and possibly language-linked mechanisms. Seventh, the SI session may have been too short for exertion effects to manifest in the post-SI inhibitory and fluency measures. Future works could examine whether stronger or longer depletion influences tasks administered later in the protocol. Eighth, inhibition and fluency tasks were administered after the post-training WM task. As such, their underlying resources may have been depleted but then replenished by the time of testing. While the short delay (20 minutes) and the added fatigue from the WM task make full recovery unlikely, future studies should counterbalance tasks to more firmly assess the selectivity of observed WM effects. Alternatively, SI-induced exertion effects may not have been strong enough to persist until the inhibitory and fluency tasks were performed. Further studies could assess this possibility with a prolonged SI session. Ninth, while our pre/post design offers an advantage over purely cross-sectional approaches by capturing within-subject change (Charness et al. Reference Charness, Gneezy and Kuhn2012), our main finding – the interaction between group and phase – relies on a between-subjects comparison. Observed differences, then, could be partly influenced by confounds beyond our design. Within-subject or crossover designs would help to more precisely isolate the phenomena targeted herein.
Finally, our SI task involved the L1–L2 direction only. Though widespread in the Chinese context, this direction involves neurocognitive demands that differ from those of L2–L1 interpreting (Pérez et al., Reference Pérez, Hesse, Dottori, Birba, Amoruso, Martorell Caro, Ibáñez and García2022; Rinne et al., Reference Rinne, Tommola, Laine, Krause, Schmidt, Kaasinen and Sunnari2000; Zheng et al., Reference Zheng, Báez, Su, Xiang, Weis, Ibáñez and García2020), as evidenced by distinct behavioral performance and greater activation along inferior frontal regions implicated in WM and linguistic processes (García, Reference García2019; Rinne et al., Reference Rinne, Tommola, Laine, Krause, Schmidt, Kaasinen and Sunnari2000; Tommola et al., Reference Tommola, Laine, Sunnari and Rinne2000). Higher demands for the L1–L2 direction have also been reported during sentence and word translation, manifested as increased temporo-frontoparietal connectivity (Zheng, et al., Reference Zheng, Báez, Su, Xiang, Weis, Ibáñez and García2020) and higher frontal theta power (Pérez et al., Reference Pérez, Hesse, Dottori, Birba, Amoruso, Martorell Caro, Ibáñez and García2022), respectively. Therefore, present results should be taken to reflect L1–L2 SI demands in particular, calling for further studies on how directionality modulates WM exertion.
5. Conclusion
In sum, our study shows that a brief period of SI interferes with practice-driven boosting of WM integration–recall outcomes. This effect seems specific to both SI (relative to single-language discourse processing) and WM (as opposed to other cognitive domains). Future works along this line can further illuminate the interplay between cross-linguistic tasks and the cognitive mechanisms that support performance in bilingual populations.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S1366728925100898.
Data availability statement
All experimental data and scripts are available via the Open Science Framework at https://osf.io/p37wr/?view_only=73e68d4e0e254400a611f9cb94d06c90.
Author contribution
Isabelle Chou: data curation, methodology, software, formal analysis, project administration and writing – original draft. Agustina Birba: methodology, software, formal analysis and writing – original draft. Jiehui Hu: resources, funding acquisition, investigation and writing – review and editing. Edinson Muñoz: investigation and writing – review and editing. Guoqing Kwon: data curation, methodology, software and writing – review and editing. Adolfo M. García: conceptualization, methodology, supervision and writing – review and editing.
Funding statement
Adolfo García is partially supported by the National Institute on Aging of the National Institutes of Health (R01AG075775); ANID (FONDECYT Regular 1250317, 1250091); Universidad de Santiago de Chile (DICYT 032351MA); and the Multi-partner Consortium to Expand Dementia Research in Latin America (ReDLat), which is supported by the Fogarty International Center and the National Institutes of Health, the National Institute on Aging (R01AG057234, R01AG075775, R01AG21051 and CARDS-NIH), Alzheimer’s Association (SG-20-725707), Rainwater Charitable Foundation’s Tau Consortium, the Bluefield Project to Cure Frontotemporal Dementia and the Global Brain Health Institute. This study was supported by the National Foreign Expert Project of China (grant no. G2022167001L). The contents of this publication are solely the responsibility of the authors and do not represent the official views of these institutions.
Competing interests
The authors declare none.