According to a recent study, nearly half of the world’s population is bilingual (Grosjean, 2021). For example, at least 59% of all Europeans speak at least one foreign language, a number that has been steadily increasing over the past decades, according to recent polls (www.europa.eu). The large proportion of bilingual speakers across the world stands in stark contrast to the relatively small proportion of studies in cognitive psychology and psycholinguistics that have focused on how the human mind represents and processes multiple languages versus just a single one. For example, the role of prediction, that is, our ability to anticipate upcoming linguistic input before it is perceived, has been studied extensively for several decades among monolingual native (L1) adults (e.g., Kuperberg & Jaeger, 2016; Ryskin & Nieuwland, 2023) but did not start to be investigated systematically among bilinguals and second language (L2) users until the past decade (e.g., Kaan, 2014; Schlenter, 2023). Similarly, there is a much larger body of research on language and memory among monolinguals than there is among bilinguals and L2 speakers. In this paper, we draw on these previous strands of research to investigate how engagement in prediction in German-speaking L1 and L2 speakers affects their recognition memory. Specifically, we investigate whether potentially reduced engagement in prediction during L2 processing may lead to reduced false memory effects for predictable words among L2 users.
Prediction in first language (L1) processing
It is well known that native speakers of a language leverage the linguistic context to anticipate upcoming linguistic input (for reviews, see Huettig, 2015; Kuperberg & Jaeger, 2016; Ferreira & Chantavarin, 2018; Ryskin & Nieuwland, 2023). In fact, there is now evidence that predictive processing occurs at all levels of linguistic representation, that is, syntactic (e.g., Staub & Clifton, 2006), semantic (e.g., Altmann & Kamide, 1999; Borovsky et al., 2012; Federmeier & Kutas, 1999; Kamide et al., 2003), and lexical (DeLong et al., 2014; Ito et al., 2016; DeLong et al., 2005). Even in the absence of strong semantically predictive information, comprehenders make use of word form information conveyed by prenominal modifiers or articles to (successfully) predict the identity of an upcoming referent (e.g., Cholewa et al., 2019; also see Boudewyn et al., 2015; DeLong et al., 2005; Fleur et al., 2020; Haeuser et al., 2022; Wicha et al., 2003; Van Berkum et al., 2005).
The consequence of pre-activating or predicting words during comprehension is that predicted referents are easier to process than unpredicted ones (for reviews, see Van Petten & Luka, 2012; Staub, 2015), which could be useful to language processing by freeing up mental resources that would otherwise be bound to incremental, piece-by-piece processing of language. For example, predictable referents show shorter fixation times than unpredictable ones (e.g., Ehrlich & Rayner, 1981) and are more likely to be skipped during initial reading (e.g., Staub, 2015). In addition, predictable words show reduced N400 event-related potential (ERP) components (e.g., Kutas & Federmeier, 2011), which suggests that less mental effort is needed to integrate predictable words into an unfolding sentence context. In sum, prediction is a critical component not only in general theories of cognition (e.g., Clark, 2013) but also in L1 language processing (e.g., Shain, 2024; Staub, 2024).
Prediction in second language (L2) processing
What about prediction in a language that is not native? An intuitive hypothesis is that L2 speakers of a language should be less likely to engage in prediction, possibly because they have less entrenched lexical representations in that language than L1 users do (e.g., Bialystok et al., 2008; Kroll & Stewart, 1994), or because resource limitations that are inherent to L2 language processing put a cap on engagement in top-down processing (e.g., Foucart et al., 2016; Grüter et al., 2017; Kaan et al., 2010). Notably, this is consistent with what much of the previous literature on prediction in L2 has found. While we know that L2 speakers can and do use linguistic and non-linguistic context to anticipate upcoming referents (for recent reviews, see Bovolenta & Marsden, 2022; Hopp, 2022; Kaan & Grüter, 2021; Schlenter, 2023), there is also evidence that L2 speakers may use linguistic cues less readily, more slowly, and/or to a different extent. For example, early work that focused on phonological and gender cues provided by definite articles found no or reduced effects of prediction (Lew-Williams & Fernald, 2010; Martin et al., 2013), even among learners with near-native proficiency (Grüter et al., 2012; but see Dussias et al., 2013; Foucart et al., 2014).
While prediction at the word form level is debated for L1 processing (e.g., DeLong et al., 2017; Nieuwland et al., 2018; Urbach et al., 2020; Yan et al., 2017), there is now evidence for differences in L2 prediction even when it comes to prediction based on semantics. For example, Dijkgraaf and colleagues (2019) used a visual-world eye-tracking paradigm to demonstrate that L1-Dutch/L2-English bilinguals were less likely to fixate competitor images that were semantically related to a highly predictable referent in their L2 than in their L1. Similarly, Ito and colleagues (2018) found that both L1 and L2 (L1 Japanese) speakers of English looked at a target referent that was highly predictable from the discourse context (e.g., cloud, following “The tourists expected rain when the sun went behind the …”) before that target appeared in the acoustic input; however, L2 speakers did so more slowly than L1 speakers. Thus, findings from a variety of methods and different levels of linguistic representation indicate that the scope of prediction is reduced or slowed down in L2 processing.
False remembering in L1
While initial studies on prediction focused on the immediate processing consequences of prediction (e.g., during sentence comprehension), recent research has begun to investigate whether prediction affects the memory representations that language users store and maintain after engaging in prediction. Notably, studies in this vein of research show that predictable words that are disconfirmed tend to “linger” or persist in memory and can lead to false remembering in downstream recognition memory tests (e.g., Chung & Federmeier, 2023; Haeuser & Kray, 2024; Haeuser & Kray, 2022; Hubbard et al., 2019; Hubbard & Federmeier, 2024; Rich & Harris, 2021; Rommers & Federmeier, 2018; also see Roediger & McDermott, 1995; Roediger et al., 2001; Chang & Brainerd, 2021). For example, a seminal study by Hubbard and colleagues (2019) showed that predictable words that were never actually encountered elicited high rates of false remembering in a subsequent recognition memory test, where comprehenders incorrectly endorsed previously predictable words (henceforth, lures) as “old” (i.e., after reading a sentence such as, “Be careful, the top of the stove is very dirty.”, comprehenders attested to having read “hot”). Notably, false-alarm rates to predictable lures in that and many other studies were much higher than false-alarm rates to “new” words, that is, words that were not previously seen and not predictable.
This illustrates that false remembering of predictable words cannot be merely attributed to response bias, that is, a general bias to endorse words as previously seen, which is a common source of distortion in recognition memory tasks (e.g., Brainerd et al., 2008; Fraundorf et al., 2019). Previous studies have also shown that false memory judgments to predictable lures are often made with high subjective confidence (i.e., “sure old” response; see Hubbard & Federmeier, 2024; Haeuser & Kray, 2024) or in conditions when participants indicate that they vividly remember having seen the predictable word earlier on (e.g., Norman & Schacter, 1997; Schacter et al., 1997). Healthy aging older adults, for example, frequently endorse lure words as “old” while indicating they are very confident of their false memory judgment (e.g., Balota et al., 1999; Dodson et al., 2007; Greene et al., 2024; Haeuser & Kray, 2024; Kensinger & Schacter, 1999; Tun et al., 1998). These findings attest to the fact that comprehenders often vividly “remember” seeing the lure earlier on, and that different participant groups may show systematic differences in the strength of the false memory illusion (e.g., Greene et al., 2024).
Interestingly, the findings from the prediction literature stand against the backdrop of a much larger research field on false remembering which has evolved around the so-called DRM paradigm (named after seminal work by James Deese in the 1950s and Roediger & McDermott (1995), who made the paradigm popular in memory research). In the DRM paradigm, participants encode lists of content words (e.g., “bed, rest, dream, awake, pillow …”) that are all semantically related to a particular lure word (e.g., “sleep”). In subsequent recognition tests, participants frequently indicate that they have seen the critical lure word earlier during encoding, or they falsely recall the lure in subsequent free recall tests (for recent reviews on DRM-based false remembering, see Gallo, 2010; Chang & Brainerd, 2021; for cognitive theories on false memory effects, see Brainerd et al., 2008; Howe, 2008; Howe et al., 2009; Holliday & Weekes, 2006; Reyna & Brainerd, 2016; Roediger et al., 2001; Sommers & Lewis, 1999).
Mirroring the findings from the prediction literature, false memory judgments to DRM lures are frequently made with high subjective confidence (e.g., Norman & Schacter, 1997; Roediger & McDermott, 1995, Experiment 2; Tun et al., 1998; see also Greene et al., 2024), and their frequency of occurrence goes above and beyond effects that could be explained by response bias alone (Roediger & McDermott, 1995; Brainerd et al., 2008; Chang & Brainerd, 2021; Gallo, 2010). The striking difference between DRM-based and prediction-based false remembering is that, for DRM lists, lure words are entrenched by means of repeated presentation of semantic associates, whereas for predictable words in sentences, participants really only get one “shot” at the critical lure. Despite this difference, very recent research indicates that prediction-based false memory can be explained by means of similar, if not the same, cognitive processes that come to bear during false remembering of DRM words, namely activation of critical lures during encoding and monitoring failures during memory retrieval (e.g., Haeuser & Kray, 2024; for activation-monitoring theories of DRM-based false remembering, see Hicks & Hancock, 2002; Howe, 2008; Howe et al., 2009; Roediger et al., 2001).
False memory in L2
In light of the consistently observed attenuation in L2 speakers’ engagement in prediction, an obvious hypothesis for the outcome of a false memory experiment on predictable words is that L2 speakers should show reduced rates of false remembering compared to L1 speakers. We know of only one previous study that has investigated prediction-based false remembering from an L2 perspective (Foucart et al., 2016). However, in contrast to this hypothesis, that study found roughly equivalent rates of false remembering in L2 speakers. In that study, late French-Spanish bilinguals (n = 22) listened to constraining sentences in Spanish (e.g., “El pirata tenía el mapa secreto, pero nunca encontró…”, “The pirate had the secret map, but he never found…”) which continued with either the predictable (e.g., “el tesoro”, “the treasure”) or an unpredictable noun (e.g., “la gruta”, “the cave”) and later completed a recognition memory test. Importantly, during the listening phase, the critical noun was always completely muted so that only the prenominal gender-marked article indicated whether the upcoming noun was the predictable one or not. ERPs recorded during the listening phase showed a more negative deflection for articles that were inconsistent with the gender of the predictable noun, suggesting that listeners showed early brain responses when predictions were not met. Results from the “old”/“new” recognition memory test showed that predictable nouns elicited more false alarms when participants had previously heard the sentence with the prediction-consistent gender-marked article than with the prediction-inconsistent one (e.g., “encontró el” vs “encontró la”), with mean false-alarm rates of 49% versus 43%, respectively (note that differences in response bias were not taken into account in that study).
As these findings for L2 speakers were very similar to those from an earlier study with L1 speakers of Spanish using the same materials (Foucart et al., 2015), the authors concluded that late bilinguals, much like native speakers, use anticipation processes during L2 comprehension that lead to the creation of memory traces for expected but not-encountered words. In other words, prediction elicits false remembering in L2 as much as it does in L1, contrary to the hypothesis stated above.
Interestingly, the results from the Foucart et al. (2016) study stand in stark contrast to previous DRM research on bilingualism, which has consistently found that L2 speakers show reduced, and not equivalent, rates of false remembering, compared to L1 speakers (see e.g., Anastasi et al., 2005; Arndt & Beato, 2017; Bialystok et al., 2020; Sahlin et al., 2005; Suarez & Beato, 2023; for a mini-review, see Suarez & Beato, 2021). For example, Arndt & Beato (2017) used a within-subject design on L1-English/L2-Spanish speakers to study false recognition memory of DRM lures when using English and Spanish word lists. All L2 groups had relatively low proficiency in their second language. Across three experiments, the results showed that false recognition memory effects were lower when participants were tested in their L2 compared to their L1. The authors concluded that processing word lists in an L2 seems to create less robust false memory representations than processing similar word lists in an L1 (see Footnote 1).
Past studies have offered two critical explanations for the reduced false memory effect among L2 speakers. One explanation holds that L2 participants show overall reduced activation levels when processing their L2, owing to the fact that their L2 lexical network is less interconnected and/or less entrenched (e.g., Arndt & Beato, 2017; Anastasi et al., 2005; Futrell et al., 2020; Pawley & Syder, 1983; for reviews, see Suarez & Beato, 2021; Beato et al., 2023; for compatible theories on L1-based false remembering, see Howe, 2008; Roediger et al., 2001). Qualitatively, this account is very similar to accounts in the L2 literature that have attributed reduced engagement in prediction to reduced levels of lexical-semantic activation and/or spreading within the L2 lexicon (e.g., Ito & Pickering, 2021). Therefore, we will subsume both accounts into one and refer to them as reduced-activation accounts. The reduced-activation account was originally brought forward by empirical studies reporting on L2-based reductions in false memory (Suarez & Beato, 2021; Beato et al., 2023). However, this account also follows directly from theories arguing that L2 speakers predict less when processing their L2 (e.g., Bovolenta & Marsden, 2022; Hopp, 2022; Kaan & Grüter, 2021; Schlenter, 2023).
Both lines of research suggest that L2 speakers activate lexical representations less distinctly during initial processing, which makes it less likely for them to predict or pre-activate critical lures during initial encoding. As such, L2 speakers should also be less likely to experience interference from lures in the memory test (e.g., what the DRM literature has referred to as “monitoring failures” where residual activation for critical lures is mistaken for genuine activation; see Roediger et al., 2001; Howe, 2008). Consequently, reduced-activation accounts predict that L2 speakers should show reduced rates of false remembering overall.
The second explanation for reduced DRM effects in L2 holds that, because L2 speakers are more experienced at juggling multiple activated representations at a time as a result of speaking two languages, they should be at an advantage when it comes to correctly suppressing activation from critical lures during encoding and/or memory retrieval (e.g., The word “sleep” looks familiar but I know I saw “bed”, not “sleep”). We will refer to this account as the bilingual-advantage account (e.g., Bialystok et al., 2020; Bialystok et al., 2008; Morales et al., 2013). The bilingual-advantage account was brought forward in an empirical study by Bialystok and colleagues (2020) who demonstrated that L2 speakers show reduced false memory effects for lure words in the DRM paradigm, when compared to L1 speakers (see Experiments 2 and 3 of that paper). According to the authors, L2 speakers are better at avoiding false alarms to lures due to their improved abilities in selective attention and executive control, which should arise from L2 speakers’ continuous experience at selecting from activated representations in the two languages (Bialystok et al., 2020). Of note, the bilingual-advantage account predicts that L1–L2 differences could arise both at encoding and retrieval. During both processing stages, L2 speakers are better at suppressing activation from co-activated lures.
The reduced-activation and bilingual-advantage accounts have in common that they both predict reduced false memory effects in L2 speakers, although they differ in the hypothesized cognitive mechanisms behind these (reduced) effects. The two accounts make opposite predictions, however, for recognition memory of previously seen, old, words: If reduced rates of false remembering are a consequence of L2 speakers activating lexical representations less distinctly (i.e., reduced-activation account), then L2 speakers should also show reductions in correct recognition memory for old words (because old words, too, should have been activated less distinctly). If, on the other hand, reduced false memory rates in L2 result from L2 speakers being experienced at suppressing unwanted activation from lures, then they should show L1-equivalent rates of correct recognition memory for old words (see Bialystok et al., 2020).
The present study
The goal of the present study is (i) to test the central prediction of both the reduced-activation and the bilingual-advantage accounts that false memory effects will be reduced among L2 compared to L1 speakers, and (ii) to distinguish between the two accounts by probing for memory of both critical lures and previously seen old words. We do so using an experimental paradigm from previous research with L1 speakers (e.g., Haeuser & Kray, 2024; Haeuser & Kray, 2022), including a self-paced reading (SPR) task followed by a recognition memory test, with L1 and advanced L2 speakers of German. During SPR, participants read German sentences constructed to constrain their expectations toward a predictable noun, with half of the critical sentences continuing with the predictable noun, and the other half continuing with an unpredictable but semantically plausible noun. After a brief distractor task, participants completed a surprise (i.e., previously unannounced) recognition memory test, in which they were presented with previously seen, “old” words (i.e., previously seen predictable and unpredictable words), “lures” (i.e., predictable but not actually presented words), as well as “new” words, and had to indicate whether the presented words were “old” or “new”. In order to investigate vivid recollection versus mere familiarity with an item, participants were also asked to indicate how confident they were in their response (i.e., sure vs maybe old and new; for rationale, see Haeuser & Kray, 2024).
Hypotheses and predictions
For SPR, both the reduced-activation and the bilingual-advantage accounts would expect reduced predictability effects in L2, albeit for different reasons. According to the reduced-activation account, L2 speakers are less likely to engage in prediction when processing their second language, so they should show less facilitation for predictable words and less processing difficulty for unpredictable words. The bilingual-advantage account would also expect a reduced predictability effect in L2, because L2 speakers should be better at suppressing interfering activation from predictable words when reading unpredictable sentences.
For the recognition memory test, both accounts expect reduced rates of false remembering for predictable lures in L2 compared to L1 speakers, over and above possible group-wise differences in response bias (i.e., false alarms to “new” words), and possibly, differences between L1 and L2 regarding their subjective memory confidence (e.g., Dodson et al., 2007; Greene et al., 2024; Haeuser & Kray, 2024; Hubbard & Federmeier, 2024; Norman & Schacter, 1997; also see Pawley & Syder, 1983). However, diverging predictions emerge from the two accounts when it comes to correct recognition memory for previously seen, “old” words (henceforth, true recognition memory; see Roediger et al., 2001; Roediger & McDermott, 1995; Chang & Brainerd, 2021; Gallo, 2010). The reduced-activation account predicts lower rates of true recognition memory for L2 compared to L1 speakers, as L2 speakers would activate lexical representations for “old” words less distinctly during initial encoding. In contrast, the bilingual-advantage account does not predict any differences between L1 and L2 speakers for true recognition memory.
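To make the phrase “over and above response bias” concrete, the following minimal sketch subtracts a participant’s false-alarm rate to “new” words from their false-alarm rate to lures. It is an illustration only: the function names and the response data are hypothetical, and the actual statistical analysis of the study may take a different form.

```python
def false_alarm_rate(responses):
    """Proportion of "old" responses to words that were never presented."""
    return sum(r == "old" for r in responses) / len(responses)


def corrected_lure_rate(lure_responses, new_responses):
    """Lure false alarms minus the baseline false-alarm rate to new words."""
    return false_alarm_rate(lure_responses) - false_alarm_rate(new_responses)


# Made-up responses for one participant: 16 lures and 40 new words.
lure_responses = ["old"] * 8 + ["new"] * 8
new_responses = ["old"] * 4 + ["new"] * 36

print(round(false_alarm_rate(lure_responses), 2))
print(round(false_alarm_rate(new_responses), 2))
print(round(corrected_lure_rate(lure_responses, new_responses), 2))
```

Under this simple difference score, a group that says “old” more readily across the board does not automatically appear to have a stronger false memory effect, because its baseline rate for new words is subtracted out.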
Method
Participants
A total of 91 L1 (n = 44, 25m, 19f, 0nb, Mean age = 27 years) and L2 (n = 47, 25m, 22f, 0nb, Mean age = 25 years) speakers of German were recruited through social media and flyer advertisement. The study was run online (see below for details) in the time period from July until October 2023. L1 speakers were recruited from the first author’s institution and reported learning German from birth and using German as their primary language in everyday life (see Footnote 2). The L2 group consisted of relatively advanced L2 learners of German, who were foreign-language students with a diverse set of L1s at universities in Germany (L1s: Arabic (5), English (4), Chinese (2), French (5), Hindi (7), Indonesian (2), Italian (2), Japanese (1), Kinyarwanda (1), Malayalam (1), Persian (3), Polish (1), Portuguese (1), Russian (1), Serbian (1), Spanish (1), Turkish (1), Telugu (1), Ukrainian (2), Hungarian (1), and Vietnamese (1)). Table 1 summarizes relevant background information on the L1 and L2 groups, including self-reported age of acquisition (AoA), immersion duration, usage of German, self-ratings of overall and subskills in German on a 10-point scale, as well as scores from the LexTALE German test as an independent measure of German proficiency (Lemhöfer & Broersma, 2012). None of the participants reported a history of reading or writing disabilities and/or taking neuropsychiatric medication at the time of testing.
Table 1. Background information on the L1 and L2 speaker group

Note. We define age of acquisition (AoA) as the age at which participants reported first learning German.
Materials
Materials for self-paced reading
All materials are included on this paper’s OSF project page under the link https://osf.io/e56k4/. Materials for the SPR task consisted of 32 constraining German sentence frames (e.g., Im Krankenhaus vernäht die Stationsschwester mit schneller Hand die …, “At the hospital, the nurse quickly stitches up the …”), which continued with either a relatively predictable noun (e.g., Wunde, “injury”) or a relatively less predictable (but somewhat plausible) noun (e.g., Hose, “pants”), yielding a total of n = 64 experimental sentences. Predictable and unpredictable nouns were matched with respect to length, t(62) = −1.23, p = .22, and log-per-million word frequency, t(60) = 1.94, p = .06 (frequency values were taken from the movie subtitle database SUBTLEX-DE; see Brysbaert et al., 2011).
A cloze probability pretest conducted with 33 German L1 speakers (Mean age = 28 years) and 17 L2 learners of German (Mean age = 29 years, AoA range = 3–23), who did not participate in the main experiment, showed relatively high cloze probabilities for predictable nouns and near-zero cloze probabilities for unpredictable nouns (see Table 2). On average, cloze probabilities for predictable nouns were lower among L2 than L1 speakers (M = 0.85, range = 0.56–1.00, and M = 0.94, range = 0.73–1.00, respectively), a significant difference, t(62) = −3.24, p < .01. Table 2 shows descriptive statistics and by-group cloze probabilities for predictable and unpredictable nouns used in the experiment.
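As an illustration of the matching check, the sketch below computes an independent-samples t statistic with pooled variance for two sets of 32 nouns, which yields the 62 degrees of freedom reported for word length. The word lengths are made up, the function name is hypothetical, and the assumption that a pooled-variance (Student’s) test was used is ours.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's independent-samples t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Made-up word lengths (in letters) for 32 predictable and 32 unpredictable nouns.
predictable_len = [5, 6, 4, 7] * 8
unpredictable_len = [6, 6, 5, 7] * 8

t, df = pooled_t(predictable_len, unpredictable_len)
print(f"t({df}) = {t:.2f}")
```

A negative t value here simply reflects that the first sample has the smaller mean; the p value would be read from a t distribution with the printed degrees of freedom.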
Table 2. Descriptive means (and SD) of properties of critical nouns in the self-paced reading task

Note. Frequency estimates are log-per-million values from the movie subtitle corpus (Brysbaert et al., 2011).
All predictable and unpredictable sentences contained two- or three-word sentence continuations after the noun that continued the sentence in a plausible fashion (e.g., “in the room next door”), in order to allow for spill-over effects, which frequently occur in self-paced sentence reading. Note that the spill-over region was identical for predictable and unpredictable versions of an item. The experimental sentences were distributed across two lists using a Latin square design, such that one participant was presented either with the predictable or unpredictable version of an item during the experiment, but not both (i.e., a total of 16 predictable and 16 unpredictable items per subject). Each list additionally contained 25 moderately constraining filler sentences (taken from the Potsdam sentence corpus; see Kliegl et al., 2006), which were included to ensure that participants continued to generate predictions in the course of the experiment (despite frequently having predictions disconfirmed; see e.g., Delaney-Busch et al., 2019; Fine et al., 2013). All experimental sentences were followed by yes/no comprehension questions, inserted to ensure that participants were reading the experimental sentences for comprehension. Comprehension questions were identical for predictable and unpredictable versions of an item and mostly probed information provided in the sentence context.
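The Latin square distribution of items across lists can be sketched as follows. This is a minimal illustration, assuming items are simply numbered 1 to 32 and conditions are rotated by list index; the names `build_lists`, `N_ITEMS`, and `CONDITIONS` are hypothetical and do not come from the study materials.

```python
N_ITEMS = 32
CONDITIONS = ("predictable", "unpredictable")


def build_lists(n_items=N_ITEMS, conditions=CONDITIONS):
    """Return one dict per list, mapping item number -> condition."""
    n_cond = len(conditions)
    lists = []
    for list_idx in range(n_cond):
        assignment = {}
        for item in range(1, n_items + 1):
            # Rotate the condition by list index so that each item appears
            # in a different condition on each list.
            assignment[item] = conditions[(item + list_idx) % n_cond]
        lists.append(assignment)
    return lists


lists = build_lists()
for i, lst in enumerate(lists, start=1):
    counts = {c: sum(1 for v in lst.values() if v == c) for c in CONDITIONS}
    print(f"List {i}: {counts}")
```

Each list then contains 16 predictable and 16 unpredictable items, and no participant sees both versions of the same sentence frame.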
Materials for word recognition memory
Materials for the recognition memory task consisted of 64 “old” nouns (i.e., 32 previously seen predictable and 32 unpredictable nouns in SPR), 40 “new” nouns (i.e., nouns not previously seen in SPR), and 32 “lure” nouns (i.e., nouns previously predicted but not seen; for recognition materials, see https://osf.io/e56k4/). A single participant saw a total of 88 nouns in the recognition task (i.e., 32 previously seen, old nouns (16 predictable, 16 unpredictable), 40 new nouns, and 16 lures). “New” nouns were obtained from a subset of German nouns listed in the movie subtitle database (SUBTLEX-DE; Brysbaert et al., 2011) that, in a previous step, had been cleared of all nouns used in the experiment. From that list, the experimenters chose “new” nouns that roughly fell in the same length and frequency range as the “old” and “lure” nouns. As a result, old-predictable, old-unpredictable, new and lure nouns in the experiment were matched both with respect to word length (F(3, 132) = 1.69, p = .17) and frequency (F(3, 129) = 2.46, p = .07). Table 3 shows descriptive statistics for words used in the recognition task.
Table 3. Descriptive means (and SD) of properties of nouns used in the recognition task

All recognition nouns were presented on two experimental lists, with list composition dependent on list administration in the SPR task: Participants who saw the unpredictable version of an experimental sentence during SPR were presented with the previously seen, “old” unpredictable noun, as well as the potentially predictable “lure” in the recognition task. Likewise, participants who saw the predictable version of a sentence in the SPR task were presented with the previously seen “old” predictable word during the recognition task (but not the lure).
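The dependency between SPR list and recognition list can be sketched as follows. This is an illustrative Python example under stated assumptions: the function `recognition_trials`, the tuple layout of `spr_list`, and the placeholder nouns are hypothetical and are not taken from the published materials.

```python
def recognition_trials(spr_list, new_nouns):
    """Build one participant's recognition trials from their SPR list.

    spr_list maps item id -> (condition, predictable_noun, unpredictable_noun).
    """
    trials = []
    for item, (condition, predictable, unpredictable) in spr_list.items():
        if condition == "predictable":
            # The predictable noun was read, so it is tested as an old word.
            trials.append((predictable, "old_predictable"))
        else:
            # The unpredictable noun was read; the predictable noun was only
            # expected, so it is tested as a lure.
            trials.append((unpredictable, "old_unpredictable"))
            trials.append((predictable, "lure"))
    trials.extend((noun, "new") for noun in new_nouns)
    return trials


# Placeholder nouns (hypothetical): 32 items, half in each condition.
spr_list = {
    i: ("predictable" if i % 2 == 0 else "unpredictable", f"pred_{i}", f"unpred_{i}")
    for i in range(32)
}
new_nouns = [f"new_{i}" for i in range(40)]
trials = recognition_trials(spr_list, new_nouns)

# Tally trials by word type.
type_counts = {}
for _, word_type in trials:
    type_counts[word_type] = type_counts.get(word_type, 0) + 1
```

The tally reproduces the composition reported for a single participant: 16 old predictable nouns, 16 old unpredictable nouns, 16 lures, and 40 new nouns, for a total of 88 trials.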
Procedure
The study was conducted online using the experimental platform LabVanced (Finger et al., Reference Finger, Goeke, Diekamp, Standvoß and König2017), a JavaScript web application that allows for online implementation of behavioral research. Participation was only possible through a PC (Windows or Linux), Mac, or tablet; participation via smartphone was disabled. The study consisted of five major sections. The first section was a brief questionnaire (∼10 minutes) in which participants reported on their language background. The language background questionnaire was followed by a non-cumulative word-by-word SPR task (∼15–20 minutes). Following the SPR task, there was a brief retention interval (∼5 minutes), in which participants solved simple math problems. The math task served the primary purpose of clearing participants’ short-term memory before they proceeded to the noun recognition memory task. The fourth section of the experiment was the noun recognition task (∼15 minutes), which was followed by the fifth and final section, the German LexTale (Lemhöfer & Broersma, Reference Lemhöfer and Broersma2012). All participants completed the study in the specified order.
In the SPR task, participants read the experimental sentences on a screen word by word (each word was displayed only once and was not replaced by dashes afterwards; that is, there was no mask and no moving-window display). Each word was presented in the center of a white screen using Lucida 18pt font and stayed on the screen until participants pushed the space bar, which revealed the next word in the sentence. Participants were instructed to read the sentences as quickly and thoroughly as possible and to answer all comprehension questions as accurately as possible by pressing the “S” (Yes, correct) and “L” (No, incorrect) keys on the keyboard. The experimental task was preceded by ten practice sentences to allow participants to get used to the word-by-word reading task.
In the (surprise) noun recognition memory task, participants saw single nouns appear on the screen in a central position using Lucida 18pt font, separated by a central fixation cross displayed for 5000 ms. Participants were instructed to hover their middle and index fingers of each hand over the S, D and J, K keys on the keyboard. They were instructed to press the S or D key for all “new” nouns, and the J or K key for all “old” nouns, and thereby to additionally indicate their confidence in their recognition judgment (D = Maybe New, S = Sure New, J = Maybe Old, K = Sure Old). The confidence judgments were included in order to obtain a subjective estimate of how accurately participants assess the veracity of their memory judgments (e.g., Greene et al., Reference Greene, Forsberg, Guitard, Naveh-Benjamin and Cowan2024). For example, previous research on L1 processing has shown that, counterintuitively, memory intrusions are often committed with high confidence, suggesting that participants are misguided in evaluating the strengths or weaknesses of their memories (e.g., Greene et al., Reference Greene, Forsberg, Guitard, Naveh-Benjamin and Cowan2024; Haeuser & Kray, Reference Haeuser and Kray2024).
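To make the response scheme concrete, the four keys can be decoded into an old/new judgment and a confidence level, and then scored against the word type. The following Python sketch is illustrative only; the names `KEY_MAP` and `score_response` are assumptions, and the outcome labels follow standard signal detection terminology.

```python
# Key -> (judgment, confidence), following the key assignment described above.
KEY_MAP = {
    "S": ("new", "high"),  # Sure New
    "D": ("new", "low"),   # Maybe New
    "J": ("old", "low"),   # Maybe Old
    "K": ("old", "high"),  # Sure Old
}


def score_response(key, word_type):
    """Decode a key press and classify it for a word of type old, new, or lure."""
    judgment, confidence = KEY_MAP[key.upper()]
    judged_old = int(judgment == "old")
    if word_type == "old":
        outcome = "hit" if judged_old else "miss"
    else:
        # Lures and new words were never presented during reading.
        outcome = "false_alarm" if judged_old else "correct_rejection"
    return {"judged_old": judged_old, "confidence": confidence, "outcome": outcome}


example = score_response("K", "lure")
```

In this example, pressing K in response to a lure is scored as a high-confidence false alarm.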
Participants were instructed that they did not need to memorize the key combinations, because every trial contained a schematic display of the response options (SD, JK), and what they represented, at the bottom of the screen. Participants were instructed to respond to each word as quickly and accurately as possible. About two-thirds of all participants completed the experiment within 30–70 minutes (average finish time = 62 minutes). We did not exclude participants based on longer experimental durations. On average, the L2 speakers took longer than the L1 speakers (M = 73 minutes vs M = 51 minutes, respectively).
Data analysis
All analysis scripts and data files are available on OSF (https://osf.io/e56k4/). We report the results for the word-by-word SPR task and subsequent word recognition task in two separate sections. To statistically analyze the reading and main recognition data, we ran linear mixed effects models and generalized linear mixed models (LMERs and GLMERs; Baayen et al., Reference Baayen, Davidson and Bates2008) specified in the lme4 library (Bates et al., Reference Bates, Mächler, Bolker and Walker2015) in R (R Core Team 2022; version 4.1.3), which were run on unaggregated trial-by-trial data. All models included random intercepts for subjects and items and were initially fit with the fullest by-item and by-subject random slope structure warranted by the design (i.e., “maximal” model including all within-subject and within-item predictors, including their interactions whenever specified; Barr et al., Reference Barr, Levy, Scheepers and Tily2013). In the case of convergence warnings, models were simplified progressively using the least variance approach (e.g., Barr et al., Reference Barr, Levy, Scheepers and Tily2013). P-values were calculated using the R package lmerTest (Kuznetsova et al., Reference Kuznetsova, Brockhoff and Christensen2017).
Results
Self-paced reading
One participant from the L2 group had an accuracy rate below 70% on the comprehension questions and was excluded from further analysis. On average, the remaining participants responded to the comprehension questions very accurately (M = 0.90, range = 0.72–1), with L2 speakers (M = 0.88, range = 0.72–1) responding slightly less accurately than L1 speakers (M = 0.91, range = 0.81–0.97), a significant difference, t(88) = 2.68, p < .01. Thus, even though participants in both speaker groups showed good comprehension of the sentences overall, L2 speakers seemed to encode fewer details from the sentences into short-term memory.
The critical region for statistical analysis of the RT data consisted of the predictable and unpredictable noun, as well as the two words immediately following the noun to catch spill-over effects common in word-by-word SPR (i.e., the noun, the first word after the noun, Noun+1, and the second word after the noun, Noun+2). We fit separate models for each word in the critical region. The outcome variable was reading times, log-transformed to reduce skewness. Fixed effects were predictability (a categorical variable with two levels: predictable vs unpredictable, sum-coded with −1 and 1 to allow for interpretation as main effect; cf. Brehm & Alday, Reference Brehm and Alday2022; Schad et al., Reference Schad, Vasishth, Hohenstein and Kliegl2020)Footnote 3, and speaker group (two levels: L1 and L2, sum-coded with −1 and 1), as well as the interaction between predictability and speaker group. Each model additionally included control predictors for trial number (to offset effects of task adaptation over the course of the experiment), word position in the sentence, word frequency (i.e., log-per-million frequency values from the SUBTLEX-DE database; Brysbaert et al., Reference Brysbaert, Buchmeier, Conrad, Jacobs, Bölte and Böhl2011), and word length. We did not include a control predictor for scaled reading times of the previous word, as the inclusion of this predictor led to multicollinearity with predictability in the Noun+1 and Noun+2 models. All control predictors were scaled. All models were initially fit with the maximal random slope structure warranted by the design (i.e., by-subject random slopes for group, and by-item random slopes for predictability and group). Models that did not converge were simplified in their random slope structure using the least variance approach.
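The consequence of sum coding for the interpretation of the coefficients can be shown with a small worked example. The sketch below is a Python illustration under simplifying assumptions: it ignores the control predictors and random effects, assumes a balanced design with predictable coded as −1 and unpredictable as 1, and uses hypothetical mean log reading times.

```python
import math


def sum_coded_fit(mean_predictable, mean_unpredictable):
    """Closed-form fit for a balanced two-level factor coded -1 / +1."""
    intercept = (mean_predictable + mean_unpredictable) / 2  # grand mean
    slope = (mean_unpredictable - mean_predictable) / 2      # half the difference
    return intercept, slope


# Hypothetical condition means on the log reading time scale.
intercept, slope = sum_coded_fit(6.00, 6.04)

# Because the outcome is log-transformed, twice the slope is the log of the
# ratio of unpredictable to predictable reading times.
ratio = math.exp(2 * slope)
```

Under these assumptions, a slope of 0.02 corresponds to a condition difference of 0.04 log units, or reading times that are roughly 4% longer for unpredictable than for predictable words.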
Based on visual inspection of the reading time data, we excluded data points in the L1 group that were faster than 100 ms or slower than 1500 ms (for the Noun and Noun+1 regions) or slower than 2000 ms (for the Noun+2 region). For the L2 group, also based on visual inspection of the data, we excluded data points that were faster than 100 ms or slower than 2000 ms across regions. This procedure affected less than 2.7% of all data points in the L2 group, and less than 2.4% of all data points in the L1 group, justifiable rates according to Ratcliff (Reference Ratcliff1993).Footnote 4 Figure 1 shows a partial effects plot of reading times in the critical region.
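The group- and region-specific cutoffs can be written down as a simple lookup. The following Python sketch is illustrative rather than a copy of the published analysis script; the names `UPPER_MS`, `LOWER_MS`, and `keep_rt` are assumptions, as is the treatment of values that fall exactly on a cutoff (retained here).

```python
import math

LOWER_MS = 100
# Upper cutoffs in ms by (group, region), as described in the text.
UPPER_MS = {
    ("L1", "noun"): 1500, ("L1", "noun+1"): 1500, ("L1", "noun+2"): 2000,
    ("L2", "noun"): 2000, ("L2", "noun+1"): 2000, ("L2", "noun+2"): 2000,
}


def keep_rt(group, region, rt_ms):
    """Return True if a reading time lies within the cutoffs for its cell."""
    return LOWER_MS <= rt_ms <= UPPER_MS[(group, region)]


# Hypothetical reading times for the noun region in the L1 group.
raw = [85, 240, 410, 1620]
kept = [rt for rt in raw if keep_rt("L1", "noun", rt)]
log_rts = [math.log(rt) for rt in kept]  # log-transform after trimming
```

The same 1620 ms observation would be retained in the L2 group or in the Noun+2 region of the L1 group, where the upper cutoff is 2000 ms.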

Figure 1. Reading times in the self-paced reading task. Note. Error bars reflect SE.
Noun. We found a main effect of predictability (b = 0.02, SE = 0.01, t = 2.37, p = .02), indicating that across groups, unpredictable nouns elicited processing difficulty, as well as a main effect of group, suggesting that L2 speakers read more slowly than L1 speakers overall (b = 0.08, SE = 0.03, t = 2.38, p = .02). The interaction between predictability and group did not reach statistical significance (t = 1.28, p = .20).
Noun+1. There was a main effect of predictability (b = 0.02, SE = 0.01, t = 2.97, p = .01), again indicating slowed reading of unpredictable items. The main effect of group was not significant (t = 1.31, p = .20), nor was the interaction between predictability and group (t = 1.60, p = .11).
Noun+2. There was a main effect of predictability (b = 0.02, SE = 0.01, t = 2.29, p = .03). Much like in the Noun+1 model, the main effect of group was not significant (t = 1.53, p = .13), nor was the interaction between predictability and group (b = −0.004, p = .43).
In sum, two key findings emerged from the analysis of the SPR data: First, L2 speakers read more slowly compared to L1 speakers overall, an effect that was numerically present for all words in the critical region but reached significance only on the noun. Second, unpredictable words led to slower reading times overall, suggesting that participants in both groups experienced integration difficulties when reading sentence continuations that were unpredictable based on context. Critically, we observed no significant interactions between predictability and group. Thus, contrary to the predictions that derive from both the reduced-activation and the bilingual-advantage accounts, we obtained no evidence from the SPR task that predictability effects are reduced in L2 compared to L1 speakers.Footnote 5
Noun recognition memory
Two participants (one from the L1 and one from the L2 group) were excluded from all following analyses because their hit rate was lower than their false-alarm rate to new items.
We first examined whether L1 and L2 speakers showed systematic differences with respect to response bias. To do this, we computed hits and false alarms, as well as the signal detection measure d’ (d prime), for participants in both groups. d’ is a measure of recognition discrimination that takes into account both hits and false alarms (e.g., Fraundorf et al., Reference Fraundorf, Hourihan, Peters and Benjamin2019; see Stanislaw & Todorov, Reference Stanislaw and Todorov1999). It is calculated as a difference score between z-transformed hit and false-alarm rates (e.g., using the R-formula qnorm(HR) - qnorm(FAR)).Footnote 6 Results are presented in Table 4. L2 speakers showed elevated false-alarm rates to new words, t(85) = −3.35, p < .001; in other words, there was evidence for elevated response bias among L2 speakers. Hit rates to old items did not differ between L1 and L2 speakers, t(85) = −0.53, p = .60. Altogether, this led to a significantly lower d’ in the L2 compared to the L1 group, t(85) = 2.92, p < .01, suggesting that L2 speakers discriminated less successfully between old and new items.
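The d’ computation can be reproduced outside of R with a few lines of code. The sketch below is a Python illustration in which `NormalDist().inv_cdf` plays the role of R's `qnorm`; the function name `d_prime` and the example rates are hypothetical. Rates of exactly 0 or 1 lie outside the domain of the inverse normal function, and no correction for such cases is shown here.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """d' = z(hit rate) - z(false-alarm rate), with z the inverse normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical rates: equal hit rates, but more false alarms in the second case.
d_low_fa = d_prime(0.80, 0.20)   # about 1.68
d_high_fa = d_prime(0.80, 0.30)  # about 1.37

# A hit rate below the false-alarm rate yields a negative d'.
d_negative = d_prime(0.30, 0.50)
```

The example mirrors the pattern described above: with hit rates held constant, a higher false-alarm rate to new words is sufficient to lower d’.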
Table 4. Average hit and false alarm rates by group in the recognition memory task

The between-group difference for false alarms indicates that we cannot directly compare absolute (i.e., uncorrected) rates of true and false memory (to lures) in L1 and L2 speakers, as these rates are systematically distorted (i.e., shifted upward) in L2 speakers as a function of their response bias. As we describe below, our main analysis takes this response bias into account by using effect coding, as it computes recognition memory effects that are corrected for response bias.
To statistically analyze recognition memory rates, we used GLMER models fitted with the bobyqa optimizer to facilitate convergence. Since there was no difference in hit rates between old predictable and unpredictable words, we collapsed all hits to predictable and unpredictable words into a single “old” category for the statistical analysis. Prior to analysis, recognition data were cleaned minimally by removing, in both groups, RT outliers that were faster than 200 ms or slower than 10000 ms, a procedure that affected less than 1% of all data points.Footnote 7
The outcome variable was trial-by-trial “old” judgments, that is, whether a word was judged as “old” or “new” (coded in 1’s and 0’s). For previously seen (“old”) nouns, old judgments reflect hits. For lures and new items, old judgments reflect false alarms (i.e., participants incorrectly endorsing a word as old when it was, in fact, new). Fixed effects in the model were word type (three levels: old, new, and lure; see below for contrast coding), confidence (low vs high confidence, sum-coded with −1 and 1), and group (L1 vs L2, sum coded with −1 and 1 for L1 and L2 speakers, respectively), as well as all two- and three-way interactions between them. Covariates included word length, frequency (log-per-million values from the SUBTLEX-DE database; see Brysbaert et al., Reference Brysbaert, Buchmeier, Conrad, Jacobs, Bölte and Böhl2011), trial number, and recognition response time. All covariates were scaled to facilitate direct comparison between their beta estimates.
For the predictor word type, we set up two orthogonal (i.e., statistically independent; Schad et al., Reference Schad, Vasishth, Hohenstein and Kliegl2020) contrasts that take into account response bias. The two contrasts of interest for this study were a) lure vs new (false memory effect), and b) old vs new (true memory effect); we explain these below.
The first contrast lure vs new measured the critical false memory effect by comparing false alarms for lures against false alarms to new items. Note that since false alarms to lures are compared to those of new words, this effect is corrected for response bias. If predictable words elicit false recognition memory, false-alarm rates to lures should be consistently higher than those to new words. According to both the reduced-activation and bilingual-advantage accounts, we expected this effect to interact with speaker group, such that the false memory effect should be reduced in L2 compared to L1 speakers. We also expected the memory effect to interact with confidence, such that L2 speakers should be less likely to express high confidence when false alarming to predictable lures.
The second contrast old vs new measured hit rates to old items by taking into account the difference between old and new nouns. We will refer to this contrast as the “true” memory effect, that is, hit responses corrected for response bias. We implemented this contrast to help distinguish between the two accounts of reduced false memory effects in L2 speakers. If reduced false remembering for lures among L2 participants results from overall lower activation when processing words in a second language (i.e., reduced-activation account), then L2 speakers should also show reduced rates of true remembering. If, on the other hand, L2 speakers show a reduced false memory effect specifically because they are more experienced at suppressing activation from irrelevant representations (i.e., bilingual-advantage account), then there should be no difference between L1 and L2 speakers with respect to true recognition memory.
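Why comparing lures and old words against new words corrects for response bias can be illustrated on the log-odds scale used by logistic mixed models. The following Python sketch rests on explicit assumptions: response bias is modeled as a constant added to the log-odds of an “old” response for every word type, the proportions are hypothetical, and the functions `memory_effects` and `shift` are illustrative names rather than part of the published analysis.

```python
import math


def logit(p):
    """Log-odds of a proportion."""
    return math.log(p / (1 - p))


def memory_effects(p_old, p_lure, p_new):
    """Differences in log-odds of an 'old' response, relative to new words."""
    return {
        "false_memory": logit(p_lure) - logit(p_new),  # lure vs new
        "true_memory": logit(p_old) - logit(p_new),    # old vs new
    }


def shift(p, bias):
    """Add a constant to the log-odds, mimicking a more liberal response bias."""
    return 1 / (1 + math.exp(-(logit(p) + bias)))


# Hypothetical proportions of 'old' responses to old words, lures, and new words.
base = memory_effects(0.80, 0.40, 0.20)
biased = memory_effects(shift(0.80, 0.5), shift(0.40, 0.5), shift(0.20, 0.5))
```

Under these assumptions, a general tendency to respond “old” more often raises all three raw proportions but leaves both contrasts unchanged, which is the sense in which the two effects are bias-corrected.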
Results. The final model converged with by-subject and by-noun random intercepts, as well as by-noun random slopes for speaker group, and by-subject random slopes for condition and confidence. Figure 2 shows proportions of items judged as old in L1 and L2 speakers, split up by condition and confidence.

Figure 2. Recognition memory test results. Note. Error bars reflect SE. Lure = previously predictable but not presented noun. New = genuinely new noun. Old = previously presented predictable and unpredictable nouns.
For the contrast lure vs new (i.e., false memory effect), the model showed a significant main effect (b = 1.19, SE = 0.18, z = 6.71, p < .001), indicating an overall false memory effect (i.e., elevated false-alarm rates to lure compared to new items). There was also a significant two-way interaction between lure vs new and group (b = −0.36, SE = 0.11, z = −3.27, p = .001), indicating a reduced false memory effect among L2 compared to L1 speakers, consistent with the predictions of both accounts under consideration and in line with previous false memory studies (reviewed in Suarez & Beato, Reference Suarez and Beato2021). Irrespective of speaker group, the false memory effect was larger in high- compared to low-confidence judgments, indicated by a significant interaction between lure vs new and confidence (b = 0.51, SE = 0.08, z = 6.05, p < .001), suggesting that both speaker groups misjudged their memory contents when false-alarming to lures. Notably, the three-way interaction between lure vs new, group and confidence reached statistical significance (b = −0.17, SE = 0.08, z = −2.10, p = .04). Model splits by speaker groups indicated that the interaction between lure vs new and confidence was numerically smaller among L2 (b = 0.31, SE = 0.11, z = 2.89, p = .004) compared to L1 speakers (b = 0.66, SE = 0.13, z = 5.24, p < .001). This means that L2 speakers were less likely to assign high subjective confidence to their memory judgments when false alarming to predictable lures.
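The relation between the estimates, standard errors, and test statistics reported here can be checked by hand. The sketch below is a Python illustration that assumes the z values are Wald statistics (the estimate divided by its standard error) with two-sided p-values from the standard normal distribution; the function name `wald` is hypothetical.

```python
from statistics import NormalDist


def wald(b, se):
    """Return the Wald z statistic and its two-sided p-value."""
    z = b / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Three-way interaction between lure vs new, group, and confidence.
z, p = wald(-0.17, 0.08)
```

Recomputed values differ slightly from the reported ones because the published estimates and standard errors are rounded to two decimals; here, the rounded inputs give z = −2.125 and a p-value between .03 and .04, in line with the reported result.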
In sum, there were two key findings for false memory: First, and in line with the reduced-activation and bilingual-advantage accounts, L2 speakers showed a reduced false memory effect compared to L1 speakers. Second, L2 speakers were less likely to false-alarm to critical lures with high subjective memory confidence, which suggests that predictable lures elicited less vivid remembering in L2 vs L1 speakers.
For the contrast old vs new (i.e., true memory effect), there was a significant main effect for old vs new (b = 2.53, SE = 0.17, z = 14.57, p < .001; see Figure 2), indicating that both groups discriminated consistently between old and new items. However, in line with the reduced-activation account, we also found a significant interaction between old vs new and group (b = −0.28, SE = 0.11, z = −2.55, p = .01), suggesting that L2 speakers showed reduced true memory rates compared to L1 speakers (see Figure 2). This finding supports the hypothesis that reduced false memory rates in L2 result from overall reduced or less distinct activation of lexical representations when processing a second language, and not from greater experience in suppressing activation from unwanted lexical representations as a consequence of bilingualism. There was also a significant, positive-going interaction between old vs new and confidence (b = 1.13, SE = 0.08, z = 14.06, p < .001), indicating that, regardless of speaker group, participants showed a larger true memory effect (i.e., better old-new discrimination) in high- compared to low-confidence judgments. The three-way interaction between old vs new, group, and confidence was not significant (b = −0.10, SE = 0.08, z = −1.23, p = .22).
In sum, there were two key findings for true memory: First, much like false recognition memory, true recognition memory was reduced among L2 speakers, aligning with the prediction of the reduced-activation account, which holds that L2 speakers activate lexical representations less distinctly when processing their L2. This overall reduced lexical activation results in reduced rates of false remembering of predictable lures, but also in lower recognition rates for previously seen, “old” words. Second, irrespective of speaker group, recognition memory for previously seen, “old”, words was elevated in high confidence judgments, suggesting that when remembering previously presented information, participants in both speaker groups were mostly accurate in judging the veracity of their memory contents.Footnote 8
Discussion
We tested L1 and L2 speakers of German in a SPR and recognition memory paradigm to investigate false remembering and its cognitive underpinnings in L2 processing. According to the reduced-activation account (e.g., Arndt & Beato, Reference Arndt and Beato2017; Anastasi et al., Reference Anastasi, Rhodes, Marquez and Velino2005; Howe et al., Reference Howe, Wimmer, Gagnon and Plumpton2009; Suarez & Beato, Reference Suarez and Beato2021), L2 speakers should show reduced false memory effects for predictable words, as they (pre-)activate or predict lexical representations during encoding less distinctly than L1 speakers. According to the bilingual-advantage account, reduced false memory effects in L2 arise because L2 speakers are more experienced at suppressing unwanted activation from interfering lexical representations (i.e., predictable words) when comprehending unpredictable sentences. While these two accounts make similar predictions for false recognition memory (i.e., reduced false memory effects for L2), and for initial reading of predictable and unpredictable sentences (i.e., a reduced predictability effect for L2), they diverge in their predictions regarding true memory for previously seen, “old”, words. While the reduced-activation account predicts reduced rates of true remembering in L2 compared to L1 speakers (because L2 speakers activate lexical representations less distinctly and this should also pertain to previously seen, “old” words), the bilingual-advantage account predicts no particular L1/L2 differences when it comes to true memory.
We found that during initial reading, both L1 and L2 speakers slowed down when reading unpredictable sentence continuations, but against the predictions of both accounts, we did not obtain evidence for a reduced predictability effect in L2 speakers. In the recognition memory test, L2 speakers showed a reduced false memory effect compared to L1 speakers (i.e., lower rates of false remembering of predictable compared to new words, corrected for response bias), consistent with both the reduced-activation and bilingual-advantage accounts. L2 speakers also showed reduced rates of true recognition memory (i.e., lower hit rates to old words when corrected for response bias to new words). This latter finding is only consistent with the reduced-activation account, not the bilingual-advantage account. In addition, we found that L2 speakers were less likely to false alarm to predictable lures with high subjective memory confidence, suggesting that L2 speakers remembered predictable lures less vividly than L1 speakers. We discuss our findings for SPR and recognition memory below.
Prediction during reading in L1 and L2
In line with a large body of previous research, we found that L2 speakers read sentences more slowly than L1 speakers during SPR (e.g., Conklin et al., Reference Conklin, Alotaibi, Pellicer-Sánchez and Vilkaitė-Lozdienė2020; Dirix et al., Reference Dirix, Vander Beken, De Bruyne, Brysbaert and Duyck2020; Siyanova-Chanturia et al., Reference Siyanova-Chanturia, Conklin and Schmitt2011). In addition, we found a predictability effect during initial reading of sentences, that is, slower reading times for unpredictable compared to predictable sentence continuations, irrespective of group. As is common in SPR, the predictability effect spilled over onto the following words of the sentence. However, and in contrast to the predictions that derive from both the reduced-activation and the bilingual-advantage accounts, we did not find evidence for a reduced predictability effect in L2 speakers. This null effect is somewhat surprising, given the relatively large body of research attesting to reduced effects of prediction in L2 speakers (e.g., Schlenter, Reference Schlenter2023; Kaan & Grüter, Reference Kaan, Grüter, Kaan and Grüter2021). An obvious question that arises is whether we had enough power to obtain such an effect in the first place. We followed up on this question by conducting a power analysis that started with a set of comparable L1 data (Haeuser & Kray, Reference Haeuser and Kray2024) and simulated new data for a probable set of L2 speakers, assuming a higher intercept (i.e., slower reading times) and a reduced predictability effect in L2. 
This power analysis showed that with a total sample size of 91 subjects and 32 items, the present study was sufficiently powered to detect a reduced predictability effect in L2 that is as small as two-thirds of the effect in L1 speakers (we note here that the previous literature has often shown L1–L2 differences in reading speed that are much larger than that; e.g., Conklin et al., Reference Conklin, Alotaibi, Pellicer-Sánchez and Vilkaitė-Lozdienė2020; Siyanova-Chanturia et al., Reference Siyanova-Chanturia, Conklin and Schmitt2011).
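The logic of a simulation-based power analysis of this kind can be sketched in a few lines. The Python example below is a deliberately simplified illustration and not the authors' procedure: it works with one predictability effect per subject instead of trial-level mixed models, compares groups with a large-sample z-test, and uses hypothetical values for the group sizes (46 and 45), the L1 effect, and the between-subject standard deviation.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_power(n_l1, n_l2, effect_l1, ratio_l2, sd,
                   n_sims=2000, alpha=0.05, seed=1):
    """Estimate power to detect a group difference in the predictability effect.

    effect_l1 is the mean per-subject predictability effect in L1 (log RT units);
    the L2 effect is effect_l1 * ratio_l2; sd is the between-subject SD.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        l1 = [rng.gauss(effect_l1, sd) for _ in range(n_l1)]
        l2 = [rng.gauss(effect_l1 * ratio_l2, sd) for _ in range(n_l2)]
        se = (stdev(l1) ** 2 / n_l1 + stdev(l2) ** 2 / n_l2) ** 0.5
        z = (mean(l1) - mean(l2)) / se
        if abs(z) > crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical scenario: the L2 effect is two-thirds of the L1 effect.
power = simulate_power(46, 45, effect_l1=0.04, ratio_l2=2 / 3, sd=0.05)
```

Power in such a simulation depends heavily on the assumed variance components, so the value returned here says nothing about the actual power of the present study; the sketch only shows how simulated L2 data with a reduced effect translate into a rejection rate for the interaction.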
We believe a more likely reason for the lack of a predictability by group interaction in our data, though speculative in nature, may lie in the potentially distributed nature of between-group differences in L1 and L2 speakers. This distributed nature could not be captured by the analysis of reading times on individual segments. Consider the pattern of reading times illustrated in Figure 1: In the L2 group, there is a clear peak at the noun, indicating a focal integration effort once the predictable/unpredictable noun itself is encountered. By contrast, the reading-time curve in the L1 group appears flatter and more “smeared out,” with no distinct peak on the noun itself. Hence, whereas L2 speakers showed a relatively focal integration effort, L1 speakers seemed to distribute their reading effort across multiple words in the critical region. The reasons for these group differences are unknown, but one speculative explanation could be that L2 readers are less able to hold verbal information in short-term memory, which may put more pressure on them to integrate new information immediately as it comes along, resulting in a costly use of resources at a single point. L1 speakers, by contrast, may be able to distribute attention and processing efforts more efficiently by holding more information in short-term memory, showing no such single-point integration effort. Future studies will be needed to investigate whether this finding holds for other L1/L2 groups, and to what extent such differences in reading behavior are related to group-wise and individual differences in working memory capacity.
False remembering in L1 and L2
Our initial analysis of L1 and L2 speakers’ performance on the recognition memory task (i.e., d prime analysis) demonstrated lower recognition memory among L2 speakers overall, a result that was primarily driven by greater response bias among L2 speakers. In other words, L2 speakers were more likely than L1 speakers to endorse any kind of presented word as “old,” a result that replicates previous findings from recognition memory research using the DRM paradigm (e.g., Arndt & Beato, Reference Arndt and Beato2017). Our primary analysis of the recognition memory data took these differences in response bias into account and showed a reduced false memory effect for predictable words in L2 compared to L1 speakers. In other words, over and above any differences in response bias and in line with both the reduced-activation and bilingual-advantage accounts, L2 speakers were less likely to false alarm to predictable lures than L1 speakers. This aspect of our findings corroborates and extends previous research using the DRM paradigm showing that false remembering of semantically related information is reduced in L2 compared to L1 processing (Anastasi et al., Reference Anastasi, Rhodes, Marquez and Velino2005, Experiments 3 and 4; Sahlin et al., Reference Sahlin, Harding and Seamon2005; Howe et al., Reference Howe, Wimmer, Gagnon and Plumpton2009; Arndt and Beato, Reference Arndt and Beato2017; Beato and Arndt, Reference Beato and Arndt2021). In contrast, this result is not in line with a previous investigation from the prediction literature (Foucart et al., Reference Foucart, Ruiz-Tada and Costa2016), which had suggested L1-equivalent rates of false remembering in French-Spanish bilinguals (albeit without direct statistical comparison between L1 and L2 speakers). Importantly, Foucart et al. (Reference Foucart, Ruiz-Tada and Costa2016) did not consider potential differences in response bias in L2 vs L1 speakers, a factor that our study showed to play an important role.
Notably, our results also indicated that L2 speakers showed a qualitatively different pattern than L1 speakers when judging the contents of their memory. Even though both groups were more likely to false alarm to predictable lures with high compared to low subjective memory confidence, this increase was attenuated in L2 speakers, suggesting that L2 speakers experienced less detailed remembering when presented with predictable lures. Hence, L2 speakers showed evidence of a reduced false memory effect for predictable lures. While this aspect of our findings does not help adjudicate between the two L2-processing accounts, that is, reduced-activation and bilingual-advantage, we note that the totality of our findings is more in line with the reduced-activation account. Specifically, we found that true memory rates during word recognition were also reduced among L2 speakers. The most parsimonious explanation for these findings attributes differences between L1 and L2 processing to how strongly these groups activate lexical-semantic information during sentence reading. L2 speakers not only show reduced rates of false memory, indicating less consistent use of sentence context to predict or pre-activate lexical representations,Footnote 9 but they may also show more diffuse and less distinct processing of words more generally. This interpretation is consistent with a large body of research on predictive processing in L2 which has consistently demonstrated that L2 speakers show reduced or slowed-down prediction effects. The reduced-activation interpretation of false remembering is also consistent with a common interpretation of reductions in false remembering in an L2 within the DRM paradigm.
Previous studies in this line of research have hypothesized that L2 speakers’ lexical representations may be less robust and less strongly connected to semantic representations in memory (e.g., Beato & Arndt, 2021; Suarez & Beato, Reference Suarez and Beato2021), which in turn lowers the likelihood that L2 speakers will automatically activate semantically associated information.
In sum, the totality of results found in the present study is more in line with the reduced-activation interpretation of the false memory effect than with the bilingual-advantage account (also see Degirmenci et al., Reference Degirmenci, Grossmann, Meyer and Teichmann2022; Gunnerud et al., Reference Gunnerud, ten Braak, Reikerås, Donolato and Melby-Lervåg2020; Lehtonen et al., Reference Lehtonen, Soveri, Laine, Järvenpää, De Bruin and Antfolk2018). Specifically, we found that L2 speakers not only show a reduced false memory effect but also a reduced true memory effect, that is, reduced correct recognition memory when corrected for response bias. Taken together, these findings could indicate that lexical entries in the L2 lexicon may be less strongly connected to semantic representations in memory (e.g., Kroll & Stewart, Reference Kroll and Stewart1994). This, in turn, may lead to less distinct (i.e., semantically enriched) encoding of presented information and reduce the likelihood that L2 speakers automatically activate predictable or semantically associated information during language processing.
Conclusion
Previous studies with native speakers showed that predictable but not presented words linger in memory and elicit false remembering. The present study extended this research to L2 speakers of German and found evidence for a reduced false memory effect in L2 compared to L1 speakers. L2 speakers were also less likely to falsely remember predictable lures with high subjective memory confidence, suggesting that predictable words elicited less vivid and detailed remembering in L2 speakers. Notably, L2 speakers also showed reduced levels of accurate remembering for previously presented, “old” words. Our results thus support theories arguing that lexical representations in L2 are less entrenched and less interconnected with semantic information in long-term memory (e.g., Kroll & Stewart, Reference Kroll and Stewart1994), leading to less distinct (i.e., semantically enriched) lexical activation during L2 processing, and consequently to reduced (true) memory for encountered words, as well as reduced (false) memory for words that were predictable but not encountered.
Replication package
All data files, materials, and analysis scripts can be found on this paper’s project page using the link https://osf.io/e56k4/.
Financial support
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project-ID 232722074, SFB 1102, project A5.
Competing interests
Parts of the results reported in this paper were presented at the ISBPAC 2024 conference in Swansea, UK. The authors declare no competing interests.
 
 





