There is currently great interest in whether the same mechanisms that underpin language processing also drive language learning. This interest stems, in part, from a desire to account for language learning in the absence of any kind of predetermined grammatical hard-wiring. That is, accounting for both learning phenomena and processing phenomena within the same model would achieve a desirable theoretical parsimony (O’Grady, 2005). In addition, investigating the nature and role of processing in second language (L2) acquisition potentially offers a way to shed light on the ways in which L2 acquisition may differ from first language (L1) acquisition. [Footnote 1]
In the field of second language acquisition (SLA), it has been suggested that prediction has a role in language learning and, specifically, in acquiring complex contingencies, thought to be among the hardest phenomena to learn. For example, prediction during sentence comprehension may be useful for L2 learners by allowing for hypothesis testing, which could help them retreat from overgeneralization when their predictions (their hypotheses) are disconfirmed (Phillips & Ehrenhofer, 2015). The idea that prediction may serve as a learning tool is supported by computational modeling showing that data from L1 acquisition and processing can be reproduced by recurrent neural networks that use prediction and error-based learning (Chang et al., 2006). However, there is evidence that L2 speakers often lag behind L1 speakers in their ability to predict upcoming input, as shown by data from both eye-tracking and EEG (electroencephalography) studies. Indeed, it has been suggested that L2 speakers, in particular, may be affected by a reduced ability to generate expectations (Grüter et al., 2014, 2017). This possibility sparked a concern that limitations in L2 learners’ ability to predict, relative to L1 speakers, may prevent them from using a prediction-based learning mechanism (e.g., Kaan et al., 2019; Phillips & Ehrenhofer, 2015). [Footnote 2] Therefore, understanding exactly what is meant by prediction, how L2 learners may differ from native speakers in their ability to predict, and what can be learned through prediction will be necessary to address these concerns.
The aim of this article is to provide an overview of research into prediction in L1 and L2 processing and learning, to offer a frame of reference for those interested in the role prediction may play in L2 acquisition in particular.
Our review is structured as follows: In the first section, we define prediction and describe different prediction mechanisms that have been identified in the literature, showing that it can be conceived of as a continuum going from simpler to more complex instances of prediction. We present the empirical evidence for the different types of prediction in L1 and L2 speakers and highlight the factors that can constrain prediction in both groups. In the second section, we introduce the theoretical debate on the role of prediction in language learning—both L1 and L2. We introduce computational models of L1 processing that show that language acquisition and priming phenomena can be explained by error-based learning. We then review the available empirical evidence for a learning mechanism based on prediction error, from studies on priming and adaptation in both L1 and L2, and conclude by highlighting potential limitations of this mechanism.
Evidence for prediction in L1 and L2 speakers
Defining “prediction”
We first need to make a preliminary distinction between different conceptions of prediction: prediction as the formulation of expectations during sentence comprehension (as in “preprocessing”; DeLong et al., 2014), which is the focus of this review, and a more general sense of prediction as inference generation (e.g., using contextual cues to assign referents to ambiguous pronouns). Grüter et al. (2017) note that prediction in the narrower sense of “preprocessing” is still not often investigated by SLA researchers. They add that:
The term “prediction” has been used in the SLA literature, primarily in the context of L2 reading, to refer to inference generation, or guessing (e.g., in fill-the-gap tasks), more generally (e.g., McLaughlin, 1987). This usage does not specify the temporal aspect of this process, i.e., when such inference generation takes place during the incremental construction of meaning as we read/listen. As such, [this usage of the term prediction] is compatible with both (retroactive) information integration and prediction in terms of (proactive) linguistic pre-processing.
(Grüter et al., 2017, footnote 6, p. 224)
One example of this generic, temporally nonspecific usage of “prediction” in SLA is the literature on statistical preemption (Ambridge & Brandt, 2013; Boyd & Goldberg, 2011; Foraker et al., 2009; Robenalt & Goldberg, 2016), a proposed learning mechanism that is driven by associative learning: Every time an expected outcome is not encountered after a given cue, the strength of its association with that cue diminishes (Rescorla & Wagner, 1972). This line of research has to date examined inference generation through offline tasks such as acceptability judgments (e.g., Robenalt & Goldberg, 2016) to determine to what extent learners take into account potential alternatives to structures they encounter. While it cannot be ruled out that prediction during processing also plays a role in determining acceptability, these kinds of acceptability tasks also capture the result of processes (such as retroactive information integration) that are not part of prediction in the narrower sense (i.e., linguistic preprocessing), and thus this line of research falls outside the scope of the current review.
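The associative principle just described (an association weakens each time the expected outcome fails to follow its cue) can be illustrated with a minimal sketch of a Rescorla-Wagner-style update. This is a simplified single-cue version offered for illustration only: the function names, the single `learning_rate` parameter (which collapses the model's separate salience terms into one value), and the trial sequence are all assumptions made here, not details taken from the studies cited.

```python
def rescorla_wagner_update(strength, outcome_present, learning_rate=0.1):
    """Return the cue-outcome association strength after one trial.

    Simplified single-cue form: the change in strength is the learning
    rate times the prediction error (observed outcome minus current strength).
    """
    target = 1.0 if outcome_present else 0.0  # 1 if the outcome occurred, else 0
    prediction_error = target - strength
    return strength + learning_rate * prediction_error


def run_trials(outcomes, initial_strength=0.0, learning_rate=0.1):
    """Apply the update over a sequence of trials and record each strength."""
    strengths = []
    strength = initial_strength
    for outcome_present in outcomes:
        strength = rescorla_wagner_update(strength, outcome_present, learning_rate)
        strengths.append(strength)
    return strengths


# Hypothetical sequence: ten trials in which the expected outcome follows
# the cue, then five trials in which it does not.
history = run_trials([True] * 10 + [False] * 5)
```

In this sketch the association grows over the first ten trials and then declines over the last five, which is the pattern that the preemption account relies on: repeated non-occurrence of an expected form after a cue lowers the strength of that expectation.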
Another field of research on prediction that is temporally nonspecific investigates the effect of expectation violation on the formation of new declarative memories. This line of research does not originate in SLA research, but it is relevant to language learning, as well as to learning more generally: Findings have shown that novel associations that violate established patterns are remembered better than those that do not (Brod et al., 2018; De Loof et al., 2018; Greve et al., 2017; Greve et al., 2019) and that generating incorrect guesses followed by corrective feedback can, under some circumstances, lead to better learning than simply being exposed to the correct answer (Potts et al., 2019; Potts & Shanks, 2014). While these effects have been observed for a variety of stimuli, such as conceptual knowledge (Brod et al., 2018) and arbitrary picture-word mappings (Greve et al., 2017), it is the role of expectation violation in the acquisition of vocabulary that may be most relevant to SLA. Expectation violation has been shown to aid the acquisition of L1 vocabulary in young children (Stahl & Feigenson, 2017), as well as Dutch–Swahili translation word pairs in adult Dutch L1 speakers (De Loof et al., 2018; see also Gambi [cited in Kaan & Grüter, 2021] for more recent work on L2 vocabulary learning).
While it is of relevance to SLA, however, this particular conceptualization of prediction is, too, outside the scope of our review, which is on prediction as linguistic “preprocessing,” or the incremental formulation of expectations during sentence comprehension. We will now turn to prediction as linguistic “preprocessing” and the ways in which it can be conceptualized.
In the literature on prediction during language comprehension, there is a great amount of variation in terms of approaches and terminology used, with different authors focusing on different aspects of the phenomenon (see Kuperberg & Jaeger, 2016 for a review). Pickering and colleagues (Pickering & Gambi, 2018; Pickering & Garrod, 2013) distinguish between two types of prediction, prediction-by-association and prediction-by-production. The first mechanism, prediction-by-association, is driven by basic associative mechanisms like spreading activation, and may constitute the stage prior to prediction-by-production. A feature specific to this account is that in the more complex route (prediction-by-production) preactivation involves forward speech planning: Comprehenders use the language production system to formulate predictions about upcoming input. The prediction-by-production route is considered to be very accurate and, therefore, to aid processing, but it is thought to be an optional mechanism: It is not always available, especially in L2 speakers or in populations with cognitive limitations (Pickering & Gambi, 2018). By contrast, prediction-by-association is less precise and less effective than prediction-by-production; however, it is an integral part of comprehension and, being automatic, it does not take up cognitive resources, which is why it should remain unimpaired even in comprehenders with limited resources (Pickering & Gambi, 2018, p. 1030).
Similar to Pickering and colleagues, other authors also distinguish between two broad types of “preprocessing” prediction: a simple, automatic kind, generally limited to the semantic domain, and a more complex, resource-intensive type, involving prediction of specific linguistic features. Kuperberg and Jaeger (2016) contrast a basic sense of prediction, as expectations based on discourse context, with “predictive activation” of low-level (e.g., phonological, morphological) features. In this latter case, comprehenders can “predictively preactivate” low-level representations (e.g., phonological form) based on high-level inferences, before encountering them in the input, rather than just making a high-level event hypothesis, as happens in the more basic, simple case of prediction. Another potential dual-route account of prediction is offered by Huettig (2015), modeled on Kahneman’s (2011) dual-system model of reasoning: a “dumb” route (System 1), based on simple associative mechanisms, which is contrasted with a “smart” route (System 2), linked to more effortful active reasoning.
The conceptualizations of prediction we have just seen could all be seen as, essentially, dichotomous distinctions; however, the empirical evidence (which will be reviewed in the following text) suggests a more graded process, which can vary in complexity and specificity depending on a variety of factors including context, language proficiency, and the nature of the task. In light of this complexity, Huettig (2015) proposes a multiple-mechanisms account of prediction, called PACS (production-, association-, combinatorial-, simulation-based prediction). According to this account, prediction can be driven by diverse mechanisms. One is basic association, which often involves semantic information but may also involve other types of representation (e.g., phonological); another is production, where prediction happens through covert speech production. There is also a combinatorial route, where meaning is built by drawing on multiple linguistic constraints, and an event simulation route, where mental imagery may be used to preactivate linguistic representations. Crucially, these four mechanisms interact with each other: For instance, basic association may provide input that then feeds into the combinatorial route (Huettig, 2015).
To further illustrate the graded nature of prediction processes and its context, task, and individual specificity, we now review evidence for prediction in L1 and L2 speakers in growing order of complexity: from basic sensitivity toward word predictability based on semantic context to preactivation of specific morphological and phonological features, but all aligning with our broad working definition of prediction for the purposes of this review as incremental formulation of expectations during sentence comprehension. In the subsequent section, we then move on to examine the factors that can constrain the extent of predictive processing in both L1 and L2 speakers.
Types of prediction: From basic expectations to preactivation of specific features
A simple type of prediction: Sensitivity to word predictability
The predictability of a word from context is known to affect the way it is processed during comprehension. Words that are predictable from their semantic context are easier to process: L1 speakers spend less time fixating on them during reading (Balota et al., 1985; Demberg & Keller, 2008; Ehrlich & Rayner, 1981; McDonald & Shillcock, 2003), and are quicker to react to them in behavioral tasks such as lexical decisions (Schwanenflugel & LaCount, 1988; Schwanenflugel & White, 1991; Stanovich & West, 1983) and naming tasks (Forster, 1981; Stanovich & West, 1981, 1983; Traxler & Foss, 2000). In EEG studies, words that are highly predictable from context elicit a reduced N400 [Footnote 3] relative to unexpected words (Kutas & Hillyard, 1980, 1984), a finding that has been widely replicated (see Kutas & Federmeier, 2011 for a review). The size of the N400 elicited by an unexpected sentence-final word is inversely proportional to the cloze probability of the word, that is, how likely the word is to occur at the end of that sentence (DeLong et al., 2012; DeLong et al., 2005; Luke & Christianson, 2016), although it is not affected by the number of potential alternative completions (Kuperberg et al., 2020; Kutas & Hillyard, 1984). The N400 reduction to predictable words is found in L2 speakers, too (Martin et al., 2013).
Finally, both L1 and L2 speakers can also exhibit sensitivity to word predictability based on structural cues, not just semantic ones, as evidenced by data from EEG (Kaan et al., 2016) and self-paced reading (Leal et al., 2017).
In the L1 processing literature, there was initial resistance to accepting evidence of sensitivity to word predictability as evidence of “prediction,” on the grounds that it could also simply be interpreted as an effect of “integration,” that is, the ease with which the word’s meaning could be accessed or combined with that of preceding words (see Kutas & Federmeier, 2011 and Van Petten & Luka, 2012, for reviews). Indeed, it is very difficult to distinguish between these two accounts (prediction and integration) experimentally. For example, to explain the observation that the N400 in response to predictable words is smaller than that to less predictable words, one could argue that it is because comprehenders were expecting to encounter the specific, highly predictable word (thus, this observation is usable as evidence of prediction). But it is also possible that comprehenders were not expecting anything in particular, and that upon hearing a highly predictable word, it was simply easier for them to process due to its closer semantic fit with the preceding context (thus, this observation could be usable as evidence of integration).
In the body of evidence we have seen so far, a clear-cut distinction between prediction and integration may not always be found; it is now generally accepted that the N400 indexes a cascade of processes that happen before, during, and after word recognition (Nieuwland et al., 2020). However, basic sensitivity to word predictability, based on frequency information and often drawing on associative mechanisms, appears to be a robust feature of language processing, both in L1 and L2. Comprehenders use sentential context, whether highly constraining or not, to update their expectations about the likelihood of potential continuations, in a probabilistic fashion (i.e., where multiple possibilities have varying likelihoods). These expectations then affect processing of upcoming input, depending on how expected each continuation was. However, in the next section we will see evidence for how, when context allows it, comprehenders can also make use of these expectations ahead of encountering input to narrow down the range of possible continuations, thus constituting a more complex type of prediction.
Anticipating content: Integrating cues with context
In experimental settings, at least, it has been shown that comprehenders can combine their expectations with context to identify the most likely referent of an upcoming word from a limited set of candidates. Studies using the visual world eye-tracking paradigm have shown that comprehenders use cues such as verb selectional restrictions to form expectations for upcoming content as the sentence unfolds, and identify likely referents from a set of options based on how well they fit these expectations. For instance, when hearing a verb such as eat in “The boy will eat…,” L1 speakers already restrict the range of potential expected completions to items that can be the object of eat; if the visual scene only contains one item that fits that category (e.g., a cake), they will automatically look at the picture of the cake even before hearing the word cake (Altmann & Kamide, 1999). This shows a more active kind of anticipation that goes beyond simple sensitivity to word likelihood: Rather than responding to a word based on how likely it was, comprehenders used their expectations to narrow down the range of potential referents for an upcoming word, ahead of encountering the word, by picking out the most likely candidate from the ones available.
Again, these effects have been observed in L2 speakers, too, although not always to the same extent as in L1 speakers. High-proficiency L2 English speakers behaved similarly to L1 speakers when the object of a sentence could be predicted based on the verb’s meaning (Ito, Corley, & Pickering, 2018) or from the situational context more generally (Ito, Pickering, & Corley, 2018), giving anticipatory looks to suitable objects (from a constrained set) to the same extent as L1 speakers. However, there is also evidence that these effects are slower and weaker in L2 speakers (Dijkgraaf et al., 2019). They are also modulated by proficiency: Lower-skilled L2 speakers are more likely to fixate (give a prolonged gaze to) less relevant themes in a visual world paradigm (e.g., “cat” when listening to the sentence “The pirate will chase … [the ship]”) compared to higher-skilled bilinguals (Peters et al., 2018).
The same visual world paradigm has been used in different languages to investigate features such as gender marking (Lew-Williams & Fernald, 2007) and case marking (Kamide et al., 2003), showing that L1 speakers can also use morphological cues to select possible referents from the items in a visual scene. L2 learners have often failed to show the same ability to anticipate content, whether on the basis of gender marking (Grüter et al., 2012; Lew-Williams & Fernald, 2010), morphosyntactic information (Andringa & Curcic, 2015; Hopp, 2015), or morphological information (Mitsugi & MacWhinney, 2016). However, there are also instances of L2 speakers performing similarly to L1 speakers in studies using the visual world eye-tracking paradigm with morphological cues (Dussias et al., 2013; Hopp & Lemmerth, 2018). High-proficiency L1 Russian-L2 German speakers, too, showed native speaker-like prediction using determiner gender marking, even though Russian does not have gender-marked prenominal articles (Hopp & Lemmerth, 2018). Dussias et al. (2013) used the paradigm employed by Lew-Williams & Fernald (2007) to investigate predictive processing of gender marking, extending it to L2 speakers.
They showed that highly proficient L1 English (a –gender language) and L1 Italian (a +gender language) speakers of L2 Spanish could use gender agreement in prenominal determiners, as did the L1 Spanish speakers in Lew-Williams and Fernald’s (2007) study, with all groups of participants giving anticipatory looks to appropriate objects in the visual scene. By contrast, low-proficiency L1 English speakers did not show nativelike prediction (however, it should be noted that the L1 Italian group, which had low proficiency, only showed anticipatory looking for feminine determiners).
The evidence we considered earlier shows that both L1 and L2 speakers are sensitive to word predictability, as evidenced by their processing of more or less predictable words. We have now seen that L1 speakers (and sometimes, highly proficient L2 speakers too) can also use their expectations for upcoming content, based on cues in the input (which may be semantic, such as verb selectional restrictions, or grammatical, such as morphological gender marking) to select suitable referents from those made available by context. However, even preferential looking to suitable targets in a visual world eye-tracking study does not necessarily imply preactivation of a specific lexical item or feature: The visual world paradigm provides the item to the participants, who identify it from a set of options as that which most closely matches their expectations. A desire to establish conclusive evidence of prediction in the strictest sense (rather than integration) has informed more complex experimental work using EEG, aimed at showing that preactivation of specific features is possible, in the appropriate circumstances.
Preactivation of specific features: Evidence from EEG
A series of EEG studies has examined prediction by manipulating the morphological and phonological dependencies between highly predictable words and prior elements in the sentence, such as adjectives and determiners (DeLong, 2009; DeLong et al., 2005; Otten & Van Berkum, 2008; Szewczyk & Schriefers, 2013; Van Berkum et al., 2005; Wicha et al., 2004). For example, DeLong and colleagues (DeLong, 2009; DeLong et al., 2005) investigated whether specific words were being predicted by their participants by manipulating the phonological alternation of the English singular indefinite article (a/an). Participants read sentences such as “The day was breezy so the boy went to fly…,” which is highly constraining for the completion (a) kite. At this point, encountering the an form of the determiner (potentially leading to a less expected noun, e.g., an airplane) elicited a significantly larger N400 compared to the form a, compatible with the more likely kite. This suggests that the expectation for kite was being used to preactivate a specific linguistic representation including its phonological form (the initial consonant), in turn generating an expectation for a instead of an. (However, see Nieuwland et al., 2018 for a failure to replicate this effect in L1 speakers.)
The size of the N400 effect on the article was graded based on the cloze probability of the target noun (i.e., how likely subjects were to expect it as the next word, based on an offline sentence completion task done by native speakers), suggesting that participants were making probabilistic predictions of specific words. Similar results using EEG have been obtained by manipulating gender agreement between nouns and determiners in Spanish (Wicha et al., 2004) and Dutch (Van Berkum et al., 2005), as well as animacy marking agreement between nouns and adjectives in Polish (Szewczyk & Schriefers, 2013).
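As an illustration of how cloze probability is typically derived from an offline sentence completion task, the following minimal sketch computes the proportion of respondents who supply each word for a given sentence frame. The function name and the response counts are hypothetical and invented for this example; they are not data from any of the studies cited.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Map each completion to the proportion of respondents who gave it.

    `completions` is a list of one-word responses to a single sentence frame.
    Responses are lowercased and stripped; empty responses are discarded.
    """
    normalized = [c.strip().lower() for c in completions if c.strip()]
    counts = Counter(normalized)
    total = len(normalized)
    return {word: count / total for word, count in counts.items()}


# Invented norming data for the frame
# "The day was breezy so the boy went to fly a/an ..."
responses = ["kite"] * 18 + ["airplane"] * 2
probs = cloze_probabilities(responses)
```

With these invented responses, "kite" receives a much higher cloze probability than "airplane". A graded N400 effect of the kind described above would then correspond to the effect size varying with these proportions across items.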
Compared to the simpler types of prediction seen previously, there seems to be a greater gap between L1 and L2 speakers when it comes to preactivating specific features. Martin et al. (2013) used the EEG paradigm from DeLong et al. (2005), which required participants to preactivate phonological forms (a/an kite), but they failed to replicate in L2 English speakers the effect observed by DeLong et al. for L1 speakers on the determiner. However, Martin et al. still found a basic effect of noun predictability on the noun: Replacing a highly predictable noun with a less predictable one elicited an increased N400, as it did in L1 speakers. This means Martin et al.’s participants did have probabilistic expectations about possible upcoming nouns, even though they were not building on these to predict the appropriate determiner, as did L1 speakers in DeLong et al.’s study. Furthermore, subsequent research suggests that L2 speakers can preactivate specific features in a manner similar to native speakers, at least if those features exist in their L1. Foucart et al. (2014) exposed native Spanish speakers and two groups of L2 Romance bilinguals (French–Spanish late bilinguals, and Spanish–Catalan early bilinguals) to Spanish sentences with highly predictable final nouns, manipulating the gender of the preceding determiner following Wicha et al.’s (2004) design. Unlike the English a/an alternation, which the L2 participants in the Martin et al. (2013) study did not have in their L1 (Spanish), gender agreement between determiners and nouns is a common feature of Romance languages, meaning that both the bilingual groups in Foucart et al. (2014) would be familiar with this feature from their L1.
When morphological gender marking on the determiner was incongruent with the gender of the expected noun, all three groups (Spanish monolinguals and the two bilingual groups) exhibited an increased N400 response, suggesting that L2 speakers were preactivating gender features in a way similar to L1 speakers. This kind of study arguably provides stronger evidence for the preactivation of specific forms than visual world eye-tracking studies because participants were not provided with the possible completions.
The studies we have reviewed here showed that L1 comprehenders (and sometimes, L2 comprehenders too) do not just form expectations based on context, and integrate them with other information, but can also preactivate the phonological and morphological features of the most likely completions, and use those to form further expectations about other elements in the sentence. However, we have also seen that there is variation due to factors (such as L1-L2 similarity for L2 speakers) which can constrain the extent to which comprehenders engage in prediction. In the next section, we review in more detail the factors which constitute the main limitations to predictive mechanisms, in both L1 and L2 speakers.
Limitations to prediction during sentence comprehension in both L1 and L2 speakers
It has been suggested that L2 speakers may suffer from a Reduced Ability to Generate Expectations, or RAGE (Grüter et al., 2014, 2017). Grüter et al. (2014, 2017) showed that L2 English speakers (L1 Japanese and L1 Korean) do not take verb aspect information into consideration when formulating expectations about discourse context, as L1 English speakers do, despite the fact that, in both Japanese and Korean, verb aspect has the same discourse implications as it does in English. Grüter et al. argue that processing limitations make it difficult for L2 speakers to integrate cues to formulate predictions.
However, as the authors point out, the distinction between prediction in the L1 and L2 is far from a monolithic one. L2 speakers’ prediction abilities vary depending on proficiency, and L1 speakers can also be limited in their prediction abilities, prompting several authors to conclude that the difference between L1 and L2 in prediction is probably a quantitative, rather than a qualitative, one (Grüter et al., 2017; Kaan, 2014; Phillips & Ehrenhofer, 2015). Therefore, rather than asking whether or not L2 speakers can predict, we can look at factors that affect prediction in both L1 and L2 speakers, and that may affect L2 speakers in a specific way. The prediction of specific linguistic input based on expectations is constrained by a number of factors, relating both to the input (linguistic constraints and the context) and to the comprehender (cognitive abilities, processing speed, and proficiency). Here, we highlight just three of these constraints on prediction, which are not unique to L2 speakers, but can affect processing in both L1 and L2: cognitive abilities, proficiency, and task design. This is not meant to be a comprehensive list of the factors affecting prediction in L2 speakers (see Kaan, 2014 for a review). Rather, it shows how these factors can vary from being intrinsic to speakers (cognitive abilities) to completely extrinsic (task design), highlighting the complexity of mechanisms involved.
First, predictive mechanisms, other than the most basic sensitivity to word cloze probability, are cognitively demanding and are not consistently observed, even in L1 speakers: They are impaired in elderly L1 speakers (DeLong et al., Reference DeLong, Groppe, Urbach and Kutas2012; see also Huettig, Reference Huettig2015) and low-literacy populations (Mishra et al., Reference Mishra, Singh, Pandey and Huettig2012). Predictive looks in visual world paradigm eye-tracking studies correlate with working memory capacity and processing speed (Huettig & Janse, Reference Huettig and Janse2016) and are delayed under memory load (Ito, Corley, & Pickering, Reference Ito, Corley and Pickering2018). In a visual world eye-tracking study using Russian, Sekerina (Reference Sekerina2015) showed a gradient in the speed with which different Russian-speaking populations (L1 adult, heritage speaker adult, L1 child) showed preferential looking toward the upcoming noun based on gender information from the preceding adjective. Prediction in L1 speakers can also vary in speed depending on the cues used to formulate expectations (Chow et al., Reference Chow, Momma, Smith, Lau and Phillips2016, Reference Chow, Lau, Wang and Phillips2018). L2 speakers may be particularly affected by time constraints as they tend to be slower in their processing compared to L1 speakers (Frenck-Mestre, Reference Frenck-Mestre, Herrida and Altarriba2002; Frenck-Mestre et al., Reference Frenck-Mestre, German, Foucart, Heredia and Altarriba2014; Hahne, Reference Hahne2001), and so predictive behavior may not be observable.
Second, anticipation of linguistic material is heavily dependent on proficiency, both in the sense of correct knowledge representations, and in the procedural sense: Using morphological dependencies to generate expectations (as in visual world paradigms) or to probe for them (EEG studies) relies on participants both having a knowledge of these dependencies and being able to deploy it rapidly during processing. Such automatized knowledge may not be available to all, and perhaps only to the most advanced L2 learners for some linguistic dependencies.Footnote 4 In fact, as reviewed in the preceding text, while L2 speakers often do not predict to the same extent as L1 speakers, several studies have replicated prediction findings from L1 using high-proficiency L2 speakers, both using eye-tracking and EEG (Dussias et al., Reference Dussias, Valdés Kroff, Guzzardo Tamargo and Gerfen2013; Foucart et al., Reference Foucart, Martin, Moreno and Costa2014; Ito, Corley, & Pickering, Reference Ito, Corley and Pickering2018; Ito, Pickering, & Corley, Reference Ito, Pickering and Corley2018). Effects of proficiency have been observed in L1, too, both with regard to proficiency in the sense of knowledge representations (e.g., vocabulary size) and in the procedural sense (e.g., verbal fluency). Speed of anticipatory looking in a visual world eye-tracking paradigm correlated positively with vocabulary size in both adults and children (Borovsky et al., Reference Borovsky, Elman and Fernald2012), and with word reading skills in children (Mani & Huettig, Reference Mani and Huettig2014). In adults, anticipatory looking based on semantic cues was found to correlate with verbal fluency (Hintz et al., Reference Hintz, Meyer and Huettig2014), which is compatible with the idea of a prediction-by-production route (Pickering & Gambi, Reference Pickering and Gambi2018; Pickering & Garrod, Reference Pickering and Garrod2013). 
Following this account, reduced production skills may explain differences in prediction performance between L2 and L1 speakers, too. Grüter et al. (Reference Grüter, Lew-Williams and Fernald2012) found that L2 speakers who were unable to use gender cues to anticipate nouns also made errors in gender assignment on determiners in elicited production. Similarly, Hopp (Reference Hopp2013) observed that English learners of German showed nativelike anticipatory use of gender information in a visual world paradigm only if they were able to accurately and consistently produce the right gender assignment for those nouns. However, high proficiency in L2 speakers does not necessarily lead to native speaker-like prediction: even highly proficient L2 speakers may fail to display fully native speaker-like prediction (Dijkgraaf et al., Reference Dijkgraaf, Hartsuiker and Duyck2019; Kaan et al., Reference Kaan, Kirkham and Wijnen2016) and studies investigating the relation between prediction and L2 proficiency have not found a direct correlation between the two (e.g., Ito, Corley, & Pickering, Reference Ito, Corley and Pickering2018; Kim & Grüter, Reference Kim and Grüter2021; see Kaan & Grüter, Reference Kaan, Grüter, Kaan and Grüter2021 for a discussion).
Third, the nature of the task used can have a significant effect on the emergence and experimental detection of predictive processing, in multiple ways. On the one hand, studies demonstrating prediction during language processing generally employ highly constraining contexts, which are rare in natural language use (as noted by Luke & Christianson, Reference Luke and Christianson2016). In fact, due to the rarity of highly constraining contexts in everyday language use, the relevance of predictive processes has been questioned, both in relation to language comprehension (Huettig & Mani, Reference Huettig and Mani2016) and L1 acquisition (Rabagliati et al., Reference Rabagliati, Gambi and Pickering2016). On the other hand, even given a context which encourages predictive processing, prediction may not be detected if not enough time is available. Huettig and Guerra (Reference Huettig and Guerra2015) found that anticipatory looking by L1 Dutch speakers based on gendered determiner cues was observed if the visual targets appeared on screen four seconds before the spoken sentence, but not if they only appeared one second before. Trenkic et al. (Reference Trenkic, Mirkovic and Altmann2014) found that L2 English speakers performed similarly to (though more slowly than) native speakers when processing English determiners in a visual world eye-tracking paradigm, even though they did not have the equivalent feature in their L1 (Mandarin). In fact, neither native speakers nor the L2 speakers showed evidence of prediction, as preferential looking emerged after the onset of the noun following the determiner in both groups (rather than prior to the noun); however, even this effect was slower to emerge in the L2 group, relative to native speakers. These findings illustrate the critical role of timing in detecting prediction.
The reason why the data from Trenkic et al. (Reference Trenkic, Mirkovic and Altmann2014) did not count as evidence of “prediction” is that participants did not begin looking at potential referents before the onset of the noun; however, as we have seen, the speed with which preferential looking emerges is affected by several factors such as task timing and memory load, even in L1 speakers. Therefore, it is possible that, if participants had more time, preferential looking would have been observable even without needing to hear the noun first. The same applies more generally to L2 speakers when they fail to show predictive behavior in eye-tracking experiments. When preferential looking emerges ahead of the onset of the target for L1 speakers but not for L2 speakers, it may simply reflect slower processing in L2 speakers (in a context that did not allow for detection at longer time intervals), rather than a qualitative difference between the groups.
In reality, all the factors described in the preceding text—cognitive abilities, proficiency, and task design—are likely to interact with each other. For instance, whether a task will show evidence of prediction depends, among other things, on whether it allows enough time for prediction to emerge and be observed in the particular experimental paradigm being used; in turn, what constitutes “enough time” will be affected by individual differences such as proficiency, verbal abilities, and working memory, for both L1 and L2 speakers. These limitations are relevant to our core question about the extent to which language learning, and L2 learning in particular, may draw on prediction as a learning mechanism, to which we now turn our attention.
Learning from error: Prediction as a learning mechanism
Having examined the extent to which language processing during sentence comprehension involves prediction, and the factors constraining it in both L1 and L2 speakers, we now turn to the question of whether prediction can serve as a learning mechanism. First, we lay out different accounts of the potential role of prediction in SLA, and in language acquisition more generally. We then examine the evidence for error-based learning, starting from the computational modeling that has inspired proposals on the role of prediction in SLA, and also covering empirical evidence from priming and adaptation in both L1 and L2.
What role may prediction play in L1 and L2 acquisition?
Although there is abundant evidence that predictive mechanisms operate in language comprehension, the extent to which they may also contribute to language acquisition is debated. While some argue that prediction drives L1 acquisition (Chang et al., Reference Chang, Kidd and Rowland2013; Rowland et al., Reference Rowland, Chang, Ambridge, Pine and Lieven2012), others are skeptical about the importance of prediction in this respect (Huettig, Reference Huettig2015; Huettig & Mani, Reference Huettig and Mani2016; Rabagliati et al., Reference Rabagliati, Gambi and Pickering2016). Enabling learning is one of the main functions that have been proposed for predictive processing (see Huettig, Reference Huettig2015 for a discussion). Specifically, it has been suggested that prediction and error-based learning are necessary for L1 acquisition, partly because of the many studies on statistical learning showing that children use forward transitional probabilities (the likelihood of an element being followed by another) to acquire language (Saffran et al., Reference Saffran, Aslin and Newport1996). However, tracking these probabilities does not necessarily involve predictive processing; in fact, backward probabilities (the likelihood of an element being preceded by another) are also used by children (Pelucchi et al., Reference Pelucchi, Hay and Saffran2009). The fact that learning can occur without prediction, then, casts doubt on claims that prediction is absolutely necessary for language learning (Huettig, Reference Huettig2015). Overall, the empirical evidence on whether children use prediction for learning their L1 is mixed (Rabagliati et al., Reference Rabagliati, Gambi and Pickering2016). While it is not clear whether prediction during processing is a necessary or pervasive element of L1 acquisition, there is certainly evidence that prediction can be a source of learning.
Computational models using error-based learning, which rely on prediction, can model data from L1 syntactic acquisition (Chang et al., Reference Chang, Dell and Bock2006) and from priming studies in L1 and L2, supporting claims that error-based learning may be the mechanism underpinning these phenomena (Bock et al., Reference Bock, Dell, Chang and Onishi2007). This evidence is reviewed more fully in the following section.
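The distinction between forward and backward transitional probabilities drawn above can be made concrete with a small sketch. The syllable stream, the function name, and the two "words" (bi-da and go-la) are hypothetical illustrations, not materials from the cited studies; the code simply counts adjacent pairs in a sequence.

```python
from collections import Counter

def transitional_probabilities(sequence):
    """Return (forward, backward) transitional probabilities for adjacent pairs.

    forward[(x, y)]  = count(x then y) / count(x in first position of a pair)
    backward[(x, y)] = count(x then y) / count(y in second position of a pair)
    """
    pairs = list(zip(sequence, sequence[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(x for x, _ in pairs)
    second_counts = Counter(y for _, y in pairs)
    forward = {p: c / first_counts[p[0]] for p, c in pair_counts.items()}
    backward = {p: c / second_counts[p[1]] for p, c in pair_counts.items()}
    return forward, backward

# Hypothetical toy stream built from two "words", bi-da and go-la, in varying order.
stream = "bi da go la bi da bi da go la go la bi da".split()
forward, backward = transitional_probabilities(stream)

# Pairs inside a "word" have high probabilities in both directions;
# pairs that straddle a "word" boundary have lower ones.
for pair in [("bi", "da"), ("go", "la"), ("da", "go"), ("la", "bi")]:
    print(pair, round(forward[pair], 2), round(backward[pair], 2))
```

Both statistics are computed from the same pair counts, which illustrates why sensitivity to such regularities does not, by itself, show that the learner is predicting upcoming elements.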
Against this backdrop, it has been suggested that prediction may serve as a learning mechanism for certain aspects of SLA (Phillips & Ehrenhofer, Reference Phillips and Ehrenhofer2015). Specifically, this proposal aims to address the problem of overgeneralization, traditionally stressed by generativist approaches to language acquisition: How learners can learn to use rules productively yet avoid producing ungrammatical forms (e.g., “I goed” instead of “I went”), even though they have no direct evidence that such forms are not allowed in the language. In L1 acquisition, children often overgeneralize rules, but eventually converge on the target variety of their language (Pinker, Reference Pinker2009). According to Phillips and Ehrenhofer’s proposal, prediction may offer a way out of overgeneralization for L2 learners, especially when learning complex phenomena (e.g., those that require learners to integrate information from syntax and semantics), by providing the opportunity for hypothesis testing: The ability to make sophisticated predictions about upcoming input, using multiple cues, may allow learners to acquire complex contingencies and, crucially, for retreat from overgeneralization when these hypotheses are not confirmed (Phillips & Ehrenhofer, Reference Phillips and Ehrenhofer2015).
To exploit this mechanism, learners would need to rapidly integrate multiple cues as they process speech; thus, the fact that the proposed hypothesis-testing mechanism relies on processing speed and proficiency makes it an unlikely candidate for early L1 acquisition, as Phillips and Ehrenhofer (Reference Phillips and Ehrenhofer2015) acknowledge. In L2 learners, however, it may serve a useful function, if they can formulate the relevant prediction quickly enough and track the source of their predictions so that they may readjust their prediction based on that cue for the next time they experience it. To our knowledge, this proposal has not yet been investigated empirically. We will, however, look at the existing evidence for error-based learning. In the following sections, we first review computational models showing that data from L1 processing and acquisition is consistent with a learning mechanism driven by error-based learning that, in turn, requires prediction to occur. We then review empirical evidence from human language processing, which is compatible with the predictions made by these models, and which shows evidence of error-based learning during both L1 and L2 processing.
Insights from computational modeling of L1 processing and acquisition
Computational modeling has shown that certain aspects of language can be acquired through the same mechanisms that are used to process it (Elman, Reference Elman1990). The models in question use so-called neural networks, a particular type of computational model loosely inspired by brain architecture, which consists of units connected to each other in a network. As each unit is activated, it transmits a signal to the units to which it is connected. The connections between units are weighted, meaning that the extent to which one unit affects the next can be adjusted. Neural network models can be used for a variety of tasks, such as classifying data (e.g., determining whether an image is a picture of a bee). A model can learn to perform the required task through supervised training: It is given a “training set” consisting of input (e.g., a set of images) and a desired, or target, output (e.g., a set of labels, either “bee” or “no bee”) for that input. As the model works its way through the input, it produces its own output (i.e., a label for each image). At each step, the model compares its own output to the target output (i.e., the desired label). The difference between the model’s output and the target output is known as the prediction error, and is used by the model to adjust its connection weights, so that the next time it encounters that input, the output it produces will be closer to the target output. In this manner, by gradually adjusting its connection weights, the model learns to perform the required task.
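The training procedure described above can be sketched with a single output unit. This is a minimal illustration under stated assumptions, not any of the cited models: the two binary features (has_stripes, has_wings), the learning rate, and the number of training passes are all hypothetical, and a sigmoid unit stands in for a full network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output(weights, bias, features):
    """Activation of a single unit: weighted sum of inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(examples, epochs=2000, learning_rate=0.5):
    """Adjust connection weights in proportion to the prediction error on each example."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in examples:
            error = target - output(weights, bias, features)  # prediction error
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

def total_error(weights, bias, examples):
    return sum(abs(t - output(weights, bias, f)) for f, t in examples)

# Hypothetical training set: features are (has_stripes, has_wings); target 1 = "bee".
examples = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]

error_before = total_error([0.0, 0.0], 0.0, examples)
weights, bias = train(examples)
error_after = total_error(weights, bias, examples)
print(error_before, round(error_after, 3))
```

Each weight change is proportional to the prediction error, so a large mismatch between output and target produces a large adjustment, and a small mismatch produces a small one.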
Sentence processing, too, can be modeled with neural networks. It is often modeled using a particular type of neural network called a recurrent neural network, or RNN (Elman, Reference Elman1990). In an RNN, an additional series of connections allows the model to keep track of its previous states (akin to keeping track of words experienced in a sentence), which allows it to process input unfolding over time. Elman (Reference Elman1990) first used this architecture to train a model on next-word prediction in a miniature language. As it encountered each word in the sentence, the model’s output was a pattern of activation reflecting the probabilities of possible continuations. Any difference between its output and the actual next word (prediction error) was then used to adjust its connections. As the model learned the word-order patterns in the language, words belonging to the same syntactic categories began to produce similar patterns of activation, even though the model had no initial notion of word category. This suggested that it is possible to acquire syntactic structure simply through processing language, by estimating the likelihood of possible continuations, and adjusting it based on experience (Elman, Reference Elman1990, Reference Elman1993; see Mikolov et al., Reference Mikolov, Chen, Corrado and Dean2013, for similar results obtained with a natural language corpus).
The potential relevance of these models to prediction during human language processing, and, in turn, language learning, is demonstrated by research showing that the magnitude of prediction error the models encounter positively correlates with sensitivity to word predictability in humans. RNNs trained on next-word prediction were first trained on natural language corpora and then applied to the same materials that were given to human participants in experimental studies, making it possible to compare model performance with human processing. Word-by-word prediction error from these models has been shown to reflect reading times (Frank, Reference Frank2013; Frank & Hoeks, Reference Frank and Hoeks2019; Goodkind & Bicknell, Reference Goodkind and Bicknell2018; Monsalve et al., Reference Monsalve, Frank and Vigliocco2012; Van Schijndel & Linzen, Reference Van Schijndel and Linzen2018), N400 amplitudes during EEG (Frank et al., Reference Frank, Otten, Galli and Vigliocco2013, Reference Frank, Otten, Galli and Vigliocco2015), and MEG responses (Wehbe et al., Reference Wehbe, Vaswani, Knight and Mitchell2014). In other words, the “error signal” used by neural network models to do error-based learning positively correlates with language users’ expectations about upcoming input, which suggests that these expectations may be what supports error-based learning in humans too.
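One common way to quantify word-by-word prediction error is surprisal, the negative log probability of a word given its context. The sketch below assumes this measure and, for brevity, replaces the RNNs used in the cited studies with a simple bigram count model with add-one smoothing; the toy corpus and function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count how often each word follows each context word."""
    context_counts, bigram_counts = Counter(), Counter()
    vocabulary = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocabulary.update(tokens)
        for previous, word in zip(tokens, tokens[1:]):
            context_counts[previous] += 1
            bigram_counts[(previous, word)] += 1
    return context_counts, bigram_counts, vocabulary

def surprisal(previous, word, context_counts, bigram_counts, vocabulary):
    """Surprisal in bits: -log2 P(word | previous), with add-one smoothing."""
    probability = (bigram_counts[(previous, word)] + 1) / (
        context_counts[previous] + len(vocabulary)
    )
    return -math.log2(probability)

# Hypothetical toy corpus.
corpus = [
    "the dog chased the cat",
    "the dog chased the ball",
    "the cat chased the dog",
    "the dog ate the bone",
]
model = train_bigram(corpus)

expected = surprisal("dog", "chased", *model)   # frequent continuation
unexpected = surprisal("dog", "bone", *model)   # unattested continuation
print(round(expected, 2), round(unexpected, 2))
```

In studies of the kind cited above, values like these are computed for each word of the experimental materials and then related to reading times or neural responses.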
More support for a potential role of prediction in learning comes from the Dual-Path model (Chang, Reference Chang2002; Chang et al., Reference Chang, Dell and Bock2006). This is a specific instance of an RNN model that is particularly relevant to the current debate on prediction in SLA because it has been cited as the theoretical underpinning for error-based learning in L2 and for the potential role of prediction in such learning (e.g., Jackson & Hopp, Reference Jackson and Hopp2020; Leal et al., Reference Leal, Slabakova and Farmer2017; Phillips & Ehrenhofer, Reference Phillips and Ehrenhofer2015). Originally developed as a model of language production (Chang, Reference Chang2002), the Dual-Path model was adapted to next-word prediction by harnessing its production output to formulate predictions for upcoming words (Chang et al., Reference Chang, Dell and Bock2006); that is, it simulates the prediction-by-production route in humans (Huettig, Reference Huettig2015; Pickering & Garrod, Reference Pickering and Garrod2013). The model was evaluated against data from L1 acquisition, showing that it could simulate findings from preferential looking studies on the acquisition of transitive structures (i.e., Hirsh-Pasek & Golinkoff, Reference Hirsh-Pasek, Golinkoff, McDaniel, McKee and Cairns1996; Naigles, Reference Naigles1990). It could also reproduce data on structural priming (Chang et al., Reference Chang, Dell and Bock2006) and the acquisition of word order biases in English and Japanese (Chang, Reference Chang2009). Priming has been suggested to be a case of implicit error-based learning, and we review evidence for this claim in the next section.
Evidence from priming and adaptation effects in L1 and L2 speakers
The main source of experimental behavioral evidence for error-based learning, in both L1 and L2 speakers, comes from studies on structural priming and adaptation. Structural priming refers to the fact that when language users encounter a particular syntactic construction, they are more likely to expect it again, or to use it in production, than they were before encountering it (Arai et al., Reference Arai, van Gompel and Scheepers2007; Bock, Reference Bock1986; Ferreira & Bock, Reference Ferreira and Bock2006; Ledoux et al., Reference Ledoux, Traxler and Swaab2007). Structural priming effects begin early: They have been observed in children as young as 3 years of age, with priming effects lasting across learning sessions (Branigan & Messenger, Reference Branigan and Messenger2016; Rowland et al., Reference Rowland, Chang, Ambridge, Pine and Lieven2012) and during the earliest stages of L2 learning (Weber et al., Reference Weber, Christiansen, Indefrey and Hagoort2019). It has been suggested that structural priming should be regarded as a case of implicit error-based learning, which modifies a comprehender’s language system, rather than simply inducing a temporary activation of representations (Bock et al., Reference Bock, Dell, Chang and Onishi2007; Bock & Griffin, Reference Bock and Griffin2000; Chang et al., Reference Chang, Janciauskas and Fitz2012).
When the priming effect is persistent, it is often called adaptation. Kaan and Chun (Reference Kaan, Chun, Federmeier and Watson2018a) define syntactic adaptation as “persistent” or “cumulative” priming, where “comprehension or production is not or not only affected by the most recently encountered structure, but by the cumulative prior exposure to structures of the same type” (p. 87). In computational modeling terms, the updating of expectations seen in adaptation would be akin to “adjusting one’s weights.” Adaptation can be measured by tracking the increase in priming effect following repeated exposure to a structure over time, which may manifest itself as increased likelihood to use it in production (Kaan & Chun, Reference Kaan and Chun2018b) or reduced response times when encountering it in comprehension (Fine & Jaeger, Reference Fine and Jaeger2016). Another method, more familiar to SLA research, is to use a pretest/posttest design (Jackson & Hopp, Reference Jackson and Hopp2020; Jackson & Ruf, Reference Jackson and Ruf2017). Adaptation to syntactic structure alternations (such as that between prepositional object and double object dative constructions in English) has been observed in L1 production (Jaeger & Snider, Reference Jaeger and Snider2013; Kaan & Chun, Reference Kaan and Chun2018b; Kaschak, Reference Kaschak2007; Kaschak et al., Reference Kaschak, Loney and Borreggine2006, Reference Kaschak, Kutta and Jones2011; Kaschak & Borreggine, Reference Kaschak and Borreggine2008), and in L1 comprehension (Farmer et al., Reference Farmer, Fine, Yan, Cheimariou and Jaeger2014; Fine et al., Reference Fine, Jaeger, Farmer and Qian2013; Fine & Jaeger, Reference Fine and Jaeger2016). 
Adaptation effects have also frequently been observed in L2 speakers (Hopp, Reference Hopp2020; Jackson & Ruf, Reference Jackson and Ruf2017; Kaan & Chun, Reference Kaan and Chun2018b; McDonough & Trofimovich, Reference McDonough, Trofimovich, Cadierno and Eskildsen2015; Montero-Melis & Jaeger, Reference Montero-Melis and Florian Jaeger2020; Shin & Christianson, Reference Shin and Christianson2012; see Jackson, Reference Jackson2018 for a review; and see Jackson & Hopp, Reference Jackson and Hopp2020 for an instance of priming without adaptation).
The permanence of priming effects observed in adaptation is compatible with accounts of priming as an instance of learning; however, it does not specifically implicate a role for prediction error as the driving learning mechanism. Additional support for the claim that priming (and adaptation) is a learning mechanism, and specifically an error-based learning mechanism, comes from the observation of inverse frequency effects, which are predicted by an error-based learning model. In the Dual-Path model, low-frequency words would generate greater prediction error, causing a larger adjustment in the weights and therefore a larger learning effect (Chang et al., Reference Chang, Dell and Bock2006). Inverse frequency effects are also observed at the level of structure, not just words: In both L1 and L2, structures that have lower frequency in the input elicit greater priming effects (Hartsuiker et al., Reference Hartsuiker, Kolk and Huiskamp1999; Hartsuiker & Westenberg, Reference Hartsuiker and Westenberg2000; Jaeger & Snider, Reference Jaeger and Snider2013; Kaan & Chun, Reference Kaan and Chun2018b; Kaschak, Reference Kaschak2007; Kaschak et al., Reference Kaschak, Loney and Borreggine2006; Montero-Melis & Jaeger, Reference Montero-Melis and Florian Jaeger2020). In L2 learners, frequency effects appear to be based on the statistics of the L1 at lower proficiency levels, moving to more native speaker-like expectations as proficiency increases (Jackson & Ruf, Reference Jackson and Ruf2017; Montero-Melis & Jaeger, Reference Montero-Melis and Florian Jaeger2020; see Jackson’s Reference Jackson2018 review). Finally, at least in L1, frequency effects also extend to adaptation, with greater adaptation observed for dative structures that are encountered in unexpected contexts (Fazekas et al., Reference Fazekas, Jessop, Pine and Rowland2020). 
More recent research has begun to directly investigate the link between adaptation and language acquisition, showing that children can adapt to different syntactic structures and use their adapted predictions to infer the meaning of new words and interpret ambiguous words (Havron et al., Reference Havron, de Carvalho, Fiévet and Christophe2019; Havron et al., Reference Havron, Babineau, Fiévet, Carvalho and Christophe2021). All these findings suggest that structural priming and adaptation, driven by a prediction-based mechanism, could potentially play a role in both L1 and L2 acquisition.
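The logic of the inverse frequency effect can be illustrated with a deliberately simplified sketch. This is not the Dual-Path model: it assumes a learner that stores a single expectation (a probability) for each structure and, on encountering the structure, moves that expectation toward 1 in proportion to the prediction error. The starting probabilities and learning rate are hypothetical.

```python
def update_expectation(p_structure, learning_rate=0.1):
    """Return (new_expectation, prediction_error) after one encounter with a structure.

    The prediction error is the gap between what occurred (1.0) and how strongly
    the structure was expected; the expectation shifts in proportion to that gap.
    """
    prediction_error = 1.0 - p_structure
    return p_structure + learning_rate * prediction_error, prediction_error

# Hypothetical prior expectations for a frequent and an infrequent structure.
frequent_before, rare_before = 0.8, 0.2

frequent_after, frequent_error = update_expectation(frequent_before)
rare_after, rare_error = update_expectation(rare_before)

frequent_shift = frequent_after - frequent_before
rare_shift = rare_after - rare_before
print(round(frequent_shift, 2), round(rare_shift, 2))
```

Under these assumptions, the less expected structure produces the larger error and therefore the larger shift in expectation, which is the pattern that error-based accounts predict for priming.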
Limitations of error-based learning during processing
Even adaptation, however, is subject to variability. For instance, adaptation to structural alternation may depend on which specific semantically constrained constructions are used. In Experiment 1 in Jackson and Ruf (Reference Jackson and Ruf2017), intermediate English–German L2 learners were exposed to fronted temporal adverbial phrases such as Im Winter trägt Paul eine Jacke (“In winter Paul wears a jacket”), which are marked in both English and German, but more frequent in German. They showed both immediate priming and adaptation to these structures, as measured in a posttest. However, in Experiment 2, exposure to fronted adverbial phrases using locative instead of temporal expressions (e.g., Auf dem Berg trägt der Schüler eine Jacke, “On the mountain the pupil wears a jacket”) led to short-term priming, but no adaptation. Similarly, Jackson and Hopp (Reference Jackson and Hopp2020) found that L2 English speakers exposed to fronted adverbials in English exhibited immediate priming but no evidence of adaptation in the posttest, unlike L1 English speakers.
In particular, adaptation to garden-path sentences appears to be less robust than adaptation to simple structural alternations, in both L1 and L2 speakers. While some studies find an adaptation effect to garden-path sentences in L1 (Farmer et al., Reference Farmer, Fine, Yan, Cheimariou and Jaeger2014; Fine et al., Reference Fine, Jaeger, Farmer and Qian2013; Kaan et al., Reference Kaan, Futch, Fuertes, Mujcinovic and de la Fuente2019), others find no such evidence (Dempsey et al., Reference Dempsey, Liu and Christianson2020) or show that it is very difficult to detect (Prasad & Linzen, Reference Prasad and Linzen2021). In L2 speakers, the evidence is again mixed (Kaan et al., Reference Kaan, Futch, Fuertes, Mujcinovic and de la Fuente2019; Hopp, Reference Hopp2020). Kaan et al. examined adaptation to garden-path sentences (filler-gap wh-subordinates and ambiguous coordination) in L1 and L2 English speakers, in a study using self-paced reading. Adaptation was only found in the L1 English group, and only for the “easier” structure (coordination). Therefore, learning from prediction error during processing was difficult, not just for L2 speakers but for L1 speakers too, who only showed adaptation to the less cognitively demanding of the two structures examined. Such a finding is arguably compatible with evidence that L2 speakers have difficulty recovering from garden-path sentences in a way that mirrors the difficulty faced by children in their L1. In Pozzan and Trueswell’s (Reference Pozzan and Trueswell2016) study, participants listened to instructions and carried them out while interacting with a visual scene. Error rates on garden-path sentences (showing an inability to revise the initial parse) were similar for 5-year-old L1 speakers and adult L2 learners when there were no referential cues supporting the target interpretation.
However, Hopp (Reference Hopp2020, Experiment 1) found that L2 German speakers could adapt to garden-path sentences (specifically, to the intransitive use of optionally transitive verbs such as play) if the sentences provided an unambiguous cue flagging the correct interpretation, in the form of case marking (e.g., in “The boy played and he pleased the parents with the music,” the verb played would be followed by and and the pronoun he in the nominative case, signaling the start of a new clause).Footnote 5 In sum, as is the case for prediction mechanisms, we see that there is variation among both L1 and L2 speakers in the extent to which they can adapt to specific syntactic structures.
Another potential source of variability in adaptation is that prediction depends on context: Evidence suggests that the extent to which predictions are made during language comprehension depends on the overall reliability of context as a source of prediction (Delaney-Busch et al., Reference Delaney-Busch, Morgan, Lau and Kuperberg2019), and that predictions stop being formulated when cues become unreliable (Brothers et al., Reference Brothers, Swaab and Traxler2017). In a self-paced reading study by Brothers et al. (Reference Brothers, Swaab and Traxler2017, Experiment 2), the global validity of predictive cues affected the extent of prediction found. Participants read critical sentences that all had highly predictable completions (i.e., in highly constrained sentential contexts), and a set of highly constraining filler sentences that were manipulated by either having expected or unexpected completions. Participants who saw the filler sentences with the expected completions showed an effect of predictability on the critical items (i.e., reduced reaction times for predictable completions), while the group who saw the filler items with unexpected completions did not show a statistically significant prediction effect on the predictable completion critical items. This suggests that the overall likelihood of disconfirmed predictions had led the group who had experienced the unpredictable completions to abandon the use of sentential constraint as a cue.
The sensitivity of prediction to cue reliability means that prediction error may sometimes result in abandonment or temporary suppression of predictive mechanisms, instead of leading to adaptation (Brothers et al., Reference Brothers, Swaab and Traxler2017; Hopp, Reference Hopp2016; Husband & Bovolenta, Reference Husband and Bovolenta2020; Lau et al., Reference Lau, Holcomb and Kuperberg2013; Van Heugten et al., Reference Van Heugten, Dahan, Johnson and Christophe2012; Wlotko & Federmeier, Reference Wlotko and Federmeier2011). Hopp (Reference Hopp2016) manipulated cues that had high predictive value (gender marking in German) so as to make them unreliable. L1 speakers, who had been using them to anticipate upcoming referents in a visual world eye-tracking paradigm, stopped anticipatory looking when gender marking became unreliable. Hopp argues that this explains why L2 German speakers in previous studies (Hopp, Reference Hopp2013, Reference Hopp2016) only used gender predictively if they consistently produced accurate gender marking; for those who did not (meaning that they had nontarget representations), prediction led to error, which perhaps led to it being abandoned. If a cue is not reliably predictive, it is arguably an important part of the learning process to stop using it (as it would cause inefficient processing). However, language users can rapidly adapt to input that is seemingly inconsistent, if they can identify new reliable cues in it which have predictive value. Kroczek and Gunter (Reference Kroczek and Gunter2017) exposed listeners to speech by speakers who differed in the relative probabilities of specific syntactic structures they used (OSV/SVO word order in German). While structure usage was not consistent across speakers, listeners developed distinct expectations for syntactic continuations depending on the speaker, as each speaker manifested reliable structure usage.
These findings suggest that language users are constantly evaluating the reliability of cues and potential cues, abandoning them if they are no longer reliable and tuning in to new ones that reliably predict upcoming input.
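As an illustration only, the idea of continuously evaluating a cue and abandoning it once it stops being reliable can be sketched with a simple delta-rule update. This is a toy sketch under stated assumptions, not a model proposed in any of the studies cited above: the function names, the learning rate, the starting estimate, and the 0.5 threshold for "using" a cue are all hypothetical choices made for the example.

```python
def update_reliability(reliability, cue_was_valid, learning_rate=0.2):
    """Delta-rule update of a cue's estimated reliability (hypothetical helper).

    The estimate moves toward 1 after a confirmed prediction and toward 0
    after a disconfirmed one, in proportion to the prediction error.
    """
    outcome = 1.0 if cue_was_valid else 0.0
    prediction_error = outcome - reliability
    return reliability + learning_rate * prediction_error


def track_cue(outcomes, initial=0.5, learning_rate=0.2, threshold=0.5):
    """Return (estimated reliability, cue still used?) after each trial.

    The threshold is an assumed cut-off: the cue is treated as "used for
    prediction" only while its estimated reliability exceeds it.
    """
    reliability = initial
    history = []
    for cue_was_valid in outcomes:
        reliability = update_reliability(reliability, cue_was_valid, learning_rate)
        history.append((reliability, reliability > threshold))
    return history


# Assumed scenario: the cue is valid for 20 trials, then invalid for 10 trials.
outcomes = [True] * 20 + [False] * 10
history = track_cue(outcomes)

for trial, (reliability, cue_used) in enumerate(history, start=1):
    print(f"trial {trial:2d}: reliability={reliability:.3f} cue_used={cue_used}")
```

In this sketch the cue is relied on throughout the valid phase and is dropped a few trials into the invalid phase, which loosely mirrors the pattern of abandonment described above. How quickly that happens depends entirely on the assumed learning rate and threshold.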
Conclusion
We have seen that the picture, when it comes to prediction in comprehension and learning, is extremely nuanced. Prediction consists of different processes, with different levels of complexity, from basic priming mechanisms to preactivation of specific features. Prediction abilities can vary depending on a large number of factors, in L1 and L2 speakers alike. Variability between L1 and L2 speakers, and even within each group, increases as prediction becomes more complex. Sensitivity to word predictability is the most robust type of prediction, showing the least difference between L1 and L2 speakers. However, even at the most complex level of prediction (i.e., preactivation of specific features), L2 speakers of sufficient proficiency have the potential to predict in a native speaker-like fashion, though higher proficiency does not necessarily lead to native speaker-like prediction. In both groups of language users, then, prediction is modulated by cognitive abilities, processing speed, and various aspects of proficiency. More research is needed to investigate the complex relationships between these factors. The role of cognitive and linguistic individual differences is clearly a burgeoning area of interest in the language learning sciences (e.g., Bolibaugh & Foster, 2021; Buffington et al., 2021; Pili-Moss, 2021; Riches & Jackson, 2018; Walker et al., 2020; including special issues dedicated to the topic such as those edited by Andringa & Dąbrowska, 2019; Roberts & Meyer, 2012). Investigating the role of individual cognitive abilities in prediction and, specifically, error-based learning, in L1 and L2 (or Lx) speakers would constitute a timely extension of this agenda.
Such research could shed more light on some of the factors we have reviewed, such as the varying effect of L2 proficiency (e.g., helping to clarify the relative contributions of knowledge representation and “procedural” proficiency in enabling prediction). Research focused on individual differences could also help to address broader questions relating to the explanatory power of processing-based accounts of language acquisition (e.g., Havron et al., 2021).
There is ample evidence from empirical studies on priming and adaptation that prediction error can be one source of learning, and such evidence is compatible with the predictions made by computational models employing error-based learning. Adaptation to syntactic structure is observed in both L1 and L2 speakers, but again, there is a great deal of variation. Cue reliability affects the extent to which comprehenders make predictions, and factors such as the complexity of the specific syntactic structures encountered can influence the degree of adaptation that can take place, both in L1 and L2 speakers. The extent to which prediction error could also support the acquisition of complex contingencies, as suggested by the Hypothesis Testing proposal, remains an open question to be investigated empirically. More generally, further research will be needed to investigate which kinds of linguistic properties can be learned by prediction error, and through which specific mechanisms. This review has focused primarily on error-based learning during the online processing of syntactic structure, mostly evidenced by syntactic adaptation. Other strands of research, however, have used different paradigms to investigate the effect of prediction error on L2 acquisition, such as research on declarative memory formation and vocabulary learning briefly mentioned at the start of this review (e.g., De Loof et al., 2018). In reality, these two mechanisms (error-based learning during online syntactic processing and enhanced declarative memory formation driven by prediction error) are unlikely to operate in isolation, yet the relationship between them is still unclear.
A promising avenue for further research will be to examine the connections between them, for instance, by investigating the extent to which error-based learning during incremental sentence processing involves adjusting existing representations, and to what extent it may rely on the formation of new declarative memories (e.g., see Bernolet et al., 2016, for evidence that syntactic priming is enhanced by explicit memory of the prime sentence). In turn, this understanding may help to address the question of what can and cannot be learned (rather than merely consolidated) through prediction error.
In light of the evidence we have reviewed, there is clearly no simple answer to the question of whether impaired prediction in L2 (or Lx) speakers could be a (qualitative or quantitative) hindrance for acquisition. However, the role of prediction in L2 acquisition is a fruitful avenue for future research, best approached with an awareness of its many nuances and complexities.