Introduction
A positive association between oral vocabulary and reading has been reported in monolingual children and adults (Duff & Hulme, 2012; Hogaboam & Perfetti, 1978; Johnston et al., 2004; McKague et al., 2001), such that prior knowledge of the spoken form of a word conveys a reading accuracy and efficiency advantage over orally unfamiliar words. Recently, it has been proposed that readers might be able to use their knowledge of phoneme-to-grapheme correspondences to form orthographic skeletons, which are expectations of the spellings of words held in oral vocabulary that have not been seen in writing (Beyersmann et al., 2021, 2022a; Jevtović et al., 2022, 2023; Wegener et al., 2018, 2020). The current study builds on prior evidence from monolinguals by adopting a training study design to address the question of whether German–English bilinguals also generate orthographic skeletons when acquiring morphologically complex novel words in their second language (L2).
In a recent review, Wegener et al. (2022a) outlined evidence for the existence of an association between oral vocabulary knowledge and word reading. They noted that the association is supported by results from a range of research designs, including cross-sectional studies of individual differences (e.g., Bowey & Rutherford, 2007; Goff et al., 2005; Ouellette, 2006; Ouellette & Beers, 2010), cross-sectional item-level analyses (Kearns & Al Ghanem, 2019; Nation & Cocksey, 2009; Ricketts et al., 2016), longitudinal studies (Duff et al., 2015; Lee, 2011) and training studies. Of these, training studies provide the strongest evidence for the existence of causal effects (Hulme & Snowling, 2013). Training studies involve two stages: a teaching phase and a testing phase. In the teaching phase, participants are taught either the pronunciation alone or the pronunciations and meanings of new spoken words. In the subsequent testing phase, participants read the trained words and a matched set of untrained words for the first time and their reading accuracy and/or efficiency are recorded.
Training studies with both children and adults (Beyersmann et al., 2022a; Duff & Hulme, 2012; Hogaboam & Perfetti, 1978; McKague et al., 2001; Taylor et al., 2011) have found that prior knowledge of the spoken form of a word conveys a reading advantage over untrained words, consistent with the relationship between spoken word knowledge and reading being causal.
A potential cognitive mechanism supporting this link between oral vocabulary and reading was originally proposed, but not tested, by Stuart and Coltheart (1988). These authors suggested that spoken word knowledge might influence word reading before visual exposure to printed word forms, arguing that if a child could segment spoken speech sounds and had some knowledge of letter sounds, they might begin to construct an orthographic lexicon prior to the commencement of formal reading instruction. Some years later, Johnston et al. (2004) provided skilled readers with training in novel spoken words before showing the written form for the first time within a masked priming task and found a pattern of results consistent with automatic activation of orthography. Subsequently, and also using the masked priming paradigm, McKague et al. (2008) proposed and found evidence for their consonant frame hypothesis, according to which skilled readers can build under-specified orthographic representations around the consonants of known spoken words that are visually novel. This early work with skilled readers suggested that it was plausible that oral vocabulary knowledge might support word reading before visual exposure.
Drawing on this prior work, Wegener et al. (2018) proposed the orthographic skeleton hypothesis, according to which, once children have a reasonable appreciation of the mappings between phonemes and graphemes, they are in a position to draw on their oral vocabulary knowledge to form expectations of the spellings of words they have not yet seen in writing. In an initial test of this theory with developing readers, Wegener et al. (2018) taught children in Grade 4 the pronunciations and meanings of novel words in English through oral description of a series of inventions. Next, the trained words, as well as a matched set of untrained words, were embedded in sentences, and children read them for the first time while their eye movements were monitored. Importantly, some novel words had predictable spellings from phonology (e.g., the spoken word “vish” was written as vish) and some had unpredictable spellings (e.g., the spoken word “jeab” was written as jeabb). The key result was an interaction between training and spelling predictability, such that there was a larger spelling predictability effect for trained compared to untrained novel words. This result was taken to indicate that children generated orthographic skeletons of orally known words, which was evident at the first orthographic exposure. In a follow-up study, Wegener et al. (2020) replicated this finding at the first orthographic exposure and found evidence that children's orthographic skeletons are tentative initial orthographic expectations that are updated as experience with the written word form accrues.
Similarly, skilled readers have been found to generate orthographic skeletons during isolated word reading (Wegener, Wang, et al., 2022b). In an experiment with native Spanish speakers, Jevtović et al. (2022) trained their participants in the pronunciations of a set of novel words. When the written form of the trained and untrained words was presented in a sentence reading task, some items had consistent spellings (only one spelling was possible), while others had inconsistent spellings (two spellings were possible, and participants either saw a preferred or an unpreferred spelling). The results showed that participants read trained consistent and inconsistent words with preferred spellings faster than inconsistent words with unpreferred spellings, which is in line with the hypothesis that Spanish native speakers generated orthographic skeletons during oral word training. Similar findings have recently been reported among French native speakers (Jevtović et al., 2023).
Two studies are particularly relevant for the aims of the current research. They show that orthographic skeletons are not limited to mono-morphemic words, but also yield robust facilitation effects for novel stems that are embedded in morphologically complex words during oral training, in both English-speaking children (Beyersmann et al., 2022b) and adults (Beyersmann et al., 2021). In the adult study, which used a training design similar to that of Wegener et al. (2018, 2020), participants were first taught the spoken form of a set of novel morphologically complex words composed of a novel stem and a suffix (e.g., vish + ing, vish + es, vish + ed). Next, the adults read the stems of the novel words with predictable and unpredictable spellings from phonology embedded in sentences while their eye movements were monitored. Participants’ eye movements revealed the same key interaction between training and spelling predictability as was reported by Wegener et al. (2018, 2020), suggesting that participants formed early orthographic skeletons of the embedded stems which influenced their word reading. These findings indicate that the acquisition of spoken words induces orthographic predictions that go beyond the trained whole words themselves, including setting up orthographic skeletons for stems embedded in morphologically complex words.
What is less clear from this prior research is whether similar mechanisms apply during the acquisition of morphologically complex novel words in L2 speakers of English, which was the focus of the current investigation. The formation of orthographic skeletons of novel stems embedded in morphologically complex words must involve not only the process of decomposing the novel words into morphemic subunits (e.g., vish + ing), but also the process of generating orthographic skeletons of the embedded novel word stems. While prior research has shown that monolinguals master these complex skills, it is uncertain whether L2 speakers have the necessary morphological parsing and phoneme-to-grapheme mapping skills to benefit from spoken language training in similar ways. In particular, morphological processing has been shown to differ across first-language (L1) and second-language speakers. For example, Silva and Clahsen (2008) used a masked morphological priming task to investigate processing of derived and inflected words in L1 speakers of English and in L2 speakers of English with Chinese, German, or Japanese as their L1. They observed priming effects for derived words in both L1 and L2 speakers, but priming effects for regular past tense inflections (i.e., -ed) were only observed in L1 speakers. In another study, Neubauer and Clahsen (2009) reported priming facilitation for German regular inflection in both L1 German speakers and advanced L2 speakers of German with Polish as their L1. However, there was an absence of facilitation for irregular inflection in L2 speakers.
While several studies reported processing inflectional morphology as challenging for L2 speakers (e.g., Chen et al., 2007; Friederici, 2002; McDonald, 2006; Sabourin & Haverkort, 2003), other studies revealed priming effects for inflections in both L1 and L2 speakers (Heyer & Clahsen, 2015; Jacob et al., 2018; Reifegerste et al., 2019). Based on L2 speakers’ particular difficulty in processing inflections, coupled with their potentially less robust phoneme-to-grapheme mapping skills, it is not clear if the previously reported orthographic skeleton effect in monolinguals would generalize to second language speakers of English, a question we aimed to address in the present study.
The present study
We tested whether prior exposure to the oral forms of morphologically complex novel words benefits reading performance in a group of German (L1) – English (L2) bilinguals, building on Beyersmann et al.'s (2021) results from English monolinguals. Whether L2 speakers of English can benefit from phoneme-to-grapheme mappings and generate L2 orthographic skeletons of novel words has not been explored previously. The case of German–English bilinguals is particularly intriguing because although German is classified as a shallow orthography (i.e., highly consistent grapheme-to-phoneme mappings) as opposed to English, which falls at the deep end of the orthographic transparency spectrum, German is less consistent in its phoneme-to-grapheme mappings as there are often several graphemic options to describe the same phoneme. However, despite the inconsistency of the German spelling system, phoneme-grapheme correspondences are typically regular, as opposed to the highly irregular correspondences in the English language. As a result, native speakers of German may rely more heavily on orthographic skeletons during spoken language exposure compared to native speakers of English. In other words, if the current study were to reveal differences between the orthographic skeleton effect in German–English bilinguals compared to the previously observed findings in English monolinguals, it would not likely be due to their lack of general proficiency in predicting spellings from sound in their L1, but more likely attributable to their greater reliance on regularities within the German compared to the English spelling system.
A three-day training study was conducted, following Beyersmann et al.'s experimental design. L2 speakers of English received oral training on a set of inflected novel words over three consecutive days (e.g., vishing, vished, vishes). The embedded novel words had predictable or unpredictable spellings from phonology. On the third day, participants took part in an eye-tracking experiment where their eye movements were measured while reading the trained and untrained stems embedded in sentences. We hypothesized that if L2 speakers of English can generate L2 orthographic skeletons of embedded stems during oral word learning and use these in a subsequent reading task, we would expect to replicate the training by spelling-predictability interaction that has been previously reported in native speakers (Beyersmann et al., 2021). That is, we would expect shorter looking times for orally trained items with predictable spellings than unpredictable spellings, and we would expect this difference to be larger than the corresponding difference for untrained items. We expect to observe this interaction at gaze duration, total reading time and regressions in. If the interaction is apparent at gaze duration as we anticipate, then this would likely reflect lexical identification processes, whereas if the interaction is apparent only at total reading time and regressions in, this would suggest that the effect likely reflects lexical integration processes (Rayner & Liversedge, 2011). In contrast, if L2 speakers do not generate L2 orthographic skeletons we would expect no interaction between training and spelling predictability. We pre-registered these hypotheses (https://osf.io/qh7sm).
Method
Participants
First, a power analysis was conducted based on previous data from English monolinguals (Beyersmann et al., 2021). In their study, 40 participants were recruited, and a training by spelling predictability interaction was observed for the gaze duration, reading aloud and regressions in variables. To estimate the number of participants required to achieve a statistical power of at least 80%, we simulated many datasets using the estimates of fixed and random effects from Beyersmann et al. (2021). For each simulated dataset we ran the statistical model that we also applied to analyse our data (see below). For each dependent variable, we looked at the percentage of datasets with significant outcomes for different numbers of participants. The results of the power analysis showed a more than 80% chance of replicating the interaction effect with 40 participants for the reading aloud and regressions in variables and a 70% chance for gaze duration. Accordingly, for the present study fifty L2 English speakers were recruited. All participants were German university students (Mean Age: 23.6, SD: 3.9) from Potsdam University, Germany, who spoke German as their first language. They participated either for course credits or monetary reimbursement. The Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007) was used to acquire detailed information about participants’ language background (see Table 1). Participants reported high L2 proficiency and early L2 language acquisition. They reported that reading and exposure to media had the highest rate of contribution to their L2 acquisition. Only some of the participants reported a few months of immersion in an English-speaking environment.
Note: self-reports on L2 (English) history measure
a Range: 0 (none) to 10 (perfect)
b Range: 0 (not a contributor) to 10 (most important contributor)
c Range: 0 (never) to 10 (always)
d Range: 0 (none) to 10 (pervasive)
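The logic of the simulation-based power analysis described in the Participants section can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the authors' procedure: instead of refitting a mixed-effects model to data simulated from fixed- and random-effect estimates, each simulated participant contributes a single interaction contrast score, and a dataset counts as significant when |t| exceeds 2. The function name `simulate_power`, the effect size, and the standard deviation are hypothetical values chosen for illustration only.

```python
import random
import statistics


def simulate_power(n_participants, effect, sd, n_sims=2000, seed=1):
    """Estimate power as the share of simulated datasets with |t| > 2.

    Each simulated participant contributes one interaction contrast score
    (a difference of differences), drawn from a normal distribution with
    mean `effect` and standard deviation `sd`. Both values are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    significant = 0
    for _ in range(n_sims):
        scores = [rng.gauss(effect, sd) for _ in range(n_participants)]
        mean = statistics.fmean(scores)
        standard_error = statistics.stdev(scores) / n_participants ** 0.5
        # One-sample t statistic against zero; |t| > 2 is used as a
        # rule-of-thumb significance criterion (an assumption of this sketch).
        if abs(mean / standard_error) > 2.0:
            significant += 1
    return significant / n_sims


# Power as a function of sample size for one hypothetical effect size.
for n in (20, 30, 40, 50):
    print(n, simulate_power(n, effect=0.4, sd=1.0))
```

Reading the printed proportions against the 80% target gives the smallest sample size that would be judged adequate under these assumed values.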
Materials
Novel words
A list of 32 three-phoneme monosyllabic nonwords (e.g., vish) was adapted from Wegener et al. (2018); the items were identical to those used by Beyersmann et al. (2021). Three morphologically complex novel words were created from each of these nonwords by combining them with three different existing English suffixes (e.g., vishing, vished, vishes). The syntactic function of the novel words remained consistent, meaning that they were always used as verbs during the oral training phase. As is the case for existing English words, the pronunciation of the suffixes depended on the phonological context (e.g., the past-tense suffix /t/ was added to stems ending in voiceless consonants, and /d/ to stems ending in voiced consonants).
Half of the novel words had a highly predictable spelling from their phonology (e.g., ‘b’ for /b/ as in yab) since they were assigned spellings with frequent phoneme-to-grapheme mappings. The other half were assigned spellings that were unpredictable as they contained less frequent mappings (e.g., ‘bb’ for /b/ as in jeabb). The frequencies of phoneme-to-grapheme mappings were extracted from the CELEX database (Baayen et al., 1993). Bigram and trigram frequencies were extracted from the English Lexicon Project (ELP; Balota et al., 2007) and SUBTLEX-DE databases (Brysbaert et al., 2011). The predictable trained and predictable untrained items, as well as the unpredictable trained and unpredictable untrained items, were matched on number of letters, English and German number of phonemes, English and German number of syllables, English and German logarithmic bigram frequency, and English and German logarithmic trigram frequency (see Table 2). Although the words were different in terms of their spelling predictability, they could be read aloud correctly using the most common grapheme-phoneme mappings.
Two sets of 16 novel words were used for oral training. All the novel words started with a consonant followed by a vowel. Half of the participants were trained with one set; the other half were trained with the other set. Both sets had novel words with predictable and unpredictable spellings. See Appendix A (Beyersmann et al., 2021, p. 97) for the full lists of novel words.
Eye-tracking sentences
Thirty-two sentences were created, one sentence for each of the novel words. There were also eight filler sentences which contained additional untrained word stems. In all sentences, only the stem was used (e.g., vish) and was embedded in the middle of the sentence (see Appendix B; Beyersmann et al., 2021, p. 98). The target words were placed in a position such that they were predictable for meaning (see Beyersmann et al., 2021; Wegener et al., 2018). For example: Ben put the machine into the fish tank to chig the glass clean again.
Procedure
Oral novel word training and testing took place over three consecutive days. Each training session on Days 1 and 2 lasted about 30 minutes. Day 3 took about 90 minutes as the participants completed the training phase and performed the post-training tasks, which are summarized below (see Table 3; Beyersmann et al., 2021, p. 90). Throughout the training and post-training sessions, participants were instructed that they were presented with English novel words and that their task was to read aloud the English novel words. All experimental instructions were provided in English, and throughout the entire testing session the experimenter and participants communicated in the English language only.
Note. From “Learning morphologically complex spoken words: Orthographic expectations of embedded stems are formed prior to print exposure,” by Beyersmann et al., 2021, Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(1), p. 90 (https://doi.org/10.1037/xlm0000808)
Training phase
The oral training phase closely followed the procedure by Beyersmann et al. (2021). Participants received oral training of novel words in small groups of 2–4 students. They were informed that they would be learning novel English words about ‘Professor Parsnip's Inventions’ as well as the features and functions of the inventions. The participants were shown pictures of several inventions and received oral descriptions of each invention using the novel words. The written form of the trained words was never shown to the participants. For instance, they learned that “Professor Parsnip has invented a machine that chigs. It is used for cleaning out fish tanks. It has a sponge and is shaped like an arm” (see Figure 1).
On Day 1, the participants were orally trained twice on eight complex novel words with half having a predictable and half an unpredictable spelling. The remaining eight words were introduced and also trained twice on Day 2. On Day 3, all 16 trained novel words and the corresponding features and functions of the inventions were reviewed.
Post-training phase: Picture naming
This task was performed to check whether the participants learned the novel words and their meanings. Therefore, the participants were shown the picture of each of the inventions and asked about their function (e.g., cleans fish tanks) and usage (e.g., it chigs). Participants’ responses were recorded.
Post-training phase: Eye-tracking experiment
Participants had their first orthographic exposure to the novel word stems during sentence reading. Sentences were presented one by one on the computer screen. Each sentence appeared on a single line. Eye movements were monitored while participants read the sentences silently. The participants read sentences on a computer monitor at a viewing distance of 85 cm. Each character covered 0.26° of horizontal visual angle. Sentences were presented in black, Courier New font on a white background. Eye movements were recorded using an EyeLink 1000 desk-mounted eye tracker (SR Research; Mississauga, Canada) with a sampling rate of 1000 Hz. Participants’ right-eye movements were recorded during binocular reading.
A nine-point calibration procedure was performed. Participants fixated a drift-correction target prior to each trial. The experiment began with three practice sentences. Participants ended each trial by fixating a box on the right, underneath the sentence. To promote attention to the task, participants were asked a yes/no question after each trial.
Participants’ reading of the target words was captured by four eye movement dependent variables: first fixation duration (duration of initial fixation on the target word); gaze duration (sum of all fixations made on the target before the eyes move past the target word to a subsequent word within the sentence); total reading time (sum of all fixations on the target word, including any regressions back to it); and regressions in (probability of making a regression back to the target word from a later portion in the sentence).
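As an illustration of how the four measures defined above can be derived from a trial's fixation record, the following Python sketch computes them for one target word. It is a sketch under stated assumptions rather than the authors' processing pipeline: fixations are represented as hypothetical `(word_index, duration_ms)` pairs in temporal order, first pass is taken to end as soon as the eyes leave the target in either direction, and first-pass measures are treated as missing when a later word was fixated before the target.

```python
from typing import List, Tuple

Fixation = Tuple[int, float]  # (index of the fixated word, duration in ms)


def target_measures(fixations: List[Fixation], target: int) -> dict:
    """Compute four eye-movement measures for one target word in one trial."""
    first_fixation = None   # duration of the initial first-pass fixation
    gaze = None             # sum of first-pass fixations on the target
    total = 0.0             # sum of all fixations on the target
    regression_in = False   # did the eyes return to the target from later text?
    in_first_pass = False   # currently inside the first pass on the target
    first_pass_done = False
    previous_word = None

    for word, duration in fixations:
        if word == target:
            total += duration
            if previous_word is not None and previous_word > target:
                regression_in = True
            if not first_pass_done:
                if not in_first_pass:
                    in_first_pass = True
                    first_fixation = duration
                    gaze = 0.0
                gaze += duration
        else:
            # Assumption: first pass ends when the eyes leave the target, or
            # when a later word is fixated before the target (target skipped).
            if in_first_pass or word > target:
                first_pass_done = True
            in_first_pass = False
        previous_word = word

    return {
        "first_fixation_duration": first_fixation,
        "gaze_duration": gaze,
        "total_reading_time": total,
        "regression_in": regression_in,
    }


# Hypothetical trial: the target (word 2) is fixated twice in first pass,
# then revisited after the reader has moved on to word 3.
trial = [(0, 200), (1, 180), (2, 250), (2, 150), (3, 210), (2, 300), (4, 190)]
print(target_measures(trial, target=2))
```

In this hypothetical trial the first fixation duration is 250 ms, gaze duration is 400 ms, total reading time is 700 ms, and a regression in is recorded.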
Post-training phase: Reading aloud
Participants read aloud all trained (n = 16) and untrained (n = 16) word stems presented individually in a randomized order in the center of a computer screen using DMDX software (Forster & Forster, 2003). Each trial consisted of an 800-ms fixation cross followed by the target word which remained on the screen until 2 seconds had elapsed. Participants were instructed to read aloud each word as quickly and accurately as possible.
Post-training phase: Spelling
The experimenter read aloud the trained and untrained novel word stems and the participants were required to spell the novel words as they were written in the eye-tracking and the reading aloud tasks.
Results
Analysis
We investigated the effect of training and spelling predictability on four eye-tracking dependent variables, including first fixation duration, gaze duration, total reading time, and regressions in. For continuous variables, extremely short or long (below 80 milliseconds and above 1200 milliseconds) looking times were removed. The distribution of these variables was visualized using R statistical software (R Core Team, 2020). Outliers were detected using a density plot. Extreme influential observations were identified using the influence.ME package (Nieuwenhuis et al., 2012) and removed (i.e., 40 out of 1462 trials for first fixation duration, 41 trials for gaze duration, and 19 trials for total reading time). We report the results of the models without influential observations. Following Box-Cox tests (Box & Cox, 1964), the continuous variables were log transformed. The lme4 package (Bates et al., 2015) was used to run the statistical models.
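The trimming and transformation steps for the continuous looking-time variables can be sketched as follows. This Python fragment is illustrative only (the analysis itself was run in R); the function name `trim_and_log` and the example durations are hypothetical, the natural logarithm is assumed, and the influence-based exclusion step is not reproduced.

```python
import math


def trim_and_log(durations_ms, lower=80.0, upper=1200.0):
    """Drop looking times below `lower` or above `upper`, then log-transform."""
    kept = [d for d in durations_ms if lower <= d <= upper]
    # Natural log is an assumption; the base of the logarithm is not stated.
    return [math.log(d) for d in kept]


# Hypothetical looking times in milliseconds; 50 and 1500 are removed.
print(trim_and_log([50, 80, 250, 1200, 1500]))
```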
One model was run for each dependent variable: first fixation duration, gaze duration, total reading time, and regressions in. All models had training (trained condition was coded as −0.5 and untrained condition was coded as 0.5), spelling predictability (predictable condition was coded as −0.5 and unpredictable condition was coded as 0.5), and their interaction as fixed-effects. The random-effects structure included by-participant and by-item varying intercepts and slopes. The initial model had no correlation between intercepts and slopes. When convergence issues occurred, random slopes were removed one by one, starting with the random slope with the smallest value. To analyze regressions in, we ran a generalized linear mixed-effects model with the glmer function in the lme4 R package. Outliers and extreme influential observations were removed from the dataset (i.e., 58 out of 2453 trials for first fixation duration, 70 trials for gaze duration, and 60 trials for total reading time).
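To make the ±0.5 contrast coding concrete, the sketch below shows what each fixed-effect coefficient corresponds to in a balanced 2 × 2 design when only cell means are considered. It is a simplified illustration under that assumption, not the fitted mixed-effects model: random effects and unequal cell sizes are ignored, and the function names and the cell means (on a log-millisecond scale) are hypothetical.

```python
def coded_effects(cell_means):
    """Map 2 x 2 cell means onto coefficients under -0.5/+0.5 coding.

    Keys are (training, predictability) pairs. Trained and predictable are
    coded -0.5; untrained and unpredictable are coded +0.5.
    """
    tp = cell_means[("trained", "predictable")]
    tu = cell_means[("trained", "unpredictable")]
    up = cell_means[("untrained", "predictable")]
    uu = cell_means[("untrained", "unpredictable")]
    return {
        "intercept": (tp + tu + up + uu) / 4,             # grand mean
        "training": (up + uu) / 2 - (tp + tu) / 2,        # untrained - trained
        "predictability": (tu + uu) / 2 - (tp + up) / 2,  # unpredictable - predictable
        # Predictability effect for untrained items minus that for trained items.
        "interaction": (uu - up) - (tu - tp),
    }


def predicted_mean(coefs, training_code, predictability_code):
    """Reconstruct a cell mean from the coefficients and the two codes."""
    return (coefs["intercept"]
            + coefs["training"] * training_code
            + coefs["predictability"] * predictability_code
            + coefs["interaction"] * training_code * predictability_code)


# Hypothetical cell means on a log-millisecond scale.
means = {
    ("trained", "predictable"): 5.5,
    ("trained", "unpredictable"): 5.9,
    ("untrained", "predictable"): 5.8,
    ("untrained", "unpredictable"): 6.0,
}
coefs = coded_effects(means)
print(coefs)
print(predicted_mean(coefs, -0.5, -0.5))  # recovers the trained, predictable cell
```

Under this coding, a larger spelling predictability effect for trained than for untrained items shows up as a negative interaction coefficient, and each main effect is evaluated at the average of the other factor's levels.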
Similarly, we investigated the effect of training and spelling predictability on participants’ reading aloud (i.e., response times and error rates) and spelling responses (i.e., error rates). The statistical models were the same as in the eye-tracking analyses. To analyze the picture naming task, a generalized linear mixed-effects model was run with response accuracy as the dependent variable. The model had word set (Set 1 was coded as −0.5 and Set 2 was coded as 0.5), spelling predictability (predictable condition was coded as −0.5 and unpredictable condition was coded as 0.5), and their interaction as fixed-effects. The same statistical models were used as in the analyses of the previous tasks.
Picture naming task
Participants were highly accurate (i.e., 97.5%, SD = 0.15) in recalling the orally trained novel words. The statistical model showed no significant effect of word set (β = −0.22, SE = 1.86, z = −0.12, p = 0.904), spelling predictability (β = −0.82, SE = 0.83, z = −1, p = 0.321), or interaction between word set and spelling predictability (β = 0.95, SE = 1.62, z = 0.58, p = 0.559). The results show that participants were successful in learning the complex novel words through oral training and were able to correctly associate the novel words with their corresponding pictures of the inventions.
Eye movements
First fixation duration
The statistical model showed no significant effect of training (β = −8.78 × 10⁻⁵, SE = 1.72 × 10⁻², t = −0.005, p = 0.996), spelling predictability (β = 7.52 × 10⁻³, SE = 2.37 × 10⁻², t = 0.317, p = 0.754), or interaction between training and spelling predictability (β = 3.09 × 10⁻², SE = 3.44 × 10⁻², t = 0.898, p = 0.369; see Figure 2).
Gaze duration
The statistical model showed no significant effect of training (β = 0.05, SE = 0.025, t = 1.944, p = 0.064). However, the effect of spelling predictability was significant (β = 0.21, SE = 0.032, t = 6.448, p < .001). The interaction between training and spelling predictability was not significant (β =−0.01, SE = 0.05, t = −0.339, p = 0.737), providing no support for or against the hypothesis that spelling predictability influenced looking times differently for trained and untrained target words.
Total reading time
The statistical model revealed significant effects of training (β = 0.24, SE = 0.04, t = 5.76, p < .001) and spelling predictability (β = 0.31, SE = 0.06, t = 5.07, p < .001). The results indicate that looking times were shorter for trained than untrained target words and for target words with predictable than unpredictable spellings. The interaction between training and spelling predictability was not significant (β = −0.09, SE = 0.085, t = −1.066, p = 0.295).
Regressions in
The statistical model revealed a significant effect of training (β = 0.74, SE = 0.13, z = 5.55, p < .001). The results indicate that the probability of regressing to the target word was lower for trained than untrained target words. The effect of spelling predictability (β = 0.33, SE = 0.177, z = 1.868, p = 0.061) and the interaction between training and spelling predictability (β = −0.40, SE = 0.258, z = −1.549, p = 0.121) were not significant.
Reading aloud
Response times
One participant's data were excluded due to technical recording issues. In addition, four participants with error rates above 40% were excluded. Incorrect responses were removed from the dataset (22.3% of all data). Following the Box-Cox test (Box & Cox, 1964), we used the inverse transformation of response times as the dependent variable. The statistical model revealed significant effects of training (β = 6.39 × 10⁻⁵, SE = 1.20 × 10⁻⁵, t = 5.30, p < .001), spelling predictability (β = 1.25 × 10⁻⁴, SE = 3.40 × 10⁻⁵, t = 3.69, p < .001), and an interaction between training and spelling predictability (β = 5.75 × 10⁻⁵, SE = 2.18 × 10⁻⁵, t = 2.63, p = 0.01). This indicates that participants were faster in reading aloud trained items with predictable than unpredictable spellings (see Figure 3).
Error rates
The results revealed a significant effect of training (β = −0.81, SE = 0.15, z = −5.42, p < .001), indicating that participants made fewer errors reading trained than untrained items. The effect of spelling predictability (β = −0.47, SE = 0.41, z = −1.15, p = 0.249) and its interaction with training were not significant (β = −0.28, SE = 0.34, z = −0.83, p = 0.403).
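To make the size of the training effect on errors more concrete, the sketch below converts the reported coefficient into an odds ratio with an approximate 95% Wald interval. It assumes that the error-rate model is a logistic (binomial) model whose coefficients are on the log-odds scale, which the reported z statistics suggest but which is not stated in this section; the function name is hypothetical.

```python
import math


def odds_ratio_with_wald_ci(beta: float, se: float, z_crit: float = 1.96):
    """Convert a log-odds coefficient and its SE to an odds ratio and Wald CI."""
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return math.exp(beta), lower, upper


# Training effect on error rates as reported above: beta = -0.81, SE = 0.15.
# Assumption: the coefficient is a log-odds estimate from a logistic model.
or_training, ci_low, ci_high = odds_ratio_with_wald_ci(-0.81, 0.15)
print(f"OR = {or_training:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Under these assumptions, an odds ratio below 1 corresponds to lower odds of an error for trained than untrained items, in line with the direction of the effect described above.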
Spelling
In the spelling data, there was a significant effect of training (β = 0.28, SE = 0.117, z = 2.44, p = 0.014) showing that participants were more accurate in producing the written form of the trained than untrained novel word stems. In addition, there was a significant effect of spelling predictability (β = 1.47, SE = 0.30, z = 4.97, p < .001) showing that they were more accurate in producing the written form of the novel word stems with predictable than unpredictable spelling. The interaction between training and spelling predictability was not significant (β = 0.09, SE = 0.235, z = 0.40, p = 0.685; see Figure 4).
Discussion
The present study used an oral novel word learning paradigm to investigate if L2 speakers of English generate L2 orthographic skeletons during oral training and benefit from them when reading the novel words for the first time. During three consecutive days of training, participants received oral descriptions of different inventions using novel word stems (e.g., vish) combined with three different inflectional morphemes (i.e., vishing, vishes, vished). Following training, on the third day, participants read the novel word stems for the first time embedded in sentences while their eye movements were tracked. In addition, they performed a picture naming, a reading aloud, and a spelling task.
The results of the eye-tracking experiment showed a significant effect of training on total reading time and regressions in. This is the first key finding showing that adult L2 speakers of English benefited from prior oral vocabulary training when reading novel words for the first time. This is consistent with the main effect of training previously evidenced in English monolinguals (see Beyersmann et al., Reference Beyersmann, Wegener, Nation, Prokupzcuk, Wang and Castles2021), although the effect occurs later in the eye movement record of the L2 speakers, perhaps implying somewhat reduced reading efficiency in the L2 sample.
In addition, a significant effect of spelling predictability was observed on gaze duration and total reading time, showing that participants spent less time fixating words with predictable than unpredictable spellings. However, although L2 speakers were influenced by spelling predictability, this effect was not modulated by whether the words were trained or untrained. The observed absence of a training by spelling predictability interaction clearly contrasts with the earlier findings from English monolinguals (Beyersmann et al., Reference Beyersmann, Wegener, Nation, Prokupzcuk, Wang and Castles2021; Wegener et al., Reference Wegener, Wang, de Lissa, Robidoux, Nation and Castles2018, Reference Wegener, Wang, Nation and Castles2020). To statistically compare the current L2 results with Beyersmann et al.'s monolingual data, we carried out an additional, pre-registered set of combined analyses across both data sets. The full results are reported in the study's OSF repository: https://osf.io/u7wkq/?view_only=3d4c8b8230154a2facfe7fbd883d9e23. The combined analyses revealed a significant three-way interaction between training, spelling predictability and participant group (monolinguals vs. bilinguals) for gaze duration and total reading time. This suggests that although L2 speakers were sensitive to differences between common and unusual spellings in English, as evidenced by a main effect of predictability of the novel words, they did not demonstrate the orthographic skeleton effect observed in monolinguals.
There are several possible reasons why this might have been the case. One possibility is that participants simply did not generate orthographic skeletons at all. While the data do not allow us to rule out this possibility, we consider it unlikely. A second possibility is that participants generated English L2 orthographic skeletons of the novel words during oral learning, but then failed to integrate them when reading the words in the eye-tracking task. Exactly why this might have been the case is not clear, although it could be due to their generally lower levels of reading experience in English compared to native language speakers (Weber & Broersma, Reference Weber and Broersma2012). Some studies have reported an impact of L2 proficiency on morphological processing. For instance, Feldman et al. (Reference Feldman, Kostić, Basnight-Brown, Đurđević and Pastizzo2010) reported that priming effects for inflected verbs in L2 speakers were modulated by the English language proficiency of the participants. That is, the higher the language proficiency was, the stronger the priming effects were. Likewise, Coughlin and Tremblay (Reference Coughlin and Tremblay2015) found that priming effects for French inflected words were increased by participants’ L2 proficiency. This might explain why L2 speakers failed to generate orthographic skeletons through oral training.
A third possibility is that participants did generate orthographic skeletons, but they were not precisely based on L2 phoneme-grapheme correspondences. We speculate that although participants knew they were learning novel English words, they may have had a greater degree of automaticity in linking phonemes to graphemes in their L1 (German), which may in turn have interfered with the process of generating English L2 orthographic skeletons. For example, in English the phoneme /j/ is spelled as ‘y’ as in ‘yes’; however, in German the same phoneme has a different written form, that is, ‘j’ as in ‘ja’ (which means ‘yes’ in German). Several other such examples exist (for instance, the phoneme /f/ in English is written as either ‘f’ or ‘ph’, as in ‘four’; however, in German the same phoneme can also be written as ‘v’, as in ‘vier’, i.e., /fɪə/, which means ‘four’ in German). The difference between the German and English orthographies may have had an impact on the creation of orthographic skeletons as well as reading. That is, in English a grapheme corresponds to multiple phonemes (i.e., deep orthography), whereas in German it is more common to have one-to-one correspondence between graphemes and phonemes (i.e., shallow orthography; Liu & Cao, Reference Liu and Cao2016). Therefore, this difference makes it more difficult and time consuming for L2 speakers of English to make accurate and fast correspondences between phonemes and graphemes, which in turn may impact generating orthographic skeletons. Items containing phoneme-grapheme mappings that conflict in German and English were present in the current experiment, across both spelling predictability conditions. If participants automatically drew on their L1 phoneme-to-grapheme mappings, this could have resulted in the formation of an orthographic skeleton that differed from the English phoneme-to-grapheme mappings (e.g., upon hearing the spoken word ‘yab’ a German L1 speaker may automatically generate the spelling ‘jab’).
If this occurred for items with predictable English spellings, then at least some of the orthographic skeletons participants formed would not have matched the orthographic form they saw, thus potentially limiting the effect of spelling predictability for orally trained items. Some tentative support for this interpretation comes from participants’ reading aloud and spelling responses: German grapheme-phoneme correspondences were used when encountering the stems for a proportion of items (for instance, reading aloud ‘jit’ as /jit/ or spelling ‘meaph’ as ‘meave’).
In the reading aloud task, we observed a significant effect of training, spelling predictability and their interaction on reading latencies, and an effect of training on reading aloud accuracy. This task was performed after the eye-tracking task, which means that participants had already encountered the spelling of each word. The results of the reading aloud task indicate that participants benefitted from their prior oral vocabulary knowledge of the trained items, regardless of the predictability of their spellings. This suggests that L2 speakers were able to draw on their oral vocabulary knowledge to assist with the process of forming connections between the orthographic form that participants first encountered during the eye-tracking task and the phonological form that participants had acquired during oral training.
Prior work with both children and adults (Duff & Hulme, Reference Duff and Hulme2012; Hogaboam & Perfetti, Reference Hogaboam and Perfetti1978; Johnston et al., Reference Johnston, McKague and Pratt2004; McKague et al., Reference McKague, Pratt and Johnston2001) has shown that monolingual English speakers demonstrate a reading accuracy and efficiency advantage for orally trained words compared to untrained words. When a novel word is encountered in print, the reader should phonologically decode the word and, if the word is orally familiar, the reader likely then attempts to match the decoded pronunciation with an entry stored in oral vocabulary. Current theories suggest that there may be two time points at which oral vocabulary knowledge might assist with the process of reading novel words (see Wegener et al., 2022a). The first occurs prior to visual exposure via the generation of orthographic skeletons, as described earlier, which facilitate this matching process. The second occurs from the point of visual exposure via a process termed set for variability (Venezky, Reference Venezky1999) or mispronunciation correction (Dyson et al., Reference Dyson, Best, Solity and Hulme2017), in which decoding attempts undergo some adjustment in order to match them with known spoken words. Given that the current study did not find evidence that L2 learners of English had generated orthographic skeletons prior to visual exposure but they still demonstrated facilitated reading of trained compared to untrained words, this suggests that the benefit of having a word in oral vocabulary may have been conferred via this second process of adjusting decoding attempts. The finding that, on the eye movement measures, the training effect only emerged late in processing supports this interpretation (see Murray et al., Reference Murray, Wegener, Wang, Parrila and Castles2022).
Further research might investigate whether or not children as beginner L2 learners of English (i.e., German L1 speakers) show similar patterns of learning and are sensitive to the orthographic skeleton of oral words like native children. Moreover, to further explore the challenges of L2 vocabulary acquisition, an interesting extension of the current work would be the direct comparison between novel words that provide an exact match in L1 and L2 phoneme-grapheme correspondences and novel words differing in their L1 and L2 phoneme-grapheme correspondences. It is also worth investigating whether combining novel word stems with derivational morphemes (e.g., vishist, vishment, vishity), as opposed to the inflectional forms that were presently used (e.g., vishing, vished, vishes), makes a difference in learning and decomposing novel words as well as generating the orthographic skeleton in both L1 and L2 speakers of English. This can contribute to the mixed literature showing differences in morphological priming effects in processing inflected words in L1 vs. L2 but not necessarily in processing derived words (Jacob, Reference Jacob2018; Reifegerste et al., Reference Reifegerste, Elin and Clahsen2019; Silva & Clahsen, Reference Silva and Clahsen2008). We suspect that German native speakers are naturally more reliant on German than English phoneme-grapheme correspondences, suggesting that the previously evidenced training by spelling predictability interaction in English monolinguals (Beyersmann et al., Reference Beyersmann, Wegener, Nation, Prokupzcuk, Wang and Castles2021; Wegener et al., Reference Wegener, Wang, de Lissa, Robidoux, Nation and Castles2018), Spanish native speakers (Jevtović et al., Reference Jevtović, Antzaka and Martin2022) and French native speakers (Jevtović et al., Reference Jevtović, Antzaka and Martin2023) should be replicable in German monolinguals.
In sum, the current findings show that L2 speakers successfully learned the spoken novel words, which led to overall shorter looking times when reading the novel stems embedded in sentences for the first time. Participants also showed sensitivity to the spelling predictability of the novel words, which was however not modulated by oral vocabulary training. Crucially, the eye-movement data revealed that English L2 speakers, as opposed to English monolinguals, did not generate orthographic skeletons that were robust enough to affect their eye-movement behavior when seeing the novel words for the first time in print. Future work may further explore the mechanisms that L2 speakers use in predicting orthographic form based on oral exposure.
Acknowledgements
This research was supported by a Discovery Early Career Researcher Award (DECRA) by the Australian Research Council (ARC) to EB (DE190100850).
Ethics approval statement
This study was approved by the ethics committee of Macquarie University, Australia, which conforms to the requirements of the Australian National Statement on Ethical Conduct in Human Research 2007 (updated July 2018).
Data availability statement
The data, analyses, and materials that support the findings of this study are openly available in the OSF repository at https://osf.io/u7wkq/?view_only=3d4c8b8230154a2facfe7fbd883d9e23.
Competing interests declaration
The authors declare none.
Appendix A List of Complex Novel Words
Appendix B List of the Sentences Used in the Eye Tracking Experiment