1. Introduction and historical perspective
Research on the acquisition of second language (L2) phonology has relatively recently begun to systematically investigate the role that orthographic input plays in acquisition. For the purposes of this review, we use the term ‘input’ in the sense typical of the L2 acquisition literature, to refer to the linguistic information learners have available to them to acquire a language (see, e.g., Gass & Madden, 1985). ‘Orthographic input,’ then, is the input that is available to learners in written form. In the following, we distinguish those studies that have manipulated the availability of orthographic input during an exposure phase from those that examine the effects of existing orthographic knowledge (however and whenever it was acquired). We employ the narrower term ‘orthographic input’ in the former cases and the broader term ‘orthography’ in the latter cases. Given that instructed learners are often literate, and that in instructed settings, orthographic input is typically widely available (and may in fact be learners’ only source of input for some words), it is essential for language teachers to be aware of the potential beneficial and interference effects of orthography in L2 acquisition.
To our knowledge, there are three previously-published articles that provide overviews of this area of research: an early review article by Bassetti (2008), and introductions to special issues on this topic published in Language and Speech (Simon & Van Herreweghe, 2010) and Applied Psycholinguistics (Bassetti, Escudero, & Hayes-Harb, 2015). The purpose of this review is to summarize studies in this area of research up to the present date, to discuss their pedagogical implications, and to suggest areas for future research. We break down the research on the role of orthography in L2 phonological acquisition into six main lines of inquiry, which correspond to the major sections of this review:
1. Orthography and L2 phonological awareness
2. The influence of orthography on L2 phoneme perception
3. Orthography and the acquisition of L2 phonological processes
4. Orthography and the acquisition of L2 syllable structure
5. Orthography and the pronunciation and recognition of L2 words
6. Research summary and pedagogical implications
In light of many decades of research on native language reading, it should not be surprising that access to orthographic forms would interact with phonological development. Phonographic orthographies are those where written forms represent sounds and include alphabets (phoneme-based) and syllabaries (syllable-based). Alphabets, in particular, have received a great deal of attention, as they vary widely in what is known as transparency, or the directness of the correspondence between letters (graphemes) and sounds (phonemes). So-called transparent alphabets (e.g., Serbian, Spanish) are those where there is a one-to-one bidirectional correspondence between graphemes and phonemes (also referred to as consistent mappings; see, e.g., Ziegler & Ferrand, 2008). Opaque alphabets (e.g., English, French), then, are those that exhibit many-to-one grapheme-phoneme and/or phoneme-grapheme correspondences (also referred to as inconsistent mappings). Much of the relevant work from the native language reading literature has compared native English speakers’ performance on relatively more transparent or opaque written forms. For example, Castles, Holmes, Neath, and Kinoshita (2003) investigated the effect of the transparency of specific words’ spellings on phoneme deletion performance. Adult and child English-speaking participants were asked, for example, to ‘take the /rə/ from struggle’ (a transparent example) and the ‘/wə/ from squabble’ (an opaque example). They found that performance was more accurate in the transparent condition, and concluded that there are ‘substantial orthographic influences on phonological awareness task performance’ (Castles et al., 2003, p. 445).
Similar types of effects of orthographic forms on various phonological awareness tasks have been reported throughout the literature (e.g., Seidenberg & Tanenhaus, 1979; Ziegler, Ferrand, & Montant, 2004; Tyler & Burnham, 2006; Cutler, Treiman, & van Ooijen, 2010; Rastle, McCormick, Bayliss, & Davis, 2011).
Also relevant is the large literature on L2 reading, as it has highlighted the importance of the differences between languages with respect to the ways in which linguistic information is conveyed through writing. Given that reading processes differ depending on the characteristics of the writing system (e.g., Katz & Frost, 1992; Ellis et al., 2004), a learner's experience with a L2 writing system will involve ‘continual interactions between the two languages as well as incessant adjustments in accommodating the disparate demands each language imposes’ (Koda, 2007, p. 1). These adjustments involve navigating potential differences between the native language (L1) and the L2 on the following dimensions:
• Script type: logographic (e.g., Mandarin Hanzi and Japanese Kanji) and phonographic, which includes alphabetic (English, Spanish, Arabic) and syllabic (Japanese Hiragana and Katakana) scripts
• Transparency: shallow/transparent orthographies have largely one-to-one bidirectional mappings between graphemes and phonemes (e.g., Spanish); deep/opaque orthographies exhibit relatively more many-to-one and one-to-many mappings (e.g., English)
• Script direction: left-to-right (e.g., English, French), right-to-left (e.g., Arabic, Hebrew, Farsi), or top-down (e.g., sometimes Mandarin, Japanese).
For example, a native speaker of Arabic learning Mandarin must contend with a new script type and direction, and a native speaker of Spanish learning English is faced with the challenge of a difference in transparency, where the Spanish-based expectation is that letters and phonemes exhibit a one-to-one relationship but the English L2 system exhibits many-to-one and one-to-many grapheme-phoneme correspondences. Differential mappings between graphemes and phonemes in the L1 and L2 may also pose challenges for language learners:
• Congruence: congruent mappings are those where the L1 and L2 employ (roughly) the same grapheme-phoneme correspondences (e.g., <m> maps to /m/ in both English and Spanish); incongruent mappings are those where the two languages differ in grapheme-phoneme correspondences (e.g., <H> maps to /h/ in English but /n/ in Russian).
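The notions of transparency and congruence described above can be made concrete with a small sketch. The Python snippet below is only an illustration under stated assumptions: the function names `is_transparent` and `congruence` are hypothetical, and the grapheme-phoneme pairs are toy fragments chosen for the example (the <m> and <H> mappings come from the text; English <c> mapping to both /k/ and /s/ is added for illustration), not complete descriptions of any writing system.

```python
from collections import defaultdict


def is_transparent(pairs):
    """True if the (grapheme, phoneme) pairs form a one-to-one bidirectional mapping."""
    g2p, p2g = defaultdict(set), defaultdict(set)
    for grapheme, phoneme in pairs:
        g2p[grapheme].add(phoneme)
        p2g[phoneme].add(grapheme)

    def one_to_one(table):
        return all(len(targets) == 1 for targets in table.values())

    return one_to_one(g2p) and one_to_one(p2g)


def congruence(l1_map, l2_map):
    """Label each grapheme shared by two languages as congruent or incongruent."""
    shared = l1_map.keys() & l2_map.keys()
    return {g: "congruent" if l1_map[g] == l2_map[g] else "incongruent" for g in shared}


# Toy fragments for illustration only (not full descriptions of any orthography).
consistent_fragment = [("m", "m"), ("p", "p")]
inconsistent_fragment = [("c", "k"), ("c", "s")]  # English <c> as in "cat" vs. "city"

print(is_transparent(consistent_fragment))    # True
print(is_transparent(inconsistent_fragment))  # False

# "H" stands for the shared letter shape, as in the English/Russian example above.
print(congruence({"m": "m", "H": "h"}, {"m": "m", "H": "n"}))
```

On this view, transparency is a property of a single writing system, whereas congruence is a property of a pair of systems, which is why the two functions take different inputs.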
Importantly for the present purposes, these cross-linguistic differences among writing systems may impact the ways in which learners are able to make inferences about the phonological structure of the language from orthographic input. In the following sections, we review the relatively recent—and growing—body of research that has specifically focused on the effects of orthography on L2 phonological development.
2. Orthography and L2 phonological awareness
One of the ways in which orthographic input may affect L2 learners’ development is in their phonological awareness, a type of metalinguistic awareness having to do with the sound structure of words. Bassetti (2006) examined the effect of Pinyin spelling conventions, and their relationship to English spelling conventions, on native English speakers’ acquisition of Mandarin phonology. In Pinyin, which is a Romanization system for Mandarin, vowel letters are sometimes omitted from written forms, depending on syllable structure. For example, the syllable /uei/ is spelled <wei>, while the syllable /guei/ is spelled <gui>. In two experiments, one involving phoneme counting (a metalinguistic task that is often used to examine phonological awareness) and the other involving word production, Bassetti (2006) found that participants were more likely to both count and pronounce the /e/ when it was present in the Pinyin written forms. Bassetti (2007) replicated the word production results of Bassetti (2006) with native Italian-speaking learners of Mandarin. What is particularly interesting about the Bassetti (2006) and Bassetti (2007) studies is that participants were not shown the Pinyin written forms at any time during the study; rather, they performed both the phoneme counting and the word production tasks by reading non-phonographic Hanzi characters, which do not present phonological information. In this way, Bassetti was able to demonstrate that Pinyin spelling conventions affected learners’ phonological representations of Mandarin.
Pytlyk (2017) investigated the effects of orthography on phoneme counting performance by native English-speaking learners of Mandarin and Russian. Participants were asked to count the number of phonemes in auditory words in their respective L2s, and were more accurate when the number of letters in the words’ spellings (in Pinyin and in Cyrillic, also an alphabetic script) matched the number of phonemes in the words than when it did not. Participants in the Pytlyk (2017) study were presented with the auditory forms of the words, and not their written forms, indicating that learners’ prior knowledge of the words’ Pinyin/Cyrillic written forms influenced their phonological representations of the newly-learned words.
Detey and Nespoulous (2008) employed a different type of phonological awareness task: syllable counting. They examined the syllable-counting performance of native Japanese learners of French in three conditions: auditory, visual (written forms), and audiovisual. Target items contained consonant clusters (e.g., /plokama/), and it was assumed that if participants’ knowledge of Japanese syllable structure interfered, they would epenthesize in order to repair consonant clusters, resulting in larger numbers of syllables counted. The authors do not present inferential statistical analysis of their data. However, descriptively, they found that participants were more likely to epenthesize in the visual condition than in the audiovisual condition, and least likely to epenthesize in the auditory condition. This finding was counter to the authors’ hypothesis that the visual conditions would serve to suppress participants’ percept of epenthetic vowels, and is also counter to the findings of Zjakic (2017; described in detail below), who found that access to written forms while learning words resulted in more accurate memory for non-native consonant clusters.
From these studies, we conclude that knowledge of phonographic written forms can impact adults’ phonological representations and phonological awareness in a L2 at the segmental level. All of the studies reviewed in this section provide evidence that knowledge of written forms can in fact interfere with accurate metalinguistic task performance. In the case of Bassetti (2006), Bassetti (2007), and Pytlyk (2017), L2 learners’ phoneme counting performance was negatively impacted by Pinyin and/or Cyrillic spelling conventions. Detey and Nespoulous (2008) showed that L2 learners’ syllable counting performance can also be negatively impacted by written input. Tasks of phonological awareness, such as phoneme and syllable counting, shed light on learners’ metalinguistic knowledge of the L2. However, they do not directly address the question of how orthographic input affects the acquisition or processing of the L2. The Bassetti (2006) study is notable in that it demonstrates a possible connection between phoneme counting and word production. Further work connecting phonological awareness to online speech production or perception, in particular work focusing on any causal relationship between phonological awareness and online performance, would be particularly useful for language instruction.
3. The influence of orthography on L2 phoneme perception
In this section, we review studies that investigate the effects of orthography on phoneme perception as revealed by metalinguistic tasks that involve participants judging similarity or difference among auditory stimuli. Pytlyk (2007) and Pytlyk (2011) are among the earliest studies to look specifically at the effect of orthography on L2 phoneme perception. In this work, Pytlyk taught Mandarin phonemes to native English speakers with no prior experience of Mandarin in three exposure conditions: Pinyin (familiar Roman graphemes), Zhuyin (a phonographic writing system with unfamiliar graphemes), and control (no orthographic input). Following a training period where participants heard the Mandarin phonemes with accompanying visual input according to their respective exposure conditions, participants performed an oddity discrimination task in which they were asked to determine which of three auditory stimuli differed from the other two. The hypothesis was that because Pinyin uses the same graphemes as English, but with some different grapheme-phoneme correspondences, participants in the Pinyin condition would experience interference from English that would result in decreased perceptual performance relative to the other two groups. However, there was no difference in accuracy among the three groups, suggesting that, at least with respect to this training protocol, Pinyin input does not negatively affect the acquisition of Mandarin phoneme perception.
Simon, Chambless, and Kickhöfel Alves (2010) investigated the perception of the French /u/-/y/ vowel contrast by native speakers of English. A word learning phase consisted of the presentation of an auditory nonword (e.g., /dyʒ/) along with a pictured meaning (e.g., ‘banana’), and participants assigned to the sound-spelling group were additionally shown the word's spelling (e.g., <dûge>). At test, participants performed an AXB discrimination task in which a trial consisted of participants hearing three new words and being asked to determine whether the vowel in the second word (X) matched that in either the first (A) or the third (B). Participants in the sound-spelling group did not outperform those in the sound-only group, suggesting that in this case access to written forms during word learning neither helped nor hindered native English speakers’ ability to discriminate the French /u/-/y/ contrast.
Escudero and Wanrooij (2010) examined the perception of Dutch vowels by native speakers of Spanish learning L2 Dutch. In the first experiment, 204 native Spanish speakers (from Spain and Latin America) living in the Netherlands performed an auditory XAB task where they were asked to determine whether an auditory vowel stimulus (X; e.g., [i]) better matched the second (A; e.g., [i]) or third (B; e.g., [y]) vowel stimulus. In the orthographic task, participants were again presented with an auditory vowel stimulus, but in this task were then asked to select the orthographic representation that best matched the vowel sound (e.g., <i>, <u>). Escudero and Wanrooij (2010) found, among other things, that the vowels /a/ and /ɑ/ were most difficult, and that /i/ and /y/ were easiest for these learners in the auditory task. However, in the orthographic task, this pattern of difficulty was reversed, with /y/ being the most difficult and /a/ and /ɑ/ the easiest. A follow-up study with native Peruvian Spanish speakers who had no Dutch language experience, and with modified auditory and orthographic tasks to make the task demands more similar to each other, revealed the same pattern of results. Together, these two experiments indicate that the relative difficulty that language learners experience with various L2 vowels may be moderated by the availability of written forms.
The studies reviewed in this section provide mixed results concerning the influence of orthography on speech perception in a L2. On the one hand, orthography impacted native Spanish speakers’ perception of Dutch vowels when the task itself involved written forms as response options (Escudero & Wanrooij, 2010). On the other hand, in the case of perceptually difficult French vowels, where the written forms systematically signal the novel vowel contrasts, native English speakers who had access to the orthographic input during an exposure phase did not exhibit more accurate perception relative to those who did not (Simon et al., 2010). Importantly, the test phase in this study was entirely auditory, suggesting that any benefit participants might have experienced from the availability of written input during the exposure phase did not translate into enhanced perceptibility at test. Finally, the predicted interference effects of differences between Mandarin/Pinyin and English grapheme-phoneme correspondences were not borne out: neither a group exposed to Zhuyin nor a control group with no orthographic input outperformed the Pinyin group in Pytlyk (2007) and Pytlyk (2011). It is important to note that it is not possible to draw meaningful conclusions about the impact of orthography on L2 speech perception given the very small number of studies. In addition, the very different goals of each of these studies make cross-study comparisons difficult.
So far, we have considered studies of learners’ phonological awareness and their perception of L2 speech sounds. While they provide valuable insights into learners’ representations and metalinguistic knowledge of the L2, the tasks used in these studies do not necessarily reflect the online processes involved in language use. The remainder of this review will focus on studies of linguistic representation and processing in two domains: the learning and application of phonological processes, and the learning and processing of words.
4. Orthography and the acquisition of L2 phonological processes
In addition to learning the phonemes that are used to make meaningful distinctions between words, L2 learners are also faced with the challenge of acquiring the phonological processes (e.g., liaison in French, spirantization in Spanish, vowel harmony in Turkish) that determine the way that speech sounds are produced in particular contexts. A small number of studies have considered the role that orthography plays in the acquisition of L2 phonological processes, with the majority of these focusing on written input in the acquisition of German final devoicing by native English speakers. In German, while voicing differences among obstruents are found in non-final positions, word final obstruents are produced as voiceless regardless of their underlying voicing (e.g., <Rad> /ʁɑd/ ‘wheel’ and <Rat> /ʁɑt/ ‘advice’ are both pronounced [ʁɑt]). Despite being prevalent in the auditory input to learners of German, the process of German final devoicing is notoriously difficult for native English speakers to acquire. Young-Scholten (2002) hypothesized that English learners’ difficulty stems from exposure to written forms that represent the underlying voicing value of word final obstruents, and thus are misleading with respect to the surface voicing of German words that exhibit the voicing alternation. Support for this claim comes from a longitudinal study of the phonological acquisition of three adolescents over a year-long study abroad experience in Germany. Young-Scholten (2002) reported that, while none of the learners produced target-like patterns of surface voicing in their productions of German words exhibiting the voicing alternation by the end of their time abroad, participants who reported the most exposure to orthographic input demonstrated the least improvement.
Young-Scholten and Langer (2015) later examined the development of word-initial /z/ in German by the same three naturalistic learners. They observed that the learners tended to produce word-initial <s>, pronounced /z/ in German, as /s/, consistent with the grapheme-phoneme correspondences of English, providing additional support for the influence of orthographic input in L2 acquisition of German by naturalistic learners.
The final devoicing findings from naturalistic learners reported by Young-Scholten and colleagues have been corroborated in a series of laboratory studies reported in Brown (2015) and Hayes-Harb, Brown, and Smith (2018). These studies employed an artificial lexicon study format in which native English-speaking participants with no prior experience with German or any other language with final devoicing learned novel German-like words via exposure to their auditory forms or their auditory and written forms paired with their pictured ‘meanings.’ They found that participants who were exposed to the written forms of the novel German words produced more voiced final obstruents on a subsequent picture naming task than those who were not exposed to written forms, suggesting that written forms indeed interfered with the acquisition of a German-like pattern of surface voicing. In a second experiment, the authors examined whether the provision of explicit information regarding the misleading nature of the written forms would help learners overcome the effect. However, the intervention was not sufficient to circumvent the interfering effect of orthographic input.
Building on these findings, Barrios and Hayes-Harb (2020) investigated whether the interfering effect of orthographic input is observed when naïve learners are exposed to alternating and non-alternating surface forms in both their suffixed and unsuffixed forms, which has been argued to provide the essential evidence for whether a learner has acquired a phonological alternation (Young-Scholten, 2002, p. 268). In the study, in addition to observing auditory evidence that underlyingly voiced final obstruents are produced as voiceless in unsuffixed forms (e.g., /tʁob/ pronounced [tʁop] and spelled <trob>), participants were also exposed to the voiced alternants of alternating words (e.g., /tʁoben/ pronounced [tʁoben] and spelled <troben>). Despite auditory evidence for the alternation, the authors found that, unlike the participants who were not exposed to written forms, participants who had access to words’ written forms during word learning failed to learn the final devoicing process, suggesting that orthographic input interfered with the acquisition of the phonological process. Moreover, participants who were exposed to orthographic input were more likely than participants who were not to incorrectly produce underlyingly voiced words as voiced in their unsuffixed forms, as was also reported by Hayes-Harb et al. (2018). However, unlike the previous study, Barrios and Hayes-Harb (2020) also uncovered a potential advantage of exposure to orthographic input during the acquisition of a phonological process. That is, the orthography group was found to correctly produce underlyingly voiced words as voiced in their suffixed forms, suggesting that orthographic input, while misleading with respect to the surface realizations of alternating words, may provide a helpful clue to words’ underlying forms.
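The alternation at issue in these studies can be sketched as a simple rule that maps underlying forms to surface forms. The Python snippet below is a minimal illustration, not a model used in any of the studies reviewed: the function name `surface_form` and the small obstruent inventory are assumptions made for the example, and the rule is simplified to word-final position only. The test words are the forms cited in this section.

```python
# Voiced obstruents and their voiceless counterparts (a simplified, assumed inventory).
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}


def surface_form(underlying):
    """Devoice a word-final obstruent; leave all other segments unchanged."""
    if underlying and underlying[-1] in DEVOICE:
        return underlying[:-1] + DEVOICE[underlying[-1]]
    return underlying


for word in ["ʁɑd", "ʁɑt", "tʁob", "tʁoben"]:
    print(f"/{word}/ -> [{surface_form(word)}]")
# /ʁɑd/ -> [ʁɑt]
# /ʁɑt/ -> [ʁɑt]
# /tʁob/ -> [tʁop]
# /tʁoben/ -> [tʁoben]
```

The sketch makes the learner's problem visible: the spelling (<Rad>, <trob>) tracks the input to the rule, while the auditory signal reflects its output, so only the suffixed form reveals the underlying voicing in speech.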
In another study revealing an interference effect of orthography on the acquisition of a phonological process, Özçelik and Sprouse (2016) examined the role of orthographic input in the acquisition of the phonological process of vowel harmony by native English-speaking learners of L2 Turkish. In Turkish, the vowel features of suffixes are determined by the features of the vowel from the immediately preceding syllable. Specifically, [+high] suffix vowels take on the backness and roundness features of the preceding vowels, while [-high] suffix vowels match the preceding vowel in backness only (e.g., <eve> /ev-e/ ‘home.DAT’ vs. <ava> /av-a/ ‘hunting.DAT’). In addition to the canonical pattern of vowel harmony just described, Turkish also exhibits a pattern of non-canonical vowel harmony where the backness feature of a suffix vowel is determined by the backness feature of an intervening lateral, as opposed to the preceding vowel (e.g., [rol-e] /rol-e/ ‘role.DAT’ in which the [-high] suffix vowel matches the [-back] lateral in backness as opposed to the [+back] vowel /o/). Crucially, in non-canonical vowel harmony, the Turkish writing system does not represent the contrast between the [+back] and [-back] laterals that trigger the process ([ɫ] vs. [l]), which are both written as <l>. The authors asked whether an effect of misleading orthographic representations is found when the triggers for target language phonological rules are not transparently represented in the L2 orthography. Sixteen English learners of Turkish at three proficiency levels (six beginner, five intermediate, five advanced, as determined by a multiple choice cloze task) were presented with Turkish words or pseudowords that undergo either canonical or non-canonical vowel harmony. Words were presented in one of two conditions: auditory only, or both auditorily and in writing.
Participants were asked to select the vowel-harmonically correct suffix among a set of options for each word, and the percentage of correct suffix choices was considered for each condition. They observed more correct suffix choices for words that undergo canonical vowel harmony when the words were presented both auditorily and in writing than when they were presented only in auditory form. The reverse pattern was observed for non-canonical vowel harmony, suggesting that orthographic input that does not transparently represent the relevant features of the triggering segment interfered with participants' application of the phonological rule. Moreover, in the case of canonical vowel harmony, a significant interaction between input type (auditory only vs. auditory plus writing) and proficiency was observed in which beginners were found to rely more heavily on written input than participants in the advanced group. While this latter finding should be interpreted with caution given the small number of participants in each proficiency level, it is suggestive of a differential effect of orthography over the course of acquisition. Indeed, more research investigating the role of orthography in the acquisition of phonological processes, with a greater number of participants at various proficiency levels, is needed to better our understanding.
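The two harmony patterns described above can be expressed as a short sketch. The Python snippet below is illustrative only and rests on stated assumptions: the function names, the `lateral_back` parameter, and the feature table are introduced for this example (the table encodes the standard backness and rounding values of the eight Turkish vowel letters), and the sketch assumes that in the non-canonical pattern rounding is still taken from the preceding vowel. It is not the procedure used by Özçelik and Sprouse.

```python
# (back, round) features of the eight Turkish vowels, keyed by their letters.
FEATURES = {
    "i": (False, False), "ü": (False, True), "ı": (True, False), "u": (True, True),
    "e": (False, False), "ö": (False, True), "a": (True, False), "o": (True, True),
}
# [+high] suffix vowels agree in backness and rounding; [-high] in backness only.
HIGH = {(False, False): "i", (False, True): "ü", (True, False): "ı", (True, True): "u"}
LOW = {False: "e", True: "a"}


def last_vowel(stem):
    """Return the final vowel letter of a (lowercase) stem."""
    for ch in reversed(stem):
        if ch in FEATURES:
            return ch
    raise ValueError("stem contains no vowel")


def suffix_vowel(stem, high, lateral_back=None):
    """Choose the suffix vowel for a stem.

    lateral_back is a hypothetical parameter: when given, the backness of a
    stem-final lateral overrides the vowel's backness (non-canonical pattern).
    """
    back, rounded = FEATURES[last_vowel(stem)]
    if lateral_back is not None:
        back = lateral_back  # assumption: rounding still comes from the vowel
    return HIGH[(back, rounded)] if high else LOW[back]


print(suffix_vowel("ev", high=False))                       # e  (<eve>)
print(suffix_vowel("av", high=False))                       # a  (<ava>)
print(suffix_vowel("rol", high=False))                      # a  (what the spelling alone predicts)
print(suffix_vowel("rol", high=False, lateral_back=False))  # e  (the attested [rol-e])
```

The last two calls show why written input can mislead here: the information passed as `lateral_back` is available in the auditory signal but not in the spelling <l>, so a learner relying on the written form has only the canonical prediction to go on.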
Finally, Han and Kim (2017) investigated the influence of orthographic input on the L2 production of allophones of /h/ in Korean, finding a beneficial effect of orthography. In Korean, the phoneme /h/ undergoes a weakening process and can be produced as [h], [ɦ], [w], or deleted when it occurs between voiced sounds. Importantly, the Korean writing system, Hangul (an alphabetic syllabary that represents both phonemes and syllables), represents the variant pronunciations of /h/. In Hangul, when /h/ is realized as [h] or [ɦ], the written form contains <ㅎ>. When /h/ is deleted, <ㅇ> is used to represent the null consonant. When /h/ is realized as [w], the letter <ㅗ> occurs in onset position. Sixty native Mandarin-speaking learners of Korean (30 beginners, 30 advanced) were exposed to Korean non-words containing /h/ allophonic variants. Participants received the same auditory input, but their exposure to corresponding spellings varied. One group was exposed to /h/ variants represented by the h-letter, another by the null-letter, and the third group was exposed to auditory input only. Following exposure, participants completed two tasks, picture naming and spelling recall, in order to assess the impact of exposure to orthographic input on learners’ production and lexical storage of /h/ variants. Participants in the h-letter group produced more [h] variants than the null-letter and auditory only groups on the picture naming task, suggesting that orthographic input supported learners’ acquisition of L2 allophonic variants. On the spelling recall task, the h-letter group used <ㅎ> more often than the null-letter and auditory only groups, suggesting that orthographic input also impacted the lexical storage of phonological variants. Together, these findings suggest that the acquisition of L2 allophonic variants may be facilitated when learners have access to orthographic input that directly represents variant pronunciations of phonemes.
In summary, the studies reviewed in this section on the impact of orthography on the acquisition of phonological processes have differed in the phonological process under investigation, as well as the type and amount of prior experience the learner has with the target language, but have typically found that orthography hinders the acquisition of target-like surface variants in the L2 when the written form is misleading with respect to properties of the segments that undergo or trigger the process in question. However, more research is needed from other languages and phonological processes to build a more robust empirical base, and the potential tradeoff of orthographic input, which may mislead learners about surface forms while providing cues to underlying forms, needs to be investigated further and its implications for language learning and teaching examined. More research focusing on the acquisition of learners at various proficiency levels will also be important for determining the influence of orthographic input on the acquisition of phonological processes over time.
5. Orthography and the acquisition of L2 syllable structure
Beginning with Young-Scholten (1997) and Young-Scholten, Akita, and Cross (1999), several studies have investigated the role that orthography plays in the acquisition of L2 syllable structure, by examining learners’ production of nonnative consonant clusters. Young-Scholten et al. (1999) taught native English and native Japanese speakers to associate novel words containing Polish consonant clusters with pictures in two conditions: ‘Picture’ (participants heard the word and saw a picture indicating the word's meaning) and ‘Word’ (participants additionally saw each word written). Participants were then asked to (1) name the pictures (without written words) and then (2) read the words. These productions were then coded for the number of syllables produced beyond the target-like production, a measure of epenthesis. While the authors do not report inferential statistics, the descriptive analysis demonstrates that overall, the native Japanese speakers epenthesized more often than did the native English speakers, that both native language groups were more likely to epenthesize than to delete consonants in the ‘Word’ condition, and that all participants were more likely to epenthesize than to delete when the words’ written forms were available at test. The authors interpret this as evidence that orthography promotes epenthesis because it provides evidence for the consonantal content of the words.
Davidson (Reference Davidson2010) investigated the production of nonwords containing nonnative consonant clusters by adult native speakers of English (n = 23) and Catalan (n = 14). The stimuli consisted of nonwords containing initial CC clusters (e.g., [pkadi]) or initial CəC (e.g., [pəkadi]). Participants performed a word repetition task where the auditory word was presented twice and participants then repeated the word. Half of the words were also presented with their written forms (e.g., <pekadi> for English-speaking participants and <pakadi> for Catalan-speaking participants, following language-specific conventions for spelling schwa). Davidson (2010) found that participants from both language backgrounds were overall more accurate on words that were presented along with written forms, but that input modality (audio vs. audio plus text) interacted with properties of the consonant clusters; that is, the benefit associated with written forms in the input was moderated by phonological factors.
Zjakic (Reference Zjakic2017) investigated the effect of orthographic input on the learning of non-native consonant clusters in newly-learned words by native speakers of Australian English. Participants were taught word forms containing non-native consonant clusters (e.g., /mnok, mrok, dnok, dvok/) along with pictured meanings in two conditions: in the audio + orthographic condition, participants also saw the words’ written forms (e.g., <mnok, mrok, dnok, dvok>), and in the audio condition, they did not. They were then tested on their ability to determine whether an auditory word matched a picture; in the trials of interest, they needed to distinguish similar-sounding clusters (e.g., a picture of the /dvok/ and the auditory form [dnok]). Following the word-picture matching test, they also performed a phoneme deletion test where they were asked to delete either the first or the second sound in each word and type the remaining word. Participants in the audio + orthographic group were more accurate on both tests than were those in the audio group, suggesting a facilitative effect of orthographic input in this case of non-native consonant cluster learning.
Al Azmi (Reference Al Azmi2019) investigated the acquisition of English consonant clusters (word-initial and word-final two- and three-consonant clusters) by native speakers of northern Najdi Arabic. Sixty participants with no English language knowledge attended ten word-learning sessions over five weeks, under three different exposure conditions: 20 of the participants were not literate in their native language (the non-literate aural-only group), and of the 40 participants who were literate in Arabic, half were assigned to a literate aural-only condition and half to a group that was also exposed to English orthography. The word-learning sessions taught participants to associate auditory words with pictures (e.g., ‘plum’, ‘sky’), and the orthographic input group additionally saw the words’ written forms. At test, where participants were asked to produce the names for the pictures, the non-literate aural-only group did not differ from the literate aural-only group in accuracy, and the literate aural plus orthography group was significantly more accurate than the literate aural-only group on word-final two-consonant clusters only. Overall, non-target-like productions were more likely to involve epenthesis for participants who saw written forms during word learning, and deletion for participants who did not have access to written forms.
The handful of studies reviewed above investigated the impact of written input on the acquisition of L2 syllable structure by inexperienced learners, and they generally report a facilitative effect of orthographic input on the acquisition of nonnative consonant clusters. Additionally, the presence of orthographic input during word learning or at test promoted epenthesis, as opposed to deletion, as a repair strategy, suggesting that the availability of orthographic input supports the acquisition of the full phonemic content of words relative to auditory input alone. It is not yet known whether there are long-term consequences of these differential representations, though one might hypothesize that a complete representation of a word's consonants will support eventual target-like word production as a learner acquires the ability to perceive and produce nonnative consonant clusters, while a representation involving deletion (which these studies suggest may be more likely in the absence of orthographic input) will not. One might further hypothesize that the availability of consonant clusters in the orthographic input will enhance learners’ perception of consonant clusters, with positive effects across the lexicon and even for words that are acquired only through auditory input. To explore such hypotheses, future studies will need to look at the role of written input in the acquisition of syllable structure over time.
6. Orthography and the pronunciation and recognition of L2 words
The majority of studies of orthographic effects in L2 phonological development have considered the effects of written input on the acquisition and processing of L2 words, looking at both the auditory recognition of L2 words and their pronunciation. In this section we review these studies.
6.1 Effects of orthography on the pronunciation of familiar L2 words
In the following studies, researchers have investigated L2 learners’ pronunciation of L2 words that are already known to them (in the relevant literature, these are often referred to as ‘familiar’ words, though they may vary in their degree of familiarity to learners). Several studies have provided evidence that mismatches in grapheme-phoneme correspondences between the native and second languages (referred to here as incongruency) can interfere with learners’ production of L2 words.
Vokic (Reference Vokic2011) studied the production of the English flap by 15 native Spanish speakers living in the USA. The flap is common to Spanish and English (as a phoneme in Spanish and an allophone of /t/ and /d/ in English), and both Spanish and English employ the same alphabet. However, the two languages differ in how the flap is represented orthographically: <t>, <d>, <tt>, or <dd> in English and <r> in Spanish. The letters <t> and <d> map only to /t/ and /d/ in Spanish, while <tt> and <dd> do not occur in Spanish. The participants read English words containing the flap in carrier sentences, and their productions were transcribed. Vokic (Reference Vokic2011) found that when participants did not produce the target flap, they produced the segment according to Spanish grapheme-phoneme correspondence rules, and also that participants produced the flap more often in high-frequency than in low-frequency words. These findings suggest that participants experienced interference associated with their native Spanish grapheme-phoneme correspondence rules when reading aloud English words containing the flap, and also that such effects may be modulated by how often they have encountered individual words.
Nimz (Reference Nimz2011) examined the production of three pairs of German vowels (/iː - ɪ/, /aː - a/, /uː - ʊ/) by L1 Turkish learners of L2 German in order to determine whether German quantity and/or quality distinctions pose difficulty for Turkish learners. Twenty L1 Turkish high schoolers in an intensive German as a foreign language program and 20 native German-speaking controls completed a production task involving these six German vowels. Participants were presented with pictures depicting 18 items, each to be named three times, and the data from eight individuals from each group were randomly selected for acoustic analysis of vowel duration and vowel quality. In terms of vowel quality, the author reported that all the vowels produced by Turkish learners aside from /i:/ differed from those produced by native German speakers. With respect to duration, of the 18 words, only two (<glas> ‘glass’ and <fuß> ‘foot’) were produced with significantly different lengths by the two groups. Interestingly, these were the only two words containing long vowels that were not marked explicitly in the orthography either by ‘lengthening h’ or by the digraph <ie> for /i:/, suggesting that these learners may have picked up on orthographic markers of length, and that orthographic inconsistency in the case of <glas> and <fuß> posed difficulty for Turkish learners.
Silveira (Reference Silveira2012) investigated the orthography-induced transfer of native language phonological processes to L2 pronunciation. Thirty-one adult native speakers of Brazilian Portuguese living in the USA performed a sentence reading task. The task involved 75 sentences, each of which contained a target consonant-vowel-consonant (CVC) word where the final C was /m, n, or l/, in two spelling conditions: one where the word ended in a consonant letter (e.g., <sun>), and another where the word ended in a silent <e> (e.g., <bone>). Participants were more likely to insert an epenthetic vowel following the word-final consonant in the silent ‘e’ words than in the words ending in the consonant letter, but were more likely to exhibit L1 phonological processes (i.e., nasal vocalization or delateralization) when words were spelled with a final consonant letter. Silveira (Reference Silveira2012) further found that there was an effect of English proficiency on performance, with the rate of L1 processes decreasing with higher proficiency.
In a series of studies involving native Italian-speaking teenagers with several years of English language experience, Bassetti and colleagues have revealed persistent effects of incongruence between native and L2 grapheme-phoneme correspondences. Bassetti and Atkinson (Reference Bassetti and Atkinson2015) studied the effects of exposure to L2 orthographic forms on native Italian high schoolers with ten years of English experience. In the first study, they investigated 14 participants’ pronunciation of silent letters in word reading and word repetition tasks, with stimuli including, for example, [dɛt] <debt> and [klɑɪm] <climb>. In the word reading task, participants simply read written words aloud. In the word repetition task, participants both saw the written word and heard the word produced by a native speaker of English. Participants produced significantly more of the silent letters in the word reading than in the word repetition task (‘silent letter epenthesis’; e.g. [dɛbt] for <debt>), where they encountered target-like auditory input to counter the potential influence of exposure to the orthographic forms. In a second study, they investigated the effect of spelling on vowel duration, with the hypothesis that when English words are spelled with two adjacent vowel letters (e.g., <seen>) versus one (e.g., <scene>), the vowel may be produced with a longer duration. Fifteen native Italian-speaking teens performed a read-aloud task where they were presented with English words differing in whether they contained one (e.g., <more>) or two adjacent (e.g., <door>) vowel letters, and their productions were analysed for vowel duration. Words with double vowel letters elicited significantly longer vowel durations than those with single vowel letters. In a third study, they examined the effect of English written forms on the production of past tense and past participle affixes. 
In English, these affixes are both spelled <ed>, but may be realized variably as /t/, /d/, or /əd/, depending on the stem-final consonant. However, if Italian learners of English pronounce the vowel letter in the affix, they will be more likely to produce the affix as /əd/, regardless of the conditioning environment. Indeed, this is what Bassetti and Atkinson (Reference Bassetti and Atkinson2015) found. In a fourth study, they investigated the effect of orthography on native Italian teens’ production of English homophones such as <aloud, allowed> and <right, write> in word reading and word repetition tasks. They hypothesized that the different spellings of the homophonic pairs would cause participants to pronounce them as non-homophones. They found that participants produced the homophones as non-homophones on average 40% of the time (e.g., pronouncing <sun> as [sʌn] but <son> as [sɔn]), and further that they produced pairs as non-homophones significantly more often in the word reading than in the word repetition task. In sum, these four studies provide evidence for a range of orthographic effects on native Italian teens’ production of familiar English words.
Bassetti (Reference Bassetti2017) investigated the production of English words by native speakers of Italian. Of interest were English consonant phonemes that are spelled with either single or double letters (e.g., /p/ in <copy> or <floppy>; /k/ in <acute> or <accuse>). Because Italian has a contrast between short and long consonants, which is represented orthographically via single vs. double letters, Bassetti (Reference Bassetti2017) hypothesized that double consonant letters in English words may cause native Italian speakers to produce a spurious length contrast when producing English words. Thirty native Italian high school students with over ten years of English language experience (and a control group of native English speakers) read a list of English words in a carrier sentence, and their productions were analyzed for consonant duration. The native Italian participants produced a duration difference between consonants spelled with single vs. double letters, which the native English control participants did not. In a second experiment, 60 native Italian high school students were assigned to one of two tasks: delayed word repetition with written forms presented and delayed word repetition without written forms. Again, the native Italian participants produced longer durations for consonants that are spelled with double letters in English, regardless of whether they had immediate access to the words’ written forms. This suggests that their underlying representations for the words are affected by the transfer of L1 Italian grapheme-phoneme correspondences to L2 English.
In another study, also focused on the production of single vs. double consonant letters by native speakers of Italian, Bassetti, Sokolovic-Perovic, Mairano, and Cerni (Reference Bassetti, Sokolovic-Perovic, Mairano and Cerni2018) investigated the pronunciation of English homophones with different spellings (e.g., <finish> and <Finnish>). The study involved 30 native Italian high school students learning English as an L2, 30 adult late Italian-English bilinguals living in the UK, and a control group of 30 adult native speakers of British English. Participants were asked to read sentences containing target words, and the target words were of two types: C-CC pairs were homophone pairs that differed in the number of consonant letters (e.g., <bury> - <berry>) and V-VV pairs were those differing in the number of vowel letters (e.g., <sole> - <soul>). Duration ratios in participants’ pronunciations indicated that the native Italian participants produced longer consonants when the consonants were spelled with two graphemes, and that this effect was moderated by whether or not the long consonant was legal in the particular phonological environment in Italian. Further, the native Italian participants produced longer vowels when they were spelled with two vowel letters. There was no difference in performance overall between the teenage learners and the adult Italian-English bilinguals. Bassetti et al. (Reference Bassetti, Sokolovic-Perovic, Mairano and Cerni2018) conclude that English spelling conventions can cause Italian learners of English to produce homophone pairs differentially and according to the number of graphemes associated with individual segments.
However, orthography may also lead to mixed results, as demonstrated by Nimz (Reference Nimz2016) and Nimz and Khattab (Reference Nimz and Khattab2020). They investigated the effects of orthography on the production of German vowel length and quality by intermediate-advanced L1 Polish learners. Of particular interest was whether explicit orthographic cues to vowel length (i.e., ‘lengthening h’ following long vowels and double consonants following short vowels) help Polish learners of German realize more target-like length contrasts relative to long and short vowels that are unmarked. Eighteen native Polish learners of German and 20 native German-speaking controls performed a picture naming task involving 48 familiar and picturable German words containing the three long-short vowel pairs /e: - ɛ/, /a: - a/, /o: - ɔ/. Half of the words involved explicit markers of vowel length, whereas the other half did not. Acoustic measurements of vowel length and vowel quality were made and subjected to analysis. Nimz and Khattab (Reference Nimz and Khattab2020) found that when vowel length was explicitly represented by the orthographic forms of L2 German words, Polish learners produced a greater difference between short and long vowels than when cues to vowel length were not present in the orthographic representation of the German words. With respect to vowel quality, while the production of /a/ and /ɔ/ was native-like, and /ɛ/ and /a:/ were quite target-like (differing in F2 and F1 respectively), the quality of /o:/ and /e:/ was non-target-like. The Polish learners realized German /o:/ as /ɔ/ and German /e:/ as /ei/, which the authors explain as combined effects of L1 grapheme-phoneme correspondences and the perceived similarity between L1 and L2 vowels (i.e., <o> corresponds to /o:/ in German but /ɔ/ in Polish, whereas <e> corresponds to /ɛ/ in Polish, but German /e:/ is acoustically closer to /ɪ/).
The authors suggest that an interaction of auditory and orthographic input results in the production of a sound that is found in neither the L1 nor the L2 (see also, e.g., Rafat & Stevenson, Reference Rafat and Stevenson2019).
6.2 Effects of orthography on the recognition of familiar L2 words
In this section, we review studies that have provided evidence regarding an effect of orthography on the recognition of familiar L2 words. In a series of studies, Veivo, Järvikivi, and colleagues have investigated whether and under what conditions L1 and L2 orthography is activated during the auditory processing of French words by native speakers of Finnish. Veivo and Järvikivi (Reference Veivo and Järvikivi2013) examined whether early stages of L2 spoken word processing are subject to orthographic influences, as has been previously demonstrated for L1 spoken word processing (Grainger, Diependaele, Spinelli, Ferrand, & Farioli, Reference Grainger, Diependaele, Spinelli, Ferrand and Farioli2003). To this end, the authors conducted two lexical decision experiments using masked cross-modal priming to investigate the role of orthography in the lexical processing of 75 native Finnish-speaking late learners of L2 French. In a masked cross-modal priming task, orthography is present but masked from the conscious processing of the participants, making it possible to study the automatic activation of orthography during spoken word processing. In experiment 1, the learners provided lexical decisions on 58 French word and 58 nonword targets presented in one of three conditions with respect to orthographically presented French primes: repetition (French orthographic equivalents of the French words presented auditorily), pseudohomophones (French nonwords which could be pronounced like the target word following French orthographic conventions), and control (legal French nonwords with no form overlap with targets). The authors hypothesized that if orthographic activation can result in facilitation in L2 spoken word processing, a repetition effect (faster response times to targets, and reduced error rates) should be observed when targets are preceded by orthographically equivalent visual primes. This is indeed what they found.
In experiment 2, Veivo and Järvikivi (Reference Veivo and Järvikivi2013) examined between-language orthographic influences using L1-based primes and L2 French targets. French auditory targets were associated with three Finnish-based visual prime conditions: orthographic onset overlap (semantically and phonologically unrelated Finnish words that overlapped in the first three letters with the French target), Finnish pseudohomophones (legal Finnish nonwords that overlapped orthographically and phonologically with the French target), and unrelated controls (real Finnish words with no semantic, orthographic, or phonological overlap with the target). Interestingly, different patterns of effects were observed depending on L2 proficiency and prime condition. Facilitation (faster reaction times) was observed for French targets that were primed by L1 Finnish words that overlapped orthographically in the high proficiency group but not the lower intermediate group. By contrast, a pseudohomophone facilitation effect was observed for the lower intermediate but not the high proficiency group, suggesting that orthography influences L2 spoken word processing, but that its effect is modulated by learners’ L2 proficiency.
Veivo, Järvikivi, Porretta, and Hyona (Reference Veivo, Järvikivi, Porretta and Hyona2016) further examined the role of orthographic information in L2 spoken word recognition by L1 Finnish late learners of French and its relation to learners’ proficiency. Sixty-four native Finnish-speaking learners of French differing in their L2 proficiency and 24 native French-speaking controls completed two visual world experiments in which they were to match French spoken word forms to French written word forms while their eye movements were recorded. Of particular interest was how phonological and orthographic overlap between target and competitor influenced the matching process and the time-course of activation, and whether the L2 proficiency of the learner played any role. In experiment 1, the degree of phonological overlap between competitors and targets was manipulated, and orthographic overlap was held constant. The printed forms of target words were displayed together with competitors with either a higher degree of phonological overlap or a lower degree of phonological overlap (e.g., target <base> [ba:z] was paired with either <bague> [bag] or <bain> [bɛ̃]). L2 participants, but not L1 participants, were affected by the degree of phonological overlap between target and competitor. Moreover, phonological overlap also had an effect on the time-course of processing. When there was greater phonological overlap, L2 comprehenders took longer to identify the target than when there was less phonological overlap, regardless of their proficiency level. In experiment 2, the authors investigated the role of orthographic information in the matching process. They held phonological overlap between target and competitors constant, and manipulated orthographic overlap (e.g., target <mince> [mɛ̃s] was paired with either <mite> [mit] or <mythe> [mit]). They observed no effect of the orthographic overlap condition on looking patterns for either the L1 or L2 participants.
However, an analysis of the time-course of recognition for the L2 groups revealed that the effect of overlap over time varied with proficiency. In particular, competitors with greater orthographic overlap were more disruptive for high proficiency learners than for low proficiency learners, suggesting that orthographic information may play a different role in L2 spoken word recognition depending on the comprehender's L2 proficiency.
Veivo, Porretta, Hyona, and Järvikivi (Reference Veivo, Porretta, Hyona and Järvikivi2018) conducted two visual world eye-tracking experiments to further examine the time-course of orthographic information in spoken word recognition by 64 L1 Finnish learners of L2 French of various proficiency levels. Building on Veivo et al. (Reference Veivo, Järvikivi, Porretta and Hyona2016), they considered the two types of within-language L2 competitors in the same experiment and additionally investigated the activation of between-language L1 competitors in order to evaluate the activation of L1 orthography in L2 spoken word recognition. As in Veivo et al. (Reference Veivo, Järvikivi, Porretta and Hyona2016), participants matched spoken French words to their written forms displayed in print. In experiment 1, French targets (e.g., <cidre> [sidʀ] ‘cider’) were presented with French onset competitors with either high phonological and low orthographic overlap (e.g., <cycle> [sikl] ‘cycle’) or low phonological and high orthographic overlap (e.g., <cintre> [sɛ̃tʀ] ‘hanger’). An analysis of where participants looked as a function of time, proficiency, and overlap condition revealed that more proficient L2 learners looked to targets faster than less proficient learners. Moreover, the effect was more pronounced for lower proficiency learners. However, unexpectedly, looks to target did not differ as a function of competitor type. In experiment 2, French targets (e.g., <paume> [pom] ‘palm’) were presented with Finnish onset competitors with either high phonological and low orthographic overlap (e.g., <pommi> [po:mi] ‘bomb’) or low phonological and high orthographic overlap (e.g., <pauhu> [pauhu] ‘roar’). The authors report that orthographically similar L1 words delayed recognition more than phonologically similar L1 words. Moreover, in the presence of L1 orthographic competitors, higher proficiency learners looked more to the target than lower proficiency learners.
Together, these findings suggest that L2 spoken word recognition is influenced by L1 competitors that overlap in phonological and orthographic onset, but that the effect is modulated by L2 proficiency.
Other studies have provided evidence for the effect of incongruency on learners’ auditory word recognition. Shea (Reference Shea2017) studied the processing of Spanish words by native English-speaking learners of Spanish. In Spanish, the graphemes <b, d, g> map to stops (as in English), but also to approximants in some phonological environments (e.g., <b> in a[β]uela ‘grandmother’). Shea (Reference Shea2017) predicted that native English-speaking learners of Spanish would activate the stop but not the approximant allophone when presented with Spanish written forms. Forty-two adult native speakers of English studying Spanish participated in a lexical decision task involving cross-modal and within-modal priming. In cross-modal (written-auditory) trials, participants saw a written word prime, then heard an auditory word target, and were asked whether the target was a real Spanish word. In the ‘approximant matched’ trials, the auditory forms exhibited the Spanish pattern of intervocalic lenition (e.g., <CABELLO> ‘hair’ [kaβeʝo]). However, in ‘stop matched’ trials, the auditory form did not conform to the Spanish intervocalic lenition rule, but rather mapped to a stop as in English pronunciation (e.g., <CABEZA> ‘head’ [kabesa]). In the within-modal trials, the prime and the target were both auditory forms (e.g., approximant matched: [naða] - [naða] ‘nothing’; stop matched: [ʝega] - [ʝega] ‘arrives’). Response times indicated an interaction between mode (cross- or within-modal) and trial type, with faster responses on stop matched trials in the cross-modal condition, suggesting that in the presence of written forms, the participants activated the (non-target-like) stop variant. In a second experiment, involving long-term repetition priming, the same participants performed within-modal (auditory) priming, this time with more than 5 minutes elapsing between presentation of the prime and the target.
In this experiment, the learners exhibited priming by both the stop and approximant variants; however, a control group of native Spanish speakers exhibited priming only by the approximant (i.e., native-like) variants. Shea (Reference Shea2017) concludes that familiar graphemes will activate an L1 variant more strongly than a (different) L2 variant, but that the effect may be attenuated when the prime is auditory and there is a longer delay between the prime and the target.
Dean and Valdes Kroff (Reference Dean and Valdes Kroff2017) examined the processing of Spanish words by 18 English-dominant English-Spanish bilinguals and 13 Spanish-dominant bilinguals using a visual world paradigm and eye tracking. Participants heard Spanish words in a neutral sentence context (e.g., El detective busca su __ ‘The detective is looking for his __’) and were asked to click on a picture matching the target word. In the trials of interest, the target word (e.g., banco ‘bench’) began with a phoneme (in this case, /b/) that was also a potential mapping for the initial consonant letter in a distractor picture (e.g., vaca ‘cow’). They found that the Spanish-dominant but not the English-dominant bilinguals exhibited the effects of lexical competition in these trials. The authors interpret this finding as indicating that the differential mappings between orthography and phonology in Spanish and English were responsible for this effect—because the letter <v> also maps to the phoneme /b/ in Spanish but not in English, Spanish-dominant bilinguals were more likely to experience lexical competition between words spelled with initial <b> and <v>. In a second type of target trial, a target beginning with the phoneme /x/ and the letter <j> (e.g., jabón ‘soap’) was presented with a competitor beginning with the letter <h> (which is silent) (e.g., hacha ‘axe’). The expectation was that for these trials, the English-dominant bilinguals would exhibit the effects of lexical competition, given that between their two languages the two letters map to similar phonemes (i.e., <j> - /x/ in Spanish and <h> - /h/ in English). However, this prediction was not borne out; instead, as with the trials involving <b> and <v> words, only the Spanish-dominant bilinguals exhibited the competition effect. The authors conclude that differences in difficulty of the task for the two language groups may be responsible for this finding, and that more work on this topic is needed.
Other studies have reported no benefit of orthography on learners’ auditory word recognition. Simonchyk and Darcy (Reference Simonchyk and Darcy2017) and Simonchyk and Darcy (Reference Simonchyk and Darcy2018) examined the effect of orthography on native English speakers’ ability to learn Russian words involving palatalization contrasts. Twenty native English speakers enrolled in intermediate-level Russian language classes and 20 advanced-level learners participated in tasks involving a set of familiar Russian words. The words consisted of plain/palatalized pairs (e.g., <салат> [saˈlat] ‘salad’ - <спать> [spatʲ] ‘to sleep’). In the written picture naming task, participants saw a picture and were asked to spell the word represented by the picture on a sheet of paper. In the auditory word-picture matching task, they were asked to indicate whether or not a picture and an auditory word were accurately matched; in target trials, the auditory form differed from the correct label only in palatalization to form Russian nonwords (e.g., *[saˈlatʲ] and *[spat]). In a metalinguistic task, participants were asked to circle all of the palatalized letters in their responses from the written picture naming task. They found that while learners in both groups were quite accurate at spelling the words, their performance on the metalinguistic task—where they were asked to circle the palatalized sounds—was somewhat less accurate. In the auditory word-picture matching task, participants in both groups frequently (incorrectly) accepted the nonwords as matching the pictures, suggesting that accurate knowledge of words’ written forms does not necessarily lead to accurate auditory processing of the palatalization contrast for these learners.
Nimz and Khattab (Reference Nimz and Khattab2015) and Nimz (Reference Nimz2016) examined the phono-lexical representation of German vowel length by L1 Polish learners on a judgment task. Twenty participants were presented with images depicting 24 familiar German words involving /e:/, /a:/, /o:/ and then were presented with auditory forms that either matched the intended German word or mismatched it in length, quality, or both (e.g., <Mehl> /me:l/ ‘flour’ was paired with [mel], [mɛ:l], or [mɛl], respectively). Half of the German words were ones in which the written forms of the words were explicitly marked for their length through so-called ‘lengthening h’, whereas the other half were not. Unexpectedly, despite difficulty perceiving length differences on a discrimination task involving nonwords (experiment 1), the Polish learners were more accurate at detecting incorrect renditions on the basis of their mismatching length than on the basis of their mismatching quality. Moreover, there was no difference in performance depending on whether the words were explicitly marked for length by the German orthography or not.
Together, these studies on the influence of orthography on the recognition of familiar L2 words suggest that L1 and L2 orthographic information is activated during spoken word recognition in the L2, though the role of orthography appears to be modulated by the proficiency of the learner. Moreover, orthographic information across these studies typically resulted in interference effects or no beneficial effect.
So far, we have discussed the pronunciation (6.1) and recognition (6.2) of words that are already known to learners. A primary advantage of studies focused on the processing of familiar words by experienced learners lies in their ecological validity. That is, these studies examine how actual words acquired by actual L2 learners are processed. However, a disadvantage of these studies is that we typically know little if anything about the circumstances under which these words were learned, in particular the timing of learners’ exposure to the auditory and/or written forms, and we thus do not know the roles that the two types of input played in the development of the words’ representations. To address the question of how orthographic input affects the acquisition of words, other studies have sacrificed the ecological validity of the above studies in favor of highly controlled word exposure conditions.
6.3 Effects of orthographic input on the pronunciation of newly-learned L2 words
Now we turn to studies where participants have been taught a new set of words during the study period, for the purpose of being able to manipulate the information available to learners while learning the words. These studies have immediate relevance to language teaching in that they shed light on the impact of orthographic input on learners’ memory for new words’ phonological forms. We begin with a discussion of studies that are designed to test the utility of orthographic input in supporting learners’ pronunciation of novel phonemes.
Steele (Reference Steele2005) demonstrated that access to written forms during word learning can influence L2 learners’ knowledge of the segmental structure of words. Mandarin learners of French exhibit difficulty with French /ʁ/ in clusters. In addition, /ʁ/ is similar to Mandarin aspiration in some contexts. Together, these observations suggest that native Mandarin speakers may perceive the /ʁ/ in stop-/ʁ/ clusters as aspiration rather than as a separate segment. Native Mandarin learners of French were taught French words containing stop-/ʁ/ clusters. During training, the Aural group heard the words and saw pictures representing the words’ meanings, while the Aural + Ortho group additionally saw the words’ spellings. At test, where participants were asked to name the pictures, the Aural + Ortho group was less likely to delete the /ʁ/, exhibited shorter aspiration of the stop consonant, and produced longer /ʁ/ than participants in the Aural group, suggesting that access to the words’ written forms supported the acquisition of the words’ segmental structure.
Rafat (Reference Rafat2015) investigated the pronunciation of Spanish assibilated/fricative rhotics by native English speakers who had not learned Spanish. During a word exposure phase, participants heard Spanish words containing the assibilated/fricative rhotic in word-final position (e.g., <ahumar> [aumar̆]) and saw the words’ pictured meanings (although the words were real Spanish words, they were assigned new, readily picturable meanings). Participants in the auditory-orthographic group additionally saw the words’ written forms. At test, participants were asked to name the pictures. Rafat (Reference Rafat2015) found that participants in the auditory-only group were more likely to produce the word-final rhotics as fricatives, while those in the auditory-orthography group were more likely to produce them as rhotics (e.g., [r̆, ɹ]), suggesting that the availability of the written forms (specifically the letter <r>) influenced participants’ interpretation of the target segments as rhotics. Rafat (Reference Rafat2015) further found that the acoustics of the auditory forms encountered during training modulated the orthographic input effect: participants in the auditory-orthography group were more likely to produce the assibilated/fricative rhotic for words whose trained forms involved higher center of gravity values. Together, these findings demonstrate an effect of both the auditory properties of the input and the availability of written forms in the input on native English speakers’ acquisition of Spanish assibilated/fricative rhotics. From these two studies, we conclude that orthographic input can substantially affect the pronunciation of L2 words.
6.4 Effects of orthographic input on the recognition of newly-learned L2 words
Now we turn to studies of the impact of orthography on learners’ auditory processing of newly-learned words containing novel phonological contrasts, beginning with several studies that have shown facilitative effects of orthographic input. Escudero, Hayes-Harb, and Mitterer (Reference Escudero, Hayes-Harb and Mitterer2008) investigated the effect of orthographic input on the acquisition of the English /æ - ɛ/ contrast by native speakers of Dutch who were also proficient in English. Participants learned an English-like mini lexicon with words containing /æ/ or /ɛ/, a contrast that is known to be difficult for Dutch learners of English (Weber & Cutler, Reference Weber and Cutler2004). They were exposed to the auditory forms of the words (produced by a native speaker of British English) along with their pictured meanings (indicated by line drawings of non-objects). Half of the participants were additionally shown written forms of the words (e.g., [tenzə] was paired with the written <tenzer> and [tændək] with the written <tandek>). At test, the two groups of participants showed different patterns of lexical activation, as demonstrated by their looking patterns. While the group that was not exposed to the words’ written forms showed a symmetrical confusion (i.e., they looked to pictures containing both /æ/ and /ɛ/ in response to auditory words containing [æ] and [ɛ]), the group exposed to written forms did not (i.e., while auditory words containing [æ] triggered looks to pictures containing both /æ/ and /ɛ/, auditory words containing [ɛ] triggered looks only to pictures containing /ɛ/). The finding suggests that the systematic graphemic contrast between <a> and <e> in the orthographic input supported the participants’ encoding of the difficult L2 phonological contrast. Importantly, the graphemes <a> and <e> were familiar to Dutch learners of English and signaled a contrast in both Dutch and English (albeit a different one).
Showalter (Reference Showalter2012) and Showalter and Hayes-Harb (Reference Showalter and Hayes-Harb2013) demonstrated that even unfamiliar orthographic information can facilitate the acquisition of novel phonological contrasts. They examined the influence of familiar graphemes (Pinyin) with unfamiliar tone mark diacritics on the acquisition of Mandarin lexical tone contrasts by adult native speakers of English. In the first experiment, they taught 26 participants two Mandarin-like minimal quadruplets differentiated by four lexical tones. Participants in the No Tone Marks group were exposed to the auditory forms and pictured meanings, while those in the Tone Marks group additionally saw the words written in Pinyin (e.g., <fiān, fián, fiăn, fiàn>). At test, participants determined whether auditory words and pictures were correctly matched. On target items, where the picture was associated with a different tone than the auditory form (e.g., a picture of what they had learned was gi-tone1 and the auditory form [gi-tone4]), participants in the Tone Marks condition significantly outperformed those in the No Tone Marks condition. In a second experiment, Showalter (Reference Showalter2012) and Showalter and Hayes-Harb (Reference Showalter and Hayes-Harb2013) demonstrated, with a new group of participants and the same training conditions and protocol, that Tone Marks participants were able to learn to associate the tone marks with the tones to some extent (while No Tone Marks participants were not), suggesting that the tone marks’ iconicity was not solely responsible for the facilitatory effect of tone marks in the first experiment.
Escudero, Simon, and Mulak (Reference Escudero, Simon and Mulak2014) hypothesized that the effect of exposure to written forms would depend on the relationship between grapheme-to-phoneme correspondences across the learners’ languages. In particular, exposure to orthographic input during training was expected to help word recognition when L1 and L2 grapheme-phoneme correspondences were congruent, but hinder it when grapheme-phoneme correspondences were incongruent. In their study, 73 Spanish listeners (30 naive to Dutch and 43 Dutch learners of various proficiency levels) learned Dutch pseudowords containing one of six Dutch vowels /i, ɪ, a, ɑ, y, ʏ/ and their corresponding ‘meanings.’ Half of the participants were exposed to auditory forms only, while the other half additionally saw written forms. At test, participants identified the image that corresponded to the auditory form they heard on a given trial. The phonological form of Dutch target and distractor pairs differed in either one segment (minimal pair) or more than one segment (non-minimal pair). Minimal pairs involved contrasts that were expected to be either perceptually easy or difficult for Spanish listeners. Additionally, difficult minimal pairs were categorized as involving either congruent or incongruent grapheme-phoneme correspondences. As expected, exposure to words’ written forms indeed impacted performance, but its impact depended on the congruence of L1 and L2 grapheme-phoneme correspondences. When difficult novel contrasts were represented by familiar graphemes that likewise signal a contrast in the L1, exposure to orthographic forms during training aided performance. Performance was hindered when orthographic input filtered by L1 orthographic conventions failed to preserve the contrast.
Building on the Escudero et al. (Reference Escudero, Simon and Mulak2014) study, Escudero (Reference Escudero2015) further examined the role of orthographic input by comparing the Spanish-speaking listeners’ data reported in Escudero et al. (Reference Escudero, Simon and Mulak2014) to data from native Australian English speakers with no prior experience with Dutch. Given that English has some of the contrasts that are present in Dutch, but absent in Spanish, the new learner population was expected to differ from the Spanish learners in the degree of difficulty they experience with some of the relevant Dutch contrasts. Moreover, because Spanish has a transparent orthography and English a relatively opaque one, orthographic information was also expected to impact the two populations differently. 78 Australian English speakers (43 monolingual, 35 bilingual or multilingual) with no prior experience with Dutch performed the word learning task involving the same materials as Escudero et al. (Reference Escudero, Simon and Mulak2014). Out of all 66 word pairs presented to learners (51 non-minimal and 15 minimal), an effect of orthography was observed for only two of the perceptually difficult minimal pairs (i.e., /ɪ - y/ and /ɪ - ʏ/), which were relatively easy to discriminate, suggesting that orthography may serve to enhance differences that listeners already perceive with some accuracy. Unexpectedly, no effect of language background on the influence of orthography was observed. That is, orthography had a comparable effect on English and Spanish listeners despite differences in the opacity of their L1 orthographic systems.
While the studies just discussed all indicate a facilitative effect of orthographic input, Bürki, Welby, Clement, and Spinelli (Reference Bürki, Welby, Clement and Spinelli2019) point to a more nuanced effect. They examined the acquisition of English pseudowords by native speakers of French. In their study, 26 native speakers of French who were L2 speakers of English were taught to associate the auditory pseudowords with pictured meanings. Half of the words were presented in the audio-only condition, and the other half in the audio-ortho condition, where participants also saw words’ spellings. Participants were subsequently tested on their memory of the words in a picture naming task. Naming performance was more accurate and response latencies shorter for words in the audio-ortho condition. However, acoustic analyses of the productions indicated that words in the audio-ortho condition were produced with more French-like (and less target English-like) vowel formants, suggesting that while orthographic input may facilitate aspects of word learning, it may in fact interfere with the development of target-like pronunciation.
There are also a number of studies demonstrating no beneficial effect or mixed effects of orthographic input on learners’ acquisition of novel L2 phonological contrasts. Shepperd (Reference Shepperd2013) examined the influence of orthographic input on native English speakers’ acquisition of words involving the Zulu click contrasts /ǀʰ - gǀ/ and /g! - gǁ/. 26 native speakers of English with no prior Zulu language experience participated in discrimination and word learning tasks. In the first task, an AXB discrimination task, participants were tested on their ability to distinguish the click contrasts. They exhibited 73% and 72% accuracy on the /ǀʰ - gǀ/ and /g! - gǁ/ contrasts, respectively. Participants then learned a set of auditory words containing the target sounds along with pictured meanings (e.g., /ǀʰu:la/ ‘bath’ and /gǀu:la/ ‘fish’). For half of the words, participants also saw the words’ written forms (e.g., <CULA> and <GULA>). Participants were then asked to determine whether auditory words and pictures matched in immediate and delayed (two days later) tests. Participants exhibited no overall benefit of orthographic input for the words involving the novel contrast, though there was a significant positive impact of orthographic input on participants’ performance on control words involving the familiar /p - b/ and /v - z/ contrasts.
While Showalter and Hayes-Harb (Reference Showalter and Hayes-Harb2013; discussed earlier) demonstrated a facilitative effect of unfamiliar diacritics on native English speakers’ acquisition of Mandarin lexical tone contrasts, Hayes-Harb and Hacking (Reference Hayes-Harb and Hacking2015) found that unfamiliar diacritics did not facilitate native English speakers’ acquisition of Russian lexical stress. 44 naive native English speakers (Inexperienced Learners) and 29 native English-speaking learners of Russian (Experienced Learners) learned Russian nonwords forming six lexical stress minimal pairs (e.g., <сýба - субá> [ˈsuba - suˈba]). The Inexperienced Learners were assigned to one of four word learning conditions: Latin-Stress (Latin script with stress marks), Latin-NoStress, Cyrillic-Stress, and Cyrillic-NoStress. The Experienced Learners were assigned only to the two Cyrillic conditions. During a word learning phase, participants heard a word (words were produced by two native speakers of Russian), saw a picture indicating the word's meaning, and saw the word's written form in accordance with their word learning condition. At test, they were asked to determine whether an auditory word and a pictured meaning were correctly matched. Performance by the Inexperienced Learners in all word learning conditions was effectively at chance, and while the Experienced Learners exhibited better performance overall, there was no significant difference in performance between those in the Cyrillic-Stress and Cyrillic-NoStress conditions. In a follow-up experiment involving productions by only one of the talkers in an attempt to simplify the task, there was no improvement in overall performance. Thus, lexical stress poses a substantial challenge for native English speakers learning Russian, which is not readily overcome by the availability of lexical stress marks in the orthographic input.
Two studies, Showalter and Hayes-Harb (Reference Showalter and Hayes-Harb2015) and Jackson (Reference Jackson2016), demonstrate the limits of even systematic and transparent orthographic input on the acquisition of novel contrasts. Showalter and Hayes-Harb (Reference Showalter and Hayes-Harb2015) were interested in whether a systematic and transparent, but unfamiliar, writing system could support the acquisition of a novel lexical contrast. They taught 30 adult native speakers of English a set of Arabic nonwords comprising six /k - q/ minimal pairs (e.g., [kubu], [qubu]), produced by two native speakers of Jordanian Arabic, along with pictured meanings. Participants were assigned to one of two conditions: Arabic Script and Control. During a word learning phase, participants in the Control condition heard a word and saw a line drawing indicating the word's ‘meaning’. In the Arabic Script condition, participants also saw the word written in Arabic (e.g., <كوبو> [kubu], <قوبو> [qubu]). At test, participants were asked whether auditory words and pictures were matched. There was no significant difference in accuracy between the groups. Hypothesizing that the unfamiliarity of the graphemes, in addition to the script direction difference between English and Arabic, may have contributed to this null result, they conducted a second experiment wherein they repeated the Arabic Script condition, and further provided explicit instruction about the directionality of Arabic, with no improvement in test performance. Next, hypothesizing that the Arabic script's unfamiliarity may undermine the potential benefit of orthographic input, they used the Roman alphabet (e.g., with <kubu> and <qubu>) in a third experiment, but still found no significant beneficial effect of orthographic input.
In a fourth experiment, hypothesizing that having multiple talkers produce the auditory stimuli may be interfering with participants’ ability to detect the /k - q/ contrast, they reduced the number of talkers who produced the stimuli to one, but still saw no improvement in performance at test. The authors concluded that the auditory /k - q/ contrast may be sufficiently difficult for native English speakers to perceive that the difficulty overrides any potential facilitatory effect of orthographic input. In a follow-up to Showalter and Hayes-Harb (Reference Showalter and Hayes-Harb2015), Jackson (Reference Jackson2016) made further attempts to leverage orthographic input to support native English speakers’ ability to distinguish Arabic /k - q/ minimal pairs. This study involved the same auditory stimuli, pictures, and protocol as Showalter and Hayes-Harb (Reference Showalter and Hayes-Harb2015), but differed both in the written forms presented to participants and in that half of the participants were given a brief instruction session prior to learning the words. 52 participants were assigned to four word learning conditions: Diacritic-WithoutInstruction (the target contrast was spelled with a diacritic indicating the difference between /k/ <kubu> and /q/ <ḳubu>, no instruction provided), NovelGrapheme-WithoutInstruction (/k/ <kubu> and /q/ <Ꮧubu>), Diacritic-Instruction (word learning was preceded by a brief instruction session where participants’ attention was drawn to the novel /q/ phoneme and to the fact that it was written with an unfamiliar letter), and NovelGrapheme-Instruction. Participants in the novel grapheme conditions performed more accurately at test than those in the diacritic conditions, but the effect of instruction was only significant for participants in the diacritic conditions, with instruction leading to more accurate performance.
Jackson (Reference Jackson2016) concludes that overall, while the novel grapheme was more supportive of participants learning the /k - q/ lexical contrast, the reduced efficacy of the diacritic was partially offset by explicit instruction.
In a departure from the typical focus in this literature on the acquisition of novel contrasts, Yang (Reference Yang2015) examined the influence of orthographic input on the acquisition of vowel and consonant free variation. Of particular interest was whether exposure to written input during word learning would help L2 learners to link free variants to a single lexical entry. In experiments 1 and 2, 20 Mandarin native speakers from Taiwan (logographic L1) and 20 native speakers of American English (alphabetic L1) learned a set of new words together with pictured meanings from an artificial language, some of which involved the vowels [ɔ] and [u] in free variation (e.g., [fɔsat] and [fusat] were both possible pronunciations of the word ‘shark’ spelled <fosat>). Yang (Reference Yang2015) hypothesized that exposure to L2 orthography during word learning would help learners link two free variants to a single lexical entry and that Chinese learners may benefit more than English speakers as their L1 and the target language do not share the same orthography. Unexpectedly, no effect of orthography was observed in the word-picture matching task (experiment 1). However, exposure to orthographic input during word learning did help participants score higher on the picture-naming task (experiment 2). Additionally, English learners produced more new (untrained) forms of free variants when written input was available during word learning. Experiments 3 and 4 used the same word learning paradigm and tests to examine the influence of orthographic input in the acquisition of free variation involving consonants (e.g., [p]-[b] or [t]-[d]) by 20 Taiwanese and 21 American English participants. For the consonant alternations, learners benefited from exposure to orthographic input on a word-picture matching task (experiment 3) when free variation involved [p]-[b], but not when it involved [t]-[d].
Unlike experiment 2, the picture-naming results for the consonant free variation (experiment 4) revealed no clear benefit of orthographic exposure. Yang (Reference Yang2015) concluded that the results, while overall stronger for production than recognition, were mixed for the influence of orthographic input on the acquisition of L2 free variants in new words.
Finally, several studies have demonstrated interfering effects of exposure to orthographic input during novel word learning. Erdener and Burnham (Reference Erdener and Burnham2005) investigated the effects of audiovisual speech cues and written input on the pronunciation of non-native speech sounds in new words. In their study, native Turkish (a transparent L1) and native Australian English (an opaque L1) speakers with no prior experience with Spanish or Irish produced Spanish (transparent) and Irish (opaque) nonwords following exposure to them in four audiovisual conditions: auditory-only, auditory-visual, auditory-orthographic, and auditory-visual-orthographic. They were particularly interested in the impact of audiovisual speech cues depending on the transparency of a learner's native language orthographic system. While orthographic input was found to facilitate the production of non-native speech sounds overall, orthographic input interacted with group such that for speakers of Turkish (transparent L1), orthography was beneficial for transparent Spanish, but increased error rates for opaque Irish. For speakers of Australian English (opaque L1), performance on Spanish and Irish varied little. Of particular relevance here, orthographically induced phonological transfer was also observed for both groups, and resulted in substantially increased error rates (0% to 46%) for Turkish speakers for Spanish nonwords containing <j>, which was pronounced with the L1 Turkish phoneme /ʒ/ rather than /x/.
Hayes-Harb, Nicol, and Barker (Reference Hayes-Harb, Nicol and Barker2010) demonstrated an interfering effect of incongruency on the acquisition of a set of English-like nonwords by native English speakers in three word learning conditions: congruent orthography, incongruent/congruent orthography, or auditory only. The auditory only group heard the words and saw pictures indicating their meanings. The congruent group additionally saw written forms, all of which were congruent with English spelling conventions (e.g., [fɑməg] <famog>, [faʃə] <fasha>). The incongruent/congruent group also saw written forms, many of which were not congruent with English spelling conventions. These incongruent items were of two types: incongruent-extra-letter (e.g., [kɑməd] <kamand>) and incongruent-wrong-letter (e.g., [faʃə] <faza>). They found an effect of word learning condition on the incongruent-wrong-letter items, with participants in the incongruent/congruent group more likely to accept the incorrect pronunciation [fazə] than participants in the other two groups. In this way, Hayes-Harb et al. (Reference Hayes-Harb, Nicol and Barker2010) demonstrated that incongruencies between the spelling conventions of the native language and an L2 can cause learners to misremember the phonological forms of words.
Rafat (Reference Rafat2016) investigated the factors that contribute to orthographically-induced phonological transfer in the L2, focusing on the influence of grapheme-phoneme correspondences and various conditions of training and test. Forty native English-speaking participants with no prior experience with Spanish completed a picture-naming task involving Spanish words containing familiar sounds that are written with either the same grapheme (e.g., <m> - [m], <n> - [n], <b> - [b], <d> - [d], <h> - [∅]VCV, <s> - [s]) or a different grapheme (e.g., <v> - [b], <ll> - [ʝ], <d> - [ð̞], <z> - [s], and <h> - [∅]) in English (i.e., congruent or incongruent grapheme-phoneme correspondences, respectively). Participants were randomly assigned to one of four conditions of training and test: ‘orthography during training and test’, ‘orthography during training’, ‘orthography during test’, and ‘auditory only’. The author hypothesized that orthographic input would promote L1-based phonological transfer and that incongruent grapheme-phoneme correspondences would result in transfer and lead to non-target-like productions, whereas congruent grapheme-phoneme correspondences would not. In line with these hypotheses, Rafat (Reference Rafat2016) reported greater proportions of non-target-like productions that could be attributed to orthographically-induced transfer for the three orthographic conditions relative to the auditory-only condition. Moreover, the proportion of non-target-like productions was found to differ between the orthography conditions, with the ‘orthography during training and test’ and ‘orthography during training’ conditions producing more errors than the ‘orthography during test’ condition. As predicted, grapheme-phoneme correspondences that differed in Spanish and English resulted in orthography-induced transfer effects in L2 production for participants in the orthographic conditions, whereas grapheme-phoneme correspondences that were the same in Spanish and English did not.
Building on the finding that incongruent grapheme-phoneme correspondences in Spanish and English can lead to orthography-induced transfer, Rafat and Stevenson (Reference Rafat and Stevenson2019) investigated whether simultaneous exposure to auditory and orthographic input can result in McGurk-like effects in L2 production. The McGurk effect (McGurk & MacDonald, Reference McGurk and MacDonald1976) is a perceptual illusion that is elicited when auditory and visual cues simultaneously presented to listeners provide conflicting information (e.g., auditory /ba/ is paired with facial/visual /ga/) and result in an integrated percept (a combination /bga/ or a fusion /da/) that is not contained in either the auditory or visual information. Participants were randomly assigned to one of four conditions of training and test: ‘orthography during training and test’, ‘orthography during training’, ‘orthography during test’, and ‘auditory only’. Using a picture naming task, the authors elicited productions of Spanish words involving various grapheme-phoneme correspondences that are incongruent between Spanish and English (<v> - [b], <d> - [ð̞], <z> - [s], <ll> - [ʝ]). Whereas orthography-induced phonological transfer was predominant in the three orthographic conditions for <v> - [b] and <d> - [ð̞], McGurk-like effects were observed only for <z> - [s] and <ll> - [ʝ]. Combination productions (e.g., [lj]) were observed for the digraph <ll>, which is pronounced [ʝ] in the Spanish auditory input but maps to [l] in L1 English. Fusions were also observed for <z> - [s], with [z̥] productions of Spanish [s] in word-initial position. These findings suggest that in addition to well-attested orthographically-induced transfer effects, the simultaneous presentation of auditory and orthographic input may also result in McGurk-like effects and lead to other types of non-target-like productions.
Building on the emerging understanding that orthographic input effects may be moderated by graphemic familiarity and L1-L2 congruence, Showalter (Reference Showalter2018a) and Showalter (Reference Showalter2018b) investigated the effects of familiarity with L2 graphemes and congruence between the L1 and L2 grapheme-phoneme correspondences on naive native English speakers’ ability to learn the phonological forms of Russian-like words. The pairing of L1 English and L2 Russian provides an opportunity to study these two variables simultaneously because of the presence of familiar (e.g., <H>) and unfamiliar (e.g., <Ф>) letters in Cyrillic, as well as the presence of congruent (e.g., <M> maps to /m/) and incongruent (e.g., <H> maps to /n/) grapheme-phoneme correspondences. Showalter (Reference Showalter2018a) and Showalter (Reference Showalter2018b) taught 30 native English speakers a set of 12 Russian-like words (all of which contained familiar phonemes) in two conditions: no orthography and orthography. In the no orthography condition, participants learned the words via auditory presentation and pictured meanings; in the orthography condition, participants additionally saw the words spelled in Cyrillic. The words belonged to three conditions: unfamiliar (e.g., <ФИЛ> - [fil]), familiar-congruent (e.g., <KOM> - [kom]), and familiar-incongruent (e.g., <HOM> - [nom]). Participants were then asked to determine whether auditory words matched the pictures. In mismatched trials in the unfamiliar and familiar-congruent conditions, the auditory foils involved randomly-selected onsets (e.g., the word <ФИЛ> - [fil] was pronounced [dil]); in the familiar-incongruent condition, the foils were pronounced according to English grapheme-phoneme correspondences (e.g., the word <HOM> - [nom] was pronounced [hom]).
Participants in the no orthography group performed near ceiling on all words, and those in the orthography group were near ceiling in the unfamiliar and familiar-congruent words, but were significantly less accurate on familiar-incongruent words. This pattern of results provides further evidence for the interference effects of incongruency between L1 and L2 grapheme-phoneme correspondences, suggesting that incongruency poses particular challenges for L2 learners.
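The three word conditions in this design follow from two properties of each L2 grapheme: whether it exists in the learner's L1 inventory (familiarity) and, if so, whether it maps to the same phoneme in both languages (congruence). The following is a minimal illustrative sketch of that classification, not the procedure used in the study. The function name and the two correspondence tables are hypothetical and deliberately tiny, containing only the letters mentioned above; Latin capitals stand in for the visually identical Cyrillic letters.

```python
# Illustrative sketch only; not taken from Showalter's materials.
# Hypothetical, simplified grapheme-to-phoneme tables (assumptions).
# Latin "M" and "H" stand in for the visually identical Cyrillic letters.
L1_ENGLISH = {"M": "m", "H": "h"}
L2_RUSSIAN = {"M": "m", "H": "n", "Ф": "f"}


def classify_grapheme(grapheme, l1=L1_ENGLISH, l2=L2_RUSSIAN):
    """Label an L2 grapheme by familiarity and L1-L2 congruence."""
    if grapheme not in l2:
        raise KeyError(f"{grapheme!r} is not in the L2 table")
    if grapheme not in l1:
        # The learner has no L1 correspondence to transfer.
        return "unfamiliar"
    if l1[grapheme] == l2[grapheme]:
        # Same letter, same phoneme in both languages.
        return "familiar-congruent"
    # Same letter, different phoneme: the case associated with interference.
    return "familiar-incongruent"


for g in ("Ф", "M", "H"):
    print(g, classify_grapheme(g))
```

Under these assumed tables, <Ф> is labeled unfamiliar, <M> familiar-congruent, and <H> familiar-incongruent, which matches the examples given for the three conditions.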
With an eye to the practical implications of this line of research, Showalter (Reference Showalter2018b) and Showalter (Reference Showalter2019) followed up on this finding, investigating the effects of L2 experience and instructional interventions on the Cyrillic incongruency effect for native English speakers. A new group of naive native speakers of English were assigned to the no orthography, orthography, intervention A, and intervention B groups (n = 20 in each group). In intervention A, the first letter of each word was presented in boldface during the word learning phase (referred to as ‘textual enhancement’), in an effort to draw participants’ attention to the potentially misleading letters. In intervention B, participants were instructed prior to learning the words that some of the written words ‘contain different letters than you might expect’ (p. 16). There were also groups of 20 beginning and 20 experienced learners of Russian who were all exposed to written forms during word learning. This study replicated the earlier finding that naive participants in the orthography condition experienced interference from incongruent grapheme-phoneme correspondences. However, the experienced learners did not, and performance by the beginning learners and both intervention groups was intermediate to (but not significantly different from) that of the (naive) orthography group and the experienced learners. Together, these findings suggest that brief interventions with naive learners do not moderate interference from incongruent grapheme-phoneme correspondences, but with enough experience, native English-speaking learners of Russian can overcome this interference.
Han and Oh (Reference Han and Oh2018) examined the role of phonetic distance and orthographic information in the perception and lexical encoding of Arabic contrasts by L1 Korean learners in a word-learning study. 48 native Korean speakers learned a set of 30 MSA non-words (five involving each of the six target phonemes in the experimental /l - r/ and /χ - ħ/ contrasts and the /m - t/ control contrast). The contrasts were chosen such that the sound pairs involved differed from one another in whether there was a corresponding L1-dominant category (Korean /l/ in the case of /l - r/) or not (/χ - ħ/). Participants were randomly assigned to one of three exposure conditions: the auditory only, same-letter, and different-letter groups. The same-letter and the different-letter groups differed in whether different sounds were represented by a single letter or different letters in the orthographic input during exposure. Following word-learning, participants were tested on their phonetic categorization and lexical encoding of the three contrasts in an AX discrimination task and a lexical decision task, respectively. On the AX discrimination task, there was an effect of sound pair, with participants demonstrating greater sensitivity to the control than to the experimental contrasts. However, no effect of either phonetic similarity to an L1 category or orthography was observed on phonetic categorization of the pairs. Proximity to an L1 phonetic category was found to impact participants’ lexical encoding: lexical decision performance was significantly better for the /l/-/r/ than the /χ/-/ħ/ contrast. Unexpectedly, however, exposure to orthographic input did not facilitate lexical decision performance. Instead, lexical decision performance was poorer when orthographic input was provided, regardless of whether it involved the same letter or different letters.
The variability of findings regarding the beneficial, neutral, or interference effect of orthographic input while learning new L2 words leads to the question of the factors that control these effects. In the next group of studies reviewed, researchers have directly compared two or more different writing systems in order to isolate the influence of graphemic familiarity and congruence. Hayes-Harb and Cheng (Reference Hayes-Harb and Cheng2016) examined the relative influence of L2 graphemic familiarity and L1-L2 grapheme-phoneme correspondence congruence. 30 native speakers of English with no prior Mandarin language experience were assigned to one of two word learning conditions: Pinyin and Zhuyin. Pinyin is made up of graphemes that are familiar to native speakers of English, but a subset of these graphemes are incongruent in that they map to different phonemes in Mandarin than in English. Zhuyin, on the other hand, involves entirely unfamiliar graphemes. Participants learned a set of Mandarin-like auditory words and pictured ‘meanings’, and saw written forms in accordance with their word learning condition (e.g., /tsaɪ/ was spelled <ZAI> for the Pinyin group and <ㄗㄞ (presented vertically)> for the Zhuyin group). Participants were then asked to determine whether auditory words matched pictures, where the test items of interest involved the pronunciation of the target word following English grapheme-phoneme correspondences. For example, the picture that participants had learned to associate with the word /tsaɪ/, spelled <ZAI> in Pinyin, was presented with the auditory word /zaɪ/. On these incongruent trials, participants in the Pinyin group performed significantly less accurately than did participants in the Zhuyin group, despite similar performance by both groups on control test trials. A separate experiment, in which participants were taught only to associate the auditory words with the written words (no pictures/meanings), revealed a similar pattern.
Together, these experiments suggest that unlearning a native grapheme-phoneme correspondence may be more difficult than learning the correspondence for a novel grapheme.
Mathieu (Reference Mathieu2014) and Mathieu (Reference Mathieu2016) examined the role of unfamiliar written scripts on the acquisition of an Arabic consonant contrast by native English-speaking learners with no prior Arabic language experience. 84 participants learned 12 Arabic-like words (six minimal pairs differing only in the voiceless pharyngeal /ħ/ and uvular /χ/ fricative contrast) together with their pictured meanings, in one of four word learning conditions: Arabic script, Cyrillic script, Hybrid script (first letter Cyrillic, remainder in Roman script), or with no orthographic input. It was expected that the more unfamiliar the foreign script, the less helpful it would be in supporting the early acquisition of the novel contrast. At test, participants were presented with an auditory form and a picture and had to determine whether the two matched. Crucially, mismatch trials involved the /ħ/-/χ/ contrast. Unexpectedly, there was no interaction between word-learning condition and picture-word pairs. Moreover, no differences were observed between participants in any of the unfamiliar script conditions, suggesting that none of the unfamiliar scripts were helpful or hurtful relative to one another. However, participants in the no-orthography group performed better than the three unfamiliar script conditions combined, suggesting that exposure to an unfamiliar script generally interfered with the acquisition of the Arabic contrast under study. It remains unclear whether the observed pattern should be attributed to the perceptually difficult contrast, the unfamiliarity of the scripts involved, or both of these factors. 
Jackson (Reference Jackson2016), summarized above, similarly studied the effects of different unfamiliar script properties, and found that when the Arabic /k - q/ contrast was encoded using a novel grapheme (i.e., <k - Ꮧ>), native English speakers were better able to learn the lexical distinction than when the contrast was encoded with a familiar grapheme plus a diacritic (i.e., <k - ḳ>).
Mok, Lee, Li, and Xu (Reference Mok, Lee, Li and Xu2018) investigated the effects of orthography on the perception and production of Mandarin tones by experienced L1 Cantonese learners. The authors compared orthographic effects of Pinyin (a transparent system) with Chinese characters (an opaque logographic system that does not have regular grapheme-to-phoneme correspondence or indicate lexical tone). They asked whether a transparent orthography (Pinyin) is beneficial relative to L1 phonological knowledge retrieved from Chinese characters. The results of perception and production tasks with monosyllabic and disyllabic words were largely consistent: Pinyin facilitated tone perception and production for monosyllabic words. However, Chinese characters were more beneficial for both perception and production of disyllabic words. Additionally, the low-performance group (a subset of participants who were less accurate on the tone identification task) was more affected by written input than the high-performance group on the production task. These findings further suggest that orthographic effects are not uniform but depend importantly on the nature of the task, the stimuli, and the proficiency of the participants.
Hao and Yang (Reference Hao and Yang2018) studied the effects of Pinyin and Chinese characters on native English speakers’ memory for lexical tone in Mandarin words. 20 naive participants, 29 second/third-year learners, and 17 fourth-year learners of Mandarin learned Mandarin words in two orthographic input conditions: Pinyin and character. They heard a word and saw the English meaning on the screen, in addition to the word's Pinyin or character spelling. At test, they were asked to determine whether an auditory word matched a meaning; the target items involved mismatches of either tone or segments. Hao and Yang (Reference Hao and Yang2018) found that the fourth-year learners in the character group were more accurate than those in the Pinyin group for tonal mismatches, and that naive participants in the Pinyin group were somewhat more accurate than those in the character group. However, there were no overall differences in performance between the Pinyin and character groups.
In summary, in this section we have seen repeated evidence that incongruent L1-L2 grapheme-phoneme correspondences can undermine the potential benefits of familiar graphemes. On the other hand, we have also seen that the systematic graphemic representation of difficult novel phonological contrasts in the orthographic input can support the acquisition of those contrasts (e.g., Showalter & Hayes-Harb, Reference Showalter and Hayes-Harb2013; Escudero et al., Reference Escudero, Simon and Mulak2014). However, the supportive effect of systematic graphemic representation appears to be moderated by the familiarity of the graphemes and/or the relative perceptual difficulty posed by the novel phonological contrast (e.g., Mathieu, Reference Mathieu2014; Jackson, Reference Jackson2016; Mathieu, Reference Mathieu2016), suggesting an important interplay of multiple variables in determining the relative benefit or interference of orthography in L2 phonological development.
As noted above, studies involving words to which participants have not previously been exposed allow researchers to control the input learners receive with respect to the words, and in this way it is possible to isolate the contributions of orthographic and auditory input on subsequent recognition and pronunciation of those words. However, these studies tend to be short in duration (often one study session), and may not closely reflect the conditions under which words are typically learned in the real world (see, e.g., Gaskell & Dumay, Reference Gaskell and Dumay2003 for evidence that a period of sleep causes changes in the way newly-learned words are integrated into the lexicon). In addition, many of these studies involve naive participants, which means that participants do not have prior experience with the target language. Again, this methodological decision serves to control for participants’ prior exposure to the language, which might otherwise vary and complicate the interpretation of the results. Together, however, studies involving actual learners and familiar words (i.e., those reviewed in sections 6.1 and 6.2) and those involving unfamiliar words (i.e., those reviewed in sections 6.3 and 6.4) provide somewhat complementary information about the effects of orthographic input in L2 phonology. While costly, an approach that combines the strengths of both types of studies would be to conduct longitudinal studies of actual learners under various input exposure conditions. A further advantage of such studies is that they would address a major gap that currently exists in this literature; that is, the pedagogical implications of this growing area of empirical research. We now turn to a summary of what is known, along with a discussion of the very few studies that have explicitly explored the pedagogical implications of this research.
7. Research summary and pedagogical implications
The research reviewed here has highlighted the relevance of at least the following four crucial variables that control the effects of orthography on L2 learners’ phonological development:
• Systematicity: Whether or not a novel phonological contrast is systematically represented by the L2 writing system
• Familiarity: Whether some or all of the L2 graphemes are familiar to learners from the L1
• Congruence: For familiar graphemes, whether the L1 and the L2 employ the same grapheme-phoneme correspondences
• Perceptibility: The ability of learners to perceive a novel auditory contrast
Systematicity is related to the transparency/depth of a writing system—it refers to the degree to which the writing system systematically represents a phonological contrast of interest in writing such that learners may rely on orthographic input to make target-like inferences about the phonological structure of words. The systematic representation of a novel contrast can support the acquisition of novel L2 phonological contrasts (e.g., Escudero et al., Reference Escudero, Hayes-Harb and Mitterer2008), but is not always sufficient, at least over the course of brief experiment sessions (e.g., Simon et al., Reference Simon, Chambless and Kickhöfel Alves2010).
There have been mixed findings regarding the role that graphemic familiarity plays in the impact of orthographic input, depending on the congruence between the L1 and L2 grapheme-phoneme correspondences. On the one hand, familiar graphemes do not need to be learned; on the other hand, to the extent that the L1 and L2 grapheme-phoneme correspondences differ, graphemic familiarity may prove detrimental. A number of studies have demonstrated that the potential benefits of graphemic familiarity may be outweighed when the learner must establish a new grapheme-phoneme correspondence (e.g., Hayes-Harb & Cheng, Reference Hayes-Harb and Cheng2016).
Finally, a number of studies have found that potential beneficial orthographic effects may not be available to learners in cases where a novel auditory contrast is not perceptible to them. A few studies of the impact of systematic graphemic representation of difficult L2 contrasts have shown no beneficial effect (e.g., Simon et al., Reference Simon, Chambless and Kickhöfel Alves2010; Showalter & Hayes-Harb, Reference Showalter and Hayes-Harb2015). Indeed, Escudero et al. (Reference Escudero, Hayes-Harb and Mitterer2008) showed that even when orthographic input impacts learners’ representations for words, the relevant phonological contrasts may nonetheless still be neutralized in online auditory word recognition.
We now turn to the pedagogical implications of this field of inquiry. Unfortunately, in keeping with the observation that there is often a disconnect between pronunciation research and pedagogy (see, e.g., Darcy, Reference Darcy2017), relatively little research has directly examined the pedagogical implications of orthographic effects in L2 acquisition. However, there are two broad types of orthographic effects that may be most relevant to language teaching: (1) the potentially helpful role of the systematic orthographic encoding of a novel phonological contrast or syllable structure, and (2) the potential interference of orthography in cases of familiar graphemes but incongruent grapheme-phoneme correspondences. We address each of these in turn, followed by a discussion of the pedagogically relevant constructs of intelligibility and comprehensibility and their importance to future work in this area.
7.1 Leveraging orthography to teach and learn novel phonological contrasts and syllable structures
The studies reviewed here provide some evidence that the systematic orthographic representation of a novel L2 phonological contrast can support the development of more target-like representations of L2 words (e.g., Escudero et al., Reference Escudero, Hayes-Harb and Mitterer2008; Bürki et al., Reference Bürki, Welby, Clement and Spinelli2019). It is also evident that orthographic input can lead L2 learners to epenthesize vowels rather than delete consonants when faced with L2 consonant clusters—in these cases, orthographic input appears to help learners preserve the phonological structure (component segments) of words (e.g., Young-Scholten et al., Reference Young-Scholten, Akita, Cross, Robinson and Jungheim1999; Davidson, Reference Davidson2010).
The next question, then, is whether it is possible to enhance the beneficial effects of orthographic input to teach novel L2 contrasts and syllable structures. We are aware of only one study that has explicitly attempted this. Jackson (Reference Jackson2016) found that naive native English speakers exhibited more accurate performance on a test of their knowledge of the difficult Arabic /k - q/ lexical contrast when the novel phoneme /q/ was represented with an entirely novel grapheme (i.e., <Ꮧ>) than with a familiar grapheme with a diacritic (i.e., <ḳ>). However, those who saw the diacritic exhibited a significant benefit from explicit instruction about the contrast relative to those who saw the novel grapheme. This suggests an interplay of graphemic familiarity and instruction, at least in the case of this particular L2 contrast.
7.2 Preventing the interference effects of orthography in L2 learning and teaching
To the extent that we observe moderation of interference effects in experienced relative to inexperienced L2 learners, it appears that such effects can be overcome to some extent, and that instruction may play some role in this improvement. However, we have very little empirical evidence so far in this regard. A study that has observed improvement over time by learners is presented in Showalter (Reference Showalter2018b) and Showalter (Reference Showalter2020), which found that experienced L1 English learners of Russian had overcome the interference of incongruent orthographic effects to some extent relative to beginning learners and naive participants.
Similarly, very few studies have directly investigated the efficacy of specific interventions or instructional strategies aimed at mitigating the potential negative effects of orthographic input. A small number of studies have investigated the role that explicit instruction may play in preventing the interference effects that sometimes arise as a result of orthographic input. For example, Showalter (Reference Showalter2018b) and Showalter (Reference Showalter2020) attempted to moderate the negative effects of incongruent orthography on naive native English speakers learning Russian words, using textual enhancement in one condition and explicit instruction in another, finding that neither of these brief interventions improved participants’ performance at test. In another example, Brown (Reference Brown2015) and Hayes-Harb et al. (Reference Hayes-Harb, Brown and Smith2018) also attempted to prevent native English speakers from experiencing the negative effects of written input on their acquisition of final devoiced consonants in German. They told participants that the final letters in words could be misleading: ‘A “b” will be pronounced “p”, a “g” will be pronounced “k” and a “d” will be pronounced “t” when at the end of the word’ (p. 558). Despite this explicit instruction, participants showed no improvement in their pronunciation of devoiced consonants. In both of these cases, researchers made only modest attempts at interventions, and these interventions occurred during one-hour experiment sessions during which participants also learned and were tested on new words. We thus cannot draw conclusions about the efficacy of these or other types of interventions in real-world instructed settings.
It is worth noting that beyond the interference effects discussed so far, there may be additional, less obvious, negative effects of orthographic input on L2 acquisition. For example, Bürki et al. (Reference Bürki, Welby, Clement and Spinelli2019) showed that while orthographic input during a word learning phase enhanced accuracy and word processing speed in native French speakers learning English, it in fact caused them to produce less target-like vowel formants than when they were exposed only to the auditory forms of words. Further, as Cutler (Reference Cutler2015) points out, while orthographic input appears to facilitate the establishment of contrastive lexical representations in the absence of perceptual development (as in, e.g., Escudero et al., Reference Escudero, Hayes-Harb and Mitterer2008), the apparently beneficial effect may lead to important difficulties as learners’ (target-like) stored forms of words containing perceptually difficult phonemes do not map well to (non-target-like) perceptual representations of the speech signal. For example, a learner who has inferred from the orthographic input that [pændə] <panda> and [pɛnsəl] <pencil> contain different vowels in their initial syllables, but nonetheless perceives both vowels as [ɛ], will not experience a benefit from their knowledge of the contrast as their online processing of the speech signal continues to undermine their lexical knowledge. More problematically, when a learner has encoded a phonological contrast that they cannot perceive in lexical storage, they demonstrate a decrease in word processing speed (see Broersma & Cutler, Reference Broersma and Cutler2011; Cutler, Reference Cutler2015 for evidence and an explanation of this effect).
While many of the studies reviewed here have cautioned language teachers about the potential pitfalls associated with orthographic input, or urged teachers to take advantage of orthography when it is helpful, Rafat and Perry (Reference Rafat, Perry and Rao2019) is perhaps the only published manuscript devoted to translating this literature for instructional purposes. They suggest two lesson plans specifically designed to respond to orthographic effects in the acquisition of Spanish by native English speakers. In the first lesson plan, they suggest introducing the pronunciation of Spanish words before exposing learners to the words’ written forms so as to avoid potential interference from incongruent grapheme-phoneme correspondences. They suggest that such an approach could be used in the cases of the most problematic graphemes (see, e.g., Dean & Valdes Kroff, Reference Dean and Valdes Kroff2017 for examples). In the second lesson plan, they address the challenge of Spanish rhotics, in particular the sometimes-opaque mappings of Spanish <r> and <rr> to the alveolar trill /r/ and the flap /ɾ/. They suggest providing students with lists of Spanish words containing <r> and asking them to identify whether /ɾ/ or /r/ would be the appropriate pronunciation in each case. They also suggest using the most regular mappings (e.g., /ɾ/ in <pero, caro> and /r/ in <perro, carro>) to highlight the importance of the /ɾ - r/ contrast.
7.2.1 Orthographic input and the intelligibility and comprehensibility of L2 speech
Throughout this review, we have discussed studies that investigate the influence of orthographic input on L2 phonological development by examining learners’ production and word recognition processes directly. However, we are aware of no studies to date that have investigated the communicative consequences of orthographic input; that is, the influence of orthographic input on the degree to which learners’ speech is understood by others (i.e., intelligibility) or listeners’ assessment of the ease or difficulty of understanding learners’ speech (i.e., comprehensibility). Over the past two decades or so, L2 phonology and L2 pronunciation researchers have paid increasing attention to these constructs (see, e.g., Munro & Derwing, Reference Munro and Derwing1999; Kennedy & Trofimovich, Reference Kennedy and Trofimovich2008), with many advocating in particular for an intelligibility-based approach to language teaching (see, e.g., Levis, Reference Levis2018). Research is therefore needed that explores the L2 orthographic input effects that have been observed in the research reviewed here in light of more pedagogically meaningful constructs such as intelligibility and comprehensibility.
8. Conclusion and future directions
This relatively new area of research has seen huge growth over the past several years, and it now enjoys a substantial empirical foundation. While the field began with a flurry of publications by a small number of scholars, it has since grown to include dozens of scholars, with international representation.
The number of robust findings in this literature, in particular as they relate to the potential benefit of systematicity in supporting perceptual acquisition, as well as the detrimental effects of incongruent L1-L2 grapheme-phoneme correspondences, should inspire investment at this point in the translational research that is needed to determine the implications of this research for language learning and pedagogy. While, for example, the interventions suggested by Rafat and Perry (Reference Rafat, Perry and Rao2019) may prove effective in practice, there is simply not enough research yet that directly examines the efficacy of practices. While we have seen no research to date that indicates the efficacy of instructional strategies designed to counter the negative effects of orthographic input on L2 learners, the relevant studies have looked only at brief interventions conducted in a laboratory setting. At this time, more high-quality research is needed that is conducted in authentic instructed settings, employing robust instructional techniques.
Questions arising
(1) How does the degree of familiarity of a word interact with orthographic input effects—are these effects reduced for more familiar words?
(2) How does orthographic input affect the acquisition of a variety of phonological processes?
(3) How do orthographic input effects manifest in yet-unstudied combinations of native and second languages?
(4) How do orthographic input effects interact with L2 proficiency? How do these effects change over time?
(5) Under what conditions can orthographic input effects be moderated through instruction?
(6) What factors should teachers consider when deciding when and how to introduce written forms to learners?
(7) What is the impact of orthographic input on the intelligibility and comprehensibility of learners’ speech?
Rachel Hayes-Harb is a Professor of Linguistics at the University of Utah. She studies bilingual speech processing with a focus on the role of input in the development of second language lexical-phonological structure.
Shannon Barrios is an Assistant Professor of Linguistics at the University of Utah. She investigates the acquisition of novel phonological contrasts and phonological processes by adult second language learners. Her research interests include the role of the first language, input variables such as orthography, and the way these factors interact with available learning mechanisms to account for the ease and difficulty with which learners perceive and acquire the target language sound system.