1. Introduction
Late L2 learners face challenges with respect to speech sound perception, especially for phones that do not exist in the native phonological system. Their well-established phonological system might help or hinder the perception of sounds in their second language (Best, 1996; Flege, 1995; Schertz, Cho, Lotto & Warner, 2015; Troubetzkoy, 1949). Models of perception and production in second languages enable the prediction of possible difficulties that can be encountered by second language learners. These models formalize possible interferences between L1 and L2 speech perception and production. They account for typological differences between the learners’ L1 and L2, as predictions are made by comparing the phonological systems of the two languages. Unfortunately, most models do not consider learner proficiency on a continuum; consequently, the same predictions are usually made for learners at a beginner's level and at an advanced level.
Two major models predicting production and perception biases in late learners of second languages are widely used, namely Flege's Speech Learning Model (SLM; Flege, 1995) and Best's Perceptual Assimilation Model for late bilinguals (PAM-L2; Best & Tyler, 2007). In the following, we will focus only on the latter, because only Best's model specifically predicts the perception of non-native phonological contrasts. The model was inspired by motor theory, which implies that successful perception is linked to successful production of the sound. The predictions of PAM-L2 are based on comparisons between the phonological contrasts existing in the listeners’ L1 and L2. Five distinct predictions are made by the model (i.e., two-category assimilation, single-category assimilation, category goodness difference, uncategorized-categorized assimilation, and uncategorized assimilation).
In order to apply the PAM-L2 model to the German–French language pair and to decide which predictions summarize their phonological differences, a comparison of the respective phonemic inventories was undertaken.
Comparing the phonemic systems of German and French indicates, on the one hand, that the vowel systems of German and French differ with respect to their number of vowels. German counts 16 monophthongs, not considering the 3 diphthongs, all of them oral vowels (Kohler, 1999; Mangold & Dudenredaktion, 2005). French counts 11 oral monophthongs and 3 nasal vowels (Fougeron & Smith, 1999).
If we compare the F1-F2 vowel space of the German and the French vowel systems (see Figure 1), we observe that the vowel triangle in German shows a greater density. This difference in vowel space density is linked to the German vowel length contrast, which is absent in French. The vowel length contrast in German oral vowels is widely documented in the literature (Antoniadis & Strube, 1984; Becker, 1998; Hall, 2011; Ramers, 2012; Wiese, 2000). With respect to phonology, two vocalic features are engaged in this contrast: vowel quality and vowel duration. Purely phonological analyses demonstrate that [+tense] vowels are always associated with the feature [+long], whereas [-tense] vowels can be associated with either [-long] or [+long] (e.g., /ɛː, aː/) (Hall, 2011). These observations led Ternes (2012) to the conclusion that the primary feature opposing German vowels is vowel length. In the following, we will distinguish between short and long vowels. The differences in oral vowel inventory sizes, vowel space density and the vowel length contrast raised the question of how French learners of German discriminate the German vowel length contrast in speech perception.
On the other hand, the consonantal systems of German and French show a high number of similarities. Fougeron and Smith (1999) established a list of 21 French phonemic consonants, which is quantitatively similar to the 20 German phonemic consonants listed by Kohler (1999). However, some of the German consonants are not present in the French consonantal system. French lacks the glottal consonants [ʔ] and /h/, the palatal fricative [ç] (ich-Laut) and the uvular fricative [x] (ach-Laut; see footnote 1). Conversely, some of the French consonants, such as the palatal nasal consonant /ɲ/, are not found in the German consonantal system.
The German fricatives [ç] and [x], although acoustically and perceptually clearly distinct, are considered to be allophones. In the German lexicon, they appear in complementary positions according to the left vowel context. Anterior vowels (as well as some consonants) are followed by [ç], whereas posterior vowels and the vowel /a/ are followed by [x]. Both fricatives generally appear in word-final (Buch [buːx], book) or morpheme-final positions (riech-en [ʁiː.çən], to smell). In loanwords, however, [ç] can also appear word-initially (Chemie [çɛmiː], chemistry; China [çina], China; etc.).
In production tasks, French learners of German often replace the consonant [ç] (as in mich [mɪç], me) with its closest neighbor, the postalveolar fricative /ʃ/ (as in misch [mɪʃ], to mix) (Wottawa, Adda-Decker & Isel, 2016). The postalveolar fricative /ʃ/ appears syllable-initially as well as syllable-finally in German, as in schnell [ˈʃnɛl] (fast) vs. Fisch [ˈfɪʃ] (fish), and in the adjectival suffix -isch. At the end of monosyllabic words, the voiceless palatal fricative [ç] often appears in a cluster with the plosive [t]: Licht [ˈlɪçt] (light), echt [ˈɛçt] (real). Regarding the derivational morphology of German, the voiceless palatal fricative [ç] also appears in the suffixes -chen and -(l)ich.
From an acoustic point of view, the German [ʃ] and [ç] are highly similar. Their respective places of articulation are very close to each other even if the articulatory movements are quite different. The respective centres of gravity of the two fricatives are situated in similar frequency bands. These similarities are particularly critical with respect to the question of non-native contrast perception.
As we were interested in the differences between French and German, we applied the PAM-L2 model to these two languages. The articulatory dimension of the model was not investigated further, as we did not record participants’ articulatory data in this study. We restricted the model to the phonemic categories of the participants’ L1 and L2. Comparing the phonological systems of German and French through the lens of PAM-L2, it appears that French learners of German are mainly confronted with single category assimilation (i.e., two distinctive non-native sounds are assimilated as variants of the same category) and category goodness difference (i.e., two distinctive non-native sounds are assimilated as more or less valid prototypes of the same category).
On the one hand, the perception of the German vowel length contrast in French learners of German seems to belong to the category goodness difference scenario. With respect to vowel length and vowel quality, in isolated spoken words, French vowels are more similar to long vowels than to short vowels of German (Strange, Weber, Levy, Shafiro, Hisagi & Nishi, 2007). German long vowels should therefore be perceived by French native listeners as good exemplars of the vowel category, whereas short vowels might be perceived as poor(er) examples of this category. Nevertheless, the acoustic differences might not be salient enough for French learners of German to perceptually separate short from long vowels. In our study, the German vowel pairs all differ in duration in a similar way. However, spectral properties vary to different degrees: the /ɪ-iː/ pair shows considerable differences in vowel quality, whereas /ɛ-ɛː/ and /a-aː/ have quite similar vowel qualities. The evaluation of category goodness might therefore be influenced by the identity of the investigated vowel pair. In that case, German [ɪ] should be perceived as a poor candidate for the French learners’ /i/ category, whereas German [iː] should be a good candidate for this category. German [ɪ] involves different articulatory movements of the jaw and tongue, and especially of the lips, than does German [iː]. The German vowels [a] and [aː] should be equally good candidates for the French learners’ /a/ category, as the articulatory movements are highly similar and the vowels differ mainly in duration, which should increase the perception difficulty for French native listeners.
On the other hand, the perception of the German consonants [ʃ] and [ç] seems to belong to the single category assimilation scenario, especially as the fricative /ʃ/ is part of the French phonemic inventory. Late French–German bilinguals already have a representation for /ʃ/ with an automated articulatory gesture. The acoustic properties of German [ʃ] and [ç] are very similar (Jannedy & Weirich, 2016; Wottawa et al., 2016), whereas the articulatory movements, particularly of the tongue and the lips, are quite different. Based on the acoustic information, it is unlikely that L2 learners of German whose L1 has a relatively low number of fricative types consider the unknown but similar [ç] as an uncategorized sound. From an articulatory point of view, it is possible that even advanced French learners of German have not yet experienced the canonical articulatory movement of [ç], because deducing a different articulatory movement from an acoustically similar sound is a very hard task for late learners with an already well-established phonological system. Thus, we assume that learners in our group assimilate [ʃ] and [ç] into one category. PAM-L2 predicts rather low or no discrimination for sound contrasts that are assimilated to one single L1 phonemic category.
There are different methods that allow us to investigate non-native speech perception, such as behavioral perception tests and neural imaging methods. We chose to investigate the perception of the German vowel length contrast and the opposition of [ʃ] and [ç] using electroencephalography (EEG). EEG allows us to record brain responses millisecond by millisecond and, in some settings, without the participant needing to perform an experimental task. Combining EEG recordings with a mismatch negativity oddball paradigm, which is assumed to index phonological discrimination processes (Dehaene-Lambertz, 1997) without involvement of attentional resources, made it possible to tap the automaticity of these processes by minimizing interference from, among others, motor or decision processes (Winkler, Lehtokoski, Alku, Vainio, Czigler, Csépe, Aaltonen, Raimo, Alho, Lang, Iivonen & Näätänen, 1999). Hence, this technique enables us to better understand early processing of sound discrimination in native and non-native listeners.
The classical oddball paradigm consists of a number of stimulus sequences combining two types of stimuli: similar (standard) and “odd” (deviant) ones. Deviant stimuli are rare (i.e., 10%) compared to the frequent standard stimuli (i.e., 90%) (Remington, 1969; Squires, Wickens, Squires & Donchin, 1976). The mismatch negativity (MMN) (Näätänen, Gaillard & Mäntysalo, 1978) is observed on deviant stimuli varying in some acoustic property from standard stimuli. The MMN is the difference of two ERP waveforms: the averaged waveforms associated with the deviant sounds (deviant condition) and the averaged waveforms elicited by the processing of the same stimulus in a standard condition (i.e., preceded by the same sound). The subtraction of the standard condition, whose trace is kept in sensory memory (Cowan, Winkler, Teder & Näätänen, 1993; Näätänen & Winkler, 1999; Picton, Alain, Otten, Ritter & Achim, 2000; Winkler & Cowan, 2005), from the deviant one is expected to result in an MMN, if any, peaking around 100–200 ms after stimulus onset. The MMN reflects the early process of auditory novelty detection (Näätänen, 2018; Schröger, 1998) and is generated in the auditory cortex (Alho, 1995; Rinne, Alho, Ilmoniemi, Virtanen & Näätänen, 2000). Recently, the MMN was also found to reflect variations of phonetic information related to vowels conveying different emotional information (Carminati, Fiori-Duharcourt & Isel, 2018). Its maximum amplitude is observed in the fronto-central regions of the scalp.
The MMN is used as a marker in L2 studies in order to investigate the discrimination capacities of L2 contrasts in second language learners. The emergence of an MMN as a function of linguistic experience constitutes a good criterion of neuroplasticity. Most of the time, auditory stimuli are employed in order to investigate phonetic or phonological categories in L2 listeners, e.g., Catalan vowels (Díaz, Baus, Escera, Costa & Sebastián-Gallés, 2008); English vowels (García & Froud, 2018; Grimaldi, Sisinni, Gili Fivela, Invitto, Resta, Alku & Brattico, 2014; Krzonowski, Pellegrino & Ferragne, 2018; Peltola, Kujala, Tuomainen, Ek, Aaltonen & Näätänen, 2003; Shafer, Yan & Datta, 2011); Finnish vowels (Nenonen, Shestakova, Huotilainen & Näätänen, 2005; Savo & Peltola, 2019); German vowels (Rinker, Alku, Brosch & Kiefer, 2010); and English consonants (Iverson, Ekanayake, Hamann, Sennema & Evans, 2008; Mah, Goad & Steinhauer, 2016), but also non-native prosody (Friedrich, Herold & Friederici, 2009).
Furthermore, it is known that individual differences in L2 listeners such as language proficiency, musicality or the amount of language practice might influence the MMN amplitude (Grimaldi et al., 2014; Peltola et al., 2003; Tervaniemi, Ilvonen, Karma, Alho & Näätänen, 1997).
Under certain conditions, the negative MMN is followed by a positive deflection, the P3a, a variant of the P300 appearing in passive conditions, resulting in a biphasic ERP pattern (Snyder & Hillyard, 1976; Squires, Squires & Hillyard, 1975). The P3a component is thought to reflect attentional resources, whereas the MMN is an automatic reaction to acoustic changes. The P3a is a positive waveform peaking between 220 and 280 ms after stimulus onset, elicited in passive paradigms. This ERP shows its maximum amplitude over fronto-central electrodes. The P3a, also called novelty P3, is interpreted as an involuntary attention shift in reaction to changes in the environment or to the processing of new information (Fiori-Duharcourt & Isel, 2012; Yamaguchi & Knight, 1991a, 1991b).
The present study aims to investigate the perception, in French learners of German, of some German vowel and fricative oppositions that we expect to be particularly difficult for French listeners. The German vowel length contrast, which has a lexical value (word opposition), appears in numerous minimal pairs, e.g., bitten [ˈbɪtən] (to ask, to solicit) and bieten [ˈbiːtən] (to offer). Regarding the opposition of [ʃ] and [ç], which exists in German (but not in French), only a few minimal pairs can be found, e.g., misch [ˈmɪʃ] (to mix) vs. mich [ˈmɪç] (me). We specifically selected these two oppositions as their frequency of occurrence in the German language is rather different: whereas vowel length oppositions are a frequent phenomenon in German, this is less the case for the opposition of [ʃ] and [ç].
Our goal was to investigate to what extent the phonological system of native listeners of French who are advanced learners of German is sensitive to the phonological properties of the German language. Sensitivity, in adult second language learners, to German phone contrasts that do not exist in the phonemic inventory of French is a strong argument that the phonological processing system is sufficiently plastic to adapt to the new sounds encountered in a second language (Costa & Sebastián-Gallés, 2014; Perani & Abutalebi, 2005). Empirical evidence of such adaptation abilities might be reflected at the neural level by neuronal changes, particularly in terms of neurophysiological responses.
We approached the question of cognitive plasticity at the phonological level of processing by using a neurophysiological marker, i.e., the MMN, which is thought to be a relevant passive sound discrimination signature in the psycholinguistic and phonetic literature (Winkler et al., 1999). Thus, in order to investigate the perception of the two contrasts in native and non-native listeners of German, an EEG experiment with a passive oddball paradigm was designed. The discrimination performance of advanced learners of German was compared to that of German native listeners. Variations of amplitude, peak latency, and surface topography (scalp distribution across the electrodes) of the MMN constitute critical dimensions (Zevin, Datta, Maurer, Rosania & McCandliss, 2010; Díaz, Mitterer, Broersma, Escera & Sebastian-Galles, 2016) for studying auditory processing differences between native speakers and late bilinguals.
Based on the predictions of the PAM-L2 model, different neural responses according to the studied L2 phone contrasts were expected. Vowel pairs that differ both spectrally and in duration, e.g., [ɪ-iː], are expected to be better discriminated by French learners of German than vowel pairs that have little spectral variation, e.g., [ɛ-ɛː] and [a-aː]. A better perception of the acoustic differences between members of a given vowel pair should elicit a higher amplitude in both the MMN and the P3a. Moreover, the processing of deviant stimuli should lead to a latency shortening of the biphasic MMN/P3a pattern for [ɪ-iː], whereas [ɛ-ɛː] and [a-aː] should not provoke any latency shortening. According to PAM-L2, little or no discrimination is expected for the opposition of [ʃ] and [ç] in non-native listeners, which might result in an MMN and a P3a with low amplitudes.
In the following, we first present the Methods, followed by the Results, and finally the Discussion and Conclusion.
2. Methods
This section presents the participants, the stimuli, and the experimental procedures.
2.1 Participants
Twenty native speakers of French who started learning German at school (mean age of learning: 12.0 years, SD: 1.1 years) were tested in Paris (France). In addition, twenty native speakers of German were recruited in Leipzig (Germany). The French participants were aged from 19 to 34 years (mean age: 22.8 years, SD: 4.3 years). The German participants were aged from 21 to 28 years (mean age: 24.4 years, SD: 2.3 years). All participants had to fit the following criteria: 1) to have no first language other than German (native speaker group) or French (learners’ group), respectively; 2) to be aged between 18 and 35 years; 3) to be right-handed (Edinburgh Handedness Inventory); and 4) to present normal hearing. By their own account, participants had no history of current or past neurological or psychiatric diseases, and they had normal or corrected-to-normal vision. Furthermore, the native speakers of German had no or very limited knowledge of Romance languages (i.e., no or only an introductory class during high school), while the French participants had to have had regular contact with German (mean = 18.6 h/week) during the past year, for instance through language classes or a teaching activity. The participants gave written consent after receiving all necessary information on the experiment as well as on data storage and anonymization. The collected data were anonymised by applying the European Data FAIR principle. The study was approved by the local ethics committee of the Paris Nanterre University and was performed in accordance with the Declaration of Helsinki. All EEG participants received 20€ for their time and effort.
2.2 Stimuli
Most oddball paradigms present stimuli that are either synthesized or produced by a single speaker. This type of stimuli creates a homogeneous acoustic environment where listeners’ attention is easily drawn to any change, whether acoustic or phonological (Winkler, 2003). In order to test sensitivity to higher-order regularities than mere acoustic changes, acoustically different stimuli can be used. In the present study, this acoustic variability was achieved using stimuli that were produced by multiple speakers. For this purpose, seven female German native speakers were recorded in a sound-proof room at the Laboratoire de Phonétique et Phonologie – Paris 3. Only female speakers were chosen in order to avoid reactions to gender change during the oddball EEG experiment (Casado & Brunellière, 2016). The stimuli were isolated words that were only harmonized for loudness. We postulate that natural speech provides a more realistic condition to investigate human auditory sensitivity to L1 and L2 speech than do synthesized speech or mono-speaker stimuli, especially in the case of a laboratory study.
Vowel duration contrast
The chosen vowel pairs were /ɪ-iː/, /ɛ-ɛː/ and /a-aː/. Only natural words, either monosyllabic or bisyllabic, were recorded. The target vowel always appeared in stressed syllables. All three vowel pairs differ with respect to their duration: short vowels presented shorter durations than did long vowels. The spectral differences, however, were not similar across the three vowel pairs. The following hierarchy can be established, going from the vowel pair with the largest spectral differences to the vowel pair with the smallest: /ɪ-iː/ > /ɛ-ɛː/ ≥ /a-aː/ (Strange, Bohn, Trent & Nishi, 2004). Spectral differences might be an advantage for vowel discrimination in non-native perception (McAllister, Flege & Piske, 2002).
Figure 2 illustrates an example of a minimal pair containing the target vowels [ɪ-iː]. On top, a spectrogram of the word containing the short vowel is depicted, whereas below the word containing the long vowel is illustrated. The dotted red line in the spectrogram marks the second formant of the target vowel. Figure 2 shows a duration difference between short and long vowels. The spectral differences in the illustrations are mainly marked by F2 formant values for [ɪ-iː]: e.g., bitte (i.e., 1869 Hz) and biete (i.e., 2370 Hz).
Opposition of [ʃ] and [ç]
Both fricatives [ʃ] and [ç] appear mainly syllable-finally in German, e.g., misch [mɪʃ] (to mix) vs. mich [mɪç] (me). There are only a few natural minimal pairs in German contrasting [ʃ] and [ç]. Therefore, we added pseudo-words containing this contrast in order to increase the number of stimuli. It is known that pseudo-words elicit a smaller MMN response than do natural words; however, pseudo-words that respect the phonotactics of the target language are a good compromise for investigating contrasts that exist only in a few natural minimal pairs (Pulvermüller & Shtyrov, 2006). The target consonants were recorded in different valid vowel contexts (i.e., [ɪ, ʏ, yː, ɛ, œ, a͜ɪ, ɔ͜ɪ]) and appeared both in sequence-internal and in sequence-final positions. Two examples of minimal pairs are illustrated in Figure 3. The figure displays typical differences in F2 of the preceding vowels due to the place of articulation of the respective fricatives. Both the preceding vowels and the fricative noise (strong friction noise for [ʃ], weaker friction noise for [ç]) were criteria for stimulus selection.
2.3 Procedures
The experiment followed the design of a classical passive auditory oddball paradigm. Participants were comfortably seated in front of a computer screen in a sound-isolated room and watched a silent movie. The stimuli were presented over headphones. At the end of the experiment, the participants filled out a questionnaire about the movie in order to make sure that their attention had been focused on the movie and not on the sounds. The stimuli were organized in 5 sequential blocks. Each vowel contrast (/ɪ-iː/, /ɛ-ɛː/, /a-aː/) was presented in one separate block; furthermore, two blocks presented the [ʃ] and [ç] opposition, one for stimuli presenting [ʃ] and [ç] in mid-sequence, and one presenting the fricatives in sequence-final position. The blocks were presented in a randomized order. In order to allow pairwise comparisons, standards could become deviants and vice versa. For instance, the word biete could be a standard followed by another standard biete or by the deviant bitte; likewise, the word bitte could be a standard followed by another standard bitte or by the deviant biete. Moreover, half of the trials (see footnote 2) were pure standard trials.
Each trial contained a variable number of items produced by three or more different speakers. In each trial, stimuli from the same speaker were separated by at least two stimuli from different speakers in order to ensure acoustic variability. In the literature, the number of preceding standard stimuli varies from two to eleven (Rosburg, Trautner, Ludowig, Schaller, Kurthen, Elger & Boutros, 2007; Garrido, Friston, Kiebel, Stephan, Baldeweg & Kilner, 2008; Kirmse, Ylinen, Tervaniemi, Vainio, Schröger & Jacobsen, 2008; Biedermann, De Lissa, Mahajan, Polito, Badcock, Connors, Quinto, Larson & McArthur, 2016). Authors of the different studies have different approaches: for instance, Rosburg et al. (2007) presented either two, three or four preceding stimuli, whereas Garrido et al. (2008) presented between one and eleven preceding stimuli. In the present experiment, the last item of a trial was preceded by six to nine standard stimuli separated by a 500 ms inter-stimulus interval. This choice was made for three reasons.
First, the stimuli were uttered by different speakers: habituation with this kind of stimuli should take longer than with synthesized or mono-speaker stimuli that present exactly the same acoustic quality.
Second, after each trial, a different standard word was presented, which led us to reject the first two standards of each trial in the ERP analyses in order to exclude standards that also carry an MMN due to the item change.
Third, Remington (1969) and Falmagne, Cohen and Dwivedi (1975) found that at least five items in a stimulus chain favor effective reaction times in identifying the deviant stimuli; we considered that neurophysiological responses should also benefit from longer standard chains.
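The trial structure described above can be illustrated with a minimal Python sketch. It is not the presentation script used in the experiment: the function name build_trial, the speaker labels and the seeded random generator are assumptions made for illustration only. The sketch draws six to nine standards before the final item and lets a speaker recur only after at least two items from other speakers.

```python
import random

ISI_MS = 500  # inter-stimulus interval reported in the text


def build_trial(standard, deviant, speakers, rng, with_deviant=True):
    """Return one trial as a list of (word, speaker) pairs.

    The last item is the deviant, or another standard in a pure
    standard trial, and is preceded by six to nine standards.
    """
    n_preceding = rng.randint(6, 9)
    last_word = deviant if with_deviant else standard
    words = [standard] * n_preceding + [last_word]

    chosen = []
    for _ in words:
        # Exclude the speakers of the two previous items, so that items
        # from the same speaker are separated by at least two others.
        recent = chosen[-2:]
        candidates = [s for s in speakers if s not in recent]
        chosen.append(rng.choice(candidates))
    return list(zip(words, chosen))


# Hypothetical speaker labels for the seven recorded speakers.
speakers = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]
rng = random.Random(0)
trial = build_trial("biete", "bitte", speakers, rng)
pure_standard_trial = build_trial("biete", "bitte", speakers, rng, with_deviant=False)
```

Because every item is drawn from speakers other than those of the two preceding items, any trial built this way also contains at least three different speakers.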
EEG recordings
EEG was recorded from 64 channels mounted in an elastic cap. For data analysis, channels were re-referenced to an average reference. Electrode impedances were kept below 25 kΩ. Data were recorded at a sampling rate of 1000 Hz.
In France, recordings were undertaken with the BrainVision PyCorder, and the signal was amplified with the BrainVision actiCHamp amplifier. The EEG caps used were actiCAP caps with 64 electrodes. The ground electrode was attached to the sternum. The device records the online EEG signal against an implicit reference generated by the BrainAmp (Brain Products) amplifier.
In Germany, recordings were undertaken with the BrainVision Recorder (version 1.20.0601, Brain Products). The signal was amplified with the BrainVision BrainAmp DC amplifier. Waveguard™ original caps comprising 64 electrodes were used at the Max Planck Institute in Leipzig. The ground electrode was attached to the sternum, and the reference electrode was attached to the tip of the nose.
EEG signal processing and data analyses
The recorded EEG signal was processed with the Matlab toolboxes EEGLAB (Delorme & Makeig, 2004) and ERPLAB (Lopez-Calderon & Luck, 2014). In order to prepare the EEG signal for the ERP analyses, it was first downsampled to 500 Hz and filtered with a high-pass filter at 0.1 Hz and a low-pass filter at 45 Hz. Afterwards, bad channels were located and the corresponding electrodes were interpolated in order to keep a high number of trials. Finally, all channels were re-referenced to the mean of all channels of the cap. This choice was made because the device used in France records against an implicit reference generated by the amplifier. In the next step, epochs were extracted from the pre-processed EEG signal for the three midline electrodes Fz, Cz and Pz, where the MMN is usually observed (Paavilainen, Karlsson, Reinikainen & Näätänen, 1989; Alho, 1995). The chosen time window ranged from -100 ms to 700 ms relative to stimulus onset. Baseline correction was carried out on the 100 ms before stimulus onset. Finally, an automatic artefact rejection with a threshold of 70 μV was performed.
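As an illustration of the re-referencing and epoching steps, the following Python sketch uses NumPy only. The actual processing was carried out with EEGLAB and ERPLAB; the function names here are hypothetical, and the rejection rule (discard an epoch if any absolute value exceeds 70 μV after baseline correction) is an assumption, since the text specifies only the threshold. Downsampling, filtering and channel interpolation are omitted.

```python
import numpy as np


def average_reference(data):
    """Re-reference (channels x samples) data to the mean of all channels."""
    return data - data.mean(axis=0, keepdims=True)


def extract_epochs(signal, onsets, sfreq=500, tmin=-0.1, tmax=0.7, reject_uv=70.0):
    """Cut epochs from a single-channel signal given in microvolts.

    onsets are stimulus onsets in samples. Each epoch is baseline
    corrected on the pre-stimulus interval and dropped if it exceeds
    the rejection threshold (assumed absolute-amplitude criterion).
    """
    start = int(round(tmin * sfreq))  # -50 samples at 500 Hz
    stop = int(round(tmax * sfreq))   # 350 samples at 500 Hz
    n_baseline = -start
    epochs = []
    for onset in onsets:
        if onset + start < 0 or onset + stop > len(signal):
            continue  # epoch would run past the recording
        epoch = np.array(signal[onset + start:onset + stop], dtype=float)
        epoch -= epoch[:n_baseline].mean()  # baseline: 100 ms before onset
        if np.abs(epoch).max() <= reject_uv:
            epochs.append(epoch)
    return np.array(epochs)


# Toy signal: a small response after sample 1000 and an artefact at 1500.
signal = np.zeros(2000)
signal[1000:1100] = 5.0
signal[1500:1510] = 200.0
epochs = extract_epochs(signal, onsets=[500, 1000, 1450])
```

In this toy example the third epoch contains the 200 μV artefact and is rejected, so two epochs of 400 samples each remain.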
MMN calculation
In order to reflect the paradigm settings in the MMN calculations, the average of standards included all standard stimuli presented in the trials of the oddball paradigm, except for the first two items per trial, whereas the average of deviants included all deviant stimuli, of which there was at most one per trial, always situated at the end of the trial. The averaged waveform of the standards was then subtracted from the averaged waveform of the deviants.
For the vowel duration contrast condition, standards of short vowels were compared to deviants of short vowels and standards of long vowels were compared to deviants of long vowels. For the consonant condition, standards containing [ʃ] were subtracted from deviants containing [ʃ], whereas standards containing [ç] were subtracted from deviants containing [ç].
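The subtraction described above can be sketched in Python as follows. This is a minimal illustration, not the ERPLAB routine used for the study; the function name mmn_difference_wave and the data layout (one list of standard epochs per trial, in presentation order) are assumptions. Both inputs are meant to hold epochs of the same stimulus category, e.g., short-vowel words as standards and short-vowel words as deviants.

```python
import numpy as np


def mmn_difference_wave(standard_trials, deviant_epochs, n_skip=2):
    """Return the deviant-minus-standard difference wave.

    standard_trials: list of trials, each a list of standard epochs
        (1-D arrays) in presentation order.
    deviant_epochs: list of epochs of the same stimulus category
        presented as deviants.
    n_skip: number of leading standards discarded per trial.
    """
    kept = [epoch for trial in standard_trials for epoch in trial[n_skip:]]
    standard_avg = np.mean(kept, axis=0)
    deviant_avg = np.mean(deviant_epochs, axis=0)
    return deviant_avg - standard_avg


# Toy data: constant-valued epochs of four samples each.
standard_trials = [
    [np.full(4, v) for v in (9.0, 9.0, 1.0, 3.0)],
    [np.full(4, v) for v in (9.0, 9.0, 2.0)],
]
deviant_epochs = [np.full(4, 0.0), np.full(4, 1.0)]
wave = mmn_difference_wave(standard_trials, deviant_epochs)
```

In the toy data, the first two standards of each trial (value 9.0) are discarded, so they do not contaminate the standard average.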
The time windows of the ERPs were determined based on the literature and then adjusted visually based on the data (begin: latency where the waveform crosses the zero-line, end: latency after the main peak's minimum).
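As a rough illustration of the data-driven part of this procedure, the sketch below locates the most negative peak inside a literature-based search window, walks back to the preceding zero crossing to obtain the window start, and computes a mean amplitude over a given window. The helper names are hypothetical, the epoch is assumed to start at −100 ms and to be sampled at 500 Hz, and the use of mean (rather than peak) amplitude is an assumption. The rule for the window end is not implemented, because the text does not specify it precisely enough.

```python
SFREQ = 500     # Hz
TMIN_MS = -100  # latency of the first sample in the epoch


def sample_to_ms(index):
    """Latency in ms of a sample index within the epoch."""
    return TMIN_MS + index * 1000 / SFREQ


def ms_to_sample(ms):
    """Sample index within the epoch for a latency in ms."""
    return int(round((ms - TMIN_MS) * SFREQ / 1000))


def negative_peak(wave, lo_ms, hi_ms):
    """Index of the most negative sample inside [lo_ms, hi_ms]."""
    lo, hi = ms_to_sample(lo_ms), ms_to_sample(hi_ms)
    return min(range(lo, hi + 1), key=lambda i: wave[i])


def window_start(wave, peak_index):
    """Walk back from the peak to the nearest sample at or above zero."""
    i = peak_index
    while i > 0 and wave[i] < 0:
        i -= 1
    return i


def mean_amplitude(wave, lo_ms, hi_ms):
    """Mean of the wave over the inclusive window [lo_ms, hi_ms]."""
    lo, hi = ms_to_sample(lo_ms), ms_to_sample(hi_ms)
    segment = wave[lo:hi + 1]
    return sum(segment) / len(segment)
```

In practice the automatically found boundaries would still be checked visually, as described above.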
3. Results
In this section, results are presented starting with the vowel duration contrast, followed by the opposition of [ʃ] and [ç]. Recall that the MMN is an automatic reaction to acoustic changes, whereas the P3a component is thought to reflect attentional resources.
3.1 Vowel duration contrast
MMN
For the MMN amplitude analyses, the time window was fixed at 90–200 ms. A four-way ANOVA including the factors Vowel pair (/ɪ-iː/, /ɛ-ɛː/, /a-aː/), Vowel length (short, long), and Electrode (Fz, Cz, Pz) as within-subject factors, and the factor Group (German natives, French learners of German) as between-subject factor was run. Only the interaction Group × Electrode was significant (F(2, 74) = 5.60, η2 = .009, p < .01). Planned comparisons indicated that the MMN amplitude was significantly larger at Fz (M = −0.26 μV, SD = 1.26 μV) than at both Cz and Pz (M = 0.14 μV, SD = 1.32 μV) for the German group (F(1, 357) = 7.28, p < .01), while the MMN did not differ between the three electrodes in the bilingual French–German group (Fz: M = −0.08 μV, SD = 1.62 μV; Cz: M = −0.19 μV, SD = 1.85 μV; Pz: M = −0.12 μV, SD = 1.59 μV). Furthermore, a one-sample t-test comprising the values from Fz, Cz, and Pz against zero revealed that the MMN in the bilingual French–German group was only marginally significant (t(359) = 1.9, p = .05).
In German native speakers, a one-sample t-test comparing the values of Fz against zero was significant (t(119) = |2.2|, p < .05). Figure 4 illustrates this interaction.
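For readers who wish to see how such a one-sample statistic is formed, the sketch below computes t from summary values. It assumes that the Fz mean and standard deviation reported above for the German group are the ones entering the test and that 119 degrees of freedom correspond to 120 observations. The function name is hypothetical, and this is a plausibility check rather than a reproduction of the original analysis.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.0):
    """t statistic and degrees of freedom for a one-sample t-test.

    `sd` is taken to be the sample standard deviation.
    """
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Assumed inputs: Fz values of the German group as reported in the text.
t_value, df = one_sample_t(mean=-0.26, sd=1.26, n=120)
print(f"t({df}) = {t_value:.2f}")
```

Under these assumptions, the absolute value of the statistic falls between 2.2 and 2.3, in line with the reported value.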
With respect to peak latency, a four-way ANOVA including the factors Vowel pair, Vowel length, Electrode as within-subject factors, and the factor Group as between-subject factor was run. Neither main effects nor interactions were found.
P3a
The P3a amplitude values, expected to peak between 220 and 280 ms, were extracted for each participant in the time window between 190 and 240 ms after stimulus onset. No P3a was observed in either speaker group.
Late negativity
Interestingly, we found a negative deflection at Pz in the time window between 400 and 460 ms after stimulus onset. The amplitude values of this negativity were extracted for each participant from that time window at the Pz electrode. A four-way ANOVA with the within-subject factors Condition (standard, deviant), Vowel pair, and Vowel length, and the between-subject factor Group was run. The ANOVA showed a marginally significant main effect of Condition (F(1, 37) = 3.15, η2 = .002, p = .08), which led us to carry out separate analyses for each group. Following the assumption that a late negativity in a time window around 400 ms might indicate the involvement of lexical access processes, further analyses were carried out for the Pz electrode, where the N400 is usually observed. A three-way ANOVA was run for each group comprising the within-subject factors Condition, Vowel pair, and Vowel length. A main effect of Condition was observed only in German native speakers (F(1, 19) = 7.57, η2 = .02, p < .05), as illustrated by Figure 5. The effect had an amplitude of −0.30 μV (SD = 1.55 μV).
With respect to peak latency at the Pz electrode, a four-way ANOVA with Condition, Vowel pair, and Vowel length as within-subject factors and Group as between-subject factor showed no main effects or interactions.
3.2 Opposition of [ʃ] and [ç]
MMN
The MMN amplitude was extracted in the time window between 350 and 550 ms after stimulus onset, which takes into account the delay between the beginning of the pseudo-word and the start of the target fricative.
A four-way ANOVA including the factors Consonant (ʃ, ç), Position (word-internal, word-final), Electrode (Fz, Cz, Pz) as within-subject factors, and the factor Group (German natives, French learners of German) as between-subject factor was conducted. Neither significant main effects nor interactions were found.
P3a
The P3a amplitude values were extracted for each participant in the time window between 400 and 700 ms after stimulus onset. No P3a was observed in either speaker group.
Late negativity
Because the target phones began late in the stimuli, no late negativity could be observed within the extracted epochs.
4. Discussion
Our goal was to determine to what extent the phonological system of native speakers of French is sensitive to phonological specificities of the German language. To this end, we investigated the perception of German 1) vowel duration contrasts and 2) acoustically close consonant oppositions such as [ʃ] and [ç] in French learners of German using an EEG auditory oddball multi-speaker experiment. Our motivation for selecting these contrasts specifically was that the PAM-L2 model makes different predictions about their processing in French native listeners. Furthermore, we were interested in using sound pairs with different occurrence frequencies in the German language: vowel length oppositions are frequent in the German lexicon and spoken language, whereas the fricative opposition of [ʃ] and [ç] is less so. Moreover, whereas the variation of vowel duration has a lexical value (minimal pairs) in German, it is only a phonetic variation in standard French without lexical consequences. For this purpose, twenty native speakers of French and twenty native speakers of German were tested.
Results show that the vowel duration contrast and the opposition of [ʃ] and [ç] were processed in a different way by the participants of the two groups. In German native speakers, an MMN was present for all the tested vowel duration contrasts. Importantly, the spectral differences of the vowel pairs, i.e., /ɪ-iː/, /ɛ-ɛː/ and /a-aː/, had no impact on their MMN amplitude. This result suggests that, at the early stage of auditory processing, vowel duration alone is sufficient for the change to be detected. Vowel duration seems to be a more salient feature than vowel quality. The vowel length contrast is marked by duration and, depending on vowel type, by spectral differences. The perceived information is therefore acoustically rich and multidimensional. Thus, changes in the stimuli are detected at an early auditory stage.
With respect to the opposition of [ʃ] and [ç], unexpectedly no MMN was found in German native speakers. This observation might be explained by the acoustic properties of the tested stimuli. The [ʃ] and [ç] fricatives only display limited acoustic differences. The respective centres of gravity of the two fricatives are situated in similar frequency bands. [ʃ] presents merely a stronger friction sound than does [ç]. The results suggest that the acoustic-phonetic differences between [ʃ] and [ç] are not salient enough to be detected by the perceptual system at an early stage of speech processing. The latter argument is reflected by an absence of an MMN in both speaker groups.
Critically, with respect to the addressed question in the present study, French learners of German failed to show an MMN in response to the processing of the two tested contrast types. However, and importantly, we found an emerging negativity distributed across the three midline electrodes (i.e., Fz, Cz, Pz) in association with the vowel duration contrast at an early stage (i.e., 90–200 ms). This distributed but emerging negativity might indicate processing difficulties of the otherwise rich acoustic differences between short and long vowels in German at this early auditory stage after stimulus onset. As for German natives, the observed MMN amplitude did not vary according to the tested vowel pairs in non-native listeners. This result suggests that French learners of German have difficulties separating short from long vowels even if spectral differences are present depending on the vowel pair. We recall that the French lexicon does not present minimal pairs opposing short and long vowels. Acoustic variability was introduced in our experiment using stimuli produced by multiple speakers. Interestingly in our experiment, the MMN amplitude in the non-native group was not correlated with their language proficiency, their musicality, or their amount of German practice per week.
In both speaker groups, no P3a could be observed for either of the tested contrasts. A possible explanation for this absence of effect might be that the linguistic stimuli in the multi-speaker experiment did not present a salient enough acoustic difference in order to capture the participants’ attention. The P3a-component is thought to reflect attentional resources, whereas the MMN is an automatic reaction to acoustic changes. It is possible that the stimuli elicited an MMN, as for the native group, at least for the vowel duration contrast, but did not trigger the involuntary attention shift assumed to be marked by the P3a. The multi-speaker design might have hindered the involuntary attention shift for deviant stimuli as the acoustic properties of the stimuli changed with each item in the stimuli stream. Thus, the acoustic differences of the vowels and consonants might have been partially masked due to speaker change, at least at an early attentional level.
Importantly, at a later stage of language processing, our results showed a late negativity in German native speakers in a time window between 400 and 460 ms after stimulus onset for all tested vowel length contrasts. The stimuli used for this experimental condition were natural German words. Hence, we hypothesize that the stimulus changes triggered lexical access, as unrelated word pairs do in a priming experiment (e.g., bitten [ˈbɪtən] (to ask, to solicit) - bieten [ˈbiːtən] (to offer)), at least in the native speaker group. The phonemic changes leading to a lexical change might have contributed to creating an incongruent word pair situation in native speakers of German but not in second language learners. The late negativity might be an N400-like component. The standard and deviant stimuli were always existing minimal pairs of the German language, which seemed to trigger the N400-like neural response.
In the light of these results, we will now discuss the predictions made by the PAM-L2 model. Our results clearly show that the vowel duration contrast and the [ʃ]-[ç] contrast are not processed in the same way by either of the two groups, German natives and French learners of German. However, the L2 speech models make predictions for L2 learners without taking into consideration the acoustic information that is carried by the different contrasts. Our experiment showed that a rich acoustic contrast such as the German vowel duration contrast elicits an MMN in German natives and an emerging negativity in French learners of German. However, the negativity's amplitude was smaller in the learners’ group and displayed a different distribution in comparison with the native speakers (negativity more broadly distributed over the midline electrodes in the non-native group). Furthermore, our data indicated that contrasts with small acoustic differences such as the opposition of [ʃ] and [ç] failed to elicit an MMN in both speaker groups.
According to the PAM-L2 model, the vowel duration contrast can be considered a “category goodness difference” and should present processing differences according to the tested vowel pairs. Contrasting vowel pairs that differ not only in duration but also in spectral properties (e.g., /ɪ-iː/) should be discriminated more effortlessly than vowel pairs that differ only in duration (e.g., /a-aː/). Our results did not support this hypothesis. To discriminate the German vowel contrast, French native speakers seem to process the non-native phonological features [-long] and [+long] more easily than the non-native spectral changes in the vowel pairs. Furthermore, the model predicted less successful discrimination of the vowel duration contrast in L2 speakers than in L1 speakers. This prediction was confirmed by our results.
With respect to the opposition of [ʃ] and [ç], the model predicted no or very little discrimination in French learners of German because of the phones’ assimilation to one single native phonemic category. This prediction holds, but also meets some limits. Our data showed that both groups lacked an MMN for the fricatives. In the light of the MMN, which indicates early auditory processing mechanisms, we hypothesize that the acoustic differences of [ʃ] and [ç] in a multi-speaker design are not salient enough to trigger an automatic early auditory response in human listeners. Thus, the poor discrimination is not directly linked to the participants’ first language but rather to the acoustic properties of the tested contrasts.
Our results suggest that a comparison of the L2 system to the phonetic or phonemic inventory of the learners’ L1 does not always allow fine-grained predictions of perception difficulties in L2 speakers, especially with respect to phones that present few acoustic differences and whose contrast is difficult to perceive even for native speakers who are exposed to them on a daily basis.
5. Conclusions
The present study showed that the German vowel duration contrast and the consonant opposition involving the fricatives [ʃ] and [ç] are processed in different ways by German native and non-native participants. All tested vowel duration contrasts elicited an MMN in German native speakers and an emerging MMN with a broader midline distribution in French learners of German, whereas the fricative contrast did not elicit an MMN in either group. These results suggest that the non-native speakers in our experiment, although exposed to the German language on a regular basis, had processing difficulties with respect to the German vowel duration contrasts. The absence of the N400-like effect in the learner group also indicates that the vowel duration contrast is not (yet) treated as a phonological difference by the tested non-native participants. Regarding the fricative contrast, our results suggest that the acoustic differences were not salient enough to elicit an automatic auditory response in either listener group, at least in a multi-speaker setting. It seems that the perception of non-native contrasts is conditioned not only by the similarity of the non-native phones to the phones included in the phonetic inventories of the learners’ L1 but also by the acoustic properties of the stimuli themselves. Our data indicate that the acoustic differences of [ʃ] and [ç] are not salient enough to elicit an MMN, at least in a multi-speaker experiment.
Acknowledgments
We would like to thank the Max-Planck-Institute for Human Cognitive and Brain Sciences in Leipzig (Germany) for allowing us to collect data there and to exchange with its members. We would further like to thank Cornelia Henschel and Mahsa Bahrami, who helped with the recruitment of participants and data collection. This work was funded by the French Investissements d'Avenir - Labex EFL program (ANR-10-LABX-0083).