Introduction
Spoken word recognition is continuous and dynamic: as spoken words unfold, lexical alternatives compete for recognition (TRACE, McClelland & Elman, 1986; Allopenna, Magnuson & Tanenhaus, 1998). For example, as beach unfolds, cohorts, words with similar initial phonology, compete for recognition (e.g., bead). As the word continues, rhymes and other neighbors are activated (e.g., peach, batch). Competition suggests an interaction between top-down lexical representations and bottom-up phoneme processing. For bilinguals, competition is not restricted to one language (e.g., for an English–French bilingual, beach may also activate second language competitors like bien; Marian & Spivey, 2003a; Spivey & Marian, 1999). The question of the basis of between-language connections is, therefore, important: do bilinguals have separate language-specific lexical representations, or do they have lexical representations that encode phonology from both languages, and a language selection mechanism that allows words to be identified correctly from the target language? We used event-related potentials (ERPs) during a picture/spoken word matching task to investigate the dynamics of cross-language connections in bilinguals.
Temporally sensitive measures like eyetracking and ERPs have been used to investigate spoken word processing within languages. Eyetracking has revealed that phonologically similar words compete for recognition, marked by more looks to items that are phonologically related to targets (e.g., cohorts and rhymes) vs. unrelated distractors (Allopenna et al., 1998; Desroches, Joanisse & Robertson, 2006). ERPs have further characterized the role of phonological similarity, with two components being relevant to the current study. The Phonological Mapping Negativity (PMN) is a negative-going deflection that occurs in the 200–400 ms range, over frontal/central sites (see Lewendon, Mortimore & Egan, 2020). The PMN reflects bottom-up phoneme mapping and exhibits greater negativity when incoming phonemes mismatch a phonological expectation. The N400 is a negative-going central/posterior component peaking at approximately 400 ms post-stimulus onset. It reflects lexical-semantic processing and exhibits greater negativity when incoming semantic or lexical content differs from expectation (Connolly & Phillips, 1994).
ERPs during picture/spoken-word matching have revealed differences in processing cohorts vs. rhymes (Desroches, Newman & Joanisse, 2009). Cohort mismatches (e.g., see cone, hear comb) result in no modulation of the PMN and late increased N400 negativity compared to unrelated and rhyme mismatches. These large late N400s for cohort mismatches reflect the simultaneous mismatch of phonology and lexicality, and the effortful processing required when initial bottom-up information matches an expectation. In contrast, rhyme mismatches (e.g., see cone, hear bone) elicit a PMN and a reduced N400. The data suggest rhyme competitors are activated via top-down connections, leading to facilitated recognition of rhyme mismatches. The differential modulation of the PMN and N400 for cohorts and rhymes has been robust across typical development in English (Desroches et al., 2009; Desroches, Newman, Robertson & Joanisse, 2013; Malins, Desroches, Robertson, Newman, Archibald & Joanisse, 2013), and in other languages (e.g., in Mandarin, rhyme and tonal mismatches showed similar facilitation; Malins & Joanisse, 2012; Malins, Gao, Tao, Booth, Shu, Joanisse, Liu & Desroches, 2014).
The dynamics of bilingual language processing
Several models have been proposed to explain language processing in bilinguals. For example, the Bilingual Language Interaction Network for Comprehension of Speech Model (BLINCS; Shook & Marian, 2013) has interconnected levels of processing that are dynamic and interactive. When activation occurs at the conceptual level, lexical options from both languages could be subsequently activated. Lexical representations are structured via self-organizing maps: that is, words within a language tend to cluster together. However, words that overlap across languages are situated closer together relative to lexical items that do not overlap. BLINCS suggests that language co-activation occurs during processing, such that within-language and between-language cohorts and rhymes compete via feedback.
There is ample evidence that bilinguals activate both languages when processing a single language (e.g., Dijkstra, Grainger & van Heuven, 1999; Duyck, 2005; Friesen, Jared & Haigh, 2014; Haigh & Jared, 2007; Jared & Kroll, 2001; Hermans, Bongaerts, De Bot & Schreuder, 1998; Marian & Spivey, 2003a; Spivey & Marian, 1999). Cross-language phonological competition has been revealed with eye-tracking using the visual world paradigm (Spivey & Marian, 1999; Marian & Spivey, 2003a, 2003b). For example, when Russian–English bilinguals were asked to move a marker (in their L2) in an array of objects, there was a greater proportion of looks to a stamp (marka in Russian) vs. unrelated objects. In contrast, monolinguals treated cross-language cohorts as unrelated distractors (Marian & Spivey, 2003b). This cross-language effect was robust in bilinguals’ second language and of similar magnitude to within-language competitors (Marian & Spivey, 2003a). However, an impact of the second language on the first language was not observed, which the authors attributed to weaker activation of the second language due to lower proficiency in late bilinguals. Marian and Spivey's findings suggest bilinguals process between-language and within-language cohorts similarly; if activation is sufficiently strong, cross-language cohorts compete for recognition.
Very few studies have examined the neural basis of L2 lexical activation during L1 processing. Recently, Bobb and colleagues (Bobb, Von Holzen, Mayor, Mani & Carreiras, 2020) examined whether Spanish–Basque bilinguals activate labels from both languages when presented with pictures during an L1 task. In critical trials, the picture was followed by either an unrelated L1 auditory word or an L1 word that rhymed with the L2 label of the picture. ERPs were time-locked to the onset of the auditory word. The results indicated that labels for the picture in both languages were activated, because bilinguals responded differently to cross-language rhymes than to unrelated words. However, the interpretation of the temporal dynamics in this dataset was challenging because, unexpectedly, the rhyme condition produced greater negativity than the unrelated condition.
Friesen and colleagues (Friesen, Chung-Fat-Yim & Bialystok, 2016; Study 2) investigated cross-language cohort effects. English–French bilinguals saw two pictures and simultaneously heard an English auditory cue; the task was to identify which picture matched the spoken word. The phonological competitor shared cross-language onset with the target. ERPs did not differ between this condition and the unrelated condition. Since stimuli were presented simultaneously, there may not have been sufficient time for cross-language phonological competition to occur. Thus, further investigation of the neural bases of cross-language competition is warranted using a paradigm that has established clear effects with monolinguals.
The current study
We used ERPs to examine spoken word recognition in English–French bilinguals and English monolinguals during a picture/spoken-word matching task (adapted from monolingual research; Desroches et al., 2009; Desroches et al., 2013; Malins et al., 2013, 2014). We included match (BEACH-beach), unrelated mismatch (BEACH-tack), and second language cohort mismatch (L2-cohort: BEACH-plaid; which overlaps with plage, the French word for beach) trials. Past research with monolinguals showed that within-language cohort mismatches resulted in no modulation of the PMN and larger late-going N400s, reflecting the initial matching of the bottom-up input to the phonological expectation (e.g., cone and comb do not mismatch until the final phoneme). Thus, if between-language cohorts behave like within-language cohorts on this task, this would require the bilingual to generate an explicit phonological expectation for the target picture in both of their languages (e.g., they expect to hear either beach or plage). However, phonological representations can be activated even if they are not explicitly expected, as has been observed for rhyme and tonal mismatches within-language. That is, even when a participant develops a strong target expectation, top-down connections from that target lead to activation of competitors, marked by reduced N400s to these types of mismatches. It is possible that L2-cohort mismatches might behave similarly: participants develop an explicit expectation for the L1 target phonology; however, the L2 target phonology is nevertheless activated via top-down representations.
In this case we would expect that L2-cohort mismatches would result in an increased PMN compared to matches (reflecting the mismatch between the bottom-up input and phonological expectation of the L1 target), and a reduced N400 compared to unrelated mismatches, reflecting the top-down activation of the L2 phonology and its overlap with the spoken word.
Method
Participants
Data were collected from 40 right-handed participants who self-identified as functionally monolingual or as English–French bilinguals. Data were excluded from 2 bilinguals who named fewer than 70% of the stimuli in French, and 2 monolinguals who had excessive EEG artifacts. Data were also excluded from 3 participants who self-identified as monolinguals but successfully named more than 40% of the stimuli in French. The final sample included 17 bilinguals (M age = 24.1, SD age = 3.5; 11 females) and 16 monolinguals (M age = 24.1, SD age = 4.0; 10 females).
Table 1 includes self-report data on age of language acquisition, current language use, and language proficiency ratings for each language for each group. It also reports the number of years participants were educated solely in English and the number of years they took at least one French course. Note that monolinguals were required to take a French course from middle school (age 10) to the first year of high school while at English school (4 years on average). Bilinguals had on average only 2.3 years of education without any French instruction (often at university). On average, they spent 13.9 years with instruction in French for all core subjects. Importantly, bilinguals and monolinguals did not differ on their age of English acquisition, t(29) = -0.07, ns, or their self-rated English proficiency, t(29) = 1.21, ns. They did differ significantly on the age of French acquisition, t(29) = 6.05, p < .001, their self-rated French ability, t(29) = 11.32, p < .001, and on French picture naming accuracy, the objective measure of French proficiency, t(29) = 17.88, p < .001.
Materials and procedure
For the picture-word matching task (E-prime-2), each picture (e.g., beach) was paired with three monosyllabic auditory stimuli: a match (e.g., beach), an unrelated mismatch (e.g., tack) and an L2-cohort competitor mismatch (L2-cohort, e.g., plaid). L2-cohort competitors shared cross-language onset with the correct word (i.e., plaid is unrelated to beach in English; however, it sounds like the French word for beach, plage). See the Appendix for the list of items (Supplementary Materials). Pictures were color stock images on a white background (27” monitor; resolution, 1920 x 1080), and spoken words were presented over speakers (recorded by an adult female English monolingual; 16 bits, 44,100 Hz, normalized).
There were 232 trials with equal proportions of matches and mismatches. There were 29 critical trials for each condition (match, L2-cohort, unrelated). Frequency did not differ across conditions (HAL, Lund & Burgess, 1996; Match: 40442.8, L2-Cohort: 27591.0, Unrelated: 27591.0, F(2,56) = 0.5, p = .65). Filler trials, which had no L2-cohort competitors and were not analyzed, were included to ensure equal proportions of matches and mismatches (87 match and 58 unrelated fillers). On each trial, a fixation-cross appeared for 250 ms, followed by the picture. After 1500 ms, a spoken word was presented while the picture remained on screen. Participants indicated whether the spoken word matched the picture with a button press (“yes,” left index finger; “no,” right index finger). They were allowed 2500 ms from the onset of the spoken word to respond, with 1000 ms between the response and the next trial.
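The trial composition described above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are taken from the text, while the dictionary layout and variable names are assumptions introduced here.

```python
# Trial counts as reported in the text; names and layout are illustrative.
critical = {"match": 29, "l2_cohort": 29, "unrelated": 29}
fillers = {"match": 87, "unrelated": 58}

# Matches come from critical match trials plus match fillers.
n_match = critical["match"] + fillers["match"]

# Mismatches come from both critical mismatch conditions plus unrelated fillers.
n_mismatch = critical["l2_cohort"] + critical["unrelated"] + fillers["unrelated"]

total = n_match + n_mismatch
print(n_match, n_mismatch, total)  # 116 116 232
```

With these counts, half of all trials require a "yes" response and half a "no" response, which is the balance the filler trials were included to achieve.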
When bilingual participants were recruited, they were informed that they needed to be proficient in both English and French. However, when greeted at the lab, no mention of French was made. Prior to the experiment, participants were asked to name each picture in English to ensure they would develop strong expectations of that label. If a different label was provided, corrective feedback was given (e.g., saying flower instead of rose). At no point prior to the experiment were participants explicitly cued to the French words for the items. Participants performed six practice trials to ensure that they understood the procedure. Following the experiment, we assessed participants’ French knowledge by asking them to report the French name for each picture, and via a language history questionnaire.
Electrophysiological recording
Electroencephalogram (EEG) was recorded at 500 Hz (32 Ag/AgCl active electrodes; actiCHamp, Brain Products GmbH, Gilching, Germany, with a standard 10/20 placement), digitally referenced to Fz (re-referenced to TP9/TP10, approx. mastoids), with impedances kept below 20 kΩ. Data (analyzed with Brain Analyzer 2.0) were filtered off-line (60 Hz notch; 0.01 Hz, 24 dB/Oct zero phase shift high pass). Data were segmented (-200 to 800 ms) and trials were baseline corrected to the pre-stimulus interval. Blinks were removed using ICA ocular correction (one component per participant on average). Trials with voltages exceeding ±85 μV at the sites of interest identified by past research (Fz/Cz/Pz) were removed; rejection was restricted to these channels to avoid discarding trials with high voltages only on channels not of interest. Data were averaged for each participant for each condition. Participants had an average of 25.4, 26.6, and 26.3 trials included for the match, L2-cohort, and unrelated conditions, respectively. Mean amplitudes were calculated for intervals of interest, selected based on both visual inspection of the data and on past research (N400: 350–500 ms; PMN: 300–350 ms). After analyses were complete, we applied a low pass filter for visualization (30 Hz, 24 dB/Oct).
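To make the epoch-level steps concrete, the following is a minimal single-channel sketch of baseline correction, amplitude-based rejection, averaging, and mean-amplitude extraction. It is not the authors' pipeline (which used commercial software): the function names, the half-open window convention, and the choice to apply rejection after baseline correction are all assumptions made for illustration. Only the sampling rate, epoch limits, rejection threshold, and window boundaries come from the text.

```python
SAMPLING_RATE_HZ = 500            # from the text
EPOCH_START_MS = -200             # epoch runs from -200 to 800 ms (from the text)
REJECT_UV = 85.0                  # rejection threshold in microvolts (from the text)
WINDOWS_MS = {"PMN": (300, 350), "N400": (350, 500)}  # from the text


def ms_to_index(t_ms):
    """Convert a latency relative to word onset (ms) into a sample index."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def baseline_correct(epoch):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    n_base = ms_to_index(0)
    baseline = sum(epoch[:n_base]) / n_base
    return [v - baseline for v in epoch]


def is_clean(epoch):
    """Keep an epoch only if no sample exceeds the rejection threshold."""
    return all(abs(v) <= REJECT_UV for v in epoch)


def mean_amplitude(waveform, window_ms):
    """Mean voltage in a window; the end point is excluded (an assumption)."""
    start, end = (ms_to_index(t) for t in window_ms)
    segment = waveform[start:end]
    return sum(segment) / len(segment)


def condition_means(epochs):
    """Average the clean epochs of one condition and measure each window."""
    kept = [e for e in map(baseline_correct, epochs) if is_clean(e)]
    if not kept:
        return None
    average = [sum(samples) / len(kept) for samples in zip(*kept)]
    return {name: mean_amplitude(average, w) for name, w in WINDOWS_MS.items()}


# Synthetic example: one epoch with a -4 µV deflection in the N400 window,
# and one epoch containing a 100 µV artifact that should be rejected.
good = [0.0] * 500
for i in range(ms_to_index(350), ms_to_index(500)):
    good[i] = -4.0
bad = [0.0] * 500
bad[300] = 100.0

result = condition_means([good, bad])
print(result)  # {'PMN': 0.0, 'N400': -4.0}
```

In a real analysis the rejection test would be evaluated across the channels of interest rather than on a single channel, and the resulting per-participant means would feed the ANOVAs reported in the Results.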
Results
This task is designed to be easy, and, as in past studies, accuracy was at ceiling (see Table 2). Also as in past studies, there was an effect of condition on response latency (F(2,62) = 30.50, p < .001), with match trials showing shorter latencies than mismatches (ps < .001, Bonferroni corrected). No significant interactions or group effects were observed for either measure; thus, we focused on analysis of the electrophysiological data.
N400
The N400 was analyzed from 350–500 ms at posterior sites (Figure 1). A 3-site (Pz, P3, P4) by 3 condition (match, L2-cohort, unrelated), by 2 group (bilingual, monolingual) repeated measures ANOVA revealed a main effect of condition (F(2,62) = 35.56, p < .001, η² = .534). Post hoc comparisons revealed that both the L2-cohort and the unrelated mismatch had significantly more negative amplitude than the match condition (p < .05, Bonferroni corrected). Overall, no differences emerged between mismatch conditions. However, this main effect was qualified by a group by condition interaction that approached significance (F(2,62) = 3.03, p = .061, η² = .089). We had strong a priori predictions related to group differences between the mismatch conditions, which may have been masked in this ANOVA due to the inclusion of the match condition. To pursue our a priori predictions regarding group effects on the unrelated and L2-cohort mismatches, a 3-site (Pz, P3, P4) by 2 condition (L2-cohort, unrelated), by 2 group (bilingual, monolingual) repeated measures ANOVA was performed. This ANOVA revealed a significant interaction between group and condition (F(1,31) = 5.39, p = .027, η² = .148, estimated power = 0.991; G*Power 3.1, Faul, Erdfelder, Buchner & Lang, 2009). Bonferroni pairwise comparisons within each group revealed that for bilinguals, N400s were significantly reduced for L2-cohort vs. unrelated mismatches (p = .013). No differences were observed between conditions for monolinguals (p = .507).
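The effect sizes reported above can be related to the F statistics with a short worked example. The text does not state which variant of eta squared was computed; the sketch below assumes partial eta squared, recovered from F and its degrees of freedom, and simply shows that the reported values are consistent with that assumption. The function name is introduced here for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported effect size), values from the text.
reported = [
    ("N400 main effect of condition", 35.56, 2, 62, 0.534),
    ("N400 group x condition (3 conditions)", 3.03, 2, 62, 0.089),
    ("N400 group x condition (2 conditions)", 5.39, 1, 31, 0.148),
]

computed = {}
for label, f_value, df_effect, df_error, eta in reported:
    computed[label] = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {computed[label]:.3f}, reported {eta:.3f}")
```

Under this assumption each computed value rounds to the reported one, which suggests, but does not confirm, that the reported effect sizes are partial eta squared.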
PMN
The PMN was analyzed from 300–350 ms at F3, Fz, F4, C3, Cz, and C4 (Figure 1). These sites were selected since the PMN typically has a frontal-central distribution (Lewendon et al., 2020). A 2-row (frontal, central) by 3 column (left, center, right) by 3 condition (match, L2-cohort, unrelated), by 2 group (bilingual, monolingual) repeated measures ANOVA revealed a main effect of condition (F(2,62) = 8.11, p = .001, η² = .207). Bonferroni pairwise comparisons revealed significant PMNs, marked by greater negativity for each of the mismatch conditions compared to match (ps < .005). No difference in the PMN was observed between L2-cohort vs. unrelated mismatches (p = 1.0). There were no significant interactions that included both group and condition (all Fs < 1.1).
Discussion
We used a picture/spoken-word matching paradigm to examine cross-language cohort competition in English–French bilinguals vs. monolinguals. We found group differences in N400 modulations to L2-cohorts vs. unrelated mismatches. Specifically, bilinguals showed reduced N400s to L2-cohorts vs. unrelated mismatches, while monolinguals treated both conditions similarly. Importantly, the results revealed that bilinguals activated the pictures’ labels in both their languages when performing a task in L1. Moreover, they help to specify the nature of cross-language connections, indicating that top-down cross-language connections exist between lexical representations and phonology.
Our study revealed that between-language cohort mismatches resulted in large PMNs and reductions in the N400. These results contrast with the reduced PMNs and the large late-going N400s observed in previous studies of within-language cohorts (Desroches et al., 2009; Malins et al., 2013). Thus, although our results indicate between-language cohort activation occurred, the nature of this cohort activation differs from within-language cohort effects. These results may be accounted for by considering an expectancy generation explanation. In a within-language task, listeners generate a phonological expectation from the picture in the target language (e.g., beach). When a within-language cohort is presented (e.g., bead), the bottom-up input is initially consistent with the expectation, resulting in the reduced PMN. At the point of uniqueness, the listener must engage in effortful processing to identify the cohort as a mismatch, resulting in greater N400 negativity. For between-language cohorts, our results indicate that bilinguals do not develop an explicit phonological expectation of the French word (e.g., plage); thus, the between-language cohort (e.g., plaid) does not elicit the same pattern of results as a within-language competitor on this task.
Nevertheless, the results indicate that bilinguals do activate L2 phonological representations for targets on this task, marked by reduced N400s to L2-mismatches. This pattern of response for L2-cohorts for bilinguals is strikingly similar to what has been observed for rhymes and tones, which also show reduced N400 effects (e.g., Desroches et al., 2009; Malins et al., 2013). Desroches and colleagues suggested that this N400 reduction occurs because top-down connections from the expectation to phonologically similar items lead to facilitated recognition of rhyme mismatches. We suggest that a similar process operates for the L2-competitors for bilinguals on this task: seeing a picture of a beach leads to top-down activation of the phonemes in plage, even though plage is not expected. For L2-mismatch trials, hearing plaid after seeing beach leads to reduced N400 responses compared to unrelated mismatches, which could only happen if L2 phonology was activated.
Despite the differences between within-language and between-language cohort effects, we do not consider our findings to be inconsistent with those of Marian and colleagues (Spivey & Marian, 1999; Marian & Spivey, 2003a, 2003b), but rather complementary. For cohorts in the visual world paradigm there is explicit competition: listeners select between objects in a display, and both target and cohort pictures match the initial bottom-up input. In contrast, our design does not require the listener to select between objects. Instead, our paradigm investigates how top-down expectations may influence and interact with subsequent bottom-up processing. Importantly, this effect, marked by reduced N400s compared to unrelated mismatches, has been observed using rhyme mismatches in English, rhyme and tonal mismatches in Mandarin, and now using cross-language cohorts in English–French bilinguals.
Our results provide strong evidence of language co-activation consistent with models of written and spoken bilingual word recognition (e.g., Dijkstra, Wahl, Buytenhuijs, van Halem, Al-Jibouri, de Korte & Rekké, 2019; Shook & Marian, 2013). However, current bilingual models underspecify the role of language expectations in word recognition. BLINCS (Shook & Marian, 2013) is driven by bottom-up relationships between words and does not specify a global language system to account for language expectancies. Similarly, the Multilink model (Dijkstra et al., 2019) does not specify a top-down role for language nodes to generate expectations about single or dual language input, but rather explains effects at the lexical level through resting activation. This distinction is important, given that our results suggest that language expectations themselves impact the nature of cross-lexical activation in our paradigm. The current study demonstrates the importance of expectancy in how language co-activation is manifested at the neural level, and this will need to be accounted for in future models of bilingual activation.
Supplementary Material
Supplementary material can be found online at https://doi.org/10.1017/S1366728922000049