Navigating the bilingual cocktail party: a critical role for listeners’ L1 in the linguistic aspect of informational masking

Emilia Lew; Sophie Hallot; Krista Byers-Heinlein; Mickael Deroche

doi:10.1017/S1366728924000944

Published online by Cambridge University Press: 04 December 2024

Emilia Lew: Affiliation:
Laboratory for Hearing and Cognition, Psychology Department, Concordia University, Montreal, QC, Canada; Centre for Research on Brain, Language & Music, Montreal, QC, Canada
Sophie Hallot: Affiliation:
School of Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada; Centre for Research on Brain, Language & Music, Montreal, QC, Canada
Krista Byers-Heinlein: Affiliation:
Concordia Infant Research Laboratory, Psychology Department, Concordia University, Montreal, QC, Canada; Centre for Research on Brain, Language & Music, Montreal, QC, Canada
Mickael Deroche*: Affiliation:
Laboratory for Hearing and Cognition, Psychology Department, Concordia University, Montreal, QC, Canada; Centre for Research on Brain, Language & Music, Montreal, QC, Canada
*Corresponding author: Mickael Deroche; Email: mickael.deroche@concordia.ca

Abstract

Cocktail party environments require listeners to tune in to a target voice while ignoring surrounding speakers. This presents unique challenges for bilingual listeners who have familiarity with several languages. Our study recruited English-French bilinguals to listen to a male target speaking French or English, masked by two female voices speaking French, English or Tamil, or by speech-shaped noise, in a fully factorial design. Listeners struggled most with L1 maskers and least with foreign maskers. Critically, this finding held regardless of the target language (L1 or L2), challenging theories about the linguistic component of informational masking, which, contrary to our results, predict stronger interference with greater target-to-masker similarity (e.g., L2 vs L2 compared to L2 vs L1). Our findings suggest that the listener’s familiarity with the masker language is an important source of informational masking in multilingual environments.

Keywords

speech intelligibility; auditory masking; bilinguals; cocktail party problem

Type: Research Article
Information: Bilingualism: Language and Cognition, First View, pp. 1-9

DOI: https://doi.org/10.1017/S1366728924000944
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Open Practices: Open data
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Highlights

• Two hypotheses are at play in speech intelligibility against different languages.
• Recognition of L2 targets (but not L1 targets) can disentangle the two hypotheses.
• All bilinguals had more difficulties ignoring an L1 masker than an L2 or foreign masker.
• Informational masking is particularly strong with a native masker.

1. Introduction

Imagine you are attending a crowded cocktail party, trying to hear what a friend is saying over the noise in the room. To make matters even more challenging, you happen to be at a scientific conference in Montreal with international attendees speaking a variety of different languages. In this situation, would you have more trouble understanding your friend if they spoke your first or your second language? Would it matter what language the other guests were speaking? There is considerable literature on the cocktail party problem (Cherry, 1953), and most of this research is concerned with the mechanisms underlying speech recognition, spatial hearing, auditory masking and source segregation, among other factors (McDermott, 2009). To conduct such studies, experimenters primarily recruit listeners who are first-language speakers of the materials used in the test and tend to test monolingual listeners, disregarding other language(s) they may have been exposed to (Linck, Osthus, Koeth & Bunting, 2014; Melby-Lervåg & Lervåg, 2014; Yow & Li, 2015). While this approach can be useful in reducing variability between participants to delve into psychoacoustics and the mechanisms of speech processing, these studies do not address the experiences of the estimated half of the global population who speak two or more languages (Grosjean, 2012).
Research that has included bilingual participants has often compared them to monolinguals (e.g., Cooke et al., 2008; Broersma & Scharenborg, 2010; Lecumberri et al., 2010; Calandruccio & Zhou, 2014; Bidelman & Dexter, 2015) and rarely explored differences amongst bilinguals in their speech perception performance (Luk, 2015; de Bruin, 2019; DeLuca, Rothman, Bialystok & Pliatsikas, 2019; Kim et al., 2019). Yet, these individual differences (e.g. whether they rely more on speaking/listening proficiency, speaking/listening use or age of acquisition) could provide insight into the top-down processes that help deal with cocktail party situations (Bregman, 1990). More precisely, they provide a valuable opportunity to rethink the linguistic aspect of informational masking.

1.1. Energetic and informational masking

In situations where multiple auditory signals are present, two types of phenomena can interfere with a listener’s ability to detect and process the target signal: energetic masking and informational masking. Energetic masking is relatively well-defined (Culling & Stone, 2017): it occurs when the target and masking stimuli share similar acoustic content, for example, when two people are talking at the same time (temporal similarity) or have similar spectral content (frequency similarity). On the other hand, there is no positive, accepted definition of informational masking. It is often defined negatively, as what is left of the difficulty in understanding a masked speaker, after energetic masking has been accounted for (Kidd, Mason, Richards, Gallun & Durlach, 2008; Bronkhorst, 2015).

To illustrate, it is possible to generate artificial stimuli that approach the spectro-temporal content of masking voices and thus present a similar level of energetic masking while being largely devoid of linguistic units (Hawley, Litovsky & Culling, 2004; Deroche & Culling, 2013; Leclère et al., 2017). When instructed to attend to a target speaker, listeners tend to find such artificial maskers easier to ignore than the voices on which they were modelled. The presence of distracting linguistic units tends to capture listeners’ attention and adds a level of cognitive difficulty due to the automatic processing that occurs upon hearing speech. This is (at least partly) what we refer to as informational masking in a cocktail party environment. Though it is possible to uncover the presence of informational masking with attentional tasks devoid of energetic masking (e.g., random-frequency multitone bursts; Oxenham, Fligor, Mason & Kidd, 2003), in ecological speech-on-speech situations there is always a blurry line between the energetic and informational components (Brungart, 2001; Brungart, Simpson, Ericson & Scott, 2001; Kidd, Mason, & Gallun, 2005). As a result, when a cue allows listeners to segregate the target from the masker perceptually (e.g. a difference in voice pitch or spatial position) to obtain a better performance (aka masking release), the energetic and informational components of the masking release are often difficult to disentangle (e.g. Deroche et al., 2017a). The language of competing speakers is yet another cue involving both energetic and informational masking, but we have little understanding of their respective contributions.

1.2. The masking of competing languages

Let us consider for a moment the sort of masking that occurs specifically because a target and interfering speaker speak in the same language. On the energetic side, sounds that belong to the phonetics of a particular language would likely mask each other more than sounds that belong to another language. For example, English vowels would be more likely to occupy certain spectral regions (despite talker variability), and English syllables more likely to possess the sort of envelope modulations common to other English syllables (this is what makes the rhythm of a given language so unique). Classically, this phenomenon is demonstrated in studies of monolingual participants, who are better able to ignore a masker speaking a foreign language than one speaking their first language, L1 (Van Engen & Bradlow, 2007; Calandruccio, Dhar & Bradlow, 2010). However, note that the energetic account only relies on the fact that the masker language is different (not necessarily foreign), such that the lower target-to-masker acoustic similarity provides a substantial masking release. An informational masking account is also at play in this scenario, and it can be framed in two ways. The first one is also based on a target-to-masker similarity idea but at a linguistic (or more cognitive) level. To illustrate, a listener attending to the sentence “the dog eats a bone” would be more distracted by a competing word taken from the same language or lexical field (e.g. the English word “chocolate”) than a word in a different language or lexical field (e.g. the French word “chocolat”) despite both words approaching a similar spectro-temporal content. The second perspective is that a distracting voice conversing in the listener’s L1 might be particularly difficult to ignore as the brain cannot help but process linguistic units that are familiar (and often native) to the listener. 
This idea is generally referred to as the listener’s language familiarity hypothesis. Notably, with monolingual participants, or with a task restricted to L1 targets, the two hypotheses make similar predictions: a foreign masker would result in lower target-to-masker similarity and would capture less efficiently the listener’s native language system. With targets speaking in a second language (L2), however, the two hypotheses make different predictions: an L2 masker would result in higher target-to-masker similarity than an L1 masker (and this is true at both energetic and informational levels) while less efficiently capturing the listener’s native language system than an L1 masker. Of course, being able to even conduct a speech recognition task with L2 targets requires participants to have a minimum level of L2 proficiency, and this is where studies with bilinguals allow testing these competing hypotheses.

1.3. Bilingual studies

Several studies have focussed on monolingual-bilingual comparisons. For example, English monolinguals and Mandarin-English bilinguals (L1 Mandarin, who had demonstrated lower proficiency in English than monolinguals) were tested on their recognition of English targets versus English or Mandarin maskers (Van Engen, 2010). Mandarin maskers were less difficult for both groups (partly due to less energetic masking), and particularly so for English monolingual listeners. However, it is unclear whether the bilinguals’ knowledge of both masker languages (English and Mandarin) or lower proficiency in the target language (English) drove this difference. In a similar study, English monolinguals and Greek-English simultaneous bilinguals (whose proficiency in English was close to that of the English monolinguals) were tested on their recognition of English targets versus English or Greek maskers (Calandruccio & Zhou, 2014). Again, Greek maskers were less difficult overall, consistent with the energetic masking account, but no interaction between the group and masker language was found. A few more studies (Kilman, Zekveld, Hällgren & Rönnberg, 2014; Brouwer, Van Engen, Calandruccio, & Bradlow, 2012, specifically experiment 2; and Mepham, Bi and Mattys, 2022, specifically experiment 3) have more directly tested the similarity versus familiarity hypotheses: in a nutshell (but see General Discussion for a further description) their results emphasised the primary importance of target-to-masker similarity while spotting signs that the listener’s language background played a role.

One limitation of studies to date is that none used a full factorial design, where bilingual participants were tested in conditions with L1 and L2 targets, against L1 and L2 (or foreign-language, Lf) maskers. Examples of such ‘missing’ conditions include bilingual listeners with English dominance performing the task with Mandarin targets in Van Engen’s study, Swedish targets in Kilman et al.’s study, Dutch targets in Brouwer et al.’s study and Mandarin targets in Mepham et al.’s study. Without such conditions, it is not possible to fully disentangle the two accounts. Furthermore, there are always (in all the aforementioned studies) non-negligible differences in energetic masking between the materials spoken in two different languages. This is a notoriously difficult problem to solve, but one way to approach it is to use mirror populations to counterbalance these differences. In this study, we addressed this shortcoming by using a full factorial design of L1 or L2 targets versus L1, L2 and Lf maskers (a completely foreign language, Tamil) in addition to speech-shaped noise maskers. Noise maskers were shaped to the long-term spectrum of each language, respectively, which provided an additional check on differences primarily driven by energetic masking. We tested these conditions in mirror populations: namely French-L1 or English-L1 bilingual listeners with varying degrees of proficiency in their L2 (English or French, respectively). This mirror design was key in allowing us to define L1, L2 or Lf relative to the listener’s profile, and not relative to the languages’ identity.

1.4. Hypothesis

This investigation (primarily focussed on how the masker language induces interference in speech recognition) allowed us to disentangle two competing hypotheses: the target-to-masker similarity hypothesis and the language familiarity hypothesis. The target-to-masker similarity hypothesis predicts that participants would perform worse when the target and masker are in the same language compared to distinct languages. Explicitly, situations of L1 target vs L1 masker or L2 target vs L2 masker should be harder than situations of L1 target vs L2 masker or L2 target vs L1 masker, respectively. And importantly, performance with an Lf masker should be similar to performance with an L2 masker, as neither has substantial energetic nor informational overlap with L1. The language familiarity hypothesis predicts that participants would have worse performance when the masker spoke in a language that is familiar to the listener, regardless of the target language. Explicitly, situations of L1 target vs L1 masker or L2 target vs L1 masker should be harder than situations of L1 target vs L2 masker or L2 target vs L2 masker, respectively, themselves harder than situations of L1 target vs Lf masker or L2 target vs Lf masker. By measuring the speech reception threshold (SRT) under all these experimental conditions, and relative to the listener’s language background, we could test which pattern of predictions occurred.
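
The contrast between the two sets of predictions can be sketched in a few lines of Python. This is only an illustration: the ordinal difficulty scores are assumptions chosen so that their ordering matches the predictions stated above, and the function names are hypothetical rather than part of the study.

```python
# Illustrative sketch only: the scores are ordinal placeholders (assumptions),
# chosen so that only their ordering reflects each hypothesis.
TARGETS = ["L1", "L2"]
MASKERS = ["L1", "L2", "Lf"]

def similarity_score(target, masker):
    """Target-to-masker similarity: a masker in the target's language is harder."""
    return 1 if masker == target else 0

# Language familiarity: difficulty depends only on how familiar the masker is.
FAMILIARITY = {"L1": 2, "L2": 1, "Lf": 0}

def familiarity_score(target, masker):
    """Language familiarity: the target language plays no role."""
    return FAMILIARITY[masker]

def hardest_masker(score, target):
    """Return the masker language predicted to be hardest for a given target."""
    return max(MASKERS, key=lambda masker: score(target, masker))

for target in TARGETS:
    print(
        f"{target} target: similarity predicts {hardest_masker(similarity_score, target)}"
        f" masker hardest, familiarity predicts {hardest_masker(familiarity_score, target)}"
    )
```

With L1 targets both accounts single out the L1 masker as hardest; only with L2 targets do they diverge (L2 masker under similarity, L1 masker under familiarity), which is why the L2-target conditions are the informative ones.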

2. Method

2.1. Participants

A total of 200 French-English bilingual participants were recruited through the Prolific platform. All participants spoke either English or French as their L1, and the other language as their L2, and were between 18 and 50 years of age. A total of 72 participants were excluded for technical issues, incomplete data, inability to complete the study in L2, or not following the instructions. This resulted in 57 participants in the L1 ENGLISH group (40 women and 17 men) and 71 participants in the L1 FRENCH group (34 women, 36 men, 1 not reported). The two groups were matched in student status (43.4% students; χ²(2) = 0.2, p = .916) and employment status (53.1% employed; χ²(2) = 1.5, p = .462). Note that some participants were missing data for their student and employment status, leading to a third level (unknown status) in these two analyses.

The L1 ENGLISH and L1 FRENCH groups differed in the country of residence (χ²(3) = 97.0, p < .001), which was expected as the countries from which participants were recruited have different official languages (English in the UK and USA, French in France and both English and French in Canada). Similar statistics were found with country of birth (χ²(11) = 98.1, p < .001) and nationality (χ²(3) = 99.1, p < .001). Unintentionally, participants in the two groups differed in sex distribution (χ²(1) = 6.0, p = .014), as well as chronological age (t(126) = 3.0, p = .003). The L1 ENGLISH group had a majority (70%) of female participants and was on average (std) 31.7 (9.5) years old, while the L1 FRENCH group was more balanced in sex (49% female) and was a little younger, on average (std) 27.3 (6.9) years old. This likely had a negligible effect: be it for noise maskers (section 4.1) or speech maskers (section 4.2), sex did not interact with the factors of interest (group, target language or masker language, all p-values ≥ .085). As for chronological age, it is well known that performance in speech perception tasks can degrade with age (e.g., Murphy, Daneman & Schneider, 2006; Schneider, Daneman & Pichora-Fuller, 2002; Schneider, Li & Daneman, 2007) but these effects are not expected until later in life (often >60 years of age, e.g., Schneider, Speranza & Pichora-Fuller, 1998; Bilodeau-Mercure, Lortie, Sato, Guitton & Tremblay, 2015). Curiously, the two groups also differed in the amount of time they took to complete the study (t(126) = −2.1, p = .037). Participants in the L1 ENGLISH group took a mean (std) of 73.7 (21.4) minutes to complete the study, while the L1 FRENCH group took a mean (std) of 82.1 (22.9) minutes. 
This observation is presumably not very important: people generally wrote more words in their L1 than their L2, and written French is known to be longer than written English (Durieux, 1990).
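
As a worked illustration of the group comparison on sex distribution, the sketch below computes a Pearson chi-square on the 2 × 2 table of counts given in section 2.1 (40 women and 17 men versus 34 women and 36 men). Two choices here are assumptions and not something the text confirms: the participant whose sex was not reported is left out, and no continuity correction is applied. The function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square survival function reduces to erfc.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Rows: L1 ENGLISH, L1 FRENCH; columns: women, men.
# Assumption: the one participant with unreported sex is excluded.
stat, p_value = chi_square_2x2(40, 17, 34, 36)
print(f"chi2(1) = {stat:.1f}, p = {p_value:.3f}")
```

Under these assumptions the sketch gives χ²(1) ≈ 6.0 and p ≈ .014, in line with the values reported above.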

Most importantly, the two groups differed (as intended) in their language background. This was assessed very simply by asking the age of acquisition (AOA), the listening proficiency and speaking proficiency (a number from 0 to 10), and the use of listening and use of speaking (a number from 0 to 10). This was done in L1 and L2, being either French or English, as all participants identified themselves as French-English bilinguals with varying degrees of fluency. Of the participants, 17.2% spoke three or four languages. Additional languages were Bulgarian, Chinese, German, Italian, Luxembourgish, Moroccan Arabic, Russian, Spanish and Welsh, with an average (std) listening proficiency of 6.3 (2.7) and an average (std) speaking proficiency of 5.1 (2.4). These L3 and L4 data were ignored in this study. A mixed analysis of variance (ANOVA) was conducted for each bilingualism metric with one between-subjects factor (group) and one within-subjects factor (language: L1 or L2). The main effects and interactions are reported in Table 1 (along with post-hoc tests to probe significant interactions at each level). The main effect of language was always significant, confirming that all participants acquired their L1 much earlier than their L2 (0.9 [SD = 2.1] vs 9.3 [SD = 4.4] years old) and had better listening proficiency (9.9 [SD = 0.4] vs 7.9 [SD = 1.3]), speaking proficiency (9.9 [SD = 0.4] vs 7.3 [SD = 1.6]), listening use (9.8 [SD = 0.8] vs 6.3 [SD = 2.7]) and speaking use (9.8 [SD = 0.9] vs 5.2 [SD = 2.9]) in their L1 compared to their L2. None of this is surprising, but it gives a sense of the imbalance of this sample of French-English bilinguals between their two languages. Less expected was the main effect of the group and its interaction with language: it was significant for every variable except AOA. 
Post-hoc pairwise comparisons between the two groups were never significant in L1 (i.e., the fluency of the L1 FRENCH group in French was comparable to that of the L1 ENGLISH group in English) but were always significant in L2, namely that the L1 FRENCH group had better proficiency and more frequent use in English than the L1 ENGLISH group in French (mean difference (MD) for listening proficiency = 0.9, MD for speaking proficiency = 0.7, MD for listening use = 2.4, MD for speaking use = 1.7). This was unintended (but did not seem to have much impact – see section 4.1) and may reflect the higher global mastery of English compared to French.

Table 1. Statistics on the participants’ fluency in L1 and L2

We expected these bilingualism measures to be highly correlated with one another. This was the case among all L2 proficiency and use variables (all p < .001, all R² ≥ .195), but none of them correlated with L2 AOA (all p ≥ .298, all R² ≤ .01).

2.2. Stimuli

The English target stimuli were sourced from the Institute of Electrical and Electronics Engineers’ recommended practice for speech quality measurements, often termed the Harvard sentences (IEEE, 1969). This corpus of 720 phonetically balanced, standardised English sentences was originally created to test audio quality in various telephone systems but has since become widely used in psychoacoustic research. The speaker of these target stimuli was a North American male. The French target stimuli were a phonetically balanced translation of the Harvard sentences, termed the Fharvard corpus (Aubanel, Bayard, Strauß & Schwartz, 2020; openly available), produced by a French male speaker (a different individual from the North American speaker). In both corpora, the stimuli were trimmed to leave roughly 150 ms of silence before onset (and 300 ms after offset) in an attempt to make targets start roughly at the same time across trials. Lists contained ten sentences each and were arranged on the basis of sentence duration so that targets were always shorter than the corresponding 2-sentence maskers.

In contrast to target stimuli, the masker stimuli were created for the purpose of this study. All English, French and Tamil masking stimuli were recorded by a single trilingual woman to keep speaking characteristics of the masker relatively constant. She acquired all three languages roughly simultaneously. Her sex was selected to lessen to some degree the energetic masking between the target and the masking voices. As competing speech tasks are already very challenging in one’s first language, let alone one’s second language, we wanted to provide salient cues to direct attention to the correct voice. She first translated all English transcripts into Tamil and then recorded herself with the iPhone Voice Memo application and the internal microphone, holding the iPhone 10–15 cm away from her mouth, in a quiet room in her home. She read each sentence from a script in her natural speaking voice with 2 seconds between each production. Recordings were broken down into 8 lists of 10 sentences, and she was instructed to leave 1 minute of silence at the start of a recording, which was subsequently used to filter out any background noise using a spectral subtraction method (Boll, 1979), conducted in Audacity version 2.1.1 (https://www.audacityteam.org/). Audio files were cut in Audacity to remove disfluencies and extended pauses. The most fluent and natural productions were then selected, ensuring that (1) they did not belong to any of the target lists and (2) they contained few pauses between syllables (ideally continuously voiced). To create a masker list (of 10 maskers, each consisting of 2 simultaneous sentences spoken in the same language), five sentences were selected and added in pairs in all permutations. 
We manually shifted the timing of each sentence in a pair to optimise the pseudo-stationarity of the combination waveform, leaving relatively few temporal dips where listeners could glimpse target words (Collin & Lavandier, 2013; Leclère, Lavandier & Deroche, 2017). All maskers were finally root-mean-square equalised at the same level as the targets (i.e., a target was as intense as a 2-voice masker).
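
The construction of a masker list can be sketched as follows. Everything in this block is an illustrative assumption rather than the authors' processing chain: the sentences are stand-in sine waves, the 100-sample shift replaces the manual timing adjustment, the helper names are hypothetical, and "all permutations" is read as the 10 unordered pairs of five sentences because that matches the stated list size of 10 maskers.

```python
import math
from itertools import combinations

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def equalise_rms(signal, reference_rms):
    """Scale a signal so that its RMS level matches reference_rms."""
    gain = reference_rms / rms(signal)
    return [gain * x for x in signal]

def mix_pair(first, second, shift=0):
    """Sum two sentences sample by sample, delaying the second by `shift` samples."""
    mixed = [0.0] * max(len(first), len(second) + shift)
    for i, x in enumerate(first):
        mixed[i] += x
    for i, x in enumerate(second):
        mixed[i + shift] += x
    return mixed

# Stand-in "sentences": five synthetic waveforms (placeholders for recordings).
sentences = {
    name: [math.sin(0.01 * (k + 1) * n) for n in range(2000)]
    for k, name in enumerate("ABCDE")
}
target = [0.5 * math.sin(0.03 * n) for n in range(1500)]

pairs = list(combinations(sentences, 2))  # 10 unordered pairs from 5 sentences
maskers = [
    equalise_rms(mix_pair(sentences[a], sentences[b], shift=100), rms(target))
    for a, b in pairs
]

# After equalisation, each 2-voice masker sits at 0 dB relative to the target.
tmr_db = [20 * math.log10(rms(target) / rms(m)) for m in maskers]
print(len(maskers), [round(t, 6) for t in tmr_db])
```

Equalising the combined 2-voice waveform, and not each voice separately, is what makes a target as intense as the whole masker at a nominal target-to-masker ratio of 0 dB.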

2.3. Design and protocol

The competing speech task consisted of 20 blocks per participant, with 10 trials per block. Each participant began the study with two practice blocks: the first with English target sentences masked by English sentences and the second with French target sentences masked by Tamil sentences. None of the materials in the practice blocks were used in the rest of the study. Transcripts of the two masking sentences were displayed on the screen, both during the practice blocks and the trial blocks, to help participants understand which voices not to listen to (the experimental interface is illustrated in Figure 1). Listeners were instructed to ignore the sentences displayed on the screen and to listen instead to the third sentence (a relatively common practice; see e.g. Hawley, Litovsky & Culling, 2004).

Figure 1. Depiction of experimental interface: (A) listening portion with instructions and masking sentences written on screen, (B) response portion and (C) self-grading portion.

The first trial of each block started with a target-to-masker ratio (TMR) at −16 dB, that is, with a target sentence much quieter than the two maskers. Participants were allowed to repeat the first trial as many times as necessary, with each repetition increasing the target level by 4 dB while the combined masker level was fixed. Participants were instructed to move on to the next trial once they were able to hear about half of the target sentence. At the end of each trial, participants were asked to type as much of the target sentence as they could. They were then presented with the correct transcript and asked to self-score the number of keywords they correctly typed (see Supplementary Material 1 for a detailed analysis of the self-scoring accuracy). Each target sentence contained five keywords, written in capital letters. If the listener identified three or more keywords correctly, the target level decreased by 2 dB, making the next trial more difficult. If the listener identified two or fewer keywords correctly, the target level increased by 2 dB, making the next trial easier. At the end of each block, this 1-up/1-down adaptive threshold method (Plomp & Mimpen, 1979) provided one value calculated as the mean TMR over the last eight trials; it was assumed to bracket the TMR required to achieve 50% intelligibility. This final SRT value serves as the dependent variable in our experiment.
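
The adaptive rule described above can be sketched as a short function. This is a minimal illustration under stated assumptions, not the experiment's code: the keyword counts are made-up responses, the number of first-trial repetitions is passed in directly, and the function and parameter names are hypothetical.

```python
from statistics import mean

def run_block(keyword_counts, start_tmr=-16.0, first_trial_repeats=0):
    """Track the TMR over one block and return (SRT estimate, presented TMRs).

    keyword_counts: keywords (0-5) correctly reported on each trial of the block.
    first_trial_repeats: how many times the listener replayed the first trial,
    each replay raising the target level by 4 dB.
    """
    tmr = start_tmr + 4.0 * first_trial_repeats
    presented = []
    for correct in keyword_counts:
        presented.append(tmr)
        if correct >= 3:
            tmr -= 2.0  # three or more keywords: next trial is harder
        else:
            tmr += 2.0  # two or fewer keywords: next trial is easier
    # The SRT is the mean TMR over the last eight trials of the block.
    return mean(presented[-8:]), presented

# Made-up responses for a 10-trial block (assumption, for illustration only).
srt, track = run_block([3, 3, 2, 3, 2, 3, 2, 3, 2, 3], first_trial_repeats=3)
print(track)  # [-4.0, -6.0, -8.0, -6.0, -8.0, -6.0, -8.0, -6.0, -8.0, -6.0]
print(srt)    # -7.0
```

Because the level moves down after three or more correct keywords and up otherwise, the track oscillates around the TMR at which a listener reports about half of the keywords, which is why the average of the late trials is taken as the threshold.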

After completing two practice blocks, participants completed 12 blocks measuring two SRTs for each of the six speech-in-speech conditions (two target languages by three masker languages). While each of the target sentences was presented to every listener in the same order, the order of the masking conditions was rotated for successive listeners, to counterbalance effects of order and material. They then completed six blocks measuring three SRTs for each of the two target languages against speech-shaped noise, where no transcript was displayed on the screen. Once again, these six blocks were counterbalanced.

2.4. Equipment

Because the experiment was delivered online during the COVID-19 pandemic, we were unable to control the audio quality presented to each participant. Instead, we asked participants to report whether they were listening through earbuds, headphones, loudspeakers or through the default output of their computer. The two groups differed in the type of audio output (χ²(3) = 12.3, p = .006). In the L1 ENGLISH group, the most common audio output was the default output of their computer (36.8% of the group), while it was headphones (52.9% of the group) in the L1 FRENCH group. This difference was unfortunate but likely negligible since the two groups did not differ from one another in their SRTs against either noise or speech maskers (see Results). We also asked them to report on a scale of 1–5 how good their audio quality was, where 1 was “poor” and 5 was “excellent”. The two groups did not differ in these subjective ratings (χ²(2) = 2.9, p = .232). We found no impact of audio quality on SRT performance with either noise or speech maskers (all p-values ≥ .252). Participants were instructed to set the volume of their output to a comfortable level during the practice blocks at the beginning of the task and to not touch the volume afterwards. All stimuli were presented at a sampling frequency of 44.1 kHz, with a 32-bit resolution. All subjects provided informed consent online in accordance with the Institutional Review Board at Concordia University (ref: 30013650) and were compensated £7.50 for completing the study, or £3.75 in the case of withdrawal from the study.

3. Data analysis

3.1. Speech-in-noise conditions

The effect of target language was first examined from the SRTs collected against speech-shaped noise maskers. A linear mixed-effect (LME) model was fitted on the DV (SRT in noise) with two fixed factors: group (L1 ENGLISH and L1 FRENCH) and target language (L1 and L2). We included random intercepts and slopes by participants and by lists. Each main effect and each interaction was tested by likelihood ratio tests progressively adding fixed terms to the final formula: DV ~ target*group + (1 + target | participant) + (1 + target | list).

3.2. Speech-in-speech conditions

An LME model was fitted on the SRT obtained across the six speech-in-speech conditions, with group (L1 ENGLISH and L1 FRENCH), target language (L1 and L2) and masker language (L1, L2 and Lf) as fixed factors. We considered similar random terms as earlier, namely random intercepts and slopes (for the effect of target language) by participants and by lists. Furthermore, we also considered by-participant random slopes for the effect of masker (which improved the final model slightly further), while the model complexity could not support by-list random slopes for the effect of masker. Each main effect and each interaction was tested by likelihood ratio tests progressively adding fixed terms to the final formula: DV ~ target*masker*group + (1 + target+masker | participant) + (1 + target | list).
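
For readers less familiar with likelihood ratio tests between nested models, the sketch below shows how a test statistic and its p-value are obtained from two log-likelihoods. It is a generic illustration and not the authors' analysis code: the log-likelihood values are hypothetical, the function names are invented for this example, and the chi-square survival function is written out only for the 1 and 2 degrees of freedom that arise here (a two-level factor such as target language adds one parameter, a three-level factor such as masker language adds two).

```python
import math

def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for a reduced model nested in a full model."""
    return 2.0 * (loglik_full - loglik_reduced)

def chi2_sf(stat, df):
    """Chi-square survival function; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Hypothetical log-likelihoods of a model without and with a masker term.
stat = lrt_statistic(loglik_reduced=-1520.4, loglik_full=-1518.1)
p_value = chi2_sf(stat, df=2)
print(f"chi2(2) = {stat:.1f}, p = {p_value:.3f}")
```

The degrees of freedom equal the number of fixed-effect parameters added by the term under test, which is why the tests reported in the Results use 1 or 2.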

4. Results

4.1. Speech-in-noise conditions

The LME analysis (whose final output is shown in Supplementary Material 2) revealed a main effect of the target language (χ²(1) = 26.4, p < .001), reflecting that SRTs were estimated to be 11.1 dB lower when listening to L1 rather than L2 (as illustrated in the left-hand sides of both panels in Figure 2). There was no main effect of group (χ²(1) = 1.0, p = .311) and no interaction (χ²(1) = 0.1, p = .749). Participants performed better with L1 targets than with L2 targets, and this pattern was found equally in both groups. Given that participants in the L1 FRENCH group reported being more fluent in English than the L1 ENGLISH group was in French (section 2.1), one might have suspected a smaller SRT difference between L1 and L2 in the L1 FRENCH group than in the L1 ENGLISH group (i.e., an interaction), but this was not the case. Rather, this confirmed that the two groups were relatively good mirror images in this task.

Figure 2. SRTs obtained across all experimental conditions, for the L1 ENGLISH group (left panel) and L1 FRENCH group (right panel).

4.2. Speech-in-speech conditions

The LME analysis (whose final output is shown in Supplementary Material 2) confirmed the main effect of target language as above (χ²(1) = 48.5, p < .001), but the size of the effect was slightly reduced: SRTs were 8.7 dB lower when listening to L1 rather than L2 (Figure 2). There was also a main effect of masker language (χ²(2) = 23.6, p < .001), a key result, indicating that SRTs were 0.7 dB and 2.3 dB lower with an L2 and an Lf masker, respectively, than with an L1 masker. Importantly, this masker effect did not interact with target language (χ²(2) = 0.3, p = .846). To our knowledge, this has never been demonstrated before. There was no main effect of group (χ²(1) = 0.6, p = .426), and group did not interact with target (χ²(1) = 0.7, p = .402), with masker (χ²(2) = 1.7, p = .437), or in a three-way interaction (χ²(2) = 0.3, p = .882). To summarise, participants found the task easier when attempting to listen to sentences spoken in L1 rather than in L2, and this was true for the two groups of participants (just as it was in background noise). On the other hand, participants found it most challenging to ignore the female voices when they spoke the participants' L1, and least challenging when they spoke a completely foreign language. Critically, this pattern was similar whether the male target spoke in the participants’ L1 or L2, and whether it was French or English, supporting the hypothesis of the listener’s familiarity with the masker language.

5. General discussion

5.1. Key finding

In this study, we used bilinguals to address a research question that had only partially been answered regarding the role of the listener’s language experience in cocktail party situations. Traditional descriptions of informational masking frame it in terms of target-to-masker similarity. For example, the word ‘dog’ spoken by a masking speaker could easily interfere with the target sentence “cats and pigs are selfish creatures” because of semantic similarity, or with the target sentence “the woodcutter searches for the missing log” because of phonetic similarity. This would occur irrespective of the additional energetic masking the word ‘dog’ may create at a given location on the target’s spectrogram. Although not explicitly stated, this conceptualisation of informational masking disregards the listener altogether. It is supposed to make no difference if the listener is a lumberjack or a veterinarian, or whether they spent the last year exploring the woods or dog-sitting. However, these experiential factors are likely to act like priming, that is, pushing the listener to process speech in a certain manner and cueing them to guess a word (which could be highly degraded, embedded in noise or not even spoken yet). From this perspective, one would ideally want to redefine informational masking relative to the listener’s mind, not only relative to the similarity between the materials at play. This is the key message of this article, and we call for this redefinition because we showed that listeners experience the most difficulty in speech recognition with an L1 masker, not just with a masker that shares the same language as the target. This being said, we do not mean that target-to-masker similarity is irrelevant; it absolutely plays a role in cocktail party situations whether the similarity is phonetic or semantic in nature. In the current study, these two accounts were pitted against each other in a way that led to different predictions, but in real life, the two accounts are not mutually exclusive and certainly act together in situations of L1 targets.

5.2. Non-native maskers are weak

All participants exhibited the weakest interference with foreign language maskers. Our estimate of a 2.3-dB difference in performance between L1 and Lf maskers is in reasonably good agreement with previous reports in monolingual samples. For example, it is comparable to Rhebergen et al. (2005), who reported a 3.0-dB difference, and Calandruccio et al. (2016), who reported a 2.8-dB difference in adults and a 3.0-dB difference in children for SRT against L1 or Lf maskers. To compare children to adults, this latter study used sentences from the Bamford-Kowal-Bench (BKB) Standard Sentence Test, which is based on the speech of children aged 8–15. The fact that similar effect sizes were found with very different materials (BKB database vs IEEE database here) and different age groups is a solid indication that the additional interference caused by the presence of native speech in the background is a reliable and replicable phenomenon.

Our findings also agree, though less directly, with Calandruccio et al. (2013) and Lecumberri and Cooke (2006). In both of these studies, results were reported in terms of percentage of keywords that participants entered correctly, not in terms of SRT. In the first one, monolingual English participants were tested on English targets under three different masking conditions: English, Dutch and Mandarin. Both Dutch and Mandarin were foreign languages, but Dutch is phonetically and grammatically similar to English, in contrast to Mandarin. Listeners obtained a 20% increase in performance for Dutch relative to English maskers, and another 14% increase for Mandarin maskers. Considering that the slope of the psychometric functions underlying performance in such tasks is generally around 10% per dB in the vicinity of the inflection point (see for example Deroche et al., 2017b for an illustration of these estimates), this translates to a decrease in SRT of roughly 2.0 dB and 3.4 dB between L1 maskers and Dutch and Mandarin maskers, respectively. Thus, once again, languages that are phonetically and/or grammatically more different from L1 act as weaker maskers. Curiously, this opens up the possibility of assessing how foreign different languages may be to one another, that is, demonstrating that Mandarin is perceptually (rather than lexically or syntactically) more foreign to English than Dutch is. Following this sort of reasoning here, we might speculate that Tamil is similarly foreign to English and French.
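The percent-to-decibel conversion used in this paragraph is a linear approximation: a difference in keyword score is divided by the slope of the psychometric function near its inflection point. The short sketch below assumes the slope of roughly 10% per dB quoted in the text. The function name `percent_to_db` is hypothetical, and the approximation is only reasonable close to the midpoint of the function.

```python
def percent_to_db(delta_percent: float, slope_percent_per_db: float = 10.0) -> float:
    """Convert a keyword-score difference (in %) into an approximate SRT shift (in dB)."""
    if slope_percent_per_db <= 0:
        raise ValueError("slope must be positive")
    return delta_percent / slope_percent_per_db


# Score gains relative to the English (L1) masker, as quoted in the text.
dutch_gain = 20.0      # English masker -> Dutch masker
mandarin_extra = 14.0  # additional gain, Dutch masker -> Mandarin masker

dutch_db = percent_to_db(dutch_gain)
mandarin_db = percent_to_db(dutch_gain + mandarin_extra)
print(f"Dutch vs L1 masker: about {dutch_db:.1f} dB")
print(f"Mandarin vs L1 masker: about {mandarin_db:.1f} dB")
```

The Mandarin figure uses the cumulative 34% gain because the second increase is reported relative to the Dutch masker, not the English one.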

Unlike Calandruccio et al. (2013), Lecumberri and Cooke (2006) used bilingual participants in their experiment. They compared the performance of L1-English participants and L1-Spanish participants with L2-English targets (consonant phoneme sounds) in a variety of noise conditions, including English and Spanish speech maskers. The L1-English participants improved slightly by around 3% when the competing speech was Spanish (Lf) compared to English (L1), while L1-Spanish participants barely improved (1% difference) when competing speech was English (L2) compared to Spanish (L1). These differences between L1 and L2/Lf maskers are consistent with the direction of the present findings (and offered mirror populations, an important asset) but they were minimal in size. We speculate that it is because their materials were much simpler (consonant phonemes compared to full sentences in our study), leaving little room for informational masking to take place in the linguistic domain.

A particular note is warranted on the two studies which are perhaps closest to the current design. The observation that a situation of L1 vs Lf is easier than L1 vs L1 (experiments 1 and 3 in Brouwer et al., 2012; and experiments 1 and 2 in Mepham et al., 2022) could be explained either by a target-to-masker similarity account or by a masker language familiarity account. However, both studies also tested bilinguals with L2 targets against L1 or L2 maskers. Surprisingly, Brouwer et al. (2012) found L2 vs L2 to be more challenging than L2 vs L1, in direct contradiction to the present results, but they also acknowledged a role for the listener’s familiarity with the masker language via an indirect route (comparison of masking release across their experiments). The results of Mepham et al. (2022) were not straightforward (since the linguistic interference was captured using a difference between forward and time-reversed maskers, compared across groups), but they also acknowledged a role for the listener’s familiarity with the masker language. It is somewhat surprising that we found such a clear pattern in support of the familiarity account, while neither Brouwer et al. (2012) nor Mepham et al. (2022) observed it as clearly as we did. Aside from methodological aspects,¹ we suspect that the gender difference between target and maskers in our study was a critical parameter. Voice pitch (like spatial location) is a powerful cue to group pieces of a target utterance into a coherent stream. Without it (Brouwer et al. used the same speakers for targets and maskers) or when it is too subtle (Mepham et al. used different female speakers but their F0 was close), it may be that listeners are too confused about which voice to direct their attention to, leaving little room for the listener’s familiarity with a masker language to play its role. In other words, the task may be overloaded by the failure of grouping mechanisms based on F0 and vocal tract length. If this interpretation were correct, it would imply that observing the phenomenon of masker language familiarity is facilitated when strong grouping cues are present, making this phenomenon very valid from an ecological perspective.

5.3. Future directions

Like many of the articles discussed above, our study looked at the performance of adult participants. In the vein of Calandruccio et al. (2016), replicating our experimental design in children would be valuable both for developmental purposes (to better understand the role of bilingualism in development) and because children are generally known to be more prone to informational masking (e.g., Wightman, Kistler & O’Bryan, 2010), so perhaps the effect of masker language familiarity would be exacerbated in paediatric populations. This poses important challenges, though, because this endeavour would require modifications of task and materials to be accessible to children (e.g., simpler task, closed-set, smaller vocabulary). Unfortunately, doing so will reduce informational masking (which is generally more involved with complex speech materials and open-set tasks). Also, replicating this study in person is warranted to make sure that these findings obtained online are generalisable. Our comparison with in-person experiments (traits of self-scoring accuracy in Supplementary Material 1, from prior studies) at least supports the idea that the method remained generally valid and that this online dataset was of decent quality. However, the samples of bilinguals found online vary in a number of other factors, typically the amount of music training (see e.g. Neumann et al., 2023), which was uncontrolled here but could have an impact on inhibitory processes involved in speech. Another avenue would be replicating this experiment with bilinguals whose two languages are more distinct. Though English is a Germanic language and French is a Romance language, the French language has had a large influence on the English language as a result of the Norman Conquest of England in the 11th century (Britannica, 2021), and both are Indo-European. Perhaps we would find more dramatic results by recruiting bilinguals whose two languages were less related, such as English and Mandarin.

6. Conclusion

Our results indicate that background babble is the most disruptive masker when it is spoken in the listener’s L1, and the least disruptive masker when it is spoken in a language foreign to the listener. Critically, this finding was observed irrespective of the target language and was not tied to the language’s identity (be it French or English). These results call for redefining the concept of informational masking in relation to the listener’s linguistic profile, not just in terms of target-to-masker similarity.

Supplementary material

To view supplementary material for this article, please visit http://doi.org/10.1017/S1366728924000944.

Data availability

The data that support the findings (and additional exploratory analyses raising the possibility of qualitative differences between balanced versus unbalanced bilinguals) are openly available in OSF at https://osf.io/2x653/.

Acknowledgements

This research was supported by a pilot grant awarded to M.D. and K.B.H. from the Center for Research on Brain, Language and Music (Research Incubator Award, reference FRQ-NT RS-203287). We wish to thank Ms. Ramiya Veluppillai for her help in making the trilingual recordings used in this study.

Competing interest

The authors declare none.

Footnotes

This research article was awarded an Open Data badge for transparent practices. See the Data Availability Statement for details.

¹ Both Brouwer et al. (2012) and Mepham et al. (2022) used fixed TMRs for their stimuli presentation, which meant that intelligibility of the target sentences varied between experiments. Comparing groups and experimental conditions at different points along the underlying psychometric function is potentially problematic.

References

Aubanel, V., Bayard, C., Strauß, A., & Schwartz, J. L. (2020). The Fharvard corpus: A phonemically-balanced French sentence resource for audiology and intelligibility research. Speech Communication, 124, 68–74. https://doi.org/10.1016/j.specom.2020.07.004

Bidelman, G. M., & Dexter, L. (2015). Bilinguals at the “cocktail party”: Dissociable neural activity in auditory–linguistic brain regions reveals neurobiological basis for nonnative listeners’ speech-in-noise recognition deficits. Brain and Language, 143, 32–41. https://doi.org/10.1016/j.bandl.2015.02.002

Bilodeau-Mercure, M., Lortie, C. L., Sato, M., Guitton, M. J., & Tremblay, P. (2015). The neurobiology of speech perception decline in aging. Brain Structure and Function, 220(2), 979–997. https://doi.org/10.1007/s00429-013-0695-3

Boll, S. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(2), 113–120.

Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: The MIT Press. https://doi.org/10.7551/mitpress/1486.001.0001

Britannica, The Editors of Encyclopaedia. (2021, December 15). Norman Conquest. Encyclopedia Britannica. https://www.britannica.com/event/Norman-Conquest

Broersma, M., & Scharenborg, O. (2010). Native and non-native listeners’ perception of English consonants in different types of noise. Speech Communication, 52(11), 980–995. https://doi.org/10.1016/j.specom.2010.08.010

Bronkhorst, A. W. (2015). The cocktail-party problem revisited: Early processing and selection of multi-talker speech. Attention, Perception, & Psychophysics, 77(5), 1465–1487.

Brouwer, S., Van Engen, K. J., Calandruccio, L., & Bradlow, A. R. (2012). Linguistic contributions to speech-on-speech masking for native and non-native listeners: Language familiarity and semantic content. The Journal of the Acoustical Society of America, 131(2), 1449–1464. https://doi.org/10.1121/1.3675943

Brungart, D. S. (2001). Informational and energetic masking effects in the perception of two simultaneous talkers. The Journal of the Acoustical Society of America, 109(3), 1101–1109. https://doi.org/10.1121/1.1345696

Brungart, D. S., Simpson, B. D., Ericson, M. A., & Scott, K. R. (2001). Informational and energetic masking effects in the perception of multiple simultaneous talkers. The Journal of the Acoustical Society of America, 110(5), 2527–2538. https://doi.org/10.1121/1.1408946

Calandruccio, L., Brouwer, S., Van Engen, K. J., Dhar, S., & Bradlow, A. R. (2013). Masking release due to linguistic and phonetic dissimilarity between the target and masker speech. American Journal of Audiology, 22(1), 157–164. https://doi.org/10.1044/1059-0889(2013/12-0072)

Calandruccio, L., Dhar, S., & Bradlow, A. R. (2010). Speech-in-speech masking with variable access to the linguistic content of the masker speech. The Journal of the Acoustical Society of America, 128(2), 860–869. https://doi.org/10.1121/1.3458857

Calandruccio, L., Leibold, L. J., & Buss, E. (2016). Linguistic masking release in school-age children and adults. American Journal of Audiology, 25(1), 34–40. https://doi.org/10.1044/2015_AJA-15-0053

Calandruccio, L., & Zhou, H. (2014). Increase in speech recognition due to linguistic mismatch between target and masker speech: Monolingual and simultaneous bilingual performance. Journal of Speech, Language, and Hearing Research, 57(3), 1089–1097. https://doi.org/10.1044/2013_JSLHR-H-12-0378

Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America, 25(5), 975–979. https://doi.org/10.1121/1.1907229

Collin, B., & Lavandier, M. (2013). Binaural speech intelligibility in rooms with variations in spatial location of sources and modulation depth of noise interferers. The Journal of the Acoustical Society of America, 134(2), 1146–1159. https://doi.org/10.1121/1.4812248

Cooke, M., Lecumberri, M. L. G., & Barker, J. (2008). The foreign language cocktail party problem: Energetic and informational masking effects in non-native speech perception. The Journal of the Acoustical Society of America, 123, 414–427. https://doi.org/10.1121/1.2804952

Culling, J. F., & Stone, M. A. (2017). Energetic masking and masking release. In Middlebrooks, J. C., Simon, J. Z., Popper, A. N., & Fay, R. R. (Eds.), The auditory system at the cocktail party (Vol. 60, pp. 41–73). Springer International Publishing. https://doi.org/10.1007/978-3-319-51662-2_3

de Bruin, A. (2019). Not all bilinguals are the same: A call for more detailed assessments and descriptions of bilingual experiences. Behavioral Sciences, 9(3), 33. https://doi.org/10.3390/bs9030033

DeLuca, V., Rothman, J., Bialystok, E., & Pliatsikas, C. (2019). Redefining bilingualism as a spectrum of experiences that differentially affects brain structure and function. Proceedings of the National Academy of Sciences, 116(15), 7565–7574. https://doi.org/10.1073/pnas.1811513116

Deroche, M., & Culling, J. F. (2013). Voice segregation by difference in fundamental frequency: Effect of masker type. The Journal of the Acoustical Society of America, 134(5), EL465–EL470. https://doi.org/10.1121/1.4826152

Deroche, M. L. D., Culling, J. F., Lavandier, M., & Gracco, V. L. (2017a). Reverberation limits the release from informational masking obtained in the harmonic and binaural domains. Attention, Perception, & Psychophysics, 79(1), 363–379. https://doi.org/10.3758/s13414-016-1207-3

Deroche, M. L. D., Limb, C. J., Chatterjee, M., & Gracco, V. L. (2017b). Similar abilities of musicians and non-musicians to segregate voices by fundamental frequency. The Journal of the Acoustical Society of America, 142(4), 1739–1755. https://doi.org/10.1121/1.5005496

Durieux, C. (1990). Le foisonnement en traduction technique d’anglais en français. Meta, 35(1), 55–60. https://doi.org/10.7202/002689ar

Grosjean, F. (2012). Bilingualism: A short introduction. In Grosjean, F., & Li, P. (Eds.), The psycholinguistics of bilingualism (pp. 5–25). Hoboken, NJ: John Wiley & Sons.

Hawley, M. L., Litovsky, R. Y., & Culling, J. F. (2004). The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer. The Journal of the Acoustical Society of America, 115(2), 833–843. https://doi.org/10.1121/1.1639908

Institute of Electrical and Electronics Engineers. (1969). IEEE recommended practice for speech quality measurements. IEEE Transactions on Audio and Electroacoustics, 17(3), 225–246.

Kidd, G., Mason, C. R., & Gallun, F. J. (2005). Combining energetic and informational masking for speech identification. The Journal of the Acoustical Society of America, 118(2), 982–992. https://doi.org/10.1121/1.1953167

Kidd, G., Mason, C. R., Richards, V. M., Gallun, F. J., & Durlach, N. I. (2008). Informational masking. In Yost, W. A., Popper, A. N., & Fay, R. R. (Eds.), Auditory perception of sound sources (Vol. 29, pp. 143–189). Boston, MA: Springer US. https://doi.org/10.1007/978-0-387-71305-2_6

Kilman, L., Zekveld, A., Hällgren, M., & Rönnberg, J. (2014). The influence of non-native language proficiency on speech perception performance. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00651

Kim, J., Marton, K., Obler, L., Sekerina, I., Sradlin, L., & Valina, V. (2019). Interference control in bilingual auditory sentence processing in noise. In Bilingualism, executive function, and beyond: Questions and insights. John Benjamins Publishing Company, 103–116. https://benjamins.com/catalog/sibil.57.07kim

Leclère, T., Lavandier, M., & Deroche, M. L. D. (2017). The intelligibility of speech in a harmonic masker varying in fundamental frequency contour, broadband temporal envelope, and spatial location. Hearing Research, 350, 1–10. https://doi.org/10.1016/j.heares.2017.03.012

Lecumberri, M. L. G., & Cooke, M. (2006). Effect of masker type on native and non-native consonant perception in noise. The Journal of the Acoustical Society of America, 119(4), 2445–2454. https://doi.org/10.1121/1.2180210

Lecumberri, M. L. G., Cooke, M., & Cutler, A. (2010). Non-native speech perception in adverse conditions: A review. Speech Communication, 52, 864–886. https://doi.org/10.1016/j.specom.2010.08.014

Linck, J. A., Osthus, P., Koeth, J. T., & Bunting, M. F. (2014). Working memory and second language comprehension and production: A meta-analysis. Psychonomic Bulletin & Review, 21(4), 861–883. https://doi.org/10.3758/s13423-013-0565-2

Luk, G. (2015). Who are the bilinguals (and monolinguals)? Bilingualism: Language and Cognition, 18(1), 35–36. https://doi.org/10.1017/S1366728914000625

McDermott, J. H. (2009). The cocktail party problem. Current Biology, 19(22), R1024–R1027. https://doi.org/10.1016/j.cub.2009.09.005

Melby-Lervåg, M., & Lervåg, A. (2014). Reading comprehension and its underlying components in second-language learners: A meta-analysis of studies comparing first- and second-language learners. Psychological Bulletin, 140(2), 409–433. https://doi.org/10.1037/a0033890

Mepham, A., Bi, Y., & Mattys, S. L. (2022). The time-course of linguistic interference during native and non-native speech-in-speech listening. The Journal of the Acoustical Society of America, 152(2), 954–969. https://doi.org/10.1121/10.0013417

Murphy, D. R., Daneman, M., & Schneider, B. A. (2006). Why do older adults have difficulty following conversations? Psychology and Aging, 21(1), 49–61. https://doi.org/10.1037/0882-7974.21.1.49

Neumann, C., Sares, A., Chelini, E., & Deroche, M. (2023). Roles of bilingualism and musicianship in resisting semantic or prosodic interference while recognizing emotion in sentences. Bilingualism: Language and Cognition, 27, 1–15. https://doi.org/10.1017/S1366728923000573

Oxenham, A. J., Fligor, B. J., Mason, C. R., & Kidd, G. (2003). Informational masking and musical training. The Journal of the Acoustical Society of America, 114(3), 1543–1549. https://doi.org/10.1121/1.1598197

Plomp, R., & Mimpen, A. M. (1979). Improving the reliability of testing the speech reception threshold for sentences. International Journal of Audiology, 18(1), 43–52.

Rhebergen, K. S., Versfeld, N. J., & Dreschler, W. A. (2005). Release from informational masking by time reversal of native and non-native interfering speech (L). The Journal of the Acoustical Society of America, 118(3), 5.

Schneider, B. A., Daneman, M., & Pichora-Fuller, M. K. (2002). Listening in aging adults: From discourse comprehension to psychoacoustics. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 56(3), 139–152. https://doi.org/10.1037/h0087392

Schneider, B. A., Li, L., & Daneman, M. (2007). How competing speech interferes with speech comprehension in everyday listening situations. Journal of the American Academy of Audiology, 18(7), 559–572. https://doi.org/10.3766/jaaa.18.7.4

Schneider, B., Speranza, F., & Pichora-Fuller, M. K. (1998). Age-related changes in temporal resolution: Envelope and intensity effects. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 52(4), 184–191. https://doi.org/10.1037/h0087291

Van Engen, K. J. (2010). Similarity and familiarity: Second language sentence recognition in first- and second-language multi-talker babble. Speech Communication, 52(11–12), 943–953. https://doi.org/10.1016/j.specom.2010.05.002

Van Engen, K. J., & Bradlow, A. R. (2007). Sentence recognition in native- and foreign-language multi-talker background noise. The Journal of the Acoustical Society of America, 121(1), 519–526. https://doi.org/10.1121/1.2400666

Wightman, F. L., Kistler, D. J., & O’Bryan, A. (2010). Individual differences and age effects in a dichotic informational masking paradigm. The Journal of the Acoustical Society of America, 128(1), 11. https://doi.org/10.1121/1.3436536

Yow, W. Q., & Li, X. (2015). Balanced bilingualism and early age of second language acquisition as the underlying mechanisms of a bilingual executive control advantage: Why variations in bilingual experiences matter. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.00164

Table 1. Statistics on the participants’ fluency in L1 and L2

Figure 1. Depiction of experimental interface: (A) listening portion with instructions and masking sentences written on screen, (B) response portion and (C) self-grading portion.

