Listening like a native: Unprofitable procedures need to be discarded

Laurence Bruggeman; Anne Cutler

doi:10.1017/S1366728923000305

Published online by Cambridge University Press: 29 May 2023


Laurence Bruggeman*: The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia; ARC Centre of Excellence for the Dynamics of Language, Western Sydney University, Penrith, Australia
Anne Cutler: The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia; ARC Centre of Excellence for the Dynamics of Language, Western Sydney University, Penrith, Australia; Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
*Corresponding author: Laurence Bruggeman; Email: l.bruggeman@westernsydney.edu.au

Article contents

Abstract
Introduction
Experiment I – use of suprasegmental stress cues in L2 listening
Experiment II – validation of the Dutch materials with L1 controls
Experiment III – use of suprasegmental stress cues in L1 listening
General discussion
Competing interest declaration
Footnotes
References


Abstract

Two languages, historically related, both have lexical stress, with word stress distinctions signalled in each by the same suprasegmental cues. In each language, words can overlap segmentally but differ in placement of primary versus secondary stress (OCtopus, ocTOber). However, secondary stress occurs more often in the words of one language, Dutch, than in the other, English, and largely because of this, Dutch listeners find it helpful to use suprasegmental stress cues when recognising spoken words. English listeners, in contrast, do not; indeed, Dutch listeners can outdo English listeners in correctly identifying the source words of English word fragments (oc-). Here we show that Dutch-native listeners who reside in an English-speaking environment and have become dominant in English, though still maintaining their use of these stress cues in their L1, ignore the same cues in their L2 English, performing as poorly in the fragment identification task as L1 English listeners do.

Keywords

lexical stress; suprasegmentals; bilingualism; emigrants; dominance

Information

Type: Research Article
Information: Bilingualism: Language and Cognition, Volume 26, Issue 5, November 2023, pp. 1093–1102

DOI: https://doi.org/10.1017/S1366728923000305
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press

Introduction

The efficiency of listening to speech depends on our ability to adjust the processing mechanisms involved so that they function optimally in the language in use. Different languages deploy different acoustic cues to distinguish between phonemes, and hence between spoken words, and listeners learn to process speech in the most efficient manner. Together, these factors produce language-specific listening: native users of each language listen in a way that is tailored to the particular properties of the language they have been exposed to (L1; Cutler, 2012).

As a result, the cues used during speech processing can differ from one listener group to another. This can hold true even for two languages that are historically closely related and share many structural features, such as Dutch and English. These two Germanic languages have broadly comparable syntactic and phonological systems. For instance, both languages use lexical stress, so that the syllables of words in each language differ in how they are realised suprasegmentally (i.e., in the syllable's duration, and in the intensity and fundamental frequency of its vocalic portion). The placement of primary stress in English and Dutch is not fixed: stress may fall at any word position (PRImary, poSItion, fundaMENtal, withIN; upper case letters indicate primary stress), although every word has one and only one syllable bearing primary stress. In both languages, the location of stress within a word may also shift under the influence of sentence rhythm (e.g., thirTEEN becomes THIRteen when it is followed by MEN; Gussenhoven, 1983; Kager & Visch, 1985; Liberman & Prince, 1977). Nonetheless, research in recent years has shown that Dutch and English listeners differ quite reliably in how they handle the various kinds of phonetic cue available for identifying spoken words, with the markers of lexical stress playing the leading role (e.g., Cooper et al., 2002; Tremblay et al., 2021).

In English, lexical stress is cued suprasegmentally by duration, intensity and pitch. The most important cue to lexical stress, however, occurs at the segmental level and is provided by vowel quality (e.g., Chrabaszcz et al., 2014; Lin et al., 2014; Zhang & Francis, 2010): stressed syllables always contain a full vowel, whereas vowels in unstressed syllables are frequently reduced towards schwa (Fourakis, 1991), so that minimal pairs such as PREsent (noun) and preSENT (verb) are distinct both segmentally and suprasegmentally. In Dutch, vowels in unstressed syllables are reduced much less frequently than in English (Sluijter & van Heuven, 1996), leaving duration, intensity and pitch as the most important acoustic correlates of lexical stress (van Heuven & de Jonge, 2011). Whereas other studies have compared listeners’ weighting of all acoustic cues to stress (i.e., both segmental and suprasegmental cues), the present study focuses specifically on listeners’ use of suprasegmental cues to lexical stress.

In principle, stress in both English and Dutch can be contrastive and serve to distinguish between segmentally identical word pairs such as INsight and inCITE, although such minimal pairs are in fact rare in all stress languages (Cutler & Jesse, 2021), and English and Dutch are no exception. What makes such minimal stress pairs particularly useful, of course, is how clearly they show the suprasegmental cues available to listeners. Figure 1 shows the English pair PERvert (noun)/perVERT (verb) and the Dutch pair VOORnaam (noun: “first name”)/voorNAAM (adjective: “respectable”). In duration, amplitude and pitch, each primary-stressed syllable clearly outdoes its segmentally matched but suprasegmentally mismatched counterpart.

Figure 1. Waveforms and spectrograms for the English minimal stress pair PERvert – perVERT (left) and the Dutch minimal stress pair VOORnaam – voorNAAM (right). Blue lines represent pitch contours.

Even without many such minimal pairs, the simple fact that stress patterns vary from word to word should make suprasegmental stress cues useful for listeners engaged in spoken-word recognition. Word pairs with segmentally identical first syllables, such as PRImary versus priMEval, or OCtopus versus ocTOber, could surely be distinguished more rapidly if listeners took the stress cues into account, rather than waiting for the segmental differences later in the word.

Indeed, there is evidence that Dutch listeners use these cues very efficiently. An early demonstration (van Heuven, 1988) used a gating task with sentences in which both members of a word pair with versus without initial primary stress were equally plausible (e.g., ORgel, “organ”, versus orKEST, “orchestra”). Listeners heard these sentences truncated so that only a short fragment of the final word was audible, and had to guess which word it was; 76% of their guesses based on just the initial vowel were correct, which could only have been due to use of the suprasegmental differences. In a similar study using minimal stress pairs, other Dutch listeners also identified the source word correctly at a high rate (this time in 86% of cases; Cutler & van Donselaar, 2001).

Although both of these results come from ‘offline’ tasks (with decision responses collected after speech processing has concluded), they clearly indicate that Dutch listeners exploit not only segmental but also suprasegmental information. Investigations using ‘online’ tasks measuring processing speed confirmed these findings. In a priming task with minimal stress pairs, Dutch listeners were quicker to accept words primed with their initial syllable only if the prime had the correct suprasegmental cues (Cutler & van Donselaar, 2001), and they were quicker to accept a visually presented word primed by a spoken bisyllabic fragment of the same word, again as long as the suprasegmental cues were correct (van Donselaar et al., 2005). Likewise, incorrectly applied stress patterns proved to affect word recognition in Dutch, in that mis-stressing impeded recognition (Koster & Cutler, 1997; van Leyden & van Heuven, 1996). Clearly, suprasegmental stress cues help listeners of Dutch to distinguish quickly between differently stressed Dutch words.

Figure 1 suggests that suprasegmental cues to lexical stress are no weaker in spoken English words than in Dutch words. On the face of it, it is thus surprising that the Dutch results above have no counterpart in English. Mis-stressing does not prevent English word recognition in noise as long as vowels are intact (Slowiaczek, 1990), and it fails to affect the speed with which English words are recognised (Small et al., 1988), the acceptability of spoken words in sentences (Slowiaczek, 1991) or the judged naturalness of spoken words (Fear et al., 1995). In English, minimal stress pairs even prime each other's associates (Cutler, 1986). The fragment priming results from Dutch do not replicate in English either: segmental overlap does prime matching word forms, but whether the segments are also accompanied by matching suprasegmental features makes no difference to listeners’ responses (Cooper et al., 2002, Experiment 1a; Fear et al., 1995; Small et al., 1988). Native listeners of Dutch and English thus appear to differ in the extent to which they exploit suprasegmental stress cues during spoken-word recognition, despite the similarity and close relatedness of the two languages. In both languages, the information is there in the signal; in one language it is used, in the other it is not. As proposed by Cooper et al. (2002), whether listeners use suprasegmental stress information depends on whether it is useful. That, in turn, depends on the structure of the lexicon (Cutler & Pasveer, 2006).

The vocabularies of English and Dutch differ in the distribution and frequency of speech fragments that are ambiguous at the segmental level yet can be disambiguated when suprasegmental stress patterns are taken into account. In English, such fragments are relatively infrequent, because the vowel of an unstressed syllable that follows a stressed syllable is frequently reduced, yielding a pair of segmentally different rather than segmentally identical syllables. English listeners are therefore rarely confronted with this kind of segmental ambiguity: the first two syllables of words such as ocTOber (with a stressed, and therefore full, vowel in the second syllable) and OCtopus (with a reduced vowel in the second syllable) can be disambiguated on segmental differences alone. For such pairs, there is no additional information to be gained by taking suprasegmental stress cues into account.

The Dutch lexicon, on the other hand, contains many words of three or more syllables with full vowels in the first two syllables and, as a result, many temporarily ambiguous pairs (such as okTOber and OKtopus). For Dutch listeners, the use of suprasegmental stress cues is thus efficient, indeed essential, as it provides disambiguating information that is not available at the segmental level. This vocabulary asymmetry leads native speakers of English and Dutch to develop differently weighted models of segmental and suprasegmental information and, in consequence, quite different listening strategies. Whether listeners use the suprasegmental information in the signal depends on whether it is useful in speeding the recognition of the words of their language. The asymmetry between these otherwise highly similar languages simply reflects the efficiency of the speech processing system.
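The lexical argument above can be made concrete with a toy computation. The sketch below assumes a hypothetical lexicon format in which each word is a sequence of (segments, has-primary-stress) syllables with rough ASCII transcriptions ("@" for schwa), and counts the share of words whose first n syllables are segmentally identical to those of another word with a different stress pattern. The entries and transcriptions are illustrative only; they are not drawn from CELEX or from the lexical statistics reported by Cutler and Pasveer (2006).

```python
from collections import defaultdict

# Hypothetical toy lexicons: each word is a tuple of (segments, has_primary_stress)
# syllables. Transcriptions are rough ASCII approximations; "@" stands for schwa.
ENGLISH = {
    "OCtopus": (("ok", True), ("t@", False), ("p@s", False)),   # 2nd vowel reduced
    "ocTOber": (("ok", False), ("tou", True), ("b@", False)),   # 2nd vowel full
}
DUTCH = {
    "OKtopus": (("ok", True), ("to", False), ("pus", False)),   # 2nd vowel stays full
    "okTOber": (("ok", False), ("to", True), ("b@r", False)),
    "ORgel": (("or", True), ("g@l", False)),
    "orKEST": (("or", False), ("kest", True)),
}


def stress_informative_share(lexicon, n_syll=2):
    """Share of words whose first n_syll syllables are segmentally identical to
    those of at least one other word with a different stress pattern there."""
    groups = defaultdict(list)
    for syllables in lexicon.values():
        onset = syllables[:n_syll]
        segments = tuple(seg for seg, _ in onset)
        stresses = tuple(stressed for _, stressed in onset)
        groups[segments].append(stresses)
    informative = 0
    for stress_patterns in groups.values():
        # Only onsets shared by words with different stress patterns make
        # suprasegmental cues useful for telling those words apart.
        if len(set(stress_patterns)) > 1:
            informative += len(stress_patterns)
    return informative / len(lexicon)


if __name__ == "__main__":
    for name, lex in (("English", ENGLISH), ("Dutch", DUTCH)):
        print(name, stress_informative_share(lex, 1), stress_informative_share(lex, 2))
```

With two-syllable onsets the toy English pair is fully disambiguated by segments (share 0.0), whereas the Dutch okTOber/OKtopus pair remains ambiguous (share 0.5 over the four Dutch entries). With one-syllable onsets both toy lexicons are fully ambiguous, which corresponds to the first-syllable fragments used in the identification task.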

The question at issue in the present study is what consequences this asymmetry may have for those who fully command both languages. Previous research has shown that listeners’ use of acoustic cues to lexical stress in a second language (L2) is strongly influenced by their use of these cues in the L1 (e.g., Choi, 2022; Cooper et al., 2002; Dupoux et al., 2008; Kim & Tremblay, 2021; Qin et al., 2017; Tremblay et al., 2021). While some listeners whose L1 lacks lexical stress may struggle to perceive English lexical stress (e.g., Lin et al., 2014), others may be able to perceive it by exploiting acoustic cues that they rely on for other aspects of lexical access in their native language. For instance, Cantonese listeners, who are experienced in using F0 as a cue to lexical tone, and listeners of Gyeongsang Korean, a dialect with lexical pitch accents, can both successfully discriminate the members of minimal stress pairs in their L2, English, despite the lack of lexical stress in their native language (Choi et al., 2019; Kim & Tremblay, 2021). Listeners whose L1 does have lexical stress tend to transfer their cue use from the L1 to the L2, leading to non-native-like stress perception (Cooper et al., 2002; Cutler, 2009; Tremblay et al., 2021). Dutch listeners presented with segmentally identical but suprasegmentally distinct word fragments in their L2, English (such as oc- from ocTOber versus OC- from OCtopus), actually outdo native listeners in their ability to correctly classify the source word (Cooper et al., 2002; Cutler, 2009).
Thus, when they process L2 speech they draw upon skills induced by their L1, skills that L1 listeners of English lack because their English input has never induced them. But with substantial experience in the same L2, might Dutch-native listeners learn to listen as English listeners do, and ignore features that are useful in their L1 but not appropriate for their L2? Of particular interest, then, is the kind of learning involved. With a few notable exceptions (e.g., Tremblay & Spinelli, 2014; Weber & Cutler, 2006), existing studies of phonological structure in L2 listening have tended to focus on the acquisition of L2-appropriate strategies; the present question, in contrast, is whether L2 listeners can learn that their perceptual performance could be improved by dropping an L1 strategy.

The appropriate population for such a question is one immersed in an L2 environment and predominantly using the L2 in daily life. Our study involves a population of native Dutch-speaking emigrants in Australia. Dutch emigrants tend to adopt the language of their new environment quickly (Clyne & Pauwels, 1997), so that Dutch emigrants in Australia typically use English, their L2, for everyday communication. In Experiment I, Dutch emigrants living in Australia completed a replication of Experiment 3 from Cooper et al.'s (2002) study. If the emigrants exploit suprasegmental cues to lexical stress in English, their accuracy should be high and resemble that of the Dutch L2 listeners in the original study by Cooper and colleagues. If, on the other hand, the emigrants have stopped using suprasegmental stress cues because these are not useful in the L2, their accuracy should be lower than that of Cooper et al.'s Dutch listeners and closer to that of the English L1 listeners in the same experiment. Experiment II, conducted with native Dutch listeners in the Netherlands, aimed to establish the validity of new Dutch stimulus materials that we constructed in parallel to the English stimuli from Cooper et al. (2002); Experiment III then used these new materials to examine L1 identification accuracy in the same group of Dutch emigrants who had completed Experiment I.

Experiment I – use of suprasegmental stress cues in L2 listening

Method

Participants

Twenty-four participants were recruited from the Dutch emigrant community in the wider Sydney area (aged 27–73 years, M = 48.8, SD = 14.9; 14 females). All participants were native speakers of Dutch who grew up in the Netherlands and had migrated to Australia as adults (mean age at migration: 28.4 years, SD = 7.7, range: 18–52). Their mean length of residence in Australia was 20.5 years (SD = 15.2). Participants were highly proficient in their L2, English, as indicated by their mean score of 93.6 (SD = 5.3) on the Lexical Test for Advanced Learners of English (LexTALE; Lemhöfer & Broersma, 2012). To measure their frequency of L1 and L2 use, the question “Please indicate to what extent you use Dutch and English in the situations listed” was included in a background questionnaire that participants completed before the start of the experiment. All participants reported using their L2, English, more frequently than their L1, Dutch, whose use was mostly restricted to communication with family members. See Appendix S1 (Supplementary Materials) for the full list of situations and a tally of responses to this question. No participant reported any hearing problems. All participants provided written informed consent before the start of the experiment and were paid for their participation.

Materials

Stimulus materials were taken from Experiment 3 of Cooper et al. (2002) and consisted of truncated recordings of 21 pairs of English words, spoken by a male native speaker of Australian English (see Appendix S2, Supplementary Materials). The words in each pair differed in their stress pattern: one word had primary stress on the first syllable (e.g., RObot), and the other had primary stress on the second syllable (e.g., roBUST). To ensure that the truncated words in each pair were segmentally the same and differed only suprasegmentally, the first syllable of every word contained a full vowel. Mean log word frequencies in the CELEX lexical database of English (Baayen et al., 1995), as reported by Cooper et al. (2002), were 2.18 for first-syllable stress words and 1.88 for second-syllable stress words. Each word had been recorded twice and was truncated at the end of the first syllable, resulting in a total of 84 spoken word fragments (21 pairs × 2 words × 2 recordings), each of which was presented twice (168 trials). Mean durational, F0 and amplitude measures for the syllable fragments are shown in Table 1, averaged across all fragments with the same stress type. All measures were computed over the voiced portion of a fragment only, except for duration, which was measured over the entire fragment. As in the study by Cooper et al., a different pseudo-randomised stimulus list was created for each participant, and fragments from the same word pair never occurred in successive trials.
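The measurement convention described above (duration over the whole fragment, all other measures over voiced frames only, then averaging across fragments of the same stress type) can be sketched as follows. This is a minimal illustration assuming a pitch track that has already been extracted as frames of (F0 in Hz, or None when unvoiced; intensity in dB) at a fixed frame step. The function names and the particular measures are hypothetical and are not claimed to be the six measures reported in Table 1.

```python
from collections import defaultdict
from statistics import mean


def fragment_measures(frames, frame_step_ms=10):
    """frames: list of (f0_hz or None if unvoiced, intensity_db) tuples.
    Duration covers the whole fragment; F0 and intensity use voiced frames only."""
    voiced = [(f0, db) for f0, db in frames if f0 is not None]
    if not voiced:
        raise ValueError("fragment contains no voiced frames")
    return {
        "duration_ms": len(frames) * frame_step_ms,
        "mean_f0_hz": mean(f0 for f0, _ in voiced),
        "max_f0_hz": max(f0 for f0, _ in voiced),
        # Simplification: averages dB values directly rather than energies.
        "mean_intensity_db": mean(db for _, db in voiced),
    }


def average_by_stress(fragments):
    """fragments: list of (stress_type, frames); returns mean measures per stress type."""
    per_type = defaultdict(list)
    for stress_type, frames in fragments:
        per_type[stress_type].append(fragment_measures(frames))
    return {
        stress_type: {key: mean(m[key] for m in measures) for key in measures[0]}
        for stress_type, measures in per_type.items()
    }


if __name__ == "__main__":
    # Made-up frames for illustration only (not the study's stimuli).
    example = [
        ("first", [(None, 40.0), (120.0, 70.0), (130.0, 72.0), (None, 45.0)]),
        ("second", [(None, 38.0), (110.0, 66.0), (None, 40.0)]),
    ]
    print(average_by_stress(example))
```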

Table 1. Mean values on six acoustic measures of the stimuli of Experiment I. Values were averaged across all fragments from source words with first-syllable (left) or second-syllable stress (right).

Procedure

Participants were tested individually in a sound-attenuated booth. Auditory stimuli were presented over Beyerdynamic DT770 PRO headphones at a comfortable sound level, kept constant for all participants. Instructions in English were displayed on the computer screen and were repeated and clarified orally (in Dutch) by the experimenter. Participants were instructed to listen carefully to each word fragment and decide whether it formed the beginning of the word displayed on the left or on the right of the screen. The screen position (left or right) of the correct response word was counterbalanced across presentations of the same word fragment. At the start of each trial, the two response words were displayed on the screen for a preview period of 2000 ms. The word fragment was then played and participants gave their response, pressing the left shift key to select the word on the left of the screen and the right shift key to select the word on the right. There was no time-out, and the next trial started 500 ms after a response was received. After completing the experiment, participants took the English version of the LexTALE (Lemhöfer & Broersma, 2012) to assess their English proficiency.
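The trial structure described in the Materials and Procedure sections (84 fragments, each presented twice with the correct response word shown once on each side, pseudo-randomised per participant so that fragments from the same word pair never occur on successive trials) can be generated with a simple constrained shuffle. The sketch below is one hypothetical way to do this, using a greedy random draw with restarts; the interpretation of "counterbalanced" as one left and one right presentation per fragment, the seeding scheme and all names are assumptions, since the original experiment software is not described.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    pair: int          # index of the word pair, e.g. RObot/roBUST
    stress: str        # "first" or "second": stress position in the source word
    token: int         # which of the two recordings of the word (0 or 1)
    correct_side: str  # screen side ("left"/"right") of the correct response word


def build_trials(n_pairs=21, n_tokens=2):
    """21 pairs x 2 words x 2 recordings = 84 fragments; each fragment is
    presented twice, once with the correct word on each side (168 trials)."""
    return [
        Trial(pair, stress, token, side)
        for pair in range(n_pairs)
        for stress in ("first", "second")
        for token in range(n_tokens)
        for side in ("left", "right")
    ]


def pseudo_randomise(trials, rng, max_attempts=1000):
    """Greedy constrained shuffle: never place two fragments from the same
    word pair on successive trials; restart if the draw hits a dead end."""
    for _ in range(max_attempts):
        remaining = list(trials)
        order = []
        while remaining:
            last_pair = order[-1].pair if order else None
            candidates = [t for t in remaining if t.pair != last_pair]
            if not candidates:
                break  # only fragments from the previous trial's pair are left
            choice = rng.choice(candidates)
            remaining.remove(choice)
            order.append(choice)
        if not remaining:
            return order
    raise RuntimeError("no valid order found; increase max_attempts")


if __name__ == "__main__":
    # One list per participant, each with its own (hypothetical) seed.
    lists = {pid: pseudo_randomise(build_trials(), random.Random(pid)) for pid in range(1, 25)}
    print(len(lists), len(lists[1]))
```

Rejecting whole random shuffles would also satisfy the constraint, but with eight trials per word pair most full shuffles contain at least one same-pair repeat, so the greedy draw, which only restarts on the occasional dead end near the end of the list, is considerably cheaper.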

Results and discussion

One trial had a response time of less than 100 ms and was therefore excluded from all analyses reported below. The results of the remaining trials are displayed in Figure 2a. For comparison, Figure 2b shows the mean results of the English and Dutch listener groups tested by Cooper et al. (2002; henceforth L1 controls and L2 controls, respectively). Overall, the emigrants correctly identified the source word for 61.9% of truncated fragments. They assigned fragments to their source words more accurately when the source word had first-syllable stress (72.3%) than when it had second-syllable stress (51.5%). This asymmetry may result from listeners selecting the response option with first-syllable stress more often than the other option. Indeed, in 60.4% of all trials, participants judged a word with first-syllable stress to be the source of the fragment they had heard, a percentage very similar to the first-syllable-stress judgments made on the same materials by the L2 (58.5%) and L1 listeners (62.9%) of Cooper et al. (2002). This bias towards words with first-syllable stress may reflect differences in word frequency (among the source words used in the present experiment, those with first-syllable stress had higher word frequencies than those with second-syllable stress), in acoustic clarity (syllables with primary stress tend to be articulated more precisely; Scarborough et al., 2009), and/or in the lexical statistics of stress patterns (first-syllable stress is the most frequently occurring stress pattern in English; Clopper, 2002; Cutler & Carter, 1987).
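Because Experiment I contained equal numbers of fragments from first- and second-syllable-stress words, the overall accuracy and the rate of first-syllable-stress responses follow directly from the two per-type accuracies reported above (ignoring the single excluded trial). In the short derivation below, A_1 and A_2 denote accuracy (in %) for first- and second-syllable-stress items; the second line uses the fact that every error on a second-syllable-stress item is a first-syllable-stress response. The reported figures turn out to be mutually consistent.

```latex
\begin{aligned}
\text{overall accuracy} &= \tfrac{1}{2}\,(A_1 + A_2) = \tfrac{1}{2}\,(72.3 + 51.5) = 61.9\%\\
P(\text{first-syllable-stress response}) &= \tfrac{1}{2}\,\bigl(A_1 + (100 - A_2)\bigr) = \tfrac{1}{2}\,(72.3 + 48.5) = 60.4\%
\end{aligned}
```

Under this bookkeeping, a bias towards first-syllable-stress responses inflates A_1 and deflates A_2 together.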

Figure 2. Mean percentage of correct responses from Experiment I (panel A) and from Cooper et al. (2002; panel B). Error bars represent standard errors.

The emigrants’ overall identification accuracy was compared statistically with that of the L1 controls (mean accuracy = 59.2%) and the L2 controls (mean accuracy = 72.3%) from Experiment 3 of Cooper et al.'s (2002) study by fitting a generalised linear mixed-effects model to the combined data from that study and the present experiment. The model was fitted in R (R Core Team, 2019) using the lme4 package (Bates et al., 2015), with a binomial family and logit link function. Listener group (emigrants, L2 controls, L1 controls) was entered into the model as a fixed categorical predictor. This predictor was coded using Helmert contrasts, such that the beta value of Group1 represents the difference between the mean of the L1 listeners on the one hand and that of both groups of L2 listeners combined on the other, whereas the beta value of Group2 represents the difference between the means of the two L2 groups (see Table 2 for the contrast matrix). Random intercepts were included for participants and items. The results of the model fit, displayed in Table 3, showed significant effects of both Group1 and Group2. Post-hoc comparisons with Tukey-adjusted α-levels, conducted with the emmeans package (Lenth, 2019), revealed that the emigrants’ accuracy differed significantly from that of the L2 controls (p < .001) but not from that of the L1 controls (p = .58). This suggests that the emigrants no longer use suprasegmental stress cues to the same extent as their compatriots who remained in the Netherlands.
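To illustrate how a Helmert-type coding yields the interpretation given above for Group1 and Group2, the sketch below inverts the design matrix of a saturated three-group model and shows which combination of group means (on the logit scale) each coefficient estimates. The scaled codes used here (2/3, -1/3, -1/3 for Group1; 0, 1/2, -1/2 for Group2) and the level order are assumptions chosen so that each beta equals the stated difference; the matrix actually used is the one in Table 2. R's built-in contr.helmert uses unscaled integer codes, under which each coefficient is a fixed fraction of the corresponding difference.

```python
from fractions import Fraction as F

# Hypothetical scaled Helmert coding; the level order is an assumption.
LEVELS = ("L1 controls", "emigrants", "L2 controls")
CONTRASTS = {
    #                Group1      Group2
    "L1 controls": (F(2, 3), F(0)),
    "emigrants": (F(-1, 3), F(1, 2)),
    "L2 controls": (F(-1, 3), F(-1, 2)),
}


def invert(matrix):
    """Gauss-Jordan inversion of a square matrix using exact fractions."""
    n = len(matrix)
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# One design row per group: intercept column plus the two contrast columns.
design = [[F(1), *CONTRASTS[level]] for level in LEVELS]
# Row k of the inverse gives the weights on the group means estimated by coefficient k.
weights = invert(design)

if __name__ == "__main__":
    for name, row in zip(("Intercept", "Group1", "Group2"), weights):
        print(name + ": " + ", ".join(f"{w} x {lvl}" for w, lvl in zip(row, LEVELS)))
```

The printed weights show that Group1 estimates the L1 mean minus the average of the two L2 group means, and Group2 estimates the emigrant mean minus the L2-control mean, while the intercept is the unweighted mean of the three group means.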

Table 2. Helmert contrast coding for the predictor Listener group.

Table 3. Results of the generalised linear mixed-effects model on the responses of Experiment I and of Experiment 3 from Cooper et al. (2002).

We then compared the emigrants’ response accuracy with chance level (i.e., 50%) using a two-sided binomial test. Because the aforementioned bias towards first-syllable-stress responses prevents a meaningful interpretation of accuracy for fragments with this stress pattern, this comparison was carried out only on participants’ judgments for items with second-syllable stress (cf. Cooper et al., 2002). Whereas the L2 controls had performed significantly better than chance, the emigrants performed neither better nor worse than chance (z = 1.34, p = .181).
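The chance-level comparison can be sketched as a two-sided test of a binomial proportion against 0.5. The sketch below assumes the normal approximation without continuity correction (the text reports a z statistic but does not name the exact procedure) and computes the two-sided p-value from the standard normal distribution via math.erfc; the function names are hypothetical.

```python
import math


def binomial_z(successes, n, p0=0.5):
    """z statistic for an observed count of successes against p0 (normal approximation)."""
    expected = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    return (successes - expected) / sd


def two_sided_p(z):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Illustrative counts only, not the study's data.
    z = binomial_z(60, 100)
    print(z, two_sided_p(z))
    print(two_sided_p(1.34))
```

For the reported z = 1.34, two_sided_p returns about 0.180, close to the reported p = .181; the small difference is consistent with z having been rounded to two decimals.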

In sum, the results from this experiment clearly show that the Dutch emigrants do not exploit suprasegmental information to the same extent as Dutch L2 listeners living in the Netherlands, and that their use of this information is more in line with that of English L1 listeners. This indicates that after an extended period of daily L2 use, the emigrants have learned the properties of the English lexicon and adjusted the way they listen accordingly to optimise processing efficiency. This finding can be interpreted in two ways. On the one hand, the emigrants may have expanded their strategy repertoire to include an extra, L2-specific way of using suprasegmental cues alongside their L1-specific way. Alternatively, under the influence of their L2, the emigrants may have lost the L1-specific ability to exploit suprasegmental cues in favour of a new strategy that is more efficient for the L2, essentially replacing one strategy with another. Under this interpretation, the emigrants would have only the new L2-specific strategy at their disposal, even when listening to their L1.

To determine which of these two interpretations is more likely, we examined the emigrants’ use of suprasegmental stress cues in their L1, Dutch. However, unlike Experiment I, for which stimuli and control data were readily available from the literature, this Dutch experiment had neither suitable stimuli nor pre-existing control data. Previous studies of suprasegmental stress cue use in Dutch (e.g., Cutler & van Donselaar, 2001; van Donselaar et al., 2005; van Heuven, 1988) could not provide a direct comparison, as they had used paradigms that differed from that of the present study. Our new stimulus materials were therefore first tested with a group of Dutch L1 listeners living in the Netherlands (Experiment II), before the emigrants’ use of suprasegmental stress cues in Dutch was assessed with the same materials (Experiment III).