
Behavioral and ERP evidence of differences in pitch feedback control in late bilinguals’ L1 and L2 speech production

Published online by Cambridge University Press:  23 February 2023

Xiao Cai
Affiliation:
Department of Psychology, Renmin University of China, Beijing, China; School of Foreign Languages, Renmin University of China, Beijing, China
Yulong Yin
Affiliation:
Department of Psychology, Renmin University of China, Beijing, China; School of Psychology, Northwest Normal University, Lanzhou, China
Qingfang Zhang*
Affiliation:
Department of Psychology, Renmin University of China, Beijing, China
*Address for correspondence: Qingfang Zhang, 59 Zhongguancun Street, Haidian District, Beijing 100872, PR China. E-mail: qingfang.zhang@ruc.edu.cn

Abstract

This study compared late bilinguals’ pitch feedback control in L1 and L2 production using a frequency-altered feedback paradigm in which participants read target words while presented with unexpected pitch-shift in their voice feedback. Variables of language (L1 or L2) and perturbation magnitudes (0, 100, 200, or 400 cents) were manipulated. Behaviorally, participants produced larger magnitudes but longer latencies of vocal compensation in L2 than in L1 production, suggesting that L2 pitch feedback control has greater importance but lower efficiency. Event-related potential findings demonstrated that 400-cent shifts elicited greater N1 amplitudes than those in the 0-cent baseline condition in L1 production. This difference was non-significant in L2 production, implying different neural processing of unaltered feedback and externally-generated feedback in L1 and L2 production. Participants’ vocal compensation and P2 amplitudes were similarly modulated by pitch-shift in L1 and L2 production, implying a similar gating mechanism to correct internal and external errors.

Type
Research Article
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

1. Introduction

Auditory feedback has consistently attracted attention in the speech motor control literature because of its role in guiding vocal output. Online auditory feedback control has been formulated in the state feedback control model as follows (Hickok, 2012; Hickok, Houde & Rong, 2011; Houde & Nagarajan, 2011). A motor command is sent to the motor system, while a copy of the issued command (i.e., efference copy) is sent to the auditory system. Based on this efference copy, the internal forward model predicts an articulatory outcome before actual auditory feedback is available (Houde & Nagarajan, 2011; Houde, Kort, Niziolek, Chang & Nagarajan, 2013). Then, speakers compare the perceived and predicted feedback and make feedback-based motor corrections if a mismatch exists. Thus, auditory feedback control is responsible for producing accurate speech.
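The comparison step described above can be illustrated with a minimal sketch. This is not the published state feedback control model or any implementation from the cited work; the function names and the `gain` parameter are hypothetical, and the only established fact used is that pitch intervals are conventionally measured in cents (1200 cents per octave). The sketch computes the mismatch between predicted and perceived F0 and returns a correction that opposes it.

```python
import math


def cents_between(reference_hz: float, observed_hz: float) -> float:
    """Pitch interval from reference to observed, in cents (1200 per octave)."""
    return 1200.0 * math.log2(observed_hz / reference_hz)


def corrective_shift_cents(predicted_f0_hz: float,
                           perceived_f0_hz: float,
                           gain: float = 0.2) -> float:
    """Toy feedback correction (hypothetical helper, not from the source).

    predicted_f0_hz: F0 predicted by the forward model from the efference copy.
    perceived_f0_hz: F0 actually heard in auditory feedback.
    gain: assumed fraction of the mismatch that is corrected; the value 0.2
          is an arbitrary placeholder, not an empirical estimate.
    """
    mismatch = cents_between(predicted_f0_hz, perceived_f0_hz)
    # The correction opposes the perceived error; no mismatch, no correction.
    return -gain * mismatch


if __name__ == "__main__":
    predicted = 200.0                          # hypothetical intended F0 in Hz
    perceived = predicted * 2 ** (100 / 1200)  # feedback heard 100 cents too high
    print(corrective_shift_cents(predicted, perceived))
```

With these placeholder values, feedback heard as too high yields a negative (downward) correction, and matching feedback yields none.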

Recent decades have seen an upsurge in the number of bilinguals whose first language (L1) and second language (L2) are acquired at different stages (van Hell & Tanner, 2012). For late bilinguals in particular, L2 is acquired well after L1 has been established (Liu & Tian, 2018). Native speakers of a language have established internal representations of speech sounds, including acoustic features and articulatory motor commands. When bilinguals learn to speak a new language, they must perceive new sound targets, master new motor commands, and establish new sensory-motor mappings to achieve the desired speech output (Ning, Shih & Loucks, 2014). Typically, an accent distinguishes whether late bilinguals are speaking their native L1 or later-acquired L2 (Simmonds, Wise & Leech, 2011b). Nevertheless, research into L2 production has paid relatively little attention to the sensorimotor aspects of bilingual speech production. This study aims to compare the neurocognitive mechanism of auditory feedback control in late bilinguals’ L1 and L2 production from the perspective of sensorimotor control.

1.1 Auditory feedback control in late bilinguals

Auditory feedback control comprises four successive stages: (1) detecting an error by comparing the actual feedback to the auditory target, (2) computing the necessary corrective command, (3) transmitting the corrective command to the muscles, and (4) contracting the muscles to correct the movement trajectory (Perkell, 2012). In bilingualism, L1 and L2 are hypothesized to use the same sensorimotor control systems (Simmonds et al., 2011b). However, according to the critical period hypothesis, a specific time period exists in which only languages learned in prepuberty can be mastered to native proficiency (Bylund, Hyltenstam & Abrahamsson, 2021; Lenneberg, 1967). For bilinguals who acquire L2 later in life, after their L1 has already been highly developed, this late age of acquisition (AoA) of L2 limits the motor and auditory systems’ neural plasticity (DeKeyser, 2013). Late bilinguals face challenges in perceiving and parsing incoming acoustic signals, matching perceived phonemes to auditory targets, transforming phonological targets into motor targets, selecting and executing appropriate speech plans, and relaying auditory feedback to correct online speech errors (Hickok, 2012). Further, late bilinguals encounter difficulties in phonetic and articulatory aspects; specifically, L2 sounds are produced less precisely with a foreign accent (Reiterer et al., 2011) and L2 vowels are produced less stably with more variable first and second formant frequency values (Wang & van Heuven, 2006).

Given the problem of inaccurate L2 speaking, researchers have realized that late bilinguals may show differences between L1 and L2 production in how they integrate auditory feedback to modify articulatory errors (Liu & Tian, 2018; Mitsuya, MacDonald, Purcell & Munhall, 2011; Ning et al., 2014; Ning, Loucks & Shih, 2015; Simmonds et al., 2011b). The first difference is embodied in the reliance on auditory feedback for online speech production. Language development is one factor influencing the relative weighting of feedback and feedforward control for fluent speech production (Civier, Tasko & Guenther, 2010; Guenther, 2016; Perkell, 2012). Specifically, language development is associated with increased reliability of internal sensorimotor representations, and accordingly, decreased reliance on external auditory feedback. Several studies have provided evidence that L2 production in late bilinguals is characterized by greater reliance on auditory feedback control relative to L1 production (Cai, Yin & Zhang, 2020, 2021; Ning et al., 2014, 2015). For example, Ning et al. (2015) found that, among native and L2 Mandarin speakers, L2 learners exhibited greater motor adjustment in response to the same amount of pitch errors perceived in auditory feedback. Further, our recent study compared L1 and L2 intensity control in late Chinese–English bilinguals and found larger intensity increases to unexpected masking noise in L2 than L1 production (Cai et al., 2021).
The magnitude of feedback-based motor adjustment provides information on an individual's reliance on auditory feedback control. Larger magnitudes are hypothesized to reflect increased reliance on auditory feedback, as the individual is closely monitoring their auditory feedback during online speaking and is more likely to respond to a perceived error (Murray & Stepp, 2020). In contrast, an individual who relies more on pre-determined feedforward commands need not closely monitor their feedback during online speaking and is less likely to be affected by deviated auditory feedback. Thus, these previous findings suggest that late bilinguals may not achieve native-like motor control, resulting in more reliance on auditory feedback during online L2 speaking.

The second difference relates to the speed of initiating feedback-based motor correction. Previous studies found that during the L1 acquisition process, both compensatory response latencies and P1-N1 latencies decreased as participants’ age increased (Liu et al., 2013; Scheerer, Behich, Liu & Jones, 2013a). These findings provided a link between information processing efficiency in the cortical areas supporting sensorimotor integration and the developmental trajectory of auditory feedback control (Coughler, de Launay, Purcell, Cardy & Beal, 2022). Specifically, speakers develop more efficient neural pathways within feedback control networks as language learning develops. This developmental trend might also hold true for late bilinguals who have mastered two languages at different stages, because L2 is inherently less developed than L1, given the language learning/production experience (Simmonds et al., 2011b). One behavioral study (Cai et al., 2021) provided support for this hypothesis by showing that, in response to unexpected masking noise, late Chinese–English bilinguals presented later onset latencies of intensity increases in L2 compared with L1 production. Altogether, findings of increased efficiency related to language development suggest that late bilinguals may show slower pitch feedback processing during online speaking in their L2.

A few studies have provided evidence that language status (L1 vs. L2) affects late bilinguals’ working mechanism of auditory feedback control (i.e., relative weighting of the feedback control system and speed of feedback processing). However, these studies have been behavioral in nature, and only provide information on the final product of vocal responses to perturbed feedback (Coughler et al., 2022). Thus, the spatiotemporal brain mechanisms underlying language-specific auditory feedback control in late bilinguals remain unknown. In response to this, the current study further compared the neurocognitive mechanisms of L1 and L2 auditory feedback control in late bilinguals using more sensitive measurements.

1.2 Neurocognitive mechanism of pitch feedback control

Pitch, the perceptual correlate of fundamental frequency (F0), is associated with the vocal control responsible for accommodating vocal settings of the respiratory, laryngeal, and supraglottal systems (Perkell, Matthies, Lane, Guenther, Wilhelms-Tricarico, Wozniak & Guiod, 1997). F0 refers to the positioning and frequency of vocal fold vibrations and is determined by vocal fold length and tension (Zhang, 2016). For speakers of tonal languages such as Chinese, lexical tones must be produced to distinguish otherwise phonologically identical words (Xu, Larson, Bauer & Hain, 2004). In non-tonal languages such as English, F0 increases for stressed syllables and at the end of a phrase or sentence to indicate a question (Xu & Xu, 2005). In some neurologically-based voice disorders, voice F0 is often abnormal and interferes with communication (Demopoulos et al., 2018; Scheerer, Jones & Iarocci, 2020b). Thus, understanding the mechanism of F0 control during speech is important for successful human communication (Coughler et al., 2022).

The frequency-altered feedback (FAF) paradigm is a method frequently used to examine auditory feedback control, in which participants hear their pitch feedback shift (up or down) unexpectedly when speaking (Elman, 1981). This paradigm purely taps into feedback-based error monitoring and motor correction, instead of (pre-determined) feedforward motor plans (Parrell & Houde, 2019). Generally, speakers adjust their pitch in the opposite direction of perturbation approximately 100–200 ms post-perturbation onset (Burnett, Senner & Larson, 1997; Burnett, Freedland, Larson & Hain, 1998; Chang, Niziolek, Knight, Nagarajan & Houde, 2013; Liu, Meshman, Behroozmand & Larson, 2011; Scheerer, Jacobson & Jones, 2020a). For example, Chang et al. (2013) found that speakers raised their voice pitch when perturbation decreased F0 in perceived pitch feedback, and vice versa. These compensatory responses indicate that pitch feedback control is used to correct errors and stabilize voice F0 at a desired level (Behroozmand, Korzyukov, Sattler & Larson, 2012; Liu, Zhang, Xu & Larson, 2007).

Most research concerning how speakers transform pitch errors into corrective motor responses has traditionally relied on behavioral studies using FAF paradigms (Burnett et al., 1997, 1998). However, compensatory responses merely index the endpoint of successive mental processes that precede initiation of corrective motor commands, making it difficult to examine how the brain processes pitch-shifted feedback leading to final motor corrections (Coughler et al., 2022). Recently, interest has been growing in combining neurophysiology or neuroimaging techniques with FAF paradigms to promote explorations of spatiotemporal brain mechanisms underlying pitch feedback control. In neuroimaging research, auditory- and motor-related areas have been identified in feedback control, including the premotor cortex, superior temporal gyrus, basal ganglia, and fronto-parietal regions (Kearney & Guenther, 2019; Tourville, Reilly & Guenther, 2008). However, despite excellent spatial resolution, the functional magnetic resonance imaging (fMRI) technique suffers from the limitation of low temporal resolution for measuring blood-oxygen-level dependent activation.

At the same time, electroencephalography (EEG) has high temporal resolution, and brain responses to specific sensory, cognitive, or motor events can be assessed millisecond-by-millisecond (Zhu, Damian & Zhang, 2015). In neurophysiology research, typical event-related potential (ERP) components observed in response to auditory stimuli are characterized by a positive-negative-positive sequence (i.e., the P1-N1-P2 complex), and studies have found increased P1-N1-P2 complex activity when speaking in the context of pitch-shifted feedback, compared with speaking in the context of one's unaltered feedback (Behroozmand, Karvelis, Liu & Larson, 2009; Behroozmand, Ibrahim, Korzyukov & Robin, 2015; Behroozmand, Sangtian, Korzyukov & Larson, 2016; Chen, Liu, Wang, Larson, Huang & Liu, 2012; Korzyukov, Karvelis, Behroozmand & Larson, 2012a; Korzyukov, Sattler, Behroozmand & Larson, 2012b; Scheerer et al., 2013a; Scheerer, Liu & Jones, 2013b). The high temporal resolution makes EEG ideal for examining P1-N1-P2 activity during FAF paradigms, which could expand our understanding of late bilinguals’ use of pitch feedback for motor corrections.

Notably, the P1-N1-P2 components are not specific to pitch feedback processing, as the ERP responses to FAF paradigms look much like those recorded in response to auditory stimuli in general (Coughler et al., 2022). Participants perform complex, goal-oriented motor actions in FAF paradigms; thus, the dynamic contribution of brain activity underlying auditory- and motor-related responses needs to be discussed in the specific experimental context (Behroozmand, Liu & Larson, 2011). Using FAF paradigms, studies found the P1-N1-P2 complex to be a reliable indicator of pitch feedback processing, with N1 (negative peak latency around 100 ms) and P2 (positive peak latency around 200–300 ms) being the most prominent components (Behroozmand et al., 2009, 2015; Korzyukov et al., 2012a, 2012b; Scheerer et al., 2013a, 2013b).

One concept that has not received sufficient attention in relation to pitch feedback control is the sense of agency (SoA), defined as the experience of oneself as the agent of one's own actions (Korzyukov, Bronder, Lee, Patel & Larson, 2017). It refers to the sense of being the one who is causing or controlling a perceived movement or change in the outside world (Moore, 2016). Previous research found that participants perceived different perturbation magnitudes as self-produced pitch errors (100 cents) or externally-generated pitch shift (400 cents; Korzyukov et al., 2017). By manipulating perturbation magnitudes in FAF paradigms, researchers could examine the role of SoA in regulating speech motor movement and the P1-N1-P2 amplitudes during pitch feedback control (Behroozmand & Larson, 2011; Chen, Wong, Jones, Li, Liu & Chen, 2015; Korzyukov et al., 2017; Liu et al., 2011; Scheerer et al., 2013a).

The P1 component has been considered to reflect early recognition of changes in an auditory stimulus, rather than specific processing of the magnitude of deviation in auditory feedback (Korzyukov et al., 2012a; Scheerer et al., 2013a). Scheerer et al. (2013a) found that the P1 component was sensitive to perturbations, as its amplitude was larger in all FAF conditions than in the 0-cent condition, in which participants spoke under unaltered feedback. However, P1 amplitudes did not vary across different FAF conditions, suggesting a lack of sensitivity to pitch-shift magnitudes (see also Korzyukov et al., 2012a).

The N1 component has been associated with SoA – that is, the determination of whether feedback is internally or externally produced (Behroozmand & Larson, 2011; Chen et al., 2015; Korzyukov et al., 2017; Liu et al., 2011; Scheerer et al., 2013a; Scheerer & Jones, 2018). If a sound is recognized as self-generated, N1/M1 amplitude suppression occurs during auditory processing of this sound (Heinks-Maldonado, Mathalon, Gray & Ford, 2005; Flinker, Chang, Kirsch, Barbaro, Crone & Knight, 2010). Scheerer et al. (2013a) found that all perturbation magnitudes elicited larger N1 amplitudes than did the unaltered (0 cent) condition. Unlike the P1 component, the N1 component was modulated by perturbation magnitudes: the 400-cent condition elicited larger N1 amplitudes than smaller perturbation magnitudes (roughly 50–250 cents), whereas these smaller perturbation magnitudes elicited similarly sized N1 amplitudes. These findings suggest that when perceived errors remain within a range (no greater than 250 cents) in which feedback could still be considered internally produced, N1 amplitudes increase in an all-or-nothing manner (Behroozmand & Larson, 2011; Hawco, Jones, Ferretti & Keough, 2009). In contrast, when large feedback perturbations exceed what could be considered physiologically feasible (400 cents), N1 amplitudes become even larger, with increased N1 activity suggesting decreased SoA (Scheerer et al., 2013a).

The P2 component has been considered to reflect computation of a mismatch in auditory feedback and issuance of corrective motor commands (Chen et al., 2015; Jones, Scheerer & Tumber, 2013; Liu et al., 2011; Scheerer et al., 2013a). In Scheerer et al. (2013a), graded increases in P2 amplitudes emerged as perturbation magnitudes increased from 50 to 250 cents (see also Behroozmand et al., 2009); however, P2 amplitudes began to decrease when perturbation was increased to 250 cents and higher. This pattern was consistent with changes in vocal compensation as a function of perturbation magnitudes (Scheerer et al., 2013a). Regression analysis also showed that vocal response magnitude explained a significant proportion of the variance in P2 amplitudes (Scheerer et al., 2013a). In addition, evidence indicates that the Sylvian fissure, an area involved in sensorimotor transformation, is one generator of P2 (Hickok, Buchsbaum, Humphries & Muftuler, 2003). Motor regulation in pitch feedback control also involves SoA. Specifically, if feedback is not recognized as self-produced (400 cents), the brain treats the sound like any other environmental sound; hence, no SoA is experienced, and motor corrections and P2 amplitudes are smaller (Korzyukov et al., 2017).

Taken together, the sensitivity of the P1-N1-P2 components in FAF paradigms makes them ideal for assessing the neurocognitive mechanism underlying pitch feedback control. Given that this evidence base comes solely from monolingual speakers, this study utilized pitch and the P1-N1-P2 complex to reveal how auditory feedback control may vary between L1 and L2 production in late bilinguals.

1.3 Pitch feedback control in late bilinguals

Apart from bilingualism, tonal language experience has been found to influence speakers’ pitch feedback processing (Giuliano, Pfordresher, Stanley, Narayana & Wicha, 2011; Ning et al., 2014, 2015). Specifically, tonal language background enhances speakers’ sensitivity in detecting pitch changes (Chandrasekaran, Krishnan & Gandour, 2009; Giuliano et al., 2011) and motor control stability in response to pitch-shifted feedback (Ning et al., 2014, 2015). For example, Ning et al. (2014) compared vocal compensation magnitudes among native Mandarin and English speakers and adult L2 learners of Mandarin, and found that native Mandarin speakers exhibited the least vocal compensation in response to pitch perturbations, suggesting the most stable pitch feedback control (see also Ning et al., 2015). These findings were explained by the varying saliency of pitch in tonal (i.e., discriminating among otherwise identical syllables) and non-tonal languages (i.e., signaling stress and intonation patterns). Further, pitch feedback control was also found to vary between speakers of different tonal languages (Chen et al., 2012; Liu, Wang, Chen, Liu, Larson & Huang, 2010b). For example, Cantonese speakers were reported to have smaller compensatory responses (Liu et al., 2010b) and larger P2 amplitudes (Chen et al., 2012) compared with Mandarin speakers in response to 200- and 500-cent perturbations.
This was because Cantonese and Mandarin have six and four contrastive lexical tones, respectively, thus rendering varying saliency of pitch between the two languages. Altogether, the above findings indicate that tonal language background facilitates stable pitch feedback control during online speaking.

Ning et al. (2014) conducted an initial study to compare pitch feedback control between native speakers with and without tonal language experience (English vs. Mandarin speakers) and extended the question to L2 learning (L2 Mandarin learners). The pitch-shift compensatory response of L2 learners differed from those of both Mandarin and English native speakers, suggesting that L2 learners were in the process of acquiring the response patterns of native speakers of tonal language. However, the complex interaction between bilingualism and tonal language background remains unclear. The current study followed the methodology of earlier studies (Ning et al., 2014, 2015) to further examine this issue by recruiting bilinguals whose L1 and L2 are tonal and non-tonal, respectively (Chinese vs. English). By doing so, we could address the language effect by comparing L1 and L2 pitch feedback control in the same bilinguals; we could also test a further assumption of whether a tonal-language advantage in pitch feedback control for L1 could successfully transfer to bilinguals’ non-tonal L2.

According to the language transfer theory, learning language A facilitates learning language B if both languages share a linguistic feature, and that particular linguistic feature is more prominent in language A than in B (Bialystok, Majumder & Martin, 2003). Speaking a tonal language such as Chinese requires establishing finely grained associations between pitch contours and word meanings, creating enhanced pitch-processing abilities (Ning et al., 2014, 2015). If a tonal-language advantage in L1 does transfer to L2, then late Chinese–English bilinguals may demonstrate similarly stable pitch feedback control in L1 and L2 production. However, if this advantage is limited to processing pitch in one's native language, then bilinguals may exhibit specialized abilities in L1 and L2 pitch feedback control.

1.4 The current study

The purpose of this cross-language study was to compare the behavioral and neural correlates of pitch feedback control in L1 and L2 production among late Chinese–English bilinguals. This study employed a typical FAF paradigm in which participants read target words while presented with unexpected pitch shifts in their voice feedback. Participants’ vocal compensation and ERP responses (P1-N1-P2 complex) were measured and compared across different conditions. Some previous studies demonstrated that responses to pitch shifts are stimulus- and domain-specific (Cooper & Wang, 2012; Ning et al., 2015); thus, we expected that participants may show language-specific pitch feedback control abilities in L1 and L2 production.

Generally, we hypothesized that late bilinguals would be less proficient in auditory feedback control in L2 production relative to L1 production due to inherently different language experience. Specifically, at the behavioral level, we hypothesized that late bilinguals would have heavier reliance on pitch feedback control and lower efficiency in initiating error-correction during online L2 speaking (i.e., larger and later vocal compensation). Because the auditory P1-N1-P2 complex reflects monitoring processes and subsequent modification of ongoing speech in FAF paradigms, it has been utilized as a sensitive neural marker of pitch-feedback processing (Scheerer et al., 2013a). At the electrophysiological level, we hypothesized that pitch-shifted feedback would induce greater P1-N1-P2 amplitudes, compared to unaltered feedback, in both L1 and L2 production. Further, we hypothesized that late bilinguals would exhibit language-specific pitch feedback control in L1 and L2 production (i.e., different modulations of P1-N1-P2 amplitudes and latencies).

At present, the relationship between perturbation magnitudes and resultant behavioral/cortical responses remains inconclusive. The present study included perturbation magnitudes of varying levels (i.e., +100, +200, or +400 cents) as a proxy for SoA, to explore how speakers distinguish between internally and externally generated errors. Smaller deviations between perceived and predicted F0 are more likely to be labeled as ‘normal’ self-made production errors, thus triggering feedback-based motor corrections. In contrast, larger deviations from predicted feedback are more likely to be classified as external environmental sound, thus preventing large feedback-based motor corrections. At the behavioral level, we predicted that smaller perturbation magnitudes (100–200 cents) would elicit larger compensatory responses, whereas larger perturbation magnitudes (400 cents) would elicit smaller compensatory responses, owing to reduced SoA. At the electrophysiological level, we expected perturbation magnitudes to modulate N1 and P2 amplitudes, as previous studies have shown (Korzyukov et al., 2017; Scheerer et al., 2013a). To the best of our knowledge, no systematic study has yet been conducted on late bilinguals to compare language-specific feedback processing at varying FAF levels. We further predicted that language and perturbation magnitudes would significantly interact to modulate both vocal compensation and cortical ERPs in late bilinguals’ pitch feedback control.
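To make the perturbation magnitudes concrete, the following minimal sketch converts a shift in cents into a frequency ratio using the standard definition of the cent (100 cents per equal-tempered semitone, 1200 cents per octave). The 200 Hz baseline and the function names are hypothetical choices for illustration only; they are not taken from the study's procedure or equipment.

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio corresponding to a pitch shift in cents."""
    return 2.0 ** (cents / 1200.0)


def shifted_f0(baseline_hz: float, cents: float) -> float:
    """F0 heard in feedback after shifting baseline_hz by the given cents."""
    return baseline_hz * cents_to_ratio(cents)


if __name__ == "__main__":
    baseline_hz = 200.0  # hypothetical speaker F0, chosen only for illustration
    for cents in (0, 100, 200, 400):  # the four magnitudes used in this study
        print(f"{cents:>3} cents -> {shifted_f0(baseline_hz, cents):.2f} Hz")
```

Under this assumed baseline, a 400-cent shift (four semitones) raises the feedback to roughly 252 Hz, compared with roughly 212 Hz for a 100-cent shift.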

2. Material and methods

2.1 Participants

This study's participants were 24 Chinese–English bilinguals from Renmin University of China. Participants (10 males) were, on average, 22.9 years old (SD = 2.3, range: 17–28). Bilinguals can be classified by age of L2 acquisition: early bilinguals are those who learned their L2 between birth and 8 years old, and late bilinguals are those who learned it at 8 years old or older (Birdsong & Molis, 2001). In the current study, we only recruited late bilinguals who were native Chinese (L1) speakers and learned English (L2) in a classroom setting. The mean age of L2 acquisition was 9.9 years (SD = 1.4, range: 8–12). All participants had passed the College English Test for Band Four (CET4), a proficiency test administered by the Ministry of Education to examine the English-language proficiency of non-English major students in China. The total score of CET4 is 710, and the official minimum passing score is 425. Participants’ average CET4 score was 534 (SD = 47, range: 468–632), indicating that they were moderately proficient to proficient bilinguals. None of the participants reported any history of speech, hearing, or neurological disorders. They received a financial reward for their participation and provided informed consent in compliance with a protocol approved by the Institutional Review Board at Renmin University of China.

2.2 Stimuli and design

Six Chinese words and six English words were chosen as target stimuli. Two additional Chinese words and two additional English words served as practice items to familiarize participants with the experimental procedure; these were not used in the formal experiment. Previous studies on pitch feedback control typically required participants to produce and sustain a specific vowel sound (e.g., /u/ or /a/; Demopoulos et al., 2018; Murray & Stepp, 2020; Scheerer et al., 2013a, 2013b). Based on this common practice, our study adapted the classical paradigm and required participants to produce L1/L2 words containing the neighboring vowels /u/ or /ʊ/. To minimize contamination of the pitch data by the initial consonant, we employed the same initial consonants across the two languages (i.e., /g/, /f/, /k/, /p/, /h/, and /t/). In the L1 task, participants were instructed to read aloud monosyllabic Chinese words with a high-level tone (“估/gu1/,” “夫/fu1/,” “枯/ku1/,” “扑/pu1/,” “呼/hu1/,” and “突/tu1/”) containing the vowel /u/. In the L2 task, participants read aloud English words (“good /ɡʊd/,” “food /fuːd/,” “cook /kʊk/,” “pull /pʊl/,” “hook /hʊk/,” and “tool /tuːl/”) containing the neighboring vowel sounds /u/ or /ʊ/. The L1 and L2 stimuli were pseudo-cognates, which resembled each other phonologically but shared no semantic or orthographic resemblance. Six items were used in each language to increase stimulus variety and reduce participants’ boredom.

The experiment adopted a 2 (language: L1 and L2) × 4 (perturbation magnitudes: 0, 100, 200, and 400 cents) within-subjects and between-items design. Each participant completed an L1 session and an L2 session, each comprising 240 trials arranged in 60 blocks of four trials, for 480 trials in total. Within each block, each of the four pitch-shift conditions (+0, +100, +200, and +400 cents) occurred once, and their order was randomized. Additionally, the six different words were arranged in a pseudo-randomized order for each participant with the constraint that a particular target did not re-occur for at least five trials. Thus, participants could not predict which stimulus type would occur in any given trial.
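The block structure described above can be illustrated with a minimal sketch. It is not the authors' MATLAB code: the function name `make_session` and the seed argument are hypothetical, and the sketch assumes that each block contains each of the four pitch-shift conditions exactly once. The pseudo-randomized word order is not modeled here.

```python
import random

SHIFTS_CENTS = (0, 100, 200, 400)  # perturbation magnitudes from the design
N_BLOCKS = 60                      # 60 blocks x 4 trials = 240 trials per session


def make_session(seed=None):
    """Return the pitch-shift value for each of the 240 trials in one session.

    Assumption: every block holds each of the four conditions once,
    in an independently shuffled order.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(N_BLOCKS):
        block = list(SHIFTS_CENTS)
        rng.shuffle(block)  # randomize condition order within this block
        trials.extend(block)
    return trials


if __name__ == "__main__":
    session = make_session(seed=1)
    print(len(session), session[:8])
```

Under this reading, each condition contributes 60 trials per language before any trials are excluded.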

2.3 Apparatus and procedure

Participants performed the experiment in a sound-attenuated booth, with visual access to a computer monitor that displayed a word for each trial. Words were presented using the Psychophysics Toolbox (Brainard & Vision, 1997), and custom MATLAB scripts controlled Audapter software for perturbing auditory feedback. The apparatus included a MOTU Microbook II USB Audio Interface and Behringer Xenyx 502 mixer connected to a Lenovo desktop running Audapter, a custom-built MEX-based software (Cai, Boucek, Ghosh, Guenther & Perkell, 2008) written in C++ and executed under MATLAB software (Mathworks Inc., 2014b), to control the magnitudes of F0 perturbation in real time. Participants’ speech was recorded with an external condenser microphone (SHURE SM58S) connected to an external soundcard (YAMAHA Steinberg CI1). The microphone was fixed on a short stand on the desk and secured 10 cm from the participant's mouth, calibrated by the experimenter at the beginning of both L1 and L2 sessions. Participants were seated comfortably and remained still throughout the L1/L2 session to ensure the same microphone proximity across different experimental conditions. Microphone signals were digitized at 48000 samples/sec and down-sampled to 12000 samples/sec for real-time processing. The tracked F0 values were mapped to their post-perturbation values, and a pole-substituting digital filter then converted F0 from the original to the shifted values. The pitch-shift magnitudes (+100, +200, or +400 cents) were applied in random order. The latency to deliver the perturbed signals was approximately 11 ms, below the 30 ms threshold for detectable perturbation (Yates, 1963). Audio output from Audapter was delivered via supra-aural headphones (Bose QuietComfort35 II) worn by the speaker. To partially mask air- and bone-conducted feedback, we calibrated the recording system by making the intensity of feedback that the participants heard 10 dB SPL (sound pressure level) higher than that of their voice output (see Ballard et al., 2018, for a similar procedure).
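For readers less familiar with the cent scale, the arithmetic behind the shift magnitudes can be sketched as follows. This only illustrates the standard definition of a cent (1200 cents per octave); it does not represent how Audapter implements the shift, and the function names are hypothetical.

```python
def cents_to_ratio(cents):
    """Frequency ratio for a pitch shift in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


def shifted_f0(f0_hz, cents):
    """F0 in Hz after shifting an original F0 by the given number of cents."""
    return f0_hz * cents_to_ratio(cents)


if __name__ == "__main__":
    for shift in (100, 200, 400):
        # e.g. a +100-cent shift raises F0 by one equal-tempered semitone
        print(shift, round(cents_to_ratio(shift), 4), round(shifted_f0(200.0, shift), 1))
```

A +100-cent shift therefore raises F0 by about 5.9%, and a +400-cent shift by about 26%.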

The experiment comprised a practice phase and a testing phase with the same trial procedure. In the practice phase, participants were told that a word stimulus (L1 or L2 word) would appear on the screen for 1 s, and that they should produce the word after its disappearance. They were trained to produce the target L1 or L2 word in a clear voice, at a consistent volume range of 74–84 dB SPL, and for a medium duration range of 400–500 ms (for similar criteria, see Cai, Beal, Ghosh, Tiede, Guenther & Perkell, 2012). The practice trials were analyzed using Praat speech analysis software (version 6.0.43; Boersma & Weenink, 2013). Regarding vocal intensity and word duration, the experimenter gave participants verbal feedback on the appropriateness of the current vocalization and suggestions for the forthcoming one. This procedure ensured an approximate consistency of intensity and speaking rate across trials, conditions, and participants. The order of L1 and L2 practice sessions was counterbalanced across participants. Because participants varied in how quickly they could perform as required by the experimenter, the number of practice trials differed slightly across participants. For each participant, the number of practice trials also differed slightly between L1 words and L2 words. The experimenter judged that participants could follow the instructions once they met the above-mentioned criteria on 5 consecutive trials. After that, participants were informed they would hear their voice through headphones in the formal experiment, and that sometimes it might sound odd, but they should continue their vocalization regardless of this. After the experimenter determined participants had fully understood the procedure, the formal experiment was administered.

In the test phase, participants completed two sessions, L1-Chinese and L2-English, with an interval of approximately 10 minutes to avoid cross-language interference and fatigue. The order of the two sessions was counterbalanced across participants. For each trial, first, a fixation cross (+) was presented at the center of the screen for 500 ms, followed by a blank screen for 500 ms. Then, a word was presented for 1000 ms, followed by a blank screen during which participants were to produce the word. The blank screen disappeared when participants provided a vocal response or after 2000 ms if participants did not respond. In the perturbed trials, participants’ voice onset activated the pitch tracking and shifting program. Following each response, the experimenter judged and recorded whether the response was correct and whether a voice key error (i.e., non-speech sounds triggered the voice key or speech sounds failed to trigger the voice key) had occurred. An intertrial interval of 1000 ms separated consecutive trials (see Figure 1).

Fig. 1. Examples of experimental paradigm.

2.4 Acoustic recording and analysis

Participants’ vocalizations were extracted and saved as separate WAV format files. The first author, naïve to the experimental conditions of all trials, manually examined the recordings using Praat speech analysis software. Preliminary data inspection resulted in exclusion of trials containing speech errors, dysfluencies, or gross pitch tracking errors. The vocalization onset and offset of each word were labeled by hand, and the vocal cycles were manually checked for errors such as missed or double marks.

To estimate the time course of vocal compensation, we extracted continuous F0 contours of each vocalization (with a temporal resolution of 2 ms), using a trimming algorithm (Xu, 1999). The F0 contours were aligned from vocal onset and averaged across all trials frame-by-frame for each condition, resulting in 8 average contours per participant. Data from the first 400 ms (i.e., lower limit of word duration) were included in the averaging. To ensure that trials were uniform from the onset to the offset of the contour averaging, we discarded trials shorter than 400 ms, resulting in a loss of 4.47% of the data. The F0 signals were then converted to a cent-scale using the following formula:

$$\mathrm{Cents} = 100 \times \left( 39.86 \, \log_{10}\left( f_2 / f_1 \right) \right)$$

where f1 equals an arbitrary reference note at 196 Hz (G4) and f2 is the voice signal in Hertz (Chen, Liu, Xu & Larson, 2007; Xu et al., 2004). Every speaker has a unique F0 range; thus, converting F0 values to cents allowed for pitch comparison across participants.
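The conversion above can be written as a short function. This is a minimal sketch rather than the authors' analysis script; the names `hz_to_cents` and `REFERENCE_HZ` are hypothetical, and the constant 39.86 is taken directly from the formula (it approximates 12 / log10(2), so one octave corresponds to roughly 1200 cents).

```python
import math

REFERENCE_HZ = 196.0  # f1: the arbitrary reference note used in the formula


def hz_to_cents(f2_hz, f1_hz=REFERENCE_HZ):
    """Convert a voice F0 value in Hz (f2) to cents relative to f1."""
    return 100.0 * (39.86 * math.log10(f2_hz / f1_hz))


if __name__ == "__main__":
    # An F0 one octave above the reference maps to roughly 1200 cents.
    print(round(hz_to_cents(392.0), 1))
```

Because the scale is logarithmic, a fixed difference in cents corresponds to the same frequency ratio for speakers with low and high habitual F0.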

After averaging, a statistical test was performed to determine if the average of the baseline wave for 0-cent unperturbed trials differed significantly from the average of the perturbed wave for 100-cent, 200-cent, and 400-cent shifted trials. To allow for a direct comparison, the procedure and parameters of the statistical test followed two relevant studies that also examined pitch feedback control during online production of Mandarin (Xu et al., 2004) and English (Chen et al., 2007). A series of point-by-point t-tests were performed between all baseline and all perturbed waves for a given condition and participant. This process yielded an array of p-values indicating the level of significant difference between all baseline and perturbed waves. To control for type I errors, we used a data-driven statistical correction method (i.e., false discovery rate, FDR) and physiological criteria which consider the limitation of the neuromuscular system of pitch feedback control (Chen et al., 2007; Xu et al., 2004).

The onset of the compensatory response was defined as the first timepoint, at least 60 ms after perturbation onset, at which the p-value fell below .02 and remained below it for at least 50 ms. The timepoint at which the p-value rose above .02 again marked the end of the compensatory response (for the same criteria, see Chen et al., 2007; Xu et al., 2004). First, setting the significance threshold at a stricter level of .02 instead of .05 lowers the probability of falsely rejecting a true null hypothesis, hence reducing type I errors. Second, previous studies demonstrated that a finite time of at least 60 ms elapses between a pitch-shift stimulus and the F0 compensatory response (Burnett et al., 1997, 1998; Larson, 1998). By limiting minimal onset latencies to 60 ms, very early onset latencies that were not caused by feedback-based motor correction during pitch feedback control were validly rejected. Third, studies also suggested that a fast-contracting muscle such as the cricothyroid, which is important for voice F0 control, takes approximately 30 ms to reach peak contraction (Perlman & Alipour-Haghighi, 1988), and another 20–30 ms to elicit a change in voice F0 (Larson, Kempster & Kistler, 1987). By limiting minimal durations of compensatory response to 50 ms, very short duration responses (i.e., not reflecting the F0 change during pitch feedback control) were validly rejected.
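The onset and offset criteria can be sketched as a scan over the array of point-by-point p-values. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, frame k is assumed to lie k × 2 ms after perturbation onset, and a run of significant frames that begins before 60 ms is assumed to be discarded as a whole rather than truncated.

```python
import math


def find_response_window(p_values, step_ms=2, alpha=0.02,
                         min_onset_ms=60, min_duration_ms=50):
    """Return (onset_ms, offset_ms) of the first qualifying response, or None.

    A qualifying response is a run of consecutive frames with p < alpha that
    starts at or after min_onset_ms and lasts at least min_duration_ms.
    """
    min_frames = math.ceil(min_duration_ms / step_ms)
    n = len(p_values)
    i = 0
    while i < n:
        if p_values[i] < alpha:
            j = i
            while j < n and p_values[j] < alpha:
                j += 1  # extend the run of significant frames i..j-1
            if i * step_ms >= min_onset_ms and (j - i) >= min_frames:
                # offset is the first frame back above alpha,
                # or the end of the analysis window if the run never ends
                return i * step_ms, j * step_ms
            i = j
        else:
            i += 1
    return None


if __name__ == "__main__":
    p = [0.5] * 40 + [0.01] * 30 + [0.5] * 130  # 200 frames = 400 ms
    print(find_response_window(p))
```

With a 2-ms frame step, the 50-ms duration criterion corresponds to 25 consecutive significant frames.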

A difference wave was then calculated by subtracting the average baseline wave from the averaged perturbed wave for each participant and each condition. The magnitude of compensatory response was defined as the greatest deviation (from zero) of the difference wave, following the onset latency but before the end of vocal compensation. Peak latency was defined as the timepoint when maximal compensation occurred. Three separate 2 (language: L1 and L2) × 3 (perturbation magnitudes: 100, 200, and 400 cents) repeated-measures analyses of variance (RM-ANOVAs) were then conducted for the onset latencies, response magnitudes, and peak latencies, respectively. Perturbations were not presented during the baseline trials in the 0-cent condition; thus, it would have been meaningless to examine response magnitude and response latency for that condition.
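The difference-wave measures can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: `response_peak` is a hypothetical name, both contours are assumed to be averaged cent-scale lists sampled every 2 ms from perturbation onset, and the onset and offset are assumed to have been determined beforehand.

```python
def response_peak(baseline, perturbed, onset_ms, offset_ms, step_ms=2):
    """Return (response magnitude in cents, peak latency in ms).

    The magnitude is the value of the difference wave (perturbed minus
    baseline) with the greatest absolute deviation from zero between the
    response onset and the end of compensation.
    """
    diff = [p - b for p, b in zip(perturbed, baseline)]
    start = onset_ms // step_ms
    stop = min(offset_ms // step_ms, len(diff))
    if start >= stop:
        raise ValueError("empty response window")
    peak = max(range(start, stop), key=lambda k: abs(diff[k]))
    return diff[peak], peak * step_ms


if __name__ == "__main__":
    base = [0.0] * 200
    pert = [0.0] * 200
    pert[60] = -12.0  # opposing (downward) response to an upward shift
    print(response_peak(base, pert, onset_ms=80, offset_ms=140))
```

The sign of the returned magnitude indicates the response direction; taking its absolute value gives the quantity plotted in the response-magnitude figures.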

2.5 EEG recording and analysis

The EEG data were recorded with 64 electrodes secured in an elastic cap (Electro Cap International) using Neuroscan 4.3 software. The vertical electro-oculogram (VEOG) was monitored with two electrodes placed above and below the left eye. The horizontal EOG (HEOG) was recorded by a bipolar montage using two electrodes placed on the right and left external canthi. The left mastoid electrode served as the reference. All electrode impedances were kept under 5 kΩ during the experiment. Electrophysiological signals were amplified with a band-pass filter of 0.05–70 Hz and digitized continuously at a sampling rate of 500 Hz. Offline analyses of EEG signals were performed using the EEGLAB toolbox (Delorme & Makeig, 2004) and custom-written MATLAB scripts (MathWorks, Inc., Natick, MA). The EEG was re-referenced offline to the average of both mastoids and filtered offline using a 0.1 Hz high-pass filter and 30 Hz low-pass filter. Bad electrodes were then detected using the kurtosis method and interpolated using data from surrounding electrodes. EEG epochs were extracted for each trial, extending from 1 s preceding to 1.5 s following the vocalization onset. To remove eye movement artifacts, the ICA algorithm implemented in the EEGLAB toolbox (Delorme & Makeig, 2004) was applied to all EEG epochs. Epochs were re-segmented from 200 ms before to 500 ms after vocalization onset, with baseline correction over the 200 ms preceding vocalization onset (−200 to 0 ms). Prior to offline averaging, all single-trial waveforms were screened for eye movements, electrode drifting, amplifier blocking, and EMG artifacts. For further analysis, EEG epochs were subjected to an artifact rejection procedure in which epochs containing signals exceeding ±50 μV were rejected. If fewer than 40 epochs remained in any condition, the data from that participant were excluded.
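The final amplitude-based rejection step and the participant-level inclusion rule can be sketched as follows. This is not the EEGLAB pipeline the authors used: the function names and the nested-list data layout (epochs, then channels, then samples in μV) are assumptions made for illustration, as is the treatment of a sample at exactly ±50 μV as acceptable.

```python
THRESHOLD_UV = 50.0  # epochs with any sample beyond +/-50 microvolts are rejected
MIN_EPOCHS = 40      # minimum accepted epochs required in every condition


def reject_artifacts(epochs):
    """Keep only epochs in which every sample lies within the threshold.

    Each epoch is a list of channels; each channel is a list of samples (uV).
    """
    return [ep for ep in epochs
            if all(abs(v) <= THRESHOLD_UV for ch in ep for v in ch)]


def participant_usable(epochs_by_condition):
    """True if every condition retains at least MIN_EPOCHS clean epochs."""
    return all(len(reject_artifacts(eps)) >= MIN_EPOCHS
               for eps in epochs_by_condition.values())


if __name__ == "__main__":
    clean = [[1.0, -2.0], [3.0, 4.0]]
    noisy = [[1.0, 60.0], [0.0, 0.0]]
    print(len(reject_artifacts([clean, noisy, clean])))
```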

Numerous studies suggest that electrodes placed on the frontal and central regions are ideal for recording reliable FAF responses (Liu et al., 2011; Scheerer et al., 2013a, 2013b). Six electrodes (F3, FZ, F4, C3, CZ, and C4) were included and grouped as left (average of F3 and C3), medial (average of FZ and CZ), and right (average of F4 and C4). These electrodes were chosen to allow for direct comparison with previous FAF research (Scheerer et al., 2013a). For each participant, averaged waveforms were created for each condition for each electrode. Grand averaged waveforms were created for all conditions by averaging the data from all participants for each electrode. For all averaged waveforms for each participant, the amplitudes of the P1-N1-P2 components were manually extracted as the positive, negative, and positive peaks in the time windows of 50–150 ms, 100–200 ms, and 200–300 ms after vocalization onset, respectively. The corresponding latencies were extracted as the timepoints at which peak amplitudes occurred. RM-ANOVAs were conducted to examine the effects of language (L1 and L2), perturbation magnitudes (0, 100, 200, and 400 cents), and laterality (left, medial, and right), and their interactions on P1-N1-P2 amplitudes and latencies.
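The window-based peak definition can be sketched programmatically. The peaks in this study were extracted manually, so the following is only an automated analogue under stated assumptions: the names are hypothetical, and the waveform is assumed to be a single averaged epoch sampled at 500 Hz from −200 to 500 ms relative to vocalization onset (351 samples).

```python
SAMPLE_RATE_HZ = 500
EPOCH_START_MS = -200
# component: (window start in ms, window end in ms, polarity)
WINDOWS_MS = {"P1": (50, 150, +1), "N1": (100, 200, -1), "P2": (200, 300, +1)}


def ms_to_index(t_ms):
    """Sample index of a time (ms relative to vocalization onset)."""
    return int(round((t_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000))


def extract_peaks(waveform):
    """Return {component: (peak amplitude, peak latency in ms)}."""
    peaks = {}
    for name, (lo, hi, sign) in WINDOWS_MS.items():
        idx = range(ms_to_index(lo), ms_to_index(hi) + 1)
        # most positive sample for P1/P2, most negative sample for N1
        best = max(idx, key=lambda k: sign * waveform[k])
        latency = EPOCH_START_MS + best * 1000 / SAMPLE_RATE_HZ
        peaks[name] = (waveform[best], latency)
    return peaks


if __name__ == "__main__":
    wave = [0.0] * 351
    wave[150], wave[175], wave[225] = 2.0, -3.0, 4.0
    print(extract_peaks(wave))
```

Note that the P1 and N1 windows overlap between 100 and 150 ms; the polarity constraint is what separates the two components in that interval.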

3. Results

Data from 24 participants were included in the subsequent analyses. Separate ANOVAs including factors of language and perturbation magnitudes were performed on onset latencies, response magnitudes, peak latencies, and P1-N1-P2 amplitudes and latencies. We used the FDR method in multiple comparisons (Yekutieli & Benjamini, 1999), as implemented using the fdrtool package of R software.
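To illustrate the general idea of FDR correction, the sketch below implements the standard Benjamini-Hochberg step-up adjustment. This is a swapped-in textbook procedure chosen for clarity; it is not necessarily the estimator applied by the fdrtool package or the resampling-based approach of the cited reference, and the function name is hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(bh_adjust([0.01, 0.04, 0.03]))
```

A comparison is then declared significant when its adjusted p-value falls below the chosen FDR level.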

3.1 Behavioral results

Onset latency analysis

Figure 2 shows the grand-averaged difference waves reflecting pitch changes from baseline trials in response to 100-, 200-, and 400-cent perturbations in L1-Chinese (A) and L2-English (B) production. To identify the timepoints at which the pitch diverged between two conditions (baseline vs. perturbed), we adopted Chen et al.'s (2007) method to calculate onset latency (i.e., the timepoint at which the p-value decreased below .02 for at least 50 ms within the 400-ms time window). For the 100-, 200-, and 400-cent conditions, results showed onset latencies of 122 ms, 101 ms, and 85 ms, respectively, in L1-Chinese production; however, the respective onset latencies in L2-English production were 151 ms, 128 ms, and 93 ms.

Fig. 2. Grand-averaged difference waves reflecting pitch changes from baseline trials in response to 100-cent, 200-cent, and 400-cent perturbations in L1-Chinese (A) and L2-English (B) production.

A two-way ANOVA conducted on the onset latencies revealed significant main effects of language, F(1, 23) = 258.60, p < .001, ηp² = .92, and perturbation magnitudes, F(2, 46) = 415.90, p < .001, ηp² = .97. A significant interaction effect was found between language and perturbation magnitudes, F(2, 46) = 32.33, p < .001, ηp² = .58; therefore, separate one-way ANOVAs of language were performed on onset latencies across three pitch-shift conditions. The results (after FDR correction) showed that late Chinese–English bilinguals produced significantly earlier onset latencies in L1 than in L2 production when their voice pitch feedback was shifted +100 cents (F(1, 23) = 179.19, p = .001, ηp² = .89), +200 cents (F(1, 23) = 128.56, p = .001, ηp² = .85), or +400 cents (F(1, 23) = 15.46, p = .001, ηp² = .40). ANOVAs of perturbation magnitudes were also performed on onset latencies for L1 and L2 production, respectively. The results (after FDR correction) showed that the main effect of perturbation magnitudes was significant in L1 (F(2, 46) = 161.11, p = .001, ηp² = .88) and L2 (F(2, 46) = 311.54, p = .001, ηp² = .93) production.

Response magnitude analysis

Figure 3 shows the results of the response magnitudes to pitch-shifted feedback in L1-Chinese and L2-English production. A two-way ANOVA conducted on the response magnitudes revealed a significant main effect of language, F(1, 23) = 40.72, p < .001, ηp² = .65 (Figure 3A). Larger response magnitudes were observed in L2 than in L1 production. Perturbation magnitudes also showed a significant main effect, F(2, 46) = 58.47, p < .001, ηp² = .73 (Figure 3B). Post-hoc pairwise comparisons (after FDR correction) indicated that a 200-cent shift resulted in significantly larger magnitudes of compensation than a 100-cent shift, F(1, 23) = 8.67, p = .008, ηp² = .28; however, a 400-cent shift was found to elicit significantly smaller magnitudes of compensation than 100-cent (F(1, 23) = 69.38, p = .002, ηp² = .76) or 200-cent (F(1, 23) = 105.16, p = .002, ηp² = .83) shifts.

Fig. 3. Behavioral results of response magnitude analysis. (A) Violin plots of the absolute values of response magnitudes for L1-Chinese and L2-English conditions. (B) Violin plots of the absolute values of response magnitudes for 100-, 200-, and 400-cent conditions. (C) Violin plots of the absolute values of response magnitudes in the 100-, 200-, and 400-cent conditions as a function of language. (D) Violin plots of the absolute values of response magnitudes in L1-Chinese and L2-English conditions as a function of perturbation magnitudes. Full dots represent individual data points. White circles represent means. Error bars represent ± SEM. The lines' two ends represent individual participants.

A significant interaction effect was found between language and perturbation magnitudes, F(2, 46) = 26.10, p < .001, ηp² = .54. Thus, separate one-way ANOVAs of language were performed on response magnitudes across three pitch-shift conditions (Figure 3C). The results (after FDR correction) showed that late Chinese–English bilinguals produced significantly smaller response magnitudes in L1 than in L2 production when their voice pitch feedback was shifted +100 (F(1, 23) = 34.57, p = .002, ηp² = .61), +200 (F(1, 23) = 39.14, p = .002, ηp² = .64), or +400 (F(1, 23) = 5.24, p = .03, ηp² = .19) cents. ANOVAs of perturbation magnitudes were also performed on response magnitudes for L1 and L2 production, respectively (Figure 3D). The results (after FDR correction) showed that the main effect of perturbation magnitudes was significant in L1 (F(2, 46) = 15.44, p = .001, ηp² = .41) and L2 (F(2, 46) = 43.24, p = .001, ηp² = .66) production.

Peak latency analysis

Figure 4 shows the results of the peak latencies for pitch-shifted feedback in L1-Chinese and L2-English production. A two-way ANOVA conducted on the peak latencies revealed a significant main effect of language, F(1, 23) = 63.34, p < .001, ηp² = .74 (Figure 4A). Peak latencies occurred earlier in L1 than in L2 production. A significant main effect of perturbation magnitudes was also observed, F(2, 46) = 63.76, p < .001, ηp² = .74 (Figure 4B). Post-hoc pairwise comparisons (after FDR correction) indicated that a 400-cent shift resulted in significantly faster peak latencies than a 100-cent (F(1, 23) = 124.39, p = .002, ηp² = .85) or 200-cent (F(1, 23) = 52.50, p = .002, ηp² = .71) shift. In addition, a 200-cent shift elicited significantly faster peak latencies than did a 100-cent shift, F(1, 23) = 10.70, p = .004, ηp² = .33.

Fig. 4. Behavioral results of the peak latency analysis. (A) Violin plots of peak latencies for L1-Chinese and L2-English conditions. (B) Violin plots of peak latencies for 100-, 200-, and 400-cent conditions. (C) Violin plots of peak latencies in the 100-, 200-, and 400-cent conditions as a function of language. (D) Violin plots of peak latencies in L1-Chinese and L2-English conditions as a function of perturbation magnitudes. Full dots represent individual data points. White circles represent means. Error bars represent ± SEM. The lines' two ends represent individual participants.

An interaction effect between language and perturbation magnitudes reached significance, F(2, 46) = 8.25, p = .001, ηp² = .27. Thus, separate one-way ANOVAs of language were performed on peak latencies across three pitch-shift conditions (Figure 4C). The results (after FDR correction) showed that late Chinese–English bilinguals produced significantly earlier peak latencies in L1 than in L2 production when their voice pitch feedback was shifted +100 (F(1, 23) = 56.04, p = .002, ηp² = .72), +200 (F(1, 23) = 34.98, p = .002, ηp² = .61), or +400 (F(1, 23) = 4.56, p = .04, ηp² = .17) cents. ANOVAs of perturbation magnitudes were also performed on peak latencies for L1 and L2 production, respectively (Figure 4D). The results (after FDR correction) showed that the main effect of perturbation magnitudes was significant in L1 (F(2, 46) = 22.83, p = .001, ηp² = .51) and L2 (F(2, 46) = 50.77, p = .001, ηp² = .70) production.

3.2 ERP results

Trials with uncorrected eye-blink artifacts and muscle movements and all incorrect trials were manually excluded from the ERP analysis. Accepted trials averaged 51.04 per experimental condition, with 51.38, 51.13, 50.96, and 50.75 trials remaining in the 0-cent, 100-cent, 200-cent, and 400-cent shift conditions in L1 production, and 51.46, 51.29, 50.75, and 50.63 trials in the 0-cent, 100-cent, 200-cent, and 400-cent shift conditions in L2 production. As suggested by Boudewyn, Luck, Farrens, and Kappenman (2018), increasing the number of trials beyond 45 yields substantial increases in the statistical power to detect a small ERP effect in within-participant experiments (e.g., lateralized readiness potential). Thus, we argue that the amount of remaining data (i.e., on average 51.04 trials per condition) is sufficient for detecting small effects on the P1-N1-P2 ERP components. Figures 5 and 6 show the grand-averaged ERP waveforms and topographical distributions of P1, N1, and P2 amplitudes in response to all pitch-shift conditions in L1 and L2 production.

Fig. 5. Grand-averaged ERP waveforms as a function of perturbation magnitudes at the left, medial, and right electrodes in L1-Chinese (upper panel) and L2-English (lower panel).

Fig. 6. (A) Topographical distributions of P1, N1, and P2 components as a function of perturbation magnitudes in L1-Chinese (upper panel) and L2-English (lower panel). Comparisons between 0- and 400-cent conditions for L1-Chinese and L2-English production in the representative left region for P1 amplitudes (B), N1 amplitudes (C), and P2 amplitudes (D), respectively.

P1-N1-P2 amplitudes analysis

Separate three-way RM-ANOVAs were conducted to investigate the effects of language, perturbation magnitudes, and laterality on P1-N1-P2 amplitudes. For P1 amplitudes, the results indicated a trend for the main effect of language (F(1, 23) = 3.30, p = .08, ηp² = .13), and a significant interaction between perturbation magnitudes and laterality (F(6, 138) = 3.12, p = .03, ηp² = .12). Separate one-way ANOVAs of perturbation magnitudes were performed across three laterality conditions. The results (after FDR correction) showed non-significant effects of perturbation magnitudes in the left (F(3, 69) = 2.46, p = .27, ηp² = .10), medial (F(3, 69) = .40, p = .77, ηp² = .02) and right electrodes (F(3, 69) = .28, p = .77, ηp² = .01). In addition, no significant main effects of perturbation magnitudes (F(3, 69) = .70, p = .51, ηp² = .03) and laterality (F(2, 46) = 1.42, p = .25, ηp² = .06), or other significant interactions among these variables, were found (all ps ≥ .08).

For N1 amplitudes, the results indicated a trend for the main effect of language (F(1, 23) = 3.96, p = .06, ηp² = .15), and a significant main effect of perturbation magnitudes (F(3, 69) = 3.53, p = .02, ηp² = .13). Post-hoc pairwise comparisons (after FDR correction) revealed that a 400-cent shift elicited significantly larger N1 amplitudes (absolute value) than a 0-cent shift (F(1, 23) = 6.69, p = .05, ηp² = .23), and showed a trend toward larger N1 amplitudes than 100-cent (F(1, 23) = 4.41, p = .06, ηp² = .16) or 200-cent (F(1, 23) = 3.82, p = .06, ηp² = .14) shifts. Further, 100- and 200-cent shifts failed to elicit larger N1 amplitudes than 0-cent shifts (ps ≥ .15). A significant interaction between language and laterality was also found, F(2, 46) = 4.94, p = .03, ηp² = .18. Separate one-way ANOVAs of language were performed across three laterality conditions. The results (after FDR correction) showed a significant effect of language in the medial (F(1, 23) = 5.10, p = .05, ηp² = .18) and right electrodes (F(1, 23) = 5.89, p = .05, ηp² = .20), but not the left electrodes (F(1, 23) = 1.12, p = .30, ηp² = .05). No significant main effect of laterality (F(2, 46) = .21, p = .81, ηp² = .01), or other significant interactions among these variables, were found (all ps ≥ .17).

For P2 amplitudes, the results indicated significant main effects of language (F(1, 23) = 4.17, p = .05, ηp² = .15) and perturbation magnitudes (F(1, 23) = 5.70, p = .006, ηp² = .20). Post-hoc pairwise comparisons (after FDR correction) revealed that a 0-cent shift elicited significantly smaller P2 amplitudes than a 100-cent (F(1, 23) = 6.28, p = .03, ηp² = .22) or 200-cent (F(1, 23) = 15.09, p = .003, ηp² = .40) shift; however, no significant differences in P2 amplitudes were found between the 0-cent and 400-cent conditions (F(1, 23) = 1.72, p = .20, ηp² = .07). A significant interaction between language and laterality was also found, F(2, 46) = 8.62, p = .005, ηp² = .27. Separate one-way ANOVAs of language were performed across three laterality conditions. The results (after FDR correction) showed a significant effect of language in the medial (F(1, 23) = 5.78, p = .05, ηp² = .20) and right electrodes (F(1, 23) = 6.63, p = .05, ηp² = .22), but not the left electrodes (F(1, 23) = .76, p = .39, ηp² = .03). No significant main effect of laterality (F(2, 46) = 2.85, p = .10, ηp² = .11), or other significant interactions among these variables, were found (all ps ≥ .22).

To further investigate how late bilinguals might differ in neural processing of internally generated unaltered feedback and externally generated feedback, we conducted planned pairwise comparisons between the 0- and 400-cent conditions in both L1 and L2 production at each region of interest (ROI). For the P1 amplitudes (Figure 6B), planned pairwise comparisons (after FDR correction) indicated that there were no significant differences between the 0- and 400-cent conditions in any of the ROIs in L1 (all ps ≥ .78) and L2 (all ps ≥ .32) production. For the N1 amplitudes (Figure 6C), results indicated that a 400-cent shift elicited significantly larger N1 amplitudes (absolute value) than a 0-cent shift in the left (p = .04), medial (p = .04), and right (p = .04) ROIs in L1 production; however, these differences failed to reach significance in any of the ROIs in L2 production (all ps ≥ .30). For the P2 amplitudes (Figure 6D), results indicated no significant differences between the 0- and 400-cent conditions in any of the ROIs in L1 (all ps ≥ .28) and L2 (all ps ≥ .30) production.

P1-N1-P2 latency analysis

Separate three-way RM-ANOVAs were conducted to investigate the effects of language, perturbation magnitudes, and laterality on P1-N1-P2 latencies. For P1 latencies, the results indicated a significant main effect of language (F(1, 23) = 4.67, p = .04, ηp² = .17), showing that L1 production elicited significantly earlier P1 latencies than did L2 production. The interaction between perturbation magnitudes and laterality was also significant, F(6, 138) = 2.89, p = .03, ηp² = .11. Separate one-way ANOVAs of perturbation magnitudes were performed across three laterality conditions. The results (after FDR correction) showed non-significant effects of perturbation magnitudes in the left (F(3, 69) = 2.71, p = .15, ηp² = .11), medial (F(3, 69) = .18, p = .91, ηp² = .008) and right electrodes (F(3, 69) = 1.98, p = .20, ηp² = .08). No significant main effects of perturbation magnitudes (F(3, 69) = 1.07, p = .35, ηp² = .05) and laterality (F(2, 46) = .17, p = .75, ηp² = .007), or other significant interactions among these variables, were found (all ps ≥ .16).

For N1 latencies, the results indicated a trend for the main effect of perturbation magnitudes (F(3, 69) = 2.34, p = .08, ηp 2 = .09) and laterality (F(2, 46) = 2.97, p = .08, ηp 2 = .11). No significant main effect of language (F(1, 23) = .009, p = .93, ηp 2 < .001), or other significant interactions among these variables, were found (all ps ≥ .23).

For P2 latencies, the results indicated a significant main effect of laterality (F(2, 46) = 4.81, p = .02, ηp 2 = .17). However, no significant main effect of language (F(1, 23) = 1.37, p = .25, ηp 2 = .06) or perturbation magnitudes (F(3, 69) = 1.18, p = .32, ηp 2 = .05), or other significant interactions among these variables, were found (all ps ≥ .29).
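As a reading aid, the partial eta squared values in the latency analyses can be recovered from each F statistic and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to a few of the values reported above; the function name and labels are illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
reported = [
    ("P1 latency: language", 4.67, 1, 23, 0.17),
    ("P1 latency: magnitude x laterality", 2.89, 6, 138, 0.11),
    ("P2 latency: laterality", 4.81, 2, 46, 0.17),
]

for label, f_value, df1, df2, eta_reported in reported:
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported}")
```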

Table 1 summarizes the omnibus ANOVA results of behavioral and ERP analyses which included the factors of language and perturbation magnitudes, and the planned pairwise comparison results of the 0- and 400-cent conditions in L1 and L2 production.

Table 1. Summary of the omnibus ANOVA results of behavioral and ERP analyses including the factors of language and perturbation magnitudes (upper panel), and planned pairwise comparison results of 0 vs. 400-cent conditions in L1 and L2 production (lower panel).

Note: *** p < .001, ** p < .01, * p < .05, † .05 < p < .1

4. Discussion

This study investigated how language status (L1 and L2) and perturbation magnitudes (0, 100, 200, and 400 cents) influence the neurocognitive mechanism of pitch feedback control in late Chinese–English bilinguals. Regarding the behavioral results, late bilinguals produced significantly larger magnitudes, but with longer onset and peak latencies of vocal compensation, in L2 compared with L1 production. Regarding the ERP results, a 400-cent shift elicited significantly greater N1 amplitudes compared with baseline in L1 production, whereas such differences failed to reach significance in L2 production. Additionally, late bilinguals presented significantly earlier P1 latencies in L1 compared with L2 production. These findings support the modulating effect of language status on neurocognitive processing of auditory pitch feedback in late bilinguals. Despite the differences, we also found similarities between L1 and L2 pitch feedback processing. Specifically, late bilinguals exhibited similar vocal compensation patterns approximately 80–150 ms post-perturbation onset in L1 and L2 production. Further, participants’ vocal compensation, latencies, and P2 amplitudes were similarly modulated by FAF levels in L1 and L2 production, implying a similar gating mechanism to correct internal and external errors.

4.1 Behavioral differences in pitch feedback control for L1 and L2 production

The findings reveal that the vocal compensation magnitudes were consistently larger in L2 than in L1 production at all FAF levels, which echoed our earlier research showing that unexpected brief masking noise elicited significantly larger intensity increases in L2-English production than in L1-Chinese production (Cai et al., 2021). The present findings also showed that the magnitudes of vocal compensation were smaller than those of pitch perturbations at all FAF levels (see Figure 2). The combined results were in line with previous studies suggesting that speakers never fully compensate for auditory perturbations (Burnett et al., 1998; Chen et al., 2007; Liu et al., 2011; Scheerer et al., 2013a, 2013b). For example, Scheerer et al. (2013b) reported compensatory responses below 30 cents, constituting only a fraction of a 100-cent shift. This phenomenon of partial compensation supports the theoretical assumption that speech production works through cooperation of feedforward and feedback control systems (Guenther, 2016). The joint contributions prevent speakers from being affected solely by deviated auditory feedback; thus, the compensation magnitudes reflect speakers’ weighting of auditory feedback control, with larger compensation indicating heavier reliance (Lametti, Nasir & Ostry, 2012; Murray & Stepp, 2020). In this study, language-specific magnitudes of vocal compensation to pitch-shifted feedback in late bilinguals indicate that, relative to L1 speaking, pitch feedback control has relatively higher weighting for guiding speech output during L2 speaking.
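To make the notion of partial compensation concrete, the sketch below uses the standard definition of the cent (1200 cents per octave) to convert between frequencies and cents and to express a response as a fraction of the shift. The 200 Hz baseline is a hypothetical value chosen for illustration, and the 30-cent response is the upper bound cited above from Scheerer et al. (2013b), not a measurement from the present study.

```python
import math


def hz_to_cents(f_hz, ref_hz):
    """Pitch distance of f_hz from ref_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def cents_to_ratio(cents):
    """Frequency ratio that corresponds to a shift given in cents."""
    return 2.0 ** (cents / 1200.0)


def compensation_fraction(response_cents, shift_cents):
    """Size of the vocal response relative to the size of the perturbation."""
    return abs(response_cents) / abs(shift_cents)


baseline_hz = 200.0  # hypothetical speaking F0
shifted_hz = baseline_hz * cents_to_ratio(100)  # feedback heard after a +100-cent shift
print(f"feedback F0: {shifted_hz:.2f} Hz")
print(f"compensation fraction: {compensation_fraction(-30, 100):.2f}")
```

A fraction well below 1 corresponds to the partial compensation described above; a larger fraction would indicate heavier weighting of auditory feedback.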

Previous literature has suggested that language development is accompanied by decreasing reliance on auditory feedback for motor control, and that such a transition is driven by the stability of internal sensorimotor representations (Civier et al., 2010; Guenther, 2016; Perkell, 2012; Tourville et al., 2008). Auditory feedback enables sensorimotor representations to be learned in the first place and is important for speakers to be able to monitor and correct speech output, especially when the mappings between articulatory commands and corresponding sensory consequences are weak (Scheerer & Jones, 2012, 2014). These interpretations suggest an account of what we observed, namely that late bilinguals formed less stable sensorimotor representations for their later-acquired L2. Quantitative and qualitative differences in L1 and L2 learning support this account, in that a language is practiced intensively and extensively during L1 acquisition, whereas the amount of input an L2 learner can access is comparatively limited (Saito, Sun & Tierney, 2020).

Another key finding was that (onset and peak) latencies of vocal compensation were significantly longer in L2 than in L1 production at all FAF levels, which replicated our earlier findings on online voice intensity control for the same bilingual population (Cai et al., 2021). This pattern of results provides direct evidence that the feedback pathways are less efficient in late bilinguals’ subordinate L2 than their L1 (Simmonds et al., 2011b). Some studies on L1 acquisition demonstrated that vocal response latency can decrease with age (Liu, Russo & Larson, 2010a), and provided a developmental trajectory across age groups (Scheerer et al., 2013b). This decrease has been interpreted as reflecting the maturation of the audio-vocal system: specifically, increased synaptic efficacy promotes more efficient neural processing and hence faster behavioral vocal compensation (Liu et al., 2010a; Scheerer et al., 2013b). These observations also shed light on our research question regarding whether late bilinguals have less-developed L2 audio-vocal systems, as characterized by slower vocal responses to pitch-shifted feedback.

However, before a strong conclusion can be drawn on the effects of language (i.e., L1 vs. L2), one limitation remains concerning the selection of stimuli for cross-language comparison: Chinese has only the /u/ vowel, whereas English has the neighboring /u/ and /ʊ/ vowels. Thus, we conducted additional post-hoc analyses examining whether there were significant differences in the overall behavioral results between the two groups of L2 words with /u/ and /ʊ/ vowels. Traditional null hypothesis significance testing and corresponding Bayesian analyses revealed no systematic differences in onset latencies (all ps ≥ .28, all BF01 ≥ 3.42), response magnitudes (all ps ≥ .65, all BF01 ≥ 7.21), and latencies (all ps ≥ .62, all BF01 ≥ 6.88) for /u/ and /ʊ/ vowels (see the results in the supplement). These findings strengthen our conclusion that the observed cross-language differences reflected late bilinguals’ heavier reliance and lower efficiency in L2 pitch feedback control. Future research could address this concern by examining the same vowels shared by different languages in late bilinguals.

Previous literature also suggests that fundamental frequency and formant frequency play different roles in defining speech features (Coughler et al., 2022; Purcell & Munhall, 2006). Shifted fundamental frequency results in participants hearing their voice as higher or lower in pitch. Formant frequencies relate to the positioning of the lips, tongue, and jaw; hence, changes in formant frequencies result in different sounds (Purcell & Munhall, 2006). Specifically, the first formant (F1) relates to tongue height (i.e., a higher tongue position with a lower F1), and the second formant (F2) relates to tongue backness (i.e., closer to the front of the mouth with a higher F2). Because our study manipulated only the fundamental frequency (F0; +100, 200, and 400 cents) and not the formant frequencies (F1 or F2), we reasoned that vowel height (F1) differences between /u/ and /ʊ/ were less likely to influence participants’ pitch (F0) feedback control.

4.2 ERP differences in pitch feedback control for L1 and L2 production

To the best of our knowledge, this study is the first to compare the neural mechanisms of L1 and L2 pitch feedback control in late bilinguals using electrophysiological measures. As shown in Figure 5, the P1-N1-P2 complex demonstrated a trend of amplitude modulations in response to varying FAF levels, implying that these ERP components may be sensitive neural markers reflecting auditory detection and motor correction in pitch feedback control (Behroozmand et al., 2009, 2011, 2015; Korzyukov et al., 2012a, 2012b).

One critical finding from planned pairwise comparisons between the 0- and 400-cent conditions concerned late bilinguals’ language-specific modulation of N1 amplitudes. Specifically, a 400-cent shift elicited significantly larger N1 amplitudes compared with baseline in L1 production, whereas such differences failed to reach significance in L2 production. Our observation for L1 production indicates that N1 modulations in FAF paradigms reflect SoA-related discrimination between internal and external feedback (Korzyukov et al., 2017), which is congruent with extant literature presenting larger N1 amplitudes for 400- or 500-cent FAF conditions compared with baseline (Behroozmand & Larson, 2011; Liu et al., 2011). Previous research suggests that N1 amplitude reflects the amount of neural resources allocated to processing a stimulus (Heinks-Maldonado et al., 2005; Scheerer & Jones, 2014; Sitek, Mathalon, Roach, Houde, Niziolek & Ford, 2013).

One explanation for N1 suppression when native speakers experience SoA is that auditory N1 activity can be modulated by the efference copy involved in forming motor-based auditory predictions before actual feedback is available. Thus, fewer neural resources are allocated to process self-generated feedback that matches predicted feedback (Hickok, 2012; Houde et al., 2013). Notably, such motor-based prediction is based on the bidirectional associations between motor commands and their sensory consequences established during language learning (Liu & Tian, 2018). For native speakers, the motor commands to produce target L1 sounds are finely tuned through long-term experience of native language learning/production (Saito et al., 2020; Simmonds et al., 2011b). Therefore, late bilinguals could reliably predict auditory feedback via efference copy of issued motor commands during L1 speaking, and the additional information provided by the actual feedback to ensure accuracy became redundant; hence, N1 activity for processing the incoming feedback is suppressed (Behroozmand et al., 2009, 2011, 2015; Heinks-Maldonado et al., 2005).

In contrast, no significant variation in N1 amplitudes was observed between the 0- and 400-cent conditions in L2 production. Given that N1 suppression is thought to reflect motor-based auditory prediction (Hickok, 2012; Liu & Tian, 2018), we conclude that late bilinguals are still unable to sufficiently form motor-based predictions during L2 speaking, which features lower stability in sensorimotor representations for L2 sounds. Some previous evidence supplements our findings by showing larger vocal variability in L2 production (Chakraborty, 2011; Wang & van Heuven, 2006). It is highly possible that more attention and neural resources are needed for online monitoring and correcting L2 output when bilinguals cannot reliably predict their actions. This could also explain why we observed larger magnitudes of vocal compensation for L2 pitch deviations.

Several possible reasons may jointly explain why the stability of sensorimotor representations is lower for L2 speech sounds than native ones in late bilinguals. First, late L2 learning varies substantially from L1 acquisition in that no ‘silent period’ or ‘babbling phase’ occurs (Simmonds et al., 2011b). During the silent period, infants become tuned to the phonetic repertoire of their native language and listen extensively to language without attempting to produce speech sounds. The next stage is the babbling phase, which begins with imitation of simplified syllables, followed by single words, short phrases, and sentences (Kuhl, 2004). During these important periods in L1 acquisition, native speakers precisely encode auditory targets first and then gradually modify motor commands to realize auditory expectations. For late bilinguals, the lack of these important periods during L2 learning may lead to unstable sensorimotor representations; hence, more neural resources are allocated to L2 feedback to ensure accuracy. It has been proposed that late bilinguals might also benefit from a silent period and babbling phase in terms of L2 learning (Simmonds et al., 2011b). A period of intense auditory exposure to L2 before production might enable learners to hear subtly different phonetic features, allowing more accurate auditory target representations. Then, by imitating the speech sounds in isolation through babbling, L2 learners might develop more accurate efference copies of the motor commands required to produce non-native sounds. Future studies could test this possibility by designing longitudinal language-training programs for adult L2 learners.

Second, the neural plasticity of sensorimotor areas is weakened for late bilinguals, thereby requiring additional neural resources to produce less-familiar L2 sounds. A recent fMRI study found that for sequential German–English bilinguals, the production of L2-English sounds was associated with increased activity in brain regions including the left primary sensorimotor cortex, bilateral cerebellar hemispheres, left inferior frontal gyrus, and left anterior insula (Treutler & Sörös, 2021). Simmonds, Wise, Dhanjal and Leech (2011a) observed similar findings, demonstrating that English L2 learners (L2 AoA at 12 yr) exhibited higher brain activity in the planum temporale and parietal operculum (areas involved in L1 auditory feedback control). Concerning their relevance to our study, although the two studies did not incorporate perturbed auditory feedback during speaking, the results did emphasize the existence of some form of ‘critical periods’ in the process of language learning, as suggested by the critical period hypothesis (DeKeyser, 2013).

Third, late bilinguals have practiced L2 speech sounds much less than they have native speech sounds. Researchers propose that the associations between motor commands and sensory consequences are gradually strengthened with practice (Civier et al., 2010; Guenther & Vladusich, 2012; Tourville & Guenther, 2011). As schematized in the speech motor control model, initial attempts to produce speech sounds result in large sensory errors; with each production practice, feedback-based corrective motor commands are gradually added, improving the accuracy of feedforward commands (Tourville & Guenther, 2011). For late bilinguals, L2 speech sounds are much less rehearsed, owing to factors such as AoA, language exposure, and daily usage (Abutalebi, Cappa & Perani, 2001; Bylund et al., 2021; Parker Jones et al., 2012; Saito et al., 2020). It remains unanswered whether and how these factors (e.g., language proficiency, AoA, length of L2 exposure/practice) could potentially modulate the neurocognitive mechanism of auditory feedback control. With more fine-grained and experimentally controlled manipulations, future studies should work to establish the causal relations among these factors and auditory feedback control performance.
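The practice account above can be illustrated with a deliberately simplified toy simulation. It is not the speech motor control model of Tourville and Guenther (2011); it only assumes that each practice trial folds a fixed fraction of the remaining sensory error into the feedforward command. The initial error, learning rate, and trial counts are all hypothetical.

```python
def residual_error_after_practice(initial_error, learning_rate, n_trials):
    """Toy model: each trial removes a fixed fraction of the remaining error."""
    error = initial_error
    for _ in range(n_trials):
        # Feedback-based correction that is added to the feedforward command.
        correction = learning_rate * error
        error -= correction
    return error


# Hypothetical settings: same starting error, different amounts of practice.
heavily_practiced = residual_error_after_practice(100.0, 0.1, n_trials=50)
lightly_practiced = residual_error_after_practice(100.0, 0.1, n_trials=10)
print(f"residual error after 50 trials: {heavily_practiced:.2f} cents")
print(f"residual error after 10 trials: {lightly_practiced:.2f} cents")
```

In this toy setting, fewer practice trials leave a larger residual error for feedback control to correct, which parallels the heavier reliance on auditory feedback proposed for the less rehearsed L2.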

In addition to N1 amplitudes, the present study also detected a language effect on P1 latencies, with late bilinguals showing earlier P1 latencies in L1 than in L2 production. This finding might explain the more rapid behavioral responses in L1 production, because faster neural processing is considered to promote faster vocal compensation (see also Scheerer et al., 2013b). Combined with previous literature suggesting that the P1 component reflects early detection of changes in auditory feedback (Korzyukov et al., 2012a; Scheerer et al., 2013a), our study offers preliminary evidence supporting language-specific differences in the neural efficiency of pitch feedback processing. Compared with L1, neural feedback pathways are likely to be less efficient in the subordinate L2 (see also Simmonds et al., 2011b). Given the null effects in N1 and P2 latencies, together with the fact that this study was the first to compare late bilinguals’ L1 and L2 pitch feedback control efficiency, the current ERP data did not allow us to draw strong conclusions on the effect of language on neural processing speed.

In the literature on pitch feedback control, studies have shown that modulations of P1-N1-P2 latencies are less consistent compared with amplitude modulations (Liu et al., 2013; Scheerer et al., 2013a). For example, Scheerer et al. (2013a) reported significant main effects of shift magnitude on P1, N1, and P2 amplitudes as well as N1 latencies, but they failed to detect such significant effects on P1 and P2 latencies. Furthermore, modulations of P1-N1-P2 latencies are inconsistent in measuring the developmental process in auditory feedback control. Some studies reported that the latencies of these ERP components decreased with children’s age (Liu et al., 2010a, 2013; Scheerer et al., 2013b), whereas others found an increase (Kraus, McGee, Micco, Sharma, Carrell & Nicol, 1993; Oades, Dittmann-Balcar & Zerbin, 1997) or observed no changes (Johnstone, Barry, Anderson & Coyle, 1996; Ponton, Eggermont, Kwong & Don, 2000). Given these mixed findings, the available evidence is insufficient to draw a complete picture of the neural efficiency of pitch feedback control in late bilinguals. Future studies are needed to elucidate how L1 and L2 modulate P1-N1-P2 latencies and the underlying mechanism driving such language-related changes.

4.3 Similarities in pitch feedback control for L1 and L2 production

Despite the above-mentioned differences, the present study also provides evidence for some similarities between L1 and L2 pitch feedback control, allowing for the possibility that late L2 learners are still capable of developing near native-like control of speech production.

First, late bilinguals exhibited similar patterns of vocal compensation approximately 80–150 ms post-perturbation onset in L1 and L2 production. We conducted the first cross-language examination of pitch feedback control using linguistic materials from late bilinguals’ L1-Chinese and L2-English. Our finding was consistent with previous studies reporting compensation latencies of 100–150 ms in response to pitch-shifted feedback during the production of Mandarin (Xu et al., 2004) and English (Chen et al., 2007). Thus, the current study supports and extends previous findings to late bilinguals, suggesting that pitch feedback control incorporates an automatic mechanism to stabilize voice pitch for successful communication in both languages (Burnett et al., 1997, 1998; Jones & Munhall, 2002).

Second, we observed similar patterns for how perturbation magnitudes affect late bilinguals’ vocal compensation and response latencies in L1 and L2 production. Unlike response latencies, which displayed a linear decrease as perturbation magnitudes increased, changes in vocal compensation were more complex. Smaller perturbations (100 and 200 cents) elicited larger magnitudes of compensatory responses, whereas larger perturbations (400 cents) elicited relatively smaller magnitudes of compensatory responses. These observations were consistent with Burnett et al.’s (1998) findings that smaller perturbations (under 150 cents) elicited larger compensation, but larger perturbations (above 200 cents) elicited smaller compensation (see also Chen et al., 2012; Scheerer et al., 2013a, 2013b). Previous research has suggested that small magnitudes (≤ 200 cents) were perceived as pitch errors in self-vocalization, but large magnitudes (400 cents) were perceived as externally generated pitch shifts (Scheerer et al., 2013a, 2013b). Taken together, these results reveal that perception of voice feedback as one’s own voice (i.e., recognition of SoA) modulates sensorimotor processing (Korzyukov et al., 2017). Thus, pitch feedback control is optimal for correcting small feedback errors when speakers still experience SoA, and a gating mechanism exists to prevent speakers from being excessively affected by external feedback and producing unnecessary motor adjustments (Liu & Larson, 2007; Scheerer et al., 2013a, 2013b; Scheerer & Jones, 2012). However, researchers have not yet reached a consensus on the perturbation magnitude threshold beyond which vocal compensation begins to decrease. Thus, future studies could address this by including more systematic and fine-grained manipulations of FAF levels.

Third, we observed similar patterns in how perturbation magnitudes affect late bilinguals’ P2 responses in L1 and L2 production. Smaller perturbation magnitudes (100 and 200 cents) induced greater P2 amplitudes compared to baseline, while larger perturbation magnitudes (400 cents) failed to induce greater P2 amplitudes relative to baseline. Previous findings regarding how perturbation magnitudes affect P2 amplitudes in FAF paradigms remain inconsistent (Behroozmand et al., 2009; Liu et al., 2011; Scheerer et al., 2013a). For example, Liu et al. (2011) reported graded increases in P2 amplitudes for 100-, 200-, and 500-cent conditions in native English speakers (see also Behroozmand et al., 2009). However, Chen et al. (2012) showed different findings between Mandarin and Cantonese speakers, which they interpreted as a result of the pitch-height dimensions being differently weighted. For Mandarin speakers, no significant effect of perturbation magnitudes was observed; however, Cantonese speakers produced systematic changes in P2 amplitudes as perturbation magnitudes increased from -50 to -500 cents. Our findings were consistent with those of Scheerer et al. (2013a) showing that P2 amplitudes did not increase linearly with increasing FAF levels, and began to decrease at approximately 200–250 cents, in native Canadian English speakers. However, it remains unclear why perturbation magnitudes modulated P2 amplitudes differently across these studies.

As an important complement to the gating mechanism, our study detected that vocal compensation also began to decrease under large feedback perturbation conditions (see also Scheerer et al., 2013a). This finding has two important theoretical implications. First, it suggests that the P2 component reflects the computation of feedback-based corrective commands, resulting in similar modulations of perturbation magnitudes on vocal compensation and P2 amplitudes. Second, it suggests that human sensorimotor integration provides neural mechanisms for correcting one’s voice in response to small perturbations recognized as self-produced (i.e., with SoA), and for stabilizing one’s voice against large perturbations recognized as external environmental sound (i.e., without SoA; Korzyukov et al., 2017).

One limitation of this study and many previous studies is whether results obtained with a relatively invariable vowel manipulation can be generalized to all vowels. Alemi, Lehmann, and Deroche (2020) manipulated a wider variety of vowels (i.e., /a/, /e/, /o/) during pitch feedback control and observed an automatic mechanism of stabilizing voice pitch. Future studies could test this observation by adding variety to the vowels used in FAF paradigms.

4.4 Effects of bilingualism and tonal language background

Finally, the present study allowed us to examine bilingual speakers of tonal and non-tonal languages (e.g., Chinese–English bilinguals), for whom the influence of language transfer makes pitch processing especially interesting. Intuitively, bilinguals might be expected to be equally proficient in L1 and L2 pitch feedback control, since they have sufficiently practiced the skill of incorporating pitch feedback for their L1. Further, the language transfer theory predicts a positive influence of L1 tonal language experience on L2 pitch processing, because pitch feedback control is shared between L1 and L2 production, and pitch saliency is more prominent in late bilinguals’ L1-Chinese than in their L2-English (Bialystok et al., 2003). Contrary to the intuition and language transfer theory’s prediction, our study provided evidence that L2 production can elicit significantly larger vocal compensation in response to pitch-shifted feedback (i.e., relatively less stable L2 pitch feedback control). Thus, we tentatively conclude that no clear advantage of pitch feedback control in tonal L1-Chinese was found to be transferred to non-tonal L2-English, suggesting that pitch feedback control may require neuromuscular experience specific to a language or linguistic training.

Ning et al. (2015) also reported indirect supporting evidence for the specificity of pitch processing, in that they showed that trained vocalists (with musical experience) outperformed naïve speakers (without tonal language background) when regulating voice F0 in the nonlinguistic domain but not in the linguistic domain (Mandarin tone). Nevertheless, Giuliano et al. (2011) found that native use of tonal pitch contours in language generally enhances the acuity of pitch representations (linguistic and non-linguistic domains). Given the limited available research, it remains an open question whether enhanced pitch processing ability can be generalized to other linguistic or non-linguistic domains. In the context of bilingualism, because our study lacked a control group of native English speakers, we remain cautious about over-interpreting the findings concerning whether exposure to a native tonal language may benefit pitch processing in a non-tonal L2. Future studies could examine this question by adding a control group and comparing bilingual speakers of other tonal (e.g., Cantonese or Vietnamese) and non-tonal languages (e.g., German or Spanish).

Although the findings imply limited facilitative influence of L1-tonal experience on non-tonal L2’s pitch processing, it is important to note one limitation of this study, which is that the effect of language status is inevitably mixed with that of tonal language background. In some research that compared pitch feedback control between two tonal languages (Chen et al., 2012; Liu et al., 2010b), varying pitch saliency in different tonal languages also resulted in different pitch feedback control abilities. Thus, to identify the role of language status in pitch feedback control in a purer sense, it would be helpful in future studies to compare the same non-tonal language in native and L2 speakers whose L1 is also non-tonal or compare late bilinguals whose L1 and L2 are both non-tonal, such as German–English bilinguals.

5. Conclusion

This study combined an FAF paradigm with electrophysiological measures and obtained interesting findings regarding the differences and similarities of pitch feedback control in late bilinguals’ L1 and L2 production. First, late bilinguals not only relied more heavily on pitch feedback control but also responded less efficiently to pitch deviations in L2 than in L1 production. Second, late bilinguals were unable to sufficiently form motor-based predictions, due to the lower stability of L2 sensorimotor representations. Third, there exists a similar gating mechanism in both L1 and L2 pitch feedback control, which helps speakers correct internally generated voice errors but remains less affected by externally generated feedback. Given the differences and similarities, our study suggests a less mature and developed L2 pitch feedback control for late bilinguals. Although more work is still needed to test these findings in different populations using improved methodologies, this study opens a potential new line of research into speech motor control in bilinguals.

Competing interests

The authors declare none.

Supplementary Material

For supplementary material accompanying this paper, visit https://doi.org/10.1017/S1366728923000019

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant No. 32171055) and the Foundation of Humanities and Social Sciences, Ministry of Education of the People's Republic of China (grant No. 21YJA190011) to Qingfang Zhang; by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (grant No. 22XNF043) to Xiao Cai; and by the Research Seed Funds of the School of Interdisciplinary Studies, Renmin University of China.

References

Abutalebi, J, Cappa, SF and Perani, D (2001) The bilingual brain as revealed by functional neuroimaging. Bilingualism: Language and Cognition 4, 179–190.
Alemi, R, Lehmann, A and Deroche, ML (2020) Adaptation to pitch-altered feedback is independent of one's own voice pitch sensitivity. Scientific Reports 10, 1–20.
Ballard, KJ, Halaki, M, Sowman, PF, Kha, A, Daliri, A, Robin, D, Tourville, JA and Guenther, FH (2018) An investigation of compensation and adaptation to auditory perturbations in individuals with acquired apraxia of speech. Frontiers in Human Neuroscience 12, 510.
Behroozmand, R, Ibrahim, N, Korzyukov, O and Robin, DA (2015) Left-hemisphere activation is associated with enhanced vocal pitch error detection in musicians with absolute pitch. Brain and Cognition 84, 97–108.
Behroozmand, R, Karvelis, L, Liu, H and Larson, CR (2009) Vocalization-induced enhancement of the auditory cortex responsiveness during voice F0 feedback perturbation. Clinical Neurophysiology 120, 1303–1312.
Behroozmand, R, Korzyukov, O, Sattler, L and Larson, CR (2012) Opposing and following vocal responses to pitch-shifted auditory feedback: Evidence for different mechanisms of voice pitch control. The Journal of the Acoustical Society of America 132, 2468–2477.
Behroozmand, R and Larson, C (2011) Error-dependent modulation of speech-induced auditory suppression for pitch-shifted voice feedback. BMC Neuroscience 12, 54–63.
Behroozmand, R, Liu, H and Larson, CR (2011) Time-dependent neural processing of auditory feedback during voice pitch error detection. Journal of Cognitive Neuroscience 23, 1205–1217.
Behroozmand, R, Sangtian, S, Korzyukov, O and Larson, CR (2016) A temporal predictive code for voice motor control: Evidence from ERP and behavioral responses to pitch-shifted auditory feedback. Brain Research 1636, 1–12.
Bialystok, E, Majumder, S and Martin, MM (2003) Developing phonological awareness: Is there a bilingual advantage? Applied Psycholinguistics 24(1), 27–44.
Birdsong, D and Molis, M (2001) On the evidence for maturational constraints in second-language acquisition. Journal of Memory and Language 44, 235–249.
Boersma, P and Weenink, D (2013) Praat: Doing Phonetics by Computer [Computer Program]. http://www.praat.org.
Boudewyn, MA, Luck, SJ, Farrens, JL and Kappenman, ES (2018) How many trials does it take to get a significant ERP effect? It depends. Psychophysiology 55(6), e13049.
Brainard, DH and Vision, S (1997) The psychophysics toolbox. Spatial Vision 10, 433–436.
Burnett, TA, Freedland, MB, Larson, CR and Hain, TC (1998) Voice F0 responses to manipulations in pitch feedback. The Journal of the Acoustical Society of America 103, 3153–3161.
Burnett, TA, Senner, JE and Larson, CR (1997) Voice F0 responses to pitch-shifted auditory feedback: A preliminary study. Journal of Voice 11, 202–211.
Bylund, E, Hyltenstam, K and Abrahamsson, N (2021) Age of acquisition – not bilingualism – is the primary determinant of less than nativelike L2 ultimate attainment. Bilingualism: Language and Cognition 24, 18–30.
Cai, S, Beal, DS, Ghosh, SS, Tiede, MK, Guenther, FH and Perkell, JS (2012) Weak responses to auditory feedback perturbation during articulation in persons who stutter: Evidence for abnormal auditory-motor transformation. PLoS ONE 7(7), e41830.
Cai, S, Boucek, M, Ghosh, SS, Guenther, FH and Perkell, JS (2008) A system for online dynamic perturbation of formant trajectories and results from perturbations of the Mandarin triphthong /iau/. Paper presented at the 8th International Seminar on Speech Production, ISSP 2008, Strasbourg.
Cai, X, Yin, Y and Zhang, Q (2020) A cross-language study on feedforward and feedback control of voice intensity in Chinese–English bilinguals. Applied Psycholinguistics 41(4), 771–795.
Cai, X, Yin, Y and Zhang, Q (2021) Online control of voice intensity in late bilinguals’ first and second language speech production: Evidence from unexpected and brief noise masking. Journal of Speech, Language, and Hearing Research 64, 1471–1489.
Chakraborty, R (2011) Influence of L2 proficiency on speech movement variability: Production of prosodic contrasts by Bengali-English speakers. Bilingualism: Language and Cognition 14, 489–505.
Chandrasekaran, B, Krishnan, A and Gandour, JT (2009) Sensory processing of linguistic pitch as reflected by the mismatch negativity. Ear and Hearing 30, 552–558.
Chang, EF, Niziolek, CA, Knight, RT, Nagarajan, SS and Houde, JF (2013) Human cortical sensorimotor network underlying feedback control of vocal pitch. Proceedings of the National Academy of Sciences of the United States of America 110, 2653–2658.
Chen, SH, Liu, H, Xu, Y and Larson, CR (2007) Voice F0 responses to pitch-shifted voice feedback during English speech. The Journal of the Acoustical Society of America 121, 1157–1163.
Chen, Z, Liu, P, Wang, EQ, Larson, CR, Huang, D and Liu, H (2012) ERP correlates of language-specific processing of auditory pitch feedback during self-vocalization. Brain and Language 121, 25–34.
Chen, Z, Wong, FCK, Jones, JA, Li, W, Liu, P, Chen, X and Liu, H (2015) Transfer effect of speech-sound learning on auditory-motor processing of perceived vocal pitch errors. Scientific Reports 5, 13134.
Civier, O, Tasko, SM and Guenther, FH (2010) Overreliance on auditory feedback may lead to sound/syllable repetitions: simulations of stuttering and fluency-inducing conditions with a neural model of speech production. Journal of Fluency Disorders 35, 246–279.
Cooper, A and Wang, Y (2012) The influence of linguistic and musical experience on Cantonese word learning. The Journal of the Acoustical Society of America 131(6), 4756–4769.
Coughler, C, de Launay, KLQ, Purcell, DW, Cardy, JO and Beal, DS (2022) Pediatric Responses to Fundamental and Formant Frequency Altered Auditory Feedback: A Scoping Review. Frontiers in Human Neuroscience 16, 858863.
DeKeyser, RM (2013) Age effects in second language learning: Stepping stones toward better understanding. Language Learning 63(s1), 52–67.
Delorme, A and Makeig, S (2004) EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods 134, 9–21.
Demopoulos, C, Kothare, H, Mizuiri, D, Henderson-Sabes, J, Fregeau, B, Tjernagel, J, Houde, JF, Sherr, EH and Nagarajan, SS (2018) Abnormal speech motor control in individuals with 16p11.2 deletions. Scientific Reports 8(1), 1–10.
Elman, JL (1981) Effects of frequency-shifted feedback on the pitch of vocal productions. The Journal of the Acoustical Society of America 70, 45–50.
Flinker, A, Chang, EF, Kirsch, HE, Barbaro, NM, Crone, NE and Knight, RT (2010) Single-trial speech suppression of auditory cortex activity in humans. Journal of Neuroscience 30(49), 16643–16650.
Giuliano, RJ, Pfordresher, PQ, Stanley, EM, Narayana, S and Wicha, NY (2011) Native experience with a tone language enhances pitch discrimination and the timing of neural responses to pitch change. Frontiers in Psychology 2, 146.
Guenther, F (2016) Neural control of speech. Cambridge, MA: MIT Press.
Guenther, F and Vladusich, T (2012) A neural theory of speech acquisition and production. Journal of Neurolinguistics 25, 408–422.
Hawco, CS, Jones, JA, Ferretti, TR and Keough, D (2009) ERP correlates of online monitoring of auditory feedback during vocalization. Psychophysiology 46(6), 1216–1225.
Heinks-Maldonado, TH, Mathalon, DH, Gray, M and Ford, JM (2005) Fine-tuning of auditory cortex during speech production. Psychophysiology 42(2), 180–190.
Hickok, G (2012) Computational neuroanatomy of speech production. Nature Reviews Neuroscience 13, 135–145.
Hickok, G, Buchsbaum, B, Humphries, C and Muftuler, T (2003) Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt. Journal of Cognitive Neuroscience 15, 673–682.
Hickok, G, Houde, J and Rong, F (2011) Sensorimotor integration in speech processing: Computational basis and neural organization. Neuron 69(3), 407–422.
Houde, JF and Nagarajan, SS (2011) Speech production as state feedback control. Frontiers in Human Neuroscience 5, 82.
Houde, JF, Kort, NS, Niziolek, CA, Chang, EF and Nagarajan, SS (2013) Neural evidence for state feedback control of speaking. In Proceedings of Meetings on Acoustics ICA2013 (Vol. 19, No. 1, p. 060178). Acoustical Society of America.
Johnstone, SJ, Barry, RJ, Anderson, JW and Coyle, SF (1996) Age-related changes in child and adolescent event-related potential component morphology, amplitude and latency to standard and target stimuli in an auditory oddball task. International Journal of Psychophysiology 24(3), 223–238.
Jones, JA and Munhall, KG (2002) The role of auditory feedback during phonation: Studies of Mandarin tone production. Journal of Phonetics 30, 303–320.
Jones, JA, Scheerer, N and Tumber, A (2013) The relationship between vocal pitch feedback error and event-related brain potentials. In Proceedings of Meetings on Acoustics, Vol. 19, 060151.
Kearney, E and Guenther, FH (2019) Articulating: The neural mechanisms of speech production. Language, Cognition and Neuroscience 34, 1214–1229.
Korzyukov, O, Bronder, A, Lee, Y, Patel, S and Larson, CR (2017) Bioelectrical brain effects of one's own voice identification in pitch of voice auditory feedback. Neuropsychologia 101, 106–114.
Korzyukov, O, Karvelis, L, Behroozmand, R and Larson, CR (2012a) ERP correlates of auditory processing during automatic correction of unexpected perturbations in voice auditory feedback. International Journal of Psychophysiology 83, 71–78.
Korzyukov, O, Sattler, L, Behroozmand, R and Larson, CR (2012b) Neuronal mechanisms of voice control are affected by implicit expectancy of externally triggered perturbations in auditory feedback. PLoS ONE 7, e41216.
Kraus, N, McGee, T, Micco, A, Sharma, A, Carrell, T and Nicol, T (1993) Mismatch negativity in school-age children to speech stimuli that are just perceptibly different. Electroencephalography and Clinical Neurophysiology 88(2), 123–130.
Kuhl, PK (2004) Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience 5(11), 831–843.
Lametti, DR, Nasir, SM and Ostry, DJ (2012) Sensory preference in speech production revealed by simultaneous alternation of auditory and somatosensory feedback. Journal of Neuroscience 32(27), 9351–9358.
Larson, CR (1998) Cross-modality influences in speech motor control: The use of pitch shifting for the study of F0 control. Journal of Communication Disorders 31, 489–503.
Larson, CR, Kempster, GB and Kistler, MK (1987) Changes in voice fundamental frequency following discharge of single motor units in cricothyroid and thyroarytenoid muscles. Journal of Speech and Hearing Research 30, 552–558.
Lenneberg, EH (1967) The biological foundations of language. New York: Wiley.
Liu, H and Larson, CR (2007) Effects of perturbation magnitude and voice F0 level on the pitch-shift reflex. The Journal of the Acoustical Society of America 122(6), 3671–3677.
Liu, H, Meshman, M, Behroozmand, R and Larson, CR (2011) Differential effects of perturbation direction and magnitude on the neural processing of voice pitch feedback. Clinical Neurophysiology 122, 951–957.
Liu, H, Russo, N and Larson, CR (2010a) Age-related differences in vocal responses to pitch feedback perturbations: A preliminary study. The Journal of the Acoustical Society of America 127, 1042–1046.
Liu, H, Wang, EQ, Chen, Z, Liu, P, Larson, CR and Huang, D (2010b) Effect of tonal native language on voice fundamental frequency responses to pitch feedback perturbations during vocalization. The Journal of the Acoustical Society of America 128, 3739–3746.
Liu, H, Zhang, Q, Xu, Y and Larson, CR (2007) Compensatory responses to loudness-shifted voice feedback during production of Mandarin speech. The Journal of the Acoustical Society of America 122(4), 2405–2412.
Liu, P, Chen, Z, Jones, JA, Wang, EQ, Chen, S, Huang, D and Liu, H (2013) Developmental sex-specific change in auditory–vocal integration: ERP evidence in children. Clinical Neurophysiology 124(3), 503–513.
Liu, X and Tian, X (2018) The functional relations among motor-based prediction, sensory goals and feedback in learning non-native speech sounds: Evidence from adult Mandarin Chinese speakers with an auditory feedback masking paradigm. Scientific Reports 8, 11910.
MATLAB and Statistics Toolbox Release (2014b) The MathWorks, Inc., Natick, Massachusetts, United States.
Mitsuya, T, MacDonald, EN, Purcell, DW and Munhall, KG (2011) A cross-language study of compensation in response to real-time formant perturbation. The Journal of the Acoustical Society of America 130(5), 2978–2986.
Moore, JW (2016) What is the sense of agency and why does it matter? Frontiers in Psychology 7, 1272.
Murray, ESH and Stepp, CE (2020) Relationships between vocal pitch perception and production: A developmental view. Scientific Reports 10, 3912.
Ning, LH, Loucks, TM and Shih, C (2015) The effects of language learning and vocal training on sensorimotor control of lexical tone. Journal of Phonetics 51, 50–69.
Ning, LH, Shih, C and Loucks, TM (2014) Mandarin tone learning in L2 adults: A test of perceptual and sensorimotor contributions. Speech Communication 63, 55–69.
Oades, RD, Dittmann-Balcar, A and Zerbin, D (1997) Development and topography of auditory event-related potentials (ERPs): Mismatch and processing negativity in individuals 8–22 years of age. Psychophysiology 34(6), 677–693.
Parker Jones, Ō., Green, DW, Grogan, A, Pliatsikas, C, Filippopolitis, K, Ali, N, Lee, HL, Ramsden, S, Gazarian, K, Prejawa, S, Seghier, ML and Price, CJ (2012) Where, when and why brain activation differs for bilinguals and monolinguals during picture naming and reading aloud. Cerebral Cortex 22(4), 892–902.
Parrell, B and Houde, J (2019) Modeling the role of sensory feedback in speech motor control and learning. Journal of Speech, Language, and Hearing Research 62, 2963–2985.
Perkell, J (2012) Movement goals and feedback and feedforward control mechanisms in speech production. Journal of Neurolinguistics 25, 382–407.
Perkell, J, Matthies, M, Lane, H, Guenther, F, Wilhelms-Tricarico, R, Wozniak, J and Guiod, P (1997) Speech motor control: Acoustic goals, saturation effects, auditory feedback and internal models. Speech Communication 22(2–3), 227–250.
Perlman, AL and Alipour-Haghighi, F (1988) Comparative study of the physiological properties of the vocalis and cricothyroid muscles. Acta Oto-Laryngologica 105, 372–378.
Ponton, CW, Eggermont, JJ, Kwong, B and Don, M (2000) Maturation of human central auditory system activity: evidence from multi-channel evoked potentials. Clinical Neurophysiology 111(2), 220–236.
Purcell, DW and Munhall, KG (2006) Compensation following real-time manipulation of formants in isolated vowels. The Journal of the Acoustical Society of America 119, 2288–2297.
Reiterer, SM, Hu, X, Erb, M, Rota, G, Nardo, D, Grodd, W, Winkler, S and Ackermann, H (2011) Individual differences in audio-vocal speech imitation aptitude in late bilinguals: Functional neuro-imaging and brain morphology. Frontiers in Psychology 2, 271.
Saito, K, Sun, H and Tierney, A (2020) Domain-general auditory processing determines success in second language pronunciation learning in adulthood: A longitudinal study. Applied Psycholinguistics 41(5), 1083–1112.
Scheerer, NE, Behich, J, Liu, H and Jones, JA (2013a) ERP correlates of the magnitude of pitch errors detected in the human voice. Neuroscience 240, 176–185.
Scheerer, NE, Jacobson, DS and Jones, JA (2020a) Sensorimotor control of vocal production in early childhood. Journal of Experimental Psychology: General 149(6), 1071.
Scheerer, NE and Jones, JA (2012) The relationship between vocal accuracy and variability to the level of compensation to altered auditory feedback. Neuroscience Letters 529(2), 128–132.
Scheerer, NE and Jones, JA (2014) The predictability of frequency-altered auditory feedback changes the weighting of feedback and feedforward input for speech motor control. European Journal of Neuroscience 40, 3793–3806.
Scheerer, NE and Jones, JA (2018) Detecting our own vocal errors: An event-related study of the thresholds for perceiving and compensating for vocal pitch errors. Neuropsychologia 114, 158–167.
Scheerer, NE, Jones, JA and Iarocci, G (2020b) Exploring the relationship between prosodic control and social competence in children with and without autism spectrum disorder. Autism Research 13(11), 1880–1892.
Scheerer, NE, Liu, H and Jones, JA (2013b) The developmental trajectory of vocal and event-related potential responses to frequency-altered auditory feedback. European Journal of Neuroscience 38, 3189–3200.
Simmonds, AJ, Wise, RJ, Dhanjal, NS and Leech, R (2011a) A comparison of sensory-motor activity during speech in first and second languages. Journal of Neurophysiology 106, 470–478.
Simmonds, AJ, Wise, RJ and Leech, R (2011b) Two tongues, one brain: Imaging bilingual speech production. Frontiers in Psychology 2, 166.
Sitek, KR, Mathalon, DH, Roach, BJ, Houde, JF, Niziolek, CA and Ford, JM (2013) Auditory cortex processes variation in our own speech. PLoS ONE 8, e82925.
Tourville, JA and Guenther, FH (2011) The DIVA model: A neural theory of speech acquisition and production. Language and Cognitive Processes 26, 952–981.
Tourville, JA, Reilly, KJ and Guenther, FH (2008) Neural mechanisms underlying auditory feedback control of speech. NeuroImage 39, 1429–1443.
Treutler, M and Sörös, P (2021) Functional MRI of native and non-native speech sound production in sequential German-English bilinguals. Frontiers in Human Neuroscience 15, 683277.
Van Hell, JG and Tanner, D (2012) Second language proficiency and cross-language lexical activation. Language Learning 62, 148–171.
Wang, H and van Heuven, VJ (2006) Acoustical analysis of English vowels produced by Chinese, Dutch and American speakers. Linguistics in the Netherlands 23, 237–248.
Xu, Y (1999) Effects of tone and focus on the formation and alignment of F0 contours. Journal of Phonetics 27, 55–105.
Xu, Y, Larson, CR, Bauer, JJ and Hain, TC (2004) Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences. The Journal of the Acoustical Society of America 116, 1168–1178.
Xu, Y and Xu, CX (2005) Phonetic realization of focus in English declarative intonation. Journal of Phonetics 33(2), 159–197.
Yates, AJ (1963) Delayed auditory feedback. Psychological Bulletin 60, 213–232.
Yekutieli, D and Benjamini, Y (1999) Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference 82, 171–196.
Zhang, Z (2016) Mechanics of human voice production and control. The Journal of the Acoustical Society of America 140(4), 2614–2635.
Zhu, X, Damian, MF and Zhang, Q (2015) Seriality of semantic and phonological processes during overt speech in Mandarin as revealed by event-related brain potentials. Brain and Language 144, 16–25.

Fig. 1. Examples of experimental paradigm.

Fig. 2. Grand-averaged difference waves reflecting pitch changes from baseline trials in response to 100-cent, 200-cent, and 400-cent perturbations in L1-Chinese (A) and L2-English (B) production.

Fig. 3. Behavioral results of response magnitude analysis. (A) Violin plots of the absolute values of response magnitudes for L1-Chinese and L2-English conditions. (B) Violin plots of the absolute values of response magnitudes for 100-, 200-, and 400-cent conditions. (C) Violin plots of the absolute values of response magnitudes in the 100-, 200-, and 400-cent conditions as a function of language. (D) Violin plots of the absolute values of response magnitudes in L1-Chinese and L2-English conditions as a function of perturbation magnitudes. Full dots represent individual data points. White circles represent means. Error bars represent ± SEM. The lines' two ends represent individual participants.

Fig. 4. Behavioral results of the peak latency analysis. (A) Violin plots of peak latencies for L1-Chinese and L2-English conditions. (B) Violin plots of peak latencies for 100-, 200-, and 400-cent conditions. (C) Violin plots of peak latencies in the 100-, 200-, and 400-cent conditions as a function of language. (D) Violin plots of peak latencies in L1-Chinese and L2-English conditions as a function of perturbation magnitudes. Full dots represent individual data points. White circles represent means. Error bars represent ± SEM. The lines' two ends represent individual participants.

Fig. 5. Grand-averaged ERP waveforms as a function of perturbation magnitudes at the left, medial, and right electrodes in L1-Chinese (upper panel) and L2-English (lower panel).

Fig. 6. (A) Topographical distributions of P1, N1, and P2 components as a function of perturbation magnitudes in L1-Chinese (upper panel) and L2-English (lower panel). Comparisons between 0- and 400-cent conditions for L1-Chinese and L2-English production in the representative left region for P1 amplitudes (B), N1 amplitudes (C), and P2 amplitudes (D), respectively.

Table 1. Summary of the omnibus ANOVA results of behavioral and ERP analyses including the factors of language and perturbation magnitudes (upper panel), and planned pairwise comparison results of 0 vs. 400-cent conditions in L1 and L2 production (lower panel).
