Introduction
Every language has words that are imbued with emotionality. The family of affective words includes (1) emotion words, which describe specific affective states (e.g., happy, sad) or processes (e.g., worry, rage), (2) emotion-laden words (e.g., cancer, gift), (3) taboos (e.g., rape), (4) swear words and expletives (e.g., the f-word), and (5) reprimands (e.g., shame on you), endearments (e.g., darling), and interjections (e.g., ouch). Affective words can be characterized by the valence (e.g., positive, neutral, negative) and intensity (e.g., high, low) of the emotionality they are associated with. Emotion words are further classified according to the taxonomy of human emotions. Although this taxonomy remains contentious in psychological research (see Ekman, 1992; John, 1988; Oatley & Johnson-Laird, 1987; Plutchik, 1980; Strauss & Allen, 2008; Turner, 2000), most researchers agree on a set of basic human emotions such as happiness, sadness, anger, fear, and surprise, with more complex emotions viewed as blends of two or more basic emotions.
The emotionality of affective words is deeply embedded. Compared to words with no emotionality, affective words are known to elicit higher levels of arousal when encountered visually or auditorily. The most direct evidence comes from physiological and neural studies: higher levels of skin conductance responses and pupil dilation, as well as distinct neural patterns, are recorded when participants are exposed to emotionally charged words and phrases compared to neutral ones (Bowers & Pleydell-Pearce, 2011; Caldwell-Harris, 2014; Chen, Lin, Chen, Lu, & Guo, 2015; Eilola & Havelka, 2011; Harris, Ayçiçegi, & Gleason, 2003; Hsu, Jacobs, & Conrad, 2015; Iacozza, Costa, & Duñabeitia, 2017; Kensinger & Corkin, 2004; Kissler & Strehlow, 2017; Maddock, Garrett, & Buonocore, 2003; Manning & Melchiori, 1974; Maratos, Allan, & Rugg, 2000; Mortier, 2013; Opitz & Degner, 2012; Partala & Surakka, 2003; Recio, Conrad, Hansen, & Jacobs, 2014; Simcox, Pilotti, Mahamane, & Romero, 2012).
Affective words are also distinguished from neutral words in cognitive processing, demonstrating better memory and recall performance (Anooshian & Hertel, 1994; Kensinger & Corkin, 2004; Maratos et al., 2000; Talmi & Moscovitch, 2004), faster responses in lexical decision and identification (Bayer, Sommer, & Schacht, 2011; Kazanas & Altarriba, 2015; Recio et al., 2014), stronger interference in emotional Stroop tasks (Winskel, 2013), and evidence of affective priming in lexical processing (Altarriba & Canary, 2004). These effects are widely attributed to the automatic emotional arousal triggered by the presentation of affective words, which may either facilitate or inhibit lexical processing depending on the nature of the processing task (however, see Maratos et al., 2000 and Talmi & Moscovitch, 2004 for a discussion of possible confounding factors).
What happens if the language user is a bilingual who has access to two lexicons? A general consensus is that bilinguals respond more strongly to affective words in their first language (L1) than to those in their second language (L2), a phenomenon often referred to as differential affective processing (for comprehensive reviews, see Caldwell-Harris, 2014, 2015; Pavlenko, 2012; Robinson & Altarriba, 2018). L1-L2 differences are observed when bilinguals are explicitly asked to rate the emotionality of affective words, yielding more extreme ratings of valence and higher ratings of intensity for L1 affective words than for their L2 counterparts (Dewaele, 2004, 2008; Garrido & Prada, 2018; Milanović, 2019). More convincing evidence comes from neural/physiological measures of arousal and behavioral measures of cognitive processing performance, as the emotionality effects discussed above often have a greater magnitude in the L1 than in the L2, if they are present in the L2 at all. A widely cited study by Harris et al. (2003) reported that Turkish–English bilinguals (Footnote 1) had heightened skin conductance responses to taboo words (e.g., asshole) and reprimands (e.g., Shame on you!) in Turkish (L1) compared to the equivalent expressions in English (L2). Altarriba and colleagues (Altarriba, 2003; Altarriba & Canary, 2004; Kazanas & Altarriba, 2016; Sutton, Altarriba, Gianico, & Basnight-Brown, 2007) conducted a series of cognitive processing experiments (emotional Stroop task, masked lexical decision, affective priming) with Spanish–English bilinguals, and found greater emotionality effects with Spanish (L1) stimuli than with English (L2) stimuli.
Similar evidence has been reported for a wide range of bilingual profiles (English–French: Segalowitz, Trofimovich, Gatbonton, & Sokolovskaya, Reference Segalowitz, Trofimovich, Gatbonton and Sokolovskaya2008; English–Spanish: Simcox et al., Reference Simcox, Pilotti, Mahamane and Romero2012; German–English: Hsu et al., Reference Hsu, Jacobs and Conrad2015; Finnish–English: Eilola, Havelka, & Sharma, Reference Eilola, Havelka and Sharma2007; Portuguese–English: Garrido & Prada, Reference Garrido and Prada2018; Chinese–English: Caldwell-Harris, Tong, Lung, & Poo, Reference Caldwell-Harris, Tong, Lung and Poo2011; Fan et al., Reference Fan, Xu, Wang, Zhang, Yang and Liu2016; Thai–English: Winskel, Reference Winskel2013; Catalan–Spanish: Ferré, García, Fraga, Sánchez-Casas, & Molero, Reference Ferré, García, Fraga, Sánchez-Casas and Molero2010; Ferré, Sánchez-Casas, & Fraga, Reference Ferré, Sánchez-Casas and Fraga2013; German–French: Degner, Doycheva, & Wentura, Reference Degner, Doycheva and Wentura2012), mostly by comparing the processing of L1 and L2 by the same individual and occasionally by comparing native and non-native speakers’ processing of the same language (e.g., Eilola & Havelka, Reference Eilola and Havelka2011).
The current study extends this line of research to Cantonese–Mandarin bilinguals. As discussed above, previous studies have focused heavily on European languages, with at least one (usually English) and often two European languages in the language pair. When the two languages are typologically close (e.g., Catalan–Spanish), cognates (i.e., translation equivalents with similar orthographic and phonological forms) are typically excluded, presumably to avoid mixing cognates and non-cognates in the stimuli and to prevent possible confusion about which language is being presented. The study of Cantonese–Mandarin bilinguals therefore not only extends the research to non-European languages, but also provides a valuable opportunity to examine orthographically identical but phonologically different affective cognates, which exist in large quantities in Cantonese and Mandarin owing to the high degree of linguistic and cultural continuity among Chinese languages. The overlap between the Cantonese and Mandarin lexicons allows us to tease apart critical factors that are largely confounded in previous research, such as the roles of lemma and lexeme and the influence of lexical and cultural factors. In the following, we first review in more detail the mediating factors and underlying mechanisms of differential affective processing, before introducing the design of the current study.
Despite the wide presence of supporting evidence, multiple studies reported a lack of differential affective processing (Altarriba & Basnight-Brown, Reference Altarriba and Basnight-Brown2012; Ferré et al., Reference Ferré, Sánchez-Casas and Fraga2013; Min & Schirmer, Reference Min and Schirmer2011) – or even a reversed L1-L2 difference (i.e., L2 being more emotional than L1, see Marian & Kaushanskaya, Reference Marian and Kaushanskaya2008; Ong, Hussain, Chow, & Thompson, Reference Ong, Hussain, Chow and Thompson2017). These exceptions can be linked to the discussion of what factors modulate the presence (or absence) and magnitude of differential affective processing. There are at least three sources of modulating factors: (1) the bilingual's L2 proficiency and experience, (2) the properties of affective words, and (3) the nature of the task and measure. Overall, bilinguals with higher L2 proficiency (i.e., more balanced bilinguals), who typically have earlier age of acquisition and higher levels of immersion and usage frequency of the L2, are less likely to show differential affective processing (Ayçiçegi-Dinn & Caldwell-Harris, Reference Ayçiçegi-Dinn and Caldwell-Harris2009; Degner et al., Reference Degner, Doycheva and Wentura2012; Eilola et al., Reference Eilola, Havelka and Sharma2007; Ferré et al., Reference Ferré, García, Fraga, Sánchez-Casas and Molero2010, Reference Ferré, Sánchez-Casas and Fraga2013; Harris, Gleason, & Ayçiçeǧi, Reference Harris, Gleason and Ayçiçeǧi2006; Sutton et al., Reference Sutton, Altarriba, Gianico and Basnight-Brown2007). Within the group of affective words, taboo words and other negative words tend to elicit stronger emotionality effects than positive words, and so do emotion words compared to emotion-laden words. 
Correspondingly, these variations translate into varying degrees of L1-L2 differences, with larger L1-L2 differences observed for strongly negative words (compared to positive words) and emotion words (compared to emotion-laden words) (Altarriba & Basnight-Brown, Reference Altarriba and Basnight-Brown2012; Ayçiçegi-Dinn & Caldwell-Harris, Reference Ayçiçegi-Dinn and Caldwell-Harris2009; Eilola & Havelka, Reference Eilola and Havelka2011; Garrido & Prada, Reference Garrido and Prada2018; Kazanas & Altarriba, Reference Kazanas and Altarriba2016). To this end, some researchers claim that the valence of negative words is greatly subdued or suppressed in the L2, causing less emotional disruption to the bilingual speaker than the L1 counterparts would (Jończyk, Boutonnet, Musiał, Hoemann, & Thierry, Reference Jończyk, Boutonnet, Musiał, Hoemann and Thierry2016; Marian & Kaushanskaya, Reference Marian and Kaushanskaya2008; Wu & Thierry, Reference Wu and Thierry2012; see more discussion below). Finally, the experimental task (or measure) used to gauge the emotionality effect may also play a role. In general, physiological measures (skin conductance responses, pupil dilation) tend to be more sensitive than behavioral measures (e.g., reaction times) or subjective emotionality ratings. In fact, a number of studies reported L1-L2 differences only in physiological measures while no differences were found in behavioral measures or ratings (Caldwell-Harris et al., Reference Caldwell-Harris, Tong, Lung and Poo2011; Eilola & Havelka, Reference Eilola and Havelka2011; Iacozza et al., Reference Iacozza, Costa and Duñabeitia2017; Ponari et al., Reference Ponari, Rodríguez-Cuadrado, Vinson, Fox, Costa and Vigliocco2015; Segalowitz et al., Reference Segalowitz, Trofimovich, Gatbonton and Sokolovskaya2008).
The phenomenon of differential affective processing can be imputed to cross-language differences in terms of learning context, the formation of emotional memory, and the structure of the affective lexicon. For sequential bilinguals, the L1 is learned through immersive experiences since birth whereas the L2 is usually learned through classroom instruction with impoverished emotional context. As a result, bilinguals are much more likely to form emotional memories associated with the L1 than with the L2 (Harris et al., Reference Harris, Gleason and Ayçiçeǧi2006; Robinson & Altarriba, Reference Robinson and Altarriba2018). The importance of emotional embodiment is reflected in the special status of childhood reprimands and endearments, which often show the highest amount of L1-L2 difference among affective words, even for early bilinguals (i.e., heritage speakers, see Harris et al., Reference Harris, Gleason and Ayçiçeǧi2006). It has been argued that without being grounded in emotional context, L2 affective words are less likely to develop a connection with the corresponding emotional concepts; instead, they may only be connected to the L1 counterparts at the lexical level (Altarriba, Reference Altarriba2001, Reference Altarriba2003). This view is compatible with the revised hierarchical model of the bilingual lexicon (Kroll & de Groot, Reference Kroll, de Groot, de Groot and Kroll1997; Kroll & Stewart, Reference Kroll and Stewart1994), which separates conceptual representations (i.e., meaning) from lexical representations. The model assumes that novice L2 learners tend to acquire L2 words via lexical mediation by translation equivalents in the L1, and that direct conceptual links between meanings and L2 words will only be established as the learner obtains higher proficiency and more experience with the L2. 
Thus, the asymmetry between L1 and L2 conceptual links will give rise to differential affective processing, as L2 affective words have less direct/automatic access to the emotional concepts.
The L1-L2 difference goes beyond single word/phrase processing, as bilinguals also tend to be more rational and less emotional in higher-level decision-making, e.g., when faced with moral dilemmas or financial incentive calculation (i.e., the “foreign-language effect”; Costa et al., Reference Costa, Foucart, Corey, Keysar, Hayakawa, Botella and Aparici2017, Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014; Gao, Zika, Rogers, & Thierry, Reference Gao, Zika, Rogers and Thierry2015; Keysar, Hayakawa, & An, Reference Keysar, Hayakawa and An2012; Pavlenko, Reference Pavlenko2017; Stankovic, Biedermann, & Hamamura, Reference Stankovic, Biedermann and Hamamura2022; but see also Oganian, Korn, & Heekeren, Reference Oganian, Korn and Heekeren2016). The emotional distance manifested in L2 processing also explains why bilinguals are more at ease with describing harsh or embarrassing memories in the L2 (Altarriba & Santiago-Rivera, Reference Altarriba and Santiago-Rivera1994; Bond & Lai, Reference Bond and Lai1986; Dewaele & Pavlenko, Reference Dewaele and Pavlenko2002; Heredia & Altarriba, Reference Altarriba2001; Marian & Kaushanskaya, Reference Marian and Kaushanskaya2008; Wu & Thierry, Reference Wu and Thierry2012) or using affectionate language in the L2 if such practice is not encouraged in the native culture (Caldwell-Harris et al., Reference Caldwell-Harris, Tong, Lung and Poo2011).
A common criticism of the existing literature on bilingual affective word processing is the lack of attention to crosslinguistic cultural and lexical differences (Grosjean, 1998; Pavlenko, 1999). The wide use of translation equivalents in previous studies assumes a crosslinguistic equivalence of both emotional concepts and affective expressions, although arguments to the contrary have been made repeatedly (Altarriba, 2003; Pavlenko, 2008; Robinson & Altarriba, 2018). Anyone who has attempted to translate affective words and phrases from one language to another would agree that a precise, one-to-one mapping between translation equivalents does not exist in most cases. For example, the Chinese expression 幸災樂禍 ‘pleasure derived from someone else's misfortune’ has no readily available counterpart in English, although German has a similar expression, Schadenfreude. Even when translation equivalents exist, they may differ subtly in intensity and valence. Given this, how can we be sure that any observation of differential affective processing is due to how bilinguals process L1 and L2 words per se, and not to underlying differences between the L1 and L2 words?
The current study intends to fill this gap by focusing on two Chinese languages, Hong Kong Cantonese (“Cantonese” or “CAN” hereafter) and Standard Mandarin (“Mandarin” or “MAN” hereafter), which share a highly overlapping lexicon. While Mandarin is often the lingua franca of a Chinese-speaking community, Cantonese is the most common native language of the population of Hong Kong and one of the major languages in the Canton (Guangdong) province of Mainland China. As with other varieties of Chinese, most (but not all) of the words in Cantonese and Mandarin belong to the lexicon of Standard Written Chinese: they are not distinguished in orthography, owing to the logographic nature of the Chinese writing system, but their spoken forms are mutually unintelligible (although often partially overlapping phonologically) (Footnote 2). Figure 1 shows the lexical model of a typical affective cognate shared between Cantonese and Mandarin: the lemma of 憤怒 ‘rage’, which contains the syntactic and semantic properties of the word, is shared by Cantonese and Mandarin, and the conceptual representation of rage is shared as well; however, at the lexeme level, the word has different phonological forms in the two languages (Footnote 3). For Cantonese–Mandarin bilinguals, most of the effort when acquiring the nonnative Mandarin lexicon is spent on learning the pronunciations of words that have already been acquired in native Cantonese.
The critical question we ask in this study is whether we still find evidence for differential affective processing, given that the Cantonese–Mandarin cognates share the lemmas and only differ in phonological forms. In other words, will Cantonese–Mandarin bilinguals show greater emotional arousal when they are exposed to the Cantonese pronunciation of an affective word compared to the Mandarin pronunciation of the same word? The measure we use to gauge emotional arousal is pupil dilation (i.e., enlargement of pupil size), which has long been associated with a multitude of physical, psychological, and environmental factors such as illuminance, fatigue, sexual and emotional arousal, listening and processing effort, etc. (Hartmann & Fischer, Reference Hartmann and Fischer2014; Knapen et al., Reference Knapen, De Gee, Brascamp, Nuiten, Hoppenbrouwers and Theeuwes2016; Koelewijn, Zekveld, Festen, & Kramer, Reference Koelewijn, Zekveld, Festen and Kramer2012; Ksiazek, Wendt, Alickovic, & Lunne, Reference Ksiazek, Wendt, Alickovic and Lunne2018; Partala & Surakka, Reference Partala and Surakka2003; Robison & Unsworth, Reference Robison and Unsworth2019; Tamási, McKean, Gafos, Fritzsche, & Höhle, Reference Tamási, McKean, Gafos, Fritzsche and Höhle2017; Tryon, Reference Tryon1975; Wagner, Nagels, Toffanin, Opie, & Başkent, Reference Wagner, Nagels, Toffanin, Opie and Başkent2019; Winn, Wendt, Koelewijn, & Kuchinsky, Reference Winn, Wendt, Koelewijn and Kuchinsky2018). Despite the general consensus that the pupil dilates in response to emotionally charged stimuli (Bradley, Miccoli, Escrig, & Lang, Reference Bradley, Miccoli, Escrig and Lang2008; Hess, Reference Hess1965; Janisse, Reference Janisse1974), mixed findings have been reported for emotionality effects on pupil dilation during word processing. Bayer et al. (Reference Bayer, Sommer and Schacht2011) actually found smaller – instead of larger – dilations for high-arousing words than low-arousing words; Võ et al. 
(2008) reported smaller dilations for negative than for positive words, which is in line with Bayer and colleagues’ finding if we assume that negative words tend to be more arousing than positive words. On the other hand, Kuchinke, Võ, Hofmann, and Jacobs (2007) reported null effects of word valence on pupil dilation. A potential reason for the mixed findings is the mediating role of processing effort. Previous studies have shown that pupil dilation may also reflect increased lexical processing effort; for example, low-frequency words, which require more retrieval effort, tend to elicit larger pupil dilations than high-frequency words (Kuchinke et al., 2007; Schmidtke, 2014). As mentioned above, words with high emotionality are associated with facilitated cognitive processing, as evidenced by faster responses in lexical decision and identification, and this facilitative effect may reduce or cancel out the anticipated emotionality effects on pupil dilation. Despite the potential interference of processing effort, a more recent study by Iacozza et al. (2017) demonstrated that pupil dilation is sensitive enough to differentiate emotional arousal during L1 vs. L2 processing: greater pupillary responses were recorded when participants read emotional texts aloud in their L1 than when the same task was performed in the L2, although L2 processing should require greater effort.
In the current study, we record the pupil sizes of Cantonese–Mandarin bilinguals when listening to affective words in Cantonese (L1) or Mandarin (L2) pronunciations. The main hypothesis is that greater pupil dilations will be observed when Cantonese pronunciations are presented. Speakers with less balanced bilingual skills (i.e., weaker L2 proficiency) are predicted to show greater differential affective processing than more balanced bilinguals. We also predict a stronger effect of differential affective processing for strongly negative words.
Methods
Participants
Fifty-two participants (18–25 y.o.; 42F, 10M) were recruited from a university in Hong Kong. All the participants were native speakers of Cantonese with Cantonese-speaking parents, and were either born in Hong Kong (N = 48) or had moved to Hong Kong between the ages of one and four (N = 4). The participants started to learn Mandarin at a young age (M = 5.31 y.o., SD = 1.63), mostly in preschools, and had on average 1.82 hours per week (SD = 1.52) of Mandarin classes or classes taught in Mandarin in primary and secondary schools. Most of the participants rated their Mandarin proficiency as Intermediate (N = 21) or Upper Intermediate (N = 13); we refer to these participants as Mid-Proficiency bilinguals, in contrast with the remaining participants, who self-rated as Beginner/Elementary (i.e., Low-Proficiency; N = 8) or Advanced/Proficient (i.e., High-Proficiency; N = 10) (Footnote 4). Given the “biliterate and trilingual” policy in the education system of Hong Kong, all the participants also had early and continuing exposure to English, and a few participants reported knowledge of additional languages such as Japanese, Korean and French. Two participants were left-handed, one was ambidextrous, and the rest were right-handed.
A separate group of 16 non-Chinese-speaking listeners were recruited to participate in the validation test of the auditory stimuli.
Stimuli
The word list consists of 130 disyllabic words, of which roughly half are affective words (N = 70) and the rest neutral fillers (N = 60) (see Table S1 for the complete word list). The affective words are further divided into taboo words (N = 10), emotion words (N = 30) and emotion-laden words (N = 30). According to Janschewitz (2008), taboo words can be divided into several subcategories, including religious profanity, cultural taboos (slang or clinical terms referring to sexuality, bodily functions, death, etc.), and verbal insults based on a person's race, gender, or sexuality. The taboo words in this study are either cultural taboos (e.g., 性交 ‘intercourse’, 大便 ‘feces’, 死亡 ‘death’) or verbal insults (e.g., 賤人 ‘bitch’, 畜生 ‘bastard’) (Footnote 5). The emotion words were chosen from the Chinese emotion word database compiled by Lin and Yao (2016), which also reports ratings of emotion category, intensity and valence for the written forms of the words from Chinese speakers in mainland China, Hong Kong, and Singapore. We chose six words in each of five basic emotion categories (happiness, sadness, anger, fear, and surprise): three words from the top intensity range and three from the bottom intensity range, with preference given to words that elicited similar intensity ratings from mainland Chinese and Hong Kong speakers. Valence ratings are largely aligned with emotion categories (e.g., happiness: positive; sadness/anger/fear: negative; surprise: neutral). The emotion-laden words (15 negative, 15 positive) and neutral fillers (e.g., 圓圈 ‘circle’) were both randomly sampled from the word lists in Chen et al. (2015).
In subsequent data analysis, two test words (one emotion word and one emotion-laden word) are excluded because they were accidentally included in the practice trials (see more in Data preprocessing), leaving 128 test words (68 affective words, 60 neutral fillers) in the analysis.
Audio recordings of the words were made in a sound-treated booth with two female speakers, a native Cantonese speaker from Hong Kong (reading in Cantonese) and a native Mandarin speaker from northern China (reading in Mandarin), both in their 20s and naïve to the purpose of the study. Individual word recordings were scaled to an average intensity of 65 dB. Overall, the Cantonese word productions were slightly longer than the Mandarin productions (M_CAN = 833 ms, SD_CAN = 116; M_MAN = 729 ms, SD_MAN = 73; p < 0.001 in a paired t-test). To ascertain that the speakers maintained a neutral tone when producing the stimuli, a validation test was conducted, in which a separate group of 16 listeners, none of whom spoke Cantonese or Mandarin or had any knowledge of the main study, listened to the recordings and rated the emotional intensity they perceived on a 5-point scale. A cumulative link mixed model of the intensity ratings was fitted with the clmm() function of the ordinal package (Christensen, 2019; version 2019.12-10) in R (R Team, 2019; version 3.6.1), with by-listener and by-item random intercepts, by-listener random slopes for Language and WordType, and a by-item random slope for Language. The model showed no significant effects of Language (CAN, MAN), WordType (taboo words, emotion words, emotion-laden words, neutral fillers), or their interaction (all ps > 0.1).
Although the test words are shared between Cantonese and Mandarin lexicons, their use frequency may differ across languages. To control for frequency effects, we used subjective frequency ratings provided by the current participants (see Procedure below) instead of corpus-based frequency counts, due to the lack of corpus resources for Cantonese and the potential problem of estimating usage frequency in a nonnative language (Mandarin) with counts from monolingual corpora. Overall, the Cantonese pronunciations of the test words were rated as more frequently heard (Mean = 4.49 on a 7-point scale) than the Mandarin counterparts (Mean = 4.14; p < .001 in paired t-test), although the two ratings are closely correlated (r = 0.70). Participants also gave more “unknown” responses to the Mandarin stimuli (N_type = 81; N_token = 201) than the Cantonese counterparts (N_type = 27; N_token = 64).
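The two summary statistics reported for the frequency ratings (a paired t-test and a correlation between the Cantonese and Mandarin ratings) can be illustrated with a minimal sketch. It assumes that ratings are first averaged per word so that each test word contributes one Cantonese and one Mandarin value; the paper does not state the unit of pairing, and the function names and toy numbers below are hypothetical, not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-word mean ratings on the 7-point scale (illustration only).
cantonese = [5.0, 4.0, 6.0, 3.0]
mandarin = [4.0, 4.0, 5.0, 2.0]

t_value, df = paired_t(cantonese, mandarin)
r_value = pearson_r(cantonese, mandarin)
```

A high correlation together with a reliable mean difference, as reported above, is what one would expect if the two languages rank the words similarly while the Cantonese forms are heard more often overall.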
Procedure
The experiment was run in SR Research Experiment Builder (SR Research, 2011). The participant was seated in front of a desktop computer in a soundproof booth, with the head position stabilized by a desk-mounted chin and forehead rest at a horizontal distance of 78 cm from the computer screen. The sound stimuli were played through a set of AKG K77 headphones, and the participant's pupil responses were tracked by an EyeLink 1000 eye tracker. The participant was told that they would be hearing and evaluating the usage frequency of a series of Cantonese or Mandarin words. Each session consisted of eight practice trials and 130 test trials (one for each test word, in a randomized order). Each participant would hear half of the test words in Cantonese, and the other half in Mandarin, presented in two language-specific blocks. The participant could take a short break between the two blocks, and a complete eye-tracking session lasted about 40 minutes. The assignment of languages and the order of language blocks were balanced across participants. Each trial would begin with a fixation period of 1.5 s, with a cross sign (+) that is 60 pixels in height and width shown at the center of the screen. After the stimulus started playing, the cross sign remained on the screen for a period of 2–5 s, to allow enough time for observing stimulus-evoked pupil size changes. The duration of this period (Mean = 2762 ms, SD = 646) varied based on the length of the stimulus and a quasi-random fluctuation. The participant was instructed to always look at the cross sign when it was on the screen. After the cross sign disappeared, the participant would press a key to proceed to rate how often they heard the word on a 7-point scale (7 = frequently, 1 = rarely, 0 = unknown word). An interval of 2 s was administered between trials in the same block. Throughout the experiment, there was no mention in any form of the emotionality of the stimuli. 
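One way to balance the assignment of languages and the order of language blocks is to cross two binary factors and cycle the four resulting conditions over participants. The sketch below is only an illustration under that assumption: the paper does not describe the actual counterbalancing scheme, and `assign_session`, the fixed split of the word list into two halves, and the per-participant seed are all hypothetical.

```python
import random


def assign_session(participant_index, words, seed=0):
    """Return two (language, word list) blocks for one participant.

    Assumed scheme: the word list is split into two fixed halves; odd-numbered
    participants hear the halves in the opposite language, and every second
    pair of participants starts with the Mandarin block.
    """
    half = len(words) // 2
    list_a, list_b = list(words[:half]), list(words[half:])

    swap_lists = participant_index % 2 == 1
    mandarin_first = (participant_index // 2) % 2 == 1

    cantonese, mandarin = (list_b, list_a) if swap_lists else (list_a, list_b)

    # Randomize trial order within each language block.
    rng = random.Random(seed + participant_index)
    rng.shuffle(cantonese)
    rng.shuffle(mandarin)

    blocks = [("CAN", cantonese), ("MAN", mandarin)]
    if mandarin_first:
        blocks.reverse()
    return blocks


# 130 placeholder word IDs and 52 participants, matching the counts in the text.
words = list(range(130))
sessions = [assign_session(i, words) for i in range(52)]
```

With 52 participants, this scheme gives 13 participants per condition, so every word is heard in Cantonese by 26 participants and in Mandarin by the other 26.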
After the eye-tracking experiment, the participant would complete a survey about their language background and their experience with both Cantonese and Mandarin.
Data preprocessing
In each experimental trial, the participant's right pupil size and gaze coordinates were tracked at a sampling rate of 1000 Hz by the eye tracker. The tracking period – during which a cross sign remained on the screen – can be divided into two consecutive intervals: the fixation period (before stimulus onset) and the stimulus tracking period (after stimulus onset and before the beginning of the frequency rating task).
Following the recommendations in previous studies (Mathôt, Fabius, Van Heusden, & Van Der Stigchel, 2018; Schacht, Dimigen, & Sommer, 2010), raw pupillometry data were preprocessed with a pipeline that (1) replaced intervals of blinks, saccades, and outlier measurements (more than 2.5 standard deviations away from the mean pupil size of the current trial) with interpolations of the nearest valid values, (2) smoothed the measurements with a moving 5-point window, (3) down-sampled the signal from 1000 Hz to 10 Hz (i.e., 10 samples per second) by averaging, and (4) baseline-corrected the measurements in the stimulus tracking period by subtracting the mean pupil diameter during the last 200 ms of the pre-stimulus fixation period. The result of data preprocessing is a time series of task-evoked pupil diameter measurements for each trial. Furthermore, we removed test trials in which (1) more than 50% of the measurements were replaced by interpolations (suggesting an anomaly in pupil size measurement) or (2) more than 50% of the measurements fell outside an area of 300 pixels in diameter around the cross sign (suggesting extensive eye movement; Mathôt et al., 2018). Finally, we also excluded the test trials of the two critical words that were accidentally included in the practice trials, as well as trials that yielded “unknown” responses in the frequency rating task (i.e., the participant did not recognize the word). Altogether, 25.7% of the test trials were excluded, leaving 5,021 trials (featuring 52 participants and 128 critical words spoken in two languages) and 132,173 pupil size measurements for analysis.
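The four preprocessing steps and the two trial-level exclusion criteria can be sketched in plain Python as follows. This is an illustrative reading of the description rather than the authors' code. Several details are assumptions: invalid samples are filled with the single nearest valid value (the text could equally mean linear interpolation between neighbours), the moving window is centered and shrinks at the edges, stimulus onset falls on a down-sampling bin boundary, and all function and parameter names are hypothetical.

```python
from statistics import mean, stdev


def mask_outliers(samples, valid, n_sd=2.5):
    """Flag samples more than n_sd SDs from the trial mean as invalid."""
    good = [s for s, ok in zip(samples, valid) if ok]
    centre, spread = mean(good), stdev(good)
    return [ok and abs(s - centre) <= n_sd * spread
            for s, ok in zip(samples, valid)]


def fill_nearest(samples, valid):
    """Replace invalid samples with the nearest valid value (ties go left)."""
    valid_idx = [i for i, ok in enumerate(valid) if ok]
    if not valid_idx:
        raise ValueError("trial has no valid samples")
    filled = []
    for i, s in enumerate(samples):
        if valid[i]:
            filled.append(s)
        else:
            j = min(valid_idx, key=lambda k: (abs(k - i), k))
            filled.append(samples[j])
    return filled


def smooth(samples, width=5):
    """Centered moving average; the window shrinks at the trial edges."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def downsample(samples, factor):
    """Average consecutive bins of `factor` samples; drop a partial last bin."""
    n_bins = len(samples) // factor
    return [sum(samples[b * factor:(b + 1) * factor]) / factor
            for b in range(n_bins)]


def preprocess_trial(samples, valid, onset, rate_hz=1000, out_hz=10,
                     baseline_ms=200):
    """Return (task-evoked series at out_hz, fraction of replaced samples).

    `valid` marks samples not affected by blinks or saccades; `onset` is the
    sample index of stimulus onset.
    """
    factor = rate_hz // out_hz
    if onset % factor:
        raise ValueError("sketch assumes onset falls on a bin boundary")
    valid = mask_outliers(samples, valid)
    replaced = 1 - sum(valid) / len(valid)
    coarse = downsample(smooth(fill_nearest(samples, valid)), factor)
    onset_bin = onset // factor
    n_base = baseline_ms * out_hz // 1000  # 200 ms at 10 Hz = 2 bins
    baseline = mean(coarse[onset_bin - n_base:onset_bin])
    return [x - baseline for x in coarse[onset_bin:]], replaced


def keep_trial(replaced_fraction, offscreen_fraction, max_fraction=0.5):
    """Apply the two 50% exclusion criteria described in the text."""
    return replaced_fraction <= max_fraction and offscreen_fraction <= max_fraction
```

In this sketch, a trial with a 1.5 s fixation period sampled at 1000 Hz has `onset=1500`, and the baseline is the mean of the two 100 ms bins immediately before onset.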
Statistical analysis
In the rest of the paper, unless otherwise specified, all the pupillometry analysis is based on task-evoked pupil diameter measurements. The analysis focuses on a 3000 ms window after stimulus onset, which covers the full duration of most trials and contains the critical time windows reported in previous pupillometry studies with similar stimuli or trial durations (e.g., 0–1200 ms in Schmidtke, Reference Schmidtke2014; 0–1500 ms in Kuchinke et al., Reference Kuchinke, Võ, Hofmann and Jacobs2007; 1000–1500 ms in Bayer et al., Reference Bayer, Sommer and Schacht2011 and Schacht et al., Reference Schacht, Dimigen and Sommer2010; 1000–2000 ms in Tamási et al., Reference Tamási, McKean, Gafos, Fritzsche and Höhle2017). Generalized additive mixed modeling (GAMM; Sóskuthy, Reference Sóskuthy2017, Reference Sóskuthy2021; van Rij, Hendriks, van Rijn, Baayen, & Wood, Reference van Rij, Hendriks, van Rijn, Baayen and Wood2019; Wieling, Reference Wieling2018; Wood, Reference Wood2017) is used to model whether the shape of the pupil diameter curve (over time) varies by language, word type, Mandarin proficiency, and their interactions. All the models are built with the bam() function in the mgcv package (Wood, Reference Wood2011, Reference Wood2017; v1.8.38) in R, while model checking and data visualizations are conducted with functions from the itsadug package (van Rij, Wieling, Baayen, & van Rijn, Reference van Rij, Wieling, Baayen and van Rijn2017; v2.4).
The GAMM technique effectively unveils both linear and non-linear dependencies – fixed and random – in the data while keeping under control the inherent autocorrelation in time series observations, an issue that has been shown to affect other statistical methods of analyzing time series data such as growth curve analysis (Huang & Snedeker, Reference Huang and Snedeker2020). All the GAMMs reported in this paper incorporate an autoregressive error model at lag = 1 (AR(1)), with an autocorrelation coefficient rho estimated from a maximally similar base model without AR(1). Autocorrelations of model residuals in the final models are around 0.3 at lag = 1 (see Footnote 6). Model fits are checked with gam.check(). Initial model testing showed residual distributions with heavy tails on both ends, so a scaled-t error distribution was used in the final models (Wieling, Reference Wieling2018). The values of the basis dimension (k) are also adjusted according to the results of model fit testing. When a k value was too low, it was doubled until the issue disappeared or k reached its maximum (i.e., the number of unique values of the variable). All the models reported below have sufficient k values.
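Two of the steps described above can be sketched in Python for illustration. The first function computes a lag-1 autocorrelation from a residual series, which is one common way to obtain a starting value for rho; whether the authors computed it in exactly this way is an assumption, and the sketch ignores trial boundaries. The second function mirrors the k-doubling rule; the function names and the `k_is_adequate` callback (standing in for a manual inspection of the gam.check() output) are hypothetical.

```python
from statistics import mean


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of a residual series (standard ACF estimator)."""
    m = mean(residuals)
    dev = [r - m for r in residuals]
    numerator = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


def choose_k(start_k, max_k, k_is_adequate):
    """Double k until the adequacy check passes or k reaches its maximum.

    max_k is the number of unique values of the smoothed variable;
    k_is_adequate is a hypothetical callback returning True when the
    basis dimension is judged sufficient.
    """
    k = min(start_k, max_k)
    while not k_is_adequate(k) and k < max_k:
        k = min(2 * k, max_k)
    return k
```

For example, `choose_k(10, 100, lambda k: k >= 35)` tries 10, 20, and 40 and stops at 40, while a check that never passes stops at `max_k`.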
Results
Data summary
Overall, baseline pupil diameter shows a general trend of decreasing over time within a testing block (see Figure S2), possibly due to fatigue or disengagement, as noted in previous studies (McGinley, David, & McCormick, Reference McGinley, David and McCormick2015; Winn et al., Reference Winn, Wendt, Koelewijn and Kuchinsky2018). Against this backdrop, task-evoked pupil diameter in the analysis window is in general greater than zero, suggesting overall pupil dilation (compared to the baseline) after stimulus onset, although the degree of dilation tends to be smaller in the 2nd block than in the 1st block, probably also due to fatigue (see Figure S3). The peak of dilation usually occurs at around t = 1500 ms, at which point the increase in pupil diameter is on average 8.34% (SD = 6.57%) from the baseline diameter.
Modeling task-evoked pupil diameter change over time with GAMMs
The dependent measure in all the GAMMs is task-evoked pupil diameter. To control for the effects of fatigue, gaze position (Gagl, Hawelka, & Hutzler, Reference Gagl, Hawelka and Hutzler2011), and usage frequency on pupil size, all the models contain smooths for a group of control predictors including block (IsBlock2; a binary variable that is 1 if the response is from Block 2, 0 if from Block 1), trial order within a block (TrialNO), gaze coordinates (Gaze_x, Gaze_y), and subjective frequency rating (SubjectiveFreq). The models also include random smooths for participant (SubjID) and word (WordID). More complex random structures (e.g., random intercepts or slopes for certain predictors, or random smooths for individual trials) are not included due to model non-convergence.
In line with the central inquiries, the analysis is conducted in two stages: Level-1 models have language × word type as the critical predictors for the investigation of differential affective processing effect, and Level-2 models further examine how Mandarin proficiency modulates differential affective processing (i.e., language × word type × proficiency).
In the first stage, we first built a model gamm.1a.AR, where the interaction of language and word type was coded as a categorical interactional variable LangWtype, which combines the levels of language (“M” for Mandarin, “C” for Cantonese) and word type (“NT” for neutral filler, “EL” for emotion-laden word, “EW” for emotion word, “TB” for taboo), with “M.NT” (i.e., Mandarin neutral filler) being the reference level. Model formula and result summary can be found in Appendix S4.
All the control predictors in gamm.1a.AR have significant smooth terms (see Figure S5 for the visualization of the control effects). Specifically, later trials (either in Block 2 or occurring later within a block) and trials with more frequently heard stimuli tend to have reduced pupil dilations, which is compatible with the predicted effects of fatigue and frequency.
Regarding the critical predictor of LangWtype, the parametric effects show that both Mandarin taboos (M.TB: β = 53.16, p < .001) and Cantonese taboos (C.TB: β = 74.61, p < .001) have significantly higher curves than the baseline Mandarin neutral fillers, whereas the other levels of LangWtype do not differ from the baseline in intercept (all ps > .05). Among the smooth terms regarding LangWtype, only the smooth of Cantonese taboos is significantly different from zero (p < .001). As shown in Figure 2, the overall predicted response curve is higher for taboos than neutral fillers in both languages, from slightly after 1 s to the end of the window.
To verify the significance of the taboo vs. neutral difference and to compare the size of the taboo vs. neutral difference across languages, we built gamm.1b.AR, where the critical interaction of language and word type was re-coded as seven binary variables (IsCantonese, IsEL, IsEW, IsTB, IsC.EL, IsC.EW, IsC.TB) defined as follows (see Appendix S6 for model formula and result summary):
• IsCantonese = 1 if language is Cantonese, 0 otherwise.
• IsEL = 1 if word type is emotion-laden word, 0 otherwise.
• IsEW = 1 if word type is emotion word, 0 otherwise.
• IsTB = 1 if word type is taboo, 0 otherwise.
• IsC.EL = 1 if word type is emotion-laden word and language is Cantonese, 0 otherwise.
• IsC.EW = 1 if word type is emotion word and language is Cantonese, 0 otherwise.
• IsC.TB = 1 if word type is taboo and language is Cantonese, 0 otherwise.
Since the smooths of the binary variables include the differences in both the intercept and non-linear terms, there are no parametric terms for the binary variables. Given the coding scheme above, the reference smooth represents Mandarin neutral fillers, and all other Language × WordType levels are modeled by additional binary smooths. Specifically, Cantonese neutral fillers are represented by s(Time, by = IsCantonese); Mandarin taboos are represented by s(Time, by = IsTB); Cantonese taboos are represented by s(Time, by = IsCantonese) + s(Time, by = IsTB) + s(Time, by = IsC.TB).
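The coding scheme can be made concrete with a short Python sketch. The variable names follow the text, but the helper functions `code_level1` and `active_smooths` are hypothetical and only illustrate which binary smooths are switched on for a given trial; they are not part of the authors' analysis script.

```python
def code_level1(language, word_type):
    """Return the seven binary predictors for one trial.

    language: "M" (Mandarin) or "C" (Cantonese)
    word_type: "NT", "EL", "EW", or "TB"
    """
    is_cantonese = int(language == "C")
    codes = {"IsCantonese": is_cantonese}
    for wt in ("EL", "EW", "TB"):
        is_wt = int(word_type == wt)
        codes[f"Is{wt}"] = is_wt
        # The interaction variable is 1 only for Cantonese items of this type.
        codes[f"IsC.{wt}"] = is_cantonese * is_wt
    return codes


def active_smooths(language, word_type):
    """List the smooth terms that contribute to the predicted curve."""
    codes = code_level1(language, word_type)
    terms = ["s(Time)"]  # reference smooth: Mandarin neutral fillers
    terms += [f"s(Time, by = {name})" for name, value in codes.items() if value == 1]
    return terms
```

Calling `active_smooths("C", "TB")` lists the reference smooth plus the IsCantonese, IsTB, and IsC.TB smooths, matching the decomposition of Cantonese taboos given above, whereas `active_smooths("M", "NT")` returns the reference smooth alone.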
Model summary shows that among the seven newly added binary variables, only IsTB and IsC.TB have smooths that are significantly different from zero (p < .05). The visualization (see Figure S7) shows that both IsTB and IsC.TB are associated with higher response curves, from around 1.2 s to the end of the window for IsTB and between 2 s and 2.5 s for IsC.TB. These effects confirm that task-evoked pupil diameter is significantly larger for taboo words than for neutral words in both languages, and that the effect is even greater in Cantonese than in Mandarin.
To summarize, the results of Level-1 models (gamm.1a.AR and gamm.1b.AR) provide evidence for differential affective processing in Cantonese–Mandarin bilinguals, with the largest emotionality effects observed for taboo words. While bilingual listeners exhibit higher pupil dilation to taboo words presented in both languages (compared with neutral fillers), the magnitude of the emotionality effect is stronger when the words are presented in L1 Cantonese pronunciations than in L2 Mandarin pronunciations.
Next, we added Mandarin proficiency to the model and its interaction with language and word type, in order to investigate whether differential affective processing is modulated by the bilingual's proficiency in the L2. Since the non-taboo affective words did not show significant differences from neutral fillers in the Level-1 models, we focused on only taboos and neutral fillers in the Level-2 models. The dataset includes 72,042 pupil size measurements from 2,739 trials. We first built gamm.2a.AR, with a categorical interactional variable LangWtypeMprof, which combines all the levels of language, word type, and Mandarin proficiency (“LowP” for low proficiency, “MidP” for intermediate proficiency, and “HighP” for high proficiency), with M.NT.LowP (i.e., low-proficiency bilinguals’ responses to Mandarin neutral fillers) as the reference level. Model summary (see Appendix S8) shows that among the levels of LangWtypeMprof, only C.TB.LowP (i.e., low-proficiency bilinguals’ responses to Cantonese taboos) has a significant parametric term (in the positive direction), although all the levels are associated with significant smooths over time. The visualization in Figure 3 reveals that low-proficiency bilinguals’ response smooth for Cantonese taboos is higher than that of Cantonese neutral fillers, but no such difference exists in Mandarin. By contrast, both mid- and high-proficiency bilinguals have higher response curves for taboos than neutral fillers in both languages (Figure 4 and Figure 5). The taboo-neutral difference, where present, is usually evident in the second half of the window, starting between 1 s and 1.5 s.
To confirm the significance of the effects observed in gamm.2a.AR, we fitted gamm.2b.AR, which models the language × word type × proficiency interaction with a categorical variable of Mandarin proficiency (Mprof) and three binary variables for the language × word type interaction for each proficiency group (see Appendix S9). As an example, the binary variables for the low-proficiency group are as follows:
• IsTB.LowP = 1 if word type is taboo and Mandarin proficiency is low, 0 otherwise.
• IsC.LowP = 1 if language is Cantonese and Mandarin proficiency is low, 0 otherwise.
• IsC.TB.LowP = 1 if language is Cantonese, word type is taboo and Mandarin proficiency is low, 0 otherwise.
Accordingly, low-proficiency bilinguals’ response smooths are modeled by the following parametric and smooth terms (excluding control factors and random effects):
• Mandarin neutral fillers: MprofLowP + s(Time):MprofLowP
• Mandarin taboos: MprofLowP + s(Time):MprofLowP + s(Time, by = IsTB.LowP)
• Cantonese neutral fillers: MprofLowP + s(Time):MprofLowP + s(Time, by = IsC.LowP)
• Cantonese taboos: MprofLowP + s(Time):MprofLowP + s(Time, by = IsTB.LowP) + s(Time, by = IsC.LowP) + s(Time, by = IsC.TB.LowP)
If low-proficiency bilinguals show a significant effect of differential affective processing, IsC.TB.LowP should show a significant, positive smooth over time. Similar predictions can be made for mid- and high-proficiency bilinguals.
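One way to read the prediction above is that the IsC.TB smooth captures a difference of differences: the taboo-neutral gap in Cantonese minus the taboo-neutral gap in Mandarin. The Python sketch below illustrates this with invented toy curves (not data from the study); the function name is hypothetical.

```python
def interaction_curve(m_nt, m_tb, c_nt, c_tb):
    """Difference of differences at each time point.

    Under the binary coding, the curve for each condition is a sum of smooths:
      M.NT = ref
      M.TB = ref + tb
      C.NT = ref + c
      C.TB = ref + tb + c + c_tb
    so (C.TB - C.NT) - (M.TB - M.NT) recovers the interaction smooth c_tb.
    """
    return [
        (ctb - cnt) - (mtb - mnt)
        for mnt, mtb, cnt, ctb in zip(m_nt, m_tb, c_nt, c_tb)
    ]


# Toy curves at three time points (arbitrary units, for illustration only):
# no taboo effect in Mandarin, a growing taboo effect in Cantonese.
toy = interaction_curve(
    m_nt=[0, 10, 20],
    m_tb=[0, 10, 20],
    c_nt=[0, 12, 22],
    c_tb=[0, 30, 60],
)
```

With these toy values the interaction curve is positive wherever the Cantonese taboo-neutral gap exceeds the Mandarin one, which is the pattern a significant, positive IsC.TB.LowP smooth would indicate.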
The results of gamm.2b.AR show no cross-language difference in neutral fillers for any proficiency group (s(Time):IsC.LowP, s(Time):IsC.MidP, and s(Time):IsC.HighP are non-significant; all ps > .05). Regarding the taboo-neutral difference, however, the three proficiency groups show different patterns. Low-proficiency bilinguals show a significant difference between neutral fillers and taboos in Cantonese (s(Time):IsC.TB.LowP; p = .002) but not in Mandarin (s(Time):IsTB.LowP; p = .71). Mid-proficiency bilinguals show a significant taboo-neutral difference in Mandarin (s(Time):IsTB.MidP; p < .001) and an additional cross-language difference for the taboo-neutral comparison in Cantonese (s(Time):IsC.TB.MidP; p < .001). High-proficiency bilinguals also show a significant taboo-neutral difference in Mandarin (s(Time):IsTB.HighP; p < .001), but no additional cross-language difference for the taboo-neutral comparison in Cantonese (s(Time):IsC.TB.HighP; p = .8), suggesting that the same amount of taboo-neutral difference is present in both languages.
The visualization of the partial effects of the binary variables in gamm.2b.AR (Figure S10) confirms that the significant effects are all in the predicted, positive direction. It can also be seen that the effect size of IsC.TB.LowP is much larger than that of IsC.TB.MidP, confirming that the magnitude of the language × word type interaction is greater for low-proficiency bilinguals.
Taken together, the results of Level-2 models reveal a continuum of the degree of differential affective processing, with low-proficiency bilinguals on the highest end of the continuum, showing taboo-neutral differences only in Cantonese (L1) but not in Mandarin (L2), and high-proficiency bilinguals on the lowest end, showing taboo-neutral differences in both languages to the same degree. Mid-proficiency bilinguals are in the middle, showing taboo-neutral differences in both languages but with a slightly stronger effect in Cantonese than in Mandarin.
Discussion
An important feature of the current study is the focus on cognates, which were eschewed in previous research on differential affective processing. All the test words used in the current study are identical between Cantonese and Mandarin in both orthography and meaning, differing only (partially) in the phonological form. The existing literature has consistently found evidence for the so-called cognate effect (also known as the cognate advantage) in bilingual/L2 processing: compared to non-cognates, cognates are easier to recognize and name (Costa, Santesteban, & Caño, Reference Costa, Santesteban and Caño2005; Dijkstra, Hilberink-Schulpen, & Van Heuven, Reference Dijkstra, Hilberink-Schulpen and Van Heuven2010; Lalor & Kirsner, Reference Lalor and Kirsner2000; Peeters, Dijkstra, & Grainger, Reference Peeters, Dijkstra and Grainger2013), more susceptible to phonetic transfer and more likely to be influenced by the crosslinguistic counterparts (Amengual, Reference Amengual2012; Brown & Harper, Reference Brown and Harper2009; Flege & Munro, Reference Flege and Munro1994; Goldrick, Runnqvist, & Costa, Reference Goldrick, Runnqvist and Costa2014; Simonet & Amengual, Reference Simonet and Amengual2020; Yao & Chang, Reference Yao and Chang2016; but see Cochrane, Reference Cochrane1980; Flege, Frieda, Walley, & Randazza, Reference Flege, Frieda, Walley and Randazza1998; Flege, Takagi, & Mann, Reference Flege, Takagi and Mann1995 for different views). Given this, cognates should be less likely to elicit differential affective processing compared to non-cognates. Thus, the fact that we still observe significant cross-language differences with Cantonese–Mandarin affective cognates attests to the resilience of the differential affective processing phenomenon under high linguistic similarity. This study also contributes new data from Chinese languages to the research on cognate processing, which is disproportionately concentrated on European languages. The cognates shared between Cantonese and Mandarin (as well as other Chinese varieties) are uniquely characterized by their large number, identical orthography, and often mutually unintelligible pronunciations due to the differences in the segmental/tonal inventory across languages.
Locus of the emotionality effect
For differential affective processing to occur between Cantonese and Mandarin cognates, the locus of the emotionality effect should be the phonological form. We consider two possible accounts: (1) the emotionality effect arises from the manner of accessing the emotional concept from the phonological form; (2) the phonological form is associated with episodic emotional memory in the mental lexicon. Regarding the first possibility, following the hierarchical model reviewed above, L2 forms’ access to the emotional concept may require the mediation of the corresponding L1 forms, whereas L1 forms can access the emotional concept directly. For Cantonese–Mandarin bilinguals, while the lemma may be linked to both Cantonese and Mandarin phonological forms (see Figure 1), the link with the Cantonese form is probably stronger than the link with the Mandarin form. Thus, accessing the lemma and the higher-level emotional concept would be more efficient from the Cantonese side than from the Mandarin side, which could lead to high automaticity and stronger magnitude in emotional arousal from the Cantonese spoken forms.
The subjective frequency ratings we collected provide some support for the claim of easier lexical and conceptual activation from Cantonese spoken forms. Cantonese pronunciations of the test words were rated as more frequently heard than the Mandarin counterparts, suggesting that the bilinguals are indeed more familiar with the Cantonese forms. However, it is unlikely for ease of activation to account for heightened pupil responses in this study. As the modeling results suggest, subjective frequency per se tends to be negatively associated with pupil dilations, with more frequent/familiar forms eliciting weaker pupillary responses, probably due to lower processing effort. This is consistent with previous reports of the effects of lexical frequency on pupil dilation (Kuchinke et al., Reference Kuchinke, Võ, Hofmann and Jacobs2007; Schmidtke, Reference Schmidtke2014). In other words, the overall effect of ease of activation in the current dataset is weakening – as opposed to strengthening – the pupil response; as a result, ease of activation fails to provide a viable explanation for the observed pattern of heightened emotionality effect in L1 Cantonese.
The second account posits that emotionality is encoded in the phonological forms in memory. The association between emotionality and word pronunciations can be conveniently explained by an exemplar-based model of the lexicon (Johnson, Reference Johnson, Johnson and Mullennix1997; Pierrehumbert, Reference Pierrehumbert, Gussenhoven and Warner2002), where a lexical unit is represented by a cloud of tokens (i.e., exemplars). Each exemplar represents a phonetic form that the individual has encountered previously, including not only the phonetic detail but also the contextual information of the encounter such as the speaker's and the listener's emotional states. Thus, affective words, which tend to occur in emotional context, will have emotional memory associated with their exemplars. In subsequent language processing, the associated emotional memory may be invoked when the affective words are encountered again. For a bilingual speaker, since L2 affective words are less often encountered in naturally-occurring emotional context, the strength of emotional memory associated with an L2 affective word will be much lower compared to its L1 counterpart, thus explaining the differential affective processing phenomenon. Amengual (Reference Amengual2012) suggested that cross-linguistic cognates co-inhabit a cloud in the bilingual lexicon: “Bilinguals may associate two phonologically similar word representations (cognates) in the same [‘]cloud[’], so the word for a particular concept/meaning is influenced by the orthographically, phonologically, and semantically similar representation from the other language” (Amengual, Reference Amengual2012, p. 527). Our results indicate that even if phonologically similar cognates occupy similar locations in the mental acoustic space, the emotionality information stored in individual exemplars can differ significantly between the two languages. That is to say, cross-linguistic influence within a cloud – if present – is not strong enough to fully equalize the L1-L2 difference in emotional strength.
Obviously, the second account is fully compatible with the idea that emotionality effects are formed through emotional memory in the process of language learning and language use (e.g., Dewaele, Reference Dewaele2010; Harris et al., Reference Harris, Gleason and Ayçiçeǧi2006). It also receives support from previous findings of emotionality effects elicited by nonlinguistic auditory stimuli (Blasi et al., Reference Blasi, Mercure, Lloyd-Fox, Thomson, Brammer, Sauter and Murphy2011; Partala & Surakka, Reference Partala and Surakka2003) and enhanced emotionality effects from auditorily presented – compared with visually presented – word stimuli (Harris et al., Reference Harris, Ayçiçegi and Gleason2003).
The two accounts are not mutually exclusive. Potential evidence for the first account may come as a result of separating processing effort and efficiency of lexical/conceptual access. The second account could be further tested by examining if the degree of the emotionality effect varies with the fine phonetic detail in the auditory stimuli. Presumably, if the auditory token is phonetically more similar to previously stored exemplars that have high emotionality (e.g., produced with features of emotional speech, such as emotional prosody), the elicited emotionality effect should be stronger. These predictions await confirmation in future research.
L2 proficiency and differential affective processing
It is widely acknowledged that the degree of differential affective processing is smaller if the bilingual is more balanced between L1 and L2. Previous literature suggests L2 proficiency, early exposure, learning mode (classroom instruction vs. immersion), and language use pattern as potential factors that determine whether the bilingual shows emotionality effects in the L2 (Caldwell-Harris, Reference Caldwell-Harris2014; Degner et al., Reference Degner, Doycheva and Wentura2012; Dewaele, Reference Dewaele2010; Harris et al., Reference Harris, Gleason and Ayçiçeǧi2006). In the current study, more than half of the participants (34 out of 52) rated their Mandarin proficiency as intermediate, and the rest were split between low proficiency (8) and high proficiency (10). It should be noted that although this group of bilinguals may lack early childhood exposure to Mandarin in an immersive environment, their L2 experience is overall quite extensive for adult L2 speakers. Even the self-reported low-proficiency group started to learn Mandarin early, between the ages of three and seven, and had regular Mandarin lessons throughout primary and secondary school. Given the linguistic and geographical proximity between Mandarin and Cantonese, it is also safe to assume that the bilinguals in this study have ample exposure to Mandarin in everyday life (especially in recent years) through media and personal contact with Mandarin speakers.
Against the backdrop of overall extensive L2 experience, our results reveal a continuum from self-rated low-proficiency bilinguals, who show the greatest amount of differential affective processing, to mid-proficiency bilinguals, who have a reduced magnitude of the effect, to high-proficiency bilinguals, who show no significant difference in affective processing between L1 and L2. This pattern supports the prediction that the more balanced the bilingual is, the smaller the differential affective processing effect. Furthermore, the fact that differential affective processing is evidenced in early L2 learners with low-to-mid proficiency levels calls for further research to separate the effects of age of acquisition, learning mode, and proficiency.
Selective effects of differential affective processing
The last question we would like to address is the selectivity of the L1-L2 differences in emotionality effects. In the current study, we find the strongest emotionality effects – as well as the largest L1-L2 differences – with taboo words, while the emotionality effects in emotion-laden words or emotion words are not significant. This pattern is consistent with previous findings of the most pronounced emotionality effects from taboos, swear words, and reprimands. At least two mechanisms could lead taboo words to generate enhanced emotionality effects compared with other word categories. On the one hand, taboo words tend to evoke stronger emotional memories; on the other hand, given the social expectation of avoiding taboo words – especially in an academic context – it is possible that taboo words elicit greater surprisal effects in the experiment, which would in turn lead to higher pupil responses. It should be noted, though, that the current study only used taboo words from the standard (i.e., non-slang) Chinese vocabulary, excluding vernacular taboo and swear words that are unique to the spoken variety, which would probably elicit even stronger responses. Thus, our results demonstrate the robustness of the emotionality effects associated with taboo words.
The patterns of differential affective processing observed for taboo words are also in line with previous literature that found larger L1-L2 differences in strongly negative stimuli than in positive/neutral stimuli (Altarriba & Santiago-Rivera, Reference Altarriba and Santiago-Rivera1994; Bond & Lai, Reference Bond and Lai1986; Dewaele & Pavlenko, Reference Dewaele and Pavlenko2002; Heredia & Altarriba, Reference Altarriba2001; Marian & Kaushanskaya, Reference Marian and Kaushanskaya2008; Wu & Thierry, Reference Wu and Thierry2012). Why should negativity be a modulating factor for differential affective processing? It probably has to do with how humans process negative stimuli. As a crucial feature for survival, we are naturally more alert to negative stimuli in the environment (Baumeister, Bratslavsky, Finkenauer, & Vohs, Reference Baumeister, Bratslavsky, Finkenauer and Vohs2001; Rozin & Royzman, Reference Rozin and Royzman2001), a bias that is also reflected in the disproportionately large share of negatively-valenced words in the affective lexicon (Bardeen & Daniel, Reference Bardeen and Daniel2017; Blasi et al., Reference Blasi, Mercure, Lloyd-Fox, Thomson, Brammer, Sauter and Murphy2011; Cisler & Koster, Reference Cisler and Koster2010; Janschewitz, Reference Janschewitz2008; Lin & Yao, Reference Lin and Yao2016). Disengagement mechanisms such as emotion regulation (see Gratz & Roemer, Reference Gratz and Roemer2004; Gross, Reference Gross1998) are available as a way to prevent emotional distress, but such mechanisms could also lead to heightened sensitivity if we need to constantly monitor our attention level to the negative stimuli that are to be avoided (Bardeen & Daniel, Reference Bardeen and Daniel2017; Wenzlaff & Wegner, Reference Wenzlaff and Wegner2000). Following this line of reasoning, bilinguals may show a lower degree of the negativity bias in the L2 (and consequently greater L1-L2 differences), due to either a general reduction of emotional responses to aversive stimuli in the L2 or a more effective suppression mechanism for the L2 that does not require constant monitoring.
Conclusion
In this paper, we report a pupillometry study that found stronger pupil responses to the L1 (Cantonese) affective words than to the L2 (Mandarin) affective words from Cantonese–Mandarin bilinguals. Our results provide strong support for differential affective processing from a highly restrictive context, i.e., when the tested items are all identical cognates that only differ in pronunciation, suggesting that the emotionality effects in language processing must be at least partially rooted in the phonological forms. This research also extends the work on bilingual affective language processing by including a less-studied L1 and L2 pair.
Supplementary Material
For supplementary material accompanying this paper, visit https://doi.org/10.1017/S1366728922000931
S1. Complete word list used in the study.
S2. Figure of baseline pupil diameter over trials by block by language.
S3. Figure of average task-evoked pupil diameter in the 3000 ms analysis window, separated by block by language.
S4. Summary of gamm.1a.AR in the analysis of Language × WordType effects.
S5. Visualization of the effects of IsBlock2, TrialID, and SubjectiveFreq over time in gamm.1a.AR.
S6. Summary of gamm.1b.AR in the analysis of Language × WordType effects.
S7. Visualization of the partial effects of IsTB and IsC.TB over time in gamm.1b.AR.
S8. Summary of gamm.2a.AR in the analysis of Language × WordType × MandarinProficiency effects.
S9. Summary of gamm.2b.AR in the analysis of Language × WordType × MandarinProficiency effects.
S10. Visualization of the partial effects of the critical binary variables over time for low-proficiency (top row), mid-proficiency (middle row), and high-proficiency (bottom row) bilinguals in gamm.2b.AR.
Acknowledgement
This research project is supported by a research grant from the Hong Kong Polytechnic University (P0001884). We thank Ms. Liu Chang, Ms. Hillarie Tse, and Mr. Eugene Wong for assistance with the experiments.
Competing interests
The authors declare none.
Data Availability Statement
The datasets and the R script for the analysis presented in this paper are publicly accessible from the website https://osf.io/8pgkj/.