Highlights
• Gender stereotype-laden information is highly internalized in females.
• Males display less gender-stereotypical bias in L2 than in L1.
• The foreign language effect is observed in bilingual stereotype processing.
1. Introduction
Gender stereotypes refer to generalizations about social norms and expectations toward men and women (Hentschel et al., 2019) and derive from differences in the social roles they engage in at home and work (social role theory; Eagly, 1997; Koenig & Eagly, 2014). As a result of such a gendered division of labor, men are stereotypically characterized as more agentic (i.e., focused on being in control), while women are believed to be more communal and relational (i.e., focused on building relationships; Abele, 2003). Though automatic activation of gender stereotypes has been shown in a number of studies on native language processing (e.g., Molinaro et al., 2016; Osterhout et al., 1997), previous experiments have not yet examined how stereotypes are stored and accessed in the bilingual mind. The present event-related potential (ERP) study is the first to test bilingual speakers’ sensitivity to gender stereotypes when operating in their native (L1) as compared to foreign (L2) language.
From a neuropsychological perspective, gender stereotypes constitute category-based knowledge (i.e., semantic knowledge about a particular class of entities) that is activated automatically (i.e., outside people’s awareness) and is difficult to inhibit (Contreras et al., 2012; Proverbio et al., 2018; Siyanova-Chanturia et al., 2015). Following the social cognitive perspective of the correspondence bias, repeated observations of social roles that men and women tend to engage in lead to specific beliefs about the attributes of each sex (Eagly & Wood, 2016). Importantly, the formulated beliefs impact not only people’s expectations but also their perception of actions and behaviors that are (in)consistent with gender roles, whereby stereotypically congruent behaviors receive more positive reactions (Eagly & Wood, 2016; West & Zimmerman, 1987), thus further reinforcing gender stereotypes.
Language, as the primary system by which we transmit social and cultural norms and attitudes (Goodhew et al., 2022), strongly contributes to perpetuating stereotypical beliefs about men and women (Holtgraves & Kashima, 2008; Kashima et al., 2014; Kiełkiewicz-Janowiak & Pawelczyk, 2006). Previous monolingual studies have shown that gender stereotypes are automatically accessed in the process of language comprehension. Electroencephalographic (EEG) research that employed an ERP analysis reported modulations to stereotypically (in)congruent words within the N400 time frame, which marks lexico-semantic processing (Kutas & Federmeier, 2011). Namely, it has been observed that words violating gender stereotype-based expectations elicit larger N400 responses compared to words conforming to these stereotypes (e.g., Pesciarelli et al., 2019; Siyanova-Chanturia et al., 2012). This suggests that lexico-semantic processing is more cognitively demanding when individuals encounter words that defy their preconceived notions about men and women. Importantly, however, such an N400 stereotype congruency effect has primarily been observed in response to isolated words rather than sentences (Canal et al., 2015; Osterhout et al., 1997). This implies that gender stereotypes, when embedded in sentences, require reanalysis at a later stage of language processing that is more sensitive to feature agreement.
Such modulations by stereotype congruency within sentence contexts were previously observed in the Late Positive Complex (LPC) time window, an ERP component indexing meaning integration and/or reanalysis (Kolk & Chwilla, 2007; Aurnhammer et al., 2023). Previous studies have found larger LPC amplitudes evoked in response to sentences violating gender stereotypes (e.g., Canal et al., 2015; Lattner & Friederici, 2003; Proverbio et al., 2018), which shows that stereotype knowledge impacts meaning integration mechanisms.
Little scholarly attention has, however, been devoted to the EEG investigation of gender stereotype processing in the context of bilingualism, though the language of operation has recently been suggested to impact psychosocial processes. For instance, unbalanced (L1-dominant) bilingual speakers have been observed to be more empathetic and pro-social when operating in L2 (Castro et al., 2022; Wu et al., 2020), which is likely to result from increased alertness, vigilance and sustained attention when operating in the L2 mode (Tomova et al., 2017). Also, as postulated by Gawinkowska et al. (2013), long exposure to L1 culture, along with the process of socialization being mostly completed in L1, might potentially make L2 much less prone to normative influences that lead people to conform so as to be accepted by others. We thus believe that these influences should be further tested when processing gender stereotypes, and we expect an attenuated sensitivity to stereotype-laden meanings when bilinguals process them in their L2.
Such a potentially reduced sensitivity to stereotypes in L2 might extend the foreign language effect (FLE), which marks decreased emotional reactivity resulting from a psychological distance when processing L2 relative to L1 (e.g., Costa et al., 2014; Hayakawa et al., 2017; Keysar et al., 2012). Consistent with the L2 emotional detachment assumption, accumulating evidence has shown that the language of operation actively modulates bilinguals’ emotional sensitivity (Caldwell-Harris, 2014; Dewaele, 2004; Jankowiak & Korpal, 2018) and affective states (García-Palacios et al., 2018; Iacozza et al., 2017; Naranowicz et al., 2022b). Such dampened emotionality in L2 has been linked to a number of interconnected factors, including late age of L2 acquisition combined with low L2 proficiency (Cieślicka & Guerrero, 2023; Harris et al., 2006), learning L2 mainly in the formal (i.e., classroom) environment (Degner et al., 2012; Dewaele, 2010) and weaker neural connectivity between lexico-semantic representations and affect in L2 (Degner et al., 2012; Opitz & Degner, 2012). Crucially, we argue that the theoretical framework of psychological distance proposed by FLE may be broadened to encompass psychological constructs beyond emotions.
Notably, stereotypes, as cognitive constructs ingrained in the conceptual store, could be subject to similar influences, as, similar to emotions, they have been recognized for their impact on human behavior (Kiełkiewicz-Janowiak & Pawelczyk, 2006; Stanciu et al., 2019). Extending the application of psychological distance to include stereotypes holds promise for gaining deeper insights into the neural mechanisms that shape and govern human behavior. In line with the FLE, here we hypothesize that the processing of gender stereotypes in the nondominant L2 might also lead to attenuated stereotype-driven responding due to potentially weaker connections between domain-general stereotype knowledge and L2 lexico-semantic representations.
Building upon previous research on gender stereotype processing in L1 (e.g., Canal et al., 2015; Osterhout et al., 1997; Pesciarelli et al., 2019) and on distinct psychosocial mechanisms in L1 versus L2 (Castro et al., 2022; Gawinkowska et al., 2013; Wu et al., 2020), we make the first attempt to investigate the interplay between the language of operation and the automaticity of gender stereotype access and integration. To this end, we tested late proficient unbalanced Polish (L1)–English (L2) bilingual speakers (male and female) in a semantic decision task, involving L1 and L2 stereotypically congruent, stereotypically incongruent, semantically correct and semantically incorrect sentences. In L1, we predicted a stereotype congruency effect, reflected in larger ERP amplitudes for stereotypically incongruent relative to congruent sentences. This effect was primarily expected to be observed within the LPC time frame, which is known to be more sensitive than the N400 to stereotype-laden information presented within sentence contexts (Hypothesis 1; Canal et al., 2015; Osterhout et al., 1997; Proverbio et al., 2018). In contrast, we predicted that bilingual speakers would be less sensitive to stereotype-laden information in their L2 (Castro et al., 2022; Wu et al., 2020), as evident in reduced ERP effects for stereotypically incongruent relative to stereotypically congruent sentences in L2 (Hypothesis 2).
This would in turn suggest that the processing of stereotypical information might be less automatic and thus more cognitively taxing when operating in L2. Importantly, such an effect would extend FLE research to stereotype processing, suggesting that social norms and gender-based expectations encountered in L2 are less interconnected within the long-term memory store (Degner et al., 2012; Opitz & Degner, 2012). Specifically, it would indicate that the link between lexico-semantic representations and stereotype knowledge is weaker in L2 relative to L1.
2. Methods
2.1. Participants
Following previous EEG research on stereotype processing (e.g., Grant et al., 2020; Pesciarelli et al., 2019; Proverbio et al., 2018), our original sample comprised 64 Polish (L1)–English (L2) bilingual speakers; three of them were excluded from the analyses due to low quality of the recorded EEG data. The final sample therefore consisted of 61 native speakers of Polish (31 females, 30 males) aged 21–32 (M Females = 23.8 years, 95% CI [22.8, 24.8]; M Males = 24.8 years, 95% CI [23.9, 25.8]), who were students or graduates of English Studies at the Faculty of English, Adam Mickiewicz University, Poznań. Consistent with de Groot (2011), participants were classified as highly proficient unbalanced late bilinguals who had not lived in the L2 (English) environment and had acquired their L2 in an instructional yet immersive learning context (see Table 1). All participants had normal/corrected-to-normal vision and no language or neurological disorders. For their participation, participants received a gift card of 200 PLN.
1 LexTALE (Lemhöfer & Broersma, 2012; percentages).
2 Language History Questionnaire 3.0 (Li et al., 2020, as translated into Polish by Naranowicz & Witczak): the proficiency, dominance, and immersion scores (percentages); listening, speaking, reading, and writing skills (1 – very low proficiency, 7 – very high proficiency); age of acquisition (years). The symbols •, * and ** mark statistically significant between-gender differences, as revealed by Welch two-sample t tests (•p < .06, *p < .01, **p < .001).
2.2. Materials
The stimuli included 480 Polish and 480 English sentences divided into four categories: 120 stereotypically congruent (e.g., Their niece became a hairdresser immediately after graduating.), 120 stereotypically incongruent (e.g., Their nephew became a hairdresser immediately after graduating.), 120 semantically correct (e.g., There is only one hairdresser with such experience.) and 120 semantically incorrect (e.g., The cooks seasoned the hairdresser with fresh chilli.) sentences in each language. The linguistic stimuli were adopted from a database by Jankowiak et al. (2024b) and were highly controlled for their meaningfulness, probability of use and stereotype congruency level. The two gender-stereotyped conditions (i.e., stereotypically congruent and incongruent sentences) featured 50% of female stereotype-biased and 50% of male stereotype-biased sentences.
In a series of normative studies (see Jankowiak et al., 2024b, for more details), 472 Polish native speakers and 470 English native speakers evaluated Polish and English sentences, respectively, on three 7-point Likert scales: meaningfulness (1 – very meaningless, 7 – very meaningful), probability of use (1 – very unlikely, 7 – very likely) and stereotype congruency (1 – very incongruent, 7 – very congruent). The results of meaningfulness ratings showed that Polish semantically correct and stereotypically congruent sentences were rated as similarly meaningful and received higher meaningfulness ratings compared to semantically incorrect and stereotypically incongruent sentences. Also, stereotypically incongruent sentences were rated as more meaningful than semantically incorrect items. In English, semantically correct, stereotypically congruent and stereotypically incongruent sentences were all rated as similarly meaningful and received higher meaningfulness ratings compared to semantically incorrect sentences. Next, the results of the probability of use ratings showed that, in both Polish and English, semantically correct and stereotypically congruent sentences were evaluated as similarly probable to use and obtained higher probability of use ratings compared to stereotypically incongruent sentences. Finally, the results of the stereotype congruency ratings revealed that in both Polish and English, stereotypically congruent sentences were rated as most consistent with gender stereotypes, followed by semantically correct and finally stereotypically incongruent sentences (see Table 3).
All the sentences were declarative and emotionally neutral. All Polish sentences featured 7–9 words (M = 8.00). The critical words were presented in fourth, fifth or sixth position in a sentence. Similarly, all English sentences were 8–10 words long (M = 9.00), with the critical words presented as fifth, sixth or seventh words. The larger number of words per sentence in English compared to Polish was due to articles in English, which do not exist in Polish. The critical words (all nouns) were adopted from a database by Jankowiak et al. (2024a) and were highly controlled for their frequency, valence, arousal, concreteness, age of acquisition and the number of letters and syllables.
2.4. Procedure
The procedure applied in the experiment was approved by the Ethics Committee for Research Involving Human Participants at Adam Mickiewicz University, Poznań (Resolution No. 12/2021/2022). Informed consent was obtained from all participants involved in the study. Prior to data collection, participants were screened online by means of a medical history questionnaire, LexTALE (Lemhöfer & Broersma, 2012), as well as a battery of questionnaires measuring their attitudes toward and identification with different gender roles (see Table 2): Importance to Identity subscale from the Collective Self-Esteem Scale (CES-R; Luhtanen & Crocker, 1992; adapted into Polish by Bazińska, 2015), Ambivalent Sexism Inventory (ASI; Glick & Fiske, 1996; adapted into Polish by Zawisza et al., 2015), Attitudes toward Women Scale (AWS; Spence et al., 1973) and Bem Sex Role Inventory (BSRI; Bem, 1974; adapted into Polish by Lipińska-Grobelny & Gorczycka, 2011). Additionally, having completed the experimental blocks, participants performed the Gender-Career Implicit Association Test (GCIAT; Greenwald et al., 1998) in the language of the final experimental block.
1 Edinburgh Handedness Inventory (Oldfield, 1971): left-handedness (−100 to −28), ambidexterity (−29 to 48), and right-handedness (48–100).
2 Positive and Negative Affect Schedule (Watson et al., 1988; adapted into Polish by Kaczmarek, 2004): the 0–100 range.
3 Bem Sex Role Inventory (Bem, 1974; adapted into Polish by Lipińska-Grobelny & Gorczycka, 2011): the 7–70 range.
4 Importance to Identity subscale from the Collective Self-Esteem Scale (Luhtanen & Crocker, 1992; adapted into Polish by Bazińska, 2015): the 1–7 range.
5 Ambivalent Sexism Inventory (Glick & Fiske, 1996; adapted into Polish by Zawisza et al., 2015): the 1–6 range.
6 Attitudes toward Women Scale (Spence et al., 1973): the 1–4 range.
7 Gender-Career Implicit Association Test (Greenwald et al., 1998): the Implicit Association Test effect size – the −2 to 2 range. The symbols •, * and ** mark statistically significant between-gender differences, as revealed by Welch two-sample t tests (•p < .06, *p < .01, **p < .001).
The experiment proper was carried out in the Psychophysiology of Language and Affect Laboratory (Faculty of English, Adam Mickiewicz University, Poznań). Participants were seated in a dimly lit and quiet booth, 75 cm away from an LED monitor with a screen resolution of 1280 × 1024 pixels. E-Prime 3.0 was used to present the stimuli and collect the behavioral data.
Participants completed the Edinburgh Handedness Inventory (Oldfield, 1971) and the Language History Questionnaire 3.0 (Li et al., 2020) during the EEG cap preparation. Also, given that previous research has pointed to a crucial role of participants’ mood in the automaticity of bilingual language processing (e.g., Jankowiak et al., 2022; Naranowicz et al., 2022b), participants were asked to rate their current affective state by means of the Polish version of PANAS-X (Positive and Negative Affect Schedule; Watson et al., 1988, adapted into Polish by Kaczmarek, 2004; see Table 2). In the experiment proper, participants performed a semantic decision task, wherein they decided if a sentence was meaningless or meaningful by pressing designated keys, whose designation was counterbalanced. Participants completed two randomly presented blocks in Polish (L1) and two in English (L2). Half of the participants began the experiment with the Polish blocks, and the other half with the English ones. Each of the four blocks comprised 120 experimental (30 stereotypically congruent, 30 stereotypically incongruent, 30 semantically correct and 30 semantically incorrect) sentences as well as 60 filler (semantically incorrect) sentences. The sentences were randomly presented on a computer screen using black letters and were centered on a gray background. The beginning of each sentence was presented at once, and upon pressing a button, the remaining words were presented automatically word-by-word (Naranowicz et al., 2022a, 2022b; Jankowiak et al., 2022).
The time sequence of stimuli presentation is provided in Figure 1.
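The block composition described above can be tallied with a short Python sketch. It is only an illustration of the arithmetic: the variable and function names are hypothetical, and the split into expected "meaningful" versus "meaningless" responses assumes that stereotypically incongruent sentences count as meaningful, which the text does not state explicitly.

```python
# Sentence counts per block, as stated in the Procedure section.
EXPERIMENTAL_PER_BLOCK = {
    "stereotypically congruent": 30,
    "stereotypically incongruent": 30,
    "semantically correct": 30,
    "semantically incorrect": 30,
}
FILLERS_PER_BLOCK = 60  # filler sentences, all semantically incorrect
BLOCKS_PER_LANGUAGE = {"Polish (L1)": 2, "English (L2)": 2}


def trials_per_block() -> int:
    """Experimental plus filler sentences in one block."""
    return sum(EXPERIMENTAL_PER_BLOCK.values()) + FILLERS_PER_BLOCK


def trials_per_session() -> int:
    """All sentences seen by one participant across the four blocks."""
    return trials_per_block() * sum(BLOCKS_PER_LANGUAGE.values())


def experimental_trials_per_cell() -> dict:
    """Experimental trials per Language x Sentence type cell for one participant."""
    return {
        (language, sentence_type): n_blocks * n_sentences
        for language, n_blocks in BLOCKS_PER_LANGUAGE.items()
        for sentence_type, n_sentences in EXPERIMENTAL_PER_BLOCK.items()
    }


def expected_responses_per_block() -> dict:
    """Expected response split per block.

    ASSUMPTION: only semantically incorrect sentences (experimental and
    filler) call for a "meaningless" response.
    """
    meaningless = EXPERIMENTAL_PER_BLOCK["semantically incorrect"] + FILLERS_PER_BLOCK
    return {"meaningful": trials_per_block() - meaningless, "meaningless": meaningless}


if __name__ == "__main__":
    print(trials_per_block(), trials_per_session())
    print(expected_responses_per_block())
```

Under that assumption, the 60 fillers would balance the two response options within each block.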
2.5. EEG data recording
EEG data were recorded at 2048 Hz from 64 Ag/AgCl electrodes (i.e., Fp1, Fpz, Fp2, AF7, AF3, AFz, AF4, AF8, F7, F5, F3, F1, Fz, F2, F4, F6, F8, FT7, FC5, FC3, FC1, FCz, FC2, FC4, FC6, FT8, T7, C5, C3, C1, Cz, C2, C4, C6, T8, TP7, CP5, CP3, CP1, CPz, CP2, CP4, CP6, TP8, P9, P7, P5, P3, P1, Pz, P2, P4, P6, P8, P10, PO7, PO3, POz, PO4, PO8, O1, Oz, O2 and Iz) placed at the standard extended 10–20 positions. The bipolar electrodes monitoring vertical (vEOG) and horizontal (hEOG) eye movements were placed above and below the left eye and next to the outer rims of both eyes, respectively. The EEG signals were recorded by ActiView (Biosemi B.V., Amsterdam) and amplified using an ActiveTwo AD-box (Biosemi B.V., Amsterdam). Since the recommendations for the BioSemi ActiveTwo system suggest recording an EEG signal with an impedance level for each electrode of no more than ±50 μV, in the present experiment, we ensured that the impedance (offset) level for each electrode was kept at ±20 μV.
2.6. Data analysis
All statistical analyses were performed in R (R Core Team, 2020). We analyzed response accuracy along with two ERP components previously reported to be modulated by lexico-semantic processing in L1 and L2: the N400 and LPC. The ERP analyses were performed within pre-defined time windows, in accordance with previous EEG research on stereotype processing (e.g., Osterhout et al., 1997; White et al., 2009; Proverbio et al., 2018; Lu et al., 2019; Yang et al., 2020): 300–500 ms (N400) over the FC1, FCz, FC2 (fronto-central), C1, Cz, C2 (central), CP1, CPz and CP2 (centro-parietal) electrodes and 600–800 ms (LPC) over the C1, Cz, C2 (central), CP1, CPz, CP2 (centro-parietal), P1, Pz and P2 (parietal) electrodes.
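To illustrate how a single-trial mean amplitude could be obtained for these windows and electrode clusters, consider the minimal Python sketch below. It is not the pipeline used in the study, where the EEG data were processed in BrainVision Analyzer. The function names, the dictionary-based epoch layout and the half-open treatment of the window boundaries are assumptions made for the example; the 500 Hz sampling rate and the epoch onset at −200 ms follow the preprocessing described in this section.

```python
SAMPLING_RATE_HZ = 500   # rate after down-sampling
EPOCH_START_MS = -200    # epoch onset relative to critical word onset

N400_WINDOW_MS = (300, 500)
LPC_WINDOW_MS = (600, 800)
N400_ROI = ["FC1", "FCz", "FC2", "C1", "Cz", "C2", "CP1", "CPz", "CP2"]
LPC_ROI = ["C1", "Cz", "C2", "CP1", "CPz", "CP2", "P1", "Pz", "P2"]


def ms_to_sample(time_ms: float) -> int:
    """Convert a latency relative to word onset into an index within the epoch."""
    return round((time_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def mean_amplitude(epoch: dict, roi: list, window_ms: tuple) -> float:
    """Average voltage over the ROI electrodes and the samples in the window.

    `epoch` maps an electrode label to a list of voltages in microvolts.
    ASSUMPTION: the window is half-open, [start, stop).
    """
    start, stop = ms_to_sample(window_ms[0]), ms_to_sample(window_ms[1])
    values = [v for channel in roi for v in epoch[channel][start:stop]]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy epoch: 600 samples (-200 to 1000 ms), flat at 0 except a 2 microvolt
    # plateau inside the N400 window.
    n_samples = ms_to_sample(1000)
    plateau = range(ms_to_sample(300), ms_to_sample(500))
    toy = {ch: [2.0 if i in plateau else 0.0 for i in range(n_samples)]
           for ch in set(N400_ROI + LPC_ROI)}
    print(mean_amplitude(toy, N400_ROI, N400_WINDOW_MS))
    print(mean_amplitude(toy, LPC_ROI, LPC_WINDOW_MS))
```

Averaging across electrodes and samples in one step yields a single value per trial and component, which is the form of dependent variable a single-trial mixed-effects analysis requires.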
BrainVision Analyzer 2.1 software (Brain Products, Germany) was used to analyze the EEG data offline. Continuous EEG data were down-sampled to 500 Hz, referenced to the common average reference (Nunez & Srinivasan, 2006; Luck, 2014) and filtered offline (Butterworth zero-phase filters) with a high-pass filter set at 0.1 Hz (slope 24 dB/octave) and a low-pass filter set at 20 Hz (slope 24 dB/octave). They were then segmented from 200 ms before critical word onset to 1000 ms afterward, baseline-corrected relative to the signal in the 200 ms preceding stimulus onset (−200 to 0 ms) and edited for artifacts (i.e., rejecting trials with flatlining events, voltage differences higher than 100 μV or voltage steps higher than 50 μV). Ocular artifacts were corrected using the ocular artifact regression method by Gratton et al. (1983).
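The baseline correction and amplitude-based rejection criteria can be sketched for a single channel as follows. This is a simplified illustration in plain Python, not a reproduction of the BrainVision Analyzer routines. It assumes that the 100 μV criterion refers to the peak-to-peak difference within the epoch and that the 50 μV criterion refers to the difference between adjacent samples. The flatline check is left out because no threshold is reported, and the function names are hypothetical.

```python
SAMPLING_RATE_HZ = 500
BASELINE_MS = 200  # pre-stimulus interval used as baseline
N_BASELINE = BASELINE_MS * SAMPLING_RATE_HZ // 1000  # 100 samples


def baseline_correct(samples: list, n_baseline: int = N_BASELINE) -> list:
    """Subtract the mean of the pre-stimulus samples from every sample."""
    baseline = sum(samples[:n_baseline]) / n_baseline
    return [v - baseline for v in samples]


def exceeds_amplitude_criteria(samples: list,
                               max_range_uv: float = 100.0,
                               max_step_uv: float = 50.0) -> bool:
    """Return True if the epoch should be rejected.

    ASSUMPTIONS: the range criterion is peak-to-peak over the whole epoch;
    the step criterion compares adjacent samples.
    """
    if max(samples) - min(samples) > max_range_uv:
        return True
    steps = (abs(b - a) for a, b in zip(samples, samples[1:]))
    return any(step > max_step_uv for step in steps)


if __name__ == "__main__":
    # Toy epoch: 100 baseline samples at 2 microvolts, then 500 samples at 5.
    epoch = [2.0] * 100 + [5.0] * 500
    corrected = baseline_correct(epoch)
    print(corrected[0], corrected[-1])
    print(exceeds_amplitude_criteria(corrected))
```

Because baseline correction only shifts the whole epoch by a constant, the two rejection criteria give the same answer before and after correction.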
ERPs were time-locked to the onset of the critical word of each sentence, which was placed in a mid-sentence position. Both the response accuracy and ERP analyses (i.e., mean amplitudes for single trials in the N400 and LPC time windows) conformed to a 2 (Language: Polish [L1] versus English [L2]) × 4 (Sentence type: semantically correct versus semantically incorrect versus stereotypically congruent versus stereotypically incongruent sentences) × 2 (Gender: Females versus Males) design, with Language and Sentence type as within-subject factors and Gender as a between-subject factor. Single-trial ERP data falling outside 1.5 times the interquartile range in the N400 and LPC time windows were discarded from the analyses, resulting in a normal distribution of the data in each time window. Altogether, 2.03% of all data were rejected from the N400 analysis and 2.34% from the LPC analysis.
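One common reading of the 1.5 interquartile range criterion is Tukey's fences, sketched below with the Python standard library. The text does not specify how the quartiles were computed or whether the rule was applied per participant, per condition or over the pooled data. The fence definition (Q1 − 1.5 × IQR, Q3 + 1.5 × IQR), the "inclusive" quantile method and the function name are therefore assumptions.

```python
from statistics import quantiles


def iqr_filter(amplitudes: list, k: float = 1.5) -> tuple:
    """Split values into (kept, rejected) using Tukey's fences.

    ASSUMPTION: values below Q1 - k*IQR or above Q3 + k*IQR are rejected.
    """
    q1, _median, q3 = quantiles(amplitudes, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in amplitudes if lower <= v <= upper]
    rejected = [v for v in amplitudes if v < lower or v > upper]
    return kept, rejected


if __name__ == "__main__":
    # Toy single-trial mean amplitudes in microvolts (not real data).
    trials = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
    kept, rejected = iqr_filter(trials)
    print(kept, rejected)
    print(f"rejected: {100 * len(rejected) / len(trials):.2f}%")
```

The share of rejected trials computed in this way is the kind of figure reported above for each time window.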
The response accuracy data were analyzed with generalized linear mixed-effects models (Baayen et al., 2008; Barr, 2013; Barr et al., 2013; Jaeger, 2008), fitting a binomial model (Bolker, 2008), while the ERP data were analyzed with linear mixed-effects models (Baayen et al., 2008; Barr et al., 2013; Barr, 2013), using the lme4 package (Bates et al., 2015). A maximal model was first computed with a full random-effect structure, including subject- and item-related variance components for intercepts and by-participant and by-item random slopes for fixed effects (Barr et al., 2013). The model complexity was then reduced to arrive at a parsimonious model using principal component analysis (Bates et al., 2015). Sliding difference contrasts (Venables & Ripley, 2002) were applied for all predictors (Frömer et al., 2018). b estimates and significance of fixed effects and interactions (p-values) were based on the Satterthwaite approximation for mixed-effects models (the lmerTest package; Kuznetsova et al., 2017). Pairwise comparisons were Bonferroni-corrected and calculated using the emmeans package (Lenth et al., 2023).
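To make the contrast coding and the multiple-comparison correction concrete, the following Python sketch builds a sliding difference contrast matrix for a factor with k levels and applies a Bonferroni adjustment to a set of p-values. It only illustrates the two techniques and does not reproduce the R analysis. The function names are hypothetical and the p-values in the usage example are invented placeholders, not results from the study. The matrix layout follows the standard definition of sliding (successive) difference contrasts, in which each coefficient estimates the difference between adjacent factor levels.

```python
from fractions import Fraction
from itertools import combinations


def sliding_difference_contrasts(k: int) -> list:
    """Return the k x (k-1) sliding difference contrast matrix.

    Column j (1-based) compares level j+1 with level j: rows 1..j receive
    -(k-j)/k and rows j+1..k receive j/k.
    """
    return [
        [Fraction(-(k - j), k) if i <= j else Fraction(j, k) for j in range(1, k)]
        for i in range(1, k + 1)
    ]


def bonferroni(p_values: list) -> list:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Two-level factor (e.g., Language or Gender): codes -1/2 and +1/2.
    print(sliding_difference_contrasts(2))
    # Four-level factor (Sentence type): three adjacent-level comparisons.
    for row in sliding_difference_contrasts(4):
        print([str(x) for x in row])

    sentence_types = ["correct", "incorrect", "congruent", "incongruent"]
    pairs = list(combinations(sentence_types, 2))  # 6 pairwise comparisons
    placeholder_p = [0.001, 0.004, 0.02, 0.03, 0.2, 0.6]  # invented values
    for pair, p_adj in zip(pairs, bonferroni(placeholder_p)):
        print(pair, round(p_adj, 3))
```

With four sentence types there are six pairwise comparisons, so under Bonferroni correction an uncorrected p-value must fall below .05 / 6 ≈ .0083 to remain significant at the .05 level.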
Note that due to the presentation of the critical words in a mid-sentence position, participants could make semantic judgements long before pressing a corresponding key, which decreased the sensitivity of the response time data as a behavioral measure. The response time analysis is reported in Supplementary materials.
3. Results
3.1. Behavioral data: Response accuracy
The analysis of response accuracy showed a fixed effect of Sentence type, such that stereotypically congruent sentences (M = 95.54%, 95% CI [95.14, 96.03]) were responded to with greater accuracy than semantically correct sentences (M = 94.47%, 95% CI [93.91, 95.04]), b = −.56, SE = .15, z = −3.83, p = .001, as well as semantically incorrect sentences (M = 93.61%, 95% CI [93.00, 94.23]), b = .83, SE = .14, z = 6.08, p < .001. Also, participants responded to stereotypically incongruent sentences (M = 94.33%, 95% CI [93.82, 94.98]) more accurately than to semantically incorrect sentences, b = .52, SE = .13, z = 3.96, p < .001. There was also a fixed effect of Language, b = .50, SE = .16, z = 3.15, p = .002, whereby participants responded to Polish (L1) sentences (M = 95.51%, 95% CI [95.22, 95.91]) with greater accuracy than to English (L2) sentences (M = 93.47%, 95% CI [91.13, 93.90]). All remaining differences in response accuracy were statistically nonsignificant, ps > .05.
3.2. EEG data: N400 (300–500 ms)
Within the N400 time window (300–500 ms), the analysis yielded a fixed effect of Language (b = −.15, SE = .06, t(123) = −2.28, p = .024), with larger N400 amplitudes for Polish (L1) than English (L2) sentences. Moreover, there was a fixed effect of Sentence type, with larger N400 amplitudes for semantically incorrect relative to semantically correct and stereotypically congruent sentences (see Table 4).
Sentence types: [1] – semantically correct; [2] – semantically incorrect; [3] – stereotypically congruent; [4] – stereotypically incongruent.
The analysis also showed an interaction between Sentence type, Language and Gender. While the analysis for females showed larger N400 amplitudes for semantically incorrect sentences relative to all the other sentence types (see Table 4 below), the analysis for males revealed an interaction between Sentence type and Language. In Polish (L1), post-hoc t tests in males showed larger N400 amplitudes for semantically incorrect sentences than all the other sentence types. In English (L2), in contrast, semantically correct sentences elicited smaller N400 amplitudes than all the other sentence types (see Table 4, Figures 2 and 3). All remaining differences in the N400 amplitudes were statistically nonsignificant, ps > .05.
3.3. EEG data: LPC (600–800 ms)
Within the LPC time window (600–800 ms), the analysis showed a fixed effect of Sentence type, with larger LPC amplitudes for semantically incorrect sentences than all remaining sentence types (see Table 5).
Sentence types: [1] = semantically correct; [2] = semantically incorrect; [3] = stereotypically congruent; [4] = stereotypically incongruent.
Moreover, the analysis yielded an interaction between Sentence type, Language and Gender. In females, there was a fixed effect of Sentence type, whereby larger LPC amplitudes were observed for semantically incorrect sentences than all the remaining sentence types as well as for stereotypically incongruent than semantically correct sentences. In males, there was an interaction between Sentence type and Language. In Polish (L1), post-hoc t tests showed larger LPC amplitudes for both semantically incorrect and stereotypically incongruent sentences compared to semantically correct and stereotypically congruent sentences. In contrast, in English (L2), semantically incorrect sentences evoked larger LPC amplitudes than all the remaining sentence types, mirroring the general language-independent effect of Sentence type. Moreover, smaller LPC amplitudes were also observed for stereotypically congruent than semantically correct and stereotypically incongruent sentences (see Table 5, Figures 2 and 3). All remaining differences in the LPC amplitudes were statistically nonsignificant, ps > .05.
4. Discussion
The present ERP study explored how bilingual speakers process gender stereotypes in their native (L1) and foreign language (L2). Highly proficient unbalanced (L1-dominant) male and female Polish–English bilinguals performed a semantic decision task involving stereotypically congruent, stereotypically incongruent, semantically correct and semantically incorrect sentences in Polish (L1) and English (L2). We predicted stereotypically congruent and incongruent content to be processed in a language-dependent manner, with attenuated stereotype congruency effects in L2 relative to L1, thus reflecting a decreased sensitivity to stereotypes in L2. In both the N400 and LPC time frames, we found ERP patterns to be affected differentially by sentence type and the language of operation in a gender-dependent manner.
Partially in line with Hypothesis 1, we observed N400 and LPC modulations by the sentence types, which were additionally driven by the language of operation and participants’ gender. In females, in both L1 and L2, semantically incorrect sentences elicited larger N400 amplitudes than semantically correct, stereotypically congruent and stereotypically incongruent conditions. Since the N400 is argued to index the amount of information retrieved from memory during lexico-semantic processing (Kutas & Federmeier, 2011), the observed results suggest that compared to semantically incorrect sentences, accessing the lexico-semantic representations of the critical words embedded in semantically correct, stereotypically congruent and incongruent contexts required less extensive and cognitively demanding mechanisms in females. With the N400 also being an ERP response that reflects the unintentional access to implicit knowledge (Friederici, 2002; Kutas & Federmeier, 2000; Kotz et al., 2012), our results indicate that for women, gender stereotype-laden information might be activated during language processing in a highly automatized manner, similar to semantically correct items.
A pattern resembling the effect observed in the N400 time window emerged within the LPC time frame, where in females, semantically incorrect sentences evoked larger LPC amplitudes relative to semantically correct, stereotypically congruent and stereotypically incongruent items in both L1 and L2. This implies that in females, at the stages of both lexico-semantic processing (indexed by the N400) and meaning integration (indexed by the LPC), semantically correct sentences aligned not only with stereotypically congruent conditions but also with sentences that violated stereotypes. This alignment suggests equally effective lexico-semantic access and integration of stereotype-laden sentences, regardless of their stereotype congruency. Taken together, these results indicate a significant internalization of gender stereotype-laden linguistic information in females, evident in both their L1 and L2, and reflecting a reduced cognitive demand invested in lexico-semantic access and integration of gendered information.
In contrast to females, the N400 and LPC modulations by the sentence types in males were additionally driven by the language of operation. First, within the N400 time window, males displayed larger N400 amplitudes for semantically incorrect sentences compared to the other sentence types in L1, aligning with the general fixed effect observed among females. Conversely, in L2, both stereotypically congruent and incongruent sentences converged with semantically incorrect sentences. This convergence suggests that L2 stereotype-laden sentences, whether adhering to or violating stereotype congruency, posed greater demands on lexico-semantic memory retrieval compared to semantically correct items in males. Second, within the LPC time window, semantically incorrect sentences evoked larger LPC amplitudes than all the remaining sentence types, yet only in L2. In L1, on the other hand, both semantically incorrect and stereotypically incongruent sentences evoked larger LPC amplitudes compared to semantically correct and stereotypically congruent conditions. This indicates that male participants engaged in continuous meaning reanalysis of sentences that deviated from gender stereotypes only in L1, possibly reflecting their greater gender-stereotypical bias in L1 than in L2.
The above results might be interpreted in terms of language-of-operation and gender-driven effects. First, the results indicating a more pronounced gender-stereotypical bias in males when they operated in L1 relative to L2 are partially in line with Hypothesis 2, possibly reflecting a decreased sensitivity to stereotype-laden information when operating in L2. Consequently, such results extend the FLE research to the processing of gender stereotypes in L2. Previous research on the FLE has mostly focused on affective processing, where reduced emotional reactivity has been consistently observed during emotion processing in L2 (e.g., Costa et al., 2014; Hayakawa et al., 2017; Jankowiak & Korpal, 2018; Keysar et al., 2012). Here, we show that similar to affective stimuli, the processing of gender stereotypes in L2 might lead to attenuated stereotype-driven responding, potentially due to weaker connections between domain-general stereotype knowledge and L2 representations (Degner et al., 2012; Opitz & Degner, 2012). This interpretation is consistent with previous research indicating that bilinguals’ psychosocial processes are modulated by the language of operation. For instance, bilinguals have recently been observed to be more empathetic (Wu et al., 2020) and to exhibit reduced social biases (Castro et al., 2022) when operating in their L2, possibly due to increased alertness, vigilance and a sustained attention level when in the L2 mode (García-Palacios et al., 2018; Tomova et al., 2017).
It is also noteworthy that our study included only late unbalanced (L1-dominant) bilingual participants who acquired their foreign language in a formal school setting. As a result, their L2 may have been less susceptible to normative influences, including those expressed through gender stereotypes, since their socialization was mostly completed in their L1 (Gawinkowska et al., 2013).
Second, we interpret the more demanding retrieval and integration mechanisms (indexed by the N400 and LPC modulations, respectively) in males but not in females as indicative of increased gender-stereotypical bias among men (Koch et al., 2015; Koenig et al., 2011) and their reduced acceptance of scenarios that diverge from traditional gender roles (e.g., Negy & Eisenman, 2005). Women are, on the other hand, significantly more affected by stereotypes in various aspects of their lives and are thus more consistently exposed to them (Eagly & Sczesny, 2009; Heilman, 2012; Peus et al., 2015; Hentschel et al., 2019; Jankowiak et al., 2024a). As a result, females may be more attuned to gender stereotypes, leading to facilitated lexico-semantic access to gender stereotype-laden linguistic content. Such a pattern aligns with findings from prior research (Cattaneo et al., 2011; Cikara et al., 2011; Proverbio et al., 2018), indicating elevated levels of hostile sexism and prejudice in men as opposed to women. Crucially, similar effects were reported in an ERP study by Proverbio et al. (2018), who observed modulations of the N400 and LPC by stereotype congruency solely in male participants. This effect is also in line with neuroimaging studies, where hostile sexism was found to be correlated with the activation of brain regions associated with mental state attributions, yet only in men (Cikara et al., 2011).
Such an increased bias in males has been indicated to potentially stem from a “flight from femininity” attitude (Sereno & O’Donnell, 2009), suggesting that males not only highly attend to their own gender role stereotypes but are also sensitive to stereotypes linked with identities they are expected to avoid. In contrast, females experience fewer constraints on masculine behaviors, potentially making them more accepting of gender stereotype violations.
Another language-dependent pattern was reflected in the fixed effect of language observed within the N400 time window, where we found smaller N400 amplitudes for L2 compared to L1, the effect being independent of the sentence type. Such results are highly consistent with previous EEG research on semantic processing in bilingualism (e.g., Jankowiak et al., 2017; Midgley et al., 2009; Naranowicz et al., 2022b; Newman et al., 2012). In line with the functional role of the N400 (Kutas & Federmeier, 2011), smaller N400 amplitudes for L2 are linked to weaker interconnectivity between L2 lexical items in the semantic network (Midgley et al., 2009; Jankowiak et al., 2017). Such decreased interconnectivity results in reduced activity in the memory store and less extensive spreading activation mechanisms during L2 lexico-semantic access.
Importantly, though the present study provides the first EEG exploration of the interplay between the language of operation and gender stereotype processing, further research is needed to clarify the role of individual characteristics that might modulate this relationship. First of all, in the present study, the interaction between the language of operation and sentence type emerged exclusively among male participants, which suggests that L2 might actively modulate gender-stereotypical bias mostly in those individuals who already exhibit higher levels of prejudicial behaviors (Cattaneo et al., 2011; Cikara et al., 2011; Proverbio et al., 2018). Further research is therefore needed to investigate the interplay between the language of operation and gender-stereotypical bias in a wider range of participant samples, including individuals with varying degrees of conservatism and predisposition toward stereotype-driven behaviors and attitudes.
Furthermore, the FLE that emerged in the present study among males is interesting, given that our participant sample comprised highly proficient bilingual speakers. Though the presence of the FLE among proficient bilinguals might seem surprising, given that some studies have pointed to a negative correlation between L2 proficiency level and the magnitude of the FLE (e.g., Costa et al., 2014; Brouwer, 2019), the results align with other experiments that have observed a robust FLE also in highly proficient bilinguals (e.g., Jankowiak & Korpal, 2018; Naranowicz et al., 2022a). Due to such inconsistencies, future studies should extend their scope to include a comparison between bilinguals with varying levels of L2 proficiency, thus investigating the interplay between L2 proficiency level and psychophysiological responding in L2 (e.g., Cieślicka & Guerrero, 2023; Dewaele, 2016; Harris et al., 2006; for discussions, see Del Maschio et al., 2022; Privitera et al., 2023; Privitera, 2023).
While the present study aimed to investigate the neurophysiological dynamics of gender stereotype processing in bilingualism within a controlled laboratory setting, future research should also strive to explore the practical implications of reduced sensitivity to gender stereotypes in L2 in a more qualitative and applied manner. Since gender stereotypes reflect one of the major barriers to achieving gender equality (Eagly & Sczesny, 2009; Heilman, 2012), the reduced automaticity of stereotype activation in L2 among men should be further tested to demonstrate whether people could use their bilingual experience to be less susceptible to stereotypes. This, in turn, could help create recommendations for job interviews or oral examinations, showing whether such events should be conducted in the L2 context so as to avoid the activation of gender stereotypes in a recruiter/examiner. Specifically, in highly stereotypical situations (e.g., job interviews), the bilingual context might become a favorable communication form in order to mitigate the stereotype threat (i.e., being at risk of confirming a negative stereotype about one’s social group; Steele, 1997) that individuals commonly face. The L2 context may reduce a candidate’s or student’s susceptibility to such a threat, which is known to significantly impair cognitive abilities and performance in one’s L1 (von Hippel et al., 2015). Overall, further research on bilingual stereotype processing is essential to contribute to the ongoing discussion on how the language of operation influences the activation of social norms. This line of inquiry can provide valuable insights into the fields of not only psycho- and neurolinguistics but also sociolinguistics and social psychology.
5. Conclusion
This contribution presents the first attempt to explore the relationship between the language of operation and gender stereotype processing. The study revealed gender- and language-dependent modulations by the sentence types in both the N400 and LPC patterns. In females, we observed a deep-rooted internalization of gender stereotype-laden linguistic information in both L1 and L2, as evidenced by reduced cognitive demands during lexico-semantic access and meaning integration. Conversely, males displayed a heightened gender-stereotypical bias, yet only in L1. In L2, on the other hand, they exhibited a reduced sensitivity to gender stereotypes. This finding suggests a reduced internalization of gender stereotype knowledge among males in L2, which might reflect their decreased sensitivity to stereotype-laden information when operating in L2. Thus, the present study extends the FLE research to the processing of gender stereotypes in L2 and highlights the possibility that male bilingual speakers exhibiting more stereotype-driven attitudes may use their L2 to engage in less stereotypical behavior. This may have significant implications for social, political and job contexts, where the second language may be used to promote gender-fair language and reduce gender stereotyping and discrimination.
Supplementary material
To view supplementary material for this article, please visit http://doi.org/10.1017/S1366728924000531.
Data availability
The data and code are available at https://osf.io/bzm3s/.
Authors’ contribution
K.J.: conceptualization; data curation; formal analysis; funding acquisition; methodology; project administration; resources; software; supervision; validation; writing – original draft; writing – review & editing.
M.N.: conceptualization; data curation; formal analysis; methodology; software; validation; visualization; writing – original draft; writing – review & editing.
J.P.: conceptualization; methodology; validation; writing – review & editing.
D.D.: conceptualization; formal analysis; methodology; validation; writing – review & editing.
J.G.: investigation; writing – review & editing.
Funding
This work was funded by the National Science Centre, Poland (Grant Number 2021/41/B/HS2/00249), granted to Katarzyna Jankowiak.