Introduction
The “Moral Foreign Language Effect” (M-FLE) (Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner & Keysar, 2014; Hayakawa, Costa, Foucart & Keysar, 2016) shows that making decisions in a foreign (vs. native) language reduces biases. It has been studied in adults (e.g., Costa et al., 2014; Geipel, Hadjichristidis & Surian, 2015; Hayakawa et al., 2016; Romero-Rivas, López-Benítez & Rodríguez-Cuadrado, 2022), but not in children, where studies on reasoning and moral judgements have relied only on their native languages (e.g., Bucciarelli, 2015; Mikhail, 2011; Pellizzoni, Siegal & Surian, 2010). Considering that moral reasoning is a developmental process, and given the rise of bilingual education worldwide (e.g., in the Autonomous Region of Madrid, during the 2019/20 academic year, the Spanish-English bilingual programme was present in 50% of public schools; Mañas Antón, 2019), understanding how the M-FLE might affect decision-making in children is particularly relevant, as children might be educated about morality and related topics (e.g., ethics, affection) in their second language (L2).
M-FLE in adults
Costa et al. (2014, Experiment 2) presented two versions of the trolley dilemma. In the “footbridge” version (highly aversive), an oncoming train is about to kill five people and the only way to stop it is to push a heavy man off the footbridge, so that he falls in front of the train and dies. In the less aversive “switch” version, participants can flip a switch; if they do not, the train will kill five people, and if they do, it will kill one person. Responses are classically dichotomous: they can be utilitarian, thus supporting the common good, or deontological, supporting a person's rights. In Costa et al.'s (2014) study, the percentage of utilitarian choices varied by aversiveness and language: when aversiveness was high, utilitarian choices were made by 18% of the L1 and 44% of the L2 participants; when aversiveness was low, percentages were 81% and 80%, respectively. Costa et al. (2014) concluded that using the L2 increases emotional distance, leading to more utilitarian judgements in emotionally aversive scenarios. Differences in the M-FLE depending on the degree of aversiveness could stem from the activation of different processing routes (Greene, Nystrom, Engell, Darley & Cohen, 2004). Aversive dilemmas would activate a route based on automatic emotional processing, usually leading to deontological judgements, whereas less aversive dilemmas would activate conscious routes that prompt utilitarian decisions (Geipel et al., 2015). However, Geipel et al. (2015) also proposed that the M-FLE could derive from limited access to social or moral norms.
Critically, Bialek, Paruzel-Czachura and Gawronski (2019) explored whether the M-FLE was motivated by differences in sensitivity to consequences (in a utilitarian sense), to norms (in a deontological sense), or in general action tendencies. People showed a reduced sensitivity to both consequences and norms when using their L2; therefore, these seem more relevant when using our L1 vs. our L2.
To further explore how the M-FLE operates, Romero-Rivas et al. (2022) sought the locus of this reduced sensitivity by distinguishing between emotions related to the self and empathy. The classic “footbridge” and “switch” versions of the trolley dilemma were presented in either participants’ L1 or L2, adding a third response option of self-sacrifice. Results agreed with the attenuated emotionality account of the M-FLE (e.g., Costa et al., 2014; Hayakawa, Tannenbaum, Costa, Corey & Keysar, 2017), as participants in the L2 group were more willing to self-sacrifice in both versions of the dilemma irrespective of participants’ empathy levels, suggesting that emotional attenuation applies mostly to emotions related to the self.
However, the M-FLE is not found ubiquitously, as recent meta-analyses have shown (Circi, Gatti, Russo & Vecchi, 2021; Del Maschio, Crespi, Peressotti, Abutalebi & Sulpizio, 2022; Stankovic, Biedermann & Hamamura, 2022). Also, their conclusions about what influences the M-FLE sometimes differ. For example, L1-L2 similarity affected the M-FLE in Circi et al. (2021), but not in Stankovic et al. (2022), and proficiency predicted the M-FLE in Stankovic et al. (2022), but not in Circi et al. (2021) or Del Maschio et al. (2022). In addition, none of these meta-analyses has considered children. Given the long tradition of understanding moral reasoning as a developmental process (e.g., Piaget, 1932), learning about the M-FLE in children will add knowledge about the mechanisms responsible for this effect.
Moral reasoning in children
Moral reasoning in children has been investigated only using participants’ L1. Studies have used different protocols and children of different ages, but, in summary, they tend to find that children are utilitarian. Pellizzoni et al. (2010, studies 1 and 2) presented the “footbridge” and a variant of the “switch” versions of the trolley dilemma to children aged 3-5 and adults. Children, like adults, were more utilitarian in the less aversive (i.e., “switch”) scenario (but see Stey, 2014), as in adult studies manipulating aversiveness (e.g., Costa et al., 2014). With similarly aged children (3-6), Dworazik, Kärtner, Lange, and Köster (2019) explored how they (and their mothers) responded to different versions of the trolley dilemma, finding a preference for utilitarian responses. Dworazik et al. (2019) argue that their results support the Universal Moral Grammar Theory (UMG; Cushman, Young & Hauser, 2006), as human morality would be innate and judgements would be built within the family. However, children were more utilitarian than adults in the footbridge version, contrary to Pellizzoni et al. (2010), implying that there would be nuances regarding the UMG, as some principles could apply differently to young children and adults.
Bucciarelli (2015) presented moral dilemmas to children aged 9-10, adolescents aged 13-14, and adults. Manipulating aversiveness and utilitarianism (i.e., anti-utilitarian dilemmas implied killing five to save one, and pro-utilitarian dilemmas killing one to save five), she found that children were more utilitarian than adults (replicated by Daniele & Bucciarelli, 2016). Bucciarelli (2015) and Daniele and Bucciarelli (2016) state that their results agree with the “mental model theory”. This theory states that moral judgements would rely less on emotion than on reasoning, reflecting cognitive capacities which advance with age, thus accounting for the differences between children and adults.
Finally, in relation to sacrifice, Weller and Hansen Lagattuta (2013) studied how race influences prosocial moral judgements and attributions of emotion in children aged 5-13. They identified that the satisfaction experienced by helping others, even if sacrificing one's own desires, emerges at the age of seven. Therefore, considering the aforementioned evidence, and that dilemmas will be presented in the L2, we chose children aged 9-12 as the target group to explore the M-FLE.
The current study
Our main aim is to explore the M-FLE in children. Considering research on 1) M-FLE (e.g., Costa et al., 2014; Romero-Rivas et al., 2022) and 2) moral judgements in children (Bucciarelli, 2015; Daniele & Bucciarelli, 2016; Pellizzoni et al., 2010; Weller & Hansen Lagattuta, 2013), we expect that, when using their L2, children will be more: a) utilitarian; and b) willing to self-sacrifice.
We also studied whether aversiveness affects children's decision-making. That would be expected following Pellizzoni et al. (2010) or Costa et al. (2014, adult study), but data from Bucciarelli (2015) or Dworazik et al. (2019) would predict the opposite, as children were utilitarian even in highly aversive scenarios (unlike most adults). Also, we included an anti-utilitarian dilemma. Bucciarelli (2015, experiment 2) presented “pro” and “anti-utilitarian” versions of their dilemmas to explore whether children are biased to act, which they were not. We expect a similar outcome.
Finally, we cannot make solid predictions about grade or age. First, past literature did not explicitly assess grade. However, we included this factor as an exploratory measure of moral development besides age, as each grade combines several ages (i.e., grade 4: 9-10; grade 5: 10-11; grade 6: 11-12), and different social interactions could occur at each grade level. Second, Pellizzoni et al. (2010) compared children aged 3-5 and adults, finding no differences, whereas Bucciarelli (2015) and Daniele and Bucciarelli (2016) found children to be more utilitarian. We did not include adults, but we will analyse whether age is a factor modulating decision-making during childhood. As for gender, Pellizzoni et al. (2010) did not find differences, and although Bucciarelli (2015) only considered female data, their results were replicated by Daniele and Bucciarelli (2016) including male and female participants. Therefore, no effect of gender is expected.
Methods
Participants
Eighty-five children aged 9-12 (12 girls and six boys aged 9, 11 girls and 22 boys aged 10, 10 girls and 13 boys aged 11, and five girls and six boys aged 12) participated. They belonged to six classes distributed in three grades in a public bilingual school in Madrid: 4th (28 children), 5th (29 children) and 6th (28 children) year of Primary Education (4th: mean age = 9.35, SD = 0.48, 16 girls, 12 boys; 5th: mean age = 10.20, SD = 0.41, eight girls, 21 boys; 6th: mean age = 11.39, SD = 0.49, 14 girls, 14 boys).
All were native speakers of Spanish and L2 speakers of English (in Spain, a foreign language is learned from the first year of compulsory education; Ley Orgánica de Modificación de la LOE or LOMLOE, 2020), and were randomly allocated to the native (L1) or foreign (L2) group. English proficiency was assessed through the average mark of each class in the subject “Foreign language: English”, which did not significantly differ between groups (average mark out of 10, with SDs in parentheses: 4th year, L1: 8.29 (1.20), L2: 8.71 (1.07); 5th year, L1: 7.60 (1.06), L2: 7.72 (1.05); 6th year, L1: 8.29 (1.20), L2: 8.40 (1.06)). All children gave informed consent and participated voluntarily.
Materials
Seven dilemmas were selected from and based on Bucciarelli (2015, experiment 1), with some modifications (see Table 3). Language was simplified and the gender of the actor and victim(s) was omitted. There were several response options – namely, a) “do nothing”; b) “push person/pull lever”; c) “self-sacrifice” (for self-sacrifice trials only).
The dilemmas varied in a) utilitarianism: pro-utilitarian (sacrificing one to save five; dilemmas 1, 2, 4, 5, 6, 7) or anti-utilitarian (sacrificing five to save one, yourself; dilemma 3); b) aversiveness: highly aversive (physical contact with a person, e.g., “push the person”; dilemmas 1, 4, 5, 6, 7) or less aversive (physical contact with an object, e.g., “pull the switch”; dilemmas 2, 3); and c) self-sacrifice: sacrificing themselves to save five people instead of sacrificing another person (dilemmas 1, 4, 6). The “trolley” dilemma was used for dilemmas 1-3, the “boat” dilemma for 4-5 and the “bomb” dilemma for 6-7. They were translated from English to Spanish and back-translated for comparability (Brislin, 1970). Participants were given the option to justify their responses (see Table 4).
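The design described above can be written down as a small lookup structure, which makes it easier to see which dilemma carries which manipulation. The following is only an illustrative sketch: the class name, field names and helper function are hypothetical and are not taken from the study materials; only the dilemma numbers, scenario labels and response options come from the text.

```python
from dataclasses import dataclass

# Dilemma numbers as listed in the Materials section.
PRO_UTILITARIAN = {1, 2, 4, 5, 6, 7}   # dilemma 3 is the anti-utilitarian one
HIGHLY_AVERSIVE = {1, 4, 5, 6, 7}      # dilemmas 2 and 3 are less aversive
SELF_SACRIFICE = {1, 4, 6}             # dilemmas offering option c)
SCENARIOS = {"trolley": range(1, 4), "boat": range(4, 6), "bomb": range(6, 8)}


@dataclass(frozen=True)
class Dilemma:
    """Hypothetical record describing one dilemma of the design."""
    number: int
    scenario: str
    pro_utilitarian: bool
    highly_aversive: bool
    self_sacrifice_option: bool


def build_design():
    """Return the seven dilemmas, ordered by dilemma number."""
    design = [
        Dilemma(
            number=n,
            scenario=scenario,
            pro_utilitarian=n in PRO_UTILITARIAN,
            highly_aversive=n in HIGHLY_AVERSIVE,
            self_sacrifice_option=n in SELF_SACRIFICE,
        )
        for scenario, numbers in SCENARIOS.items()
        for n in numbers
    ]
    return sorted(design, key=lambda d: d.number)


def response_options(dilemma):
    """Response options a) and b) are always offered; c) only on self-sacrifice trials."""
    options = ["do nothing", "push person/pull lever"]
    if dilemma.self_sacrifice_option:
        options.append("self-sacrifice")
    return options


DESIGN = build_design()

if __name__ == "__main__":
    for d in DESIGN:
        print(d.number, d.scenario, response_options(d))
```

Laid out this way, dilemma 3 stands out as the only anti-utilitarian item and, together with dilemma 2, as one of the two less aversive items.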
Procedure
Data were collected during school hours. At the beginning of the session, instructions were given out loud in the language corresponding to the group (L1 or L2). Dilemmas were randomly presented to each participant. The school board gave its consent and participants were informed that their participation was voluntary. All gave oral consent, none refused to take part, and all participants answered all dilemmas.
Results
First, we carried out Pearson's chi-square tests (χ²) to independently analyse whether responses to each moral dilemma varied according to language (L1 vs. L2), gender (female vs. male), grade (4th, 5th, 6th), or age (9, 10, 11, 12). Language was the only relevant factor, predicting participants’ responses in all dilemmas but #2 (V > .25 and < .61, indicating moderate to high associations), so children were more utilitarian when responding in their L2 vs. L1 in both highly and less aversive dilemmas (see Tables 1 and 2). Also, participants chose more often to sacrifice themselves in dilemmas 1, 4, 6 (those having a self-sacrifice option) and 3 (although not having a “self-sacrifice” option, “do nothing” implied that the participant would die) in their L2 vs. L1; but the opposite pattern emerged in dilemmas 5 and 7 (these did not have a “self-sacrifice” option, but the most deontological option [a)] implied both self-sacrifice and killing four to save one). Regarding pro and anti-utilitarianism, a higher proportion of utilitarian responses was observed for most dilemmas (i.e., significant language effects for all dilemmas but #2) when using the participants’ L2, even in the anti-utilitarian dilemma (#3).
V = Cramer's V (effect size); *** = p < .001; ** = p < .01; * = p < .05.
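As a reminder of how the two quantities reported for each dilemma are obtained, the sketch below computes Pearson's chi-square statistic and Cramér's V from a contingency table of language group by response. The counts in the example are invented for illustration only (the per-dilemma counts belong to Tables 1 and 2 and are not reproduced here), the function names are hypothetical, and no continuity correction is applied, which may or may not match the original analysis.

```python
import math


def chi_square(table):
    """Pearson's chi-square for an r x c table of observed counts.

    Returns (statistic, degrees of freedom, total N).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, n


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (N * (min(rows, cols) - 1)))."""
    statistic, _, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(statistic / (n * k))


if __name__ == "__main__":
    # INVENTED counts, for illustration only.
    # Rows: L1 group, L2 group; columns: "do nothing", "push person/pull lever".
    example = [[30, 12], [15, 28]]
    stat, df, n = chi_square(example)
    print(f"chi2({df}) = {stat:.2f}, N = {n}, V = {cramers_v(example):.2f}")
```

For a 2 x 2 table V reduces to the absolute value of the phi coefficient, so it can be read as the strength of the association between language and response on a 0 to 1 scale.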
Then, we used logit linear mixed models to analyse utilitarian (vs. deontological/anti-utilitarian) and self-sacrifice (vs. other) responses, including language, gender, grade and age, and the two-way interactions of these factors, as fixed effects, and participant and dilemma as random-effects grouping factors. The specified random-effects parameters included random intercepts for participant and dilemma; random slopes for dilemma were not included because they caused numerical problems with the maximum-likelihood estimate. Regarding utilitarian responses, language predicted participants’ responses, χ²(1) = 16.96, p < .001; no other variables or interactions were significant (χ² values < 4.11, p values > .17). As for self-sacrifice responses, language again predicted participants’ responses, χ²(1) = 14.87, p < .001; no other variables or interactions were significant (χ² values < 4.50, p values > .16) (Figure 1).
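The p values attached to the two language effects can be checked directly from the reported statistics, because the upper tail of a chi-square distribution with one degree of freedom has a closed form, p = erfc(sqrt(x / 2)). The sketch below only re-derives p from the reported test statistics; it does not refit the mixed models, and the function name is hypothetical.

```python
import math


def chi2_upper_tail_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-square with df = 1, so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Test statistics for the language effect as reported in the Results.
    reported = {"utilitarian": 16.96, "self-sacrifice": 14.87}
    for label, statistic in reported.items():
        p = chi2_upper_tail_df1(statistic)
        print(f"{label}: chi2(1) = {statistic:.2f}, p = {p:.1e}, p < .001: {p < 0.001}")
```

Both statistics fall well beyond the 1-df critical value for p = .001, in line with the p < .001 reported for the language effect in each model.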
Conclusions
This study explored the M-FLE in children for the first time. Children aged 9-12 responded either in their L1 or L2 to seven dilemmas, varying in utilitarianism, aversiveness and self-sacrifice.
Our first prediction was that participants would make more utilitarian judgements and be more willing to self-sacrifice when using their L2 vs. L1. Our results agreed with both predictions, supporting that the M-FLE applies to children. Participants were more utilitarian when using their L2 vs. L1 (e.g., Costa et al., 2014; Geipel et al., 2015; Romero-Rivas et al., 2022), regardless of how aversive the scenario was (agreeing, e.g., with Bucciarelli, 2015 or Daniele & Bucciarelli, 2016, but disagreeing with Pellizzoni et al., 2010). The pattern observed in the anti-utilitarian dilemma agrees with Bucciarelli's (2015) conclusion that children are not simply biased to act. Possibly, participants would be more willing to be utilitarian when using their L2 because they are less emotionally activated by that action (e.g., guilt, sadness), avoiding a passive, and thus deontological (i.e., “do nothing”), answer (e.g., Caldwell-Harris, 2014). Congruently, the higher perception of emotionality and aversion in the L1 group would prompt deontological decisions to avoid the impact of performing the action implied by utilitarian judgements. Regarding self-sacrifice, we extend the results of Romero-Rivas et al. (2022) to children. Following Bialek et al. (2019), processing dilemmas in L2 would lead to more emotional distance (e.g., Costa et al., 2014; Geipel et al., 2015), and to a diminished sensitivity to the costs of a particular action.
Therefore, participants could interpret the scenario globally, driven by the benefit of the action (i.e., one person dies vs. five) and not by its consequence (the participant's death). Also, the higher proportion of self-sacrifice in L2 could be caused by a reduction in the emotional responses related to the self (Romero-Rivas et al., 2022).
We did not have specific predictions for grade or age, and we found neither significant effects nor interactions involving them. Although no former research has used grade, we included it for exploratory purposes as it provides an additional measure of moral development besides age. As for age, the chosen ages were 9-12. Following previous evidence (Bucciarelli, 2015; Daniele & Bucciarelli, 2016), children aged 9-10 behave differently to adults, but adolescents do not. This allowed us to investigate moral decisions in children who, additionally, are mature enough to appreciate the gain of self-sacrifice, which emerges around 7 years of age (Weller & Hansen Lagattuta, 2013). We did not find age differences within our sample; it is plausible that our age range is not wide enough to reveal any (contrary to Weller & Hansen Lagattuta, 2013) or that moral development is relatively stable at those ages. Our results agree with Pellizzoni et al. (2010) or Dworazik et al. (2019) and partially disagree with Bucciarelli (2015) or Daniele and Bucciarelli (2016); however, not having an adult group limits our interpretation. Also, future studies comparing Primary school children with adults would allow us to investigate whether the mental model theory holds in an L2. Finally, there was no effect of gender, congruent with Pellizzoni et al. (2010) and Daniele and Bucciarelli (2016).
Our study makes an initial impactful contribution to a field where many questions remain unanswered. Most literature has tested adults (to our knowledge, only van Hugten & van Witteloostuijn, 2018, investigated the FLE in adolescents, but they explored the “self-serving bias”). For instance, do L1-L2 similarity, proficiency and language dominance influence the M-FLE in children as they do in adults (e.g., Circi et al., 2021)? How does culture (e.g., individualistic vs. collectivist) influence making utilitarian or deontological decisions (e.g., Yi & Park, 2003; Costa et al., 2014; Gold, Colman & Pulford, 2014) when the individual is undergoing personal, cultural, emotional and moral development, as children are? This could be particularly relevant as our study is limited to a socio-economically and culturally homogeneous population. Another potential limitation of our study is not considering a range of additional information related to the participants, such as cognitive, socio-economic or personality measures, which alongside some sociolinguistic information (e.g., does the child attend private English lessons?) would expand the description of the M-FLE in children.
To conclude, our work has relevant educational implications. We showed that children support the common good when using their L2. Interestingly, a recent adult study (Rodríguez-Cuadrado & Romero-Rivas, 2021) found no FLE on altruistic and empathic behaviours, so using the L2 does not reduce empathy (nor did empathy predict responses to moral dilemmas). Thus, the evidence supports using the L2 to work on moral development, which should not affect the development of altruism or empathy. Upright (2002) proposed the use of moral dilemmas in the classroom to enhance empathy. Other studies found them to benefit reading comprehension (Clare, Gallimore & Patthey-Chavez, 1996). These results are particularly relevant given the rise of bilingual education (Mañas Antón, 2019), where using an L2 could be a good strategy to illustrate how to achieve the greater good. Further research will allow us to potentially design research-based programmes and strategies using the L2 to favour ethics and moral education.
Acknowledgements
The authors would like to thank the participating school, children, and their families.
Competing interests
The authors declare none.
Data availability
The data that support the findings of this study are openly available in OSF at https://osf.io/pkjxf/?view_only=4175cc6e693c4487a464b6643579d88a
Appendix
Responses are provided in English for the sake of clarity, although participants performing the task in their L1 responded in Spanish and those performing the task in their L2 responded in English. Please note that justifications were optional: although all participants completed all dilemmas, not everyone justified their responses, and some participants justified their responses to some dilemmas but not others.