1. Introduction
Bilingual and monolingual children differ in their everyday linguistic experience. This variability shapes their language learning skills (De Houwer, 2009) and their sensitivity and adherence to different aspects of the external linguistic input: for instance, bilingual children are more willing to accept alternative labels for known objects than are monolinguals (e.g., Byers-Heinlein & Werker, 2009; Davidson, Jergovic, Imami & Theodos, 1997). Bilingual and monolingual children also differ in terms of adherence to morphosyntactic word form cues (Yoshida, Tran, Benitez & Kuwabara, 2011) and the integration of multiple cues (Yow & Markman, 2015). Of relevance to the present study, recent evidence suggests better pragmatic interpretational skills in bilingual than in monolingual children.
1.1 Pragmatic skills in bilingual children
Bilingual children show an enhanced understanding of nonverbal referential deixis through pointing or eye gaze (Brojde, Ahmed & Colunga, 2012; Yow, 2015; Yow & Markman, 2011a). Other pragmatic abilities in which bilingual children outperform their monolingual peers include spatial perspective-taking (Greenberg, Bellana & Bialystok, 2013; see also Yow & Markman, 2015), adapting communicative content to a listener's blindness (Genesee, Tucker & Lambert, 1975) and identifying a speaker's emotion through tone of voice (Yow & Markman, 2011b). Additionally, bilingual children pay more attention to the socio-pragmatic context (Rosenblum & Pinker, 1983), conversational maxims (Siegal, Surian, Matsuo, Geraci, Iozzi, Okumura & Itakura, 2010), a person's belief when naming objects (Healey & Skarabela, 2008), and a person's possible knowledge in a certain situation (Diesendruck, 2005). Such sophisticated pragmatic skills likely enhance bilingual children's reasoning about the mental states of others, which in turn positively impacts their Theory of Mind (Farhadian, Abdullah, Mansor, Redzuan, Gazanizad & Vijay, 2010; Goetz, 2003; Kovács, 2009).
It has been suggested that this pragmatic advantage in bilingual children originates from greater communicative challenges (Yow & Markman, 2011a): from early on, bilingually developing children have to recognize that people speak different languages and that another person's knowledge of languages may differ from their own (De Houwer, 1983; Goetz, 2003; Saunders, 1988). They successfully adapt their language choice and their verbal behavior to others' linguistic skills (Tare & Gelman, 2010) and they recognize others' bilingual versus monolingual status (De Houwer, 1983; Pitts, Onishi & Vouloumanos, 2015). They modulate their language learning strategies accordingly (Atagi, Goldenberg & Sandhofer, 2016; Henderson & Scott, 2015), while taking into account their communicative history of credibility with their interlocutor (Hung, Patrycia & Yow, 2015).
In sum, early bilingualism is assumed to increase sensitivity to the mental states of others and to various dimensions of a conversational situation, including an increased understanding of others' communicative intentions. To elucidate this general ability in a specific word learning context, we investigate children's understanding of communicative intentions expressed through nonverbal pragmatic gestures indicating a novel word's reference.
1.2 Use of pragmatic cues for word learning in bilingual and monolingual children
Children can identify a novel word's intended referent with the help of deictic information targeting whole objects or their parts as expressed through a speaker's gaze (Baron-Cohen, Baldwin & Crowson, 1997; Graham, Nilsen, Collins & Olineck, 2010), pointing (Kalagher & Yu, 2006; Paulus & Fikkert, 2014), retracing contours (Hansen & Markman, 2009), or referent-related actions (Kobayashi, 1997, 1998). While a solitary deictic cue is sufficient for task performance (Baron-Cohen et al., 1997; Graham et al., 2010; Kalagher & Yu, 2006), additional non-pragmatic object-inherent cues such as object familiarity facilitate the response (Graham et al., 2010). Moreover, the type (e.g., gaze versus pointing) and the number of deictic cues influence the ease with which they are interpreted (Grassmann & Tomasello, 2010; Jaswal, 2010). Interestingly, bilingual children interpret deictic cues more efficiently than their monolingual peers (Brojde et al., 2012; Yow, 2015; Yow & Markman, 2011a). This effect varies in magnitude depending on age and on how complex the pragmatic cues are to interpret in a given context.
When searching for a hidden toy, bilingual children are more sensitive to experimenter gaze than monolinguals (Yow & Markman, 2011a): starting at age two and continuing through ages three to four, bilinguals were successful in following gaze when the experimenter was sitting in a position contradicting the toy's position. In contrast, monolinguals only managed to use this cue at age five. In two less challenging tasks, i.e., when the experimenter was sitting in a neutral position or used a more salient deictic cue (pointing), bilingual and monolingual three- to four-year-olds performed equally well.
During novel-word learning, bilingual children attach more importance to non-verbal deictic cues than monolinguals (Brojde et al., 2012). Two-and-a-half-year-old bilinguals and monolinguals learned novel words in four different conditions: congruent or incongruent combinations of a pragmatic cue (gaze) and an object-inherent cue (object similarity) and the use of each of these cues in isolation. Group differences emerged only in the incongruent condition: monolinguals disregarded speaker gaze in favor of the object-related cue, associating novel words with similarly shaped objects. This shape bias (Graham & Diesendruck, 2010; Landau, Smith & Jones, 1988) was not found for bilinguals, who attended more to speaker gaze than monolinguals.
In the same vein, Yow (2015) found that four-year-old bilinguals but not monolinguals showed adult-like patterns in the interpretation of deictic gestures to resolve ambiguous pronouns (e.g., she in “Miss Owl is going out with Miss Ducky. She wants the ball”, p. 1398). In line with the “advantage of first-mention” (Gernsbacher & Hargreaves, 1988, p. 699), adults interpreted the entity that was mentioned first (Miss Owl) as the referent for the ambiguous pronoun (she) when no deictic gestures were present. When a referential gesture was spatially co-localized with both the entity mentioned second and the ambiguous pronoun, inducing a conflict with the order-of-mention cue, adults and bilingual but not monolingual children chose the first referent less often than in the other condition. Yow (2015) concludes that adults and bilingual children relied on the gesture more than monolingual children.
In sum, studies suggest more refined pragmatic skills for interpreting deictic gestures in bilingual than in monolingual children. Notably, this advantage mostly applies to challenging contexts, in which pragmatic cues are rather weak or in conflict with other learning cues.
1.3 Adjective learning and its facilitation through pragmatic cues
For young children, the learning of novel adjectives expressing an entity's properties is a particularly challenging word learning task: young children struggle with remembering the entity's features (Perry, Axelsson & Horst, 2015) and with the long-term retention of property labels such as color, shape, and texture words (Holland, Simpson & Riggs, 2015). They often misinterpret adjectives as nouns (Landau, Smith & Jones, 1992; Taylor & Gelman, 1988). This is assumed to relate to the widely documented word learning principle that novel words refer to whole objects rather than to their properties (cf. the Whole Object Constraint; Markman, 1994).
The challenges of adjective learning result from several characteristics of this word class, including their relatively low frequency (Kauschke & Klann-Delius, 2007; Sandhofer, Smith & Luo, 2000), their frequent syntactic ambiguity in the input (Sandhofer & Smith, 2007), and their semantic characteristics, which may involve antonym relations (Clark, 1973; Eilers, Oller & Ellington, 1974) or relational meanings that depend on normative standards of the categories described (e.g., the different extension of little used with elephants versus mice, Ebeling & Gelman, 1988; for a review, see Tribushinina, 2008). The demanding syntactic processing of adjectives in attributive constructions (Ninio, 2004; Fernald, Thorpe & Marchman, 2010; but see Tribushinina & Mak, 2016) and the degree of perceptual salience of the property and adjective labels may further affect rate of acquisition (Smith, Jones & Landau, 1992). Additionally, acquisition demands for adjectives are influenced by the degree of form complexity of the object they relate to (Sandhofer & Smith, 2004), as well as by the type of objects with the property of interest (living vs. non-living; Hall, 1994), their number (Hall, 1996), and membership in shared or different basic level categories (Klibanoff & Waxman, 2000; Waxman & Klibanoff, 2000; Waxman & Markow, 1998).
Pragmatic cues such as gestures highlighting an object's property represent one type of input information that can help children with the difficult task of inferring the meaning of novel adjectives. Adults may emphasize reference to an object's texture through a descriptive hand gesture touching the object's surface in a specific manner. As an example, Zukow (1990) points out that the “topography of a rough texture, such as corduroy, is traversed with a bouncing fingertip” (p. 714). Similarly, adult actions like rolling or squeezing an object can help children to infer whether a novel word refers to the object's shape or constituting material (Kobayashi, 1997). O'Neill, Topolovec and Stern-Cavalcante (2002) found that three-year-old monolinguals exposed to a descriptive gesture highlighting an object's property extended the novel word more often to object properties than those exposed to a pointing gesture. Thus, the manual descriptive gesture helped children to infer the intended property reference. The results, however, are not fully conclusive since O'Neill et al.’s (2002) design included two additional cues aiding adjective learning: (i) Novel words were embedded in an unambiguous syntactic adjective context in both attributive and predicative positions. This facilitated the task, since children learn that words that are syntactically marked as adjectives are likely to refer to object properties (e.g., Landau et al., 1992); (ii) Objects were familiar to children, supporting an adjective interpretation (e.g., Hall, 1996; Hall, Waxman & Hurwitz, 1993) due to the Mutual Exclusivity Constraint (MEC): “Words are mutually exclusive – each object will have one and only one label” (Markman, 1993, p. 161).
Children tend to reject a novel word as referring to an object whose name they already know and will instead search for an alternative referent. The object's unknown property thus becomes a good candidate (Markman & Wachtel, 1988).
Hall, Williams and Bélanger (2010) used descriptive gestures similar to the ones used by O'Neill et al. (2002) to investigate children's adjective learning, but carefully controlled for additional cues. Monolingual four-year-olds correctly interpreted a descriptive gesture indicating a property meaning when a syntactic cue was simultaneously presented. At age four (but not at age three) a descriptive gesture in combination with a syntactic cue was sufficient for inferring the correct meaning.
Descriptive gestures, then, can help children in learning novel adjectives. So far, this has only been studied in monolinguals. One study investigated bilingual children's adjective learning but did not consider pragmatic learning cues: Yoshida et al. (2011) report that three-year-old bilinguals outperformed monolingual peers in a novel adjective learning task. The task used familiar objects and the novel word was morphosyntactically marked as an adjective. One might interpret this result as showing that bilinguals are more advanced in the processing of syntactic structures than monolinguals (see also Davidson, Raschke & Pervez, 2010). Based on additional results of non-verbal tasks testing executive control, however, Yoshida et al. (2011) favor another explanation: they interpret their findings as showing that bilingual children more efficiently inhibited the default interpretation that novel words refer to an object as a whole. More efficient suppression of this whole object bias may have facilitated attending to the morphosyntactic cue hinting at the object's property. A central limitation of Yoshida et al.’s (2011) study, however, is that the testing language was not controlled for (children were tested in either English, Spanish, or Vietnamese). Thus, language-specific differences in morphosyntactic adjective marking could have influenced the results. Furthermore, bilinguals were tested in just a single language.
The present study contributes to the work on novel adjective learning. It is the first to compare bilingual and monolingual children's use of descriptive gestures as cues for learning novel adjectives. Since bilingual children are particularly geared towards interpreting pragmatic gestures, we expected a heightened pragmatic sensitivity for bilinguals in the learning of novel adjectives based on descriptive gestures. Moreover, we intended to provide a first glimpse into how the child's brain affords the use of pragmatic cues during adjective acquisition by recording a measure of cortical brain activation during the experiment.
1.4 Neuronal processing of communicative intentions
Neuroimaging has delineated a supramodal neural network affording the identification of the communicative intentions of others irrespective of whether these are expressed through extralinguistic (e.g., gestures) or linguistic (e.g., written sentences) cues: the network comprises the superior parietal cortex (Precuneus), the bilateral posterior superior temporal sulcus (STS) extending to the temporal parietal junction (TPJ), and the medial prefrontal cortex (mPFC) (Enrici, Adenzato, Cappa, Bara & Tettamanti, 2011). This supramodal network partially overlaps with cerebral regions recruited for Theory of Mind (ToM) tasks: in particular, the TPJ and/or the mPFC have been suggested to be critical for the development of ToM (Bowman, Kovelman, Hu & Wellman, 2015; Sabbagh, Bowman, Evraire & Ito, 2009; Sommer, Meinhardt, Eichenmüller, Sodian, Döhnel & Hajak, 2010). Interestingly, recruitment of the ToM-related network is modulated by bilingualism and its age of onset, in addition to language- and culture-specific characteristics (Kobayashi, Glover & Temple, 2006, 2007, 2008). Most relevant to the present study, English–Japanese early bilingual and American English monolingual ten-year-olds showed different activation patterns in a widely distributed network in ToM-related false-belief tasks (Kobayashi et al., 2007). These differences included the left STS/temporal pole (TP), the left inferior temporal gyrus (ITG), the left inferior frontal gyrus (IFG), the right TPJ, and the right superior temporal gyrus (STG)/TP region.
Of particular note for our present study, the posterior STS is consistently recruited when studies use nonverbal social cues for mental state attribution (Doré, Zerubavel & Ochsner, 2015). Yang, Rosenblau, Keifer and Pelphrey (2015) confirm the prominent role of the posterior STS in social information processing by highlighting its functional connectivity to the neural systems of social perception, action observation, and ToM. Gweon and Saxe (2013) also propose the prominent role of the posterior STS in the processing of intentional human actions. In addition, the STS has been identified as a hub for the perception of biological motion in both adults and children (Allison, Puce & McCarthy, 2000; Carter & Pelphrey, 2006; Mosconi, Mack, McCarthy & Pelphrey, 2005). A meta-analysis of 31 studies confirmed the involvement of the posterior STS in the visual perception of hand movements across several types of gestures (Yang, Andric & Mathew, 2015). This is of special relevance to our study as we implemented a nonverbal social cue expressed through biological motion (i.e., a hand gesture) to convey a speaker's intention to refer to a property.
To sum up, the posterior STS, the TPJ, and the prefrontal cortex have been identified as key players for the understanding of a speaker's communicative intention, with a prominent role of the posterior STS for gesture processing. The current work focuses on the understudied impact of bilingualism in this domain. Given that bilingual children behaviorally outperform their monolingual peers in the interpretation of pragmatic cues, we expect to find activation differences between bilinguals and monolinguals during the processing of co-speech gestures referencing a novel adjective's meaning.
1.5 Overview of the present study
We collected behavioral data from five-year-old bilinguals and monolinguals performing a novel adjective learning task. In this task, children were familiarized with two identical unknown pseudo-objects with unknown surfaces while they heard a novel word that could structurally be either a noun or a nominalized adjective. A descriptive hand gesture touching the surface of one of the objects in a wave-like movement highlighted the novel word's property reference, supporting an adjectival interpretation. As detailed below, children performed a forced choice task to test for their interpretation of the novel word: they could choose between (i) another object with the same surface property as the familiarized objects but with a different shape, reflecting an adjective interpretation of the novel word, or (ii) a competitor object with the same shape but a different surface, reflecting a noun interpretation of the novel word.
In order to minimize the potential influence of confounding variables, we controlled the stimulus material for perceptual factors (see section 2.2) and selected a homogeneous sample of bilingual children: all bilinguals had been regularly exposed to two languages from birth. The monolinguals had been exposed to a single language from birth. The testing languages, i.e., German and Spanish, were held constant for the bilinguals. All monolinguals were tested in German.
For a subgroup of participants, we additionally recorded neurophysiological data during the behavioral task. We used functional near-infrared spectroscopy (fNIRS; see Obrig & Villringer, 2003), a non-invasive neuroimaging method that in the past decade has increasingly been used for neurolinguistic research with infants and children (e.g., Bortfeld, Fava & Boas, 2009; Lloyd-Fox, Blasi & Elwell, 2010; Pena, Maki, Kovačić, Dehaene-Lambertz, Koizumi, Bouquet & Mehler, 2003; Rossi, Telkemeyer, Wartenburger & Obrig, 2012), including bilingual children (e.g., Petitto, Berens, Kovelman, Dubins, Jasinska & Shalinsky, 2012). Besides its relative ease of use, an important advantage of fNIRS in language research is the lack of instrumental noise, which is a major limitation of techniques based on magnetic resonance imaging (MRI). This advantage comes at the cost, though, of substantially lower spatial resolution (in the range of cm). Also, the method is blind to subcortical structures due to the pathlength of light in biological tissue (Obrig & Villringer, 2003).
2. Methods (Footnote 1)
2.1 Participants
Sixty children of preschool age living in Germany participated. Parents reported no abnormalities in children's language or general development. Thirty-two children were raised with German and Spanish (bilinguals; mean age = 59.81 months, SD = 6.05; range: 4;3–6;0 years; 16 females), and 28 children were raised with just German (monolinguals; mean age = 60.54 months, SD = 3.06; range: 4;9–5;11 years; 15 females). The two groups were similar in age (t = -0.95, ns) and sex (χ² = 0.02, ns). The bilinguals were recruited by advertising in different Spanish-speaking institutions and through direct contact with bilingual child-care centers. The monolinguals were recruited through a database at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig. Information about the children's language input situation was assessed through parental questionnaires. All parents of the bilingual children reported that their child had been regularly exposed to at least two languages from birth (Bilingual First Language Acquisition, BFLA; De Houwer, 1990; Meisel, 1989). Most BFLA children had been exposed to German and Spanish input through their parents (n = 25). Two children heard just Spanish from their parents and German from older siblings. Two children had been exposed to three languages from birth (German, Spanish, and French/Slovak), while one child learned two languages from birth (Spanish and Galician) and German later in development. For two bilinguals, information about the persons providing German and Spanish input was missing. Further information about the bilingual participants is shown in the appendix. All monolinguals had grown up in German-speaking families without extended contact with any other language (Monolingual First Language Acquisition, MFLA; De Houwer, 2009).
2.2 Behavioral assessment
All children took part in a word learning experiment that measured behavioral responses to the experimental stimuli. We used a short version when the participants were only tested behaviorally (n = 14). These assessments took place in the different child-care centers. In order to obtain reliable neurophysiological data, a longer version was needed for children who additionally took part in the fNIRS recordings (n = 46, see section 2.3). These fNIRS-cum-behavior measurements were carried out at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig.
Bilingual children performed the experiment in both languages on two days within a two-week period. Half of them started with Spanish. Because of illness, two bilinguals completed only one session (1 German; 1 Spanish); two additional bilinguals did not know either German (n = 1) or Spanish (n = 1) sufficiently well to complete the tasks. Thus, there were 30 German and 30 Spanish datasets for 32 bilinguals available for analysis.
Monolinguals were tested only once. All 28 German datasets of the 28 monolinguals were used for analysis.
Experimental setting
The novel word learning task used in this study was embedded in a playful context designed as a computer video game, programmed in Presentation® (Neurobehavioral Systems). Children sat in front of a computer monitor and were asked to help an astronaut buy presents for an alien's party. The ‘presents’ were visually presented as novel pseudo-objects and were orally referred to by a novel word. A non-verbal, deictic pragmatic cue supported two possible interpretations of the novel word (see Procedure).
The experimental design and instructions were identical in German and Spanish, but included different experimental stimuli for each language. The novel pseudo-objects used in the German and Spanish settings were similarly visually salient as far as their form complexity and their surfaces were concerned; likewise, the novel words were similar in their phonological structure (see Stimulus material). Two different actresses (1 German and 1 Spanish female adult) played the German and the Spanish astronaut in the video. Both astronauts' voices were spoken by a single third person, i.e., a German–Spanish bilingual female adult. This ensured identical voice quality for the German and Spanish stimuli. All sound files were monophonically presented via two loudspeakers.
Procedure
The experimental session started with a videotaped instruction featuring the German- or Spanish-speaking astronaut, who talked about an alien's party and asked children to select suitable presents for the aliens to bring to the party. Immediately afterwards the experiment started with two introductory trials, i.e., the focus trials (Table 1). The focus trials were constructed to direct the children's attention once to the novel objects' surface property (property focus, PF), and once to the novel objects' shape-based category (category focus, CF). The rationale for implementing these two types of focus trials is explained in more detail below. The focus trials were followed by a block of four novel word learning trials, i.e., the test trials (TT). The short version of the experiment ended after this first block, with the astronaut thanking the children for their help. When fNIRS was recorded, a second block with two more focus trials and three additional test trials followed. Within each block, half of the children started with focus trials supporting a property interpretation (PF) and the other half with focus trials supporting a category interpretation (CF). The test trials were presented in a randomized order.
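The block structure described above can be illustrated with a short Python sketch. It is not the authors' Presentation® script. The function name `build_session`, the labels `TT1`...`TT7`, and the seed parameter are hypothetical. The sketch also assumes that a child keeps the same focus-trial order in both blocks and that test trials are shuffled within their block, neither of which the text specifies.

```python
import random


def build_session(long_version: bool, start_with_property_focus: bool, seed: int = 0) -> list[str]:
    """Return trial labels in presentation order for one session.

    PF = property focus trial, CF = category focus trial, TT = test trial.
    """
    rng = random.Random(seed)

    # Counterbalancing: half of the children start with PF, the other half with CF.
    focus_order = ["PF", "CF"] if start_with_property_focus else ["CF", "PF"]

    # Block 1 has four test trials; the long (fNIRS) version adds a block with three.
    tests_per_block = [4, 3] if long_version else [4]

    session: list[str] = []
    next_test = 1
    for n_tests in tests_per_block:
        # Assumption: the same focus order is reused in every block.
        session.extend(focus_order)
        tests = [f"TT{next_test + i}" for i in range(n_tests)]
        # Assumption: randomization happens within each block.
        rng.shuffle(tests)
        session.extend(tests)
        next_test += n_tests
    return session


if __name__ == "__main__":
    print(build_session(long_version=False, start_with_property_focus=True))
    print(build_session(long_version=True, start_with_property_focus=False))
```

Under these assumptions the short version yields six trials (two focus, four test) and the long version eleven (four focus, seven test), which matches the counts given in the text.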
Trials consisted of a familiarization phase and a forced choice task. Both focus and test trials started with a similar familiarization phase (Fig. 1, left), i.e., a video in which two exemplars of the same novel pseudo-object, hereafter called Target Objects, rotated around their axes. We used two exemplars of the same target object instead of just one because this has been shown to facilitate novel adjective learning in young children (Hall, 1996). The target objects were non-existing pseudo-objects with an artificial surface (Fig. 1, left). They were presented as rotating 3D-like pictures of different durations (1.5–4.5 s). In between each of five rotations, the two identical target objects were shown without motion for four seconds. During this still presentation, either of two gestures was used: (1) in both the property focus (PF) and the test (TT) trials a filmed human hand stroked the surface of one of the target objects' images, thus indicating its property. This descriptive gesture consisted of a slow wave-like touching with all fingers and the palm lasting for approximately two seconds (Fig. 1, left); (2) in the category focus (CF) trials a pointing gesture also lasting two seconds directed attention to the whole object.
In order to present the descriptive and pointing gestures similarly across trials, the two hand gestures had been filmed in front of a blue screen and were then combined with different target objects. The gesture's visual presentation was temporally linked to the auditory presentation of a novel word. The same novel word was heard four times and was embedded in sentences like the ones listed in Figure 1 (verbal input).
In the forced choice task following each familiarization phase, two dissimilar objects contrasting with the target objects in different features appeared as pictures on the screen (Fig. 1, right). Children were asked to choose one of the objects after a question containing the novel word (Fig. 1, verbal instruction). Children chose the object to be used as a ‘present for the alien’ by touching one of the depicted objects on the touch-screen monitor.
Three different kinds of objects were constructed for the forced choice task: (i) Property Match objects matched the previously introduced target objects in surface property but not in shape, (ii) Category Match objects were identical to the target objects in shape but differed in surface, and (iii) No Match objects differed from the target objects in both property and category. The rationale was that selecting a property match object indicated an adjective interpretation for the novel word, whereas selecting a category match object suggested an interpretation of the novel word as a noun referring to the whole object. No match objects were not possible candidates for the novel word's reference, as they did not match the previously introduced target objects in property, shape, or any other discernible dimension. No match objects were important for constructing the focus trials. The focus trials were intended to prevent an a priori bias towards either a property (→ adjective, PF) or a category (→ noun, CF) understanding of the novel word's reference. At the same time, the focus trials clearly illustrated the two options prior to the test trials. In the experiment's long version, two further focus trials were introduced before the second block of test trials to prevent a perseveration or fixed response strategy that might have been adopted in the first block. For property focus trials (Fig. 1, PF) the pragmatic cue in the familiarization phase highlighted a property interpretation through the stroking gesture before a property match and a no match object were presented in the forced choice task. As the property match object was the only logical object to select, this trial supported a property interpretation of the novel word. For category focus trials a deictic gesture pointing at the whole object highlighted a category interpretation before the corresponding category match object had to be chosen rather than the no match object (Fig. 1, CF). The experimenter corrected the participants if the wrong object (the no match object) was selected in the focus trials.
Just like in the property focus trials, the test trials (Fig. 1, TT) presented a descriptive gesture indicating the novel word's adjective interpretation during the familiarization phase. To investigate whether children are able to interpret this gesture as referring to an object's property instead of linking the novel word to an object's category, the forced choice tasks of the test trials included one property match object and one category match object. Selecting the property match object would indicate the novel word's adjective interpretation following the descriptive gesture, whereas selecting the category match object would show disregard for the gesture's reference. Left and right positions of the property and category match objects were balanced across the forced choice tasks.
At the end of each test trial a happy alien appeared on the screen, irrespective of whether children chose the intended property or the category match object.
Stimulus material
Object pictures consisted of novel object forms and surfaces. They were custom constructed and presented as films using a freeware version of the 3D modeling program SketchUp and the SketchUp Construction Library. Descriptive hand and pointing gestures were added to the SketchUp stimuli using the video editing software Final Cut (Apple Inc.).
Forms and surfaces were carefully controlled for visual salience (surfaces) and shape complexity (forms) since these two factors are important for adjective learning (Smith et al., Reference Smith, Jones and Landau1992; Sandhofer & Smith, Reference Sandhofer and Smith2004). This was done based on a prior assessment performed with 20 German adults who rated the visual salience of surfaces and the complexity of forms on a scale from 1 (low) to 5 (high). Based on these ratings combinations of forms and surfaces were constructed that led to similar composite scores (i.e., more salient surfaces were paired with more complex forms and vice versa). Salience of the surfaces and complexity of the forms were similar for the German and the Spanish versions (surfaces: t = -.295, ns; forms: t = -.213, ns). To additionally attenuate confounds, trials consisted of property and category match objects of largely identical surface and form complexity (see for examples Fig. 1, bottom right). Differences in indices of surface salience and object complexity within these pairings were similar across the German and the Spanish versions (surfaces: t = -.038, ns; forms: t = .866, ns).
All auditorily presented novel words were disyllabic, had the same trochaic stress pattern and met the phonotactic constraints of both languages. They all ended with a schwa for German (e.g., /ᴚe:fə/) and a mid-front vowel /e/ for Spanish (e.g., /nu:je/). This reflects a very common word structure for existing German and Spanish nouns (e.g., German: Tasch-e, “bag”; Spanish: lech-e, “milk”) or nominalized adjectives (e.g., German: der Neu-e, “the new one”; Spanish: la grand-e, “the big one”).
The novel words were embedded in spoken sentences (Fig. 1, verbal input). In both German and Spanish, the sentential contexts used here can host either nominalized adjectives or nouns. The novel words' structure and their grammatical context were hence ambiguous with respect to a property (→ nominalized adjective) versus category (→ noun) interpretation. Within the sentential context novel words were inflected according to German (e.g., Refe-n) or Spanish (e.g., nuye-s) morphology. They were combined with definite articles, and cross-balanced for feminine and masculine gender.Footnote 2
Analysis of behavioral data
Since a whole object or shape bias (i.e., a preference for a category match object) can be expected for novel word learning (Graham & Diesendruck, Reference Graham and Diesendruck2010; Markman, Reference Markman1994; Landau et al., Reference Landau, Smith and Jones1988) we used the deviation from this bias to rate children's performance. Selections of the category match object in the test trials' forced choice task were counted for each child and transformed into proportions (n_category choices / n_all trials). Four test trials per participant (n_all trials = 4) were taken into account (i.e., all test trials of the experiment's short version and the first block of test trials of the experiment's long version). Lower proportions of category choices indicate stronger deviation from the shape bias. This deviation was expected if children followed the pragmatic cue's referential content, i.e., the objects' surface property.
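As an illustration of the scoring rule described above, the following minimal sketch computes the proportion of category choices for one child. The function name, the string labels "category" and "property", and the example responses are hypothetical and serve only to make the calculation concrete.

```python
from typing import Sequence


def category_choice_proportion(choices: Sequence[str]) -> float:
    """Return n_category choices / n_all trials for one child.

    `choices` holds one label per test trial: "category" if the child
    picked the category match object, "property" otherwise.
    """
    if not choices:
        raise ValueError("at least one test trial is required")
    n_category = sum(1 for choice in choices if choice == "category")
    return n_category / len(choices)


# Hypothetical child: category match chosen on three of the four test trials.
example_choices = ["category", "category", "property", "category"]
print(category_choice_proportion(example_choices))  # 0.75
```

A value of 1.0 corresponds to a consistent shape bias; lower values indicate that the child followed the gesture's property reference more often.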
2.3 Assessment of cerebral oxygenation changes (fNIRS)
For fNIRS recording we used a dual-wavelength continuous-wave system with nine light emitters and 14 light detectors (NIRScout, NIRx Medizintechnik GmbH, Berlin/New York) covering bilateral prefrontal, frontal, temporal, and parietal areas based on 26 channels defined by all possible next-neighbor source-detector combinations (Fig. 2a). The source-detector distance was approximately 2.5 cm and probes were mounted using a modified EEG cap (Easy Cap, Herrsching, Germany). Over each hemisphere five regions of interest (ROIs) were defined: prefrontal (preFRO), frontal (FRO), fronto-temporal (froTEMP), temporal (TEMP) and temporo-parietal (tempPAR) regions (see Fig. 2a).
The fNIRS system supplies continuous readings (sampling rate 6.25 Hz) of changes in light attenuation at two wavelengths (760 and 850 nm) to be converted into concentration changes in oxygenated (oxy-Hb) and deoxygenated (deoxy-Hb) hemoglobin based on a modified Beer-Lambert approach (Cope & Delpy, Reference Cope and Delpy1988). According to the principles of neurovascular coupling an increase in oxygenation is expected over an activated brain area (Fox & Raichle, Reference Fox and Raichle1986; Obrig & Villringer, Reference Obrig and Villringer2003). Thus increases in oxy-Hb and decreases in deoxy-Hb can be interpreted as markers of cerebral activation similar to other imaging techniques – especially fMRI – based on the hemodynamic response (e.g., Kleinschmidt, Obrig, Requardt, Merboldt, Dirnagl, Villringer & Frahm, Reference Kleinschmidt, Obrig, Requardt, Merboldt, Dirnagl, Villringer and Frahm1996).
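The conversion from attenuation changes at two wavelengths into oxy-Hb and deoxy-Hb changes can be sketched as the solution of a two-by-two linear system. The example below is only an illustration of the principle and not the authors' processing pipeline: the extinction coefficients are placeholder values chosen for readability, and the differential pathlength factor is an assumed constant. Only the two wavelengths and the approximate source-detector distance are taken from the text.

```python
from typing import Dict, Tuple

# PLACEHOLDER extinction coefficients (oxy-Hb, deoxy-Hb) per wavelength in nm.
# Assumed values for illustration only; they merely respect the known ordering
# (deoxy-Hb absorbs more strongly at 760 nm, oxy-Hb at 850 nm).
EXTINCTION: Dict[int, Tuple[float, float]] = {760: (0.6, 1.5), 850: (1.1, 0.7)}

DISTANCE_CM = 2.5  # approximate source-detector distance reported in the text
DPF = 6.0          # assumed differential pathlength factor (hypothetical)


def attenuation_changes(d_oxy: float, d_deoxy: float) -> Tuple[float, float]:
    """Forward model: attenuation change at 760 and 850 nm."""
    path = DISTANCE_CM * DPF
    return tuple(
        (eps_oxy * d_oxy + eps_deoxy * d_deoxy) * path
        for eps_oxy, eps_deoxy in (EXTINCTION[760], EXTINCTION[850])
    )


def concentration_changes(dA_760: float, dA_850: float) -> Tuple[float, float]:
    """Inverse model: solve the 2x2 system for (oxy-Hb, deoxy-Hb) changes."""
    e1_oxy, e1_deoxy = EXTINCTION[760]
    e2_oxy, e2_deoxy = EXTINCTION[850]
    path = DISTANCE_CM * DPF
    det = e1_oxy * e2_deoxy - e1_deoxy * e2_oxy
    d_oxy = (dA_760 * e2_deoxy - dA_850 * e1_deoxy) / (det * path)
    d_deoxy = (dA_850 * e1_oxy - dA_760 * e2_oxy) / (det * path)
    return d_oxy, d_deoxy


# Round trip with hypothetical changes (arbitrary units): oxy-Hb up, deoxy-Hb down.
dA_760, dA_850 = attenuation_changes(1.0, -0.3)
print(concentration_changes(dA_760, dA_850))
```

Because two chromophores are estimated from two wavelengths, the system is exactly determined; the result depends directly on the coefficients and pathlength assumed.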
fNIRS datasets
For 46 children (BFLA: n = 18; MFLA: n = 28) functional near-infrared spectroscopy (fNIRS) data were continuously recorded during the long version of the experiment.
For the bilingual group of 18 children, 2 German and 2 Spanish fNIRS datasets were missing due to cancelled appointments because of illness (n = 2) or insufficient proficiency in one of the languages (n = 2; see section 2.2). Technical problems led to the exclusion of 10 recorded datasets (3 bilingual children's German data, 2 bilingual children's Spanish data, 5 monolingual children's German data); three further datasets (1 bilingual child's German data, 1 bilingual child's Spanish data, 1 monolingual child's German data) were excluded due to: (i) predominantly poor signal quality as a result of motion or technical artifacts, and/or (ii) low calibration values on at least five channels, and/or (iii) no oxygenation changes in the frequency range of the heart beat in more than ten channels (which indicates insufficient contact between the optodes and the scalp).
After exclusion criteria were applied a total of 47 fNIRS datasets (34 in German, 13 in Spanish) remained for analysis: 12 German and 13 Spanish datasets for bilinguals and 22 German datasets for monolinguals. There were no differences in age (t = .638, ns) for the 12 bilinguals (mean age = 59.75 months; SD = 4.86, range: 55–72 months; 7 females) and 22 monolinguals (mean age = 60.64 months; SD = 3.23, range: 57–71 months; 11 females) who provided the 34 German fNIRS datasets.
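The dataset counts reported above can be retraced with simple bookkeeping. The sketch below uses only the numbers given in the text; the dictionary keys and variable names are hypothetical labels.

```python
# Planned datasets per group and language version (one per child and version).
planned = {"bilingual-German": 18, "bilingual-Spanish": 18, "monolingual-German": 28}

# Exclusions as reported in the text.
missing = {"bilingual-German": 2, "bilingual-Spanish": 2, "monolingual-German": 0}
technical = {"bilingual-German": 3, "bilingual-Spanish": 2, "monolingual-German": 5}
poor_quality = {"bilingual-German": 1, "bilingual-Spanish": 1, "monolingual-German": 1}

# Datasets remaining for analysis in each cell.
remaining = {
    key: planned[key] - missing[key] - technical[key] - poor_quality[key]
    for key in planned
}

print(remaining)
print(sum(remaining.values()))  # 47
```

The tally yields 12 German and 13 Spanish datasets for bilinguals and 22 German datasets for monolinguals, matching the reported total of 47.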
Stimulation period analyzed by fNIRS
When analyzing neuronal correlates of cognitive processes by vascular-based techniques, including fNIRS, the relatively sluggish response peaking approximately 5–7 seconds after stimulus onset must be taken into account. Although event-related designs are feasible, we here focus on the full length of a trial's familiarization phase (Fig. 2b). The rationale is that the use of a pragmatic gesture in conjunction with a novel word is a process extending over the full presentation and is not limited to the relatively brief co-occurrence of the gesture and the novel word. Therefore, fNIRS trials started with the first co-occurrence of the descriptive hand gesture and the novel word. Familiarization lasted 26 seconds, including 3 more co-occurrences of the descriptive gesture and the novel word (Fig. 1). Analysis of the fNIRS recordings therefore included the period from 1 s prior to the first gesture and novel word occurrence until 9 s after its last occurrence. Thus the fNIRS epoch comprised 36 seconds for each stimulus. Nine stimulus periods of 36 s duration were analyzed per participant. These stimuli were part of the 2 property focus (PF) and 7 test trials (TT) of the experiment's long version. We included the property focus trials in the analyses because their familiarization phase with the descriptive hand gesture was identically structured to the one used in test trials.
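The epoch timing can be made explicit with a short sketch. It assumes, as the stated total of 36 s implies, that the 9 s post-stimulus interval is counted from the end of the 26 s familiarization phase. The sampling rate of 6.25 Hz is the one reported for the recording system; function and constant names are hypothetical.

```python
from typing import Tuple

SAMPLING_RATE_HZ = 6.25   # reported sampling rate
PRE_S = 1.0               # analysis starts 1 s before the first co-occurrence
FAMILIARIZATION_S = 26.0  # duration of the familiarization phase
POST_S = 9.0              # analysis ends 9 s after the last co-occurrence

EPOCH_S = PRE_S + FAMILIARIZATION_S + POST_S
N_SAMPLES = round(EPOCH_S * SAMPLING_RATE_HZ)


def epoch_bounds(onset_s: float) -> Tuple[float, float]:
    """Start and end time (s) of the analyzed epoch for one stimulus onset."""
    start = onset_s - PRE_S
    return start, start + EPOCH_S


def epoch_sample_range(onset_s: float) -> Tuple[int, int]:
    """First sample index and exclusive end index of the epoch."""
    first = round((onset_s - PRE_S) * SAMPLING_RATE_HZ)
    return first, first + N_SAMPLES


print(EPOCH_S, N_SAMPLES)       # 36.0 225
print(epoch_bounds(10.0))       # (9.0, 45.0) for a hypothetical onset at 10 s
```

Fixing the number of samples per epoch, rather than rounding start and end separately, keeps all nine epochs of a participant the same length.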
Interstimulus intervals were jittered using films of three different lengths showing happy aliens (M = 7 s; range: 5–9 s). Variations in participants' reaction times in response to the forced choice task led to additional temporal jittering. Such jittering helps to attenuate effects of low-frequency background oscillations that do not directly reflect stimulus evoked neuronal activity (Obrig, Neufang, Wenzel, Kohl, Steinbrink, Einhäupl & Villringer, Reference Obrig, Neufang, Wenzel, Kohl, Steinbrink, Einhäupl and Villringer2000).
Analysis of fNIRS-data
We assessed the concentration changes of oxygenated (oxy-Hb) and deoxygenated hemoglobin (deoxy-Hb) in response to the stimuli (Fig. 2b). There has been substantial debate about which of the two parameters is more robust. Additionally, some authors have postulated deviations from the typical adult response in children (see for a review Gervain, Mehler, Werker, Nelson, Csibra, Lloyd-Fox, Shukla & Aslin, Reference Gervain, Mehler, Werker, Nelson, Csibra, Lloyd–Fox, Shukla and Aslin2011). Therefore we analyzed both parameters, i.e., increases in oxy-Hb and/or decreases in deoxy-Hb, separately (Steinbrink, Villringer, Kempf, Haux, Boden & Obrig, Reference Steinbrink, Villringer, Kempf, Haux, Boden and Obrig2006).
Using a linear interpolation approach, sharp rises or falls suggesting motion artifacts were corrected channel-wise. This procedure, used in a number of previous infant studies (e.g., Obrig, Mock, Stephan, Richter, Vignotto & Rossi, Reference Obrig, Mock, Stephan, Richter, Vignotto and Rossi2016; Rossi, Jürgenson, Hanulíková, Telkemeyer, Wartenburger & Obrig, Reference Rossi, Jürgenson, Hanulíková, Telkemeyer, Wartenburger and Obrig2011; Telkemeyer, Rossi, Koch, Nierhaus, Steinbrink, Poeppel, Obrig & Wartenburger, Reference Telkemeyer, Rossi, Koch, Nierhaus, Steinbrink, Poeppel, Obrig and Wartenburger2009), requires visual inspection of every trial. If a brisk, clearly non-physiological step in the NIRS readings is detected, this step is marked and the data prior to and after the step are fitted, replacing the artifactual step by a linear interpolation. After this procedure the data were low-pass filtered at 0.3 Hz to attenuate heart beat related oxygenation changes. To enhance reliability of the artifact detection another visual inspection of all trials was performed after the filtering. Data were next entered into a general linear model (GLM) yielding β-values for oxy-Hb and deoxy-Hb assuming a hemodynamic response function peaking at 5 seconds (Boynton, Engel & Heeger, Reference Boynton, Engel and Heeger2012). The resulting data (beta-values corresponding to µmolar changes as illustrated in the inset of Figure 2) represent the mean change in the two hemoglobins over the full trial length of 36 s compared to the (high-level) baseline. This ‘baseline’ includes both visual and auditory input and therefore changes in the two hemoglobins are expected also during this high-level baseline. High-level baselines may induce additional noise but are clearly preferable to ‘resting’ periods without any stimulation, especially in infant studies targeting cognitive tasks. Averages were computed for each channel in each participant. 
Based on the variance of the data, values higher than 7.0 µmol/l were classified as outliers and were subsequently excluded (< 0.4% of the data). For further statistical analyses the data of different channels within each ROI were averaged.
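To make the GLM step more concrete, the following sketch estimates a β-value for a single channel by ordinary least squares. It is a simplified illustration under stated assumptions, not the analysis software used in the study: the hemodynamic response is modeled as a single gamma function whose shape parameter is assumed, the regressor is a 36 s boxcar convolved with that function, and the signal is synthetic. Only the sampling rate, the 36 s period and the 5 s peak latency come from the text.

```python
import math
from typing import List, Sequence, Tuple

FS_HZ = 6.25       # reported sampling rate
EPOCH_S = 36.0     # analyzed stimulus period
HRF_PEAK_S = 5.0   # peak latency of the assumed response function


def gamma_hrf(shape: float = 6.0, duration_s: float = 30.0) -> List[float]:
    """Single-gamma response t**(shape-1) * exp(-t/scale), scaled to unit sum.

    Its mode lies at (shape - 1) * scale, so the scale is set to place the
    peak at HRF_PEAK_S. The shape value is an illustrative assumption.
    """
    scale = HRF_PEAK_S / (shape - 1.0)
    times = [i / FS_HZ for i in range(int(duration_s * FS_HZ))]
    values = [t ** (shape - 1.0) * math.exp(-t / scale) for t in times]
    total = sum(values)
    return [v / total for v in values]


def boxcar_regressor(n_samples: int, onset_s: float) -> List[float]:
    """Boxcar of EPOCH_S seconds starting at onset_s, convolved with the HRF."""
    hrf = gamma_hrf()
    regressor = [0.0] * n_samples
    for i in range(n_samples):
        if onset_s <= i / FS_HZ < onset_s + EPOCH_S:
            for j, h in enumerate(hrf):
                if i + j < n_samples:
                    regressor[i + j] += h
    return regressor


def fit_beta(signal: Sequence[float], regressor: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of signal = beta * regressor + intercept."""
    n = len(signal)
    mean_x = sum(regressor) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, signal))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x


# Synthetic, noise-free channel: 120 s of data, one stimulus starting at 10 s.
regressor = boxcar_regressor(750, onset_s=10.0)
signal = [0.8 * x + 0.1 for x in regressor]
beta, intercept = fit_beta(signal, regressor)
print(round(beta, 6), round(intercept, 6))
```

In an actual analysis the model would contain one regressor per condition and the fit would be run per channel and participant before outlier exclusion and ROI averaging.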
3. Results
3.1 Behavioral data
For the German version we predicted that bilinguals (n = 30) would attend more strongly to the pragmatic cue than monolinguals (n = 28). Therefore, the bilinguals' proportion of category choices was expected to be lower than the monolinguals', indicating the former's willingness to override the shape bias in early word learning. No differences between the bilinguals' proportions in German (n = 30) and Spanish (n = 30) were expected.
As illustrated in the left part of Figure 3, all children showed a strong category match bias, irrespective of group and language. Contrary to our predictions, bilinguals (M = 0.86, SD = 0.28) and monolinguals (M = 0.79, SD = 0.36) did not differ in the proportion of category choices in German (U = 399.00, ns, r = -.052).Footnote 3 As expected there was no difference between German and Spanish versions (M = 0.80, SD = 0.30) for the bilinguals (Z = -1.02, ns, r = -0.13). Girls and boys behaved similarly in all subgroups (bilinguals in German: U = 101.00, ns, r = -.10 and Spanish: U = 107.00, ns, r = -.04; monolinguals in German: U = 89.00, ns, r = -.09). There was no sequencing effect in the bilingual group: children who began the experiment with the German version did not differ from those who started with the Spanish one in their German (U = 108.00, ns, r = -.04) and Spanish (U = 103.00, ns, r = -.02) results.
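The effect size r reported for the within-group comparison can be approximated from the test statistic. The sketch below uses one common convention, r = Z / √N, with N taken as the total number of observations (30 bilingual children × 2 language versions). Both the convention and the value of N are assumptions, since the text does not state how r was computed.

```python
import math


def effect_size_r(z: float, n_observations: int) -> float:
    """Effect size r = Z / sqrt(N) for a rank-based test statistic Z."""
    if n_observations <= 0:
        raise ValueError("n_observations must be positive")
    return z / math.sqrt(n_observations)


# German vs. Spanish version within the bilingual group: Z = -1.02, assumed N = 60.
r_language = effect_size_r(-1.02, 60)
print(round(r_language, 2))  # -0.13
```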
3.2 fNIRS data
Separate ANOVAs were performed to test for differences between (i) GROUPs (BFLA vs. MFLA, modeled as a between subjects factor) and (ii) LANGUAGEs (German vs. Spanish, within group factor).Footnote 4 The factor ROI (10 ROIs; see Fig. 2a) was included as a within factor in both ANOVAs. Since there is controversy about which parameter is more reliable we performed all ANOVAs separately for oxy-Hb and deoxy-Hb. The latter did not yield any statistically significant results; therefore we only report oxy-Hb results below. Following the generally accepted predictions of neurovascular coupling (Steinbrink et al., Reference Steinbrink, Villringer, Kempf, Haux, Boden and Obrig2006) an increase in oxy-Hb (oxy-Hb↑) is considered an indicator of increased neuronal signaling or ‘activation’.
Effect of BFLA versus MFLA
The main effect ROI (F (9, 288) = 3.17, p < .05, Greenhouse-Geisser corrected, ηp² = .09) and the interaction of ROI x GROUP (F (9, 288) = 3.03, p < .05, Greenhouse-Geisser corrected, ηp² = .09) were significant. The main effect GROUP did not reach significance (F (1, 32) = 0.18, ns, ηp² = .01). For the main effect of ROI, post hoc testing revealed a higher activation (oxy-Hb↑; p < .05) over: (i) right-TEMP compared to all other ROIs except for the left-tempPAR; (ii) left-tempPAR compared to left-preFRO, right-preFRO, and left-TEMP; (iii) right-FRO and right-froTEMP compared to left-preFRO. For the interaction ROI x GROUP, post hoc t-testing detailed that higher activations (oxy-Hb↑; p < .05) in right-TEMP compared to the other ROIs were driven only by the bilingual group (right-TEMP > left-preFRO, right-preFRO, left-FRO, right-FRO, left-froTEMP, right-froTEMP, left-TEMP, right-tempPAR), whereas no significant differences were found in the monolingual group. Additionally, post hoc testing showed that over right-TEMP bilinguals (M = 1.47, SD = 1.90) showed significantly higher activations (oxy-Hb↑; t (32) = 2.17, p < .05, d = 0.78) than monolinguals (M = 0.19, SD = 1.51). Figure 3 (right side) provides the corresponding bar plots for stimulus-locked oxygenation changes in right-TEMP. None of the group comparisons in the other ROIs were significant.
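The group comparison over right-TEMP can be approximately retraced from the summary statistics. The sketch below computes Cohen's d with a pooled standard deviation and the corresponding independent-samples t-value, using the reported means and standard deviations and the group sizes of 12 bilinguals and 22 monolinguals. Whether the authors used exactly this pooled-variance form is an assumption; because the inputs are rounded, small deviations from the reported values are expected.

```python
import math
from typing import Tuple


def cohens_d_and_t(
    mean_1: float, sd_1: float, n_1: int,
    mean_2: float, sd_2: float, n_2: int,
) -> Tuple[float, float, int]:
    """Cohen's d (pooled SD), Student's t, and degrees of freedom."""
    df = n_1 + n_2 - 2
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    d = (mean_1 - mean_2) / math.sqrt(pooled_var)
    t = d / math.sqrt(1.0 / n_1 + 1.0 / n_2)
    return d, t, df


# Right-TEMP oxy-Hb: bilinguals vs. monolinguals (values as reported in the text).
d, t, df = cohens_d_and_t(1.47, 1.90, 12, 0.19, 1.51, 22)
print(round(d, 2), round(t, 2), df)
```

The result lies close to the reported d = 0.78 and t (32) = 2.17, which supports the internal consistency of the reported statistics.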
Effect of German versus Spanish
This ANOVA was performed in the BFLA sample only (n = 12 German datasets; n = 13 Spanish datasets). It revealed a main effect of ROI (F (9, 90) = 3.75, p < .05, Greenhouse-Geisser corrected, ηp² = .27), while the main effect (F (1, 10) = .35, ns, Greenhouse-Geisser corrected, ηp² = .03) and the interaction (F (9, 90) = 1.02, ns, Greenhouse-Geisser corrected, ηp² = .09) involving the factor LANGUAGE did not reach statistical significance. The main effect of ROI was driven by higher activations (oxy-Hb↑) over right-TEMP compared to all other ROIs (p < .01) except for right-tempPAR.
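Partial eta squared can be recovered from an F-value and its degrees of freedom via the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies this identity to the three effects of the LANGUAGE analysis; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from F and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported for the LANGUAGE analysis.
effects = [
    ("ROI", 3.75, 9, 90),
    ("LANGUAGE", 0.35, 1, 10),
    ("ROI x LANGUAGE", 1.02, 9, 90),
]
for label, f_value, df_effect, df_error in effects:
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 2))
```

The identity is unaffected by a Greenhouse-Geisser correction, because the correction scales both degrees of freedom by the same factor.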
To sum up, while the analysis of the behavioral data did not show any significant differences between groups, fNIRS data indicated that familiarization of the novel word with a property-related descriptive gesture elicited a prominent activation over right temporal areas. Most notable for our research question is that bilinguals showed a significantly larger activation in these areas compared to monolingual peers.
4. Discussion and conclusion
Contrary to our hypothesis, five-year-olds with bilingual versus monolingual language learning experience showed equally strong tendencies towards interpreting novel words as category labels, even though a pragmatic gestural cue supported an intended property meaning. However, a measure of functional brain activation showed larger responses in bilinguals compared to monolinguals over a right temporal region of interest.
The finding of no behavioral difference is unexpected given that several behavioral studies suggest a general bilingual advantage for interpreting pragmatic deixis (Brojde et al., Reference Brojde, Ahmed and Colunga2012; Yow, Reference Yow2015; Yow & Markman, Reference Yow and Markman2011a). In contrast to our study, however, the pragmatic cues provided in these studies targeted whole objects (Yow & Markman, Reference Yow and Markman2011a), a group of objects (Brojde et al., Reference Brojde, Ahmed and Colunga2012), or the referents' locations in space (Yow, Reference Yow2015), thus supporting noun or pronoun interpretations. Pointing gestures support noun learning in isolation (e.g., Kalagher & Yu, Reference Kalagher and Yu2006), whereas studies investigating the interpretation of descriptive, property-indicating gestures usually supply additional learning cues: O'Neill et al. (Reference O'Neill, Topolovec and Stern-Cavalcante2002) implicitly presented two further cues by using familiar objects (providing a MEC cue) and by embedding the novel word in a syntactic adjective frame. Similarly, Hall et al. (Reference Hall, Williams and Bélanger2010) showed that a descriptive gesture successfully supports adjective learning in four-year-olds when it is presented in combination with a syntactic adjective context.
Thus, while pointing gestures used in bilingual-monolingual comparisons may be sufficient for a noun interpretation of a novel word, descriptive gestures in isolation may not supply a sufficient property cue for the generally dispreferred adjective interpretation. Using more property focus trials (instead of the single one in our experiment's short version) could have helped children to better understand the association between the descriptive gesture and the property of the object in our challenging isolating context. In addition, the deliberate uniformity of the descriptive gesture as used in our study may help to explain the lack of bilingual-monolingual differences: the same gesture was used for all properties, instead of different gestures highlighting particular characteristics of different surfaces. This may have decreased the transparency of the gestural property reference, thus potentially obscuring differences between bilinguals and monolinguals. Furthermore, unlike a non-virtual, real-life presentation, the presentation on a screen may have neutralized any potential bilingual advantage. Future studies should target the question whether more natural gesturing in relation to physically presented objects and/or a combination with additional pragmatically based learning cues can further inform the expected behavioral advantage for bilingual children in this task. The procedure used in the present study may just have been too challenging to allow for an adjective interpretation to emerge on the behavioral level.
Our neuroimaging results, however, support the notion that there is in fact a group difference: fNIRS data indicated that bilingual and monolingual children processed the gestural cue used to support a property interpretation differently. Using fNIRS we covered cortical areas involved in the widely distributed neuronal network associated with the interpretation of communicative intentions expressed through extra-linguistic means (Enrici et al., Reference Enrici, Adenzato, Cappa, Bara and Tettamanti2011). While brain activation did not differ in the network's components that are considered to underlie the recognition of other people's perspectives and thoughts (i.e., TPJ, mPFC, Precuneus; Van Overwalle & Baetens, Reference Van Overwalle and Baetens2009), we found larger responses in bilinguals compared to monolinguals over a right temporal area including the posterior part of the superior temporal sulcus (STS). Thus, our fNIRS data support the notion that at five years of age the bilingual brain may be more ready to process and/or integrate gestural pragmatic information in the context of novel adjective learning. Moreover, the key role for this right temporal area is supported by the finding that bilinguals showed an overall larger functional activation in this region of interest in both the German and Spanish experimental versions.
Our fNIRS findings support recent claims made for monolingual adults about the prominent role of the right posterior STS for understanding gestural and sociolinguistic processing (Deen, Koldewyn, Kanwisher & Saxe, Reference Deen, Koldewyn, Kanwisher and Saxe2015; Lahnakoski, Glerean, Salmi, Jääskeläinen, Sams, Hari & Nummenmaa, Reference Lahnakoski, Glerean, Salmi, Jääskeläinen, Sams, Hari and Nummenmaa2012; von dem Hagen, Nummenmaa, Yu, Engell, Ewbank & Calder, Reference von dem Hagen, Nummenmaa, Yu, Engell, Ewbank and Calder2011). Neuroimaging studies investigating its relevance for gestural processing in typically developing children are rare (but see Dick, Goldin-Meadow, Solodkin & Small, Reference Dick, Goldin‐Meadow, Solodkin and Small2012). However, a key role of the right STS in pragmatic processing is additionally supported by literature on children suffering from autism spectrum disorder (ASD), who show reduced pragmatic skills (see for reviews Pelphrey, Yang & McPartland, Reference Pelphrey, Yang, McPartland, Andersen and Pine2014; Saitovitch, Bargiacchi, Chabane, Brunelle, Samson, Boddaert & Zilbovicius, Reference Saitovitch, Bargiacchi, Chabane, Brunelle, Samson, Boddaert and Zilbovicius2012). Hubbard, McNealy, Zeeland, Ashley, Callan, Bookheimer, and Dapretto (Reference Hubbard, McNealy, Zeeland, Ashley, Callan, Bookheimer and Dapretto2012) report that right STS activation in response to co-speech gestures is smaller in ASD compared to typically developing children. 
Moreover, functional connectivity of STS may be delayed or atypical (Alaerts, Nayar, Kelly, Raithel, Milham & Di Martino, Reference Alaerts, Nayar, Kelly, Raithel, Milham and Di Martino2015) and STS may show anatomical abnormalities in ASD populations (Boddaert, Chabane, Gervais, Good, Bourgeois, Plumet, Barthélémy, Mouren, Artiges, Samson, Brunelle, Frackowiak & Zilbovicius, Reference Boddaert, Chabane, Gervais, Good, Bourgeois, Plumet, Barthélémy, Mouren, Artiges, Samson, Brunelle, Frackowiak and Zilbovicius2004). The current study shows a potential contribution of the STS to heightened pragmatic processing in a bilingual population, nicely complementing the abnormal functioning in the ASD population. Notably, our study suggests such a role of the STS for pragmatic processing in a younger population (age 5) than has been studied so far (e.g., Alaerts et al., Reference Alaerts, Nayar, Kelly, Raithel, Milham and Di Martino2015: age 7).
Another point requiring discussion pertains to language-specific characteristics (German vs. Spanish). The bilinguals tested in the current study were acquiring German and Spanish from birth and were compared to monolinguals growing up in an only-German-speaking environment. Contact with a specific language group's gesturing style rather than the bilingual language experience itself might explain the observed bilingual-monolingual processing differences found through fNIRS. Müller (Reference Müller1998) has shown both similarities and differences in German and Spanish co-speech gesture behaviors of monolingual adults. However, the absence of any behavioral and/or neurophysiological differences between the Spanish and German experimental versions within the bilingual group speaks against such a potential language-specific effect. It could be argued that in the bilinguals general gesture interpretation skills were promoted by Spanish language learning and transferred to German language learning (or, indeed, the other way round). In that case, any potential effect would still stem from experience with a particular language and not from bilingualism per se. Future research may address this question by testing, e.g., German–Danish bilingual children, assuming there is greater similarity between German and Danish co-speech gesture behavior. Alternatively, Spanish monolingual children could be included as a third group to clarify a possible effect of bilingualism rather than the particular language being acquired. Future research may also control for the participants' socioeconomic status (SES), which was not assessed in the current study. As SES has been found to make a difference in very young children's early lexical development (e.g., Hart & Risley, Reference Hart and Risley1995; Hoff, Reference Hoff, Bornstein and Bradley2003) this might constitute a potential limitation of the study. 
Nevertheless, this seems to be unlikely, because there is no clear indication in the literature that SES also affects pragmatic word learning strategies. Furthermore, German–Spanish bilingualism in Germany has no association to lower or higher class status.
Differences between bilingual versus monolingual children in experimental tasks are often attributed to more advanced executive functioning skills in bilinguals. In particular, a growing body of research has suggested increased inhibitory control in bilingual children (e.g., Bialystok, Barac, Blaye & Poulin-Dubois, Reference Bialystok, Barac, Blaye and Poulin-Dubois2010; Carlson & Meltzoff, Reference Carlson and Meltzoff2008; Crivello, Kuzyk, Rodrigues, Friend, Zesiger & Poulin-Dubois, Reference Crivello, Kuzyk, Rodrigues, Friend, Zesiger and Poulin-Dubois2016). Likewise, Yoshida et al. (Reference Yoshida, Tran, Benitez and Kuwabara2011) propose a more efficient suppression of the whole object bias in bilinguals. Enhanced inhibitory control could have been beneficial for our experiment because children had to inhibit their overall preference for associating novel words with whole objects (Markman, Reference Markman1994) or objects of the same shape (e.g., Landau et al., Reference Landau, Smith and Jones1988) in order to follow the gesture's property reference. Yet we find no evidence of such inhibition. Our neurophysiological evidence shows an effect of bilingualism over a right temporal area. Converging evidence in monolingual children and adults projects the neuronal correlates of inhibitory control elsewhere, that is, onto frontal and prefrontal cortices (e.g., Banich & Depue, Reference Banich and Depue2015; Janssen, Heslenfeld, van Mourik, Logan & Oosterlaan, Reference Janssen, Heslenfeld, van Mourik, Logan and Oosterlaan2015; Mehnert, Akhrif, Telkemeyer, Rossi, Schmitz, Steinbrink, Wartenburger, Obrig & Neufang, Reference Mehnert, Akhrif, Telkemeyer, Rossi, Schmitz, Steinbrink, Wartenburger, Obrig and Neufang2013; Tsujii & Watanabe, Reference Tsujii and Watanabe2010). Our imaging data failed to show any group differences over frontal and prefrontal cortices that could have indicated bilingual-monolingual differences in inhibitory control. 
Although the lack of a difference in fNIRS recordings may be due to a number of other factors, the prominent result over right STS speaks for an enhanced sensitivity to pragmatic gestures in bilinguals.
Our experiment, targeting children's approach to adjective learning from a linguistically ambiguous input, was constructed around a single pragmatic cue. As discussed above, a single pragmatic cue may not be sufficient for the property interpretation of a novel word for either bilinguals or monolinguals. The ‘real’ world, however, constitutes a less challenging hybrid context, where several cues are present that interact with each other.
What can the current study tell us, then, about the broader question of how young children learn adjectives in everyday life, that is, in such a hybrid context? Our neurophysiological results suggest that bilingual and monolingual preschoolers differ in their sensitivity to pragmatic cues and may weight them differentially in relation to other cues in a hybrid context: bilinguals may generally rely more strongly on pragmatic cues, whereas monolinguals may adhere more strongly to object-inherent cues, such as object shape (Brojde et al., Reference Brojde, Ahmed and Colunga2012) or object familiarity, allowing for the application of the Mutual Exclusivity Constraint (e.g., Davidson et al., Reference Davidson, Jergovic, Imami and Theodos1997).
Besides pragmatics and word learning principles children can rely on additional cues such as morphological and/or syntactic markers (e.g., Landau et al., Reference Landau, Smith and Jones1992; Mintz, Reference Mintz2005; Hiramatsu, Rulf & Epstein, Reference Hiramatsu, Rulf and Epstein2010; Rayas Tanaka, Reference Rayas Tanaka2014; Song, Nazzi, Moukawane, Golinkoff, Stahl, Ma, Hirsh-Pasek & Connell, Reference Song, Nazzi, Moukawane, Golinkoff, Stahl, Ma, Hirsh-Pasek and Connell2010), prosodic features (Hall & Moore, Reference Hall and Moore1997), and property- or object-inherent characteristics (e.g., Hall, Reference Hall1994; Sandhofer & Smith, Reference Sandhofer and Smith2004; Smith et al., Reference Smith, Jones and Landau1992). Following work on monolingual children (Hall et al., Reference Hall, Williams and Bélanger2010), future studies should clarify which combination of cues allows for adjective learning in children with bilingual input from birth (BFLA), but also in other bilingual populations, e.g., in children who started out learning a first language and added another one later on (Early Second Language Acquisition, ESLA).
A more comprehensive approach like that would not only be of theoretical significance, but would potentially allow for the development of clinical intervention programs that are tailored to the particular needs of children constituting different populations (that is, BFLA, ESLA, MFLA) and who face difficulties with learning new words (as found, for instance, in children with specific language impairment, Rice & Hoffman, Reference Rice and Hoffman2015). Although neurophysiological methods may not be suited for broad application to guide such development, the current study clearly demonstrates that brain activation patterns supply valuable information towards developing a comprehensive model of how bilingual and monolingual children differ in their trajectories during the language development process.
In conclusion, our study substantially broadens our knowledge on the challenging task of novel adjective learning through pragmatic cues. Different from other studies on adjective learning, our study strictly controlled the type and number of learning cues provided: we used a single pragmatic gesture and neutralized other linguistic and object-inherent cues. In selecting same-aged preschoolers with bilingual input from birth in the bilingual group, we increased the comparability to their monolingual peers, keeping the overall time for language learning constant. Opening a novel methodological approach to this field of research, we combined behavioral and neurophysiological measures. Whereas we found no bilingual-monolingual differences on the behavioral level, the neurophysiological data offered clear evidence for different processing mechanisms in the two participant groups: fNIRS revealed a higher activation in bilinguals than monolinguals over a cerebral region close to the posterior part of the right superior temporal sulcus (STS). This result is compatible with claims of the prominent role of the STS in processing pragmatic gestures. Additionally, it reflects a heightened pragmatic sensitivity in bilingual children. Future work is needed to investigate whether this heightened pragmatic sensitivity helps young bilinguals in acquiring novel adjectives in everyday life.