Introduction
Languages display variability in how they express experiential domains. These cross-linguistic differences influence not only how speakers talk about these domains but also how they think about them (e.g., Choi & Bowerman, 1991; Evans & Levinson, 2009). This view, called the strong Sapir–Whorf hypothesis (Sapir, 1961; Whorf, 1956), posits an extended effect of language on cognition, an effect that is present not only when speaking but also when not speaking (Lucy, 1992a, 1992b). A weaker version of the Sapir–Whorf hypothesis holds that language has a more limited and transient effect on cognition. This view is reflected in Slobin’s (1996, 2004) thinking-for-speaking account, which proposes that language influences cognition during online production of speech, but not beyond speech production.
In this article, we use the gestures people produce to describe an event to explore the impact of language on thinking. We know from previous work that cross-linguistic differences in how an event is described in speech can also be found in the gestures that accompany speech (co-speech gesture; Kita & Özyürek, 2003; Özçalışkan et al., 2016a). These findings suggest an effect of language on thinking-for-speaking that goes beyond the words in a communicative act. Here, we explore gesture with speech in children aged 3 to 12 years to determine when language first affects thinking-for-speaking, as measured by the gestures produced during the act of speaking.
We also know from previous work that cross-linguistic differences found in co-speech gesture do not appear when people are asked to describe the same event in gesture without speech (silent gesture; Özçalışkan et al., 2016b, 2018). These findings point to limits, even within communication, on the strong form of the Sapir–Whorf hypothesis. We therefore observe gesture without speech in the same children to determine whether limits on how language affects thinking appear during childhood and, if so, when.
We investigated co-speech gesture and silent gesture in descriptions of motion events. A motion event consists of four key elements (Talmy, 1985, 2000): a figure that moves (e.g., woman or boy), a ground anchoring the figure’s movement (e.g., house or bridge), a path marking the direction of the figure’s movement (e.g., into or across), and a manner specifying the pattern of the figure’s motion (e.g., run or crawl).
Speakers of different languages largely follow a binary cross-linguistic split in how they package the manner and path elements of a motion event, dividing into satellite-framed languages (e.g., German, English, and Polish) and verb-framed languages (e.g., Spanish, Turkish, and Korean; Cardini, 2010; Choi & Lantolf, 2008; Chui, 2009, 2012; Gennari et al., 2002; Ibarretxe-Antuñano, 2004; Lewandowski & Özçalışkan, 2021; Naigles et al., 1998; Özçalışkan & Slobin, 1999, 2003; Tütüncü et al., 2023). Speakers of satellite-framed languages, such as English, prefer a conflated packaging strategy, placing manner information in the verb and path information in a satellite to the verb (preposition or particle) within the same clause (e.g., girl RUNS INTO [manner path] the house). In contrast, speakers of verb-framed languages, such as Turkish, rely on a separated packaging strategy, typically placing path information in the verb and manner information in an additional subordinate clause (e.g., kız eve GİRER [path] KOŞARAK [manner] ‘girl house-to ENTER [path] RUNNING [manner]’; Allen et al., 2007; Özçalışkan & Slobin, 1999; Slobin, 2004). Adult Turkish speakers also often convey only path, omitting manner from their descriptions of motion in speech (e.g., Özçalışkan, 2009, 2016).
Speakers of Turkish and English also follow a two-way split in their ordering of semantic elements in descriptions of motion events in speech. Adult speakers of English use a Figure–MOTION–Ground order, which locates motion in the middle position in the clause (e.g., she [figure] RUNS [motion] into house [ground]) – consistent with the canonical subject–verb–object (SVO) order of the language. Adult speakers of Turkish, on the other hand, use a Figure–Ground–MOTION order, locating the key motion element in the final position in the clause (e.g., eve GİRER ‘she [figure] house-to [ground] ENTER [motion]’) – consistent with the canonical subject–object–verb (SOV) order of Turkish (Özçalışkan et al., 2016b, 2018).
Effects of language on thinking-for-speaking that go beyond words: Co-speech gesture
Adult speakers of Turkish and English display the same cross-linguistic patterns in their co-speech gestures. English speakers express the manner and path components of motion simultaneously within a single gesture (e.g., rotating the hand as it moves down, manner+path), and Turkish speakers express each component in separate gestures (e.g., rotating the hand and then moving the hand down, manner–path; Kita & Özyürek, 2003; Özçalışkan et al., 2016a, 2018). Similarly, English speakers prefer to place the motion gesture in the middle of a gesture string; Turkish speakers prefer to place the motion gesture at the end of the gesture string (Goldin-Meadow et al., 2008; Özçalışkan et al., 2016b, 2018; Tütüncü et al., 2023; see Methods section for further details and examples of packaging and ordering of motion elements). These co-speech gesture patterns indicate an effect of language on thinking that goes beyond the words in a communicative act.
When do these language-specific patterns emerge in speech and gesture? Child learners of English or Turkish begin to follow language-specific patterns of motion expression in speech at a relatively young age (Allen et al., 2007; Hickmann et al., 2009; Hohenstein, 2005; Özçalışkan & Slobin, 1999). Beginning at age 3–4 years, English learners use conflated descriptions (e.g., she runs into house), while Turkish learners rely on separated descriptions that typically convey only path information (e.g., Eve girdi ‘She entered the house’; Allen et al., 2007; Özçalışkan, 2009; Özçalışkan & Slobin, 1999). Children learning English or Turkish also show early sensitivity to the canonical ordering of semantic elements in their speech production (Ekmekçi, 1986; Radford, 1990; Slobin & Bever, 1982) – Figure–MOTION–Ground in English (the girl ran towards the fence) versus Figure–Ground–MOTION in Turkish (kız çite doğru koştu ‘girl fence-to towards RAN’).
The developmental findings for co-speech gesture are less conclusive. Children increase their production of representational iconic co-speech gestures, depicting features of objects (e.g., holding cupped hands in the air to form a ball shape) or actions on objects (e.g., moving an empty palm forward as if throwing a ball), at age 3 or 4 years (McNeill, 1992; Özçalışkan & Goldin-Meadow, 2011). However, little is known about cross-linguistic patterns in children’s early iconic gestures. In addition to being sparse, the literature on children’s co-speech gesture production focuses exclusively on the packaging of motion, with largely inconclusive findings: Some studies find language-specific gesture patterns in packaging around age 3–6 years (e.g., Özçalışkan, 2007); others find a more extended timeline for language-specific gestures (e.g., Özyürek et al., 2008). However, work on comprehension of co-speech gestures shows that children aged 3–4 have greater difficulty understanding co-speech gestures that do not follow language-specific patterns than co-speech gestures that do (Glaser et al., 2018), suggesting early attunement to language-specific patterns in co-speech gesture. There is no existing cross-linguistic work that examines developmental changes in learning language-specific patterns of ordering in the description of motion events.
Limits on the effects of language on thinking, even during communication: Silent gesture
Interestingly, the language-specific patterns found in the gestures adults produce when speaking (co-speech gesture) are not found when adults describe the same events in gesture without speech (silent gesture; Özçalışkan et al., 2016a, 2016b, 2018). This finding indicates limits on the effects that language has on thinking, even within a communicative act. When is this limit first seen?
We know little about the development of silent gesture. The few studies that have been conducted focus on the structure of these gestures in speakers of a particular language (e.g., in 6- and 8-year-old German speakers, Bohn et al., 2019; in 4- and 12-year-old British English speakers, Clay et al., 2014). But these studies do not compare silent gesture to co-speech gesture in the same children, nor do they compare silent gesture across child speakers of different languages. Nonetheless, the studies provide evidence for language-like structures – conventionality, abstraction, segmentation – in children’s early silent gestures, patterns that are largely independent of the grammatical structure of the child’s native spoken language.
We address these gaps and inconclusive findings in our study by observing speech, co-speech gesture, and silent gesture in child speakers of two structurally different languages (Turkish, English) over a broad age span (3 to 12 years). We ask two questions: (1) When do language-specific patterns in children’s co-speech gestures appear in development? Based on the currently inconclusive literature on co-speech gesture, we expect that children will show language-specific adult-like patterns in co-speech gesture either later than (after ages 3–4) or at the same time as (ages 3–4) they show language-specific speech. If so, we will have evidence that language has an early effect on thinking that goes beyond words during communication. (2) When do children first display cross-linguistic similarities in their silent gestures? Given the scarcity of work on silent gesture, children might or might not show, at an early age (3–4 years), the cross-linguistic similarities that adults exhibit in silent gesture. If they do, we will have evidence that limits on language’s effect on thinking, even during communication, appear early in development.
Overall, our study provides the first comprehensive analysis of developmental changes in co-speech and silent gesture in two structurally different languages, using a new corpus. It focuses on the emergence (or absence) of language-specific patterns in both the packaging and the ordering of semantic elements in the expression of motion events – a domain whose expression shows systematic variability and patterned regularities in adult speech and gesture.
Methods
Sample
Participants were 100 children, learning either English (n = 50) or Turkish (n = 50) as their native language, each language group equally divided into 5 age-groups: 3–4 (M age = 4;2 [SD = 0;5]), 5–6 (M age = 5;8 [SD = 0;7]), 7–8 (M age = 7;11 [SD = 0;7]), 9–10 (M age = 10;1 [SD = 0;8]), and 11–12 (M age = 11;11 [SD = 0;7]) years, with roughly comparable numbers of boys and girls in each group; see Table 1 for sample characteristics by language and age. The choice of this age range was based on earlier work, which showed that language-specific patterns in descriptions of motion arise in children’s speech at age 3–4 years (Özçalışkan & Slobin, 1999) and in their gestures between ages 3 and 9 (Özçalışkan et al., 2014; Özyürek et al., 2008). Earlier work (Özçalışkan, 2009) suggested that 10 children in each group would give 84% power for the detection of significant effects at p < .05 and an effect size of η² = 0.08. The data from children speaking English and Turkish were gathered in the United States and Turkey, respectively, as part of a larger research project on patterns of gesturing in blind individuals. Participants’ families received monetary compensation. Data from 6 additional participants (n = 3/language) were excluded due to either speech production difficulties (e.g., stuttering) or failure to complete the experiment. The study was carried out in accordance with the Code of Ethics for the protection of human research participants. The protocol was approved by an American research university institutional review board, and informed consent was obtained from the participants’ families prior to their children’s participation in the study.
Note for Table 1 – Abbreviations: F, female; M, male; SD, standard deviation.
Procedure
Data collection
Children were asked to describe 8 three-dimensional scenes, one at a time. Each scene showed motion in one of three path types (to, from, over) in relation to different landmarks (house, carpet, hurdle) with various manner types (e.g., run, jump, crawl). Each scene was glued onto a small board; the scene included a landmark and three stationary but varying poses of the same doll, together depicting a motion event involving both manner and path. Children were first introduced to the figure – named Oya in Turkish and Eve in English – and told that she would do different kinds of activities in different scenes involving various objects. Children were also explicitly informed that the figure appeared 3 times in each scene, but as part of one continuous movement. The motion scenes were presented in two blocks of 4 items each, with order counterbalanced (see Table 2).
Children provided descriptions of the scenes in two conditions: with speech while naturally moving their hands (co-speech gesture condition: ‘tell me what is happening in this scene using both your words and your hands’), and using their hands without any speech (silent gesture condition: ‘tell me what is happening in this scene but only using your hands without speaking’). Descriptions of all the scenes were first elicited in speech (and co-speech gesture), followed by silent gesture. We did not counterbalance the two conditions, to eliminate any effect that producing silent gesture first might have had on the naturalness of children’s co-speech gesture. Each child completed two practice trials before describing the scenes in each condition; these familiarization trials were not included in any of the analyses. The co-speech and silent gesture conditions were separated by two unrelated tasks – one on metaphors and one on narratives – thus eliminating any possible immediate effect of responses in the co-speech condition on responses in the silent gesture condition. Children were not allowed to touch the scenes with their hands (see Fig. 1 for the data collection set-up). We explicitly asked children to use their hands along with their words in the co-speech gesture condition to elicit comparable amounts of gesture production across ages and languages.
Transcription and coding
Children’s speech responses in the co-speech gesture condition were transcribed by native speakers of each language; they were then parsed into sentence units based on earlier work (Özçalışkan, 2016; Özçalışkan et al., 2016b, 2018). A sentence unit was defined as a verb along with the arguments and subordinate clauses associated with it (e.g., She is running into house; Eve koşuyor ‘house-to running’; Eve giriyor koşarak ‘house-to enters running’). All gesture responses children produced in the two conditions (co-speech, silent) were also coded. We defined gesture as movements of the hand or body that characterized movements or features of the scenes for communicative purposes. We further coded each sentence unit for (1) packaging and (2) ordering of semantic elements.
Packaging
Speech and gestures in each sentence unit were classified as either conflated or separated (see Fig. 2). Conflated sentence units included responses in which both motion components (manner+path) were expressed in a single clause or gesture. Separated sentence units included responses that expressed only manner (e.g., she runs, koşar ‘runs’), only path (e.g., she enters the house, ev-e girer ‘house-to enter’), or manner and path expressed in separate gestures or separate clauses (e.g., eve girer koşarak ‘house-to enters running’) – a two-clause speech construction produced only once in English but relatively often in Turkish (54 instances).
Ordering
Speech and gesture strings in each sentence unit were classified as following either Figure–MOTION–Ground order or Figure–Ground–MOTION order (see Fig. 3). Assigning spoken responses to one of these two orders was based on the location of the primary motion component, namely, the main verb, which typically expressed path in Turkish and manner in English. When expressed, secondary motion elements (e.g., prepositions, particles, and adjectives) conveying path in English and manner in Turkish were always associated with the main verb. Assigning gesture strings to one of the two orders was based on the location of the motion gesture, which typically expressed manner+path in English and only manner, only path, or a sequential manner–path pair in Turkish. Manner–path sequential gestures were always contiguous (i.e., followed each other without any other gesture in between) but were infrequent in both languages (English: 7 instances; Turkish: 15 instances). When combining gestures into strings, children were likely to combine a gesture for motion with a gesture for landmark, leaving out a gesture for the figure (Fig. 3A1,B1). Children typically expressed the ground element with a sideways or downward-facing palm (e.g., right palms in Fig. 3A2,B2).
Children sometimes showed a mixed pattern: They used a separated and a conflated gesture together in a single sentence unit. These instances were infrequent; M = 1.8% of sentence units (range = 0.7%–3.5%) across languages and conditions. We omitted all the ‘mixed’ instances from our analysis for packaging, as we could not classify them as belonging to either packaging category. Most of the mixed sentence units in each language were also omitted from the analysis of ordering because they expressed either only motion or a mixed ordering pattern. However, we included responses that had mixed packaging but followed consistent ordering (English: 5 instances, Turkish: 10 instances) in the order analysis – constituting 38% of the small number of responses in the mixed category in both languages.
Children predominantly represented different motion elements with each hand (e.g., placing the flat left palm on the left side of the body to represent the house, placing the right index finger on the right side of the body to represent the figure, and moving the fingers of the right hand left to right to convey motion) in both co-speech gesture (67%) and silent gesture (77%). The use of both hands – with or without accompanying bodily enactment – to represent a single motion element (e.g., rapidly circling both arms simultaneously to convey running, crawling forward on all fours to convey crawling, or placing downward-facing palms in the shape of an inverted V to represent the house) was less frequent, accounting for 33% of co-speech and 23% of silent gestures. The majority of gestures were produced in the air (~80%) in both conditions, but some (~20%) were produced either on the body (e.g., hopping fingers on the upper leg to convey jumping) or on the table in front of the participant (e.g., placing the cupped left hand on the table to represent the carpet and crawling the fingers of the right hand over the left hand to convey crawling over the landmark).
We assessed reliability with independent coders who were native speakers of each language; they coded a randomly selected 10% of the data for each age and condition in each language. Intercoder agreement was high: 97% for identification of gestures, 100% for classification of gesture type, and 97% (gesture) and 99% (speech) for categorization of motion elements.
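To make the coding scheme concrete, the sketch below (in R) shows how coded sentence units might be tallied into the count data submitted to the analyses described in the next section. The data frame, its column names, and its rows are hypothetical illustrations of the coding categories described above, not our actual coding files.

```r
library(dplyr)

# Hypothetical coded data: one row per sentence unit, with the packaging
# category (conflated vs. separated; mixed units excluded, as noted above)
# and the ordering category (F-M-G vs. F-G-M) assigned by coders.
units <- tibble::tribble(
  ~child, ~language, ~age_group, ~scene, ~condition,  ~packaging,  ~ordering,
  "E01",  "English", "3-4",      1L,     "co-speech", "conflated", "F-M-G",
  "E01",  "English", "3-4",      2L,     "co-speech", "conflated", "F-M-G",
  "T01",  "Turkish", "3-4",      1L,     "co-speech", "separated", "F-G-M",
  "T01",  "Turkish", "3-4",      1L,     "silent",    "conflated", "F-G-M"
)

# Tally into counts: number of sentence units per child, scene, condition,
# and packaging type -- the kind of outcome the Poisson models below take.
counts <- units |>
  count(child, language, age_group, scene, condition, packaging,
        name = "n_units")
counts
```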
Analysis
The count data were analyzed using Bayesian mixed-effects Poisson generalized linear models implemented in the stan_glmer() function of the rstanarm package (Goodrich et al., 2020). The models provide fully Bayesian inference, using Markov chain Monte Carlo (MCMC) sampling of posterior distributions to produce parameter estimates. The mixed-effects approach allows us to better estimate effects of interest by modeling and controlling for the idiosyncratic contributions of nuisance variables to the outcome (e.g., one stimulus scene eliciting more gestures than another). The Poisson link function works as in any generalized linear model: the log of the expected count is modeled as a linear function of the predictors.
In our analysis, we specified our mixed-effects structures according to the ‘keep it maximal’ principle (Barr et al., 2013); that is, by including random subject and scene slopes for the fixed-effect term of interest where the design allowed for their estimation, as well as random intercepts for subject and scene. We adopted the ‘weakly informative’ default priors provided by stan_glmer(): normal distributions centered at 0 with standard deviations of 2.5. We increased adapt_delta, the number of chains, the iterations per chain, and the warmup iterations where necessary to obtain stable parameter estimates with no divergent transitions. All models we report converged well, with Rhat values within .005 of 1.0, and all parameter estimates had associated effective sample sizes of at least 1000.
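As a concrete illustration, a model of the packaging counts in one condition might be specified as follows. This is a minimal sketch under assumptions: the counts data frame and its variables are the hypothetical ones from the coding sketch above, and the control settings shown are illustrative rather than the exact values used for every model. The diagnostic_file argument is included because Bayes factor computation on rstanarm models (described next) requires it.

```r
library(rstanarm)

# Bayesian mixed-effects Poisson model of sentence-unit counts, with a
# maximal random-effects structure: by-subject and by-scene random slopes
# for packaging, plus random intercepts (Barr et al., 2013).
# Under the log link: log E[n_units] = Xb + Zu.
fit <- stan_glmer(
  n_units ~ packaging * language +
    (1 + packaging | child) + (1 + packaging | scene),
  data   = counts,
  family = poisson(link = "log"),
  # rstanarm's weakly informative defaults, made explicit:
  prior           = normal(0, 2.5),
  prior_intercept = normal(0, 2.5),
  chains = 4, iter = 4000, warmup = 2000,
  adapt_delta = 0.99,  # raised when needed to avoid divergent transitions
  # save draws so marginal likelihoods (for Bayes factors) can be computed:
  diagnostic_file = file.path(tempdir(), "fit.csv"),
  seed = 123
)

# Convergence checks: Rhat near 1.0; effective sample sizes of at least 1000.
summary(fit)
```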
We submitted the fitted models to the bayesfactor_models() or describe_posterior() functions from the bayestestR package (Makowski et al., 2019) to support inferences from the modeling results. For each model, we report values describing the 90% credible interval (90% CI) based on the highest density interval (HDI) of the posterior distribution. The HDI is the range of the posterior within which every point has a higher probability density than any point outside it; a 90% HDI-based CI can thus be interpreted as indicating a 90% probability that the true parameter value falls within the interval. The point estimate of the parameter (b) that we provide with each 90% CI is the median of the posterior. We also report a Bayes factor with each result: a ratio quantifying how strongly the data support the alternative hypothesis relative to the null hypothesis, with values above 1.0 favoring the alternative and values below 1.0 favoring the null. For example, a Bayes factor of 12.0 would mean that the alternative hypothesis is 12 times more likely than the null hypothesis (given the priors and the data), whereas a Bayes factor of 0.5 would mean the null hypothesis is twice as likely as the alternative hypothesis. For a given effect, the characterization of the strength of support (e.g., ‘the data provided strong/moderate/anecdotal evidence that…’) corresponded to the interpretation of the associated Bayes factor, using language adopted from Lee and Wagenmakers (2014). Anonymized data summaries and coding manuals can be found at the link: https://osf.io/fse6r/?view_only=a90669bcf98b4dcd8e4e24b3aef1d32b.
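A minimal sketch of this inference step, continuing the hypothetical fit above (the null-model comparison shown is one illustrative way to test an interaction term, not necessarily the exact contrast run for every reported effect):

```r
library(bayestestR)

# Posterior summary: median point estimate (b) with a 90% credible
# interval based on the highest density interval (HDI).
describe_posterior(fit, centrality = "median", ci = 0.90, ci_method = "hdi")

# Bayes factor for the language x packaging interaction: compare the full
# model against a null model that drops the interaction term.
fit_null <- update(fit, formula. = . ~ . - packaging:language)
bayesfactor_models(fit, denominator = fit_null)
```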
Results
Packaging motion elements
Speech
Children learning English or Turkish differed in the way they packaged motion components in speech (language × packaging interaction; b = 1.93, 90% CI = [1.31, 2.52], BF > 100; Fig. 4A). Children learning English preferred conflated to separated packaging (b = −0.82, 90% CI = [−1.09, −0.53], BF > 100), expressing manner and path in the same clause (e.g., she runs into the house). Conversely, children learning Turkish preferred separated to conflated packaging (b = 1.36, 90% CI = [0.71, 2.03], BF = 8.4), describing similar scenes by expressing path by itself (Ev-e giriyor ‘house-to entering’), manner by itself (koşuyor ‘running’), or path in the main clause and manner in the subordinate clause (eve girer koşarak ‘house-to enter running’). The language-specific patterns in speech were evident by 3–4 years (b = 2.61, 90% CI = [1.80, 3.54], BF > 100) and remained relatively stable over developmental time (BFs > 50), although in the oldest age-group (11–12 years) the cross-linguistic difference, while showing the same pattern, was not reliable (b = 0.99, 90% CI = [0.36, 1.60], BF = 1.27).
Co-speech gesture
Co-speech gesture showed the same pattern of cross-linguistic differences as speech (language × packaging interaction; b = 1.54, 90% CI = [1.28, 1.79], BF > 100; Fig. 4B). Children learning English used a greater number of gestures with conflated than with separated packaging (b = −1.47, 90% CI = [−1.96, −0.97], BF > 100), expressing manner and path in one gesture (e.g., running fingers [manner] as the hand moved along a forward trajectory [path]). Children learning Turkish, in contrast, opted for more separated than conflated responses (b = 1.05, 90% CI = [0.53, 1.51], BF = 12.1), producing a gesture for either path (e.g., moving finger forward) or manner (e.g., running fingers in place), or using two sequential gestures (one expressing path and one expressing manner) in the same sentence unit (e.g., running fingers in place, followed by moving the hand forward). The language-specific patterns in co-speech gesture were evident by 3–4 years (b = 1.53, 90% CI = [0.92, 2.15], BF = 49.2) and remained unchanged over developmental time (BFs = 2.93–100). We thus observed language-specific patterns in the packaging of motion components across speech and co-speech gesture starting at age 3–4 years.
Silent gesture
We next asked whether child English and Turkish speakers, when communicating without speech, displayed the same patterns that adult speakers of the two languages used in their silent gestures – that is, packaging patterns without any cross-linguistic differences (Fig. 4C). Child speakers of Turkish and English preferred conflated to separated responses in their silent gesture (English: b = −3.72, 90% CI = [−4.73, −2.80], BF > 100; Turkish: b = −2.99, 90% CI = [−3.69, −2.32], BF > 100); there was no cross-linguistic difference across groups in the strength of the preference for conflated packaging (b = 0.133, 90% CI = [−2.65, 0.53], BF = 0.028). Moreover, the preference for conflated responses emerged at an early age: 3- to 4-year-old children produced more conflated than separated responses in both the English (b = −2.79, 90% CI = [−4.30, −1.55], BF > 100) and Turkish (b = −2.97, 90% CI = [−3.71, −2.33], BF > 100) groups – a pattern that held within each age-group (all BFs < 1), with moderate to strong evidence of no difference in the conflated packaging preference between language groups (language × packaging interaction; b = 0.67, 90% CI = [−0.137, 1.45], BF = 0.13; see Table A.1 in the Appendix for means and standard errors for each packaging type by age and language in speech, co-speech gesture, and silent gesture).
Ordering semantic elements
Speech
Children learning English or Turkish differed in their ordering of semantic elements in speech (language × order interaction; b = −12.78, 90% CI = [−16.55, −9.74], BF > 100; Fig. 5A). They showed greater production of Figure–MOTION–Ground than Figure–Ground–MOTION order in English (b = 7.34, 90% CI = [5.00, 9.85], BF > 100), and greater production of (Figure)–Ground–MOTION than Figure–MOTION–Ground order in Turkish (b = −4.53, 90% CI = [−5.32, −3.78], BF > 100); parentheses around the Figure indicate that it was optional and not always produced. The language-specific ordering patterns in speech were evident at age 3–4 years (b = −11.00, 90% CI = [−15.28, −7.23], BF > 100) and remained unchanged over developmental time (all BFs > 100).
Co-speech gesture
Children speaking English or Turkish rarely concatenated gestures in sequences before age 5–6 years and, in fact, did not produce many concatenated gestures at any age, making it difficult to observe robust patterns. Indeed, the ordering of the children’s co-speech gestures from age 5–6 years onward showed no cross-linguistic differences (language × order interaction; b = −0.59, 90% CI = [−1.19, −0.01], BF = 0.14; Fig. 5B). Turkish speakers strongly preferred (Figure)–Ground–MOTION ordering in their co-speech gestures starting at age 5–6 years (b = −2.73, 90% CI = [−4.33, −1.39], BF = 67.3), a language-specific pattern also found in adult Turkish speakers. English speakers showed an anecdotal to moderate preference for this ordering as well (b = −1.67, 90% CI = [−2.91, −0.48], BF = 2.3). English speakers did, however, produce proportionally more (Figure)–MOTION–Ground orders (the English-specific pattern) than Turkish speakers. We thus observed language-specific ordering patterns in speech in both Turkish- and English-speaking children, and in co-speech gesture in Turkish-speaking children, throughout development.
Silent gesture
Next, we asked whether children’s gestures, when produced without speech, would show cross-linguistic similarities in the ordering of motion elements. We found that children learning Turkish or English did not show the differences we observed in children’s speech or co-speech gesture (see Fig. 5C). Instead, both language groups showed an overall preference for (Figure)–Ground–MOTION ordering in silent gesture (English: b = −2.79, 90% CI = [−4.25, −1.44], BF > 100; Turkish: b = −3.01, 90% CI = [−3.76, −2.34], BF > 100). This preference was evident in Turkish-speaking children by age 5–6 years (b = −4.17, 90% CI = [−5.88, −2.67], BF > 100) and in English-speaking children by age 7–8 years (b = −5.41, 90% CI = [−8.32, −3.08], BF > 100), remaining relatively stable thereafter. The children’s silent gestures did show a moderate cross-linguistic difference (language × order interaction; b = −0.53, 90% CI = [−0.78, −0.31], BF = 9.9), which stemmed from a slightly more pronounced preference for (Figure)–Ground–MOTION ordering – the typical Turkish pattern – in Turkish speakers than in English speakers (see Table A.1 in the Appendix for means and standard errors for each ordering type by age and language in speech, co-speech gesture, and silent gesture).
Discussion
Adults display systematic cross-linguistic differences in speech when they package and order the semantic elements of a motion event (Özçalışkan et al., 2016a, 2016b, 2018). These cross-linguistic differences also affect the organization of semantic elements in gesture, but only when those gestures are produced with speech (co-speech gesture), not when they are produced without speech (silent gesture). More specifically, adult speakers of Turkish and English package and order semantic elements of events differently, and in accordance with the language they speak, when describing those semantic elements in co-speech gesture. However, they package and order the same semantic elements similarly when describing them in silent gesture (Özçalışkan et al., 2016b, 2018; Tütüncü et al., 2023). Here, we found that children learning either Turkish or English display these adult patterns in co-speech gesture and silent gesture as early as ages 3–4 years.
Focusing first on co-speech gesture, we found that children learning English used more conflated gestures than children learning Turkish, who produced more separated gestures. Neither group produced many multiple-gesture combinations in their co-speech gestures, limiting our ability to draw strong conclusions about gesture ordering. But the children’s packaging patterns in co-speech gesture strongly mirrored their spoken language and were found in the youngest groups (ages 3–4), providing support for an early influence of language on thinking during the speaking act.
Turning next to silent gesture, we found that children learning either English or Turkish both conflated manner and path within a single gesture (the English pattern) and did so at age 3–4 years. Children learning either English or Turkish also followed the (Figure)–Ground–MOTION gesture order (the Turkish pattern) in their multi-gesture combinations, beginning around age 5 to 6 years. Our silent gesture results thus provide evidence for an early limit on the effect that language has on thinking, even during communication.
Note that the cross-linguistic differences we observed in speech and gesture were robust, even with a relatively modest sample. The magnitude of the Bayes factors (>100 for speech and co-speech gesture) indicates that the data were at least 100 times more likely under our alternative hypotheses than under the null hypotheses. The strength of this evidence, as indexed by Bayes factors, thus provides strong support for early emerging cross-linguistic differences in co-speech gesture, along with cross-linguistic similarities in silent gesture. We explore the implications of our developmental findings for co-speech gesture and silent gesture in the next two sections.
What co-speech gesture tells us about the effects of language on thought
Packaging manner and path
We found an early effect of language on co-speech gesture in how children package the manner and path semantic elements. Children learning Turkish or English followed their respective spoken languages’ packaging strategies (manner and path separated for Turkish, conflated for English) in their gestures at age 3 to 4 years. Language can thus influence the nonverbal representation of an event during speech production at an early age, providing support for Slobin’s (1996) thinking-for-speaking account and its early onset. Our findings on co-speech gesture also lend support to theories of gesture–speech integration. Under Kita and Özyürek’s (2003) interface theory, gesture and speech are assumed to arise from two separate systems (an action generator for gesture, a message generator for speech). Nevertheless, the two work in tandem from conceptualization to articulation to convey intended meanings, constituting an integrated system (McNeill, 1992). Our finding that language-specific patterns appear in 3-year-olds’ co-speech gestures indicates that gesture–speech integration begins early in development.
Co-speech gesture not only reveals the effects of language on thought; it can also shape those thoughts. Three- to four-year-old English-speaking children, when taught novel verbs accompanied by iconic gestures depicting manner, generalized significantly more verbs to novel events depicting the same or similar types of action than when the novel verbs were accompanied by gestures that did not convey manner (Aussems & Kita, 2021; see also Mumford & Kita, 2014). Observing co-speech gesture can thus change thought. Note, however, that the children in this earlier work were all English speakers, for whom manner constitutes a frequently expressed semantic component both in speech and in gesture. The fact that the majority of the co-speech gestures produced by the Turkish children in our study expressed only path information raises the possibility that the beneficial effects of the type of information conveyed in co-speech gesture might also vary by language – a possibility that needs to be explored in future research.
Ordering ground and motion
Our results for ordering in co-speech gesture are tentative because children rarely produced multiple semantic elements in co-speech gesture in either language. This pattern is consistent with the ‘one gesture per spoken clause’ preference observed in adult speakers (McNeill, 1992). The few strings children learning Turkish produced in co-speech gesture followed the ordering patterns in their speech ((Figure)–Ground–MOTION), but the co-speech gestures that children learning English produced did not mirror their speech. The limited number of strings children used in co-speech gesture in either language prevents us from drawing broad conclusions from these patterns.
Might the cross-linguistic differences in packaging that we observed in speech by age 3 to 4 years be evident in gesture even earlier, thus preceding and/or predicting upcoming changes in language-specific speech? Research examining children’s gesture–speech system at different language milestones (i.e., first words, first sentences, or first noun phrases) has found that children take their first step into a milestone in gesture alone or in gesture with speech, only later attaining the same milestone exclusively in speech (e.g., Cartmill et al., 2014; Iverson & Goldin-Meadow, 2005; Özçalışkan et al., 2017; Özçalışkan & Goldin-Meadow, 2005). However, this pattern is primarily found for deictic gestures (i.e., pointing at objects, which precedes producing nouns for the same objects). Deictic gestures emerge earlier in development than iconic gestures, the type of gesture we focus on here. Iconic gestures emerge around age 3 years, long after children have begun producing their first verbs conveying motion (Özçalışkan et al., 2014; Stites & Özçalışkan, 2017, 2021). The relatively late onset of iconic gestures makes it less likely that precursors of language-specific patterns in speech will be found in gesture. Future studies, however, can shed further light on this question by studying younger children using nonverbal tasks other than gesture (e.g., ordering pictures that depict ground, motion, or figure) as a way to test the effect of language on the nonverbal representation of events.
What silent gesture tells us about limits on the effects of language on thought
Packaging manner and path
In contrast to the differences in packaging found between Turkish and English learners’ co-speech gestures, both Turkish- and English-speaking children display a robust preference for conflating manner and path in their silent gestures. Conflation in silent gesture appeared in each group at age 3–4 years, and the preference remained unchanged over developmental time. Interestingly, even child speakers of English (who use conflation in their co-speech gestures) increased their use of conflation in their silent gestures.
What explains the early emergence of the conflated pattern in silent gesture, particularly in Turkish children, who followed a separated pattern almost exclusively in their co-speech gestures? The verbal expression of motion in Turkish requires that path be expressed in the main clause (gir ‘enter’), accompanied by manner in a subordinate clause (koşarak ‘by running’), resulting in two clauses. The two-clause requirement in speech might create a heavier cognitive load for Turkish speakers than for English speakers, who need to produce only one clause (run to). In fact, Turkish speakers – adult and child – frequently leave manner out of their motion descriptions and express only path (Özçalışkan, 2009, 2016). Unlike speech, gesture allows expression of both manner and path at the same time in a relatively easy-to-produce form. Perhaps this is why Turkish- and English-speaking children find it easy to adopt the conflated form in silent gesture and why Turkish- and English-speaking adults maintain the form in silent gesture.
The conflated form for manner and path is also found in the earliest stages of homesign, the gesture languages created by children who have no usable model for language. Homesigners are children whose hearing losses are so profound that they cannot make use of the spoken language input that surrounds them, and whose hearing parents have not exposed them to sign language. Despite this lack of linguistic input, the children create gesture systems that have many of the properties of natural language (Goldin-Meadow, 2003, 2023). Homesigners in both the United States and Turkey use the conflated form to convey manner and path. However, the homesigners in both countries also produce a form that is partially sequenced – a conflated manner+path gesture produced along with either a path gesture (e.g., CLIMB+UP – UP) or a manner gesture (e.g., CLIMB+UP – CLIMB) (Goldin-Meadow et al., 2015). This mixed form is the first step in segmenting and sequencing the path and manner semantic elements, a step that precedes the fully segmented form (CLIMB – UP) in language emergence (Senghas et al., 2013).
Ordering ground and motion
Our ordering results in silent gesture echoed our packaging results. We found the same ordering preference in both Turkish- and English-speaking children when describing events using only their hands – (Figure)–Ground–MOTION, the Turkish pattern. This preference emerged slightly later in English than in Turkish, possibly because English speakers had to switch their ordering preference from (Figure)–MOTION–Ground (SVO) in speech to (Figure)–Ground–MOTION (SOV) in silent gesture. The SOV ordering in silent gesture mirrors earlier work with adult speakers using their hands to describe motion events (Özçalışkan et al., 2018; Tütüncü et al., 2023) and to describe events in which an animate entity acts on an inanimate entity (Goldin-Meadow et al., 2008; Hall et al., 2013; Langus & Nespor, 2010; Meir et al., 2010; Schouwstra & de Swart, 2014).
Why do young speakers prefer to express ground before motion in their silent gestures? The three key components of a motion event are two entities – a figure and a ground – and a motion that specifies the relation between them. When describing events in silent gesture, it might be communicatively more informative (i.e., providing the most information with the fewest tools; Grice, 1975) and/or cognitively less burdensome (Gentner, 1982; Goldin-Meadow et al., 2008) to set up figure and ground as anchors before conveying the motion that relates the two – resulting in (Figure)–Ground–MOTION ordering.
Is the ordering of semantic elements in silent gesture unique to gesture? In other words, is the ordering unique to communication, or does it extend to other nonverbal representations of events? Earlier work (Gershkoff-Stowe & Goldin-Meadow, 2002; Goldin-Meadow et al., 2008) has found that adult native speakers of different languages order pictorial depictions of semantic elements as they would have ordered the elements in silent gesture, picking up the picture depicting the action or motion after picking up the pictures of the figure and the ground. Whether this default ordering is found in other non-communicative nonverbal behaviors is an important question for future research and can be explored by examining a broader range of cognitive tasks in children learning structurally different languages.
Interestingly, the ordering found in silent gesture – ground (or patient) preceding motion (or act) – is also found in the signs of homesigners aged 3 to 5 years in the United States (Gershkoff-Stowe & Goldin-Meadow, 2002; Goldin-Meadow, 2003), China (Goldin-Meadow & Mylander, 1998), and Turkey (Goldin-Meadow et al., 2015). Silent gesture thus appears to simulate the first step in creating a manual language (see Goldin-Meadow, 2015, for a review comparing packaging and ordering patterns in silent gesture to conventional sign languages and homesign systems).
In sum, we have found that at an early age, children learning languages that differ in how they organize motion events display language-specific patterns in co-speech gesture, but not in silent gesture. The close alignment between speech and gesture during communication highlights the integration of the two modalities and leads to cross-linguistic differences in co-speech gesture. These cross-linguistic differences reflect an early effect of language on thought during the act of speaking, observable in co-speech gesture. But the cross-linguistic similarities that we found in silent gesture suggest the possibility of a language of gesture that is not affected by the speaker’s language. This language of gesture appears early in development in speakers of two structurally distinct languages; it is also evident in homesign systems around the globe. As such, these gestures are likely to reflect a basic cognitive structure that is recruited for communicating about events when no conventional system is available.
Data availability statement
The video records for the study consist of identifiable data; we cannot provide access to these data based on the confidentiality agreement we signed with the participants and the regulations of the institutional review board for research at our institution. However, we posted anonymized quantitative summaries of our data along with coding manuals at the link: https://osf.io/fse6r/?view_only=a90669bcf98b4dcd8e4e24b3aef1d32b. The photographs of all the three-dimensional stimulus scenes are available upon request. We do not have any computational models associated with the analysis.
Acknowledgments
This research was supported by a grant from the March of Dimes Foundation (#12-FY08-160) to Özçalışkan and Goldin-Meadow. We thank Andrea Pollard, Christianne Ramdeen, and Burcu Sancar for help with data collection, transcription, and coding.
Competing interest
The authors declare none.
Appendix
Note for Table A.1 – Abbreviations: F-M-G, Figure–MOTION–Ground ordering; F-G-M, Figure–Ground–MOTION ordering; M, mean; SE, standard error.