1. Introduction
When people speak, they are likely to pause, repeat, or revise their message. This temporary disruption in speech is called disfluency (Maclay & Osgood, Reference Maclay and Osgood1959). Putting their role in communication aside (Corley & Stewart, Reference Corley and Stewart2008), speech disfluencies are likely to stem from increased cognitive load and word retrieval issues (Bortfeld, Leon, Bloom, Schober & Brennan, Reference Bortfeld, Leon, Bloom, Schober and Brennan2001). People are more disfluent when they produce longer words (Oviatt, Reference Oviatt1995) or sentences (Shriberg, Reference Shriberg1996). Similarly, disfluent segments in speech are more likely to be observed at sentence beginnings as a result of high planning load (Shriberg, Reference Shriberg1996). Being disfluent is a part of healthy speech production, yet the likelihood of being disfluent might differ across populations and individuals. Although age has been considered an important variable while studying disfluency (e.g., Arslan & Göksun, Reference Arslan and Göksun2022; Bortfeld et al., Reference Bortfeld, Leon, Bloom, Schober and Brennan2001; Cooper, Reference Cooper1990), bilingualism has received relatively little attention in research on disfluent speech, especially from a developmental perspective (e.g., Brundage & Rowe, Reference Brundage and Rowe2018; Dumont, Reference Dumont2010). Previous research on bilingual children's language development suggests that bilinguals may reach language competency later than monolinguals (Bialystok, Luk, Peets & Yang, Reference Bialystok, Luk, Peets and Yang2010; Oller, Pearson & Cobo-Lewis, Reference Oller, Pearson and Cobo-Lewis2007). Bilingual children are comparable with monolingual children in receptive vocabulary, yet they are more likely to experience difficulties in producing the target words (Yan & Nicoladis, Reference Yan and Nicoladis2009).
Disfluent segments in speech might reflect individuals’ speech planning process and communication strategies (Fraundorf & Watson, Reference Fraundorf and Watson2014). Focusing on whether bilingual and monolingual children's disfluency rates and patterns differ might provide insight into the language development of these groups, particularly for language production.
Individuals’ speech can be accompanied by spontaneous co-speech gestures (McNeill, Reference McNeill1992). As a nonverbal form of language, gesture interacts with the speech production mechanism (Kita & Özyürek, Reference Kita and Özyürek2003). Gestures might help speech production by facilitating lexical retrieval (Krauss, Chen & Gottesman, Reference Krauss, Chen, Gottesman and McNeill2000) and conceptualizing information (Kita, Alibali & Chu, Reference Kita, Alibali and Chu2017). When individuals use gestures, they are more fluent compared to when their gesture use is experimentally restricted (Morsella & Krauss, Reference Morsella and Krauss2004; Rauscher, Krauss & Chen, Reference Rauscher, Krauss and Chen1996). The gesture-for-conceptualization hypothesis suggests that gestures have self-oriented functions. In particular, using gestures can reduce cognitive load by enabling gesturers to activate, manipulate, package, and explore information units (Kita et al., Reference Kita, Alibali and Chu2017). In general, regardless of age, bilinguals gesture more frequently in either of their languages compared to monolinguals speaking those languages (e.g., Nicoladis, Pika & Marentette, Reference Nicoladis, Pika and Marentette2009; Pika, Nicoladis & Marentette, Reference Pika, Nicoladis and Marentette2006; So, Reference So2010). Through the lens of gestures’ self-oriented functions, gesturing might enable bilingual children to deal with the cognitive load of having two separate language systems by helping them to reach the correct lexicon from the target language (Nicoladis et al., Reference Nicoladis, Pika and Marentette2009).
Considering the link between cognitive load and disfluency rates in speech, bilingual children are more disfluent than their monolingual peers due to the load of managing two vocabulary systems (Yan & Nicoladis, Reference Yan and Nicoladis2009). Yet, gestures might come into play to decrease the cognitive load in language production, particularly for bilinguals in the form of frequent gesture use. From a developmental perspective, gestures help children complement and build on their verbal messages, playing a role in language acquisition (Iverson & Goldin-Meadow, Reference Iverson and Goldin-Meadow2005). Therefore, we focus on bilingual and monolingual children to understand the interplay within and between gesture and speech modalities, which would shed light on individual and group differences in the course of language development. This study asks whether (1) disfluency and gesture rates differ across 5- and 7-year-old monolinguals and bilinguals, (2) bilinguals reveal similar disfluency and gesture rates across their two languages, and (3) children's gesture use is associated with their speech fluency in narrative production.
1.1. Speech disfluency
The main classification that is proposed by Maclay and Osgood (Reference Maclay and Osgood1959) has four disfluency types: (1) Silent pauses are temporary silent moments within or between phrases, (2) Filled pauses are breaks in one's speech, filled with nonword sounds (e.g., um), (3) Repetition means repeating sentence units, such as syllables or words, or sentences themselves (e.g., the child child woke up), (4) Repairs are revisions in speech that refer to replacing already uttered words or grammatical structures with their more plausible alternatives (e.g., the child – the dog barked). It is important to note that disfluency categories might slightly differ in the literature (e.g., Brundage & Rowe, Reference Brundage and Rowe2018; Graziano & Gullberg, Reference Graziano and Gullberg2018; for the stuttering population, see Yaruss, Reference Yaruss1997; Yaruss, Newman & Flora, Reference Yaruss, Newman and Flora1999). Yet, the classification of Maclay and Osgood (Reference Maclay and Osgood1959) encompasses most of the disfluency classifications, which are frequently used while studying typical populations (e.g., Arslan & Göksun, Reference Arslan and Göksun2022; Avcı, Arslan & Göksun, Reference Avcı, Arslan, Göksun, Culbertson, Perfors, Rabagliati and Ramenzoni2022; Fraundorf & Watson, Reference Fraundorf and Watson2014).
Disfluency from a developmental perspective
Infants start producing simple speech sounds at around 6 months and can construct one- or two-word phrases by 18 months (Parish-Morris, Hirsh-Pasek & Golinkoff, Reference Parish-Morris, Hirsh-Pasek, Golinkoff and Zelazo2013). Earlier studies on disfluency targeted children as young as 2 years old through interviews (Yairi, Reference Yairi1981, Reference Yairi1982), suggesting individual differences in their disfluent speech. Additionally, children's disfluency patterns change slightly as they move from age two to three. More specifically, repetitions are more likely to occur at the phrase level than at the word level while the proportion of revisions in total disfluency also increases. Similarly, DeJoy and Gregory (Reference DeJoy and Gregory1985) obtained speech samples from 3.5- and 5-year-old children using a picture story book. They found that, although the two age groups did not differ in terms of their total disfluency rates, they varied in using certain disfluency types. More specifically, 3.5-year-olds’ disfluent speech was dominated by repetitions and incomplete phrases while 5-year-old children's disfluencies occurred in the form of grammatical pauses (e.g., silent pauses). These findings are in line with children's increasing syntactic complexity due to their growing vocabulary and grammar knowledge (Haynes & Hood, Reference Haynes and Hood1977, 1978). Moreover, using filled pauses over repetitions in general conversation increases with age (Haynes & Hood, Reference Haynes and Hood1977), which can be interpreted as filled pauses being more socially acceptable than repeating words in daily communication. Thus, children's disfluency patterns vary greatly and are prone to change rapidly from year to year as their vocabulary and grammar knowledge grow.
In sum, yearly or even monthly changes in disfluency rates and patterns until the first years of primary school might be informative in understanding language development, particularly language production (Nettelbladt & Hansson, Reference Nettelbladt and Hansson1999). Only a few, earlier studies have targeted speech disfluency among typically developing nonstuttering children (e.g., DeJoy & Gregory, Reference DeJoy and Gregory1985; Haynes & Hood, Reference Haynes and Hood1977, Reference Haynes and Hood1978; Yairi, Reference Yairi1981, Reference Yairi1982). Moreover, these studies have not differentiated between monolingual and bilingual children, a comparison that might provide greater insight into the link between speech disfluency and language development.
Speech disfluency in bilingual children
Since bilingual children simultaneously master two languages, they might have more trouble in language production compared to their monolingual peers (Bialystok, Reference Bialystok2009), which might be reflected in the form of disfluency. Bilinguals are also sensitive to communication (Gampe, Wermelinger & Daum, Reference Gampe, Wermelinger and Daum2019). Their use of disfluencies might reflect not only their language development but also communication strategies. Therefore, comparing bilingual children with their monolingual peers would give further insight into the nature of speech disfluencies. Although some studies focus on bilingualism and second language (L2) in the case of stuttering children (for a review, see Van Borsel, Maes & Foulon, Reference Van Borsel, Maes and Foulon2001), there is relatively less focus on typically developing young children's disfluency patterns with mixed findings. Some of these studies that recruit typically developing children either have a small sample size (e.g., Bedore, Fiestas, Peña & Nagy, Reference Bedore, Fiestas, Peña and Nagy2006; Byrd, Bedore & Ramos, Reference Byrd, Bedore and Ramos2015; Lee, Sim & Shin, Reference Lee, Sim and Shin2007) or a wide age range with a limited sample size (e.g., Carias & Ingram, Reference Carias and Ingram2006).
Developmental studies on the relationship between bilingualism and speech disfluency have mainly been conducted with Spanish–English bilingual children. For instance, Bedore et al. (Reference Bedore, Fiestas, Peña and Nagy2006) obtained narrative retellings from L1-Spanish–L2-English bilinguals (N = 22) as well as Spanish (N = 22) and English monolinguals (N = 22) between 4 and 6 years of age. They found that, in both languages, bilinguals were more disfluent than monolinguals. Additionally, grammatical repairs were common while speaking Spanish for both bilinguals and monolinguals. Those grammatical repairs, however, were not common for bilinguals and monolinguals while speaking English. This finding was attributed to phonological and grammatical differences between the two languages. Moreover, using a narrative task, Byrd et al. (Reference Byrd, Bedore and Ramos2015) found that L1-Spanish–L2-English bilingual children between 5 and 7 years of age were more disfluent when their disfluency rates were compared with those of monolinguals reported in the literature.
In contrast to the above findings, Brundage and Rowe (Reference Brundage and Rowe2018) found that 30-month-old L1-Spanish–L2-English bilinguals (N = 53) revealed lower disfluency rates compared to monolingual disfluency rates reported in the literature. This study, however, did not have an age-matched monolingual control group. Moreover, those monolingual disfluency rates obtained from the literature as a comparison were not representative for 30-month-olds. The authors also suggested that this finding might be the result of using a spontaneous speech task instead of a narrative task, which might be cognitively more demanding (see Bedore et al., Reference Bedore, Fiestas, Peña and Nagy2006; Byrd et al., Reference Byrd, Bedore and Ramos2015).
Furthermore, Fiestas, Bedore, Peña, Nagy, Cohen, and McAlister (Reference Fiestas, Bedore, Peña, Nagy, Cohen and McAlister2005) obtained narrative retellings from L1-Spanish–L2-English bilingual as well as Spanish monolingual and English monolingual children between 4 and 7 years of age. They found a trend for bilinguals to be slightly more disfluent. Compared to monolingual children, the number of repetitions was higher in the bilingual group. The study also asked whether bilinguals revealed different disfluency rates and patterns in their L1 and L2. Results showed that the proportion of disfluent segments in children's speech was comparable across their two languages. However, in L1-Spanish, bilingual children commonly used grammatical repairs, indicating a high awareness of grammatical rules as a result of being more frequently exposed to Spanish. When they retold the story using L2-English, in which they were less proficient, disfluencies mostly occurred in the form of word-finding failures. These findings suggest that language proficiency might be crucial in understanding disfluency in a bilingual context. Children's knowledge of grammar and vocabulary might be linked to disfluency rates and patterns.
In sum, there are limited studies in the literature that target bilingualism in relation to disfluency in children. Not only disfluency rates but also disfluency patterns might differ between monolingual and bilingual children. However, speech disfluency alone might not be enough to portray the difference between the two groups in using language. Language is multimodal: children also produce gestures while speaking. Considering gestures’ self-oriented functions (Kita et al., Reference Kita, Alibali and Chu2017), examining gesture production among bilingual and monolingual children might shed light on gestures’ role in decreasing cognitive load and enhancing speech fluency.
1.2. Gesture
McNeill (Reference McNeill1992) classified gestures into five categories as iconic, metaphoric, deictic, beat, and emblem gestures. Iconic (e.g., marking a space in the air with two hands to refer to the size of an object) and metaphoric gestures (e.g., moving one hand forward while referring to the future) depict shapes and relations in space regarding concrete and abstract concepts, respectively. A deictic gesture is pointing at something with fingers or hands. Beat gestures, on the other hand, are rhythmic hand movements that lack propositional content. Last, emblems (e.g., waving hands to say bye) are gestures that are understood without using their lexical affiliates. The current gesture frameworks mainly focus on iconic and metaphoric gestures by using the umbrella term representational gesture (Hostetter & Alibali, Reference Hostetter and Alibali2019; Kita et al., Reference Kita, Alibali and Chu2017; Krauss et al., Reference Krauss, Chen, Gottesman and McNeill2000).
Gesture production from a developmental perspective
Children start producing gestures even before they utter their first words (Iverson & Goldin-Meadow, Reference Iverson and Goldin-Meadow2005). These gestures are generally simple in form such as pointing at objects (Bates, Benigni, Bretherton, Camaioni & Volterra, Reference Bates, Benigni, Bretherton, Camaioni and Volterra1979) and help children produce sentence level messages later when they are combined with words (e.g., Iverson & Goldin-Meadow, Reference Iverson and Goldin-Meadow2005; Özçalışkan & Goldin-Meadow, Reference Özçalışkan and Goldin-Meadow2005). Gesture-word combinations are likely to predict the onset of children's two-word combinations (e.g., Iverson, Capirci, Volterra & Goldin-Meadow, Reference Iverson, Capirci, Volterra and Goldin-Meadow2008; Iverson & Goldin-Meadow, Reference Iverson and Goldin-Meadow2005; Özçalışkan & Goldin-Meadow, Reference Özçalışkan and Goldin-Meadow2005). Therefore, one could consider early pointing gestures as being a precursor to children's language development, although these gestures might reflect weak language proficiency by the time they are produced (Iverson & Goldin-Meadow, Reference Iverson and Goldin-Meadow2005). On the other hand, iconic and beat gestures are relatively more complex compared to deictics, and they appear mostly when children become more proficient in language (Goldin-Meadow, Reference Goldin-Meadow1998; Nicoladis, Mayberry & Genesee, Reference Nicoladis, Mayberry and Genesee1999). As children start constructing more complex ideas, deictic gestures mostly get replaced by iconic gestures (Mayberry & Nicoladis, Reference Mayberry and Nicoladis2000; Nicoladis et al., Reference Nicoladis, Mayberry and Genesee1999) that are intended to convey complex messages (McNeill, Reference McNeill1992). 
Similarly, as beats are associated with temporal word relations and stress patterns in speech (McClave, Reference McClave1994), at least sentence level information and multimorphemic utterances should be at stake for the production of beat gestures (Nicoladis et al., Reference Nicoladis, Mayberry and Genesee1999). These findings together suggest that gesture development is closely linked with language development (Bates et al., Reference Bates, Benigni, Bretherton, Camaioni and Volterra1979; Goldin-Meadow, Reference Goldin-Meadow1998; Mayberry & Nicoladis, Reference Mayberry and Nicoladis2000).
Gesture production in bilingual children
Bilinguals gesture more frequently in both of their languages compared to monolinguals speaking those languages (e.g., Nicoladis et al., Reference Nicoladis, Pika and Marentette2009; Pika et al., Reference Pika, Nicoladis and Marentette2006). Such a pattern might stem from bilinguals’ controlling of two language systems. Gesture and speech are closely associated mechanisms, and gestures comply with the verbal packaging system in speech (Kita & Özyürek, Reference Kita and Özyürek2003). Considering that bilinguals select among competing representations as a result of having two language systems, and thus, two verbal packaging styles, gestures might facilitate the process by helping individuals to opt for the most appropriate packaging method (Nicoladis, Reference Nicoladis2007; Nicoladis et al., Reference Nicoladis, Pika and Marentette2009). This situation might reflect gestures’ self-oriented functions as suggested by the gesture-for-conceptualization hypothesis (Kita et al., Reference Kita, Alibali and Chu2017). Additionally, bilinguals’ frequent use of gestures might also stem from speaking a high gesture frequency language (So, Reference So2010). That is, gesture frequency observed in a high gesture frequency language (e.g., Spanish, French) might be transferred to the low gesture frequency language (e.g., English), called gesture transfer (Pika et al., Reference Pika, Nicoladis and Marentette2006).
Findings from developmental research are in line with the former argument. For instance, Nicoladis et al. (Reference Nicoladis, Pika and Marentette2009) compared French–English bilingual preschoolers (between age 4 and 6) with their French and English monolingual counterparts. Narrative retellings of both language groups indicated that bilingual children gestured more frequently than monolingual children both in French and English. However, French and English monolinguals gestured at comparable rates, suggesting that frequent gesture production might be a result of bilinguals’ challenge to choose between the competing vocabulary systems.
Understanding the use of specific gesture types might provide further insight into bilingual children's language development. Research has suggested that the use of iconic and beat gestures goes hand in hand with language development while deictics or emblems are considered prelinguistic gestures, appearing quite early in language development (Iverson & Goldin-Meadow, Reference Iverson and Goldin-Meadow2005; Nicoladis et al., Reference Nicoladis, Mayberry and Genesee1999). Research on bilinguals’ gesture production has indicated similar findings. In either of their languages, bilingual children start using iconic and beat gestures by the time they can construct sentence-like utterances (Mayberry & Nicoladis, Reference Mayberry and Nicoladis2000; Nicoladis et al., Reference Nicoladis, Mayberry and Genesee1999). In line with these findings, studies have indicated that bilinguals use fewer iconic gestures in the language they are far less proficient in (Gullberg, Reference Gullberg1998). Therefore, one could suggest that iconic gestures do not compensate for weak language skills (Gullberg, Reference Gullberg, Bhatia and Ritchie2013; Nicoladis, Reference Nicoladis2007).
Considering that the language one speaks influences the way gestures are produced (Kita & Özyürek, Reference Kita and Özyürek2003), it is not surprising that the development of iconic gestures is closely linked to language development. From such a perspective, iconic gestures can be interpreted as an advanced gesture form – its final shape and referent being determined by linguistic factors. Therefore, it would be plausible to expect more iconic gestures where one's language skills are advanced (e.g., the dominant language of bilinguals, see Nicoladis, Reference Nicoladis2002). On the other hand, bilinguals produce more deictic gestures in their weak language compared to monolinguals of that language (Azar, Backus & Özyürek, Reference Azar, Backus and Özyürek2020; Gullberg, Reference Gullberg, Bhatia and Ritchie2013). The critical point here is that most of those studies were carried out by using story retelling tasks. In such narrative contexts, deictic gestures would be abstract deictics as they do not directly point at concrete objects (Nicoladis, Reference Nicoladis2007). Azar et al. (Reference Azar, Backus and Özyürek2020) found that Turkish–Dutch bilinguals used more deictic gestures in either of their languages compared to monolinguals, suggesting that those (abstract) deictics might be a part of the bilingual strategy that uses space to organize notions and helps speakers to deal with cognitive load.
Gesture and speech disfluency
Using gestures might decrease cognitive load by enabling speakers to organize information units (Kita et al., Reference Kita, Alibali and Chu2017). People are more likely to gesture when they engage in cognitively demanding tasks (Kita & Davies, Reference Kita and Davies2009; Melinger & Kita, Reference Melinger and Kita2007; Morsella & Krauss, Reference Morsella and Krauss2004) that are particularly spatial in nature (Alibali, Reference Alibali2005; Arslan & Göksun, Reference Arslan and Göksun2021). Moreover, individuals are more fluent when they spontaneously gesture compared to when their hand use is experimentally restricted (Morsella & Krauss, Reference Morsella and Krauss2004; Rauscher et al., Reference Rauscher, Krauss and Chen1996). Thus, gesturing might pave the way for more fluent speech by decreasing the cognitive load of the speech production process. People might therefore gesture frequently when speech is difficult.
Individuals might differ in terms of benefiting from using gestures to facilitate cognitive processes, including speech production. Since disfluent segments in one's speech might reflect individuals’ speech planning process (Fraundorf & Watson, Reference Fraundorf and Watson2014), studying disfluencies of different groups in a multimodal context might provide insight into how cognitive load and communicative strategies interact (Arslan & Göksun, Reference Arslan and Göksun2022). In the case of bilingual children, the frequent use of gestures might reduce the cognitive load of having two separate language systems by helping them to reach the correct lexicon from the target language (Nicoladis et al., Reference Nicoladis, Pika and Marentette2009). Observing the relationship between gesture production and speech fluency in bilingual and monolingual children would give insight into the facilitative roles of gestures in the speech production process.
1.3. Present Study
The aim of this study is to examine speech disfluency and gesture production of 5- and 7-year-old typically developing Turkish monolingual and Turkish–English bilingual children. We examine monolinguals’ Turkish language samples and bilinguals’ Turkish and English language samples. Focusing on two typologically different languages, English and Turkish, would bring diversity to the existing literature that is mainly built on the disfluency rates and patterns of Spanish–English bilingual children. Moreover, gesturing frequency and the use of specific gesture types revealed by bilingual and monolingual children might inform the literature about the language development of these groups. The association of speech disfluency with gesture in these groups might shed light on gestures’ facilitative roles in speech.
We hypothesize that, compared to Turkish monolingual children, bilinguals would be more disfluent in L1-Turkish, as the challenge of being bilingual is to master and control two languages instead of one. To eliminate a possible bias, we aim to control for children's syntactic (grammatical) complexity scores. This is because producing complex sentences is cognitively more demanding than producing simple sentences, resulting in more errors and higher disfluency rates in children's speech (Gordon, Luper & Peterson, Reference Gordon, Luper and Peterson1986; Haynes & Hood, Reference Haynes and Hood1978; Ratner & Sih, Reference Ratner and Sih1987; Yaruss et al., Reference Yaruss, Newman and Flora1999). Considering that the use of pauses is linked to speech planning (Fraundorf & Watson, Reference Fraundorf and Watson2014), bilingual children might be more likely to use silent pauses and filled pauses to make sure that they stick to the vocabulary of the target language. The use of repetitions and repairs, however, might be comparable between bilingual and monolingual groups.
In line with the previous literature (Nicoladis et al., Reference Nicoladis, Pika and Marentette2009; Pika et al., Reference Pika, Nicoladis and Marentette2006), we hypothesize that bilingual children would gesture more frequently than their monolingual peers. Similarly, since bilinguals’ challenge is to master two languages instead of one, they might slightly lag behind their monolingual peers in language development (Bialystok, Reference Bialystok2009) and iconic gestures are linked to spoken language skills (Nicoladis et al., Reference Nicoladis, Mayberry and Genesee1999). Therefore, we expect monolingual Turkish children to use more iconic gestures than the same age bilinguals while speaking Turkish. Conversely, compared to Turkish monolinguals, bilingual children might be more likely to produce deictic gestures in Turkish, which might be a part of their bilingual strategy (Azar et al., Reference Azar, Backus and Özyürek2020). Last, we expect Turkish–English bilingual children of both age groups to reveal similar overall gesture frequency across their two languages as they are proficient in both. We also explore whether bilingual children differ in terms of the use of specific gesture types in L1-Turkish and L2-English.
Moreover, we hypothesize that 5- and 7-year-old children would reveal comparable disfluency rates in L1-Turkish and in L2-English. However, we expect disfluency types to differ between 5- and 7-year-old children due to increasing language competency and starting primary school. Compared to 7-year-olds, we expect 5-year-old children to produce more repetitions, which are developmentally less advanced than other disfluency forms (e.g., filled pauses, repairs) (Bortfeld et al., Reference Bortfeld, Leon, Bloom, Schober and Brennan2001). In contrast, 7-year-old children would produce more filled pauses than 5-year-olds, reflecting their planning processes (Fraundorf & Watson, Reference Fraundorf and Watson2014) to express more complex ideas boosted by primary school education. As iconic and beat gestures are closely associated with language development (Nicoladis et al., Reference Nicoladis, Mayberry and Genesee1999), we expect 7-year-old children to produce more iconic and beat gestures compared to 5-year-olds, regardless of being bilingual or monolingual.
Last, as gestures help one to think and speak (Kita et al., Reference Kita, Alibali and Chu2017), we expect that gesture frequency in Turkish, particularly iconic gesture use, would be negatively associated with speech disfluency rates, regardless of age group (5- and 7-year-old) and language group (bilingual and monolingual) by also controlling for participants’ syntactic complexity and language proficiency scores in Turkish. Similarly, regardless of age, we expect bilinguals’ iconic gesture frequency in L2-English to negatively predict their speech disfluency rate in this language, controlling for their syntactic complexity and L2-English proficiency.
2. Methods
2.1. Participants
Fifty-one Turkish–English bilingual (25 females) and 61 Turkish monolingual (31 females) children participated in the study as a part of a larger project that investigated the effects of early and intense exposure to L2-English on L1-Turkish narrative skills and motion event conceptualization (see Aktan-Erciyes, Reference Aktan-Erciyes2019). For the original study from which the data came, the G*Power tool (Version 3.0) (Faul, Erdfelder, Lang & Buchner, Reference Faul, Erdfelder, Lang and Buchner2007) was used to estimate sample size with .80 power and .30 effect size. The calculated sample size was 118, and data were collected from 112 children. We also performed post-hoc sensitivity analyses for the present study, again using the G*Power software package (Faul et al., Reference Faul, Erdfelder, Lang and Buchner2007), to determine the detectable effect size. The sample size of the present study (N = 112) was used in the sensitivity analysis for the regression analyses. The alpha level was set to .05 and power to .80. The sensitivity analysis revealed an effect size of 0.13, which is around the same level as the one we obtained in our regression analyses (R² = 0.12).
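To make the comparison between the sensitivity-analysis effect size and the observed R² concrete, the short sketch below converts between the two metrics. It assumes that the reported effect size of 0.13 is Cohen's f², the metric G*Power uses for linear regression; the text does not name the metric, so this is an assumption, and the function names are illustrative only.

```python
def r2_to_f2(r_squared: float) -> float:
    """Convert a regression R^2 to Cohen's f^2 = R^2 / (1 - R^2)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return r_squared / (1.0 - r_squared)


def f2_to_r2(f_squared: float) -> float:
    """Convert Cohen's f^2 back to R^2 = f^2 / (1 + f^2)."""
    if f_squared < 0.0:
        raise ValueError("f^2 must be non-negative")
    return f_squared / (1.0 + f_squared)


# Values taken from the paragraph above; treating 0.13 as f^2 is an assumption.
observed_f2 = r2_to_f2(0.12)    # f^2 implied by the observed R^2 of 0.12
detectable_r2 = f2_to_r2(0.13)  # R^2 implied by a detectable f^2 of 0.13

print(round(observed_f2, 3))    # 0.136
print(round(detectable_r2, 3))  # 0.115
```

Under this assumption, the observed R² of 0.12 corresponds to an f² of about 0.136, slightly above the detectable value of 0.13, which is consistent with the statement that the two are at around the same level.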
We recruited 5-year-old bilinguals (M = 68.95 months, SD = 4.02, N = 26) and monolinguals (M = 69.24 months, SD = 3.40, N = 28) as well as 7-year-old bilinguals (M = 92.37 months, SD = 3.50, N = 25) and monolinguals (M = 89.98 months, SD = 4.62, N = 33). Bilingual children's L1 was Turkish and L2 was English, who were first exposed to English around age three or earlier. The motivation of recruiting 5-year-olds was that the bilinguals of this age had been exposed to early intense exposure to L2-English before starting L1-Turkish dominant primary school. The reason behind recruiting 7-year-olds was that the bilinguals of this age had been exposed to full-time L2-English instruction at the preschool and then started L1-Turkish dominant primary school at the age of 6. As a result, the study could also investigate the effects of this shift.
Bilingual participants were recruited from eligible private schools for intense L2 exposure, and monolingual participants were recruited from SES comparable private schools. Our bilingual participants were enrolled in a preschool in which native English speaker teachers and staff communicated with children only in English (8 hours/day). All bilingual children were exposed to American English for instruction. By the time we tested 5-year-old bilinguals, they were already exposed to English in this preschool for three years. On the other hand, the 7-year-old bilingual group included children who had completed three years of education in the same English immersion preschool and continued their primary school education in Turkish for two years. By the time we tested 7-year-old bilinguals, they were about to finish the second grade. Although 7-year-old bilinguals started the L1-Turkish dominant curriculum at the age of 6, they still received English instructions for 10 hours per week and gained L2-English literacy, which further supported their L2-English.
We asked parents to fill out a demographic form to better understand children's language background. For both age groups, parents reported that the use of L2 was not limited to the school context; children were also exposed to L2 at home from time to time. Children were not exposed to any language other than Turkish and English. In the demographic form, parents were also asked to indicate their income level (in monetary terms) on a 5-point Likert scale and to give information about their education level on an ordinal scale (1 to 6; from primary school to Ph.D. degree). Both bilingual and monolingual children were raised by Turkish native speaker parents who identified themselves as upper-middle to high socioeconomic status. The mothers of bilingual and monolingual children did not differ in terms of their education levels, t(110)=−1.64, p = .103, as most of the mothers from the two groups had completed at least an undergraduate degree. It is important to note that only mothers responded to the demographic forms, which is why only maternal education level was taken into account. Yet, research suggests a high correlation between maternal and paternal education (Jeong, Kim & Subramanian, Reference Jeong, Kim and Subramanian2018).
The project was reviewed and approved by Boğaziçi University Ethics Committee Board with acceptance number 2016/22. Informed consent was obtained from the parents prior to data collection. Parents were given their children's vocabulary scores and children were given stickers as compensation.
2.2. Materials
The Frog Story (Mayer, Reference Mayer1969), a wordless book depicting the story through a series of pictures, was used to elicit narrative retellings from the participants. A video camera was used to record the sessions. We used ELAN software (Version 6.2) (Lausberg & Sloetjes, Reference Lausberg and Sloetjes2009) to transcribe speech and code speech disfluency and gesture.
To measure vocabulary knowledge in Turkish, we used the Turkish Expressive and Receptive Language Test (TIFALDI)–Receptive subtest (Berument & Güven, Reference Berument and Güven2010). As a norm referenced test, TIFALDI includes 159 items and targets children between 2 and 12 years old. In each trial, a participant is presented with a word along with four pictures and asked to indicate the picture that best corresponds to the given word. A total score is calculated based on the number of correct answers.
To examine vocabulary knowledge in English, the Peabody Picture Vocabulary Test (PPVT-4) (Dunn & Dunn, Reference Dunn and Dunn2015) was used. As a norm referenced test, PPVT 4 includes 228 items and aims to measure individuals’ receptive vocabulary in English. In each trial, a participant is presented with a word along with four pictures and asked to indicate the picture that best corresponds to the given word. As in TIFALDI, a total score is calculated based on the number of correct answers.
2.3. Procedure
Each child participated in a session where they were alone with the experimenter in a silent room of their school. First, children were presented with The Frog Story book. They were asked to examine the pictures while the experimenter was turning the pages one by one. While turning the pages, the experimenter did not comment or tell the story. After children examined all pictures, the experimenter asked them to retell the story by looking at the pages. The experimenter sat in front of the participants and turned the pages one by one while children were retelling the story. For bilinguals, another session with the same task was carried out in English. The English session took place one week after the Turkish session, and it was carried out by a different experimenter who ran all sessions in English. The same standardized instructions were used both in Turkish and English sessions to prevent possible confounding factors. All sessions were videotaped for further transcription and coding. Additionally, all children completed the TIFALDI receptive vocabulary subtest by the end of the first session, which was held in Turkish. Bilingual children were also asked to complete PPVT-4 by the end of the English session.
2.4. Coding
Speech and disfluency
Speech samples obtained from children were transcribed including all disfluent segments. The Turkish sessions were transcribed by Turkish native speakers while the English sessions were transcribed by a native English speaker. Four types of speech disfluency as indicated by the classification of Maclay and Osgood (Reference Maclay and Osgood1959) were targeted. Silent pauses (where children were silent for more than 200 milliseconds within their sentences), filled pauses (e.g., Eee kurbağa kaçtı “Um the frog escaped.”), repetitions (e.g., Köpek köpek çocuğu takip etti “The dog dog followed the boy.”), and repairs (e.g., Köpek kurbağa- arılardan kaçıyordu “The dog was running away from the frog- the bees.”) were coded. For the Turkish sessions, a trained assistant coded all participants’ disfluencies. For reliability, another trained assistant coded 38% of participants. The interrater agreement was high between the two coders (κ=.92, p < .001). Similarly, for the English sessions, a trained research assistant coded all participants’ disfluencies while a second assistant coded 33% of participants for reliability, yielding a high interrater agreement (κ=.90, p < .001). For each participant, we counted the total number of words produced in each session. All disfluency rates were calculated per word.
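The per-word rate calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the category labels, the function name `disfluency_rates`, and the representation of the coding output as a flat list of labels are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical labels for the four coded disfluency types.
DISFLUENCY_TYPES = ("silent_pause", "filled_pause", "repetition", "repair")


def disfluency_rates(coded_events, total_words):
    """Return per-word rates for each disfluency type and the overall rate.

    coded_events: list of labels, one per coded disfluency (assumed format).
    total_words: total number of words the child produced in the session.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    # Count only events that belong to one of the four target categories.
    counts = Counter(e for e in coded_events if e in DISFLUENCY_TYPES)
    rates = {t: counts[t] / total_words for t in DISFLUENCY_TYPES}
    rates["overall"] = sum(counts.values()) / total_words
    return rates


# Illustrative (made-up) session: 5 coded disfluencies in a 50-word narrative.
example = disfluency_rates(
    ["silent_pause", "silent_pause", "silent_pause", "filled_pause", "repair"],
    total_words=50,
)
```

Dividing by the number of words keeps the rates comparable across children who produced narratives of different lengths.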
Gesture
In line with the classification of McNeill (Reference McNeill1992), iconic, metaphoric, deictic, beat and emblem gestures were coded. As children did not produce metaphoric gestures at all, we mainly focused on iconic gestures as representational gestures. For the Turkish sessions, a trained assistant coded all participants’ gestures. For reliability, a second assistant coded 38% of participants. The interrater agreement was found to be high between the two coders (κ=.88, p < .001). Similarly, for the English sessions, a trained research assistant coded all participants’ gestures while another trained assistant coded 33% of participants for reliability, yielding a high interrater agreement (κ=.87, p < .001). All gesture frequencies were calculated per word.
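For readers unfamiliar with the agreement statistic reported above, Cohen's kappa can be computed as in the sketch below. It assumes that both coders labeled the same sequence of units with one category each; the unit of agreement and the function name `cohens_kappa` are assumptions for illustration, since the text does not describe how the reliability data were structured.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of units")
    n = len(coder_a)
    # Observed agreement: proportion of units given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Made-up labels for four gestures coded by two assistants.
a = ["iconic", "iconic", "deictic", "beat"]
b = ["iconic", "deictic", "deictic", "beat"]
kappa = cohens_kappa(a, b)
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is preferred when some categories are much more frequent than others.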
Syntactic complexity
For each participant, we transcribed Turkish and English speech samples in Microsoft Excel files. We used the coding schema of Berman and Slobin (Reference Berman and Slobin1994) by parsing discourse into verbed clauses, “…expressing a single situation (activity, event, or state)” (Berman & Slobin, Reference Berman and Slobin1994, p.660) and placing them sequentially per line. A clause consists of at least one predicate. If a clause included a single predicate, it was coded as a simple clause (e.g., Bir çocuk vardı. “There was a boy.”). If, under a single clause, two or more predicates were linked with conjunctions (e.g., and, or, but), adverbials (e.g., while, when), relative clauses, reported speech, or if-then statements, it was considered a complex clause (for a detailed coding scheme for both languages, see Aktan-Erciyes, Reference Aktan-Erciyes2019; Kizildere, Aktan-Erciyes, Tahiroğlu & Göksun, Reference Kizildere, Aktan-Erciyes, Tahiroğlu and Göksun2020). For each participant, the total number of complex clauses was divided by the total number of clauses to calculate the syntactic complexity score, separately for Turkish and English narrative samples. The coding was done by two trained assistants. We found a high interrater reliability between the coders (κ=.92, p < .001).
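The syntactic complexity score described above is a simple proportion, sketched below. The sketch makes a simplifying assumption: each clause is represented only by its number of predicates, and any clause with two or more predicates counts as complex. The full coding scheme also considers how the predicates are linked, which is not modeled here; the function name is hypothetical.

```python
def syntactic_complexity(predicates_per_clause):
    """Proportion of complex clauses among all clauses in a narrative.

    predicates_per_clause: list with the number of predicates in each
    clause (assumed input format; every clause has at least one predicate).
    """
    if not predicates_per_clause:
        raise ValueError("narrative must contain at least one clause")
    if any(n < 1 for n in predicates_per_clause):
        raise ValueError("each clause must contain at least one predicate")
    # A clause with a single predicate is simple; two or more is complex.
    complex_clauses = sum(1 for n in predicates_per_clause if n >= 2)
    return complex_clauses / len(predicates_per_clause)


# Made-up narrative with five clauses, two of which are complex.
score = syntactic_complexity([1, 2, 1, 3, 1])
```

Because the score is a proportion, it ranges from 0 (only simple clauses) to 1 (only complex clauses) regardless of narrative length.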
3. Results
3.1. Analysis plan
We present two sets of results: preliminary results and main results. The preliminary results section includes analyses that do not specifically target our hypotheses but provide initial information about the sample characteristics and participants’ overall performance in the given tasks (Table 1). The main results section, in contrast, presents the analyses that directly test our hypotheses.
Note. All disfluency and gesture rates/frequencies were calculated per word. For gesture frequencies, we reported three decimals as there was a limited number of gestures per word. Syntactic complexity was calculated as proportions by dividing the total number of complex clauses by the total number of clauses.
3.2. Preliminary results
TIFALDI and PPVT-4 scores
We carried out a two-way ANOVA to examine the effect of language group (bilingual or monolingual) and age group (5- or 7-year-old) on receptive vocabulary scores measured by TIFALDI (Berument & Güven, Reference Berument and Güven2010). Results indicated a main effect of age group, F(1,108) = 91.10, p < .001, ηp2=.458. The receptive vocabulary scores were higher in 7-year-old than 5-year-old children (Table 1). However, the main effect of language group, F(1,108) = 0.41, p = .525, ηp2=.004, and the interaction between age group and language group, F(1,108) = 0.10, p = .747, ηp2=.001, were not significant.
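The partial eta squared values reported alongside each F test can be recovered from the F statistic and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the three TIFALDI effects; the function name is an illustrative choice, and the code is a reader's check rather than part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the TIFALDI ANOVA.
age_effect = round(partial_eta_squared(91.10, 1, 108), 3)
language_effect = round(partial_eta_squared(0.41, 1, 108), 3)
interaction_effect = round(partial_eta_squared(0.10, 1, 108), 3)
```

The rounded results match the effect sizes reported in the text (.458, .004, and .001).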
We also conducted an independent samples t-test to examine whether 5- and 7-year old bilingual children differed in terms of the English vocabulary scores measured by PPVT-4 (Dunn & Dunn, Reference Dunn and Dunn2015). Results suggested that 7-year-old children were more advanced in English vocabulary knowledge than 5-year-old children, t(47)=−4.41, p < .001, d = 1.26 (Table 1). Moreover, as a norm-referenced test, the PPVT-4 results indicated that 5-year-old bilingual children's vocabulary performance corresponded to 4;2 age level for monolingual English speakers. Similarly, the 7-year-old bilingual children's English vocabulary skills were found to be around the 6;1 age level for monolingual English speaker children.
Syntactic complexity
We conducted a two-way ANOVA to examine the effect of language group (bilingual or monolingual) and age group (5- or 7-year-old) on syntactic complexity scores. We found that the main effect of age group, F(1,107) = 8.06, p = .005, ηp2=.070, was significant. The syntactic complexity scores were higher in the 7-year-old than the 5-year-old group. Similarly, we found a significant main effect of the language group, F(1,107) = 17.36, p < .001, ηp2=.140. In other words, monolinguals’ speech was syntactically more complex than bilinguals’ speech (Table 1). However, results did not yield a significant interaction between these variables, F(1,107) = 2.56, p = .113, ηp2=.023.
We also carried out a paired samples t-test to examine whether bilinguals’ English and Turkish narratives were comparable in terms of syntactic complexity. We found that bilingual children produced syntactically more complex speech in Turkish (M = 0.40, SD = 0.20) than English (M = 0.24, SD = 0.18), t(45) = 5.00, p < .001, d = 0.73.
3.3. Speech disfluency
Monolingual-bilingual comparison
We carried out a two-way ANCOVA to examine the effect of language group (bilingual or monolingual) and age group (5- or 7-year-old) on children's overall disfluency rates in Turkish, controlling for the (Turkish) syntactic complexity scores. The main effect of the language group was significant, F(1,106) = 12.42, p < .001, ηp2=.105. Bilingual children were more disfluent than monolingual children in Turkish (Table 1). The main effect of age group, F(1,106) = 0.15, p = .702, ηp2=.001, and the interaction between age group and language group were not significant, F(1,106) = 0.23, p = .634, ηp2=.002. The syntactic complexity score was not a significant covariate, F(1,106) = 0.70, p = .792, ηp2=.001.
We then repeated the same two-way ANCOVA for the specific disfluency categories. For the silent pause rate, we found a significant main effect of the language group, F(1,106) = 12.03, p < .001, ηp2=.102. Overall, bilingual children used more silent pauses than monolingual children (Table 1). Similarly, the main effect of the age group was significant, F(1,106) = 25.25, p < .001, ηp2=.192, suggesting that the 5-year old group produced more silent pauses than the 7-year-old group. However, the interaction between these variables was not significant, F(1,106) = 0.04, p = .847, ηp2=.000. The syntactic complexity score was not a significant covariate, either, F(1,106) = 0.14, p = .712, ηp2=.001.
For the filled pause rate in children's Turkish narrations, we again found a main effect of language group, F(1,106) = 11.89, p < .001, ηp2=.101, suggesting that bilingual children used more filled pauses than monolingual children (Table 1). Similarly, the main effect of age was significant, F(1,106) = 24.12, p < .001, ηp2=.185, suggesting that 7-year old children produced more filled pauses than 5-year-old children. However, the interaction between language group and age group was not significant, F(1,106) = 0.93, p = .337, ηp2=.009. The syntactic complexity score was not a significant covariate, either, F(1,106) = 0.23, p = .637, ηp2=.002.
For the repetition rate in Turkish narrations, we found that the main effect of language group, F(1,106) = 2.38, p = .126, ηp2=.022, the main effect of age group, F(1,106) = 0.01, p = .908, ηp2=.000, and the interaction between these variables, F(1,106) = 3.44, p = .067, ηp2=.031, were not significant. The syntactic complexity score was not a significant covariate, F(1,106) = 0.06, p = .800, ηp2=.001. For the repair rate, results suggested that the main effect of language group, F(1,106) = 0.33, p = .564, ηp2=.003, the main effect of age group, F(1,106) = 3.56, p = .062, ηp2=.032, and the interaction between these variables, F(1,106) = 3.75, p = .056, ηp2=.034, were not significant. The syntactic complexity score was not a significant covariate, F(1,106) = 1.55, p = .216, ηp2=.014.
Bilinguals’ L1 vs. L2
To examine whether overall disfluency differed between bilinguals’ two languages (Turkish and English) and across age groups (5- and 7-year-old), we conducted a repeated measures ANOVA. Results showed that the main effect of language, F(1,45) = 1.01, p = .319, ηp2=.022, the main effect of age group, F(1,45) = 0.01, p = .915, ηp2=.000, and the interaction between these variables, F(1,45) = 0.69, p = .412, ηp2=.015, were nonsignificant.
We then repeated the same repeated measures ANOVA for the use of specific disfluency categories. The dependent variables were silent pause rate, filled pause rate, repetition rate, and repair rate. For the silent pause rate, we found a significant main effect of language, F(1,45) = 4.06, p = .050, ηp2=.083, suggesting that, regardless of age, bilingual children produced more silent pauses in English (M = 0.16, SD = 0.07) than Turkish (M = 0.14, SD = 0.05). The main effect of age group was also significant, F(1,45) = 6.82, p = .012, ηp2=.132, whereas the interaction between age group and language was not significant, F(1,45) = 0.33, p = .567, ηp2=.007. For the filled pause rate, results yielded a significant main effect of age group, F(1,46) = 13.26, p < .001, ηp2=.224. Regardless of language, 7-year-old bilinguals produced more filled pauses than 5-year-old bilinguals (Table 1). The main effect of language, F(1,46) = 2.74, p = .104, ηp2=.056, and the interaction between age group and language, F(1,46) = 0.19, p = .665, ηp2=.004, were not significant. For the repetition rate, results suggested a significant main effect of language, F(1,46) = 7.15, p = .010, ηp2=.134. Regardless of age, bilingual children produced more repetitions in English (M = 0.03, SD = 0.02) than Turkish (M = 0.02, SD = 0.02). The main effect of age group, F(1,46) = 0.84, p = .366, ηp2=.018, and the interaction between age group and language, F(1,46) = 0.44, p = .513, ηp2=.009, were not significant. Last, for the repair rate, results showed that the main effect of language, F(1,46) = 1.39, p = .245, ηp2=.029, the main effect of age group, F(1,46) = 2.98, p = .091, ηp2=.061, and the interaction between these variables, F(1,46) = 3.68, p = .061, ηp2=.074, were all nonsignificant.
3.4. Gesture
Monolingual-bilingual comparison
We carried out a two-way ANOVA to examine the effect of language group (bilingual or monolingual) and age group (5- or 7-year-old) on children's overall gesture frequency in Turkish. We found that the main effect of language group, F(1,108) = 0.04, p = .841, ηp2=.000, and the main effect of age group, F(1,108) = 0.12, p = .732, ηp2=.001, were not significant. Similarly, the interaction between these variables was nonsignificant, F(1,108) = 0.82, p = .369, ηp2=.007.
We then repeated the same two-way ANOVA for the specific gesture categories. For the iconic gesture frequency, we found a significant main effect of language group, F(1,108) = 10.64, p = .001, ηp2=.090. Overall, monolingual children used more iconic gestures than bilingual children (Table 1). However, the main effect of age, F(1,108) = 2.90, p = .091, ηp2=.026, and the interaction between language group and age group, were not significant, F(1,108) = 1.50, p = .223, ηp2=.014. For the deictic gesture frequency, results demonstrated that the main effect of language group, F(1,108) = 0.82, p = .367, ηp2=.008, and the main effect of age group, F(1,108) = 0.02, p = .967, ηp2=.000, were not significant. Similarly, the interaction between these variables was nonsignificant, F(1,108) = 0.40, p = .528, ηp2=.004. Last, for the beat gesture frequency, the main effect of language group, F(1,108) = 1.26, p = .265, ηp2= .011, the main effect of age group, F(1,108) = 3.09, p = .082, ηp2=.028, and the interaction between these variables, F(1,108) = 0.33, p = .569, ηp2=.003, were not significant.
Bilinguals' L1 vs. L2
To examine whether overall gesture frequency differed between bilinguals’ two languages (Turkish and English) and across age groups (5- and 7-year-old), we carried out a repeated measures ANOVA. Results indicated that the main effect of age, F(1,46) = 0.52, p = .476, ηp2=.011, and the main effect of language, F(1,46) = 1.76, p = .191, ηp2=.037, were not significant. We also did not find an interaction between these two variables, F(1,46) = 0.04, p = .848, ηp2=.001.
We then repeated the same repeated measures ANOVA for the use of specific gesture categories. The dependent variables were iconic gesture frequency, deictic gesture frequency, and the beat gesture frequency. For the iconic gesture frequency, we found a significant main effect of language, F(1,46) = 7.17, p = .010, ηp2=.135, suggesting that, although few in number, bilingual children produced more iconic gestures in English (M = 0.01, SD = 0.01) than Turkish (M < 0.001, SD = 0.01). The main effect of age group, F(1,46) = 0.72, p = .400, ηp2=.015, and interaction between age and language were not significant, F(1,46) = 3.00, p = .090, ηp2=.061. For the deictic gesture frequency, the main effect of language, F(1,46) = 3.46, p = .069, ηp2=.070, the main effect of age group, F(1,46) = 0.34, p = .564, ηp2=.007, and the interaction between these variables, F(1,46) = 0.17, p = .685, ηp2=.004, were not significant. Similarly, for the beat gesture frequency, the main effect of language, F(1,46) = 0.03, p = .866, ηp2=.001, the main effect of age group, F(1,46) = 0.16, p = .688, ηp2=.004, and the interaction between these variables, F(1,46) = 0.02, p = .899, ηp2=.000, were nonsignificant.
3.5. Gesture and speech disfluency
Turkish narrative samples
We carried out two separate linear regression analyses to predict the total disfluency rate in children's Turkish narratives. For the first regression analysis, the predictor variables were iconic gesture frequency, language group (monolingual or bilingual), age group (5 or 7 years), TIFALDI score, and syntactic complexity score (in Turkish) (Model 1). The dependent variable was the total disfluency rate in children's Turkish narratives. Results yielded a significant regression equation explaining 10% of the total variance (R2=.10, F(5,105) = 3.45, p = .006). Only language group was a significant predictor of disfluency rate (Table 2). That is, being bilingual was associated with higher disfluency rates.
Note. *p < .05, **p < .01, N = 112.
We then repeated the same linear regression analysis by replacing the iconic gesture frequency with the overall gesture frequency. The dependent variable was again the total disfluency rate in children's Turkish narratives. The predictor variables were overall gesture frequency, language group (monolingual or bilingual), age group (5 or 7 years), TIFALDI score, and syntactic complexity score (in Turkish) (Model 2). The regression equation was significant, explaining 12% of the variance (R2=.12, F(5,105) = 3.88, p = .003). This time, not only language group but also gesture frequency significantly predicted the total disfluency rate in Turkish. Being bilingual was associated with a higher disfluency rate. Moreover, the frequency of using gestures negatively predicted the total disfluency rate; as the children produced more gestures, they were less disfluent. Age group, TIFALDI score, and syntactic complexity score did not significantly contribute to the model (Table 2).
English narrative samples
We carried out two separate linear regression analyses to predict the total disfluency rate in bilingual children's English narratives. For the first regression analysis, the predictor variables were iconic gesture frequency, age group (5 or 7 years), PPVT score, and syntactic complexity score (in English) (Model 3). The dependent variable was the total disfluency rate in children's English narratives. The regression equation was not significant, F(4,40) = 1.26, p = .303, with an R2 of .023, and none of the variables significantly predicted the total disfluency rate in English (Table 3).
Note. N = 51.
When we conducted the same regression analysis by replacing the iconic gesture frequency with the overall gesture frequency, the model was still nonsignificant, F(4,40) = 1.83, p = .142, with an R2 of .070 (Model 4). Neither overall gesture frequency nor other predictors (age group, PPVT score, and syntactic complexity score) significantly contributed to the model (Table 3).
4. Discussion
This study investigated the speech disfluency and gesture production of 5- and 7-year-old typically developing Turkish monolingual and Turkish–English bilingual children. We examined Turkish narratives from monolinguals, and both Turkish and English narratives from bilinguals. We asked whether children's gesture use was associated with their speech fluency in narrative production, regardless of being bilingual or monolingual. Our results indicated that, overall, bilinguals were more disfluent than monolinguals. There were also some age and language group differences in terms of the frequency of using specific disfluency types. Bilinguals used silent pauses and filled pauses more frequently than monolinguals. Regardless of being monolingual or bilingual, the frequency of using silent pauses was higher in the 5-year-old group while the filled pause frequency was higher in the 7-year-old group. However, there was neither an age group nor a language group difference in terms of the frequency of using repetitions and repairs. We also found that bilinguals used silent pauses and repetitions more frequently in English than Turkish. For gesture use, we demonstrated that all gesture frequencies were comparable across the two language groups and the two age groups, except the iconic gesture frequency. Monolingual children produced more iconic gestures than bilinguals. Additionally, bilingual children produced iconic gestures more frequently in English than Turkish. Last, our results suggested that both the overall gesture frequency and language group (bilingual or monolingual) were significant predictors of the overall disfluency rate in Turkish. Iconic gesture frequency, however, was not associated with the disfluency rate. For the English narrative samples, we found that neither overall gesture frequency nor iconic gesture frequency was a significant predictor of the overall disfluency rate.
As we expected, compared to Turkish monolinguals, bilingual children were more disfluent in Turkish. The use of specific disfluency types might give cues regarding the nature of disfluent speech in a bilingual context. For instance, the higher frequency of silent pauses and filled pauses observed in bilinguals’ than monolinguals’ speech might reflect bilinguals’ effort to control two lexicons and maintain the conversational floor at the same time. Considering that bilinguals’ two language systems are simultaneously active (Bialystok, Reference Bialystok2009), they might have frequently preferred giving silent breaks while planning their speech to make sure that they select from the vocabulary of one language and ignore the other. Those silent breaks might indeed prevent bilinguals from switching between two languages and help them harmonize with their listeners. Using filled pauses might be a different form of maintaining the conversational floor as filled pauses signal speakers’ aim to continue speaking, which in turn keeps listeners engaged (Bortfeld et al., Reference Bortfeld, Leon, Bloom, Schober and Brennan2001; Corley & Stewart, Reference Corley and Stewart2008). Bilinguals might strategically use filled pauses to keep listeners’ attention while resolving the competition between their two vocabulary systems. Thus, bilinguals’ frequent use of filled pauses might result from their heightened communicative sensitivity to maintain the conversational floor.
Similarly, as we predicted, silent pauses were more commonly observed in the 5-year-old group, while the use of filled pauses was more frequent in the 7-year-old group. This finding is in line with children's enhanced language and communication skills (Nettelbladt & Hansson, Reference Nettelbladt and Hansson1999) as well as increased pragmatic skills and narrative competence (Hickmann, Reference Hickmann2003). Perspective-taking and meta-representational abilities, which might further strengthen communication, also increase with age (Astington & Baird, Reference Astington and Baird2005). As children's language and social skills develop, they might prefer filled pauses over silent pauses to create a stronger communicative ground in which they better manage listeners’ point of view and turn-takings (Haynes & Hood, Reference Haynes and Hood1977). Moreover, unlike what we expected, we found that the use of repetitions was comparable between the two age groups. We argue that as the decrease in repetitions is usually observed before children reach preschool age (DeJoy & Gregory, Reference DeJoy and Gregory1985; Haynes & Hood, Reference Haynes and Hood1977), there might not be a prominent difference in repetition rates between 5- and 7-year old children's speech.
Bedore et al. (Reference Bedore, Fiestas, Peña and Nagy2006) demonstrated that Spanish–English bilingual children used more repetitions and grammatical repairs in Spanish than in English. The authors suggested that as Spanish is more complex than English in terms of morphosyntactic elements, bilinguals might be likely to repeat and revise their language output to comply with specific rules such as gender-noun agreement in Spanish. Our study focused on two typologically different languages, Turkish and English. Although bilinguals’ overall disfluency rates were comparable between Turkish and English, our results indicated differences in terms of the frequency of using silent pauses and repetitions. This finding is open to alternative explanations, but it may stem from the following factors, which future research could address. Turkish is a gender neutral language, meaning that it does not contain grammatical genders or gender pronouns. In English, however, there are gender pronouns. Moreover, the two languages differ in terms of determiners, as Turkish does not have a definite article such as the in English. Therefore, Turkish–English bilinguals might pause silently more frequently in English to evaluate whether they comply with the specific and relatively complex grammatical rules that do not exist in Turkish. Similarly, in some cases, after passing the internal evaluation, overtly articulating some parts of speech in the form of repetitions might reflect bilinguals’ hesitations and their need to double-check whether everything sounds right with respect to the grammatical rules of English.
Unlike what we expected, we found that bilingual and monolingual children's overall gesture frequencies were comparable in Turkish. This finding is in line with Azar et al. (Reference Azar, Backus and Özyürek2020), as Turkish–Dutch bilingual adults’ gesture frequencies were comparable with those of monolinguals both in Turkish and Dutch. Similarly, Cavicchio and Kita (Reference Cavicchio and Kita2013) obtained narratives from Italian-English bilingual and English and Italian monolingual adults. They found that Italian monolinguals gestured more frequently and used a larger gesture space than English monolingual speakers. However, bilinguals’ gestures were similar to the baseline rates in either of their languages. Although our sample did not include English monolinguals as a control group, observing similar overall gesture rates in Turkish across the two groups partially corroborates these findings. One explanation is that since the bilingual children in our study were mostly dominant in Turkish with limited native and cultural contact with English, this finding might have stemmed from a dominance effect. They might be using the gestures of their dominant language in both languages. Yet, similar to the bilingual children in this study, who used Turkish mainly at home and English mainly at school, the bilingual adults of Azar et al. (Reference Azar, Backus and Özyürek2020) and Cavicchio and Kita (Reference Cavicchio and Kita2013) used one of their languages mostly at home with friends and family. In both these studies, bilinguals were highly proficient in both of their languages. Our replication with children supports the argument that cultural and experience-related differences might explain contrasting results regarding the gesture frequency of bilingual speakers.
Our results indicated a higher frequency of iconic gestures in monolinguals’ than bilinguals’ narratives in Turkish. Since bilinguals acquire two languages, they might practice less in each of their languages compared to monolinguals speaking those languages. As a result, they control a smaller vocabulary in either of their languages (Bialystok & Feng, Reference Bialystok and Feng2009) and they are disadvantaged in language related tasks (for a review, see Bialystok, Reference Bialystok2009). Language is multimodal, and the use of iconic gestures is positively related to spoken language skills among children (Nicoladis, Reference Nicoladis2002). As a result, monolingual children might also be ahead of their bilingual peers in terms of using complex gesture forms such as iconic gestures (Nicoladis et al., Reference Nicoladis, Mayberry and Genesee1999). On the other hand, although we expected higher deictic gesture frequency in the bilingual group than the monolingual group, our results did not yield such a difference. Deictic gestures are considered a part of the bilinguals’ strategy of decreasing the cognitive load – however, those deictic gestures are abstract in nature (i.e., pointing in the absence of referents). In our study, as the story book was open in front of the participants, the deictic gestures they produced were mostly concrete deictics (i.e., pointing at visually present objects). Unlike abstract deictics (Nicoladis, Reference Nicoladis2007), concrete deictics are not associated with the bilingual strategy of decreasing the cognitive load. Thus, it is not surprising to observe similar deictic gesture rates across the narratives of bilingual and monolingual groups. Moreover, although the two age groups were comparable in iconic and beat gesture frequencies, there was a tendency that the 7-year-old group used these gestures more frequently than the 5-year-old group. 
We suggest that a two-year age difference between these groups might not be sufficient to yield a significant difference in the use of iconic and beat gestures. Rather, such a difference might be more prominent when 5-year-olds are compared with children older than 7 years of age. In line with this argument, Colletta (2009) suggests that, around 9 years of age, children are able to spontaneously produce narratives similar to those of adults in the sense that they effectively use co-speech gestures. Future work can examine 9-year-old Turkish–English bilinguals’ gesture use.
Although we expected bilinguals to reveal comparable gesture frequencies across their two languages, we found that iconic gesture frequency was higher in bilinguals’ English than in their Turkish narratives. Having Turkish native speaker parents and living in Turkey, bilingual children have been predominantly exposed to Turkish since birth. They are sequential bilinguals and English is their L2. As a result, overall, bilingual children might have relatively less practice in English than in Turkish, although they are proficient in both languages. Levelt, Roelofs, and Meyer (1999) suggest that lexical selection and phonological form retrieval are linked to each other, and together they form the lexical retrieval process. As a result of less practice, bilinguals’ lemma-to-word-form connections might be weaker in English. Such weak connections might be associated with slower lexical retrieval (Levelt et al., 1999). Considering iconic gestures’ role in lexical retrieval (Morsella & Krauss, 2004; Rauscher et al., 1996), bilinguals might have used more iconic gestures in English than in Turkish to enhance the word retrieval process.
Our results indicated that, although not significant, there was a tendency for iconic gesture use to be negatively associated with children's disfluency rates both in Turkish (controlling for language group) and English narrative samples. Producing gestures might decrease cognitive load and enhance cognitive processes (Kita et al., 2017). Yet, using gestures in a way that facilitates cognitive processes, particularly speech production, might be more prominent in a more mature language system. This is because iconic gestures are complex in nature and are closely tied to advanced language skills (Iverson & Goldin-Meadow, 2005). Yet, it is important to note that even though a significant negative association of iconic gestures with speech disfluency was observed in adults, it would only be indirect evidence of gestures’ facilitative roles in speech production. Disfluency is a complex concept to study, as disfluent segments in speech might also stem from communicative intentions (Fraundorf & Watson, 2014).
We demonstrated that overall gesture frequency, rather than iconic gesture frequency, was negatively associated with disfluency rates in Turkish narratives, regardless of whether children were bilingual or monolingual. Although iconic gestures are extensively investigated in terms of having self-oriented functions (for a review, see Kita et al., 2017; Kita & Davies, 2009; Melinger & Kita, 2007), there has recently been an increased focus on the role of different gesture types in relation to cognitive processes, such as deictic gestures (e.g., Arslan & Göksun, 2022; Avcı et al., 2022; Azar et al., 2020) and beat gestures (e.g., Llanes-Coromina, Vilà-Giménez, Kushch, Borràs-Comes & Prieto, 2018; Vilà-Giménez, Igualada & Prieto, 2019; Vilà-Giménez & Prieto, 2020). Research suggests that not only using iconic gestures (Demir, Fisher, Goldin-Meadow & Levine, 2014; Stites & Özçalışkan, 2017) but also using deictic gestures (Goldin-Meadow & Butcher, 2003; Iverson & Goldin-Meadow, 2005; Özçalışkan & Goldin-Meadow, 2005) and beat gestures (Vilà-Giménez, Dowling, Demir-Lira, Prieto & Goldin-Meadow, 2021) can predict children's language development and narrative performance.
Moreover, the use of deictics is observed in different populations, including older adults (Arslan & Göksun, 2021, 2022) and bilinguals (Azar et al., 2020; Nicoladis, 2007), suggesting that pointing at abstract referents might decrease the cognitive load in these groups. Similarly, Vilà-Giménez and Prieto (2020) demonstrated that children produce narratives more fluently when they are encouraged to produce beat gestures, suggesting that beats might be pragmatically meaningful gestures.
These findings together indicate that different gesture types might be orchestrated in a coordinated multimodal system, which contributes to narrative performance. Focusing solely on iconic gestures might provide a limited perspective, considering the role of different gesture types in decreasing cognitive load and enhancing fluency. Our results are partially in line with this argument, as we found a significant negative association of overall gesture frequency with disfluency rates only in children's Turkish narratives. One possible explanation for not observing such an association in the English narratives might be related to the sample size. We ran the regression analysis for the Turkish narratives with a total of 112 children, 61 of them monolingual and 51 of them bilingual. However, as our sample did not include English monolingual children, we carried out the regression analysis for the English narratives with a total of 51 bilinguals. Multiple linear regression analysis is sensitive to the number of observations and the number of independent variables (J. Cohen, P. Cohen, West & Aiken, 2003). Thus, it is not surprising that such an association emerged in the Turkish but not in the English narratives. In line with this argument, although not significant, there was still a notable tendency for overall gesture frequency and disfluency rate to be negatively associated in children's English narratives. Future research should investigate and compare children's multimodal language samples in different contexts such as free talk or play talk. Because these contexts are cognitively less demanding than narrative production, they might shed further light on typical language development.
In conclusion, the current study suggests that speech disfluency cannot be fully understood without incorporating the gesture modality into the picture, in line with the close link between gesture and speech (Kita & Özyürek, 2003). Our findings highlight the importance of overall gesture use rather than iconic gesture use alone in portraying gesture-speech interaction with regard to speech fluency. We also found that bilingual and monolingual children differed in the use of specific disfluency and gesture types, which might shed light on their language development and communication strategies. Overall, our study suggests that each gesture type might carry a function in a coordinated multimodal system, which might, in turn, influence speech quality.
Acknowledgements
We thank Can Avcı, Selin Tezel, Sıla Sevi Çapar, and Ela Erciyes for their support in transcription and coding.
Competing interests declaration
Competing interests: The authors declare none.
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.