Introduction
Demonstratives, such as this/that and here/there, play a starring role in language development. English-acquiring children produce demonstratives before other function words (Clark, 1978; González-Peña, Doherty & Guijarro-Fuentes, 2020) and use them with great frequency during the one-word stage, where they represent up to 7% of all tokens (Diessel & Coventry, 2020; González-Peña et al., 2020, p. 5). Similar patterns hold in Spanish (González-Peña et al., 2020; Shin & Morford, 2020), Italian (Todisco, Guijarro-Fuentes, Collier & Coventry, 2021), Turkish (Küntay & Özyürek, 2006), and Mandarin (Chu & Minai, 2018). This early emergence and high production frequency are typically attributed to demonstratives’ close relationship with joint attention (Diessel & Coventry, 2020; Tomasello, 2008).
Yet while demonstrative acquisition has been studied across several languages, all of these languages have relatively simple demonstrative systems, with just two or three terms (e.g., English this/that; Spanish este/ese/aquel). By contrast, some systems are much larger, contrasting four to twelve demonstratives and displaying correspondingly more complex semantics for each term (Anderson & Keenan, 1985; Levinson, 2018). Though demonstratives in such ‘multi-term’ systems remain closely tied to joint attention, these systems are both larger and semantically more complex than the two- and three-term systems represented in the literature, suggesting that they may pose different challenges for child learners.
Against this background, this study investigated the L1 acquisition of demonstratives in Ticuna, an Indigenous Amazonian language with four demonstrative terms, through an observational study of 45 children aged 1;0 to 4;11. To preview the findings, only one of the four demonstratives, the Speaker-Proximal, followed the developmental trajectory typically seen in two- and three-term demonstrative systems. All of the other terms diverged from the patterns of smaller systems: some demonstratives emerged later than expected, while others emerged on the expected timeframe but followed an unpredicted path to adult-like use. Before turning to the specifics of this study, I first present background about cross-linguistic diversity in demonstrative meaning, demonstrative acquisition, and the demonstrative system of Ticuna.
Background on demonstratives
Diversity in demonstrative meaning
To describe demonstrative meaning, I employ the concepts of deictic content and origo. A demonstrative’s origo (Bühler, 1982) is the discourse participant(s) to whom it relates the referent; its deictic content is the information which it conveys about the referent relative to the origo. For example, on many analyses, English that conveys that the referent is far from the speaker. Thus, its origo is the speaker, while its deictic content conveys ‘far from origo.’
Demonstratives’ origo and deictic content vary substantially across languages (Diessel, 1999; Levinson, 2018; Peeters, Krahmer & Maes, 2020). The origo may be the speaker only, the addressee only, or the interactive dyad composed of both participants (Jungbluth, 2003; Peeters, Hagoort & Özyürek, 2015). Deictic content varies even more widely, but falls into three broad categories: spatial, perceptual, and attentional (Hanks, 2011; Peeters et al., 2020).
Spatial deictic content concerns the demonstrative referent’s location in space. While this is traditionally understood as distance (Diessel, 1999, p. 2), recent work argues that most spatial deictic content instead concerns location: relative to an interactionally emergent ‘here-space’ (Enfield, 2003), relative to the origo’s reaching space (Caldano & Coventry, 2019; Kemmerer, 1999), or relative to the origo within a geocentric or intrinsic frame of reference (Burenhult, 2003; Grenoble, McMahan & Petrussen, 2019).
Perceptual deictic content concerns the sense that the origo uses to perceive the demonstrative referent. Perceptual deictic content most often concerns visibility (Hanks, 2011, p. 329), conveying whether or not the origo can see the demonstrative referent, but may also relate to other senses, especially hearing (Levinson, 2018).
Attentional deictic content conveys the joint attention status of the referent – whether it is already in joint attention, or whether the speaker is establishing new joint attention on it. While authors have argued for attentional deictic content in a variety of languages, including Turkish, Dutch, and English (Coventry, Griffiths & Hamilton, 2014; Küntay & Özyürek, 2006; Piwek, Beun & Cremers, 2008), its existence is contested. For example, while Küntay and Özyürek (2006) argue that the Turkish demonstrative şu calls new joint attention to the referent, Peeters, Azar and Özyürek (2014) observe no effect of joint attention on the use of şu.
Beyond deictic content, speakers’ use of demonstratives also responds to many other pragmatic factors. These may include the ownership of the demonstrative referent (Coventry et al., 2014; Cutfield, 2018; Hanks, 2005); participants’ affective evaluation of the referent (Cutfield, 2018; Rocca, Tylén & Wallentin, 2019); and the involvement of the referent in the participants’ joint activity (Enfield, 2003; Hanks, 2005).
Last, demonstratives are often paired with deictic elements from other word classes. Many languages have deictic verbs like come/go, which convey the direction of motion relative to the deictic center (Wilkins & Hill, 1995). Some languages also display dedicated presentative forms, such as French voilà or Yucatec Maya heʔel (Espinosa Ochoa, 2021; Hanks, 2011), which draw attention to discourse-new referents.
Acquisition of demonstratives
Across languages, children produce demonstratives in the one-word stage, but display non-adult-like production and comprehension of the items through at least six or seven years. While most research supporting this generalization is on English (Clark & Sengul, 1978; González-Peña, 2020; Tanz, 1980; Webb & Abrahamson, 1976), the same pattern is also documented for Spanish (Rodrigo et al., 2004; Shin & Morford, 2020), Turkish (Küntay & Özyürek, 2006), Mandarin (Chu & Minai, 2018), and Yucatec Maya (Espinosa Ochoa, 2021).
To explain the cross-linguistic early emergence of demonstratives, authors invoke frequency and ties to joint attention. Children produce demonstratives early because the items are both exceptionally frequent in the input and centrally involved in joint attention (Diessel & Coventry, 2020; González-Peña et al., 2020). By contrast, to explain the persistent immaturity of children’s demonstrative comprehension and production, authors posit a conflict between demonstratives’ deictic semantics and children’s cognitive bias toward spatial egocentrism (Chu & Minai, 2018; Clark & Sengul, 1978; Küntay & Özyürek, 2006; Tanz, 1980; Webb & Abrahamson, 1976). Since the demonstrative origo shifts as new participants speak, adult-like comprehension and production of demonstratives require the ability to construe referents from others’ spatial perspectives. Immature Theory of Mind causes young children to struggle with such perspective-shifting, instead preferring to construe referents from their own spatial perspective. Children’s egocentric perspective then inhibits adult-like comprehension of demonstratives directly, and adult-like production indirectly.
This Piagetian theory finds some support in error patterns. Some English-learning children aged 2;7 to 5;3 make egocentric comprehension errors, construing themselves as the origo of others’ demonstratives (i.e., reading here as ‘near me, the child’ rather than ‘near me, the speaker’) (Clark & Sengul, 1978, p. 471), though the prevalence and developmental duration of such errors are disputed (Charney, 1979; de Villiers & de Villiers, 1974; González-Peña, 2020, ch. 3; Wales, 1986). Similarly, Turkish-speaking four-year-olds underuse demonstratives sensitive to others’ visual attention (Küntay & Özyürek, 2006, p. 315), a production error plausibly driven by egocentrism. Beyond egocentrism proper, there is also evidence that the related proximity bias – children’s tendency to focus on their immediate surroundings and proximal objects – affects the emergence of demonstratives: speaker-proximal demonstratives appear before distal ones in languages such as Spanish, Hebrew, Japanese, and Yucatec Maya (Diessel & Coventry, 2020, p. 6; Espinosa Ochoa, 2021; Rodrigo et al., 2004).
Yet egocentrism and proximity bias are not the only possible explanations for the persistently immature use of demonstratives. As González-Peña (2020, p. 94) argues, children may also struggle with these items because of their semantic complexity. Even in languages with small demonstrative systems, the semantic contrasts between demonstratives can be subtle, and speakers’ use of the items can be simultaneously influenced by multiple properties of the referent (e.g., location, joint attention, ownership). Many subtle meaning contrasts – including those that do not involve deixis, such as a vs. the (Brown, 1973, pp. 353-355; Rozendaal & Baker, 2008) – are hard for children to learn; some fine spatial contrasts, such as be in vs. be on, display the same long-lasting immaturity as demonstratives (Johannes, Wilson & Landau, 2016). Thus, non-adult-like production and comprehension of demonstratives may arise from general difficulty with semantic complexity, rather than from problems unique to Theory of Mind.
Beyond semantics, syntactic category also affects the emergence of demonstratives. Young children learning English and Spanish are more likely to produce locative demonstratives (here/there) than nominal demonstratives (this/that). This asymmetry lasts from 1;6 to 1;10 in English and 1;8 to 1;10 in Spanish (González-Peña et al., 2020, pp. 4, 11). One possible explanation for the facilitation of locatives is that presentative constructions – utterances that focus a demonstrative and call new joint attention to the referent – are an especially early-emerging use of demonstratives (Harris, Barrett, Jones & Brookes, 1988; Barrett, Harris & Chasin, 1991; cited by González-Peña, 2020, p. 51), and in English and Spanish, these constructions happen to use locative forms (Here it is, Aquí está). In Yucatec Maya, however, presentative constructions use a unique presentative demonstrative heʔel ‘here/there it is,’ and some children produce this item before either nominal or locative demonstratives (Espinosa Ochoa, 2021, p. 11). This suggests the hypothesis that it is presentative demonstratives as a pragmatic category – rather than nominals or locatives as a syntactic category – which emerge first across languages.
Other factors matter too. Structurally, the marked phonology of English this and that tends to inhibit their adult-like pronunciation (González-Peña, 2020, p. 50). Pragmatically, Platt (1986) suggests that social norms encouraging requests for goods facilitate Samoan children’s learning of deictic verbs equivalent to give. That said, considerations of semantics (i.e., complexity and egocentrism) and syntax (i.e., presentative structure) make the clearest predictions for demonstrative acquisition across languages, and my analysis therefore focuses on them.
Language background
Location, classification, and speaker population
Ticuna is a language isolate spoken by roughly 69,000 people (Faço Soares, 2021) living mostly along the Amazon/Solimões River in Peru, Colombia, and Brazil. The language is widely acquired by children in Peru and Brazil, but not in Colombia (Santos, 2004). The data in this paper come from 13 months of fieldwork that I have conducted in Cushillococha, Peru, since 2015.
Cushillococha is a land-titled Indigenous community with about 5,000 residents, located in Mariscal Ramón Castilla district and province, Loreto region. Nearly all households in the town are ethnically Ticuna and make most of their living from subsistence agriculture on communal land. Most families also earn cash by marketing produce or working for wages in the provincial capital, Caballococha (population ~12,000), which is located 8 km (about 15 minutes by motorcycle) from the Cushillococha town center. There is also significant traffic between Cushillococha and urban centers, especially the twin cities of Leticia, Colombia, and Tabatinga, Brazil (combined pop. ~110,000), which can be reached in roughly three hours via twice-daily speedboats.
During my fieldwork from 2015 to 2019, almost everyone in Cushillococha spoke Ticuna natively. It was the main language of daily conversation; of official environments like church and government; and of most instruction in the locally controlled preschool, primary school, and secondary school. Most adults and children over eight years of age also knew at least some Spanish and/or Brazilian Portuguese, with abilities ranging from limited passive knowledge to fluency.
Compared to most of rural Amazonia, Cushillococha had a relatively high level of economic development, with 24-hour electricity, cell phone service, in-home water service, and a locally staffed health clinic. Most families lived in homes made from commercial materials, such as poured concrete, and owned large appliances such as gas stoves, televisions, and motorcycles. Thus, the most significant logistical challenges for this research stemmed from (a) the echo-prone acoustics of poured concrete buildings, (b) the high level of environmental noise in participants’ homes, and (c) the limited availability of adults to transcribe recordings. As described in the Methods, mitigating echo and noise required the use of highly redundant recording methods, and completing transcription required my involvement as a transcriber.
Phonology and morphology
To understand the data in this paper, it is necessary to know that Ticuna displays eight lexical tones (Anderson, 1959). Transcriptions use IPA and represent tones with raised numerals; 1 is the lowest tone. It is also relevant that Ticuna nouns are divided into five classes, primarily based on semantic principles such as animacy. Most noun phrase constituents, including demonstratives, undergo noun class agreement.
Demonstrative system
Ticuna has six nominal demonstratives, equivalent to English this/that, and six locative demonstratives, equivalent to here/there. Four demonstratives in each category have productive deictic uses; of the other two, one is exclusively anaphoric and the other occurs only in idioms. I discuss only the four productive, deictic items; my description follows the analysis of adults’ demonstrative use in Skilton (2019, 2021).
Table 1 displays the four deictic demonstratives. Nominal demonstratives appear in the upper portion of Table 1, locative demonstratives in the lower portion.
The Speaker-Proximal demonstrative, which can be paraphrased this/here near me, has spatial and attentional deictic content. It indexes referents located within the speaker’s reach (as in this cup) and referents which enclose the speaker (as in this country). It can also be used to establish new joint attention on referents anywhere in space.
The Dyad-Proximal and Speaker-Distal demonstratives have spatial and perceptual deictic content. Spatially, the Dyad-Proximal (this/here between us) conveys that the referent is within the space occupied by the interactive dyad – most often, between the speaker and the addressee. The Speaker-Distal (that/there far from me) conveys that the referent is beyond the speaker’s reach. Perceptually, the Dyad-Proximal and Speaker-Distal nominal demonstratives also require that the speaker sees the demonstrative referent at the moment of speech. The Dyad-Proximal and Speaker-Distal locative forms do not have this visibility requirement.
The Addressee-Centered demonstrative (that/there near you) has only spatial deictic content, conveying that the referent is within reach for the addressee. Additionally, this demonstrative has two non-spatial uses. First, the Addressee-Centered term can be used anaphorically; none of the other deictic demonstratives allow anaphoric use. Second, the Addressee-Centered nominal form can also be used as an invisible deictic demonstrative. In this use, it can index any referent that the speaker does not see, regardless of location. The Addressee-Centered locative demonstrative lacks this invisible use, but shares the nominal term’s addressee-proximal and anaphoric uses.
In terms of form, none of the demonstratives include segments or tones which are restricted to function words or otherwise phonologically marked. Additionally, while the Addressee-Centered and Dyad-Proximal forms have a clear phonological relationship, various criteria show that they are synchronically monomorphemic (Skilton, 2019, p. 14).
Ownership
Referent ownership substantially influences Ticuna-speaking adults’ demonstrative use. Speakers often use Addressee-Centered (that near you) nominal demonstratives to index referents owned by the addressee, even when they are not spatially addressee-proximal. I have never observed parallel speaker-ownership uses of the Speaker-Proximal.
One example of the ownership use of the Addressee-Centered term appears in (1). As shown in the image, the speaker and addressee are separated by a few meters; the demonstrative referent, a marble, is next to the speaker’s hand and across the room from the addressee. Referents in this location are normally indexed by the Speaker-Proximal. However, because the referent in (1) belongs to the addressee, the speaker instead uses the Addressee-Centered term.
(1) Caregiver and child (1;2) are playing with marbles, which the caregiver has divided into the child’s and hers. One of the child’s marbles rolls into a corner of the room. The caregiver crosses the room to pick it up, then says,
Since the referent in (1) is visible, discourse-new, and not spatially addressee-proximal, the Addressee-Centered tokens cannot represent the invisible, anaphoric, or addressee-proximal uses; instead, they are motivated only by the addressee’s ownership of the referent. Most tokens of the Addressee-Centered demonstrative, however, are ambiguous between two or more uses. Due to this ambiguity, my quantitative analyses do not calculate separate frequencies for each Addressee-Centered use.
Deixis is unique to demonstratives
Besides personal pronouns, demonstratives are the only deictic elements in Ticuna. The language does not have deictic motion verbs like English come/go. Instead, speakers convey the direction of motion relative to the deictic center by combining non-deictic motion verbs with locative demonstratives. This is shown in (2) and (3), where both the Speaker-Proximal (2) and the Speaker-Distal (3) combine with the same motion verb, ũ⁴³ ‘come, go, walk.’
(2) Adult: Ti³ti¹, nu⁵a² na¹ʔũ⁴³
(3) Caregiver: ɟe⁵a² na¹ʔũ⁴³
Just as it lacks deictic motion verbs, Ticuna also lacks presentative forms like voilà. Instead, presentative constructions are formed using nominal demonstratives. The nominal demonstrative is fronted, then optionally followed by the copula and the descriptive noun phrase, as in (4). Note that while the demonstrative in (4) is translated as here, it is actually a nominal demonstrative otherwise equivalent to this.
(4) Caregiver shows two children a large teddy bear, then presents a smaller bear and says:
The isolation of deixis in demonstratives adds to the typological uniqueness of the Ticuna demonstrative system, but prevents me from comparing the acquisition of demonstratives to the acquisition of other deictic terms (cf. Tanz, 1980).
Frequency of demonstratives
Between 2017 and 2018, I recorded a corpus of informal conversation (Rossi, Floyd & Enfield, 2020) between Ticuna-speaking adults. This corpus, which contains 1 h 49 min of conversation from 8 interactions (mean duration = 13 min 38 s, SD = 7 min 37 s), was collaboratively transcribed and translated into Spanish by me and Angel Bitancourt Serra, an L1 Ticuna speaker.
The adult conversational corpus contains 2,360 turns of adult-directed speech (ADS); 2,224 turns (94.2%) include at least one intelligible Ticuna word. Table 2 reports the frequency of each demonstrative in these 2,224 analyzable turns. Because all demonstratives are highly frequent, frequencies are reported per 100 words.
Note. Frequency values are calculated by interaction, not by speaker.
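Because the table note specifies that rates are computed per interaction rather than per speaker, the unit of analysis for the statistical comparisons below is the interaction (8 per demonstrative). The following is a minimal sketch of that computation, not the script used for this study: the function names and the toy counts are hypothetical, and the mean and SD summary is an assumption about how per-interaction rates might be aggregated. The share of analyzable turns follows directly from the two turn counts given above.

```python
from statistics import mean, stdev

# Share of ADS turns containing at least one intelligible Ticuna word,
# computed from the two turn counts reported in the text.
analyzable_share = 100 * 2224 / 2360


def rate_per_100_words(token_count, word_count):
    """Frequency of one demonstrative per 100 words in a single interaction."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * token_count / word_count


def summarize_by_interaction(token_counts, word_counts):
    """Compute one rate per interaction (not per speaker), then summarize.

    token_counts[i]: tokens of one demonstrative in interaction i (hypothetical)
    word_counts[i]:  analyzable words in interaction i (hypothetical)
    Returns the per-interaction rates, their mean, and their SD.
    """
    if len(token_counts) != len(word_counts):
        raise ValueError("need one word count per interaction")
    rates = [rate_per_100_words(c, w) for c, w in zip(token_counts, word_counts)]
    sd = stdev(rates) if len(rates) > 1 else 0.0
    return rates, mean(rates), sd


# Hypothetical toy counts for two interactions (not data from the corpus).
rates, avg, sd = summarize_by_interaction([2, 5], [100, 250])
print(f"{analyzable_share:.1f}% analyzable; rates = {rates}, mean = {avg}, SD = {sd}")
```

Normalizing within each interaction before averaging keeps long interactions from dominating the estimate, which is one reason to prefer by-interaction rates over a single pooled rate.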
Looking first at nominal demonstratives, the Speaker-Proximal (this near me) and Addressee-Centered (that near you) terms are very frequent, each making up ~2.1% of all ADS word tokens, while the Speaker-Distal and Dyad-Proximal are numerically less frequent. To compare the frequencies of the demonstratives, I conducted a series of two-sided, pairwise Wilcoxon rank-sum tests, then adjusted p-values for multiple comparisons using the Benjamini-Hochberg correction. These tests showed that in conversational ADS, the Speaker-Proximal and Addressee-Centered are not significantly different in frequency (W = 36, p = 0.72). However, the Speaker-Distal term (that far from me) is significantly less frequent than the Speaker-Proximal (W = 64, p = 9.3e-4) or Addressee-Centered (W = 63, p = 9.3e-4). In turn, the Dyad-Proximal term (this between us) is significantly less frequent than all other demonstratives (compared to the Speaker-Distal, W = 53, p = 0.037).
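The two statistical steps above can be sketched in plain Python. The sketch assumes that W follows the convention of R's wilcox.test (the rank sum of the first sample minus n(n + 1)/2, with average ranks for ties), so that with 8 interactions per group W ranges from 0 to 64, and W = 64 means every interaction ranks the first demonstrative above the second. The Benjamini-Hochberg step mirrors the standard step-up procedure. Function names are hypothetical, and the p-values themselves (e.g., from R's wilcox.test or scipy.stats.mannwhitneyu) are not reimplemented here.

```python
def average_ranks(values):
    """1-based ranks of values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """Wilcoxon rank-sum W for sample x against sample y (R's convention)."""
    ranks = average_ranks(list(x) + list(y))
    n_x = len(x)
    return sum(ranks[:n_x]) - n_x * (n_x + 1) / 2


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Complete separation of two groups of 8 gives the maximum W of 8 * 8 = 64.
print(rank_sum_w(list(range(9, 17)), list(range(1, 9))))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The W statistic counts, over all cross-group pairs of interactions, how often the first group's rate exceeds the second's (ties count one half), which makes values like W = 63 or W = 64 easy to interpret with only 8 observations per group.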
Turning to locative demonstratives, the Speaker-Proximal (here near me), Addressee-Centered (there near you), and Speaker-Distal (there far from me) terms do not differ significantly in ADS frequency (one-way ANOVA, F(2, 21) = 0.83, p = 0.45). The Dyad-Proximal locative demonstrative (here between us) is significantly less frequent than the other locative types (compared to the Speaker-Proximal in a one-tailed Wilcoxon rank-sum test, W = 61.5, p = 0.0011).
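The degrees of freedom of the ANOVA follow from the design: three locative demonstratives, each measured in 8 interactions, give k - 1 = 2 and N - k = 24 - 3 = 21. As a sketch under the assumption that each group is the list of per-interaction rates for one demonstrative, the F statistic can be computed from first principles as below; the function name is hypothetical, the groups are treated as independent samples, and the p-value (e.g., from scipy.stats.f.sf) is not reimplemented.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA F statistic and its (between, within) df.

    Each group is a list of per-interaction rates for one demonstrative
    (an assumed data layout); groups are treated as independent samples.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Three toy groups of 8 interactions each reproduce the reported degrees of freedom.
toy = [list(range(1, 9)), list(range(2, 10)), list(range(3, 11))]
f_stat, df = one_way_anova(toy)
print(df)  # (2, 21)
```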
Predictions
In combination with the language-specific facts above, the theories discussed in the Background yield three predictions about Ticuna-learning children’s acquisition of demonstratives.
Prediction 1: emergence
If cognitive egocentrism is a central factor in Ticuna-learning children’s initial production of demonstratives, semantically egocentric demonstratives are predicted to emerge before non-egocentric ones. Thus, the Speaker-Proximal and Speaker-Distal will appear before the Addressee-Centered and Dyad-Proximal. Additionally, if Ticuna-acquiring children share the proximity bias documented for children acquiring other languages, they are predicted to produce Speaker-Proximal demonstratives before Speaker-Distal demonstratives. Demonstratives will therefore appear in the order Speaker-Proximal > Speaker-Distal > {Addressee-Centered, Dyad-Proximal}. As the Addressee-Centered demonstratives are semantically more complex (e.g., more polysemous and more influenced by ownership) than the Speaker-Proximal and Speaker-Distal, complexity also favors late emergence of these terms.
In contrast, if cognitive biases favoring egocentric, (speaker-)proximal, and semantically simple items do not substantially influence demonstrative acquisition, then demonstratives – like other function words – will emerge roughly in order of token frequency in child-directed speech (CDS) (Ambridge, Kidd, Rowland & Theakston, 2015, pp. 243-248; Lieven, 2010; Rowland, Pine, Lieven & Theakston, 2003; though cf. Brown, 1973). Assuming that the CDS frequency ranking of demonstratives mirrors the ADS ranking in Table 2, nominal demonstratives will appear in the order {Speaker-Proximal, Addressee-Centered} > Speaker-Distal > Dyad-Proximal, and locative demonstratives in the order {Speaker-Proximal, Addressee-Centered, Speaker-Distal} > Dyad-Proximal. Given that none of the demonstratives are especially phonologically marked, phonology is unlikely to influence the order of acquisition.
Thus, where these two competing sets of predictions differ is in the predicted behavior of the Addressee-Centered demonstratives (that/there near you). In ADS, the Ticuna Addressee-Centered items are as frequent as the Speaker-Proximals. But the Addressee-Centered terms are also more semantically complex, and using them correctly requires non-egocentric construals of the demonstrative referent. Both egocentrism and semantic complexity therefore predict that Addressee-Centered demonstratives will emerge later than the Speaker-Proximal (and Speaker-Distal). Frequency, in contrast, predicts that the Addressee-Centered demonstratives will be learned at the same time as the Speaker-Proximal.
Concerning the absolute age of emergence, children acquiring languages with two or three demonstrative terms typically produce all of the terms by 3;0 (e.g., Espinosa Ochoa, 2021; González-Peña et al., 2020). Consequently, I predict that Ticuna-learning children will produce all four demonstrative types before 3;0.
Prediction 2: maturity
Seven-year-olds acquiring English (González-Peña, 2020) and six-year-olds acquiring Turkish (Küntay & Özyürek, 2006) do not yet display adult-like production of any demonstrative. I therefore predict that children acquiring Ticuna will not attain adult-like production of demonstratives until at least 6;0. As this is beyond the age range of the current study, I make no predictions concerning the order in which demonstratives attain adult-like use.
Prediction 3: nominal advantage
As discussed in the Background, I hypothesize that children produce the syntactic category of demonstratives used in presentative constructions, such as English Here it is, before producing other categories of demonstratives. Since presentative constructions in Ticuna involve nominal demonstratives (This it is), I predict that Ticuna-learning children will produce nominal demonstratives (this/that) earlier than locative ones (here/there).
Methods
Participants
Via relationships established in my earlier fieldwork with adults, I recruited 46 children aged 1;0 to 4;11 (mean age = 2;10.12, SD = 14 months 28 days) from families residing in Cushillococha. Families were paid for participation; as is standard in the Amazon Basin, additional collective compensation was paid to a preschool and church which served the families. Children were included if they (a) were acquiring Ticuna as an L1, (b) appeared to be typically developing, and (c) were aged at least 1;0 but less than 5;0. Twelve participants were siblings; ten more participants lived in the same household, but were not siblings. Sixteen participants were exposed to Spanish at home or showed evidence of bilingualism in Spanish during procedures.
I conducted recruitment and all study procedures in 2019. I am a proficient L2 speaker of both Ticuna and Spanish and communicated with families in Ticuna, using Spanish only with family members who did not speak Ticuna. Study procedures were approved by the Institutional Review Board of the University of Texas at Austin.
Procedures
Children completed three procedures: a daylong audio recording, an object play session, and a free play session. Procedures took place in the participants’ homes on three different days within a 10-day period. This study analyzes only the object play and free play sessions.
Object play
Participants were video recorded for 30 minutes of object play with one caregiver. The six sibling pairs were recorded together with one caregiver, producing 40 total recordings (34 with one child, six with two children). The stimulus for the object play was a locally acquired set of 50 marbles. Marbles were chosen because they are a common toy for young children in Cushillococha. As participants did not play with the marbles following a conventional set of rules, the task represents object play, not a structured game.
Object play was recorded with two high-definition video cameras (Sony PJR540 and Canon XA30) at opposing angles. Audio tracks were recorded via body-worn audio recorders (Olympus VP10), a stand-mounted microphone (Rode NT4), and the cameras’ internal microphones. Multiple audio tracks were necessary because most participants’ homes were located within 10 m of a busy road, which made the environment extremely noisy, and had sheet metal roofs and poured concrete floors, which generated significant echo on the camera microphones.
My goal in object play was to gather maximally comparable data for each participant. Thus, I was present to operate the video cameras and discourage non-target household members from entering the scene. While some research with Indigenous children has found that participants are uncomfortable with dyadic child-caregiver interactions (Brown, 2011, p. 37; Kelly, Forshaw, Nordlinger & Wigglesworth, 2015, pp. 296-297), dyadic interaction styles are relatively common in Cushillococha, and participants appeared comfortable with the object play task.
Free play
Participants were video recorded for 60 minutes of free play with one or two caregivers. One child was withdrawn from the study before completing this procedure due to family travel. She was excluded from all analyses, leaving 45 complete participants.
The six sibling pairs and five pairs of non-sibling participants who lived together were each recorded together. This yielded a total of 34 free play recordings (23 with one child, 11 with two children). Six sibling or co-resident pairs were recorded with two caregivers simultaneously, while five pairs were recorded with only one caregiver, as were the 23 non-paired children. The same equipment was used as in object play.
My goal in free play was to record maximally naturalistic interaction between children and caregivers. Thus, during free play sessions, I told participants that they could do whatever they liked as long as they remained in front of the cameras. I also explained that other family members were welcome to enter the scene, which they often did. To facilitate natural interaction, I left the room during the recording.
Sampling and annotation
Sampling
In object play recordings with one child, I sampled the first 10 minutes, measured from the first turn at talk after participants received the objects. In recordings with two children, I sampled the first 20 minutes. This time-based sampling procedure was appropriate for object play sessions, where turns were distributed evenly across time, but not for free play sessions, where turns occurred in short bursts at unpredictable timepoints. Thus, I sampled the free play sessions based on child volubility, identifying the 10-minute segment of each recording that contained the most child and child-directed turns per unit time (cf. Casillas, Brown & Levinson, 2020, p. 1824). In 10 recordings, the location of the highest-volubility segment was apparent on viewing. In the other 24 free play recordings, the segment was identified via pitch criteria. I used Praat (v. 6.0.40; Boersma & Weenink, 2018) to automatically label all audio intervals with F0 > 300 Hz and duration > 100 ms, then wrote an R script (v. 4.0.1; R Core Team, 2020) to identify the 10-minute segment with the highest proportion of time meeting the pitch criteria. In recordings with two children, I sampled two 10-minute segments, following the same procedure.
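The pitch-based window search described above can be sketched as follows. This is a minimal Python illustration, not the author's R script: it assumes the Praat labels have already been exported as a list of `(start_s, end_s, f0_hz)` tuples, that labeled intervals do not overlap, and that candidate windows are tried in 1-second steps. The names `meets_pitch_criteria`, `covered_time`, and `best_window` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical export format: (start_s, end_s, f0_hz) for each Praat-labeled interval.
Interval = Tuple[float, float, float]


def meets_pitch_criteria(start: float, end: float, f0: float,
                         min_f0: float = 300.0, min_dur: float = 0.1) -> bool:
    """Keep intervals with F0 > 300 Hz and duration > 100 ms."""
    return f0 > min_f0 and (end - start) > min_dur


def covered_time(spans: List[Tuple[float, float]], win_start: float, win_end: float) -> float:
    """Seconds of the window [win_start, win_end) covered by non-overlapping spans."""
    total = 0.0
    for s, e in spans:
        overlap = min(e, win_end) - max(s, win_start)
        if overlap > 0:
            total += overlap
    return total


def best_window(intervals: List[Interval], recording_len: float,
                win_len: float = 600.0, step: float = 1.0) -> Tuple[float, float]:
    """Return (start_s, proportion) for the window with the most time meeting the criteria.

    Ties are resolved in favor of the earliest window.
    """
    if recording_len < win_len:
        raise ValueError("recording is shorter than the sampling window")
    spans = [(s, e) for s, e, f0 in intervals if meets_pitch_criteria(s, e, f0)]
    best_start, best_prop = 0.0, -1.0
    n_steps = int((recording_len - win_len) // step)
    for i in range(n_steps + 1):
        start = i * step
        prop = covered_time(spans, start, start + win_len) / win_len
        if prop > best_prop:
            best_start, best_prop = start, prop
    return best_start, best_prop


if __name__ == "__main__":
    # Invented intervals for a 60-minute free play recording.
    demo = [(100.0, 100.05, 350.0), (1300.0, 1310.0, 320.0),
            (1500.0, 1520.0, 400.0), (2000.0, 2005.0, 250.0)]
    print(best_window(demo, recording_len=3600.0))
```

Because the covered proportion changes piecewise-linearly as the window slides, a 1-second grid is adequate for choosing among 10-minute segments; an exact search could restrict candidate starts to the interval boundaries.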
Annotation
All 45 complete participants had 10 minutes of usable data from object play. For free play, 44 participants had 10 minutes of usable data. One participant’s free play data was unusable because she was on camera for only three minutes of her high-volubility segment. Calculations for this child were performed using her productions from the 10-minute high-volubility segment defined for her co-participant brother.
In each of the 89 10-minute samples, we phonetically transcribed all vocalizations produced by or directed to the target participants. Of the 89 samples, 17 were transcribed collaboratively by Angel Bitancourt Serra and me in the Cushillococha area, and the other 72 were transcribed by me alone. This division of labor was necessary for logistical reasons. While native speakers generally produce the most accurate transcriptions, in Cushillococha literate Ticuna speakers are in high demand for essential jobs, such as teaching primary school (cf. Kelly et al., 2015, n. 1). As a result, I was able to recruit and hire only one transcriber, and he could not transcribe all samples during his available time, so I transcribed the remaining data myself.
After speech in the samples was transcribed, it was translated into Spanish by Bitancourt Serra or into English by me. Additionally, I coded all turns in the free play data for addressee type, distinguishing between turns directed to adults, target children, and non-target children. Since object play samples generally included only target participants, I did not code this data for addressee and instead treated all adult turns in it as child-directed.
Annotation was performed in ELAN (Wittenburg, Brugman, Russel, Klassmann & Sloetjes, 2006) using Transcription Mode (Dingemanse, Hammond, Stehouwer, Somasundaram & Drude, 2012). The sampled data had a total time of 15 hours 12 minutes (7 hours 44 minutes object play, 7 hours 28 minutes free play) and contained 24,491 total turns at talk. Target caregivers produced 13,217 turns (54.0%); target children, 8,480 turns (34.6%); non-target children, 1,450 turns (5.9%); and non-target adults, 1,344 turns (5.5%).
Post-processing
I exported the transcripts from ELAN as CSV files, then used regular expressions implemented with the stringr package from the tidyverse (v. 1.3.0; Wickham et al., 2019) in R to count the tokens of each demonstrative in each turn. This procedure collapsed demonstratives across the non-deictic features of noun class, number, and case. I manually checked the regular expression output and added codes for tokens which were not identified automatically.
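The regular-expression counting step can be sketched in Python with the standard `re` module, the rough analogue of stringr's `str_count`. The demonstrative roots below are hypothetical placeholders, since the actual Ticuna forms are not listed in this section. The sketch also assumes, for illustration only, that noun class, number, and case are marked after a shared root, so that one pattern per demonstrative (root plus any following word characters) collapses those features into a single count. As in the procedure described above, automatic counts would still need manual checking.

```python
import re
from collections import Counter
from typing import List

# HYPOTHETICAL placeholder roots; the real Ticuna forms are not reproduced here.
# Assumes (illustration only) that inflection follows a shared root, so
# r"\broot\w*" counts all inflected variants of one demonstrative together.
DEMONSTRATIVE_PATTERNS = {
    "SP_nominal": re.compile(r"\bdemA\w*"),
    "SD_nominal": re.compile(r"\bdemB\w*"),
}


def count_demonstratives(turn: str) -> Counter:
    """Count tokens of each demonstrative type in one transcribed turn."""
    counts: Counter = Counter()
    for label, pattern in DEMONSTRATIVE_PATTERNS.items():
        n = len(pattern.findall(turn))
        if n:
            counts[label] = n
    return counts


def count_all(turns: List[str]) -> List[Counter]:
    """Apply the counter to every turn, e.g. the text column of an ELAN CSV export."""
    return [count_demonstratives(t) for t in turns]
```

Patterns must not share a prefix (e.g. one root being the start of another), or a single token would be counted twice.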
Reliability
While some of the data was transcribed by Angel Bitancourt Serra, most of it was transcribed only by me. Since I am not an L1 speaker of Ticuna or a member of the language community, my transcriptions are more likely to be incorrect than Bitancourt Serra’s. Thus, in order to assess the reliability of my transcriptions compared to his, I blindly re-transcribed 40 minutes (23.5%) of the data that was originally transcribed by Bitancourt Serra. My transcriptions and Bitancourt Serra’s transcriptions displayed substantial inter-rater agreement on the number and identity of demonstratives per turn. Raw agreement was 82.4% (Cohen’s κ = 0.69) for object play data and 86.8% (Cohen’s κ = 0.67) for free play data. The R scripts used for sampling, post-processing, and reliability calculations are available, along with the raw and post-processed data, at https://osf.io/x7p2f/?view_only=d03141ccf7494d61b6dd163f70f10339. Additionally, all video and audio recordings are available in Collection 2018-19 at the California Language Archive.
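Raw agreement and Cohen's κ can be computed as in the sketch below. It assumes that each turn in the double-transcribed subset is reduced to a single categorical label recording the number and identity of its demonstratives (here, a sorted tuple of hypothetical type labels, with an empty tuple for turns containing none); the author's own R script may define the coding units differently.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def turn_label(demonstratives: Sequence[str]) -> Tuple[str, ...]:
    """Order-insensitive label capturing the number and identity of demonstratives in a turn."""
    return tuple(sorted(demonstratives))


def agreement_and_kappa(rater1: Sequence[Hashable],
                        rater2: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (raw agreement, Cohen's kappa) for two raters' labels of the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    p_expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_observed, (p_observed - p_expected) / (1 - p_expected)
```

Under this coding, a turn with two Speaker-Proximal tokens and a turn with one are different categories, so disagreements on token count lower agreement just as disagreements on identity do.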
Results
Analyses were conducted in R using tidyverse packages. Analysis scripts are available at https://osf.io/x7p2f/?view_only=d03141ccf7494d61b6dd163f70f10339.
To assess differences in demonstrative use between free play and object play, I conducted a series of eight Wilcoxon rank-sum tests (four demonstrative types × two syntactic categories for each type) with the Benjamini-Hochberg correction comparing the frequency of each demonstrative in object play vs. in free play. Frequency in this analysis was calculated by participant. As shown in Table 3, frequency was significantly different between session types for the Speaker-Proximal nominal term (more frequent in object play), Speaker-Distal nominal term (more frequent in free play), and Dyad-Proximal locative term (more frequent in object play). For all other demonstratives, there was no significant difference in frequency between session types (all p > 0.05). As a result, I collapsed object play and free play data in all analyses. I report data first from caregivers, then from children.
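The Benjamini-Hochberg correction applied to each family of tests can be written in a few lines. This Python version is intended to mirror R's `p.adjust(p, method = "BH")`: p-values are sorted, each is scaled by m/rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone and capped at 1. The example p-values are invented for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is a cumulative minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented p-values for one family of eight tests (4 types x 2 categories).
    raw = [0.001, 0.012, 0.030, 0.041, 0.20, 0.35, 0.60, 0.90]
    print([round(p, 4) for p in benjamini_hochberg(raw)])
```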
Caregiver production
I analyzed caregiver turns that (a) contained at least one intelligible word of Ticuna and (b) were directed to the target children, including turns directed to a target child and another addressee. A total of 10,467 caregiver turns, containing 29,545 word tokens, met these criteria.
Figure 2 displays caregivers’ frequency of each demonstrative per 100 words. Frequencies were calculated independently for each recording session and, in multi-caregiver sessions, for each caregiver. I characterize the analyzed turns as ‘target child-directed speech’ (TCDS; Casillas et al., 2020) rather than ‘child-directed speech’ because they do not include speech directed to children besides the target participants.
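Frequency per 100 words, computed separately for each recording session and each caregiver, amounts to pooling word and token counts within each (session, speaker) group before dividing. The sketch below assumes the turns have already been filtered to TCDS and uses a hypothetical turn-level record with `session`, `speaker`, `n_words`, and `demonstratives` fields; these names are illustrative, not taken from the author's data files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_100_words(turns: Iterable[dict], dem_type: str) -> Dict[Tuple[str, str], float]:
    """Tokens of dem_type per 100 words for each (session, speaker) pair."""
    words: Dict[Tuple[str, str], int] = defaultdict(int)
    tokens: Dict[Tuple[str, str], int] = defaultdict(int)
    for t in turns:
        key = (t["session"], t["speaker"])
        words[key] += t["n_words"]
        tokens[key] += t["demonstratives"].get(dem_type, 0)
    # Skip groups with no words to avoid division by zero.
    return {k: 100 * tokens[k] / words[k] for k in words if words[k] > 0}
```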
As Figure 2 shows, caregivers’ TCDS displayed the same frequency ranking for both nominal and locative demonstratives: Speaker-Proximal > {Addressee-Centered, Speaker-Distal} > Dyad-Proximal. Given this new information, I amended my frequency-based prediction for the order of emergence (Prediction 1). Based on TCDS frequencies, I now predict that both nominal and locative demonstratives will emerge in the order Speaker-Proximal > {Addressee-Centered, Speaker-Distal} > Dyad-Proximal.
To evaluate the reasons for these changes in rank frequency, I compared the frequency of each demonstrative in TCDS vs. ADS, again using a series of eight two-sided Wilcoxon rank-sum tests with the Benjamini-Hochberg correction. The Speaker-Proximal locative demonstrative (here near me) was much more frequent in TCDS than in ADS, occurring on average ~230% more often (mean frequency per 100 words in TCDS = 5.27, SD = 3.53; mean frequency in ADS = 1.58, SD = 0.86; W = 81, p = 0.0046). I view this sharply increased use of the Speaker-Proximal locative form in TCDS as a task artefact, as caregivers often produced many tokens of this demonstrative while directing children to stay on the scene and in front of the cameras. Besides the Speaker-Proximal locative term, no other demonstrative displayed significantly different frequency in TCDS vs. ADS (all p > 0.1).
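As a check on the ~230% figure, the relative difference follows directly from the reported group means, where $\bar{f}$ denotes mean frequency per 100 words. This describes the means only; the inferential result is the Wilcoxon test above.

```latex
\frac{\bar{f}_{\mathrm{TCDS}} - \bar{f}_{\mathrm{ADS}}}{\bar{f}_{\mathrm{ADS}}}
  = \frac{5.27 - 1.58}{1.58}
  = \frac{3.69}{1.58}
  \approx 2.34,
\qquad
\frac{\bar{f}_{\mathrm{TCDS}}}{\bar{f}_{\mathrm{ADS}}}
  = \frac{5.27}{1.58}
  \approx 3.34
```

That is, the mean TCDS rate is roughly 234% higher than, or about 3.3 times, the mean ADS rate.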
Despite the ADS-TCDS differences in relative demonstrative frequency, the new order of demonstrative acquisition predicted by frequency still differs from the order predicted by cognitive biases. Frequency predicts that the Addressee-Centered term will emerge at the same time as the Speaker-Distal, since the two terms display equal frequency in TCDS. By contrast, cognitive biases predict that the Addressee-Centered will emerge after the Speaker-Distal.
Child production
Target children produced 8,480 total turns, of which 5,729 turns (67.6%) contained at least one intelligible word of Ticuna. The 5,729 analyzable turns contained 11,422 total word tokens. Most non-analyzable turns consisted of fussing and noncanonical babbling (1321 turns), laughter (606 turns), or crying (370 turns).
To give a sense of the data, (5) and (6) provide two examples of children’s production, with demonstrative tokens in bold type.
(5) Older child (3;11) is across the room from caregiver chasing a marble; younger child (1;6) sits in caregiver’s lap.
(6) Child (3;5) has just gotten on rocking horse, carrying a monkey stuffed animal. Caregiver is seated on floor next to her, beside a baby doll.
I divided the participants into four one-year age groups. Table 4 shows participant characteristics by age group, and Table 5 shows general language production measures. In Table 5, MLU was calculated in terms of words, as the data is not labeled for morpheme boundaries.
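MLU in words can be sketched as the mean number of whitespace-separated words per utterance. This assumes, for illustration, that each analyzable turn counts as one utterance and that transcripts separate words with spaces; the section does not specify these details.

```python
from typing import Iterable


def mlu_in_words(turns: Iterable[str]) -> float:
    """Mean number of words per turn, ignoring turns that contain no words."""
    lengths = [len(t.split()) for t in turns if t.split()]
    if not lengths:
        raise ValueError("no analyzable turns")
    return sum(lengths) / len(lengths)
```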
Note. Token, type, and MLU values are calculated per 10-minute sample.
Type analysis
I begin by analyzing participants’ inventory of demonstrative types across age groups. Table 6 presents the first analysis of the type data, showing the percentage of participants in each age group who ever produced a given demonstrative type. For caregivers, only TCDS was analyzed.
The only participants who never produced any demonstratives were two of the 14 one-year-olds, aged 1;0 and 1;6 (representing the 14.3% of this group with no demonstratives in Table 6). The 12-month-old produced just two total word types in two 10-minute samples; the 18-month-old produced 11 word types across three 10-minute samples. Since so few participants lacked demonstratives, I cannot identify implicational relationships between the presence of demonstratives as a category and the number of total word types (cf. González-Peña et al., 2020).
Because the one-year-old age group exhibited numerically much higher type prevalence of nominal demonstratives (present for 85.7% of participants) than locative ones (42.9% of participants), I assessed the prevalence of the nominal vs. locative syntactic categories for this age group. This revealed an implicational relationship between the two syntactic categories. Of the 12 one-year-olds who produced demonstratives, six used only nominal demonstratives, while six used both nominal and locative terms. None used only locative terms.
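Both type measures above reduce to simple set operations over each participant's inventory: prevalence is the share of participants whose inventory contains a type, and the implicational relationship holds if no participant has a locative term without a nominal one. In this sketch, inventories are sets of hypothetical labels ending in `_nom` or `_loc`; the labels and function names are illustrative.

```python
from typing import Dict, Set


def prevalence(inventories: Dict[str, Set[str]], dem_type: str) -> float:
    """Percentage of participants who produced dem_type at least once."""
    if not inventories:
        raise ValueError("no participants")
    n_with = sum(dem_type in types for types in inventories.values())
    return 100 * n_with / len(inventories)


def locative_implies_nominal(inventories: Dict[str, Set[str]]) -> bool:
    """True if every participant with a locative demonstrative also has a nominal one."""
    for types in inventories.values():
        has_nom = any(t.endswith("_nom") for t in types)
        has_loc = any(t.endswith("_loc") for t in types)
        if has_loc and not has_nom:
            return False
    return True
```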
Interim discussion
I investigated changes in the prevalence of each demonstrative type over developmental time. This analysis suggested that Speaker-Proximal nominal demonstratives were present for most children between 1;0 and 2;0. Speaker-Distals were present for some children between 1;0 and 2;0, but were not present for the majority until 2;0 to 3;0. By contrast, Addressee-Centered terms were not present for the majority of children until much later, between 3;0 and 4;0. The Dyad-Proximal terms never attained prevalence greater than 50% in any age group, but – in light of their low prevalence among adults – this can be attributed to floor effects. Additionally, nominal demonstratives displayed much higher prevalence among one-year-olds than locative demonstratives, and locative terms never appeared in the absence of nominal ones.
Token analysis 1: zeroes included
Children in older age groups produced more total word types and tokens than children in younger groups (Table 5). This raises the possibility that the low type prevalence of demonstratives in younger age groups (Table 6) is an artefact of these children’s low number of total types, rather than reflecting a true difference in their demonstrative inventory.
To control for this possibility, I calculated the token frequency of each demonstrative type for each age group, including participants who never produced the relevant type. This proportional measure has the advantage of controlling for differences in sample size between older and younger groups, but the disadvantage that it is affected by both the type’s prevalence across the group and its token frequency for individual participants. Table 7 reports these token frequency figures.
I conducted a series of pairwise, two-sided Wilcoxon rank sum tests comparing the token frequency of each demonstrative type among the child age groups. In this analysis, I adjusted p-values using the Benjamini-Hochberg correction within demonstrative types (i.e., one type was treated as one family of analyses). There were no significant differences among child age groups in the token frequency of the Speaker-Proximal nominal demonstrative (all p > 0.2). The Speaker-Proximal locative demonstrative, however, displayed lower frequencies for one-year-olds than for two-year-olds (p = 0.047), three-year-olds (p = 0.0013) or four-year-olds (p = 5.5e-4) and lower frequencies for two-year-olds than three-year-olds (p = 0.037) or four-year-olds (p = 0.014). For the Speaker-Distal nominal and locative demonstratives, I observed no significant differences in token frequency between age groups (all p > 0.3).
Turning to the non-speaker-centered demonstratives, token frequencies of the Addressee-Centered terms varied between age groups. One-year-olds’ frequencies of the Addressee-Centered nominal term (which were all zero) were significantly lower than three-year-olds’ (p = 0.0012) or four-year-olds’ (p = 0.0012). Two-year-olds also used this term significantly less than three-year-olds (p = 0.033), but not significantly less than four-year-olds (p = 0.052). One-year-olds’ frequencies of the Addressee-Centered locative term (which, again, were all zero) were also significantly lower than four-year-olds’ (p = 0.027). Finally, token frequencies of the Dyad-Proximal terms displayed significant age group differences as well. One-year-olds used the Dyad-Proximal locative term significantly less than four-year-olds (p = 0.027), but not significantly less than three-year-olds (p = 0.055). No other age group differences attained significance for the Addressee-Centered or Dyad-Proximal terms (all p > 0.1).
Interim discussion
In order to control for effects of total type count (i.e., sample size) on the type prevalence data, I analyzed the token frequency of each demonstrative type across child age groups. The results suggest that age group differences in the type prevalence of the Speaker-Proximal nominal demonstrative, and of both Speaker-Distal demonstratives, were epiphenomena of differences in total type count. This explanation does not fully account, however, for the remaining demonstrative types. In particular, one- and two-year-olds displayed both lower type prevalence and lower token frequency for the Addressee-Centered demonstratives than three- and four-year-olds, with several of these pairwise differences reaching significance. The same pattern also held for the Speaker-Proximal locative demonstrative. Thus, age groups differed in their use of these terms even when differences in total type count were controlled.
Token analysis 2: nominal demonstratives, zeroes excluded
Because the previous token analysis was designed to provide evidence about type prevalence, it included many frequency values of zero from participants who never produced the relevant demonstrative type. This conflates type prevalence with token frequency and means that children’s token frequencies cannot meaningfully be compared with adults’.
As a result, in order to compare the children with adults, I conducted separate token frequency analyses which included only child participants who produced the relevant type at least once (i.e., only nonzero frequency values). Because children displayed order-of-magnitude differences in token frequency between nominal and locative demonstratives, I investigated the two syntactic categories separately.
Figure 3 visualizes the token data for nominal demonstratives, showing the frequency of each type per 100 words for each age group and excluding zeroes. Only TCDS is included for adults. Because the exclusion of zeroes is at the level of participant, data is reported by participant, collapsing session types and samples.
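Excluding zeroes at the participant level can be sketched as pooling each child's word and token counts across sessions and samples, converting to a rate per 100 words, and then dropping children whose rate is zero. Pooling counts (rather than averaging sample-level rates) is an assumption of this sketch, as are the hypothetical field names `child`, `n_words`, and `counts`.

```python
from collections import defaultdict
from typing import Dict, Iterable


def participant_rates(samples: Iterable[dict], dem_type: str,
                      exclude_zeros: bool = True) -> Dict[str, float]:
    """Per-participant tokens of dem_type per 100 words, pooled across samples."""
    words: Dict[str, int] = defaultdict(int)
    tokens: Dict[str, int] = defaultdict(int)
    for s in samples:
        words[s["child"]] += s["n_words"]
        tokens[s["child"]] += s["counts"].get(dem_type, 0)
    rates = {c: 100 * tokens[c] / words[c] for c in words if words[c] > 0}
    if exclude_zeros:
        # Keep only children who produced the type at least once.
        rates = {c: r for c, r in rates.items() if r > 0}
    return rates
```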
As Figure 3 shows, child participants across all age groups produced the Speaker-Proximal nominal demonstrative (this near me) with very high token frequency – so high that the noun class I form of the Speaker-Proximal nominal demonstrative was the single most frequent word in children’s production. To determine whether children produced this type more often than adults, I conducted a series of four pairwise, one-sided Wilcoxon rank-sum tests comparing the frequency of the Speaker-Proximal nominal term in each child age group with its frequency in caregivers’ TCDS. For these and all subsequent comparisons of adult vs. child production, I again applied the Benjamini-Hochberg correction within demonstrative types. Children in all age groups who produced the Speaker-Proximal type used it significantly more often than caregivers (one-year-olds: W = 394, p = 1.1e-4; two-year-olds: W = 351, p = 2.1e-5; three-year-olds: W = 317, p = 0.0024; four-year-olds: W = 336, p = 0.0010). Figure 3 additionally suggests that children who produced the Speaker-Distal nominal demonstrative type used it more often than caregivers. In a series of one-sided Wilcoxon tests, this difference was significant for one-year-olds (W = 149, p = 0.018), but not for two-, three-, or four-year-olds (all p > 0.2).
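The W statistics reported here follow the two-sample convention of R's `wilcox.test`: W is the rank sum of the first sample minus n₁(n₁ + 1)/2, with tied values given average ranks. The sketch below computes W and a normal-approximation p-value with tie and continuity corrections, which is what R uses by default when ties are present; for small samples without ties R computes exact p-values instead, so p-values from this sketch may differ from the published ones. It is an illustration, not the author's analysis code.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float],
                  alternative: str = "two.sided") -> Tuple[float, float]:
    """Return (W, p) using a normal approximation with tie and continuity corrections."""
    n_x, n_y = len(x), len(y)
    if n_x == 0 or n_y == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    n = n_x + n_y
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n_x * n_y / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        raise ValueError("all values are tied")
    z = w - n_x * n_y / 2
    if alternative == "greater":      # x tends to be larger than y
        p = 0.5 * math.erfc(((z - 0.5) / sigma) / math.sqrt(2))
    elif alternative == "less":       # x tends to be smaller than y
        p = 0.5 * math.erfc(-((z + 0.5) / sigma) / math.sqrt(2))
    elif alternative == "two.sided":
        z_corr = (z - math.copysign(0.5, z)) / sigma if z != 0 else 0.0
        p = math.erfc(abs(z_corr) / math.sqrt(2))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two.sided'")
    return w, p
```

For a one-sided test of children producing a type more often than caregivers, children's frequencies would be passed as `x` with `alternative="greater"`.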
In contrast to the speaker-centered demonstratives, I found no evidence that children who produced the Addressee-Centered and Dyad-Proximal demonstratives used those types with different frequency than adults. As reported in Token Analysis 1, one-year-olds never produced Addressee-Centered terms. That said, pairwise two-sided Wilcoxon tests found no significant difference from adults in the frequency of the Addressee-Centered nominal type among two-, three-, and four-year-olds who produced it at least once (all p > 0.3). Similarly, relatively few children produced the Dyad-Proximal nominal type, but for those who did, two-sided pairwise Wilcoxon tests provided no evidence of a difference in token frequency from adults in any age group (all p > 0.2).
Interim discussion
I evaluated the token frequency of each nominal demonstrative type among children vs. adults displaying that type. All age groups of children used the Speaker-Proximal nominal demonstrative type more frequently than adults. For each other nominal demonstrative type, children aged 2;0 and above who displayed the type did not differ significantly from adults in token frequency. In other words, children attained adult-like frequencies for the Speaker-Distal nominal demonstrative between 2;0 and 3;0, and for the Addressee-Centered and Dyad-Proximal nominal forms as soon as they appeared, between 3;0 and 4;0.
Token analysis 3: locative demonstratives, zeroes excluded
Children produced ~90% fewer locative demonstrative tokens than nominal demonstrative tokens. Figure 4 visualizes the token data for locative demonstratives in each age group. It was constructed following the same procedures as Figure 3.
Visual inspection of Figure 4 suggests that children displayed lower token frequencies of the Speaker-Proximal locative demonstrative (here near me) than adults. Pairwise one-sided Wilcoxon tests indicated that this difference was significant for all age groups (one-year-olds: W = 4, p = 0.0095; two-year-olds: W = 7, p = 1.4e-5; three-year-olds: W = 83, p = 0.0075; four-year-olds: W = 51, p = 1.1e-4). Recall from the caregiver results, though, that adults’ frequencies of the Speaker-Proximal locative demonstrative were ~230% higher in TCDS than in ADS. Because I hypothesized that the frequency of the Speaker-Proximal locative term in TCDS was an artefact of the study procedures, I also compared children’s token frequencies of the Speaker-Proximal locative to the term’s ADS frequencies. Pairwise two-sided Wilcoxon tests found no significant difference between children’s frequencies and ADS frequencies of the Speaker-Proximal locative form for one-year-olds, two-year-olds, or four-year-olds (all p > 0.7). Three-year-olds’ frequencies of the Speaker-Proximal locative were numerically greater than ADS frequencies (three-year-olds: mean frequency per 100 words = 3.23, SD = 1.67; ADS: mean frequency = 1.58, SD = 0.86), but this difference only approached significance (W = 11, p = 0.061).
Besides the Speaker-Proximal, the remaining locative demonstratives displayed no differences in token frequency between children’s production and adults’ TCDS. The Speaker-Distal locative demonstrative was produced by children in all age groups, and pairwise two-sided Wilcoxon tests did not indicate a significant difference from adults in token frequency for any age group (all p > 0.2). Similarly, for each of the 2;0 and older age groups, pairwise two-sided Wilcoxon tests did not indicate a significant difference from adults in token frequency of the Addressee-Centered locative demonstrative (all p > 0.7) or Dyad-Proximal locative demonstrative (all p > 0.9). The one-year-old group was not included in these comparisons, as no one-year-olds produced the Addressee-Centered locative type and only one produced the Dyad-Proximal locative type.
Interim discussion
I evaluated the token frequency of each locative demonstrative type in each age group of children vs. in caregivers’ TCDS, including only the child participants who produced the relevant type. As the Speaker-Proximal locative demonstrative (here near me) was more than three times as frequent in TCDS as in ADS, I compared children’s production of this form to both ADS and TCDS. Children of all age groups produced the Speaker-Proximal locative form with lower frequencies than observed in TCDS, but with frequencies not significantly different from those in ADS. By contrast, the Speaker-Distal, Addressee-Centered, and Dyad-Proximal locative types showed no significant differences between ADS and TCDS frequency, and I observed no significant differences in token frequency between caregivers (in TCDS) and children aged 2;0 and above. Instead, children attained adult-like frequencies for these locative demonstrative types as soon as they appeared: between 1;0 and 3;0 for the Speaker-Distal locative, and between 3;0 and 4;0 for the Addressee-Centered and Dyad-Proximal locatives.
Relationships between child and caregiver speech
In order to assess whether differences in children’s demonstrative production arose from differences in the input, I analyzed correlations between the frequency of each demonstrative type in the speech of children vs. their caregivers. Because I hypothesized that children’s demonstrative production might be primed by caregivers’ production in the same session, this analysis did not collapse data across recording sessions. Children who were recorded with two caregivers simultaneously in a given session were compared to both caregivers present, yielding 102 total comparisons for the 89 samples.
Child and caregiver frequencies of the Speaker-Proximal nominal demonstrative showed a significant positive association (Spearman’s rho = 0.26, p = 0.0085). For each other nominal and locative demonstrative type, however, the association between child and caregiver frequencies was not significant (all p > 0.5, all |rho| < 0.18). To control for potential zero-inflation of the child token frequencies, this analysis was repeated, for each demonstrative type, excluding sessions where the child never produced that demonstrative type. This exclusion did not change the results. Child and caregiver frequencies of the Speaker-Proximal nominal demonstrative still showed a significant positive association (Spearman’s rho = 0.30, p = 0.0052, 86 comparisons), and there was still no significant association between child and caregiver frequencies for any other type (all p > 0.1, all |rho| < 0.33, 15 to 50 comparisons per type).
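The session-level pairing and Spearman correlation can be sketched as follows. Each session contributes one (child, caregiver) pair per caregiver present, which is how two-caregiver sessions add extra comparisons, and the optional filter drops sessions in which the child never produced the type. Spearman's ρ is computed, as in R's `cor(..., method = "spearman")`, as the Pearson correlation of average ranks. The session record layout (`child_freq`, `caregiver_freqs`) is hypothetical, and no p-value is computed here.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


def child_caregiver_pairs(sessions: Iterable[dict], dem_type: str,
                          exclude_child_zeros: bool = False) -> List[Tuple[float, float]]:
    """One (child rate, caregiver rate) pair per caregiver present in each session."""
    pairs = []
    for s in sessions:
        child_rate = s["child_freq"].get(dem_type, 0.0)
        if exclude_child_zeros and child_rate == 0:
            continue
        for cg in s["caregiver_freqs"]:
            pairs.append((child_rate, cg.get(dem_type, 0.0)))
    return pairs
```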
Additionally, to evaluate whether children’s differences in demonstrative production were related to differences in speech directed to younger vs. older children, I analyzed correlations between children’s age and the frequency of each demonstrative in caregivers’ TCDS. This revealed a significant positive association between child age and caregiver frequency of the Dyad-Proximal nominal demonstrative (Spearman’s rho = 0.22, p = 0.029) as well as the Dyad-Proximal locative demonstrative (Spearman’s rho = 0.28, p = 0.0045). For each other nominal and locative demonstrative type, the association between child age and caregiver token frequency was not significant (all p > 0.05, all |rho| < 0.19).
Interim discussion
Caregivers’ production of demonstratives in TCDS generally did not correlate with either the target child’s age or their production of demonstratives. This indicates that differences in demonstrative use between child age groups, as well as between individual children, likely do not arise from differences in the lexical composition of TCDS.
General discussion
This paper describes a cross-sectional, observational study of Ticuna-acquiring children’s production of demonstratives. Forty-five children aged 1;0 to 4;11 were recorded interacting with their caregivers, once playing with a standardized set of objects (30 minutes) and once playing freely (60 minutes). I examined 20 minutes of recording time per child, split between sessions, and analyzed children’s and caregivers’ production of demonstratives in type and token terms.
Emergence
Based on the hypothesis that demonstrative acquisition is constrained by cognitive biases toward egocentric, proximal, and semantically less complex items, I predicted that the Ticuna demonstratives would emerge in the order Speaker-Proximal > Speaker-Distal > {Addressee-Centered, Dyad-Proximal}. The results are largely consistent with this prediction. In the Type Analysis, the Speaker-Proximal was the first demonstrative to attain >50% prevalence, followed by the Speaker-Distal, and finally the Addressee-Centered term. I additionally predicted that all four demonstratives would emerge by 3;0. In the Type Analysis, this prediction was supported only for the Speaker-Proximal and Speaker-Distal, which both displayed >50% type prevalence by the two-year-old age group. It was not supported for the Addressee-Centered term, which reached this prevalence level only in the three- and four-year-old groups. Analyses were unable to establish the ordering or age of emergence of the Dyad-Proximal, as children’s and caregivers’ low frequencies of this item gave rise to floor effects.
These conclusions remained largely unchanged when I considered, in Token Analysis 1, whether age group differences in type prevalence could arise from differences in total type count. This analysis suggested that the type prevalence analysis may underestimate the prevalence of the Speaker-Distal nominal and locative terms in the one-year-old age group, supporting a {Speaker-Proximal, Speaker-Distal} > {Addressee-Centered} order of emergence. It did not affect the other results.
These findings show that the Addressee-Centered demonstratives emerge late both in comparison to the other Ticuna demonstratives, and in comparison to the age range – roughly 1;0 to 3;0 – when demonstratives typically emerge in languages with smaller systems (Clark, 1978; Diessel & Coventry, 2020; González-Peña et al., 2020). Importantly, this lateness cannot be explained by CDS frequency alone. While the Dyad-Proximal terms’ low CDS frequencies do represent a plausible explanation for their lateness, the Addressee-Centered demonstratives are statistically indistinguishable in CDS frequency from the much earlier-emerging Speaker-Distals. The CDS frequency of the Addressee-Centered term is also not correlated with children’s age – caregivers treat even one-year-olds as sufficiently legitimate addressees to use this demonstrative. This result argues against an account of the Addressee-Centered terms’ late emergence based on input frequency, and supports an account based on cognitive biases. It does not, however, indicate which biases are responsible. Since the Ticuna Addressee-Centered terms combine an addressee origo with other forms of semantic complexity, such as polysemy (e.g., use as an anaphor) and sensitivity to ownership, children’s production of the Addressee-Centered term could be inhibited either by a bias toward egocentric spatial cognition or by biases toward semantically simple and/or monosemous terms.
To adjudicate between these explanations for the late emergence of the Addressee-Centered type, more research on the acquisition of other languages with addressee-proximal demonstratives, such as Finnish (Nahkola, Reile, Taremaa & Pajusalu, 2020, p. 250), is necessary. If the late emergence of the Ticuna Addressee-Centered demonstrative reflects an innate bias toward egocentric spatial cognition, it is expected that addressee-centered demonstratives will emerge late in other languages as well. On the other hand, if the lateness of the Addressee-Centered term is due to language-specific properties of this demonstrative, such as its anaphoric use, then addressee-centered demonstratives in other languages may emerge early.
Maturity
I hypothesized that children would not display adult-like use of any demonstrative type before this study’s age limit of 5;0. Due to the size of the data set, I did not attempt to assess whether any individual demonstrative token produced by a child was adult-like in terms of semantics and pragmatics (cf. Espinosa Ochoa, 2021; Küntay & Özyürek, 2006). Instead, I characterize participants’ production of a demonstrative type as ‘adult-like’ if their token frequency of that type is not significantly different from adult frequencies. This coarse-grained measure of maturity represents a limitation of the study, and collaborators and I plan to include token-level analyses in future work with the data.
Within this frequency-based analytic framework, my hypothesis about adult-like use was supported only for Speaker-Proximal demonstratives. Compared to adult TCDS, all age groups from 1;0 to 4;11 overused the Speaker-Proximal nominal term (this near me). One-year-olds who produced the Speaker-Distal nominal demonstrative (that far from me) also overused it compared to adults. For the Speaker-Proximal locative form (here near me), which is the only demonstrative to display large differences in frequency between TCDS and ADS, I compared children’s production to both registers of adult production. Children produced this demonstrative at a lower frequency than adults did in TCDS, but at a frequency not significantly different from that in ADS. For all other demonstrative terms, children across all age groups (provided that they produced a given demonstrative type at least once) did not display token frequencies significantly different from those of adults (in TCDS).
Thus, the results provide little evidence of any lag between emergence and adult-like use for Speaker-Distal, Addressee-Centered, or Dyad-Proximal nominal demonstratives, or for any locative demonstrative. This is consistent with experimental production studies of English by de Villiers and de Villiers (1974) and Charney (1979), who argue that children may display adult-like production of speaker-proximal and speaker-distal demonstratives as early as age three. It is inconsistent, however, with the larger number of studies that argue for non-adult-like production through age six or seven (Clark & Sengul, 1978; González-Peña, 2020; Tanz, 1980; Webb & Abrahamson, 1976). One possible explanation for this difference is this study’s observational method, since some authors argue that children’s non-adult-like production of demonstratives in experimental work is an artefact of inadequate warm-up or unnatural communicative situations (Charney, 1979; González-Peña, 2020, p. 28).
The finding that children display adult-like use of the Addressee-Centered and Dyad-Proximal forms is more difficult to compare with other studies, as the languages represented in the deixis literature generally lack addressee- or dyad-centered terms. With this proviso, these results depart from Küntay and Özyürek’s (2006) finding that Turkish-acquiring four-year-olds produce the demonstrative şu, which is sensitive to the addressee’s visual attention (a form of addressee-centering), less often than adults do. I suggest that this difference arises because the Ticuna Addressee-Centered form has primarily spatial deictic content, which children may learn earlier than the attentional deictic content of şu (Küntay & Özyürek, 2006, p. 317). Another possible explanation is that my frequency-based analyses miss non-adult-like use at the token level; as acknowledged above, this limitation potentially affects the findings for all demonstratives.
Turning to findings that support non-adult-like use, Ticuna-learning children’s overuse of the Speaker-Proximal nominal demonstrative (this near me) is consistent with results indicating that children overuse proximals in English (Tanz, 1980), Spanish (González-Peña et al., 2020; Rodrigo et al., 2004), and Yucatec Maya (Espinosa Ochoa, 2021). These authors attribute children’s elevated use of speaker-proximals to egocentrism and/or proximity bias. While egocentrism and proximity bias could explain Ticuna-learning children’s overuse of the Speaker-Proximal nominal demonstrative, these accounts predict that children will also overuse the Speaker-Proximal locative form (here near me), since it too is proximal and egocentric. In fact, however, Ticuna-acquiring children do not overuse the Speaker-Proximal locative at any age from 1;0 to 4;11.
This asymmetry between the Speaker-Proximal nominal and locative terms suggests that Ticuna-acquiring children’s overuse of the nominal form is motivated, at least in part, by factors other than egocentrism and proximity bias. One possibility is that children’s overuse of the Speaker-Proximal nominal form reflects a high frequency of ritualized uses associated with particular actions, such as giving or showing a referent to the addressee (cf. González-Peña et al., 2020, p. 11; Harris et al., 1988). Testing this hypothesis requires an analysis of the relationship between children’s demonstrative production and their nonverbal behavior, which collaborators and I are currently preparing. Another possible interpretation, suggested by Diessel and Coventry (2020, p. 5), is that young children overuse nominal demonstratives because they know relatively few nouns. Under this hypothesis, children in this study overused the Speaker-Proximal nominal form because they produced it (pronominally) in contexts where adults would use lexical nouns. To test this claim, I suggest that future research analyze the relationship between children’s token frequency of demonstratives and the total number of noun types they produce.
Nominal advantage
In Ticuna, presentative constructions use nominal demonstratives (this/that) rather than locative demonstratives (here/there). In light of the reported early emergence of presentatives across languages, I hypothesized that Ticuna-acquiring children would produce nominal demonstrative terms before locative ones. This prediction was supported: Ticuna-acquiring one-year-olds never produced locative demonstrative types unless they also produced nominal types.
These results are inconsistent with González-Peña et al.’s (2020, p. 11) claim that locative demonstratives emerge first in English and Spanish because place reference is less complex than object reference. This semantically oriented hypothesis predicts that locative demonstratives will appear before nominal ones across languages. The present findings show that this prediction does not hold for Ticuna. Nor is it supported for Yucatec Maya (Espinosa Ochoa, 2021), where presentative demonstratives sometimes emerge before either nominal or locative types. In contrast to González-Peña et al., I therefore suggest that the earlier emergence of locative demonstratives in English and Spanish reflects not the locatives’ semantics, but rather their language-specific role in presentative constructions. This interpretation rests crucially on the claim that presentatives emerge earlier than other uses of demonstratives. As this claim has been supported only in small studies of English (Harris et al., 1988; Barrett et al., 1991), future research should test it more systematically and in a larger sample of languages.
Conclusion
This study investigated the acquisition of the four-term demonstrative system of Ticuna by children aged 1;0 to 4;11. Ticuna contrasts Speaker-Proximal, Speaker-Distal, Addressee-Centered, and Dyad-Proximal demonstratives in nominal (this/that) and locative (here/there) syntactic categories. In line with findings for languages with smaller demonstrative systems, Ticuna-learning children produced the Speaker-Proximal and Speaker-Distal demonstratives between 1;0 and 3;0; departing from the cross-linguistic pattern, however, most children did not produce the Addressee-Centered demonstrative terms until between 3;0 and 4;0. This result supports the hypothesis that children’s acquisition of demonstratives is constrained by cognitive biases that facilitate the learning of egocentric, proximal, and semantically less complex terms. Diverging from other studies’ findings of persistent immaturity in children’s demonstrative production, I also found that Ticuna-learning children aged 2;0 and above displayed token frequencies similar to those of adults for the demonstrative types they produced at least once. The sole exception was the Speaker-Proximal nominal demonstrative, which children in all age groups overused. These findings illustrate that multi-term demonstrative systems are structured and learned in substantially different ways than two- and three-term systems, underlining the importance of studying acquisition across a typologically diverse sample of languages.
Acknowledgements
The author thanks the Ticuna-speaking families and children who allowed her to record them in their homes. Her special thanks go to Angel Bitancourt Serra for transcription of the adult conversational corpus and portions of the child recordings. Additional thanks for comments go to Marisa Casillas and audiences at MPI Nijmegen, the University of Texas at Austin, and Cornell University. Portions of this paper were written while the author was a guest researcher at the Language Development Department at MPI Nijmegen and the Baby & Child Research Centre at Radboud University; the author thanks these institutions for their hospitality and logistical support. The author has no conflicts of interest to disclose. This research was supported by the National Science Foundation under Grants BCS-1741751 and SMA-1911762. Previous versions of the study were presented at the 2020 Boston University Conference on Language Development and the 2021 Linguistic Society of America meeting.