Introduction
Word learning is commonly construed as the process of detecting a word form, hypothesizing about candidate meanings, and mapping the form to the intended meaning (Clark, 1993, p. 43). While this might sound straightforward, it represents a challenging problem because each word is in theory compatible with a variety of meanings (Quine, 1960). Imagine someone pointing to a fish tank and saying mahi in a foreign language. What could mahi mean? Maybe “look”, “pretty”, “fish”, “swim”, or one of many other possible meanings. However, research suggests that children solve the mapping problem by relying on a variety of conceptual preferences, cues, and learning mechanisms. For example, studies of early word learning have shown that children favor whole objects as referents over object parts, taxonomic relations over thematic ones, and one-to-one mappings over one-to-many mappings (Clark, 1987, 1993; Markman, 1990; Markman & Hutchinson, 1984; Markman & Wachtel, 1988). In addition, social cues like pointing and eye gaze help direct learners’ attention to the relevant referents in context (Baldwin, 1993; Tomasello, 2003), and morphosyntactic cues that distinguish nouns, adjectives, and verbs help learners restrict their hypotheses to the domain of objects, properties, and actions respectively (Brown, 1957; Gleitman, 1990; Mintz, 2003). Finally, the mapping mechanism can be part of the solution too.
While each instance of hearing a word in isolation could be compatible with a range of different meanings, any mapping mechanism that aggregates candidate meanings across multiple contexts will reduce this indeterminacy substantially (Siskind, 1996; Smith, Smith & Blythe, 2011; Yu & Smith, 2007). So if mahi is uttered in the context of a fish tank, of drawing a fish, and of eating fish, learners can become more confident about its possible meaning. The set of preferences, cues, and mechanisms that results in the successful acquisition of a word like mahi constitutes a word learning strategy.
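The aggregation idea can be illustrated with a minimal sketch. The candidate sets below are hypothetical, and strict set intersection is only the simplest possible aggregator, assumed here for illustration; the cited cross-situational models are more graded than this.

```python
# Hypothetical candidate meanings a learner might entertain each time
# "mahi" is heard; these sets are illustrative, not data from any study.
contexts = [
    {"look", "pretty", "fish", "swim", "tank"},  # pointing at a fish tank
    {"fish", "draw", "pencil", "pretty"},        # drawing a fish
    {"fish", "eat", "dinner"},                   # eating fish
]


def aggregate(candidate_sets):
    """Keep only the meanings compatible with every context seen so far."""
    remaining = set(candidate_sets[0])
    for candidates in candidate_sets[1:]:
        remaining &= candidates
    return remaining


# Each added context can only shrink (never grow) the set of survivors.
after_two = aggregate(contexts[:2])
surviving = aggregate(contexts)
print(after_two, surviving)
```

With these toy sets, one context leaves five candidates, two contexts leave two, and three contexts leave a single candidate.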
Since the lexicon consists of diverse elements, children may need different strategies for assigning meanings to different word classes. In short, the combination of preferences, cues, and mapping mechanism that works for one class might not work so well for another. Consider a basic and broad distinction in the lexicon: that of content versus function words. Content words consist of nouns, verbs, adjectives, and some adverbs. They often refer to everyday aspects of experience – objects, properties, and actions – and encode an extensive range of meanings. But function words like or, not, can, and the have small and often subtle meanings that link content words within an utterance. Their meanings are best understood in terms of the combinatorial role they play in building the overall interpretation of the utterance. While there has been considerable research on the learning of content words, there has been much less on the learning of function words. Many of the preferences, cues, and mechanisms identified so far apply more directly to content words; and social cues (such as pointing and eye gaze) that play a role in mapping words to concrete referents appear less helpful when it comes to words like or and not. Similarly, whole-object and taxonomic constraints do not extend to function words in any straightforward manner. In order to arrive at a more general solution of the mapping problem, we therefore need to look at preferences, cues, and mechanisms for function words as well.
Quine (1960, p. 12) proposed three form-to-meaning mapping strategies for different words and word classes. Following Quine, we call them “isolated” mapping, “context-dependent” mapping, and “description” mapping. Isolated mapping involves hearing a word (a linguistic form) and mapping it to a possible meaning in isolation from any linguistic context: for instance, hearing mahi (as an utterance or part of an utterance) and mapping it to the concept “fish”. Concrete nouns are prototypical examples of isolated mapping. Context-dependent mapping is learning a word “contextually, or by abstraction, as a fragment of sentences learned as wholes”. Note that context here is the linguistic context. Quine suggested that all words are to some degree learned in a context-dependent way, but he noted that “prepositions, conjunctions, and many other words, are bound to have been learned only contextually; we get on to using them by analogy with the ways in which they have been seen to turn up in past sentences”. Finally, “description mapping” refers to cases where the word is defined explicitly using other words, as in a dictionary entry. Quine gives “molecule” as an example of a word whose meaning is given via a description or definition. In Quine’s account, word learning starts with isolated mapping and slowly increases its dependence on context-dependent mappings until finally many words may be learned via linguistic descriptions or definitions (see Gleitman, Cassidy, Nappa, Papafragou & Trueswell, 2005, for a similar view emphasizing the role of syntax in word learning). Function words, therefore, are assumed to be learned using the context-dependent strategy.
This paper focuses on the acquisition of linguistic disjunction, and proposes a context-dependent strategy for learning the word or in English. Disjunction is a fundamental logical concept that has played a major role in theories of formal semantics and pragmatics. Uses of disjunctive terms like or often give rise to complex implications such as inclusivity, exclusivity, ignorance, and free choice, illustrated with examples in Table 1 (see Aloni, 2016, for an overview). The diverse set of inferences generated by the term or offers important insights into human semantic and pragmatic knowledge. Disjunction has also presented theories of language acquisition with a learning puzzle. While experimental studies have shown that preschool children understand the inclusive meaning of disjunction (Crain, 2012; Jasbi & Frank, 2021, among others), research on child-directed speech has shown that most of the uses children hear are exclusive (Morris, 2008). How do children learn the inclusive meaning of or if they are rarely exposed to it? We argue that this puzzle arises because of an assumption that the word or is mapped to its meaning using an “isolated” mapping strategy. We show that a context-dependent strategy provides a straightforward solution to the puzzle of learning disjunction. It also provides a general solution for learning words that are polysemous or can give rise to multiple context-dependent interpretations.
Previous Studies
Morris (2008) investigated the spontaneous productions of and and or in the speech of parents and their children between the ages of 2;0 and 5;0. He took 240 transcripts from the CHILDES database and analyzed each connective with respect to its frequency, sentence-type, and meaning (or use). Overall, he found that and was 12.8 times more likely to be produced than or. And appeared mainly in statements (90% of the time) while or was most common in questions (85% of the time). Children started to produce and at 2;0 and or at 2;6 years of age.
In analyzing the meaning of these connectives, Morris (2008) adopted a usage-based (item-based) approach (Levy & Nelson, 1994; Tomasello, 2003): he predicted that children would first produce connectives with a single “core meaning” (also referred to as “use” or “communicative function”). These core meanings, Morris suggested, would be mapped to the most frequent interpretations of these terms in child-directed speech. Less frequent interpretations would be acquired as children got older, but he did not discuss exactly how children would learn these interpretations. He found that children started producing and as a conjunction at 2;0, and or as exclusive disjunction at 2;6. In line with a usage-based account, these are the most frequent uses in parents’ speech. For disjunction, 75-80% of the or uses children heard had an exclusive interpretation. But as children got older, they started to use these connectives to convey additional meanings: inclusive disjunction for or and temporal conjunction for and. Temporal conjunction referred to cases that implied order of events – for example, “Adam fell down and broke his arm”. In adult speech, however, inclusive or was very rare, and children rarely produced it, even at age 5. Morris (2008) argued that the development of connectives conforms to the predictions of a usage-based account and that, in the first five years of children’s development, the core (initial) meaning of or is exclusive disjunction.
However, a number of experimental studies have shown that preschool children (3;0-6;0) are likely to interpret or as inclusive in certain linguistic contexts such as negative sentences (Crain, Gualmini & Meroni, 2000), conditional sentences (Gualmini, Crain & Meroni, 2000), restriction and nuclear scope of the universal quantifier every (Chierchia, Crain, Guasti, Gualmini & Meroni, 2001; Chierchia, Guasti, et al., 2004), nuclear scope of the negative quantifier none (Gualmini & Crain, 2002), restriction and nuclear scope of not every (Notley, Thornton & Crain, 2012), and prepositional phrases headed by before (Notley, Zhou, Jensen & Crain, 2012). These studies adopt a Gricean approach to meaning (Grice, 1989), and consider the semantics of or to be inclusive. Exclusive interpretations are attributed to factors external to or itself, such as “exclusivity implicatures” – namely, pragmatic (scalar) inferences based on the addressee’s reasoning about the speaker’s choice of or over and (Horn, 1972; Gazdar, 1979; Levinson, 2000; Chierchia, Fox & Spector, 2012). These studies have argued that (at least in declarative sentences) the inclusive interpretation of or emerges earlier than the exclusive interpretation.
This is in line with the literature on the acquisition of scalar implicatures in experimental pragmatics, which argues that the semantics of words like some and or develops earlier than their pragmatics (Noveck, 2001; Pouscoulous & Noveck, 2009; Crain, 2012).
The results from previous corpus-based and experimental studies give rise to a puzzle: how do children learn to interpret or as inclusive, when they mostly hear it being used as exclusive? One way to solve this puzzle is “logical nativism” (Crain, 2012; Crain & Khlentzos, 2008, 2010). This view proposes that the language faculty constrains the connective meanings entertained by the learner to those used in classical logic: negation, conjunction, and inclusive disjunction. Crain (2012) considered it unlikely that children learn the meaning of or directly from the uses they hear from adults. Rather, he argued, children rely on the innate knowledge that the meaning of disjunctive words in natural languages must be inclusive. That is, upon hearing a connective word, children consider inclusive, but not exclusive, disjunction as a possible meaning. In this account, the exclusive interpretation of or emerges as part of children’s pragmatic development, after they have mastered the inclusive meaning of disjunction.
While logical nativism can address the puzzle of learning disjunction, it does not provide an explanation for cases where children interpret disjunction as exclusive. Morris (2008) reported that the vast majority of children used or in its exclusive sense. But this is inconsistent with preschool children considering disjunction to be inclusive. Moreover, other experimental studies, especially those testing disjunction in imperatives, have found that preschool children can interpret or as exclusive (Braine & Rumain, 1981; Johansson & Sjolin, 1975). For example, in response to a command such as “give me the doll or the dog”, three- and four-year-olds give one of the objects, but not both.
Current Study
In this study, we offer an alternative solution to the puzzle of learning disjunction. The main claim of this paper is that child-directed speech contains cues that allow children relying on a context-dependent mapping strategy to successfully interpret a disjunction as either exclusive or inclusive. We support this claim with three studies. Study 1 presents the distribution of disjunction and conjunction in parents’ and children’s speech and addresses the following questions: (a) how often do children hear and produce or? (b) when do children start to produce or? In a large corpus of parent-child interactions, we found that children heard 1-2 examples of or per 1000 words. They started producing or themselves between 18 and 30 months, and by 42 months reached the rate of one or per 1000 words. Studies 2 and 3 provide support both for the presence of cues to the relevant interpretation and for their usefulness in learning. In Study 2, we asked what interpretations or had in child-directed speech. We annotated examples of or uses, and found that its most frequent interpretation was exclusive, as Morris (2008) had found. We also found that exclusive interpretations were often accompanied by two cues: rise-fall prosody, and logically inconsistent propositions connected by or. When these cues were absent, or was generally non-exclusive. In Study 3, we asked if it was possible to learn the relevant interpretations of a disjunction from these cues. We used the annotation data from Study 2 and a supervised learning task that quantified cue relationship and reliability, to show that a decision-tree classifier could use prosody and consistency of disjuncts to predict interpretation (exclusive vs. non-exclusive disjunction) with high accuracy.
Based on our results, we propose a new account we call cue-based context-dependent mapping of disjunction. This is inspired by prior usage-based and nativist accounts as well as Quine’s approach to word learning. Like the nativist account, our account assumes that the semantic hypothesis space includes binary logical relations. But we do not constrain the hypothesis space further and do not bias the learning towards any particular binary meaning. Instead, we show that the cues available in the linguistic input do that for us. Like the usage-based proposals, we rely on information in adult input to distinguish between exclusive and inclusive uses of disjunction. And following Quine’s suggestions for mapping the meanings of function words, we rely on a mechanism that takes into account the linguistic contexts of or. Instead of assuming that the acquisition of or depends directly on the most frequent interpretation in the input, we assume that a context-dependent mapping mechanism partitions the adult input using various cues to distinguish different contexts of use. We take up this account in the broader context of current word learning theories in the General Discussion.
Study 1: Production Analysis
In this study, we examined the frequencies of or and and in a large corpus of parent-child conversational interactions consisting of 14,159,609 tokens, taken from the CHILDES archives. This is a considerably larger corpus than in previous studies, which allowed us to measure developmental changes in more detail.
Methods
In selecting samples of parents’ and children’s speech, we used the online database childes-db.stanford.edu and its associated R programming package childesr (Sanchez et al., 2018). Childes-db is an online interface to the child language components of TalkBank – namely, CHILDES (MacWhinney, 2000) and PhonBank. We chose two collections of corpora: English-North America and English-UK. All word tokens were tagged for the following information: (1) the speaker role (mother, father, child); (2) the age of the child when the word was produced; (3) the type of utterance the word appeared in (declarative, question, imperative, other); and (4) whether the word was and, or, or neither.
Exclusion Criteria. The collection contained an initial 16,179,076 tokens. First, we excluded tokens coded as unintelligible (N = 290,119). Second, we excluded tokens where information about child age was missing (N = 1,042,478). Third, we excluded tokens outside the age range of 1 to 6 years old (N = 686,870). After these exclusions, the collection contained 14,159,609 tokens from 504 children and their parents.
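The exclusion counts can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above; the variable names are ours.

```python
# Token counts as reported in the exclusion criteria above.
initial_tokens = 16_179_076
exclusions = {
    "unintelligible": 290_119,
    "missing child age": 1_042_478,
    "outside the 1 to 6 year age range": 686_870,
}

# Apply the exclusions in the order they are described.
remaining = initial_tokens
for reason, n_excluded in exclusions.items():
    remaining -= n_excluded
    print(f"after excluding {reason}: {remaining:,} tokens")

total_excluded = sum(exclusions.values())
print(f"total excluded: {total_excluded:,} tokens")
```

The running total after the third exclusion matches the reported final collection size of 14,159,609 tokens.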
Procedure
Each token was coded for the utterance type it appeared in. We grouped utterances into four main categories: declarative, question, imperative, and other. This utterance characterization followed the convention used in the TalkBank manual. The utterance types are similar to sentence types (declarative, interrogative, imperative) with one exception: the “question” category consists of interrogatives as well as rising declaratives (i.e. declaratives with rising question intonation). In the transcripts, declaratives are marked with a period, questions with a question mark, and imperatives with an exclamation mark. The manual also provides terminators for special-type utterances. Among these in the category of questions were: trailing off of a question, question with exclamation, interruption of a question, and self-interrupted question. The category of imperatives also included emphatic imperatives. The rest of the special type utterances such as “interruptions” and “trailing off” were included in the category “other”.
Results
Overall, and was about 10 times more likely to occur in parents’ speech than or. That is, and occurred 15 times per 1000 words and or only 1.5 times per 1000 words. Children produced and at the same rate as their parents, but produced or less often, at only 0.5 per 1000 words (Figure 1, Left).
Parental production trends over child age for and varied between 10 and 20 uses per 1000 words (Figure 1, Right). Children started to produce and between 12 and 18 months, with a sharp increase in production until they reached the parent level between 30 and 36 months of age. Child production levels stayed close to their parental levels between 36 and 72 months, possibly even surpassing them at 60 months, but the data from 60 months on are sparse.
Parental production of or was 1 to 2 per 1000 words. Children started to produce or between 18 and 30 months, with increasing uses until they approached 1 use per 1000 words at 48 months (4 years). At this point, their productions plateaued and stayed at this rate through 72 months (6 years). Children started producing or about six months later than and. While their uses of and reached parental levels by around 30 months, their uses of or rose more slowly and did not reach the parental level even at age 6.
What factors account for this difference? Previous research has focused on the role of frequency and conceptual complexity (Morris, 2008). First, and is far more frequent than or. Goodman, Dale, and Li (2008) argued that words from the same syntactic category that are more frequent in child-directed speech are acquired earlier. The conjunction word and is at least 10 times more likely to occur than or – so earlier acquisition of and is consistent with the effect of frequency on age of acquisition. Second, research on concept attainment and Boolean concept learning suggests that the concept of conjunction is easier to acquire than disjunction (Feldman, 2000; Neisser & Weene, 1962; Piantadosi, Tenenbaum & Goodman, 2016; Shepard, Hovland & Jenkins, 1961). This suggests that children might grasp the concept underlying the meaning of and more easily and so produce it early, but need more time to develop the concept underlying the meaning of or.
Here we consider a third option: the difference in production between and and or may be partly due to different patterns in usage. Parent-child interactions are not symmetrical, so the speech acts most favored by parents do not match those favored by young children. This also results in asymmetries in the functional elements used by parents versus children. Child uses of or seem to be affected here. First, or was more likely to occur in questions than in declaratives (Figure 2, Left). But and, in contrast, was more likely to occur in declaratives (Figure 2, Right). Second, parents asked children more questions than children asked parents. Questions had their own developmental trajectory, emerging in the second year of children’s lives and rising to a relatively constant rate of about 15% of children’s utterances in their fourth year. Parents, in comparison, produced questions in about 25% of their utterances (see also Cameron-Faulkner, Lieven & Tomasello, 2003). Therefore, parent-child interaction offers more opportunities for parents to ask questions (and consequently produce or) than for children to do so.
Figure 3 shows the developmental trends for the relative frequencies of and and or in questions and declaratives. When uses of and in these two speech acts are compared, it is clear that the onset of and was slightly delayed in questions, but in both utterance types, children reached the parental level by around 30 months (2.5 years). There is a similar delay for or: children began producing it in declaratives at around 18 months but not until 24 months in questions. Their production of or increased in both declaratives and questions until it reached a constant rate in declaratives between 48 and 72 months. The relative frequency of or in questions continued to rise until 60 months. Comparing Figure 1 and Figure 3, children were closer to the adult rate of production in declaratives than questions.
To test these observations more formally, we used a multiple linear regression model with the relative frequency of or in each monthly time-bin as the dependent variable. The relative frequency was computed by pooling parents’ and children’s productions across corpora at a given month and dividing the frequency of or by the frequency of total words produced in that month by parents or children. Given that there is often very sparse data for each child and corpus, such cross-corpus averaging can help boost the signal-to-noise ratio. Children’s age, speaker (child vs. parent), utterance type (declarative vs. question), and their interactions served as predictors. The intercept was set to children’s productions in declaratives.
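The pooled relative frequency described above can be sketched as follows. The counts are hypothetical, and we assume that the denominator is all words produced by the given speaker group in the given month; the sketch covers only the dependent variable, not the regression itself.

```python
from collections import defaultdict

# Hypothetical per-corpus counts for one monthly bin:
# (corpus, age in months, speaker, count of "or", total words).
corpus_counts = [
    ("corpus_A", 24, "parent", 12, 9_000),
    ("corpus_B", 24, "parent", 18, 11_000),
    ("corpus_A", 24, "child", 1, 3_000),
    ("corpus_B", 24, "child", 1, 5_000),
]


def pooled_rates(rows):
    """Pool counts across corpora first, then convert to a rate per 1000 words."""
    totals = defaultdict(lambda: [0, 0])  # (month, speaker) -> [n_or, n_words]
    for _corpus, month, speaker, n_or, n_words in rows:
        totals[(month, speaker)][0] += n_or
        totals[(month, speaker)][1] += n_words
    return {key: 1000 * n_or / n_words for key, (n_or, n_words) in totals.items()}


rates = pooled_rates(corpus_counts)
for (month, speaker), rate in sorted(rates.items()):
    print(f"{month} months, {speaker}: {rate:.2f} uses of 'or' per 1000 words")
```

Pooling before dividing weights each corpus by its size, so a small corpus with very few tokens cannot dominate the estimate for a given month.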
Table 2 presents the coefficient estimates of the model and Figure 4 shows the model fit against the data. This model suggests a significant positive effect of children’s age on their production of declarative disjunction (Table 2, “age” row). As children grew older, they produced more instances of or. The model also estimated significantly higher intercepts for parents producing or in declaratives (Table 2, “parent” row) as well as questions (“parent*question” row), which suggests that parents produced more or on average than children at the beginning of children’s productions. Finally, the model reports a significant interaction of age and utterance type (“age*question” row), suggesting that children’s production of or increases with age more steeply in questions than in declaratives. These results are consistent with the hypothesis that the frequency and distribution of or are partly affected by the production of questions in parent-child interactions.
However, there may be considerable variation among children in disjunction production and the model described above does not take this variation into account (see Footnote 1). To account for such variation, we fit a separate linear mixed-effects regression model with the relative frequency of or in each monthly bin computed separately per child and corpus and used as the dependent variable. The model included random intercepts for corpora and children (nested within corpora), as well as random slopes for age and utterance type (declarative vs. question). Like the previous model, children’s age, speaker (child vs. parent), utterance type (declarative vs. question), and their interactions served as predictors and the intercept was set to children’s productions of or in declaratives.
This mixed-effects model also shows a significant interaction of age and utterance type in children’s production of or (“age*question” row), suggesting again that, as they get older, children produce more instances of or in questions than in declaratives (b = 0.05, t = 2.72, p = 0.01). The model reports that the effect of age on children’s production of or in declaratives is not significant (b = 0.02, t = 1.90, p = 0.06). It also estimated a positive intercept for children’s production of or in questions compared to declaratives (b = 2.08, t = 2.64, p = 0.01), suggesting that children may start by producing more instances of or in questions too. No other effects were significant in this model. Taking both models into account, there is evidence that children’s production of disjunction is affected by utterance type, and specifically by the production of questions in early parent-child interactions.
Conclusions from Study 1
In a large-scale quantitative analysis of parents’ and children’s productions of and and or, we found that children started producing and in the second year of life, and reached parental levels of production by 2;6. Their production of disjunction came about six months later: they started producing or between 1;6 and 2;6, arriving at a constant rate around 3;6, but this was at a rate below that of their parents. We suggested some factors that could explain this difference in production such as the frequency or complexity of the connectives. Since parents produced more questions than children, and or is more likely to occur in questions, it may be more frequent in parental speech partly because parents ask more questions than children.
Study 2: Data Annotation
In this study we focused on the interpretations of a subset of connective examples in child-directed speech from Study 1. Research in formal semantics has shown that the interpretation of disjunction depends on several factors, including prosody (Pruitt & Roelofsen, 2013), logical consistency of the disjuncts (Geurts, 2006), presence of modals (Kamp, 1973) or negation, and pragmatic reasoning (Grice, 1989). We therefore annotated examples of disjunction for their interpretation, as well as potential cues such as the logical consistency of the disjuncts, the utterance type, the intonation contour, the syntactic category of the disjuncts, the communicative function of the utterance, and the presence or absence of negative or modal morphemes. Since it is difficult to independently verify and annotate for pragmatic reasoning, our study does not identify cases of exclusivity that are due to scalar implicatures (Grice, 1989). However, instances that are not due to any of the factors we have annotated for could potentially be due to scalar implicatures, although, as we shall see, such cases were rare in our dataset.
Our main finding is that in our sample of child-directed speech, exclusive interpretations of or are accompanied by rise-fall prosody and logically inconsistent propositions. In the absence of these two properties, or is most likely “not exclusive”. Therefore, these cues could be informative for children with respect to the interpretation of disjunction, and so allow them to partition otherwise inconsistent input. In this section, we provide a descriptive analysis of our annotations without statistical models, leaving statistical modeling for Study 3.
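The descriptive pattern can be written down as a hand-coded rule. This is a minimal sketch and not the statistical model of Study 3: the function name is ours, and treating either cue alone as sufficient for an exclusive reading is an assumption made for illustration.

```python
def predict_interpretation(rise_fall_prosody: bool, disjuncts_consistent: bool) -> str:
    """Guess the reading of a disjunction from the two cues described above.

    Assumption: either cue on its own (rise-fall prosody, or logically
    inconsistent disjuncts) is treated as enough for an exclusive reading.
    """
    if rise_fall_prosody or not disjuncts_consistent:
        return "exclusive"
    return "not exclusive"


# Enumerate the four cue combinations.
for prosody in (True, False):
    for consistent in (True, False):
        label = predict_interpretation(prosody, consistent)
        print(f"rise-fall={prosody}, consistent={consistent} -> {label}")
```

Under this rule only one of the four cue combinations, no rise-fall prosody together with consistent disjuncts, yields a “not exclusive” reading.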
Methods
This study used the Providence corpus (Demuth, Culbertson & Alter, 2006), available from the PhonBank section of TalkBank. This corpus was chosen because of its relatively dense data on child-directed speech as well as the availability of audio and video recordings that would allow annotators access to the context of the utterance. These data were collected between 2002 and 2005 in Providence, Rhode Island. Table 3 reports the name, age range, and the number of recording sessions for each child in this study. All the children were monolingual English speakers, followed between the ages of 1 and 4 years, the age range when children develop early understanding of and and or. The corpus contains 364 hours of biweekly hour-long interactions between parents and children.
Procedure
We extracted all the utterances containing and and or using the CLAN software (http://alpha.talkbank.org/clan/), with automatic tagging for the following: (1) the name of the child; (2) the transcript address; (3) the speaker of the utterance (father, mother, or child); (4) the child’s birth date; and (5) the recording date. Since the focus of this study was on disjunction, we annotated instances of or in child-directed speech from the earliest examples to the latest ones. Since the corpus contained more than 10 times as many instances of and as of or, we randomly sampled 1000 examples of and to match 1000 examples of or in the same age range. After checking for inter-rater reliability, we annotated and analyzed 608 examples of or and 627 examples of and in the allotted time for annotations.
Annotation Categories
Every extracted instance of and and or was manually annotated for eight properties: 1. connective interpretation, 2. logical consistency, 3. utterance type, 4. intonation type, 5. syntactic level, 6. communicative function, 7. answer type, and 8. negation and modals. Below we briefly explain how each annotation was defined. Further details and examples are given in the appendix.
Annotation Category 1. Connective Interpretation
This annotation category was the dependent variable in this study. Annotators listened to utterances such as “A or B” and “A and B”, and decided on the intended interpretation with respect to the truth of propositions A and B. We considered 16 possible binary connective meanings. Table 4 shows the most common connective meanings we found in child-directed speech with some examples. Annotators were asked to consider the two propositions (A and B) in the coordinated structure, ignoring the connective and functional elements such as negation. Consider: “Bob plays soccer or tennis” and “Bob doesn’t play soccer or tennis”. Both contain the same two propositions: A. Bob playing soccer, and B. Bob playing tennis, but the functional elements that combine the two propositions (namely or and doesn’t) result in different interpretations with respect to the truth of A and B. In “Bob plays soccer or tennis”, which contains a disjunction, the interpretation is that Bob plays one or possibly both sports (i.e. inclusive disjunction annotated as IOR). In “Bob doesn’t play soccer or tennis”, which contains a negation and a disjunction, the interpretation is that Bob plays neither sport (NOR).
In a different sentence like “Bob drank coffee or tea this morning”, the dominant interpretation is that he drank one or the other, but not both (i.e. exclusive disjunction annotated as XOR). However, sometimes disjunction is used to provide a conversational repair. Consider “Bob drank coffee, or I mean, tea this morning.” In such cases the speaker intends the second proposition to be true, while the first is false or not intended. We annotated such cases as NAB. A very common interpretation for both conjunction and disjunction is that both propositions are true (AND). Consider this example with or: “Bob plays sports like soccer or tennis”. Here the intended meaning is that Bob plays both sports. Notice that in this example changing the connective from or to and creates no change in the intended meaning: “Bob plays sports like soccer and tennis.” Another interpretation attested in our sample of child-directed speech was one in which the speaker conveys that both propositions are not true but one or the other could be true, and possibly neither (NAND). For example, if someone says “I do not like peanut butter and jelly”, they may still like one without the other or possibly like neither. Finally, sometimes a connective can convey that one proposition is true if and only if the other is true. For example, a mother may say “come here and I’ll show you” which can be equivalent to: if and only if you come here, I’ll show you. We annotated such cases as IFF. For all annotations of connective interpretations, the annotators first reconstructed the coordinated propositions without the connectives or negation, and then decided which propositions were implied to be true/false.
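The interpretation labels above can be read as truth functions over the two propositions A and B. The following is a minimal sketch of that reading, not the annotation procedure itself: the dictionary and function names are hypothetical, and the NAB entry assumes the reading "first proposition false, second true" given in the text.

```python
from itertools import product

# Each label is treated as a predicate over the truth values of A and B.
# The names below are illustrative assumptions, not part of the original study.
INTERPRETATIONS = {
    "AND": lambda a, b: a and b,          # both propositions true
    "IOR": lambda a, b: a or b,           # one or possibly both true
    "XOR": lambda a, b: a != b,           # exactly one true
    "NOR": lambda a, b: not (a or b),     # neither true
    "NAND": lambda a, b: not (a and b),   # not both true
    "IFF": lambda a, b: a == b,           # true together or false together
    "NAB": lambda a, b: (not a) and b,    # repair: first false, second true
}

# The four possible situations: (A, B) = (T, T), (T, F), (F, T), (F, F).
ROWS = list(product([True, False], repeat=2))


def truth_table(label):
    """Return the truth values of an interpretation across the four situations."""
    return tuple(INTERPRETATIONS[label](a, b) for a, b in ROWS)


# Two possible outputs for each of four situations yields 2**4 = 16 connectives.
ALL_CONNECTIVES = list(product([True, False], repeat=len(ROWS)))

if __name__ == "__main__":
    for label in INTERPRETATIONS:
        print(label, truth_table(label))
    print("possible binary connectives:", len(ALL_CONNECTIVES))
```

Under this encoding the seven labels used in the annotation are a subset of the 16 logically possible binary connectives.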
Annotation Category 2. Logical Consistency
Propositions can have logical, temporal, or causal relations with each other. For logical consistency, annotators decided whether the propositions in each coordination could be true at the same time or not. If they could not, because that would result in a contradiction, they were marked as inconsistent (Table 5). The annotations used the following diagnostic here: two disjuncts were inconsistent if replacing the word or with and resulted in a contradiction. For example, changing “the ball is in my room or your room” to “the ball is in my room and your room” produces a contradiction because a ball cannot be in two rooms at once.
Two issues arise with respect to logical consistency. First, our diagnostic is quite strict. In many cases, propositions are not inconsistent so much as implausible. For example, drinking both tea and coffee at the same time is consistent, but not conventionally likely or plausible. Many exclusive interpretations may be based on such judgments of plausibility. Second, if the coordinands are inconsistent, this does not necessarily mean that the connective interpretation must be exclusive. For example, in “you could stay here or go out”, the alternatives “staying here” and “going out” are inconsistent, yet the overall interpretation of the connective could still be conjunctive: you could stay here AND you could go out. Both possibilities hold. This pattern of interaction between possibility modals like can and disjunctive terms like or is often discussed under the heading of “free-choice inferences” in the semantics and pragmatics literature (Kamp, Reference Kamp1973; Von Wright, Reference Von Wright1968). Another example is unconditionals such as “Ready or not, here I come!”. The coordinands are contradictory: one is the negation of the other. But the overall interpretation is that, in both cases, the speaker is going to come.
Annotation Category 3. Utterance Type
Annotators decided whether an utterance was an instance of a declarative, an interrogative, or an imperative. We occasionally found examples with different utterance types for each coordinand. A mother might say “put your backpack on and I’ll be right back”, where the first coordinand is an imperative and the second a declarative. These were coded for both utterance types with a dash in between: imperative-declarative. Table 6 in the appendix provides the detailed definitions and examples for each utterance type.
Annotation Category 4. Intonation Type
Annotators listened to the utterances and decided whether the intonation contour was flat, rise, or rise-fall. Table 7 in the appendix gives the definitions and examples for these intonation types. In order to judge the intonation of an utterance accurately, annotators were asked to construct all three intonation contours for the same utterance from transcriptions and see which one matched the actual intonation in the video recordings. For example, to judge “do you want orange juice $ \uparrow $ or apple juice $ \downarrow $ ?”, they reconstructed the sentence with the prototypical flat, rising, and rise-fall intonations and checked to see which was closer to the actual contour.
Annotation Category 5. Syntactic Level
Annotators marked whether the coordination was at the clausal level or sub-clausal level (Table 8). Clausal level was defined as sentences, clauses, verb phrases, and verbs. Coordination of other categories was coded as sub-clausal. This annotation category was introduced to check whether the syntactic category of the coordinands influenced the interpretation of a coordination. For example, “He drank tea or coffee” is less likely to be interpreted as exclusive than “He drank tea or he drank coffee.” The clausal vs. sub-clausal distinction was inspired by the fact that, in many languages, coordinators that connect sentences and verb phrases differ from those that connect nominal, adjectival, or prepositional phrases (Haspelmath, Reference Haspelmath and Shopen2007).
Annotation Category 6. Communicative Functions
We constructed a set of categories to capture particular usages or communicative functions of the words or and and. These included descriptions, directives, preferences, identifications, definitions-examples, clarifications, and repairs (see Appendix, Table 9). These communicative functions were created using the first 100 examples, then used for the classification of all the rest. Some are general and some specific to coordination. For example, directives are general while conditionals (e.g. “Put that out of your mouth, or I’m gonna put it away”) are more specific to coordinated constructions. Our list was not unstructured: some communicative functions are subtypes of others. For instance, “identifications” and “unconditionals” are subtypes of “descriptions” while “conditionals” are a subtype of “directives”. Furthermore, “repairs” seem parallel to other categories in that any type of speech can be repaired. Such details will matter for any general theory of acquisition where the speaker’s communicative intentions offer cues for the eventual acquisition of function words.
Annotation Category 7. Answer Type
Whenever a parent’s utterance was a polar question, annotators coded for the type of response it received from the children. This category was different from others because it was not a potential cue for learning disjunction. Instead, it offered an opportunity to assess (in a limited, conservative manner) children’s comprehension within the corpus data. Table 10 (Appendix) gives the answer types in this study, along with definitions and examples. Utterances that were not polar questions were simply coded as NA. If children responded to polar questions with “yes” or “no”, the category was YN, and if they repeated one of the coordinands, the category was AB. If children said yes/no and followed it with one of the coordinands, the answer type was coded as YN (yes/no). For example, if a child was asked “Do you want orange juice or apple juice?” and the child responded with “yes, apple juice”, our annotators coded the response as YN, because in almost all cases, if a simple yes/no is felicitous, then it can also be followed (optionally) by one of the disjuncts. But if yes/no is not a felicitous response, then mentioning one of the disjuncts is the only appropriate answer. For example, if someone asks “Do you want to stay here or go out?” a response such as “yes, go out” is infelicitous; a better response is simply “go out”. We therefore counted responses with both yes/no and mention of a disjunct as a yes/no response. We did not annotate for non-verbal answers like head nods or head shakes. This is therefore a limited and conservative measure of children’s comprehension of disjunctive questions.
Annotation Category 8. Negation and Modals
Finally, we used a script to automatically mark utterances that contained sentential negation (not/n’t) or any modal element such as maybe, can, could, should, would, or need to. This allowed us to see whether the presence or absence of negation or modals affected the overall interpretation of the utterance.
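The original tagging script is not reproduced in the paper. As a rough illustration of what such automatic marking could look like, the sketch below uses regular expressions built from the word list given above. The pattern names and the function `tag_utterance` are hypothetical, and simple string matching of this kind will produce some false positives (for example, the noun can) and misses (for example, cannot).

```python
import re

# Hypothetical patterns based on the examples listed in the text:
# sentential negation (not / n't) and a small set of modal elements.
NEGATION = re.compile(r"\bnot\b|n['’]t\b", re.IGNORECASE)
MODAL = re.compile(r"\b(maybe|can|could|should|would|need to)\b", re.IGNORECASE)


def tag_utterance(utterance):
    """Mark whether an utterance contains a negation and/or a modal element."""
    return {
        "negation": bool(NEGATION.search(utterance)),
        "modal": bool(MODAL.search(utterance)),
    }


if __name__ == "__main__":
    for text in [
        "Bob doesn't play soccer or tennis",
        "you could stay here or go out",
        "do you want orange juice or apple juice?",
    ]:
        print(text, "->", tag_utterance(text))
```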
Inter-annotator Reliability
To train annotators and assess their reliability, two annotators coded the same 240 instances of disjunction. Their reliability was calculated over eight iterations of 30 examples each. After each iteration, annotators met to discuss and resolve disagreements. They also decided whether to make category definitions or annotation criteria more precise. Training was completed after three consecutive iterations showed substantial agreement for all categories (Cohen’s $ \kappa >0.7 $ ) (for further details, see the Appendix).
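For readers unfamiliar with the statistic, Cohen’s kappa compares the observed agreement $ p_o $ between two annotators with the agreement $ p_e $ expected by chance from each annotator’s label frequencies: $ \kappa = (p_o - p_e)/(1 - p_e) $. The sketch below computes it for a single annotation category. The function name and the toy labels are illustrative assumptions and are not taken from the study’s data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: proportion of items given identical labels.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels for one iteration (made up for illustration).
    annotator_1 = ["XOR", "XOR", "IOR", "AND", "XOR", "IOR"]
    annotator_2 = ["XOR", "XOR", "IOR", "AND", "IOR", "IOR"]
    print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

In the training procedure described above, a value like this would be computed per category after each iteration of 30 examples and compared against the 0.7 threshold.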
Exclusion Criteria
We excluded one child (Ethan) from the Providence corpus, given his diagnosis of Asperger’s Syndrome at age 5. We also excluded all examples from conversations over the phone, in adult-adult exchanges, and in utterances heard from TV or radio. Such utterances were not counted as child-directed speech. We also excluded proper names and fixed forms like “Bread and Circus” (name of a local place) or “trick-or-treat” from the set of examples to be annotated. Such forms could be learned as chunks with no actual understanding of the connective meaning. We counted multiple instances of or and and with the same disjunction/conjunction as one instance. Our reasoning was that, in a coordinated structure, the additional occurrences of a connective typically did not alter the annotation categories nor the interpretation of the coordination. For example, there is little difference between “cat, dog, and elephant” versus “cat and dog and elephant” in interpretation. Our focus was on the “coordinated construction” as a unit rather than on every separate instance of and and or. Instances of multiple connectives in a coordination were rare.
Results
We start with “answer types”. This category provides some measure of children’s comprehension by showing when children provide appropriate answers to questions containing a disjunction. We then look at our dependent variable – namely, the “connective interpretations” – and then move on to the cues that potentially aid the acquisition of connective interpretations.
Answer Types
Figure 5 (Left) shows the monthly proportions of yes/no (YN) and alternative (AB) answers between the ages of 1 and 3 years. At first, children provided no answers, but by age 3, they gave a yes/no (YN) or alternative (AB) answer to most polar questions. To assess how often their answers were appropriate, we defined an appropriate answer as follows: an alternative (AB) answer was appropriate for an alternative question (one with or and rise-fall intonation), and a yes/no (YN) answer was appropriate for a yes/no (polar) question (one with or and rising intonation). This strict classification misses some nuanced cases, but it provides a useful, if conservative, estimate of comprehension. Figure 5 (Right) shows the monthly proportion of children’s appropriate answers between the ages of 1 and 3. Children offered an increasing number of appropriate answers to questions containing or between 20 and 30 months of age. This suggests that they form initial mappings for the meaning of disjunction in this age range. We now turn to cues that could assist children in making successful mappings for disjunctive meanings.
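The appropriateness criterion just described amounts to a simple pairing of intonation contour and answer type. A minimal sketch follows, assuming the label strings "rise", "rise-fall", "YN", and "AB" from the annotation scheme; the function name is hypothetical, and treating every other combination as not appropriate is an assumption of this sketch.

```python
def is_appropriate(intonation, answer_type):
    """Apply the strict criterion for answers to questions containing 'or'."""
    if intonation == "rise-fall":
        # Alternative question: the child should name one of the alternatives.
        return answer_type == "AB"
    if intonation == "rise":
        # Yes/no (polar) question: the child should answer yes or no.
        return answer_type == "YN"
    # Other contours and missing answers fall outside the criterion here.
    return False


if __name__ == "__main__":
    print(is_appropriate("rise-fall", "AB"))
    print(is_appropriate("rise", "AB"))
```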
Connective Interpretation
Regardless of the connective word used, the most common interpretation was conjunction (AND, 55%) followed by exclusive disjunction (XOR, 31%). Figure 6 shows the distribution of connective interpretations according to the connective term – and vs. or Footnote 2 ( $ {N}_{and} $ = 627 utterances, $ {N}_{or} $ = 608 utterances). Almost all instances of the connective and were interpreted as conjunction (AND). There were also a small number of NAND interpretations (e.g. “don’t swing that in the house and hit things with it”) and IFF interpretations (e.g. “come here and I’ll show you”) in the sample. For the connective or, the most frequent interpretation was exclusive disjunction (XOR, 62%) followed by inclusive disjunction (IOR, 18%) and conjunction (AND, 11%). There were also a small number of NOR (e.g. “you never say goodbye or thank you”) and NAB interpretations (e.g. “those screws, or rather, those nuts”). Overall, these results are consistent with the findings of Morris (Reference Morris2008) who concluded that exclusive disjunction is the most common interpretation of or in child-directed speech. Therefore, by simply associating the most common interpretations with the connective words, learners are expected to acquire and as conjunction, and or as exclusive disjunction (Grice, Reference Grice1989; Horn, Reference Ho1972; Gazdar, Reference Gazdar1979; Levinson, Reference Levinson2000; Crain, Reference Crain2012; Morris, Reference Morris2008). However, the learning outcome might be different if factors other than the connective word are also taken into account. In the next section, we look at how different annotation categories accompany the interpretations of or.
Cues to Disjunction Interpretation
We set and aside because it was nearly always interpreted as conjunction (AND). Figure 7 shows the proportions of connective interpretations in disjunctions with consistent (N = 364 utterances) vs. inconsistent disjuncts (N = 244 utterances). When the disjuncts were consistent (i.e. could be true at the same time), the interpretation could be exclusive (XOR), inclusive (IOR), or conjunctive (AND). When the disjuncts were inconsistent, a disjunction almost always received an exclusive (XOR) interpretation. This suggests that the exclusive interpretation of a disjunction often stems from the inconsistent or contradictory nature of the disjuncts themselves (see Footnote 3).
Next we set aside cases with inconsistent disjuncts and look at instances of disjunction with consistent disjuncts. Figure 8 shows their interpretations in declarative (N = 158 utterances), interrogative (N = 178 utterances), and imperative sentences (N = 10 utterances). Interrogatives selected for either exclusive or inclusive interpretations. Imperatives were more likely to be interpreted as inclusive (IOR), but declaratives could receive almost any interpretation: conjunctive (AND), exclusive (XOR), inclusive (IOR), or even that “neither” disjunct was true (NOR). Common examples of inclusive imperatives were invitations to action such as “Have some food or drink!”. Such invitational imperatives seem to convey inclusivity (IOR) systematically, and often give the addressee full permission with respect to both alternatives. In fact, it can be odd to use them to imply exclusivity (e.g. “Have some food or drink, but not both!”), and they are not conjunctive either; they do not invite the addressee to do both actions (e.g. “Have some food, and have some drink!”).
While interrogatives select for both exclusive and inclusive interpretations, their intonation can distinguish between the two. Figure 9 shows the different intonation contours – flat (N = 186 utterances), rise (N = 77 utterances), rise-fall (N = 101 utterances) – for the three interpretations of consistent disjunction. The rise and rise-fall contours are typical of interrogatives, and disjunctions with rise-fall contours are typically exclusive (XOR). With rising intonation, disjunctions are typically inclusive (IOR), and disjunctions with flat intonation could be exclusive (XOR), conjunctive (AND), inclusive (IOR), or neither (NOR). These results are consistent with Pruitt and Roelofsen’s (Reference Pruitt and Roelofsen2013) experimental findings with adults on the role of intonation in the interpretation of polar and alternative questions.
What about consistent disjunctions with flat intonation? Figure 10 presents the interpretations based on whether the utterance contained a negation or a modal (positive modal = 41, positive nonmodal = 109, negative modal = 7, negative nonmodal = 29). Disjunctions containing a modal like can or maybe were more likely to have a conjunctive interpretation. This is consistent with free-choice inferences (Kamp, Reference Kamp1973), where statements like “you can have tea or coffee” are interpreted conjunctively as “you can have tea and you can have coffee”. When the utterance contained a negation, the disjunction could be interpreted as exclusive (XOR) or as neither (NOR). These two interpretations correspond to the scope relations between negation and disjunction. If negation scopes above disjunction, we get a neither (NOR) interpretation (e.g. “I do not eat cauliflower, cabbage or baked beans”). But if disjunction scopes above negation, the interpretation is likely to be exclusive (e.g. “don’t throw it at the camera or you’re going in the house”). These results also suggest that learners who track the co-occurrences of or with negative morphemes can learn about the scope interaction of disjunction and negative particles in their native language.
The connective interpretations of the remaining two categories, syntactic level and communicative intent, are shown in Figures 11 and 12. For these categories, we show connective interpretations over all instances of disjunction. Figure 11 shows connective interpretations by syntactic level (sub-clausal = 329 utterances, clausal = 279 utterances). Over all annotated instances, disjunctions were more likely to be interpreted as exclusive if their disjuncts were clauses or verbs rather than nominals, adjectives, or prepositions (all sub-clausal units). As we noted earlier, the intuition here is that utterances like “They had tea or coffee” are less likely to be exclusive than “they had tea or they had coffee.” But syntactic level can be correlated with other factors predicting connective interpretation. As we will see in Study 3, a computational learning model did not find syntactic level useful in classifying instances of disjunction, compared to other annotation categories.
Figure 12 shows the connective interpretations for the 10 different communicative functions annotated here (Number of utterances: clarification = 45, conditional = 32, definitions and examples = 17, description = 150, directive = 22, identification = 30, options = 77, preference = 199, repair = 34, unconditional = 2). With certain functions, the likelihood of some interpretations was higher. An exclusive interpretation (XOR) was common in acts of clarification, identification, stating/asking preferences, stating/asking about a description, or making conditional statements. These results are consistent with expectations about the communicative intentions these kinds of speech acts carry. In clarifications, the speaker needs to know which of two alternatives the other party intended. In identifications, the speaker needs to know which category a referent belongs to. In preferences, the parent seeks to know which alternative the child wants. Even though descriptions can be either inclusive or exclusive, in the current sample most descriptions were questions about the state of affairs and required the child to provide one of the alternatives as the answer. In conditionals such as “come here or you are grounded”, the point of the threat was that only one disjunct could be true: either “you come and you are not grounded” or “you don’t come and you are grounded”.
Repairs often received an exclusive (XOR) or a second-disjunct-true (NAB) interpretation. This is predictable given that, in making a repair, the speaker intends to say that the first disjunct is inaccurate or incorrect. Unconditionals and definitions/examples always had a conjunctive interpretation (AND). Again, this is predictable: the speaker intends to communicate that all options apply. If the mother says that “cats are animals like lions or tigers”, she intends to say that both lions and tigers are cats, and not one or the other. Interestingly, in some cases, or can even be replaced by and: “cats are animals like lions and tigers”. In unconditionals, the speaker communicates that, for both alternatives, a certain proposition holds. For example, if the mother says “ready or not, here I come!”, she communicates that “I come” is true both when the child is ready and when the child is not ready.
The category “options” contained examples of free-choice inferences such as “you could drink orange juice or apple juice”. These were often interpreted as conjunctive (AND) or as inclusive (IOR). We found that free-choice utterances were more common in child-directed speech than previously assumed. Finally, directives received either an IOR or XOR interpretation. Note that the most common communicative functions in our sample were preferences and descriptions. Other communicative functions such as unconditionals or options were fairly rare. But despite their rarity, such constructions must eventually be learned by children – since almost all adults know how to interpret them.
Conclusions from Study 2
This study focused on the interpretations that connectives and and or received in child-directed speech. It also investigated certain cues that appear to help children in their learning of these interpretations. We annotated examples of and and or in child-directed speech for their truth-conditional interpretations, along with six candidate cues to interpretation: logical consistency, utterance type, intonation, negative or modal morphemes, syntactic level of the coordinands, and the communicative function of the utterance. Like Morris (Reference Morris2008), we found that the most common interpretations of and and or are conjunction (AND) and exclusive disjunction (XOR) respectively. So if children relied only on the presence of connective word forms, they should assign and the meaning of conjunction and or the meaning of exclusive disjunction.
But we also found that the most likely interpretation of a disjunction depended on the cues that co-occurred with it in context. A disjunction was most likely exclusive if the alternatives were inconsistent (i.e. contradictory). If the alternatives were consistent, then the disjunction could be either inclusive or exclusive. In questions, if the intonation on the disjunction was rising, it was most likely inclusive; and if the intonation was rise-fall, it was most likely exclusive. In declaratives and imperatives with flat intonation, disjunctions were most likely interpreted as AND if there was a modal present, and as NOR or XOR if there was a negation present in the utterance. Finally, in the absence of any of these cues, a disjunction was more likely to be non-exclusive (IOR + AND) than exclusive (XOR). Several cues can therefore carry informational value about the interpretation of disjunction, and learners can make use of these to arrive at the relevant interpretation in context. While this is a reasonable conjecture from the pattern of data in our annotation study, we have not yet presented any formal model or statistical analysis that can determine the relative utility of these cues in predicting connective interpretations. Given that we have several predictors that might be correlated and we want to select a parsimonious set of explanatory predictors, we use decision tree learning (instead of linear regression) in Study 3 to implement and test our cue-based model of learning connective interpretations.
Study 3: The Computational Model
In this study, we use a computational learning model to formalize the context-dependent account of learning linguistic disjunction. Our computational model represents an ideal observer (Geisler, Reference Geisler2003) who has access to data labeled for the cues discussed in Study 2 as well as the interpretation of a disjunction. The task of the model is to learn to use the available cues to predict the interpretation of a new disjunction. Such a model provides two major contributions. First, it provides “proof of concept” for the context-dependent account presented in the paper, showing that it is possible to learn the interpretation of a disjunction using the cues in Study 2. Second, it can help us quantify and understand how useful each cue is to the learner, by systematically selecting and ordering cues that have higher informational value for the interpretation of disjunction.
A decision tree is a classification model structured as a hierarchical tree with an initial node, called the root, that branches into more nodes until it reaches the leaves (Breiman, Reference Breiman2017). Each node represents a test on a feature, each branch represents an outcome of the test, and each leaf represents a classification label. With a decision tree, observations can be classified or labeled based on a set of features.
Decision trees have at least four advantages for modeling cue-based accounts of semantic acquisition. First, the features used in decision trees for classification can stand for the cues that help in the acquisition and interpretation of a word or an utterance. Second, the degree to which a decision tree relies on available cues in the data can be varied, and so test cue-based models to varying degrees. Third, unlike many other machine learning techniques, decision trees result in models that are interpretable. Fourth, the order of decisions or features used for classification is based on information gain. Features that appear higher (earlier) in the tree are more informative and helpful for classification. Decision trees, therefore, can help us understand which cues are more helpful for semantic acquisition.
Decision tree learning is the construction of a decision tree from labeled training data. We applied decision tree learning to the annotated data of Study 2 by constructing random forests (Breiman, Reference Breiman2001; Ho, Reference Ho1995). In random forest classification, multiple decision trees are constructed on subsets of the data, and each tree predicts a classification. The ultimate outcome is a majority vote of each tree’s classification. Since decision trees tend to overfit data, random forests control for overfitting by building more trees and averaging their results (Breiman, Reference Breiman2001; Ho, Reference Ho1995).
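The majority-vote step described above can be illustrated in a few lines. This is a sketch of the general idea only: the function name and the toy predictions are hypothetical, and scikit-learn’s RandomForestClassifier, according to its documentation, combines trees by averaging their predicted class probabilities rather than by counting hard votes.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)
    # most_common(1) returns the single most frequent label with its count.
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy predictions from five trees for one disjunction (made up).
    print(majority_vote(["XOR", "IOR", "XOR", "XOR", "AND"]))
```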
Methods
The random forest models were constructed using Python’s scikit-learn package (Pedregosa et al., Reference Pedregosa, Varoquaux, Gramfort, Michel, Thirion and Grisel2011). The annotated data had a feature array and a connective interpretation label for each connective use. Connective interpretations included exclusive (XOR), inclusive (IOR), conjunctive (AND), neither (NOR), and NAB, which states that only the second proposition is true. The features or cues used included the following annotation categories: intonation, consistency, utterance type, syntactic level, negation, and communicative function. All models were trained with stratified 10-fold cross-validation to reduce overfitting. Stratified cross-validation maintains the label distribution of the initial data in each of the randomly sampled folds used to build the cross-validated models. Maintaining the data distribution ensures a more realistic learning environment for the forests. Model performance was measured with the F1-score, the harmonic mean of precision and recall (Rijsbergen, Reference Rijsbergen1979).
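To make the evaluation measure concrete, the sketch below computes a per-class F1-score, $ F_1 = 2PR/(P + R) $, from gold and predicted labels. It is a plain-Python illustration with hypothetical names and toy data, not the evaluation code used in the study, which relied on scikit-learn.

```python
def f1_for_class(gold, predicted, target):
    """F1-score for one interpretation label: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(g == target and p == target for g, p in pairs)
    false_pos = sum(g != target and p == target for g, p in pairs)
    false_neg = sum(g == target and p != target for g, p in pairs)
    if true_pos == 0:
        return 0.0  # no correct predictions for this label
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: a learner that labels every disjunction as exclusive.
    gold = ["XOR", "XOR", "IOR", "IOR", "AND"]
    always_xor = ["XOR"] * len(gold)
    print("XOR:", round(f1_for_class(gold, always_xor, "XOR"), 3))
    print("IOR:", round(f1_for_class(gold, always_xor, "IOR"), 3))
```

On toy data like this, a learner that always answers XOR obtains a nonzero score on exclusive items but a score of zero on inclusive ones, which is why per-class F1 is informative for comparing learners.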
We first ran a grid search on the hyperparameter space to establish the number of trees in each forest and the maximum tree depth allowable. The grid search creates a grid of all combinations of forest size and tree depth and then trains each forest from this grid on the data. The forests with the best F1-score and lowest size/depth are reported (Pedregosa et al., Reference Pedregosa, Varoquaux, Gramfort, Michel, Thirion and Grisel2011). The default number of trees for the forests was set to 20, with a max depth of eight and a minimum impurity decrease of zero. Impurity was measured with Gini impurity, which gives the probability that a random member of the subset would be mislabeled if it were randomly labeled according to the distribution of labels in the subset (Gini, Reference Gini, Pizetti and Salvemini1912).
Decision trees were fit with high and low minimum-Gini-decrease values. High minimum-Gini-decrease results in a tree that does not use any features for branching. Such a tree represents the baseline or traditional approach to mapping that maps a word directly to its most likely interpretation. Low minimum-Gini-decrease allows for a less conservative tree that uses multiple cues or features to predict the interpretation of a disjunction. Such a tree represents the cue-based context-sensitive account of word learning.
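The role of the minimum-Gini-decrease threshold can be sketched as follows. Gini impurity for a set of labels with class proportions $ p_k $ is $ 1 - \sum_k p_k^2 $, and a split is accepted only if it lowers the size-weighted impurity by at least the threshold. The code is a simplified illustration under stated assumptions: the function names and toy data are hypothetical, the split is evaluated at the root only, and one branch is created per cue value (scikit-learn builds binary splits).

```python
from collections import Counter


def gini_impurity(labels):
    """Probability of mislabeling a random item when labeling by the label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(labels, cue_values):
    """Gini decrease obtained by splitting the labels on the values of one cue."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, cue_values):
        groups.setdefault(value, []).append(label)
    # Children are weighted by the share of items that fall into each branch.
    weighted_children = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(labels) - weighted_children


def should_split(labels, cue_values, min_decrease):
    """Accept the split only if the impurity decrease reaches the threshold."""
    return impurity_decrease(labels, cue_values) >= min_decrease


if __name__ == "__main__":
    # Toy data (made up): interpretations and the consistency cue.
    labels = ["XOR", "XOR", "XOR", "IOR", "IOR", "AND"]
    consistency = ["inconsistent"] * 3 + ["consistent"] * 3
    print(round(impurity_decrease(labels, consistency), 3))
    print(should_split(labels, consistency, min_decrease=0.0))  # low threshold
    print(should_split(labels, consistency, min_decrease=0.5))  # high threshold
```

With a sufficiently high threshold no split is ever accepted, so the tree reduces to predicting the most frequent label, which corresponds to the baseline learner described above.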
Results
We first present the results of the random forests in a binary classification task where the models were trained to classify whether an interpretation was exclusive or not. In the next section, we use a more general classifier to predict all interpretations of disjunction using the annotated cues. For visualization of trees, we selected the highest performing tree in the forest by testing each tree and selecting for highest F1 score. While the forest’s performance is not identical to the highest performing tree, the best tree illustrates successful learning from data.
Detecting Exclusivity
Figure 13A shows the best performing decision tree with high minimum Gini decrease. As expected, a learner that does not use any cues would interpret or as exclusive all the time. This is the baseline model. Figure 13B shows the best performing decision tree with low minimum Gini decrease. The tree has learned to use intonation and consistency to classify disjunctions as exclusive or inclusive. As expected, if the intonation is rise-fall or the disjuncts are inconsistent, the interpretation is exclusive. Otherwise, the disjunction is classified as not exclusive.
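The two trees described above can be paraphrased as decision rules. The sketch below is a hand-written restatement of the prose description, not an export of the fitted trees; the function names and the string labels are illustrative assumptions.

```python
def baseline_classify(intonation, consistent):
    """Baseline learner: ignores all cues and always answers 'exclusive'."""
    return "exclusive"


def cue_based_classify(intonation, consistent):
    """Cue-based learner: uses intonation and logical consistency of the disjuncts."""
    if intonation == "rise-fall" or not consistent:
        return "exclusive"
    return "not exclusive"


if __name__ == "__main__":
    examples = [
        ("rise-fall", True),   # alternative question
        ("flat", False),       # inconsistent disjuncts
        ("rise", True),        # polar question with consistent disjuncts
    ]
    for intonation, consistent in examples:
        print(intonation, consistent,
              baseline_classify(intonation, consistent),
              cue_based_classify(intonation, consistent))
```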
Figure 13C shows the average F1 scores of the baseline and cue-based models in classifying exclusive examples as the number of training examples increases. The models perform similarly, but the cue-based model performs slightly better. The real difference between the baseline model and the cue-based model is in their performance on inclusive examples. Figure 13D shows the F1 score of the forests as a function of the training size in classifying inclusive examples. As expected, the baseline model performs poorly while the cue-based model improves with more examples and performs better than the baseline tree.
Detecting All Interpretations
We next look at decision trees trained on the annotation data to predict all the interpretation classes for disjunction: AND, XOR, IOR, NOR, and NAB. Figure 14A shows the baseline model that only uses the words and and or to classify. As expected, and receives a conjunctive interpretation (AND) and or receives an exclusive interpretation (XOR). Figure 14B shows the best example tree of the cue-based model. The leaves of the tree show that it recognizes exclusive, inclusive, conjunctive, and even neither (NOR) interpretations of disjunction. How does the tree achieve that? Like the baseline model, the tree first asks about the connective used: and vs. or. Then, like the previous cue-based model, it asks about intonation and consistency. If the intonation is rise-fall, or the disjuncts are inconsistent, the interpretation is exclusive. Then it asks whether the sentence is an interrogative or a declarative. If interrogative, it guesses an inclusive interpretation. This basically covers questions with a rising intonation. Then the tree picks declarative examples that have a conditional speech act (e.g. “give me the toy or you’re grounded”) and labels them as exclusive. Finally, if negation is present in the sentence, the tree labels the disjunction as NOR.
Figures 14C, 14D, and 14E show the average F1-scores for the conjunctive (AND), exclusive (XOR), and inclusive (IOR) interpretations as a function of training size. While the cue-based model generally performs better than the baseline model, it shows substantial improvement in classifying inclusive cases. Figure 14F shows the average F1-score for the neither interpretation as a function of training size. Compared to the baseline model, the cue-based model shows substantially better performance in classifying negative sentences. The success of the model in classifying neither examples (NOR) suggests that the cue-based model offers a promising approach for capturing the scope relation of operators such as negation and disjunction. Here, the model learns that when negation and disjunction are present, the sentence receives a neither (NOR) interpretation. In other words, the model has learned the narrow-scope interpretation of negation and disjunction from the input data. In a language where negation and disjunction receive an XOR interpretation (not A or not B), the cue-based model can learn the wide-scope interpretation of disjunction.
Finally, Figure 14G shows the average F1 score for the class NAB. Under this interpretation, the first disjunct is false but the second is true. NAB was by far the least frequent of the interpretations considered (n = 6), did not appear in every tree in the random forests, and was not present in the highest-performing tree. In the data, however, it occurred most often in examples of repair, and the most likely cue to it was the communicative function (speech act) of repair. The results show that, although the cue-based model showed some improvement, the improvement was not stable, as indicated by the large confidence intervals. It is possible that with larger training samples, the cue-based model could reliably classify NAB interpretations as well.
Conclusions from Study 3
In this study, we used the annotation data from Study 2 to train and compare two random forest models representing two theoretical accounts of the acquisition of disjunction. The first account was a baseline (context-independent) account in which words are isolated and directly mapped to their most likely meanings, disregarding available contextual cues. Random forest models with high minimum-Gini-impurity-decrease represented this account. The second account was what we called the cue-based context-dependent mapping in which words are mapped to meanings using a set of cues available in the context. Random forest models with low minimum-Gini-impurity-decrease represented this cue-based account. Comparison of the F1-Scores produced by models representing these two accounts showed that the cue-based models outperformed the baseline models in every classification task. Most importantly, while the baseline models learned to always interpret a disjunction as exclusive, the cue-based models learned to interpret a disjunction as exclusive, inclusive, conjunctive, or neither (NOR), depending on the cues available in the input.
General Discussion
We have presented three studies to support the claim that child-directed speech contains linguistic cues for the successful interpretation of linguistic disjunction, and that mapping or to its meaning in a cue-based context-dependent manner addresses “the puzzle of learning disjunction”. Study 1 presented the overall distribution of or and and in parents’ and children’s speech in CHILDES corpora. It showed that children heard 1-2 instances of or per 1000 words produced by parents. Children started producing or themselves between 18 and 30 months, and by 42 months attained a rate of one or per 1000 words. Study 2 showed that, as Morris (Reference Morris2008) had also shown, the most common interpretation of or in child-directed speech was exclusive disjunction. These exclusive interpretations were accompanied by prosodic and semantic cues. In the absence of these cues to exclusivity, the interpretation of a disjunction was most likely non-exclusive. Finally, Study 3 used decision-tree learning to show that an ideal learner can use these linguistic cues to partition the input and predict the intended interpretation of a linguistic disjunction.
Here we discuss some important limitations of the present account that future work should address. The computational model in Study 3 represents an ideal observer (Geisler, Reference Geisler2003). It allows us to measure the information available in the input for mapping or, provides a computational account of how to perform this task, and serves as a starting point for developing more realistic models. Future research should aim to improve at least three important aspects of this model. First, the model had access to a limited set of pre-selected cues for learning. As in other cue-based accounts (Monaghan & Christiansen, Reference Monaghan, Christiansen, Brooks and Kempe2014), this account needs to explain how the learner discovers and selects which cues are relevant to the acquisition of disjunction, among potentially many possible candidate cues. Fortunately, the cues relevant for the acquisition of or are not idiosyncratic. Intonation and the semantics of the neighboring words are cues that need to be monitored for the interpretation of almost any word. It is therefore possible that a limited number of salient cues in child-directed speech guide many form-meaning mappings, and future research will uncover these.
Second, our account and computational model assumed the 16 binary logical connective concepts for the mapping of or. Future research on this account, as well as on other accounts of learning disjunction, needs to explain how children limit their conceptual space to consider only connective concepts when mapping words like and and or. One approach that may contribute to this is syntactic bootstrapping (Brown, Reference Brown1957; Gleitman, Reference Gleitman1990). Previous research has shown that syntactic bootstrapping can help learners filter their conceptual space appropriately for many word classes such as nouns (Soja, Reference Soja1992), verbs (Naigles, Reference Naigles1990), adjectives (Taylor & Gelman, Reference Taylor and Gelman1988), and prepositions (Landau & Stecker, Reference Landau and Stecker1990). It seems probable that a similar mechanism applies to connectives, especially since coordination has specific syntactic properties crosslinguistically (see Haspelmath, Reference Haspelmath and Shopen2007). Coordinators combine two or more units of the same type and return a larger unit, also of the same type. This larger unit bears the same semantic relation to the surrounding words as the smaller units did without the coordination. These properties distinguish coordinators from other function words.
Third, the ideal observer/learner model was implemented using a supervised learning algorithm and had access to labeled training data. While it is not clear what feedback children receive while learning function words like or, it is clear that they do not have access to the kind of labeled data in our model. Future work should revise this aspect of the model and incorporate the kinds of feedback children actually receive (Chouinard & Clark, Reference Chouinard and Clark2003; Clark, Reference Clark2010).
Fourth, this research has demonstrated the utility of cues for the acquisition of disjunction, but future experimental work needs to show that children are indeed sensitive to such cues and in fact use them in the acquisition of or. Some research, for example, already suggests that infants are sensitive to intonational cues. Frota, Butler, and Vigário (Reference Frota, Butler and Vigário2014) have shown that 5-9 month-olds discriminate rising yes/no intonation (typical for questions) from the falling intonation typical for assertions. And Esteve-Gibert, Prieto, and Liszkowski (Reference Esteve-Gibert, Prieto and Liszkowski2017) showed that 12 month-olds can use gesture and intonation to distinguish basic speech acts like commands and statements. Such findings suggest that by the time children start their early mappings for disjunction, they may already be sensitive to the role of intonation in conveying some aspects of linguistic meaning. However, whether they actually use such cues to learn the meaning of function words like or remains an open question.
Fifth, our findings do not speak against specific theoretical accounts regarding the semantic and pragmatic status of disjunctive interpretations. In formal semantics and pragmatics, it is common to assume that the primary meaning of or is inclusive disjunction. The exclusive interpretation is derived using secondary enhancements to this primary meaning – for example, by Gricean reasoning about the alternative connective and, which results in an exclusivity implicature (Grice, Reference Grice1989; Horn, Reference Horn1972; Gazdar, Reference Gazdar1979; Levinson, Reference Levinson2000; Chierchia, Reference Chierchia2004). Such accounts can accommodate our findings by assuming that different cues discussed in this paper are related to specific semantic and pragmatic mechanisms that deliver the intended connective interpretation. For example, a rise-fall intonation may underlyingly cue a mechanism that strengthens the basic inclusive semantics of or into exclusive disjunction (see Roelofsen & Gool, Reference Roelofsen, Gool, Aloni, Bastiaanse, de Jager and Schulz2010 for a formal treatment of disjunction and intonation along these lines). Similarly, when the individual disjuncts are inconsistent (e.g. clean or dirty) the learner can derive an exclusive interpretation using the composition of exclusive disjuncts and an inclusive meaning for or. Such accounts have to then explain how the learner maps the cues to the correct underlying mechanism. Alternatively, it is possible to assume no underlying mechanism and directly map the cues along with the connective word or to the intended interpretation. These cues can later help disambiguate a disjunction in a specific context. Such an account would be closer to the usage-based tradition of language acquisition and processing (Goldberg, Reference Goldberg2003; Langacker, Reference Langacker1987; Tomasello, Reference Tomasello2003). The challenge for such accounts is to explain the universal tendencies in disjunctive interpretations and the mechanisms that generate them. Therefore, different theoretical accounts of disjunction can accommodate the findings of this paper and provide more specific predictions for future research.
Finally, this research should be placed within the larger context of word learning. As we noted earlier, Quine (Reference Quine1960) proposed three strategies for lexical learning: isolated mapping, context-dependent mapping, and description mapping. First, children learn many content words – concrete nouns, adjectives, and verbs – by mapping their isolated forms to concepts that are created through sensory experience. For example, a child may associate dirty with a visible property of objects or sit with the action she performs before having food or wearing shoes. Second, for more abstract meanings like those of some function words, children also rely on the meanings of the surrounding concrete content words in the utterance. For example, hearing “sit and eat” or “clean and shiny” may allow children to infer that the connective and is used when the speaker intends both actions or properties. Connective or, on the other hand, appears commonly in constructions like “sit or stand” and “clean or dirty” where only one or the other action or property can apply in typical everyday contexts. Third, once children have learned enough isolated and context-dependent mappings of meanings, they can also make use of linguistic definitions. For example, children may learn from their parents that below is “another word for under” or that carving is “cutting wood” (see Clark, Reference Clark2010). The “syntactic bootstrapping” proposal of Gleitman et al. (Reference Gleitman, Cassidy, Nappa, Papafragou and Trueswell2005) offers a similar developmental account with emphasis on the role of syntactic structure in learning the meaning of “hard words” like mental verbs (e.g., think and know). They argue for a general probabilistic learning mechanism that combines and coordinates multiple cues – such as the number of the verb’s arguments, the argument position (subject vs. object), as well as argument type (the type of meanings the arguments have) – to constrain the hypothesis space for verb meanings.
Our account of English disjunction presented here is in line with both Quine (Reference Quine1960) and Gleitman et al. (Reference Gleitman, Cassidy, Nappa, Papafragou and Trueswell2005), and contributes to word meaning mapping in at least four respects. First, we have highlighted the role of prosody in the mapping of meaning. Prosody is considered an important source of information for learning a language’s structure (Carvalho, He, Lidz & Christophe, Reference Carvalho, He, Lidz and Christophe2019) and our work suggests that it can also play an important role in addressing the form-meaning mapping problem. Second, we have emphasized the role of semantic relations among known words in an utterance as a cue in mapping meanings, something Gleitman et al. (Reference Gleitman, Cassidy, Nappa, Papafragou and Trueswell2005) discuss under the label of “distributional cues”. The present work on disjunction also shows that the entailment relations between disjuncts, and more specifically whether they lead to logical inconsistency, can help learners map the meaning of a disjunctive term like or. Third, our findings show that cues may play a more complex role than previously assumed. Previous literature has shown that cues can boost a particular hypothesis against another to reduce uncertainty. Our work suggests that cues may also affect the mapping mechanism itself. With respect to disjunction, cues can break down the input into each “context of use” and allow the learner to map words to their meanings in a context-dependent manner. Fourth, in using decision-tree learning, our account takes some initial steps toward quantifying and formalizing the probabilistic cue-integration advocated by Gleitman et al. (Reference Gleitman, Cassidy, Nappa, Papafragou and Trueswell2005). Ultimately, we need to discover further cues and mechanisms that aid the acquisition of abstract functional meanings, and so establish a more comprehensive theory of word learning in first language acquisition.
Competing interests
The author(s) declare none.
Appendix
Figure 15 shows the percentage agreement and the kappa values for each annotation category over the 8 iterations.
Agreement in the following three categories showed substantial improvement after better and more precise definitions and annotation criteria were developed: connective interpretation, intonation, and communicative function. First, connective interpretation showed major improvements after annotators developed more precise criteria for selecting the propositions under discussion and separately wrote down the two propositions connected by the connective word. For example, if the original utterance was “do you want milk or juice?”, the annotators wrote “you want milk, you want juice” as the two propositions under discussion. This exercise clarified the exact propositions under discussion and sharpened annotator intuitions with respect to the connective interpretation that is communicated by the utterance. Second, annotators improved agreement on intonation by reconstructing an utterance’s intonation for all three intonation categories. For example, the annotator would examine the same sentence “do you want coffee or tea?” with a rise-fall, a rise, and a flat intonation. Then the annotator would listen to the actual utterance and see which one most resembled the actual utterance. This method helped annotators judge the intonation of an utterance more accurately. Finally, agreement on communicative functions improved as the definitions were made more precise. For example, the definition of “directives” in Table 9 explicitly mentions the difference between “directives” and “options”. Clarifying the definitions of communicative functions helped improve annotator agreement.
Inter-annotator reliability for conjunction was calculated in the same way. Two different annotators coded 300 utterances of and. Inter-annotator reliability was calculated over 10 iterations of 30 examples. Figure 16 shows the percentage agreement between the annotators as well as the kappa values for each iteration. Despite high percentage agreement between annotators, the kappa values did not pass the set threshold of 0.7 in three consecutive iterations. This paradoxical result is mainly due to a property of kappa. An imbalance in the prevalence of annotation categories can drastically lower its value. When one category is extremely common with high agreement while other categories are rare, kappa will be low (Cicchetti & Feinstein, Reference Cicchetti and Feinstein1990; Feinstein & Cicchetti, Reference Feinstein and Cicchetti1990). In almost all annotated categories for conjunction, there was one class that was extremely prevalent. In such cases, it is more informative to look at the class specific agreement for the prevalent category than the overall agreement measured by Kappa (Cicchetti & Feinstein, Reference Cicchetti and Feinstein1990; Feinstein & Cicchetti, Reference Feinstein and Cicchetti1990).
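The prevalence effect on kappa can be illustrated with a small worked example. The sketch below computes Cohen's kappa from a two-annotator confusion matrix. The two matrices are hypothetical counts for an iteration of 30 examples, chosen only to show the effect, and are not taken from the annotation data.

```python
def percent_agreement(matrix):
    """Proportion of items on which both annotators chose the same class."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: A, columns: B)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = percent_agreement(matrix)
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[r][c] for r in range(k)) for c in range(k)]
    # Agreement expected by chance, from the annotators' marginal rates.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)

# ASSUMPTION: hypothetical counts. Both matrices have 28 of 30 agreements.
skewed = [[28, 1],
          [1, 0]]      # one class is extremely prevalent
balanced = [[14, 1],
            [1, 14]]   # classes are evenly distributed

kappa_skewed = cohen_kappa(skewed)
kappa_balanced = cohen_kappa(balanced)
```

Both matrices have the same percentage agreement, yet kappa is close to zero for the skewed case and high for the balanced one. With one dominant class, chance agreement is already very high, so the observed agreement can barely exceed it.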
Table 11 lists the dominant classes as well as their prevalence, the values of the class specific agreement index, and the category agreement index (Kappa). The class specific agreement index is defined as $ 2{n}_{ii}/\left({n}_{i.}+{n}_{.i}\right) $ , where $ i $ represents the class’s row/column number in the category’s confusion matrix, $ n $ the number of annotations in a cell, and the dot ranges over all the row/column numbers (Fleiss, Levin & Paik, Reference Fleiss, Levin and Paik2013, p. 600; Ubersax, Reference Ubersax2009). The class specific agreement indices are high for all the most prevalent classes, showing that the annotators had very high agreement on these classes, even though the general agreement index (Kappa) was often low. The most extreme case is the category “consistency”, where almost all instances were annotated as “consistent” with perfect class specific agreement but low overall Kappa. In the case of utterance type and syntactic level, where the distribution of instances across classes was more even, the general index of agreement Kappa is also high. In general, examples of conjunction showed little variability across annotation categories and mostly fell into one class within each category. Annotators had high agreement for these dominant classes.
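As a worked illustration of the class specific agreement index, the sketch below computes twice the number of agreed annotations for a class, divided by the sum of that class's row total and column total in the confusion matrix. The matrix is a hypothetical example with one dominant class and is not taken from Table 11.

```python
def class_specific_agreement(matrix, i):
    """2 * n_ii / (n_i. + n_.i) for class i of a square confusion matrix."""
    row_total = sum(matrix[i])                    # n_i. : annotator A chose i
    col_total = sum(row[i] for row in matrix)     # n_.i : annotator B chose i
    if row_total + col_total == 0:
        return float("nan")  # class never used by either annotator
    return 2 * matrix[i][i] / (row_total + col_total)

# ASSUMPTION: hypothetical counts for 30 examples with a dominant class 0.
matrix = [[28, 1],
          [1, 0]]

dominant = class_specific_agreement(matrix, 0)   # 56 / 58
rare = class_specific_agreement(matrix, 1)       # 0 / 2
```

The index for the dominant class is high because it considers only the annotations in which at least one annotator chose that class. It is therefore not pulled down by chance correction the way overall kappa is when one class is extremely prevalent.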