Introduction
Theories of statistical learning posit that infants track basic information about the words they hear, such as their frequency and their co-occurrence with nearby words, which provides the foundation for more complex knowledge (e.g., Lany, 2014; Saffran, 2003; Thiessen & Erickson, 2013). If this is true, then children must be able to navigate the complexities of incoming speech in real time. Speakers have many options available to them when talking to children, and children therefore encounter words more or less frequently in a variety of sentence contexts. For example, a parent trying to direct a child’s attention might say Look at the horse, but theoretically they could also say Examine the pony; Behold the equine; or This thing is a beast, among endless options. Yet, what we know about children’s early language processing is based almost entirely on frequent, prototypical sentences. By exploring how comprehension may differ for typical vs. atypical phrasings, we will be better positioned to evaluate how statistical regularities in children’s environments shape their learning and understanding of language.
From a young age, infants show a capacity to learn how words co-occur in their input. Infants detect and recognize frequent combinations of words (Mintz, 2003; Skarabela et al., 2021), and children and adults are better able to process, remember, and repeat frequent phrases (Arnon & Clark, 2011; Arnon & Snider, 2010; Bannard & Matthews, 2008). This suggests that learners are proficient in using their knowledge of common patterns from their language experiences, and in fact, infants who show greater sensitivity to statistical patterns have been shown to have stronger language skills (Lany, 2014). It has also been found that highly typical sentence frames, such as Look at the…, support toddlers’ ability to recognize subsequent words, compared to when words appear in isolation or occur after the word Look by itself (Fernald & Hurtado, 2006; Morini & Newman, 2019). Sentence frames have the potential to provide the child with phonetic, syntactic, and/or semantic information about an upcoming referent, and past studies have shown that young children experience disruptions in their processing when provided with inconsistent cues (e.g., Lew-Williams & Fernald, 2007; Mahr et al., 2015; Reuter et al., 2019). This work shows that infants use co-occurrence statistics to inform their comprehension in real time and reveals how they may benefit from hearing words in contexts that are most typical of their prior experience.
While infants also show broad sensitivity to frequency, there is less direct evidence concerning how they use frequency information in real time. Children learn frequent words at younger ages (Ambridge et al., 2015; Goodman et al., 2008), showing that they benefit from simply encountering words often. Both children and adults consistently show more efficient processing of frequent words (e.g., Garlock et al., 2001), perhaps because the representations of these words are more accessible, or because they have heard these words in more diverse contexts (see Goodman et al., 2008). In addition, bilingual toddlers tend to show enhanced understanding of words produced in the language they hear more often, and they have greater difficulty understanding words in their less-frequently-heard language (Hurtado et al., 2014; Morini & Newman, 2019). More specifically, bilingual toddlers have greater difficulty understanding words in their weaker language when those words appear in atypical sentence contexts (Potter et al., 2019). This pattern of results illustrates how children’s comprehension can be influenced by both the frequency of individual words and the typicality of the context in which they appear. Thus, a statistical learning framework that spans multiple words in a sentence offers a tool for understanding children’s real-time processing of diverse sentences (Lany et al., 2018).
To test the relative contributions of frequency and co-occurrence statistics, we investigated monolingual English-learning toddlers’ comprehension of frequent vs. infrequent target words that appeared in typical vs. atypical sentence contexts. Using the looking-while-listening procedure, we monitored toddlers’ eye movements as they viewed pairs of familiar images and heard sentences labeling one object (Fernald et al., 2008). Target nouns were high- or low-frequency labels for the same objects (e.g., horse vs. pony; plate vs. dish). Toddlers were tested on their understanding of these nouns following typical or atypical sentence frames (e.g., Look at the… vs. Examine the…; Do you like the… vs. Would you prefer the…). Our hypothesis was that toddlers’ comprehension would be affected by both word frequency and the typicality of the preceding sentence context. Specifically, we predicted that toddlers would show weaker comprehension of less (vs. more) frequent words, particularly if they occurred in atypical (vs. typical) sentence contexts. We chose to use sentences that included plausible, but unlikely, phrases for toddlers to hear, reasoning that if toddlers’ comprehension depends on the statistics of their input, low-probability sentence frames might derail their comprehension. To understand how toddlers’ growing knowledge of words might affect their comprehension, we also tested whether toddlers’ vocabulary skills predicted the strength of these effects.
Method
Participants
The final sample included 34 full-term, typically developing, monolingual English-learning toddlers aged 21 to 27 months (15 female, M = 23.7 months, SD = 2.1) from the Northeastern United States, matched to the age of bilingual participants in prior studies (Morini & Newman, 2019; Potter et al., 2019). Parents of all participants provided informed consent, and all procedures were approved by the Princeton University Institutional Review Board. An a priori power analysis (Faul et al., 2007) indicated this sample size would have .81 power to detect a medium main effect (f = .25). Eighteen additional toddlers were tested but excluded for reported language delay (1), fussiness (9), equipment error (2), failure to contribute at least two usable trials in all conditions (5), or an ill-timed fire alarm (1).
Stimuli
Auditory stimuli
Auditory stimuli were produced by a female native English speaker using child-directed speech. Each sentence included a Typical or Atypical sentence frame (Table 1) and a High- or Low-Frequency target noun.
Target nouns were selected based on what children of this age would be expected to know, according to norms for the MacArthur-Bates Communicative Development Inventory: Words and Sentences (MCDI; Fenson et al., 2007). We chose items that were reportedly produced by at least 30% of 24-month-olds and more than 50% of 30-month-olds to ensure that these were likely to be familiar words, and parental reports confirmed that the majority of children understood both the High- and Low-Frequency labels. For all items, data from the CHILDES corpus indicated that High-Frequency labels were at least twice as frequent as Low-Frequency labels in North American child-directed speech (Table 2).
a Percentage of children in the current study who were reported to understand the label
b Percentage of 24-month-old American English-learning infants reported to produce the label, according to MCDI norms (Frank et al., 2017)
c Frequency (per million words) in the CHILDES corpus for North American English (MacWhinney, 2000)
d Ratio of occurrence of the High- vs. Low-frequency label in the CHILDES corpus
Each of the 16 target nouns occurred once following a Typical frame and once following an Atypical frame. Sentences were edited using Praat (Boersma & Weenink, 2016) to match them in intensity (65 dB) and to standardize the duration of all frames (1151 ms) and all target nouns (665 ms), so that every frame was identical in length, as was every target noun.
Visual stimuli
Visual stimuli were pairs of images of familiar objects presented side-by-side on grey backgrounds (Figure 1). Images always occurred in yoked pairs matched for animacy and salience (horse/bird, plate/light, candy/pants, sheep/chicken). Each visual image could be labeled with equal plausibility by a High- and Low-Frequency noun. All tokens appeared equally often on the right and left side and equally often as the target and distracter. All stimuli are available through OSF (https://osf.io/frvx7/).
Procedure
Toddlers sat on their parents’ laps in a sound-attenuated, darkened room. Parents wore opaque glasses and were asked not to interfere. Visual stimuli appeared on a 55-inch monitor, and speech stimuli played over a loudspeaker. On each trial, images were presented in silence for 2 s. Participants then heard a sentence labeling one object, followed by a sentence used to maintain attention (e.g., Look at the horse! Do you see it?) and 1.5 s of silence, for a total trial duration of 6.7 s.
Testing sessions were divided into four 8-trial blocks. Within each block, toddlers heard one type of sentence frame (Typical or Atypical), and half of trials involved High-Frequency target nouns. Toddlers heard alternating blocks of Typical vs. Atypical sentences, and we counterbalanced across participants which type of frame occurred first. Between each block, toddlers saw a filler trial to engage interest. Trial orders were pseudorandomized such that there were never more than three consecutive trials with the same type of target noun (High- or Low-Frequency), and toddlers were randomly assigned to one of four experimental orders.
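The ordering constraints described above can be illustrated with a short sketch. It is not the script used to build the experimental orders. It assumes that the run-length limit is checked across the whole 32-trial session with filler trials ignored, and that orders are found by simple rejection sampling. The names `make_session` and `max_run_length` are hypothetical.

```python
import random


def max_run_length(labels):
    """Return the length of the longest run of identical consecutive labels."""
    longest = current = 0
    previous = object()  # sentinel that compares unequal to every label
    for label in labels:
        current = current + 1 if label == previous else 1
        previous = label
        longest = max(longest, current)
    return longest


def make_session(first_frame="Typical", n_blocks=4, trials_per_block=8,
                 max_run=3, seed=None):
    """Build one session as a list of (frame, noun_type) pairs.

    Blocks alternate between Typical and Atypical frames, half of the trials
    in each block use High-Frequency nouns, and orders are reshuffled until
    no noun type occurs more than `max_run` times in a row (assumed here to
    apply across block boundaries).
    """
    rng = random.Random(seed)
    frames = ["Typical", "Atypical"]
    start = frames.index(first_frame)
    half = trials_per_block // 2
    while True:
        session = []
        for block in range(n_blocks):
            frame = frames[(start + block) % 2]
            nouns = ["High"] * half + ["Low"] * half
            rng.shuffle(nouns)
            session.extend((frame, noun) for noun in nouns)
        if max_run_length([noun for _, noun in session]) <= max_run:
            return session
```

Calling `make_session(first_frame="Atypical")` gives the counterbalanced version in which an Atypical block comes first.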
Following testing, all parents reported whether their children understood and said each target noun. Thirty-two of 34 parents also filled out the MCDI to provide a measure of children’s vocabulary (M = 299 words, range = 11-582, SD = 176).
Coding
Trained coders, blind to condition, recorded at 33 ms intervals whether the child was looking right, left, or at neither image. We excluded trials where the child was not looking at either image at noun onset or for more than 500 ms continuously during our analysis window, resulting in an average of 19.9 usable trials per participant. Twenty percent of sessions were re-coded by a second coder. Coders agreed on the gaze location on 99% of frames and agreed within a single frame on 98% of frames surrounding shift events.
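The exclusion rule can be sketched as follows, under several assumptions that are not stated in the text: each trial is stored as a list of gaze codes ("L", "R", or "away"), the first code corresponds to noun onset, each frame lasts 33 ms, and the analysis window is the 367-2000 ms window used in the Results. The function names are hypothetical.

```python
FRAME_MS = 33  # coding interval reported in the text


def longest_away_ms(codes):
    """Approximate duration (ms) of the longest continuous look away."""
    longest = current = 0
    for code in codes:
        current = current + 1 if code == "away" else 0
        longest = max(longest, current)
    return longest * FRAME_MS


def is_usable(codes, window=(367, 2000), max_away_ms=500):
    """Apply the two exclusion criteria to one trial.

    codes[i] is assumed to be the gaze code for the frame that starts
    i * FRAME_MS after noun onset.
    """
    if not codes or codes[0] == "away":
        return False  # not looking at either image at noun onset
    in_window = [code for i, code in enumerate(codes)
                 if window[0] <= i * FRAME_MS <= window[1]]
    return longest_away_ms(in_window) <= max_away_ms
```

Because durations are approximated as a whole number of frames, a look away of 15 frames (495 ms) is retained and one of 16 frames (528 ms) is excluded.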
Results
To evaluate children’s recognition of words and their referents, we computed toddlers’ mean accuracy, defined as the time spent looking at the target image divided by the time spent looking at either image during the window from 367 to 2000 ms following the onset of the target noun. All data and supplemental analyses can be found at https://osf.io/frvx7/.
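As a minimal sketch of this accuracy measure, the function below counts coded frames rather than milliseconds, which is equivalent when every frame has the same duration. The gaze codes ("L", "R", "away"), the 33 ms frame duration aligned to noun onset, and the function name `accuracy` are assumptions for illustration.

```python
def accuracy(codes, target_side, window=(367, 2000), frame_ms=33):
    """Proportion of target looking within the analysis window.

    codes[i] is assumed to be the gaze code ("L", "R", or "away") for the
    frame that starts i * frame_ms after noun onset. Returns None when the
    child looked at neither image during the window.
    """
    in_window = [code for i, code in enumerate(codes)
                 if window[0] <= i * frame_ms <= window[1]]
    target_frames = sum(code == target_side for code in in_window)
    either_frames = sum(code in ("L", "R") for code in in_window)
    return target_frames / either_frames if either_frames else None
```

A trial with 30 in-window frames on the target and 10 on the distracter yields an accuracy of .75, regardless of how many frames were spent looking away.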
For all trial types, single-sample two-tailed t-tests revealed that toddlers significantly exceeded chance (.5) in looking to the target (M = .66, all p < .00001, Table 3), demonstrating successful comprehension, on average, of both frequent and infrequent labels across different sentences.
Note: Performance across all trial types significantly exceeded chance, p < .00001
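The comparison against chance reduces to a one-sample t statistic on participants' mean accuracy scores. The sketch below computes only the statistic and its degrees of freedom with the standard library. A p value would come from the t distribution, for example via a statistics package such as SciPy's `ttest_1samp`. The function name and the scores in the usage note are hypothetical, not data from the study.

```python
import math
import statistics


def one_sample_t(scores, chance=0.5):
    """Return (t, df) for a one-sample t-test of `scores` against `chance`."""
    n = len(scores)
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return (mean - chance) / standard_error, n - 1
```

For example, `one_sample_t([0.6, 0.7, 0.8])` returns a t of about 3.46 on 2 degrees of freedom.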
We then tested the influences of sentence typicality and word frequency using a 2 × 2 within-subjects ANOVA (Sentence Frame: Typical vs. Atypical, Target Noun: High- vs. Low-Frequency, Figure 2). The effect of Frame was not significant, F(1,33) = 0.25, p = .62, and there was no interaction between Frame and Noun, F(1,33) = 0.001, p = .97, indicating that toddlers showed no significant difference in looking to the target following Typical or Atypical sentence frames. However, there was a significant main effect of Target Noun, F(1,33) = 4.36, p = .04, ηp² = .12; accuracy was higher for High-Frequency nouns. Exploratory analyses revealed no significant order effects or changes across blocks (see Figures S1-S3 in supplementary materials). Thus, while toddlers recognized both High- and Low-Frequency labels in real time, they showed better understanding of frequent words.
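In a 2 × 2 fully within-subjects design, every effect has one numerator degree of freedom, so each F equals the squared one-sample t statistic computed on a per-participant contrast score. The sketch below uses that equivalence rather than reproducing the authors' analysis software, which is not specified in the text. The cell keys ("TH" for Typical frame with High-Frequency noun, and so on) and the function names are hypothetical.

```python
import math
import statistics


def contrast_F(contrast_scores):
    """F(1, n - 1) for a within-subjects contrast: the squared one-sample t."""
    n = len(contrast_scores)
    mean = statistics.fmean(contrast_scores)
    standard_error = statistics.stdev(contrast_scores) / math.sqrt(n)
    t = mean / standard_error
    return t * t, (1, n - 1)


def partial_eta_squared(F, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its dfs."""
    return F * df_effect / (F * df_effect + df_error)


def anova_2x2_within(cells):
    """cells: one dict per participant with mean accuracy in each condition.

    Keys: "TH", "TL" (Typical frame, High/Low noun) and "AH", "AL"
    (Atypical frame, High/Low noun).
    """
    contrasts = {
        "Frame": [(c["TH"] + c["TL"]) / 2 - (c["AH"] + c["AL"]) / 2 for c in cells],
        "Noun": [(c["TH"] + c["AH"]) / 2 - (c["TL"] + c["AL"]) / 2 for c in cells],
        "Frame x Noun": [(c["TH"] - c["TL"]) - (c["AH"] - c["AL"]) for c in cells],
    }
    results = {}
    for name, scores in contrasts.items():
        F, (df1, df2) = contrast_F(scores)
        results[name] = {"F": F, "df": (df1, df2),
                         "eta_p2": partial_eta_squared(F, df1, df2)}
    return results
```

As a consistency check on the reported values, `partial_eta_squared(4.36, 1, 33)` gives approximately .12, matching the effect size reported for Target Noun.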
Finally, we conducted exploratory analyses to test associations between word comprehension and vocabulary knowledge. MCDI scores were not significantly correlated with overall accuracy, r = .17, p = .34, 95% CI = [-.19, .49], and toddlers showed above-chance comprehension even of the words that their parents did not report they knew, M = .59, t(22) = 2.61, p = .02. We then tested how toddlers’ vocabulary related to their sensitivity to word frequency. To do so, we calculated a difference score for each toddler that captured the difference between their accuracy on trials with High- vs. Low-Frequency targets. Difference scores were significantly correlated with vocabulary size, r = -.38, p = .03, 95% CI = [-.64, -.038], indicating that toddlers with larger vocabularies showed smaller discrepancies in recognizing High- vs. Low-Frequency words (see Figure S5).
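The difference-score analysis can be sketched with a Pearson correlation and a confidence interval based on the Fisher z transformation. The text does not state how the intervals were computed, so the Fisher method is an assumption here, although with n = 32 (the number of toddlers with MCDI scores) it gives intervals close to those reported. All function names are hypothetical.

```python
import math
import statistics


def frequency_difference(high_accuracy, low_accuracy):
    """Per-toddler difference score: High- minus Low-Frequency accuracy."""
    return high_accuracy - low_accuracy


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z transformation."""
    z = math.atanh(r)
    standard_error = 1 / math.sqrt(n - 3)
    critical = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return (math.tanh(z - critical * standard_error),
            math.tanh(z + critical * standard_error))
```

With the rounded values from the text, `fisher_ci(0.17, 32)` is approximately (-.19, .49) and `fisher_ci(-0.38, 32)` is approximately (-.64, -.04). Small discrepancies from the reported bounds are expected because r is rounded to two decimals.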
Discussion
This study tested monolingual toddlers’ sensitivity to the statistical patterns in their language input by exploring their understanding of frequent vs. infrequent target nouns that appeared in typical vs. atypical sentence frames. Toddlers successfully demonstrated comprehension of all sentence types, but they were less accurate in recognizing low-frequency words, and differences were especially pronounced for children with smaller vocabularies. These results yielded two important insights. First, toddlers’ real-time comprehension is flexible enough to contend with sentence contexts that involve relatively unfamiliar sequences of words – at least in a controlled lab setting. Second, frequency information affects children’s ability to recognize nouns, indicating that the robustness of children’s knowledge of a target word is a key factor in their real-time comprehension.
Across different sentences, toddlers displayed better understanding of nouns that are more common in child-directed speech, illustrating how toddlers’ ability to track frequency information has a measurable effect on their real-time comprehension. Interestingly, the relation we observed between vocabulary size and the difference in recognizing high- vs. low-frequency nouns tentatively suggests that when children are in earlier periods of vocabulary growth (or perhaps when they have received less input from caregivers), they may be particularly sensitive to how often they have encountered a given word. That is, frequency effects in word recognition may be especially pronounced when a child has not yet had the chance to form a robust representation. Crucially, high- and low-frequency labels referred to identical visual tokens, ensuring that differences were attributable to familiarity with words, not objects (Kartushina & Mayor, 2019). This advantage for frequent words mirrors data from adults showing that word frequency reliably predicts comprehension across a variety of tasks (Ellis, 2002; Gerhand & Barry, 1998; Morton, 1969) and is consistent with studies showing that children’s ability to correctly select a label’s referent depends on their familiarity with that word (Kucker et al., 2018).
Unexpectedly, the typicality or atypicality of the sentence frame had no observable effect on toddlers’ comprehension. That is, even when hearing an unusual frame, toddlers recognized the noun that followed, suggesting that they were relatively insensitive to the unexpectedness of the atypical sentences that we tested (see Figure S2). This does not mean that toddlers are unaffected by prior linguistic context. Indeed, studies suggest that toddlers exploit semantic, morphosyntactic, and phonetic cues in real-time processing (Borovsky et al., 2012; Fernald et al., 2008; Lew-Williams & Fernald, 2007; Mahr et al., 2015), so familiar linguistic contexts could have facilitated downstream recognition, or the atypical sentence frames could have impeded their comprehension. One possibility for why we did not find a significant effect of sentence frame is that the frames were not sufficiently different in this experimental environment. For our atypical sentences, we chose to use frames that are quite uncommon in child-directed speech, yet maintained reasonable naturalness. For example, all sentences included natural co-articulation and prosody, which play a role in supporting more seamless processing of upcoming words (e.g., De Carvalho et al., 2017; Mahr et al., 2015; Paquette-Smith et al., 2016). The noun was also preceded by the determiner the in every sentence, and there is good evidence that children are sensitive to determiners and can use them to facilitate processing (Höhle et al., 2004; Kedar et al., 2006; Lew-Williams & Fernald, 2007). Another explanation is that infants can perhaps ‘listen through’ uninformative or unusual parts of a sentence (Thorpe & Fernald, 2006) and attend primarily to target nouns, or other meaningful information that arrives downstream. None of our sentence frames, either Typical or Atypical, reliably predicted any particular referent; therefore, any of the subsequent target nouns were equally likely or unlikely to follow. Future studies will continue to examine whether infants benefit from hearing words in predictable contexts, as well as explore gradients of familiarity (i.e., highly probable vs. possible vs. unlikely vs. impossible sentences), which will inform current understanding of how they exploit different types of prior experience and what kinds of regularities most strongly shape infants’ comprehension.
Importantly, while toddlers showed reduced understanding of low-frequency nouns, they still recognized more than one label for the identical visual token. This contrasts with views of early word learning proposing that young children are unwilling to accept multiple names for a single referent (Markman & Wachtel, 1988) and provides additional evidence that toddlers, like older children, recognize that the same object can be labeled in more than one way (Bergelson & Swingley, 2013; Waxman & Hatch, 1992).
These results also converge with recent research on early bilingualism. Monolingual toddlers’ difficulty understanding low-frequency words is analogous to bilingual toddlers’ reduced comprehension of words in their weaker and/or less-frequently-heard language (Hurtado et al., 2014; Potter et al., 2019). The current results, in conjunction with findings from bilingual participants, illustrate that children from different language backgrounds construct their knowledge through experience with words encountered more and less frequently, and suggest that a statistical learning framework can unite research on early learning across monolingual and bilingual environments.
While the current study provides evidence that toddlers are sensitive to overall frequency information, we relied on general estimates of how often words are used, and normative data may not reflect a given child’s experience. That is, some children inevitably had more experience than others with certain words. Future studies could provide a stronger test of the types of statistics that children use to guide their language comprehension by either collecting detailed measures of their experiences or by controlling the frequency of words in different contexts in an experimental design. With such tailored measures and structured manipulations, it might be possible to capture subtle effects, such as differences in reaction time, that could provide insight into the efficiency with which infants use different cues (see Figure S6 for exploratory analyses involving reaction times). Moreover, if we had been able to include more trials for each item, it might have been easier to detect influences of linguistic context and of particular nouns (see Table S1). Future studies should collect sufficient data to explore individual differences at both the child and the word level, perhaps by testing children across multiple sessions. More generally, future studies that are tailored to children’s interests and to their experiences with uncommon words and constructions could help us understand the idiosyncratic nature of their developing knowledge.
Broadly, these results demonstrate how the distributional statistics of children’s language input shape real-time comprehension and offer a potential mechanism underlying reported links between the amount of language children hear in their everyday lives and their language skills (Weisleder & Fernald, 2013). We often think of input in an aggregated sense, but when a child hears more total words compared to another child, they will also likely hear individual words more often (at least on average). This increased word-level experience could support the development of robust representations that enhance comprehension. Thus, we suggest that statistical learning principles can explain how children understand frequent and infrequent words in their input. Namely, over time, children’s representations strengthen as they encounter the same words in new contexts.
Acknowledgments
We would like to thank the participating families and members of the Princeton Baby Lab, especially Kennedy Casey, Ellie Breitfeld, Katie Vasquez, Rinat Tal, and Ariella Cohen. This work was funded by grants from NICHD (F32 HD093139, R01HD095912) and the James S. McDonnell Foundation. Stimuli, data, and supplemental materials are available at https://osf.io/frvx7/.
Competing interest
The authors declare none.