Introduction
Successful readers tap into a rich bank of word knowledge as they read a text (Neuman, Reference Neuman2010; O’Reilly et al., Reference O’Reilly, Wang and Sabatini2019). Consider the following sentence: Manatees spend the majority of their day grazing. Children who do not have a strong conceptual understanding of what a manatee is or what “grazing” means will struggle with the comprehension of this sentence. Semantic knowledge is essential for reading and is characterized by richly connected networks of associated words and concepts (O’Reilly et al., Reference O’Reilly, Wang and Sabatini2019; Schneider et al., Reference Schneider, Körkel and Weinert1989). Moreover, vocabulary accumulation begins early; children’s vocabulary knowledge at kindergarten predicts their reading skills through 4th grade (Dickinson & Porche, Reference Dickinson and Porche2011).
However, children who come from under-resourced communities often lag behind their wealthier peers in word knowledge (National Center for Educational Statistics, 2013; Walker et al., Reference Walker, Greenwood, Hart and Carta1994). These children tend to have less exposure to diverse vocabulary and often engage in fewer of the adult-child interactions that foster vocabulary development (Gilkerson et al., Reference Gilkerson, Richards, Warren, Montgomery, Greenwood, Kimbrough Oller, Hansen and Paul2017; Rowe, Reference Rowe2008). Without a solid foundation of vocabulary knowledge in the early years, children are at risk for falling behind academically when learning to read becomes reading to learn.
Yet the best way to measure vocabulary knowledge is not a settled issue. Some researchers (Hadley & Dickinson, Reference Hadley and Dickinson2018; Hoffman et al., Reference Hoffman, Teale and Paciga2014; Pearson et al., Reference Pearson, Hiebert and Kamil2007) argue that most vocabulary assessments are flawed due to a narrow and limited view of word meaning. These assessments should be based on a strong theoretical foundation that considers the many dimensions of what it means to know a word (Hadley & Dickinson, Reference Hadley and Dickinson2018).
Semantic knowledge in word learning
As children gain vocabulary knowledge, they implicitly construct a web of words that are connected to each other in meaningful ways (Collins & Loftus, Reference Collins and Loftus1975; Steyvers & Tenenbaum, Reference Steyvers and Tenenbaum2005). In the process of building this semantic network, children incrementally learn how words are related to one another and encode connections between new words and existing entries (Henriksen, Reference Henriksen1999; Wojcik, Reference Wojcik2018). Henriksen (Reference Henriksen1999) provides a clear example of how semantic network building occurs with the word hot: a child discovers how neighboring words in the lexicon are related to hot through synonyms (sweltering), antonyms (cold), and near synonyms (warm or scalding).
Young children’s semantic representations appear to go through a refinement process (Bergelson & Aslin, Reference Bergelson and Aslin2017; Clark, Reference Clark1973, Reference Clark, Sinclair, Jarvella and Levelt1978, Reference Clark1987; Hendrickson et al., Reference Hendrickson, Poulin-Dubois, Zesiger and Friend2017; McGregor et al., Reference McGregor, Friedman, Reilly and Newman2002; Seston et al., Reference Seston, Golinkoff, Ma and Hirsh-Pasek2009). Children’s initial representation of the word cookie, for example, is imprecise and may include other types of categorically related foods, such as crackers and cake. As children hear more language that makes contrasts between cookies and other types of baked goods, their representation of cookie becomes more precise, including only small baked goods that are sweet and firm and excluding related category members.
Additionally, research demonstrates that toddlers pass through incremental “stages” of gaining semantic knowledge about a word, rather than learning all of a word’s meaning instantaneously (Hendrickson et al., Reference Hendrickson, Poulin-Dubois, Zesiger and Friend2017). In fact, researchers have previously theorized a specific shift in children’s word knowledge – often called the thematic to conceptual shift (Cronin, Reference Cronin2002; Nelson, Reference Nelson1977; Nelson & Nelson, Reference Nelson and Nelson1990). This theory suggests that children initially form thematic connections between words (e.g., fish, water, beach) and as they develop, they begin to form more conceptual or categorical relationships between words (e.g., sea animals, fish, bass). Yet, evidence has called into question the developmental nature of this shift (Smiley & Brown, Reference Smiley and Brown1979). In fact, many language researchers are finding that categorical connections exist in the lexicons of even very young children (Arias-Trejo & Plunkett, Reference Arias-Trejo and Plunkett2013; Bergelson & Aslin, Reference Bergelson and Aslin2017; Wojcik & Saffran, Reference Wojcik and Saffran2015; Wojcik, Reference Wojcik2018; Wojcik & Kandhadai, Reference Wojcik and Kandhadai2020).
Measuring the nuances of word knowledge
Although knowledge of any single vocabulary word encompasses many facets, these finer aspects of word meaning are not routinely evaluated in vocabulary assessments. In fact, there are typically two dimensions of vocabulary knowledge that researchers assess in interventions: breadth and depth. Breadth of knowledge is the number of words children know, whereas depth refers to how well children know those words (Anderson & Freebody, Reference Anderson and Freebody1981; Hadley & Dickinson, Reference Hadley and Dickinson2018; Schmitt, Reference Schmitt2010; Silverman & Hartranft, Reference Silverman and Hartranft2015). Although some researchers have argued that depth and breadth are not conceptually distinct (e.g., Vermeer, Reference Vermeer2001), most studies distinguish between these aspects and measure them using distinct tasks.
Intervention studies often consider breadth by using receptive measures to determine the number of words children learn (Hadley & Dickinson, Reference Hadley and Dickinson2018; Marulis & Neuman, Reference Marulis and Neuman2010). Often, these tests are similar to the Peabody Picture Vocabulary Test or PPVT-IV (Dunn & Dunn, Reference Dunn and Dunn2007) which requires children to point to an image of the target vocabulary word presented with several distractor images. Other researchers may present a target vocabulary word and have older children choose among three phrases for the definition. For example, children might be asked which definition fits the word abandoned: “(a) left all alone”, “(b) put in a band”, or “(c) made off limits” (Greene Brabham & Lynch-Brown, Reference Greene Brabham and Lynch-Brown2002).
Receptive measures of breadth only assess the range or variety of words that children can comprehend, limiting the information researchers gather about the depth of participants’ word knowledge. A child may select the correct target item for 5 out of 10 taught vocabulary words, but this task does not assess the type of information children know about a word. Can they use the word in a sentence properly? Can they transfer word knowledge to new contexts? Measures that assess how well children know a word, or the depth of children’s vocabulary learning, can extend our understanding of children’s word-learning (Hadley & Dickinson, Reference Hadley and Dickinson2018; Pearson et al., Reference Pearson, Hiebert and Kamil2007). Measures of depth can also tap into children’s growing semantic networks by assessing the types of associations children have for a particular word.
Depth of word knowledge has been tested in several ways. Children are asked to define target words (Justice et al., Reference Justice, Meier and Walpole2005; Toub et al., Reference Toub, Hassinger-Das, Nesbitt, Ilgaz, Weisberg, Hirsh-Pasek, Golinkoff, Nicolopoulou and Dickinson2018), give yes or no responses about contexts in which a word can be used (Beck & McKeown, Reference Beck and McKeown2007), or even answer questions about conceptual knowledge of a word (e.g., “does a jacket help our body move around?”) (Neuman et al., Reference Neuman, Newman and Dwyer2011). Assessments such as the Test of Preschool Early Literacy (Lonigan et al., Reference Lonigan, Wagner, Torgesen and Rashotte2007) probe children’s depth as well, asking children to describe images of target items. However, although these measures allow researchers to examine depth of word knowledge, they are typically more time-consuming to administer and code, and they often rely on the expressive abilities of the child.
Moving towards assessing a continuum of word knowledge
The goal of the current study was to develop and test a novel framework for measuring children’s depth of vocabulary knowledge as part of a larger vocabulary intervention that taught preschool children new words through a combination of shared book-reading and playful learning. We created a measurement technique that tapped into the continuum of children’s word knowledge and was as simple to administer and code as a standard receptive measure that assesses breadth. Our goal was not to create a standardized assessment; rather, we tested a framework for assessing the continuum of vocabulary knowledge that could be applied by researchers or practitioners to whatever set of vocabulary words are of interest in a given setting. As a testing ground for this framework, we created a receptive vocabulary measure that, depending on children’s choices, offered information on how the development of word meanings was progressing as a result of the vocabulary intervention. Some prior research has developed related frameworks to assess depth of semantic knowledge and components of vocabulary knowledge among adolescents (Deane et al., Reference Deane, Lawless, Li, Sabatini, Bejar and O’Reilly2014) and adults (González-Fernández & Schmitt, Reference González-Fernández and Schmitt2019), but little work has taken this approach with younger learners.
We operationalized our framework by creating two vocabulary assessments, both designed to measure depth of word knowledge: a theoretically driven receptive task and an expressive task. The receptive task was an easy-to-administer digital assessment that captured not only the progress of children’s word-learning as part of the intervention, but the progression of their knowledge about intervention words along a semantic continuum. The assessment probed various levels of semantic representation by using children’s errors as a source of information about their knowledge. Analyzing children’s errors to provide insight into cognition has a long history in developmental psychology, going back to Piaget’s interpretation and theorizing around children’s errors (Piaget, Reference Piaget1954, Reference Piaget1969) and extending to recent approaches to capture word inferencing (Denicola-Prechtl et al., Reference Denicola-Prechtl, Abel and Maguire2023). In the current approach, we systematically created a receptive measure with 3 meaningfully related foils that would allow us to use children’s errors to make inferences about intermediate levels of word knowledge, rather than concluding that children are either correct or incorrect.
Specifically, while receptive measures have previously been classified as assessments of breadth of knowledge, the measure we created for this study incorporates aspects of depth of knowledge by systematically including foils that are related in different ways to the target word. Foils were designed to deliberately probe a continuum of children’s semantic understanding from non-semantic sound-based representations to meaning-relevant representations. We included two types of semantic foils: Some foils were thematically-related to the target word (shares spatial, causal, temporal, or other relations with the target, e.g., a window for the word awning) and others were conceptually-related to the target word (shares the same taxonomic category as the target, e.g., an umbrella for the word awning). Although both thematic and conceptual foils fall under the larger umbrella of tapping into semantic knowledge, we expected that conceptual foils represented a more advanced form of semantic knowledge, given the developmental evidence that thematic connections between words precede taxonomic, conceptual connections (Cronin, Reference Cronin2002; Lucariello et al., Reference Lucariello, Kyratzis and Nelson1992). Moreover, Wojcik and Kandhadai’s (Reference Wojcik and Kandhadai2020) recent findings indicate that although young children can produce taxonomic connections, their understanding of these word relationships increases with age. Therefore, it is possible that children’s receptive choice of a conceptual foil represents more advanced knowledge on the semantic spectrum compared to the choice of the thematic foil.
However, we expected that both types of semantic foils (conceptual and thematic) would represent more advanced word knowledge than the phonological foil. Phonological foils do not reflect semantic knowledge, but are meaningfully related to the target word through sound-based word learning processes known to influence the structure of children’s early word recognition (Mani & Plunkett, Reference Mani and Plunkett2011; Swingley et al., Reference Swingley, Pinto and Fernald1999). Children who select a sound-based foil (e.g., selecting yawning for the target word awning), rather than a semantically-related foil, likely have limited understanding of the target word’s meaning. The three foils, therefore, incorporate aspects of word learning processes across a knowledge continuum from sound-based non-semantic knowledge to conceptual knowledge. Thus, the current measure allows for a more detailed investigation of children’s incremental vocabulary knowledge beyond simply the number of words learned from the intervention.
Thus, the goal of this paper is to explore whether this new way of probing children’s vocabulary depth provides a fuller picture of the word knowledge children can acquire in a vocabulary intervention. We provide a theoretically motivated receptive framework that can be adapted by researchers to assess vocabulary in a classroom intervention setting. Most measures assess the extent to which children go from having no semantic understanding of an item at pre-test (incorrect response on a receptive measure) to “complete” knowledge at post-test (correct response on a receptive measure). However, we propose that children’s semantic understanding could improve incrementally through an intervention. Such incremental gains, or intermediate word knowledge, would not be captured on traditional measures but can be captured through our novel approach based on a continuum framework, given theory and evidence suggesting that semantic foils (thematic and conceptual) represent more advanced forms of word knowledge than phonological foils. In other words, in this approach, a child who selects an incorrect choice at pre-test that reflects a limited understanding of the target item’s meaning (phonological foil) and then chooses an incorrect choice at post-test that is semantically related to the target item (thematic or conceptual foil) would be demonstrating intermediate knowledge of the word’s meaning. Although this child may not yet have grasped the nuances of the word’s meaning well enough to select a fully correct response, they would be demonstrating evidence that they have learned some components of the word’s meaning, thus allowing them to select a more advanced foil. Similarly, a child who selects an incorrect choice at pre-test that is semantically related to the target item (thematic or conceptual foil) may have some baseline knowledge of the word’s meaning and may be more “ready to learn”. This child may be more likely to learn the nuanced semantic knowledge necessary to select a correct representation of the word’s meaning post-intervention.
In the current paper, we describe the methodology for creating this novel type of receptive measure and the results from a vocabulary intervention that used this task to analyze children’s foil choices. By focusing solely on errors (i.e., foil choices) rather than children’s accuracy in choosing the correct response option, we were able to detect smaller shifts in how children are wrong when they are wrong. Although prior work has demonstrated the overall effectiveness of this particular intervention (Authors, Reference Hopkins, Collins, Dore, Lawson-Adams, Schatz, Scott, Shirilla, Toub, Dickinson, Golinkoff, Hirsh-Pasek and Scott2019), statistical models including children’s correct responses would obscure smaller shifts in foil choices from pre- to post-test, because accuracy goes up drastically and thus all foil choices necessarily decrease. However, our interest here is in the proportional distribution of children’s foil choices when they do err, which allows us to discern any incremental learning. See Denicola-Prechtl et al. (Reference Denicola-Prechtl, Abel and Maguire2023) for a similar approach.
Specifically, we ask four research questions. (1) Do children’s foil selections change from pre-test to post-test? We expect that, even when children choose an incorrect response, their choices will shift from predominantly phonological foils at pre-test to predominantly semantic foils at post-test. (2) Is foil selection related to accuracy? Because we focus on children’s errors, we can examine whether how they respond incorrectly, or their choice among the three foils, is related to their overall accuracy, or how often they choose the correct response rather than one of the three foils. We expect that choosing semantic foils rather than phonological foils when children err will be related to higher overall accuracy. (3) Does foil choice on the receptive test at post-test relate to children’s expressive test performance at post-test? We expect that choosing a semantic foil (rather than a phonological foil) for a given word will be related to being more likely to score points on the expressive task for that word, providing evidence of convergent validity for the receptive task as a measure of intermediate word knowledge. (4) Does foil selection indicate readiness for learning? We hypothesize that choosing a semantic foil rather than the phonological foil for a given word at pre-test will predict scoring points for that word on the expressive task at post-test, as choosing a semantic foil may indicate an intermediate level of word knowledge that could then be boosted by instruction.
Method
Participants
Participants were recruited from two preschool programs, one in Eastern Pennsylvania and one in Central Tennessee. Children came from families who met income requirements for state- or federally-funded preschool. Across sites, the sample consisted of 138 preschoolers in 10 classrooms (74 girls) with a mean of 14 students per classroom (range = 9 – 20). At the beginning of the school year, the mean age was 50.6 months (Range: 38.0 – 61.4 months).
Caregivers were asked to fill out an optional survey in which they provided demographic information about their child such as race, ethnicity, mother’s education level, and home language exposure. The majority of children in the study identified as black or African-American (62%), 15% identified as Hispanic, 6% identified as white, 10% identified as another race or more than one race, and 7% did not disclose race and ethnicity information. Almost half of participants had mothers whose highest level of education was a high school diploma or less (44%), 38% of mothers had some college or trade school, 8% had a bachelor’s or graduate degree, and 10% did not report their education level. While all participants spoke English, parents of 25% of the children reported that a language other than English was also spoken at home.
Procedure
Teachers implemented four curricular units over the course of an academic year. Each four-week unit was constructed around a particular book and involved weekly reading and play sessions designed to teach 20 words per unit (Authors, 2019). Thus, a total of 80 words were taught throughout the year. Book reading and playful learning activities each provided three exposures to each target word per week; across both activity types and the four weeks of a unit, this yielded a total of 24 exposures to each word.
Both book reading and play sessions began with a brief review of the target words using picture cards before moving to the play or reading activity. Before the shared book reading sessions, the teacher used picture cards to introduce the target words and definitions that the class would encounter during the reading. During book reading, teachers were instructed to review the definitions of the target words as they came up in the story, point to illustrations in the book, use gestures to illustrate word meanings, and ask questions about the story’s plot to aid comprehension. Play sessions included large and small group games, music, sociodramatic play, and a digital game. As in shared book reading, the teacher introduced the target words and definitions before the play session began and used explicit strategies to promote conceptual understanding of word meanings.
Specifically, book reading and play sessions included three “learning moments” for each of the target words. A learning moment was defined as a single, continuous span of time during which the activity intentionally draws (or maintains) students’ attention to a particular concept, in this case a single target vocabulary word (Authors, 2019). Each learning moment had to include a minimum of two teaching strategies. The following were considered “critical strategies”, meaning that all learning moments had to include at least one of these strategies: defining the word (e.g., “devouring is to eat food very fast”) and elaborating on the word’s meaning by using it in meaningful context or providing meaningful information beyond what was in the definition (e.g., “the boy was weeping because he was sad about losing his toy”). The following were secondary strategies: connecting the word to the child’s life (e.g., “Have you ever gotten in a quarrel with someone?”) and illustrating the word with a picture or other visual aid. One of the two teaching strategies could come from this secondary list. Teachers were provided with guidance cards for every play session and book reading, advising them how to administer the pre-play or book reading activity, the play session or book reading itself, and the post-play or book reading activity.
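The learning-moment requirements above amount to a simple validity rule. The sketch below encodes that rule as we read it; the strategy labels are our own illustrative shorthand, not the study's actual coding scheme:

```python
# Hypothetical encoding of the "learning moment" criteria: at least two
# teaching strategies, at least one of which is a critical strategy.
# Strategy labels are illustrative shorthand, not the study's codes.
CRITICAL_STRATEGIES = {"define", "elaborate"}
SECONDARY_STRATEGIES = {"connect_to_life", "visual_aid"}

def valid_learning_moment(strategies):
    """Return True if a learning moment meets both stated requirements."""
    strategies = set(strategies)
    has_two_strategies = len(strategies) >= 2
    has_critical = bool(strategies & CRITICAL_STRATEGIES)
    return has_two_strategies and has_critical

print(valid_learning_moment({"define", "visual_aid"}))           # → True
print(valid_learning_moment({"connect_to_life", "visual_aid"}))  # → False (no critical strategy)
```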
Children were pre- and post-tested on 25 words (20 taught words; 5 non-exposure control words) for each unit (thus, 100 words total across 4 units) using our novel receptive vocabulary assessment and an expressive vocabulary assessment. The order of administration for the receptive and expressive tests was counterbalanced across children.
All target and control words were uncommon for this age group to ensure that any incremental gains in knowledge about these words could be attributed to the activities in our intervention and not due to exposure outside of our intervention (e.g., in the classroom, at home, etc.). Words classified as ‘Easy’ on the Biemiller (Reference Biemiller2010) scale, words that occurred 6 or more times per 1 million utterances in CHILDFreq (MacWhinney, Reference MacWhinney2014), and words known by 80% of 4th graders on Dale-Chall’s (Chall & Dale, Reference Chall and Dale1995) list were considered too easy for this study. Thus, the words chosen were not those that would typically be known by or taught to preschoolers. Their rarity served to preserve experimental control as part of the intervention.
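The three screening criteria above can be read as a disjunctive filter: a candidate word is dropped if it trips any one of them. A minimal sketch, with invented field names and example values rather than the study's actual norms data:

```python
# Hypothetical sketch of the word-screening logic: a candidate word is
# excluded if ANY of the three difficulty criteria applies. The records
# below are invented for illustration, not actual normative data.

def too_easy(word):
    return (
        word["biemiller_easy"]                  # rated 'Easy' on the Biemiller (2010) scale
        or word["childfreq_per_million"] >= 6   # 6+ occurrences per million utterances (CHILDFreq)
        or word["pct_4th_graders_known"] >= 80  # known by 80% of 4th graders (Dale-Chall list)
    )

candidates = [
    {"word": "awning", "biemiller_easy": False, "childfreq_per_million": 0.4, "pct_4th_graders_known": 35},
    {"word": "cookie", "biemiller_easy": True,  "childfreq_per_million": 52.0, "pct_4th_graders_known": 99},
]

eligible = [w["word"] for w in candidates if not too_easy(w)]
print(eligible)  # → ['awning']
```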
Receptive vocabulary measure of the target words
The receptive vocabulary measure was administered one week before and one week after each unit on a tablet using an app created specifically for this intervention. Children were first given the opportunity to respond to practice items using known words (e.g., baby). To progress to the test items, children had to respond to two practice items correctly on the first attempt. Children heard each word once on a blank screen and then again immediately after four response images appeared. For the practice items, children were given feedback if they responded incorrectly.
On the test items, children heard each word once on a blank screen and then again immediately after the four response images appeared (Figure 1). The app randomized the order of items for each child and randomized the placement of the target and foil images on the screen for each item. Two easy filler items, spaced evenly throughout the test items, were included to help maintain children’s attention and motivation. No feedback was given for the test or filler items. The position of the images for each test item was randomized at both testing points.
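The two layers of randomization described above (item order per child, image placement per item) can be sketched as follows. This is an illustrative reconstruction under our assumptions about the app's internals, not its actual code:

```python
import random

# Hypothetical sketch of the app's two randomization steps: item order is
# shuffled per child, and the on-screen positions of the target and three
# foil images are shuffled independently for each item.
def build_trials(words, seed=None):
    rng = random.Random(seed)
    items = list(words)
    rng.shuffle(items)  # randomize item order for this child
    trials = []
    for word in items:
        layout = ["target", "phonological", "thematic", "conceptual"]
        rng.shuffle(layout)  # randomize image placement for this item
        trials.append({"word": word, "layout": layout})
    return trials

trials = build_trials(["awning", "cauldron", "tassel"], seed=7)
```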
The app was designed to mimic previously established measures of receptive language (PPVT-IV), while adding the benefit of probing a continuum of children’s word knowledge from little to no knowledge (phonological foil), to some knowledge (conceptual or thematic foil), to precise knowledge (target image). Previous researchers have utilized a similar foil structure (Toub et al., Reference Toub, Hassinger-Das, Nesbitt, Ilgaz, Weisberg, Hirsh-Pasek, Golinkoff, Nicolopoulou and Dickinson2018) and these foils allowed us to probe incremental knowledge using children’s errors. Moreover, crafting two semantic foils (conceptual and thematic) provides the possibility of examining the different kinds of semantic information children could be learning. Each target word and its associated foils can be found in Appendix A. The images of each target word and foils for these words can be found at https://osf.io/xekcp/?view_only=d43ea0b126a14aaa83b88f55ed27cc76. In crafting all foil types, we used principles derived from the literature on developing distractors for multiple-choice items, including using plausible distractors, focusing on images that would be familiar to children, and keeping distractors homogeneous in format, style, etc. (Gierl et al., Reference Gierl, Bulut, Guo and Zhang2017).
A phonological foil was identified for each word based on overlapping sounds between the target word and foil. The phonological foil often rhymed with the target word, typically differing by a single phoneme. If a suitable rhyme was not available, a word with the same first phoneme or same stressed phoneme was chosen. A foil was chosen based on 1) what words children might know in each category (i.e., “canister” for “bannister” was judged to be too difficult) and 2) whether a picture could quickly bring the word to mind. That is, children likely know the word home, but an image representing home is likely to bring to mind the word house before home. Some words were similar on multiple dimensions; e.g., hammer for hammock shared the first three phonemes ([h], [a], and [m]). If possible, we also selected phonological foils that matched the target in number of syllables, as word length contributes to the level of similarity. The phonological foils were not related either conceptually or thematically to the target word, and they ended in the present progressive -ing when the target did.
Semantic foils were related to the meaning of the target word rather than to its phonology. When a child chooses a semantic foil, this should reflect greater understanding of the target word’s meaning compared to choosing the phonological foil. Subsequent analyses of children’s errors then provide an opportunity to track the refinement of word meanings and offer a more in-depth understanding of children’s incremental learning process during the intervention. Thus, the incorporation of these semantic foils makes this measure sensitive to incremental learning in a way that previous vocabulary assessments were not. We included two types of semantic foils, thematic and conceptual, based on the theory that thematic semantic knowledge may precede conceptual understanding.
The conceptual foil was in the same taxonomic category as the target. For example, the conceptual foil for cauldron was an image of a cast iron pan (both types of cookware). Foils were chosen to take advantage of children’s prior word knowledge while also ensuring that they shared some conceptual features with the target word. The thematic foil was a word that shares spatial, causal, temporal, or other relations with the target word. Thematic foils were images of words that are frequently found in the same event as the target referent, or “go with” the target item. For example, the thematic foil for cauldron was a bowl of soup.
Some abstract words, like obstacle, were less imageable than concrete words. For these words, we chose a target image that was an example of the abstract concept that would be relatable for preschoolers, and the thematic and conceptual foils often related to the particular target image rather than to the definition of the target word. For example, the image for the target word obstacle was a person struggling to climb over a brick wall, the conceptual foil was a fence with an open gate, and the thematic foil was a small pile of bricks on the ground.
All target images were chosen to match the definitions created for each word in the study and to be appropriate for and relatable to children. Specifically, we browsed free online databases for images. We first selected the image that best matched each target word in terms of relatability for preschool-aged children. Then, we proceeded to find foil images that best “matched” the target images. For example, for awning, we chose an image of a red overhang awning covering two windows. We therefore picked an image of a window for the thematic foil and an image of a red umbrella for the conceptual foil. We chose an image of a person yawning for the phonological foil based on our rhyming criteria. Images were also chosen to be racially diverse as much as possible and were approved in collaboration with our intervention sites to be culturally sensitive for our sample. For each item, images were matched by type (clipart vs. photograph), background (scene vs. white backdrop/background), coloring (bright vs. muted vs. grayscale, etc.), race/age/gender of any people, and saliency.
The analyses described here include 69 of the 80 target words taught throughout the intervention: Eleven words were removed from the data because a sample of 16 adults could not reliably discriminate whether a foil was intended to be phonological vs. thematic or conceptual (average percent identified was more than two standard deviations below the mean of all words). The 20 non-exposure control words utilized in the intervention were assessed separately.
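The exclusion criterion above is a simple outlier rule over per-word adult identification rates. A minimal sketch of that rule; the words and rates below are invented for illustration, not the study's norming data:

```python
from statistics import mean, stdev

# Hypothetical sketch of the item-exclusion rule: drop any word whose average
# adult foil-identification rate falls more than two standard deviations
# below the mean across all words. Rates are invented for illustration.
rates = {
    "awning": 0.97, "cauldron": 0.95, "tassel": 0.96, "hammock": 0.94,
    "bannister": 0.93, "quarrel": 0.95, "devouring": 0.96, "weeping": 0.94,
    "grazing": 0.97, "obstacle": 0.40,
}

cutoff = mean(rates.values()) - 2 * stdev(rates.values())
retained = sorted(w for w, r in rates.items() if r >= cutoff)
excluded = sorted(w for w, r in rates.items() if r < cutoff)
print(excluded)  # → ['obstacle']
```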
Expressive vocabulary measure of the target words
An expressive vocabulary measure was also administered one week before and one week after each of the four units using the New Word Definition Test-Modified (NWDT-M; Hadley et al., Reference Hadley, Dickinson, Hirsh-Pasek, Golinkoff and Nesbitt2016). Children were prompted to say and show, using gesture, everything they know about a word. For example, children would first hear, “What does tassel mean?” and then would hear the follow-up question, “Can you tell me or show me anything more about tassel?” Children were first given the opportunity to respond to this prompt using two practice examples with known words. Following the two practice examples, children were tested on all 25 target and control words using two different orders at pre- and post-test. Each version of the expressive test was ordered randomly and then one of the orders was randomly assigned to each child’s pre-test session. Similar to the receptive assessment, two known filler words were dispersed evenly throughout the test to provide encouragement.
Coding
The receptive test was automatically scored by the app. A response was considered correct only if children pressed the target image. We also calculated the proportion of trials on which children erred by selecting each of the three foils. The proportion for each foil type was computed separately to examine a progression of word knowledge: the phonological foil reflects no meaning-related information, the thematic foil suggests a related association, the conceptual foil reflects deeper, more category-based knowledge, and the target shows a certain level of “mastery”.
Additionally, trained research assistants coded responses on the expressive task for the number of information units provided. A response received a score if it included any relevant information about the word’s meaning, such as a synonym, antonym, gesture, superordinate–subordinate relationship, functional or perceptual feature, part–whole relationship, meaningful context, or basic context (Hadley et al., Reference Hadley, Dickinson, Hirsh-Pasek, Golinkoff and Nesbitt2016). Trained research assistants coded all of the expressive tests; to ensure reliability, 20% of the assessments were also scored by a Gold Standard coder, and assistants had to reach at least 90% agreement with that coder. If agreement was not achieved, the Gold Standard coder’s score was used, and discrepancies were discussed until agreement was reached. Few children provided more than one information unit per word at both time points. Therefore, this assessment was scored as a binary outcome – correct or incorrect.
Results
Analysis plan
To test the value of our novel framework for measuring the depth of vocabulary knowledge, we focus on which foil children selected for words on which they responded incorrectly. That is, we purposely exclude words where children chose the target image and thus responded correctly, as this type of mastery would be apparent in standard, existing receptive vocabulary measures. Instead, we analyze only incorrect responses to enable an examination of whether variation in the foils chosen can offer information about the development of word meanings even when children do not choose the correct image. For each child, we computed the proportion of their errors at each test session in which they selected each foil type. Each child therefore has six scores – the proportions of phonological, thematic, and conceptual errors at pre- and post-test.
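As an illustrative sketch of this scoring step (the published analyses were conducted in R; the data structure and field names below are hypothetical, not the authors' code), the six per-child proportions could be computed as follows:

```python
from collections import Counter


def error_proportions(trials):
    """Compute the proportion of a child's errors falling on each foil
    type, separately for pre- and post-test.

    `trials` is a list of dicts with hypothetical fields:
      session: "pre" or "post"
      choice:  "target", "phonological", "thematic", or "conceptual"
    Correct (target) responses are excluded, per the analysis plan.
    """
    scores = {}
    for session in ("pre", "post"):
        errors = [t["choice"] for t in trials
                  if t["session"] == session and t["choice"] != "target"]
        counts = Counter(errors)
        n = len(errors)
        for foil in ("phonological", "thematic", "conceptual"):
            scores[(session, foil)] = counts[foil] / n if n else float("nan")
    return scores


# Toy data for one child: three pre-test errors, one correct pre-test
# response (excluded), and two post-test errors.
trials = [
    {"session": "pre", "choice": "phonological"},
    {"session": "pre", "choice": "phonological"},
    {"session": "pre", "choice": "thematic"},
    {"session": "pre", "choice": "target"},      # excluded: correct
    {"session": "post", "choice": "conceptual"},
    {"session": "post", "choice": "thematic"},
]
props = error_proportions(trials)
```

Each child contributes one such dictionary of six proportions to the mixed-effects models described next.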
Our main analyses were conducted using mixed-effects linear regression to predict these error proportions (using the lmerTest package in R). When foil type is included as a predictor in these models, it is backwards-difference coded so that each level is compared to the one before it. We ordered the levels according to the theoretical depth of knowledge that each foil represents: phonological, thematic, conceptual. Each model therefore produces two comparisons: thematic vs. phonological foils and conceptual vs. thematic foils. When appropriate, we then conducted additional comparisons between conceptual and phonological foils.
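Backward-difference coding can be illustrated with a small numeric check; the group means below are hypothetical placeholders, not values from the study. With this coding, the first slope estimates the thematic-minus-phonological difference and the second the conceptual-minus-thematic difference:

```python
import numpy as np

# Backward-difference coding matrix for a 3-level factor ordered
# phonological < thematic < conceptual. Each column compares one
# level to the level immediately before it.
coding = np.array([
    [-2 / 3, -1 / 3],  # phonological
    [ 1 / 3, -1 / 3],  # thematic
    [ 1 / 3,  2 / 3],  # conceptual
])

# Hypothetical group means, used only to check what the betas estimate.
means = np.array([0.45, 0.25, 0.30])  # phonological, thematic, conceptual

# Design matrix: intercept plus the two contrast columns.
X = np.column_stack([np.ones(3), coding])
betas = np.linalg.solve(X, means)

# betas[0] is the grand mean, betas[1] = thematic - phonological,
# and betas[2] = conceptual - thematic.
```

This is why each model yields exactly two ordered comparisons, with the conceptual-vs.-phonological contrast requiring a separate follow-up.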
The mixed-effects models account for the repeated-measures and nested structure of the data by modeling random intercepts for participants nested within classrooms nested within sites. However, in some cases, these complex random-effects models failed to converge in the statistical software; in these instances, the random effects were systematically dropped in a pre-determined order (classroom, site, participants) until the model was able to run without error (Bates et al., Reference Bates, Kliegl, Vasishth and Baayen2015). Because each unit in the intervention used a different book which entailed different words and occurred at different times throughout the year, we also include book as a control variable in all analyses.
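The convergence fallback can be sketched as a loop over progressively simpler random-effects structures; the `fit` callable and formula strings below are hypothetical stand-ins for the authors' R models, shown only to make the pre-determined dropping order concrete:

```python
# Pre-determined fallback order: drop the classroom grouping first,
# then site, keeping the participant intercept for as long as possible.
FORMULAS = [
    "errors ~ foil * session + book + (1|site/classroom/participant)",
    "errors ~ foil * session + book + (1|site/participant)",
    "errors ~ foil * session + book + (1|participant)",
]


def fit_first_converging(fit, formulas):
    """Return the first model in `formulas` that `fit` can estimate.

    `fit` is assumed to raise RuntimeError on a convergence failure.
    """
    for formula in formulas:
        try:
            return fit(formula)
        except RuntimeError:
            continue
    raise RuntimeError("no random-effects structure converged")


def toy_fit(formula):
    # Toy stand-in: pretend only the simplest structure converges.
    if "classroom" in formula or "site" in formula:
        raise RuntimeError("convergence failure")
    return formula


chosen = fit_first_converging(toy_fit, FORMULAS)
```

The key design point is that the simplification order is fixed in advance, so model selection cannot be steered by the results.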
Do children’s foil selections change from pre-test to post-test?
Descriptive statistics on children’s foil choices are shown in Table 1. To test our first hypothesis that children’s foil choices would move towards deeper semantic knowledge at post-test, we predicted the proportion of children’s errors from foil type (phonological, thematic, and conceptual), test session (pre-test, post-test), and their interaction (Table 2). The significant interactions in this model indicate that the relative frequencies of choosing each foil type changed from pre-test to post-test (see Table 1). To help interpret this result, we followed up with mixed-effects regression models for each of the foil types separately, predicting the proportion of errors from session (pre-test, post-test); see Table 3 and Tables S1 through S3 in Supplemental Materials. Note that there were three foils on each trial, so chance responding would predict 33% conceptual, 33% thematic, and 33% phonological.
Note: The left side of the table breaks down the proportion of target versus incorrect images chosen on the assessment at pre- and post-test by words taught in the intervention and control words (words only seen at testing). The right side of the table breaks down the proportion of each foil selection on trials where children responded incorrectly at pre- and post-test by words taught in the intervention and control words (words only seen at testing). Rows do not always sum to 1 due to rounding.
Selection of phonological foils decreased significantly from pre-test (M = 44.8%) to post-test (M = 36.8%). On the other hand, selection of thematic foils (Mpre = 24.2%, Mpost = 27%) and conceptual foils (Mpre = 31%, Mpost = 36.1%) increased significantly from pre-test to post-test. Children’s target responses were not included in this analysis, as the aim was to explore children’s pattern of choosing foil responses, but it is important to note that children chose the target image only 24.3% of the time at pre-test and 46.6% of the time at post-test (Table 1).
Notably, this pattern does not appear to be due to exposure to the pre-test, as the same pattern is not apparent for control words that children were not exposed to during the intervention (Table 4; Tables S4 through S6 in Supplemental Materials). Selection of phonological foils on control words did not change significantly from pre-test (M = 43.8%) to post-test (M = 42.6%), nor did the selection of thematic foils (Mpre = 24.2%, Mpost = 26%) or conceptual foils (Mpre = 31.8%, Mpost = 31.3%). A linear mixed-effects model predicting the proportion of errors, including a three-way interaction between foil type, session (pre-test vs. post-test), and word type (target vs. control) (Table 5), showed that the pattern of change in foil choice from pre-test to post-test differed significantly for target and control words.
Is foil selection related to accuracy?
To understand whether children’s choice of a thematic or conceptual foil represents more advanced vocabulary knowledge relative to the phonological foil, we examined how foil choice differed based on children’s overall accuracy on the receptive test. In this analysis, we examined whether children who were more accurate overall on the receptive measure at post-test differed in the foils they chose on trials in which they made errors at post-test (Figure 2). There was a significant foil type x accuracy interaction, indicating that the relation to overall accuracy was different for each foil type (Table 6).
To interpret this interaction, we conducted regressions examining each foil type separately; the full regression results are available in Tables S7 through S9 in Supplemental Materials. Accuracy was significantly and negatively associated with selection of the phonological foil (B = -0.29, p <.01) and significantly and positively associated with selection of the conceptual foil (B = 0.26, p <.01). There was not a significant relation between accuracy and selection of thematic foils (B = 0.04, p =.34). In other words, children who were not very accurate overall were more likely to choose the phonological foil when they made errors. Children who were more accurate overall were more likely to choose the conceptual foil when they made errors.
Does foil choice on the receptive test at post-test relate to children’s expressive test performance at post-test?
To compare the receptive task with a more traditional measure of depth of word knowledge, we conducted an analysis examining whether foil choice on our receptive task at post-test was related to children’s performance on the expressive post-test for a particular word. Because the outcome on our expressive test was binary (correct or incorrect), this analysis employed mixed-effects logistic regression where the dependent variable was a binary variable representing whether a child responded correctly on a single trial of the expressive test. In all other respects, the models used here followed the same structure and approach detailed in the Analysis Plan section above.
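For readers less familiar with logistic regression, the odds ratios reported in this section follow from the logistic link: a coefficient B is a change in log-odds, and exp(B) is the corresponding odds ratio. A minimal numeric check, using a hypothetical coefficient and baseline probability rather than values from the study:

```python
import math


def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Log-odds -> probability."""
    return 1 / (1 + math.exp(-x))


B = 0.5                      # hypothetical logistic coefficient
odds_ratio = math.exp(B)     # the quantity reported as an odds ratio

# Round-trip check: shifting a baseline probability by B on the
# log-odds scale multiplies its odds by exactly exp(B).
p0 = 0.2
p1 = inv_logit(logit(p0) + B)
```

Note that model-based odds ratios adjust for the random effects and covariates in the model, so they will not in general equal the ratio of raw odds computed from the descriptive percentages.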
Results showed that children who selected a thematic foil for a particular word were significantly more likely to score points in the expressive test for that word (19.53%) compared to children who selected a phonological foil (11.54%, odds ratio = 1.58; Table 7). This finding indicates that children who may not have possessed full understanding of a word, but rather only had thematic associations of that word, outperformed children who had no semantic understanding of that word on the expressive test.
However, somewhat surprisingly, children who selected a conceptual foil for a particular word were significantly less likely to score points on the expressive test (16.18%) for that word compared to children who selected a thematic foil (19.53%, odds ratio = 0.75). This result may have emerged due to the task demands of the expressive measure. Preschoolers may have had more difficulty verbally expressing conceptual information compared to simply choosing an image representing a conceptual association on the receptive task. We return to this finding and our reasoning in the discussion.
Finally, children who picked the target image on the receptive test for a particular word were more likely to score points on the expressive test for that word (41.81%) compared to children who selected a conceptual foil (16.18%, odds ratio = 2.73). This finding indicates that children who correctly identified an image of a target word were able to express fuller, more accurate information about that word on an expressive test compared to children who could only choose a conceptual association with that word.
Does foil selection indicate readiness for learning?
In a final analysis, to examine whether certain children may have been more ready to learn particular words, we also predicted whether children scored any points on the expressive post-test for a particular word based on which foil type they selected on the receptive pre-test. That is, if children already know some information about a word at pre-test, do they demonstrate more knowledge about that word on the expressive measure at post-test? Here, we omitted any words to which children responded correctly at pre-test. As in the prior section, this analysis uses mixed-effects logistic regression to predict correct/incorrect responding on the expressive test.
Children who selected a thematic foil for a particular word at pre-test were significantly more likely to score points on the expressive post-test (28.12%) for that word compared to children who selected a phonological foil (25.70%, odds ratio = 1.22; Table 8). There was no significant difference in post-test accuracy between children who selected a conceptual foil (26.19%) and children who selected a thematic foil (28.12%, odds ratio = 0.88). A follow-up analysis found no significant difference in post-test accuracy between children who selected a conceptual foil and those who selected a phonological foil (B = 0.03, p =.385, odds ratio = 1.07). These data suggest that thematic knowledge at pre-test may be more facilitatory for learning than phonological knowledge. However, the lack of a significant difference between thematic and conceptual suggests that thematic knowledge and conceptual knowledge may be equally effective in this context.
Discussion
In recent years, language and literacy researchers have started to question how word knowledge is measured in studies that focus on children’s vocabulary learning (Hoffman et al., Reference Hoffman, Teale and Paciga2014; Pearson et al., Reference Pearson, Hiebert and Kamil2007). Word learning takes time; refinement of semantic understanding is not a one-and-done process. However, intervention studies cannot identify incremental changes in word understanding unless researchers use measurement techniques that harness the minute changes of this process. Incorporating assessment methodologies that tap into various levels of knowledge can advance the way we conceptualize vocabulary learning in interventions.
The goal of this paper was to provide a framework for moving beyond a simple “correct” or “incorrect” analysis of children’s word knowledge to provide a more nuanced framework for assessing vocabulary learning. The measurement technique we developed used a systematic approach by including foils from which children must distinguish the correct target image for a given word. The foil types captured word knowledge on a continuum. Phonological foils represent a less sophisticated understanding of word meaning. The two types of semantic foils (thematic and conceptual) reflect children’s partial understanding of a word’s meaning, even when they did not have the more complete knowledge necessary to distinguish the correct target image. Using this measurement technique allowed us to see more depth of knowledge than measures that simply report, for instance, that a child correctly chose images for 5 of 10 words. Specifically, children’s choices of thematic and conceptual foils revealed when they were making small gains in lexical knowledge.
Indeed, we found that at pre-test, when all children were likely to have limited knowledge about the words, phonological foils were chosen at above-chance levels. With little lexical understanding of the items, children could rely only on how the word sounded and chose the item that sounded most like the target word – a reasonable strategy for picking the answer that seemed best given their limited knowledge. We also found that, when children chose an incorrect response, selection of phonological foils decreased and selection of thematic and conceptual foils increased from pre- to post-test. This finding implies that even though some children may not have learned enough about the target item to pick the correct image, their semantic understanding of the word nonetheless progressed throughout the intervention.
Our results also suggest that children with higher levels of knowledge on the semantic continuum performed better on other measures of word knowledge. Choosing the conceptual foil at post-test was positively associated with overall accuracy on the post-test receptive measure, whereas choosing the phonological foil at post-test was negatively associated with overall accuracy at post-test. Similarly, children who picked the thematic foil at pre-test were significantly more likely to score points on the expressive post-test compared to children who chose the phonological foil. This nuanced pattern of results demonstrates that our receptive measure was able to probe depth of word knowledge to illuminate when a child had intermediate levels of knowledge about a word. While they may not have had complete semantic understanding of the item, they knew enough to pick a foil that was “more correct” and were even more likely to be able to provide information about the word on the expressive measure, traditionally considered an assessment of depth of knowledge.
While we predicted that children who demonstrated a higher level of knowledge on the semantic continuum on our receptive task would also express a higher level of knowledge verbally on the expressive test, we saw that children who picked the conceptual foil were significantly less likely to score on the expressive measure compared to those who picked the thematic foil. This suggests that thematic knowledge may be more predictive of ability to express word learning. Why might this be?
It is possible that thematic and conceptual relations impact target words differently depending on the nature of the task (Mirman et al., Reference Mirman, Landrigan and Britt2017). In fact, children often exhibit receptive understanding of a word without being able to produce expressive knowledge of that word (Henriksen, Reference Henriksen1999; Verhallen & Bus, Reference Verhallen and Bus2010). The receptive measure may have been easier for children to demonstrate evidence of conceptual knowledge because they could simply examine the four images and choose the one that depicted their understanding of the target word instead of having to come up with the information themselves. For instance, there may have been familiar perceptual features of the conceptual foil that children could easily identify.
On the other hand, the expressive task requires children to reach into their mental lexicon to produce a definition. For example, for the target word awning, a child who had conceptual knowledge of the word meaning (an awning is a type of covering) could look at the images in Figure 1 and select an image that shows a covering (e.g., umbrella). However, it may have been more difficult to produce the precise definition of awning as a covering. Our preschool-aged participants may have had trouble producing relevant functional and conceptual information related to a target item. In comparison, a child who had a thematic understanding of the word awning (awnings go with houses and windows) could more easily verbally express that information on the expressive task. These thematic relations may have been more familiar to children and therefore easier to produce. These possibilities suggest that it is important to probe depth of word knowledge both receptively (an assessment that imposes few task demands) and expressively to gain a holistic understanding of children’s word representations.
Moreover, we found no difference in how likely children were to score expressive post-test points if they picked the conceptual or thematic foil at pre-test. This demonstrates that knowing one kind of semantic information before the intervention was not more beneficial than the other. This is somewhat surprising given the notion that thematic word associations precede conceptual semantic understanding (Cronin, Reference Cronin2002). However, as mentioned previously, recent findings in the word learning literature reveal that the shift from thematic to conceptual knowledge may not be as clear as previously reported (Arias-Trejo & Plunkett, Reference Arias-Trejo and Plunkett2013). Moreover, categorical and thematic similarity often co-occur and can be difficult to disambiguate (e.g., cow and sheep are both categorically related and thematically related in that they co-occur in a farm context). It is possible that possessing either thematic or conceptual information of a word indicates an equal level of inclination to gain full semantic understanding of that word.
Overall, our findings suggest that word learning not only requires building semantic relationships between related words, but also refining representations to a precise understanding of the word. Children were able to leverage semantic connections made between known words and newly acquired words in order to make “better” guesses as to a word’s meaning. For instance, if a child knew that our target word, thermos, had something to do with a cold temperature, they might use their partial knowledge to choose a foil pertaining to ice. However, participants could only identify the correct referent if they knew enough about the target word to inhibit their choice of incorrect, but related words.
These findings are directly in line with research showing that children’s semantic representations go through a narrowing process (Bergelson & Aslin, Reference Bergelson and Aslin2017; Clark, Reference Clark1973, Reference Clark, Sinclair, Jarvella and Levelt1978, Reference Clark1987; Hendrickson et al., Reference Hendrickson, Poulin-Dubois, Zesiger and Friend2017; McGregor et al., Reference McGregor, Friedman, Reilly and Newman2002; Seston et al., Reference Seston, Golinkoff, Ma and Hirsh-Pasek2009). A child may start out with limited understanding of a vocabulary word, but as they experience more and more language exposure, their knowledge of the item becomes more sophisticated. Indeed, some theories suggest that words are acquired by accumulating partial information over many exposures to the word (Yu & Smith, Reference Yu and Smith2007; Yurovsky et al., Reference Yurovsky, Fricker, Yu and Smith2010). In this way, children were learning aspects of word meanings over the course of the intervention which allowed them to choose the thematic and conceptual foils, even when they did not know enough information to correctly choose the target image. By examining the change in foil selection from pre- to post-test, we were able to capture this learning process.
Implications and future directions
The measurement technique used in this vocabulary intervention was a way to begin probing nuances in children’s semantic growth in a way that previous assessments have not. Future work could build on this foundation to gather a more pointed picture of what participants are learning in vocabulary studies. Moreover, future research should explore whether early childhood teachers and administrators could use an assessment like this to track what information students are learning and how pedagogical approaches in the classroom could be improved.
For example, if a majority of students are choosing phonological foils on a formative assessment, it may be beneficial for a teacher to tailor his or her instruction to spend more time focusing on basic definitional information in lieu of knowledge that is too sophisticated for students’ current understanding. Likewise, if students are often picking thematic or conceptual foils or even choosing the correct referents, a teacher could modify instruction to incorporate more complex ideas surrounding the vocabulary items. Future research could also investigate whether differentiated pedagogical approaches could improve learning outcomes for children who begin with differing levels of prior knowledge.
Moreover, using this type of assessment allows researchers to monitor how much time children may need to effectively learn vocabulary items. While literacy researchers are focused on the various types of instruction that may be important for vocabulary learning, it is possible that children simply need more time in intervention studies using current pedagogical approaches to progress through the various stages of word learning. For example, traditional receptive measures might not be sensitive enough to see improvement from pre- to post-test after a particular intervention, but the use of this type of approach would allow researchers to determine whether children are showing incremental gains as a response to the intervention, indicating that the pedagogical strategies are working as intended but perhaps a higher dosage or duration is needed to improve children’s vocabulary more substantially.
One major assumption of our study is that the differences in children’s selection of foils are tied to their depth of knowledge of the target items. However, children’s individual knowledge of the foils themselves may play a role in their choices on the receptive task. For instance, culturally and linguistically diverse children may interpret various words in ways that are relevant to their own environmental contexts. While we included educators in the planning process of these materials, our participants’ varying experiences could impact their knowledge of foils, and in turn, the choices they made on the receptive task. Indeed, there was considerable variability in the difficulty of items (see Table S10 and Figure S1 in Supplemental Materials), which may be partially attributable to variability in the difficulty of foil choices. Future research aiming to use this approach to guide instruction should consider conducting norming across items and foils and pilot testing of items to ensure that they are most effective for a particular project’s goals and target participant population. Notably, more systematic norming would likely yield even stronger effects than those reported here, since unmodeled variability across items and foils works against detecting such effects.
Furthermore, this study’s sample only included children from economically disadvantaged backgrounds. It may be advantageous for researchers to utilize this kind of assessment with children from higher socioeconomic status backgrounds in order to analyze the generalizability of this framework. Moreover, it may be the case that children from different home environments and with different language experience bring different initial knowledge to vocabulary learning. By testing children of varying backgrounds, it would be possible to compare not only how well words were learned following an intervention, but also the kind of information children bring to the learning task given their language experience. That is, are children from more advantaged backgrounds likely to have more intermediate knowledge at pre-test and thus more likely to learn new words simply because they have less new information to learn? The current measure opens the door to a greater understanding of these and other important questions about children’s vocabulary learning.
Conclusion
The novel framework and measurement technique used in this study begins to answer the calls of researchers to more wholly assess children’s vocabulary knowledge in a way that incorporates a strong theoretical foundation, but is also brief to administer and does not require extensive coding. As opposed to other traditional measures typically used in the field, the measurement technique used here was able to capture where children might fall on various levels of the semantic knowledge continuum. Being able to assess word knowledge in a more nuanced way will be valuable for researchers attempting to understand potential incremental effects of much-needed vocabulary interventions.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000924000278.
Acknowledgements
Thank you to the administrators, teachers, parents, and children at the Acelero Learning in Philadelphia and the Nashville Public Schools. We thank members of the Temple Infant and Child Lab, especially Jacob Shatz, the UD Child’s Play Learning and Development Lab, and David Dickinson’s lab at Vanderbilt University for their assistance in data collection and coding, as well as Nick Rogers for creating the digital receptive vocabulary measure. This work was supported by the Institute of Education Sciences [R305A150435, R305B130012].
Competing interest
The author(s) declare none.
Appendix A
Target and Control Words with Foil Choices