Introduction
Understanding the memory processes that underlie word learning helps us understand how children build vocabulary knowledge (Wojcik, 2013). Additionally, understanding these processes is essential to implementing effective vocabulary instruction. To date, the majority of word learning research has focused on fast mapping, or how children form initial representations of words during a single session (Carey, 2010). Thus, much of the research on memory processes in word learning has focused on working memory (Baddeley, 2003; Gray et al., 2022). However, children typically encode shallow representations of words during a single session (Munro et al., 2012). Without additional experiences with the words, these representations can be readily forgotten (Horst & Samuelson, 2008). A primary goal of vocabulary instruction is to support both encoding and retention. Thus, it is important to understand how children build a robust representation of a word across multiple experiences, known as slow mapping (Davis & Gaskell, 2009). Additionally, it is important to identify components of training that are likely to support retention after explicit instruction with a word has ceased (Vlach & Sandhofer, 2012).
The slow mapping process includes several key stages. During the first experience with a word, a child can encode initial representations of the word form (i.e., the phonemes and their order), the word meaning, and the link between the two. All or some of this information can be retained during sleep via consolidation or the information can be forgotten (Davis et al., 2009). During the next experience with the word, the child can reactivate consolidated representations of the form, meaning, and link. During this second period of input, she can add to and strengthen these representations, a process known as re-encoding (Nader & Hardt, 2009). Typically, a child requires extensive experiences with a word before she can readily retrieve and produce the word form and before she develops a rich understanding of the word meaning (McGregor et al., 2007). Given this research, current recommendations for best educational practice include targeting the same words via rich experiences across multiple training sessions.
In addition to understanding slow mapping to support word learning, it is important to identify training strategies that are likely to support post-training retention. Target items that are presented during training are often forgotten over post-training delays, and the longer the delay, the more likely they are to be forgotten (Gordon et al., 2016). However, the type of training can influence how well target information is retained (Vlach & Sandhofer, 2012). Two strategies that support post-training retention are retrieval-based and spaced practice (Adesope et al., 2017). Retrieval-based practice involves asking the learner to retrieve target information during instruction, in contrast to passive presentations. Spaced practice involves spacing presentations across time, in contrast to giving presentations close together in time (see Gordon, 2020). Both retrieval-based and spaced practice introduce desirable difficulties, aspects of training that make learning more effortful but lead to better long-term retention (Bjork & Kroll, 2015). Most of this research has been conducted with adults. However, retrieval-based and spaced practice support learning and retention of educationally relevant material in children (Fazio & Marsh, 2019), including words (Goossens et al., 2013; Leonard & Deevy, 2020).
The relationship between performance during training and post-training retention
One benefit of retrieval-based practice is that performance with a target item during training indicates the likelihood that the item will be retained post training (Storkel, 2015). For example, the number of times adult learners retrieve words during a single session positively relates to the probability of retrieval after a 24-hour delay (Gordon et al., 2021a). Currently, we lack information about how children’s performance with words during training relates to post-training retention. We are aware of one notable exception. Kueser et al. (2021) assessed how training performance related to retention with data from a series of word-learning studies by Leonard and colleagues (see Leonard & Deevy, 2020 for a review). In these studies, preschool-age children engaged in two training sessions utilizing spaced-retrieval practice. Consistent with the findings from adults, Kueser et al. (2021) found that the number of times a word form was retrieved during training was positively related to the probability of retrieving and producing the word form after a one-week delay.
Successful retrievals across multiple sessions
As noted, research on the relationship between training performance and post-training retention has focused on adult learners. An additional limitation to this research is that it tends to focus on one training session and one delayed assessment of retention. Given that robust word learning requires multiple training sessions, it would be helpful to understand how performance across sessions relates to retention in children. Rawson and Dunlosky’s (2012) research with adult learners is unique in the retrieval-based literature because they studied training across multiple sessions. They asked psychology students to practice retrieving course concepts until each concept was retrieved one or three times during a single session. A criterion of three led to better retention over one-month and four-month delays. However, after the first session students were assigned one to five relearning sessions. As the number of relearning sessions with successful retrievals increased, the benefit of practicing each concept until it was retrieved three times during the first session decreased (see also Vaughn et al., 2016). Notably, the time spent studying each concept relative to the probability of post-training retention was more efficient when students successfully retrieved concepts one time each across four sessions instead of successfully retrieving concepts three times during the initial session and during one relearning session.
Findings by Rawson and Dunlosky (2012) indicate that training is more effective and efficient when students successfully retrieve target items across multiple sessions as opposed to retrieving target items many times during the initial learning session. To understand this finding, it is important to consider the underlying memory processes. When a learner is asked to retrieve an item multiple times during a training session, she is retrieving information that was recently activated in working memory. Successful retrieval during a session is a good indication of learning. However, successfully retrieving an item at the beginning of a second session is a better indication of robust learning. The learner had to encode the item during the previous session, consolidate the item during overnight sleep, and then successfully retrieve the item from long-term memory (Davis & Gaskell, 2009). If a learner is successful at the beginning of a second session, they might stop studying the item at that point. However, Rawson and Dunlosky’s (2012) research indicates that successful retrievals at the beginning of multiple sessions are optimal to support long-term post-training retention.
The current study
Currently, it is unknown how children’s performance with words across multiple sessions relates to the probability of post-training retention. To address this question, we conducted additional analyses on data from 4- to 6-year-old children with typical development (Gordon et al., 2021b, 2022). In these studies, children were trained on a set of word form-object pairs across consecutive days via spaced-retrieval practice until they demonstrated learning of all words or completed a total of six training sessions. Children were asked to retrieve forms for objects at the beginning and end of each session and were asked to retrieve forms for objects one month after their last training session. In the current analyses, we only included data from children who completed a total of six training sessions. We selected this sample because these children all had the same number of training sessions in which to potentially retrieve word forms. Additionally, of the typically developing group, these children struggled the most with the task. Thus, current results can inform educational and intervention practices for children with typical development who struggle the most with word learning.
The data set that we include in the analyses is unique in the current literature in several key ways. First, all children were trained on the same words across six sessions on subsequent days. As noted, most studies on retrieval-based practice include one training session and one delayed testing session. Second, we assessed children’s memory of words one month after their last training session. Post-training retention, especially the ability to retrieve and produce word forms after a long-term delay, is rarely assessed in either children or adults. Third, these are the first analyses, that we are aware of, that assess the relationship between performance on individual words across multiple training sessions and the probability of long-term retention in children. Our key question is: How does the number of training sessions in which words were successfully retrieved relate to the probability of retrieval after a one-month delay?
Method
Participants
The Institutional Review Board at Boys Town National Research Hospital approved all protocols and recruitment methods. The current analyses included data from 24 typically developing (TD) children between the ages of 4:4 and 6:11 years who completed six training sessions. This included n = 19 out of n = 43 children with TD from Gordon et al. (2022) and n = 5 out of n = 9 children with TD from Gordon et al. (2021b). Children’s racial/ethnic backgrounds included: White/non-Hispanic = 15, White/Hispanic = 1, Black/non-Hispanic = 1, Hispanic/race not reported = 1, biracial = 5, and information not provided = 1. All children were speakers of Standard American English with no reported exposure to a second language and demonstrated normal hearing via a pure-tone audiometric screening. All children demonstrated a typical nonverbal IQ via the Wechsler Preschool and Primary Scale of Intelligence-Fourth Edition (Wechsler, 2012). Children completed the Peabody Picture Vocabulary Test-Fourth Edition (PPVT; Dunn & Dunn, 2007) to assess receptive vocabulary knowledge (Supplemental S1, Table 1).
Stimuli
Stimuli included nine lab-created forms. Six included two syllables, /bɪnɪp/, /grɑmɚ/, /kinɪt/, /nedɪg/, /sibl̩/, and /topɪn/, with a consonant-vowel (CV) syllable structure of CV.CVC or CCV.CVC, and with stress on the first syllable. Three included one syllable, /dob/, /mep/, and /plun/. Forms varied in initial consonant or consonant cluster. All forms had a relatively high phonotactic probability and low neighborhood density to promote learnability. Each form was paired with one of nine unfamiliar objects that varied in color, material, shape, and size (see Gordon et al., 2021b, 2022).
Training and testing
All children completed six training sessions on consecutive days with no missing days. During training sessions, children engaged in blocks of retrieval tasks (see Supplemental S1, Table 2). At the beginning of each session, with the exception of the first session, Gordon et al. (2021b, 2022) asked the child to name each object (e.g., “What is this one called?”). Any object that was correctly named was not included in the training blocks for that session. To be counted as correct, children were required to produce all target phonemes in the right order with no additional phonemes added. Training Block 1 included labeling each object and asking the child to repeat the labels. In training Blocks 2 and 3, the child was asked to indicate which of two target forms linked to a specific object (e.g., “What is this one called? Is it a /nedɪg/ or a /sibl̩/?”). Feedback included a presentation of the target form (e.g., “Yes that’s right this is a /nedɪg/” or “Actually this is called a /nedɪg/”). End-of-session testing blocks were administered for all form-object pairs. During the end-of-session naming test, if the child did not provide the correct first CV or CCV, she was immediately provided with the first CV or CCV as a cue (e.g., “It starts with /ne/… ”).
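The scoring rule and the rule for dropping correctly named objects from a session's training blocks can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring software used in the original studies: the representation of a production as a list of phoneme symbols, the function names, and the object labels (`object_a`, `object_b`) are all assumptions introduced here, and the pairing of forms with objects is hypothetical.

```python
from typing import Dict, List, Sequence


def is_correct_production(target: Sequence[str], produced: Sequence[str]) -> bool:
    """Correct only if all target phonemes occur, in order, with none added."""
    return list(produced) == list(target)


def items_to_train(targets: Dict[str, List[str]],
                   probe: Dict[str, List[str]]) -> List[str]:
    """Objects NOT named correctly at the beginning-of-session probe.

    Objects named correctly are left out of that session's training blocks.
    A missing probe response is treated as incorrect (an assumption).
    """
    return [obj for obj, phonemes in targets.items()
            if not is_correct_production(phonemes, probe.get(obj, []))]


# Hypothetical form-object pairings and hypothetical child responses.
targets = {"object_a": ["n", "e", "d", "ɪ", "g"], "object_b": ["d", "o", "b"]}
probe = {"object_a": ["n", "e", "d", "ɪ", "g"],   # exact match: correct
         "object_b": ["d", "o", "b", "ɪ"]}        # added phoneme: incorrect
remaining = items_to_train(targets, probe)
```

Under these assumptions, only `object_b` would remain in the training blocks for that session, because the added phoneme makes the production incorrect.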
One month after the sixth training session, children were administered the naming test without cuing, which constitutes the outcome variable for the current analyses (Supplemental S1, Table 3). The cued recall test was only administered if the child failed to produce the correct initial CV/CCV. The child was then administered a four-alternative forced-choice (4AFC) test, with four word forms to choose from, followed by an end-of-session free and cued recall naming test.
In the original studies, children were administered a retest session, which mirrored the protocol of the one-month session, at various timepoints between the sixth training session and the one-month session. Note that the retest session included testing blocks but no training blocks. However, we did provide feedback during the final naming test. In Gordon et al. (2022) children were assigned to be retested one week or two weeks after training, or not retested before the one-month session. In Gordon et al. (2021b) words within child were split into three sets of three and each set was assigned to the one week, two weeks, or no retest conditions. For the current analyses, for each word within child we coded whether that word was retested one week or two weeks after training, or not retested.
Analyses and results
For each word within each child, we calculated the number of training sessions in which the child correctly produced the form in the beginning-of-session naming test (see Supplemental S2, Figure 1). As children were not administered the free recall test at the beginning of the first session, the number of sessions correct ranged from zero to five (mean = 1.38, SD = 1.65). On average, children correctly produced 4.58 (SD = 2.44) of the 9 words at the beginning of at least one session during training. Across all children, there were 110 words with at least one correct production at the beginning of a session. For 85 of those words, once the word was produced correctly it was produced correctly at the beginning of all subsequent sessions. Of the 25 words that were not produced consistently, 19 had only one session with an incorrect response once the word was produced correctly (e.g., retrieved in sessions 3, 4, and 6).
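As a rough illustration of how the predictor described above could be derived for a single word within a single child, the following Python sketch counts correct beginning-of-session probes and checks whether retrieval was consistent once it first succeeded. The data layout (a dictionary mapping session number to a correct/incorrect flag) and the function names are assumptions made for this sketch, not the structure of the original data set.

```python
from typing import Dict


def sessions_correct(probe_results: Dict[int, bool]) -> int:
    """Count sessions 2-6 with a correct beginning-of-session naming probe.

    Session 1 has no beginning-of-session probe, so the count ranges 0-5.
    """
    return sum(1 for session, ok in probe_results.items()
               if 2 <= session <= 6 and ok)


def is_consistent(probe_results: Dict[int, bool]) -> bool:
    """True if every probe after the first correct one was also correct."""
    ordered = [probe_results[s] for s in sorted(probe_results)]
    if True not in ordered:
        return False  # never retrieved, so consistency does not apply
    return all(ordered[ordered.index(True):])


# Hypothetical word retrieved at the start of sessions 3, 4, and 6 only.
example = {2: False, 3: True, 4: True, 5: False, 6: True}
n_correct = sessions_correct(example)
consistent = is_consistent(example)
```

For this hypothetical word, the count is three sessions correct, and the word would fall among those not produced consistently because of the single miss at session 5.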
To address the primary question, we fit a generalized linear mixed-effects model in an R environment using the lme4 package. The random effects of the maximal model included intercepts for participant and word. We identified the minimal random effects structure by omitting random effects that did not improve model fit according to the Akaike information criterion. Our outcome variable was the production of each word (correct, incorrect) at the beginning of the one-month session. We selected a dichotomous response because the phonological precision of productions was highly binomial (see Supplemental S2, Figure 2). Because the outcome was dichotomous, we fit a logistic model predicting the log odds of a correct production.
The predictor variables included the number of sessions the form was produced correctly during training, retest condition, sex, age in months, maternal education, and PPVT standardized score. We also included an interaction to assess whether the relationship between the number of sessions correct and performance at one month varied based on retest condition. We contrast coded sex as -.5 and .5 so that model results reflected main effects of all participants (Brehm & Alday, 2022). We centered the continuous predictor variables, which included age, maternal education, and PPVT score. The predictor variables had small correlations with each other; thus, they were unlikely to capture similar variance in the model (Supplemental S3, Table 1). Nevertheless, we fit an additional model in which we excluded maternal education and PPVT, and the results were highly similar (Supplemental S3, Table 3).
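The two coding steps described above, mean-centering continuous predictors and contrast coding a two-level factor as -.5 and .5, can be illustrated with a short Python sketch. The analyses themselves were run in R; this sketch only shows the arithmetic. The ages below are made-up values, and which level of sex received -.5 is not stated in the text, so the assignment here is an arbitrary assumption.

```python
from statistics import mean
from typing import List


def center(values: List[float]) -> List[float]:
    """Subtract the sample mean so that zero corresponds to the average child."""
    m = mean(values)
    return [v - m for v in values]


def contrast_code(label: str, first_level: str) -> float:
    """Code a two-level factor as -0.5 / +0.5 (sum-to-zero contrast).

    Which level receives -0.5 is an arbitrary choice in this sketch.
    """
    return -0.5 if label == first_level else 0.5


# Hypothetical ages in months (not data from the study).
ages = [53.0, 60.0, 70.0, 81.0]
centered_ages = center(ages)

# Hypothetical sex labels, with "F" arbitrarily taken as the first level.
sex_codes = [contrast_code(s, first_level="F") for s in ["F", "M", "M", "F"]]
```

With this coding, the coefficients for the other predictors are estimated at the average of the two sex levels and at the mean of each centered variable, rather than at an arbitrary reference level or at zero on the raw scale.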
The random effects structure supported by the data included a random intercept for participant. The number of sessions correct for each word positively related to the probability of a successful production of that word at the one-month test (B = 1.26, z = 7.19, p < .0001; see Table 1 and Supplemental S3, Table 2). Sex, age, maternal education, and PPVT did not significantly relate to one month performance. There was no significant interaction between the number of sessions correct and retest condition.
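To make the size of the reported coefficient more concrete, the sketch below converts B = 1.26 from the log-odds scale to an odds ratio and shows how log odds map onto probabilities. Only the coefficient is taken from the text. The intercept is a made-up value used purely for illustration, and the calculation ignores the random intercept, the other predictors, and the interaction with retest condition, so the probabilities it yields are not estimates from the fitted model.

```python
import math

B_SESSIONS = 1.26  # reported coefficient: change in log odds per session correct

# Hypothetical intercept, NOT reported in the text; for illustration only.
HYPOTHETICAL_INTERCEPT = -3.0


def odds_ratio(b: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(b)


def probability(log_odds: float) -> float:
    """Inverse-logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


or_sessions = odds_ratio(B_SESSIONS)

# Illustrative probabilities for 0-5 sessions correct under the assumed intercept.
illustrative = [probability(HYPOTHETICAL_INTERCEPT + B_SESSIONS * k)
                for k in range(6)]
```

Under these simplifying assumptions, each additional session with a correct retrieval multiplies the odds of a correct production at one month by roughly 3.5, since exp(1.26) is approximately 3.5. How this maps onto the fitted model depends on how retest condition was coded, which the text does not specify.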
As exploratory analyses, we conducted pairwise comparisons to assess the difference in retention based on the number of sessions correct (e.g., two vs. three, Supplemental S3, Table 4). Retrieving a word across three sessions contributed to better retention than zero (t = -8.34, p < 0.001) or one (t = -4.22, p < 0.001) session. Similarly, retrieving a word across four sessions contributed to better retention than zero (t = -12.27, p < 0.001) or one (t = -6.09, p < 0.001) session. These results suggest that participants increase the probability of post-training retention if a word is successfully retrieved across three or four sessions. In contrast, there is not a benefit to successfully retrieving a word across five sessions as retention of words retrieved across five sessions did not differ from three (t = -1.48, p = 0.15) or four (t = 0.09, p = 0.93) sessions. These results should be interpreted with caution, however, given the current sample size.
Discussion
We reanalyzed data from Gordon et al. (2021b, 2022) to determine how the number of sessions in which words were successfully retrieved during training related to the probability of retrieval after a one-month delay for 4- to 6-year-old children. We found that as the number of sessions in which words were retrieved during training increased, the probability of post-training retention also increased. Notably, past research focuses on the number of times each target item is retrieved during one or two training sessions (e.g., Kueser et al., 2021). The current analyses extend this work in that we assessed how the number of sessions words were retrieved by children related to post-training retention.
The research by Rawson and colleagues in which adult learners engaged in multiple training sessions and then were assessed after one- and four-month delays is similar to the current methodology and analyses (Rawson & Dunlosky, 2011, 2012; Vaughn et al., 2016). For example, in Rawson and Dunlosky (2012) if the adult learner successfully retrieved a course concept at the beginning of a session, she did not engage in additional study of that concept during that session. Similarly, in Gordon et al. (2021b, 2022) forms that were successfully retrieved at the beginning of a session were not included in the training blocks during that session. Notably, in Gordon et al. (2021b, 2022) retrieving a form at the beginning of a session was a relatively good indication of learning. The child had to successfully encode the phonemes that make up the form during the previous session, consolidate the form during overnight sleep, and then retrieve the form from long-term memory and produce it correctly. Learning word forms to this level is a difficult task, even for children with typical development (Gray, 2005; Munro et al., 2012).
Although the successful retrieval and production of a word form at the beginning of a session is a relatively good indication of learning, the current study demonstrates that children benefit from additional retrieval opportunities. Specifically, retrieving a form correctly across multiple sessions contributed to better long-term retention than only retrieving the form at the beginning of one session. An open question, to address in future research, is how to optimize practice time for individual words relative to the probability for long-term retention. Specifically, we should determine how many sessions a word should be retrieved before dropping it from training and whether a word should be included in training blocks even after the child produces it correctly.
The findings of the current analyses provide information to inform educational and intervention practices. However, it is important to note limitations. First, words were either retested one or two weeks after the training sessions, or not retested, which may have affected performance at one month. In the current analyses, we did not find an effect of retest condition or an interaction between the number of sessions correct during training and the retest condition on the probability of retrieval after one month. Of note, in the original publications there was not compelling evidence that retest condition benefited retention over one month. However, further research in which words are not retested would provide additional information about the relationship between training performance and post-training retention.
A second limitation is sample size. With the current sample size, we found that as the number of sessions a word was retrieved increased, the probability of long-term retention also increased. Rawson and Dunlosky’s (2012) research is more specific in that they systematically compared the benefits of continuing to test an item across sessions until the learner correctly retrieved the item across three sessions as opposed to two sessions, etc. Through these analyses, they found that there is a point of diminishing returns after which continuing to retrieve an item across additional training sessions does not provide significant benefits to long-term retention. Through their line of research, they concluded that items should be retrieved across three or four spaced sessions before dropping them from the training set to optimize the benefits of study time relative to retention. Our exploratory analyses provide a similar conclusion. However, with a larger sample size, we can better identify the optimal number of sessions in which words should be retrieved to support long-term retention.
Implications for educational and intervention practices
Current educational and intervention best practice includes training the same words across multiple sessions to foster the robust learning of word forms and meanings (Beck et al., 2013). The current results suggest an additional benefit of targeting the same words across multiple sessions. Specifically, continuing to retrieve a word across sessions, even after the word is technically learned, supports long-term post-training retention. Unfortunately, observations of preschool and kindergarten classrooms reveal that explicit, repeated, and systematic instruction with the same words is often lacking (Wright & Neuman, 2014). Instead, words are selected and targeted incidentally, such as a teacher providing a definition when a student asks about a word.
Even when words are targeted explicitly and systematically across multiple sessions, children with typical development vary in the rate at which they learn them based on both word-level (Hadley et al., 2021) and child-level factors (Gordon et al., 2021b). For example, in Gordon et al. (2022), we found that current verbal working memory skills and vocabulary knowledge related to the amount of information children encoded about words during each training session. One potential solution to address this variability in encoding is to use word learning apps that tailor the amount of practice with each word to the individual learner to optimize post-training retention (Hirsh-Pasek et al., 2015). Word learning apps should not replace live instruction, as interactive rich instruction is important for children’s vocabulary development (Beck et al., 2013). However, an app that coordinates with classroom lessons could provide additional practice for an individual child based on words she has more difficulty learning. There are promising findings that apps and interactive websites can support both learning and retention of words when they adhere to evidence-based principles of learning and memory (McGregor et al., 2019; Settles & Meeder, 2016). Even though popular vocabulary apps for adult learners incorporate criterion-based principles (Settles & Meeder, 2016), we have yet to incorporate these principles effectively in apps for young children.
Given the amount of time that children engage with apps (Dore et al., 2019), it is important that these apps leverage principles of learning and memory to optimize both learning and retention.
Future directions and conclusion
Regarding future research, in the current study we focused on children’s ability to learn forms. However, it would be helpful to conduct similar research with word meanings. Another avenue of future investigation is understanding how the spacing of training sessions affects long-term retention. Daily vocabulary practice may be impractical in educational or intervention settings and may not be necessary to derive the benefits of successfully retrieving a word at the beginning of multiple sessions. Through further research we can identify schedules of vocabulary instruction, or vocabulary instruction paired with apps, that optimize training efficiency and long-term retention. Overall, by better understanding the memory processes that are integral to word learning, we can better support children’s ability to both learn and retain taught words. In this way, vocabulary instruction can be optimized to support vocabulary development and to improve other important long-term outcomes such as children’s academic performance and social skills.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000923000752.
Acknowledgements
We thank Ron Pomper for providing feedback on an earlier version of this manuscript.
Competing interest
The authors declare none.