1. Introduction
Errors in young children’s speech have been studied as windows into the process of language acquisition (e.g., Kam & Newport, 2009; Lidz & Gagliardi, 2015; Saffran et al., 1996; Yang, 2004). For example, over-regularisation errors in young children have been interpreted as evidence that they know and use the rules of their target language but have over-applied a rule when faced with an exception (e.g., Brown, 1973). In other words, children’s production of I goed to the zoo suggests that children know that adding the morpheme -ed denotes past tense, but they have yet to learn that the verb to go is an exception to this otherwise regular pattern.
Not all errors have such straightforward explanations, particularly in the domain of syntax. Some examples from English include possessor extraction, as in (1), and subject-auxiliary inversion in embedded interrogatives, as in (2).
Such errors are not limited to production. Children have been reported to interpret subject-less sentences (e.g., Play with blocks) as null subject sentences (e.g., They play with blocks) rather than as imperatives (Orfitelli & Hyams, 2012). All these errors are of particular interest because, while none of these structures is used in English, each is used in another language. Possessor extraction constructions are attested in Chamorro and Hungarian (Chung, 1991; Szabolcsi, 1983); subject-auxiliary inversion in embedded interrogatives is viable in Spanish (Torrego, 1984); and null subject sentences are grammatical in most Romance languages. Below, we refer to syntactic errors in Child English that correspond to grammatical structures in other languages as Syntactic Creativity Errors, following Schulz (2011).
Explaining these deviations from adult production and comprehension has been the subject of much debate. They might be the result of immature competence if the child is temporarily using an alternative UG-licensed grammar. Under either a Principles and Parameters or an Optimality Theory (OT) model, the innate language faculty provides access to potential syntactic variations and learners eventually select the right grammar based on exposure. The innate access to the principles and parameters of UG allows for temporary adoption of the incorrect parameter setting or constraint ranking while choosing among competing grammars (e.g., Yang, 2002; Legendre et al., 2002; Snyder, 2007).
Alternatively, children might have a good grasp of the grammar but make such errors because of immature processing mechanisms, as documented in much recent psycholinguistic research (for reviews: Omaki & Lidz, 2015; Snedeker, 2013). Thus, it is possible that non-adult-like production and comprehension arise despite children’s competence rather than because of it.
This article investigates the relationship between children’s processing mechanisms and two classic errors in children’s production and comprehension of English: the production of “medial” question words in biclausal information questions (Thornton, 1990) and the misinterpretation of questions which include a question word at the clausal boundary (de Villiers & Roeper, 1995); see (3) and (4) below. We will begin by reviewing previous work on these errors (Section 2.1). We will also give a brief overview of processing models of production and comprehension (Section 2.2). We then describe an experiment investigating children’s production, comprehension, and working memory (WM) abilities (Section 3). Our findings are consistent with the claim that these errors are the result of immature processing mechanisms (Section 4). We suggest that errors in comprehension and production of complex wh-structures stem from different processing mechanisms, leading to different error patterns across participants. We argue that the medial question words children produce are either examples of syntactic blends due to immature planning and attention (à la Jager, 2005) or grammatical workarounds in the form of sequential, monoclausal questions which children resort to when over-taxed. We argue that children’s comprehension errors in response to questions with question words at the clausal boundary are the result of immature inhibitory control.
2. “Medial questions”: a specific case
One classic error made by English-speaking children involves an “extra” question word. In production, children sometimes include an additional question word at the beginning of embedded clauses in long-distance (LD) questions, as seen in (3).
Thornton (1990) reports individual children making this error consistently in production tasks. Liter et al. (2022), Lutken et al. (2020), and others report similar findings with similar elicited production tasks.
Children also show evidence of misusing the question word at the embedded clausal boundary in their comprehension. It has repeatedly been shown (e.g., de Villiers & Roeper, 1995; Thornton, 1995; de Villiers et al., 2019; de Villiers & Pyers, 2002; de Villiers et al., 2008; Lutken et al., 2020) that in questions-after-stories tasks, children will respond to questions such as (4) by “answering” the second question-word:
In (4), what is identical in form to a question word. Rather than responding to the matrix question word, how, children sometimes mistakenly take what to be a “true” question word and the one to answer. In other words, they have interpreted (4) to mean (5):
What causes these wh errors? The competence explanation, suggested by earlier literature (e.g., Thornton & Crain, 1994; de Villiers & Roeper, 1995), exploits the surface resemblance between this error pattern and Wh-Scope Marking (WSM) structures, which are grammatical in languages like German and Hindi, but not in English. WSM languages typically include a question word in the medial position as well as a dummy question word in the matrix clause which marks the scope of the true question word (McDaniel, 1989). This is illustrated in (6), from German, in which the question word is mit wem and the clause-initial Scope-Marker is was:
It is possible that both errors in (3) and (4) stem from the same source and this is an example of Syntactic Creativity: English-speaking children who produce this error might do so because they have temporarily misjudged English as a WSM language. The child thinks WSM is the target construction, and their behavior reflects that.
Under a performance explanation, children are using English grammar, but some aspect of their processing prevents them from doing so successfully. In production, Thornton (1990) suggests this error is the result of a “failure to delete” the copied question word and argues these errors are evidence for successive cyclic movement of question words. Liter et al. (2022) build on this proposal and suggest that immature processing mechanisms, specifically inhibitory control, are the cause of children’s failure to delete the medial question word. Lutken et al. (2020) instead suggest that the repetition of the question word at the clausal boundary, e.g., in (3), is a re-articulation of the question word as an effort to reactivate the question word which is fading from memory (similar to Kroch’s (1981) explanation of resumptive pronouns). While these arguments give a straightforward explanation of “copy-like” constructions (Who do you think who is the good fairy?) which use the same question-word twice, productions with distinct question words (What do you think who is the good fairy?) require further explanation. Liter et al. (2022) suggest that, because different question words might have different background levels of activation, a child might intend to produce who, but if what has a higher activation level, they might accidentally say what. Lutken et al. (2020) suggest instead that children are simply restarting their question. Thus, various performance explanations for the production errors have been suggested by previous literature, but further investigation is still warranted.
Moreover, the comprehension error exemplified in (4) has been virtually unexplored from the processing angle aside from the suggestion that perhaps children whose WM is not as developed will simply have more difficulty remembering which question word they should respond to (Lutken et al., 2020). However, this claim was not directly tested.
We propose to test which of these hypotheses explains the classic phenomenon best. The Immature Grammar Hypothesis holds that both production and comprehension errors have the same cause: Syntactic Creativity. Children temporarily adopt a non-target grammar, in this case WSM, which leads them to erroneously produce medial question words and respond to question words that are not truly question words. In contrast, the Limited Processing Hypothesis holds that each of these errors has its own separate cause rooted in children’s limited processing.
Linguistic data alone cannot adjudicate between these hypotheses. In their within-subjects experiment consisting of both production and comprehension tasks, Lutken et al. (2020) found no correlation between comprehension and production errors, suggesting they are not an example of Syntactic Creativity, but they also found relatively few comprehension errors. Theirs is essentially a null finding: the fact that they did not find evidence for the Immature Grammar Hypothesis (which they refer to as the “Parametric Acquisition Hypothesis”) does not mean it is irrelevant. To further test which hypothesis best explains the nature of these errors, it will be necessary to further explore the nature of the relevant processing mechanisms.
2.1. Models of processing: production and comprehension
The Limited Processing Hypothesis suggests that children’s grammar is essentially adult-like, but their immature processing mechanisms limit their ability to produce adult-like constructions or give adult-like responses. The model of comprehension we assume is that described by Phillips and Ehrenhofer (2015), which states that comprehension is incremental. As each new word is understood, it must be integrated into the current interpretation of the entire utterance. Every word that is processed is related to the words preceding it. It will sometimes satisfy the requirements of those words and sometimes introduce new requirements. These requirements (thematic roles, scope relations, agreement relations, etc.) must be met either by accessing an item in memory or by waiting for a new item to be introduced. This requires resource management as well as many different processes operating in parallel. The comprehender must be able to manage these resources without allowing interference by items that are superficially similar (for instance, reflexive attraction effects; Cunnings & Felser, 2013). This task requires non-negligible cognitive resources: even adults often mis-parse sentences and must reanalyse them, often making errors (e.g., Ferreira & Henderson, 1991; Sturt, 2007).
The processing model of production we assume follows three levels (Dell, 1986; Bock, 1987; Levelt, 1989): first, a semantic level where the message is formulated; second, a syntactic level where possible syntactic structures are generated in parallel (Garrett, 1980; Bock & Ferreira, 2014; Coppock, 2010); and third, a morphophonological level where lexical items are selected and phonemes are mapped onto the structure. These levels are motivated by the fact that production errors in adults generally surface as either semantic, syntactic, or morphophonological errors. Jager (2005) provides evidence that children’s errors follow similar patterns, as we will discuss in detail in Section 4.
One cognitive resource of interest (since both processes require it) is working memory (WM). Indeed, according to the Baddeley (2017) model of the executive control system, WM acts as a buffer between the processing systems and action. This suggests that WM can be used as a proxy measure for processing mechanisms (e.g., Johnson et al., 2013, for processing mechanisms generally). This gives us a direct means of testing the Limited Processing Hypothesis: if these errors are the result of processing limitations, then children with lower WM should make more errors.
We propose WM as a measure to adjudicate between the Immature Grammar Hypothesis and the Limited Processing Hypothesis. For the Immature Grammar Hypothesis, the WM measure should hold no significance. There is no reason to predict WM should affect whether children have mis-set a parameter. However, for the Limited Processing Hypothesis, we predict a correlation between error rates and performance on the WM task. Because we are using WM as an indicator for processing generally, we do not necessarily predict this relationship to be particularly strong, but it should be there: WM should correlate with both adult-like performance and the WSM error in both production and comprehension. Our experiment will thus report individual child error rates in the production of questions with extraction from embedded clauses, comprehension of questions which include medial question words, and a WM task.
3. Production, comprehension, and WM
The primary purpose of the current experiments is to establish whether children’s WM abilities are correlated with their performance on production and comprehension tasks involving biclausal questions. We also want to establish whether removing the “lead-in” used in previous production tasks (e.g., Thornton, 1990; Thornton & Crain, 1994; Liter et al., 2022; Lutken et al., 2020) will affect children’s production of medial question words, thus establishing whether these errors are simply an experimental artifact. A lead-in is a method to elicit productions from children by telling them “The question starts ‘What does he think…’”. We argue that this takes away from the natural setting of the task and might cause children additional processing difficulties.
To accomplish these goals, we will include three tasks in our experiment: a production task, a comprehension task, and a WM task. The production task will elicit questions with extraction from an embedded clause (of the form Who do you think we should ask?) while the comprehension task will require responses to questions with medial question words (of the form How did Lewis tell Sally what he caught?) which could be mistaken for WSM constructions. Thus, each task gives children an opportunity to make a “WSM error”: by producing a medial question word in the production task or by responding to a medial question word in the comprehension task. While we use the comprehension task described in Lutken et al. (2020) and the WM task is standardized, the production task is novel and adds to the methodology used to elicit complex questions by doing so without a lead-in.
3.1. Participants
Fifty children participated in this study. Following Lutken et al. (2020), data from children who could not perform filler items were excluded (n=18). This left 32 children whose ages ranged from 4;4 to 6;7 (average 5;5). This age range was selected based on the abilities of children to perform these tasks reported in Lutken et al. (2020), as well as several pilot tests (n=8). We extended the age range from the previous experiments to establish a maximum age at which children would make such errors.
Due to the COVID-19 pandemic, 12 participants took part in person, while the remaining 20 did so virtually. Necessary changes to the procedures will be described in the following section. The results section will describe how children’s performance was affected and show that the results nonetheless support our main hypothesis. We attribute the relatively high drop-out rate to the complications that arose from completing much of this study online. It should be noted that only 5 of the 17 children who participated in person had to drop out, compared to 13 of the 33 virtual participants.
3.2. Procedure
This experiment had a within-subjects design. The production and comprehension tasks were balanced in order: half of the participants did the production task first; half did the comprehension task first. All participants completed the WM task last because pilot trials revealed that when the WM task (a repetition task) preceded the language tasks, children tended to simply continue repeating what the experimenter said and had a more difficult time understanding the language tasks. All in-person sessions were filmed with an HDR-CX160 SONY Handycam. Sessions that were performed virtually were conducted via the Zoom interface and were recorded using that application.
Production task
The goal of the task was to elicit natural, biclausal questions from children. We created a child-friendly version of a “translation task” initially used by Schulz (2011) to elicit biclausal questions from L2 speakers of English. We used a continuous, interactive story (following Lutken et al., 2020) involving an alien, Morp, and a friendly raccoon from Earth, Cindy. The story was accompanied by PowerPoint illustrations and animation. Cindy had been traveling around space and she wanted to show Morp her home. Their spaceship crashed and the pieces spread all over the countryside. The child was asked if they would help Morp and Cindy find the pieces of their spaceship. Cindy explained that Morp did not understand English, but she could tell us what he was saying because she spoke his alien language. Thus, whenever the child wanted to find out about Morp’s ideas, they needed to ask Cindy to help. The experimenter always followed the pattern of asking the child’s opinion about the problem first (e.g., Who can fix the engine? Do you have an idea?), followed by prompting the child to ask Cindy the monoclausal question (Who can fix the engine?), and then finally eliciting the biclausal target (Who does Morp think can fix the engine?). This pattern ensured the child knew what information to elicit. An example exchange is in Appendix A. It should be noted that the experimenter never included the complementizer, that, in any practice or example.
We incorporated three practice questions as well as two fillers into the story. While practice trials were designed to help the child understand the task, filler questions were used not only to break up the types of questions being asked but to remind the child of how the task worked.
We designed the task to include 6 object-extraction targets and 6 subject-extraction targets, but we accepted any production that included more than one clause. All target constructions used the question word who because this allowed us to determine whether children were changing the question word. Target structures are listed in Appendix B. Participants heard the story which elicited the same questions in one of two orders (Forms A and B) which allowed us to determine whether the order of presentation affected participants’ productions. A full transcript of Form A can be found in Appendix C.
There was much deviation from the target structures, as expected with free-response tasks. Children were given three chances to produce a biclausal question before the experimenter prompted them directly using the entire target question. See Appendix D for support the experimenter provided before offering the entire sentence. It should be noted that many children needed little to no support.
The task was challenging, particularly for younger children (circa 4;0), but every participant completed at least half of the questions. The task proved enjoyable for the children and many asked if they could continue playing with Morp and Cindy or go back and see what would happen if they decided to do something different at any point.
Comprehension task
The comprehension task was the “questions-after-stories” task developed by Lutken et al. (2020). Children were presented with 8 short stories (6 target, 2 filler) accompanied by PowerPoint animation, each followed by a pre-recorded question posed by a puppet, Hillary Hippo. All stimuli were the same as those used in Lutken et al. (2020). An abbreviated example story and question appear in Appendix E. In this example, “Evil Steve” is a thief planning to steal the queen’s crown, who distracts “Detective Sherry” by telling her that he would steal the queen’s ring. He first tries to tell her using his TV machine, but it breaks, and he has to write her a letter. All target stories were followed by how questions with tell as the matrix verb and what appearing medially, e.g., How did Evil Steve tell Detective Sherry what he was gonna steal? All stories were balanced for event prominence of the matrix event (the telling) and the embedded clause event (in this case, the stealing). This was an open-ended task, and children could respond however they wanted, but four response types were most common: (1) correct manner (in a letter), (2) distractor manner (the TV machine), (3) correct embedded (the crown), and (4) WSM-like (the ring).
WM task
Following work such as Delage and Frauenfelder (2019) and Willis and Gathercole (2001), we used two tasks to assess children’s WM: a Forward Digit Span (FDS) and a Backward Digit Span (BDS), which have both been shown to correlate with performance on tasks with complex syntactic structures. While FDS has been used as an indicator of how well one can maintain an item in WM, BDS has been used as a measure of how well one can manipulate items held in memory (Delage & Frauenfelder, 2019). Wilde et al. (2004) suggest that a child’s composite score, the sum of their FDS and BDS, gives a more complete picture of the child’s processing capabilities. Thus, we decided to use FDS and BDS and to report participants’ composite scores.
We used the Differential Ability Scales-II (DAS-II) (Elliot, 2007), which includes standardized digit span tasks for FDS and BDS. The BDS, while typically considered appropriate for children over 5;0, was included because it requires holding information in memory as well as manipulating it.
If immature processing mechanisms lead children to make errors resembling WSM, then we expect to find a relationship between WM and their ability to produce our target structures accurately. In other words, a child with high WM should perform well on the production and/or comprehension tasks. A child who performs poorly on WM tasks should also perform poorly on our language tasks. However, if these errors were strictly the result of a mis-set parameter or nonadult-like constraint ranking, then errors in production and comprehension should be correlated regardless of WM.
3.3. Data coding and analysis
WM
Individuals’ BDS and FDS were combined to provide a composite score, which corresponds to an ability score in the DAS-II. We report this as their WM ability.
Production
Children’s questions were transcribed and coded by the first author. There were a total of 384 attempted trials. Of these, 26 were excluded because the experimenter had to prompt the child with the full target question. Any production that was pragmatically relevant was accepted whether it was target-like (LD extraction with no errors, that is, of the form Who does Morp think we should help?) or not. There were five instances where the child never produced anything except a monoclausal version of the structure. Thus, 31 utterances were not attempts at biclausal questions. The primary production types were target-like LD constructions, constructions with medial question words, and those with overt complementizers. However, because of the open-ended nature of the task, several other production types were also used and accepted, though they were not considered “target-like”. These included, for example, questions with infinitival clauses like Who does Morp want to help? or questions with a quotation like Can you ask Morp “Who can help?” We refer to these grammatical but non-target productions as workaround productions. See Appendix F for examples of various types of workarounds.
The 353 remaining utterances were coded for extraction type (subject, object, adjunct, Yes-No questions, unclear) as well as whether they did (1) or did not (0) include a medial question word. They were also coded for whether they were (1) or were not (0) target-like, meaning they were an LD structure that used appropriate lexical items and verbs, and whether they were (1) or were not (0) adult-like. The distinction between target-like and adult-like is simply that an adult-like structure could include any grammatical structure of English (e.g., workarounds like Who does Morp want to help?), whereas target-like is restricted to LD constructions.
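The trial accounting described above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative and are not taken from the authors' analysis scripts.

```python
# Trial counts as reported in the text; names are illustrative assumptions.
attempted_trials = 384
prompted_with_full_target = 26  # experimenter had to supply the whole question
monoclausal_only = 5            # child never went beyond a monoclausal question

# Utterances that were not attempts at biclausal questions.
excluded = prompted_with_full_target + monoclausal_only

# Utterances that remain for coding.
coded_utterances = attempted_trials - excluded

print(excluded)          # 31
print(coded_utterances)  # 353
```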
Data were submitted to a mixed effects logistic regression model (e.g., Kruschke, 2015) in the R statistical analysis environment (R Core Development Team, 2015). The models included fixed effects of Composite WM score as well as age, which was coded as a numerical value, centered, and scaled. The random effects structure of the model included crossed participant and item effects and was run with the maximal random effects structure that would converge (Barr et al., 2013), minimally including random intercepts for participant and item.
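As an illustration of what "coded as a numerical value, centered, and scaled" can look like in practice, the following is a minimal sketch, not the authors' R code. It assumes that ages in the years;months notation are converted to months and then standardized with the sample standard deviation, which is the default behaviour of R's `scale()`. The helper names and the three example ages are hypothetical.

```python
from statistics import mean, stdev


def age_to_months(age: str) -> int:
    """Convert 'years;months' notation (e.g. '4;4') to a number of months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical sample of three ages, for illustration only.
ages = ["4;4", "5;5", "6;7"]
months = [age_to_months(a) for a in ages]  # 52, 65 and 79 months
scaled_age = center_and_scale(months)
```

After this transformation the predictor has mean 0 and standard deviation 1, so a model coefficient for age is expressed per standard deviation of age rather than per month.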
Finally, to compare our results with those of an experiment that elicited productions of this type using a lead-in, we combined the data from this task with the data from the within-subjects experiment from Lutken et al. (2020) and ran another model to determine whether medial wh-production was predicted by the experimental paradigm. The model included Experiment (Lutken et al. (2020) vs. current) and Age (centered and scaled) as fixed effects and participant and item numbers as random effects. Furthermore, since it was possible that the mere presence of the lead-in might cause children to make more errors and there were some instances in our experiment where the lead-in was given if the child was really struggling, we also submitted this data to the same model with lead-in (present, absent) as a fixed effect instead of Experiment.
Comprehension
Children’s responses to the “how” questions fell into six primary types: correct manner (adult-like), false object (WSM-like response), distractor manner (alternative how response), true object (response to the embedded question), “trick” response (pointing out that the character lied), and “other” responses. Comprehension data were binomially coded as being each of these response types (1) or not (0). The data were submitted to several mixed-effects logistic regression models in the R environment, as with the production data. The model always included a fixed effect of composite WM as well as age (centered and scaled) when the model would converge with this effect. Again, the model was run with the maximal random effects structure that would converge including at least random intercepts for participant and item. Age and WM were used as random slopes for participant when the model would converge.
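The binomial coding of response types described above amounts to a one-versus-rest recoding: for each response type, every response is marked 1 if it belongs to that type and 0 otherwise, giving one binary outcome column per model. The sketch below illustrates this under stated assumptions: the type labels follow the text, while the function name and the example responses are hypothetical.

```python
# The six response types named in the text.
RESPONSE_TYPES = [
    "correct manner",
    "false object",       # the WSM-like response
    "distractor manner",
    "true object",
    "trick",
    "other",
]


def binary_code(responses, target_type):
    """Return 1 for each response of target_type and 0 for all others."""
    if target_type not in RESPONSE_TYPES:
        raise ValueError(f"unknown response type: {target_type}")
    return [1 if r == target_type else 0 for r in responses]


# Hypothetical responses from one child, for illustration only.
responses = ["correct manner", "false object", "correct manner", "true object"]

wsm_like = binary_code(responses, "false object")
print(wsm_like)  # [0, 1, 0, 0]
```

Because each response has exactly one type, the six binary columns sum to 1 for every response.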
Comprehension and production
Another goal of this experiment was to replicate the lack of correlation between production and comprehension found in Lutken et al. (2020). We therefore submitted the production data (coded binomially for whether there was (1) or was not (0) a medial question-word) to another model with fixed effects of WSM-like comprehension (coded binomially for whether responses were (1) or were not (0) a WSM-like response). The model included the maximal random effects structure that could converge, which included random intercepts for participant and production task item and comprehension task item as well as random slopes of WSM-like comprehension and age for participant.
3.4. Results
WM
Children who participated in this experiment showed a range in WM ability score from 109 to 311 and an average WM ability score of 228.5. While this range is wide, there was a clustering of scores around 250. Considering the range of ages (4;5 to 6;7) this is comparable to the standards reported in Elliot (2007).
Production task
As described above, there were more production types than in previous elicited production tasks reported in the literature, which can be explained as a result of the lack of lead-in. Table 1 presents the primary types of constructions children produced during this task. Target-like LD constructions occurred most frequently, but there were many workarounds as well. In this context, target-like refers only to grammatical LD constructions because these were the questions we were trying to elicit. Other structures such as infinitival responses can be grammatical (or adult-like) but weren’t always. We have indicated the percentage of total productions as well as the percentage that were grammatical and ungrammatical where relevant. An * indicates ungrammaticality; a ? indicates that we recognize the grammaticality as debatable.
Note that these percentages will not sum to 100, as construction types are not mutually exclusive and there was some overlap. For example, *What does Morp think about to fix the engine? would be an infinitival response as well as a “think about” structure.
While the average production of the medial question words (19.5%) is similar to Experiments 1 (22%) and 3 (15.4%) from Lutken et al. (2020), there are a few differences we wish to highlight. While in both experiments in Lutken et al. (2020) as well as in Liter et al. (2022), the majority of productions with medial question words were copy-like constructions (82% in Experiment 1 of Lutken et al. (2020), 65.3% in Liter et al. (2022)), in the current experiment, only 37% of medial question word productions used the same question word twice. Furthermore, there were also fewer instances of productions with overt complementizers (that) (3.3%) compared with 33% from Experiment 1 in Lutken et al. (2020).
There was also quite a bit of individual variation. While no child produced exclusively adult-like productions (be they target LD questions or other acceptable sentences of English), 16 children produced adult-like sentences over half the time and five children produced questions with medial question words over half the time. One child produced medial wh-questions 75% of the time and another 83% of the time. On the other hand, 10 children never produced a medial wh-question at all. Thus, children were at both extremes of the spectrum (some were basically adult-like and others produced many errors).
The 69 questions with medial question words were made up of the 8 combinations shown in Table 2.
As is apparent, most productions of this type had either what or who as the initial question word and as the medial question word. Twenty-five productions were copy-like constructions using the same question word twice. It should be noted that all instances of which were examples of d-linked question words and essentially were asking who (which one, which guy, which person, which animal), what (which shirt, which one) or where (which place) and were contextually appropriate to the question the child posed. Overall, distinct question words were observed in the majority (62%) of question word pairings.
As a final descriptive note, we found that extraction type was a significant predictor of the use of medial question words (p<.01). Of the 69 productions with medial question words, 48 were in identifiable subject extractions (70%), while only 18 (26%) were in questions with object extraction. Of the three remaining, one was an adjunct extraction (What does Morp think where is the wing) and two were unclear (Where Morp thinks who has a hammer). This asymmetry is reported in Thornton (Reference Thornton1990), Thornton and Crain (Reference Thornton, Crain, Hoekstra and Schwartz1994), and Liter et al. (Reference Liter, Grolla and Lidz2022). This led us to investigate further the difference between performance on subject and object extraction, which is shown in Table 3. We will further discuss the differences between extraction types in Section 4.
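The percentages above follow directly from the raw counts. As a quick arithmetic check, the sketch below recomputes them in Python. Only the counts (48, 18, 1, and 2 out of 69) come from the text; the category labels and the helper name `percentages` are illustrative assumptions, not part of the original analysis.

```python
# Counts of medial-wh productions by extraction type, as reported in the text.
# The dictionary keys and the helper name are illustrative labels only.
counts = {"subject": 48, "object": 18, "adjunct": 1, "unclear": 2}


def percentages(category_counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(category_counts.values())
    return {label: round(100 * n / total) for label, n in category_counts.items()}


total = sum(counts.values())
shares = percentages(counts)

print(total)
print(shares)
```

Run as written, the total comes to 69 and the subject and object shares round to 70% and 26%, matching the figures reported above.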
Does the lead-in affect productions?
One of the goals of this experiment was to establish whether the lead-in affected children’s productions. We, thus, compared the findings of this experiment to the production findings from Lutken et al. (Reference Lutken, Legendre and Omaki2020). Neither experiment (current vs. Lutken et al. (Reference Lutken, Legendre and Omaki2020)) nor “lead-in” (presence vs. absence) was a significant predictor of medial-wh productions (p>.1 for both). This result suggests that productions with medial question words are not the result of experimental artifact, and that neither task elicited more medial question words. However, the presence of a lead-in did affect the types of medial questions produced. As we will discuss below, our task without a lead-in led to fewer copy-like constructions (38% compared to 82% of total medial questions).
As a final note, we point out that we consider children’s variation in production types as a positive sign: participants knew what they were supposed to ask about and used the construction they preferred. While we did not force children to produce a particular sentence (a highly unnatural event), we still elicited mostly biclausal questions.
Comprehension
The results of the comprehension task largely replicated the findings from the comprehension tasks in Lutken et al. (Reference Lutken, Legendre and Omaki2020). No additional answer types (beyond those found in Lutken et al., Reference Lutken, Legendre and Omaki2020) were found, and children generally performed well on this open-ended task. The percentage of the response types is displayed in Table 4.
Notable patterns apparent in Table 4 include the fact that, replicating Lutken et al. (Reference Lutken, Legendre and Omaki2020), the adult-like Correct Manner response accounts for the largest group of response types. However, one difference is that while Lutken et al. (Reference Lutken, Legendre and Omaki2020) found no significant difference between Incorrect Object (WSM-like) responses and the Distractor Manner (distractor) responses or the Correct Object (second question) responses, for the current data, a paired samples t-test reveals that there are significantly more Incorrect Object (WSM-like) responses than the other two response types (p<.01, for both comparisons).
Independently, the three tasks WM, production, and comprehension showed standard results: the WM scores are within range for the standard scores reported by Elliot (Reference Elliot2007); the production and comprehension tasks showed similar results to previous work, regarding both medial-wh production and WSM-like interpretations. In comprehension, in contrast to Lutken et al. (Reference Lutken, Legendre and Omaki2020), we did find significantly more WSM-like interpretations than other responses. However, this is not because there were more WSM-like interpretations overall, but rather that there were fewer of the other incorrect response types.
Correlations: WM and language tasks
We found no correlation between performance on the comprehension task and performance on the production task. This is made apparent in Figure 1. Like Lutken et al. (Reference Lutken, Legendre and Omaki2020), we found that there is neither an overall pattern suggesting children make these errors in both their production and comprehension, nor are there individuals for whom this seems to be the case. Both errors are present, but not made by the same children.
The results of the mixed effects logistic regression model investigating whether performance on the production task was predicted by performance on the comprehension task further confirm this (p>.1, r=−.03, r2=.0009).
We ran two models using WM as a predictor of target-like and adult-like productions. Both models found WM to be a significant predictor (p<.05, in both cases). The results of the logistic regression model investigating this relationship are shown in Table 5.
* denotes significance.
There are positive correlations between both target-like productions and adult-like productions and composite WM scores. This suggests, unsurprisingly, that children with higher WM are more likely to produce target-like and adult-like questions. However, our primary interest lies in whether children with lower WM make more productions with medial question words. Figure 2 plots children’s average medial-wh productions against their composite WM score.
Figure 2 illustrates a negative correlation between medial wh-production and composite WM ability score, indicating that the higher a child’s WM, the fewer productions with medial-wh words they made. Table 6 shows the results of the logistic regression model investigating this relationship. This model asked whether WM and age were predictors of medial-wh production, and it found that while age was not a significant predictor (p=.33), WM was (p=.01). While the finding that age is not a significant predictor might, at first, be surprising, it is apparent in Figure 2 that the youngest participants (light blue dots) made relatively few productions of this type (all under 25%) and one of the oldest children (6;6) used medial question words in two-thirds of their productions. While most of these errors are made by children under 5;0, medial wh-productions are not restricted to the youngest children.
* denotes significance.
In summary, we have seen that there is a positive correlation between composite WM scores and adult-like and target-like productions. We have also seen a negative correlation between composite WM scores and productions which include medial question words. In other words, higher WM means more adult-like and target-like productions. Lower WM means more productions with medial question words. WM significantly affects production.
We now turn to examining the relationship between WM and comprehension. As with the production task, we will begin with adult-like responses. We found a positive correlation between composite WM and adult-like responses to the comprehension task: the higher a child’s WM, the more likely they are to give an adult-like response (Pearson’s r=.3, r2=.09, p=.01).
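For readers who want to see how the reported statistics relate to one another, the following is a minimal sketch of Pearson's r and its square. The function `pearson_r` and the toy values are our own illustrative assumptions (the toy values are invented and are not data from the study); only the reported r=.3 and r2=.09 come from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values, NOT data from the study.
toy_wm = [180, 200, 220, 240, 260]               # hypothetical composite WM scores
toy_adult_like = [0.2, 0.5, 0.4, 0.7, 0.6]       # hypothetical proportion adult-like

r = pearson_r(toy_wm, toy_adult_like)
r_squared = r ** 2  # proportion of variance shared by the two measures

# The reported values are related in the same way: r2 is simply r squared.
reported_r = 0.3
reported_r_squared = round(reported_r ** 2, 2)
```

Squaring the reported r of .3 gives .09, so WM accounts for roughly 9% of the variance in adult-like responses.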
However, the critical question is whether WM also predicts WSM-like interpretations. Our mixed-effects logistic regression model found that WM predicted WSM-like interpretations at a rate approaching the standard measure of significance (p=.08). Considering Delage and Frauenfelder (Reference Delage and Frauenfelder2019)’s claim that FDS might be more relevant to comprehension, while BDS might be more relevant to production because it requires additional manipulations (see Section 2.3), we investigated the FDS and BDS scores individually. We submitted the comprehension data to two new models using BDS and FDS as fixed effects (predictors). While BDS was not a significant predictor of either adult-like responses (p=0.9) or WSM-like responses (p=0.2), FDS was a significant predictor for both (p=.002 for adult-like responses and p=.01 for WSM-like responses). This is in notable contrast to Willis and Gathercole (Reference Willis and Gathercole2001) who did not find a relationship between WM and comprehension. We therefore present Figure 3 as an illustration of FDS and WSM-like comprehension as well as Table 7 which shows the output of the logistic regression model investigating this relationship.
* denotes significance.
Both Figure 3 and Table 7 illustrate the fact that FDS is a significant predictor of WSM-like comprehension responses: the higher a child’s FDS, the fewer WSM-like responses the child will give.
We have seen that the composite WM score predicts adult-like and WSM-like responses, but upon further investigation, we saw that the relationship was driven by the FDS score. FDS is a predictor of both adult-like and WSM-like responses. As with the production task, WM is a stronger predictor than age is. For instance, in Figure 3, we see multiple children under 5;0 who give no WSM-like responses, but the child who gave the most WSM-like responses (83%) was 5;11.
In summary, the higher a child’s WM (specifically, their FDS), the more likely they are to give adult-like responses to complex questions and the less likely they are to give WSM-like responses.
Because of the COVID-19 crisis, much of the study was conducted virtually. Children at the lower end of our age range had a greater amount of difficulty completing the task virtually. Thus, the average age of those who completed the task virtually is higher (5;8) than those who completed the task in person (5;1). There was also a difference in WM for those who completed the task online (composite WM ability average of 241) and those who completed it in person (composite WM ability average of 207). Mixed effects logistic regression models indicate that those who participated in person were more likely to use medial question words (p=.06) and less likely to use Target-like structures (p=0.01). In-person participants were also more likely to give WSM-like responses (p=0.01) and less likely to give adult-like responses (p=0.01). In other words, doing the task virtually did affect children, but not in such a way that invalidates any results. The difference boils down to an effect of age and WM: younger children had more difficulty with the online task and were more often excluded.
4. Discussion
This work investigated two common errors of child English: the production of a medial question word in questions with extraction from an embedded clause and responding to question words appearing at the embedded clause boundary. We have posited two possible explanations for these errors. First, the Immature Grammar Hypothesis suggests these two errors have the same root cause, namely the temporary adoption of a non-adult grammar, WSM, which superficially resembles the wh…wh… pattern of these structures. Second, the Limited Processing Hypothesis suggests that the errors stem from immature processing mechanisms.
The logical prediction of the Immature Grammar Hypothesis (whereby a given parameter value, e.g. +/-WSM, holds for the grammar underlying production and comprehension of a given language) is that a child who has adopted a +WSM grammar should show this behavior in both their production and comprehension. We tested children’s production and comprehension of the relevant structures. While we found individuals who made both errors, we did not find a correlation between the two error types: a child who made the error in production did not necessarily make the error in comprehension, and vice versa. This is a replication of Lutken et al. (Reference Lutken, Legendre and Omaki2020): a null finding that does not disprove the Immature Grammar Hypothesis. It is simply not evidence for it.
As a reviewer points out, it is important to note that the structures in the production task are not the same as in the comprehension task even on a surface level: the production task elicits LD questions, but in the comprehension task, all questions are how questions with what appearing medially. Languages that allow WSM generally vary in whether they use what or how as a scope marker, but they do not generally use both (Lutz et al., Reference Lutz, Müller and von Stechow2000). It is conceivable that a child might have a WSM grammar that uses what as the scope marker and hence they produce it, but when they hear How did Steve tell Sherry what he was gonna steal?, they do not treat it as a WSM structure because the initial question word is not what. Furthermore, some languages that allow WSM simultaneously allow LD (e.g., dialects of German (McDaniel, Reference McDaniel1989)), so children might be producing WSM, but responding to a question they know is not WSM in the comprehension task. This is an excellent point that we acknowledge and mostly leave to future research. We can say, however, that no child produced how as an apparent scope marker in the production task. Since we did not give a lead-in, we think it most likely that if a child did have WSM in their grammar but took the scope marker to be how, we would have seen them produce it.
The Immature Grammar Hypothesis does not explain the relationships between WM and performance. In particular, the fact that lower WM predicts these errors is difficult to explain under such a hypothesis, particularly given the fact that age was not a predictor of WSM errors (in either production or comprehension). One might otherwise conclude that younger children simply have not yet established the target grammar.
Under the Limited Processing Hypothesis, the lack of correlation between the two errors (production vs. comprehension) is unsurprising. If both errors were caused by the same underlying factor, then we would expect such a correlation, but there is no reason to assume the same immature processing mechanism should necessarily lead to both errors. As discussed in our introduction, there are many mechanisms involved in both comprehension and production. The model of comprehension processing described by Phillips and Erehnhofer (Reference Phillips and Erehnhofer2015) requires strong WM skills as the listener must hold each new word in memory and integrate words into sentences in real time, but it also requires resource management and inhibitory control. The three-level processing model of production (e.g., Garrett, Reference Garrett and Butterworth1980; Bock & Ferreira, Reference Bock, Ferreira, Goldrick, Ferreira and Miozzo2014; Coppock, Reference Coppock2010) certainly requires WM, planning, and inhibitory control as speakers form ideas, select and remember which syntactic structure they will use to communicate that meaning, and map words to it. We used WM as an indicator for general processing ability, following the Baddeley (Reference Baddeley2017) model of the executive control system which posits WM as a buffer between the processing systems and action. Thus, we expect a correlation between each language task with WM, but we do not necessarily expect the language tasks to correlate with each other. This is because WM is not necessarily the specific driving force behind either error type, but instead gives a general estimate of performance. A participant might have good WM, but only moderate inhibitory control abilities. In this case, if WM is very important for comprehension, but less important for production, that participant might do very well on the comprehension task and not as well on the production task. 
Thus, the two need not be correlated even if WM is correlated with both production and comprehension performance.
It is a prediction of the Limited Processing Hypothesis that WM should be correlated both with adult-like performance and with the relevant errors in each language task. This is what we found. The correlations are not particularly strong (r2 values are generally around .09), but this is also not unexpected because we have not postulated that WM is the sole processing mechanism responsible for these errors. Given this finding, we conclude that children’s processing abilities predict not only how likely they are to perform like adults on these tasks but also how likely they are to make WSM errors, specifically. This suggests that the root cause of these errors lies in processing. Below we further examine the possible processing explanations for these errors in production and comprehension.
4.1. Processing in production
All previous experiments reported in the literature have included a lead-in in order to elicit biclausal questions from children. These experiments have yielded similar types and proportions of non-target productions, including medial-wh constructions with the same question word twice as well as with distinct question words, and overt complementizers. In contrast, our production task does not include a lead-in and while we did find all the non-target productions found previously, we also found a variety of other productions: both ungrammatical and grammatical, non-target workarounds. These included productions with an infinitival clause (both grammatical and ungrammatical), using the wrong question word, and think-about constructions (both grammatical and ungrammatical) as well as others (for a complete list as well as examples, see Appendix F or Table 1 for an abbreviated version). Thus, this task, which we contend is more ecologically valid than a task with a lead-in, shows us that the types of questions children produce are in fact more varied than previously thought.
An ideal explanation for the phenomenon, therefore, would explain the full extent of children’s productions rather than a subset. Both Lutken et al. (Reference Lutken, Legendre and Omaki2020) and Liter et al. (Reference Liter, Grolla and Lidz2022) offer explanations for medial-wh productions as the result of immature processing mechanisms. However, both groups used an elicitation task with a lead-in, and in both cases, most questions with medial question words were copy-like constructions: 82% in Lutken et al. (Reference Lutken, Legendre and Omaki2020) and 65.32% in Liter et al. (Reference Liter, Grolla and Lidz2022). As a result, both groups focus on explaining this type of production and give alternative explanations for questions where the two question words do not match and for questions with overt complementizers. Lutken et al. (Reference Lutken, Legendre and Omaki2020) suggest the second question word is an effort to reactivate a question word that is fading from memory while Liter et al. (Reference Liter, Grolla and Lidz2022) suggest these errors are the result of children accurately moving the question word from its base position in the embedded clause, but failing to inhibit pronouncing the question word that appears medially. In the case of questions with two distinct question words, Lutken et al. (Reference Lutken, Legendre and Omaki2020) suggest these are examples of “restarts” where the child simply begins their question again, using a new question word. In contrast, Liter et al. (Reference Liter, Grolla and Lidz2022) explain these productions as the activation of the wrong question-word: “slip of the tongue” errors similar to pronouncing pass the salt when the speaker intends pass the pepper. Finally, with respect to productions with overt complementizers (Who does he think *that can break the spell?), Lutken et al. (Reference Lutken, Legendre and Omaki2020) suggest these productions are the result of immature planning, while Liter et al. (Reference Liter, Grolla and Lidz2022) again suggest a spreading activation explanation: activation of a null complementizer would also spread to overt complementizers, leading some children to produce them. Though we agree that limited processing mechanisms are likely the right explanation for these errors, and it is possible that each production type has an independent cause, we offer an account which extends to additional types of child utterances, as revealed by our experiment.
Overall, we propose that all non-target productions (anything that isn’t a grammatical LD extraction) in our dataset are one of two things: (1) a grammatical, but less complex production we call a workaround or (2) a syntactic blend (Jager, Reference Jager2005) of multiple utterance types, which we discuss in detail below.
In the first place, let us discuss workarounds, which are a grammatical repair strategy when processing is taxed (especially for children) because of the complexity of the structure to be produced. This is independently observed by Jakubowicz (Reference Jakubowicz2011) with evidence that French-speaking children avoid constructions involving complex wh-movement in favor of producing “two separate root questions” (SeqQs) and Jakubowicz and Strik (Reference Jakubowicz and Strik2008) who conclude that simpler constructions emerge before more complex ones even when the complex constructions appear more frequently in the target grammar. Jakubowicz (Reference Jakubowicz2011) also found evidence of various other types of productions, including ungrammatical questions with infinitival clauses and grammatical questions including a quote which they call indirect questions, as well as many others. Indeed, in our own findings, 25.6% of total productions represented workarounds of various types. These include some unmistakable workarounds such as Q1 & Q2 productions like (7), questions that include a quote like (8), and monoclausal questions followed by a choice like (9).
Other examples of workarounds include questions with infinitival clauses as in (10) (19% of productions) and “think about” constructions as in (11) (2.5% of productions).
Though not as simple as (7–9), these structures with reduced embedded clauses are arguably less taxing than LD questions either from a processing (Frazier & Clifton, Reference Frazier and Clifton1989) or from a planning (McDaniel et al., Reference McDaniel, Cowart, McKee and Garrett2015) standpoint (see Footnote 3). All told, these grammatical workarounds account for a large proportion of our data: as much as 42% of our non-target data (27% of all data including target-like productions).
Now, having established the presence of workarounds in our data as well as in previous literature, let us return to productions that include a medial question word (without introducing a matrix verb change as in (11)). Following Jakubowicz (Reference Jakubowicz2011), it is possible that at least some of these are not in fact LD constructions which include a medial question-word, but instead are examples of what we call “Sequential Questions” (SeqQs), or two monoclausal questions in a row as in (12).
We suggest that SeqQs are another form of performance-driven workaround licensed by UG. SeqQs are completely acceptable utterances in English and yet previous literature does not seriously consider the possibility that some of these productions with medial question words could be examples of SeqQs. Why would children refrain completely from using this type of simpler utterance? Logically, at least some of these productions should be SeqQs.
However, not every production including a second, medial question word can be interpreted as an instance of SeqQs. In the first place, only 43% of questions that included a medial question word included the subject-auxiliary inversion necessary in SeqQs, which children in this age range are generally good at in monoclausal questions (Pozzan & Valian, Reference Pozzan and Valian2017). In the second place, because subject questions do not make subject-auxiliary inversion an indicator of SeqQs in the way object questions do, it is not as obvious whether these are SeqQs or something else (see Table 8 for a breakdown).
Furthermore, SeqQs would not explain the structures that use the same question word twice. Though we found fewer questions with the same question word twice than in previous work, these questions were still present (38% of questions with multiple question words).
Thus, we turn to a second type of mechanism to explain not only the remaining productions with medial question words, but many other ungrammatical productions in our data. We propose these are examples of what Jager (Reference Jager2005) calls syntactic blends, which she identified in spontaneous or elicited child productions like (13a), which she argues is a combination of (13b and 13c).
Given a production model where multiple syntactic structures are generated in parallel (e.g., Garrett, Reference Garrett and Butterworth1980; Coppock, Reference Coppock2010; Bock & Ferreira, Reference Bock, Ferreira, Goldrick, Ferreira and Miozzo2014) introduced in Section 2.2, syntactic blends involve a mix of two possible structure types which are produced in parallel during the ‘syntactic level’ of sentence production. Crucially, the two structures should both be viable ways to express the semantic meaning the speaker wants to convey. The concept of syntactic blends has been established in the study of adult errors (e.g., Bock & Ferreira, Reference Bock, Ferreira, Goldrick, Ferreira and Miozzo2014; Coppock, Reference Coppock2010) and is one of the primary pieces of evidence for multiple structures being produced in parallel. While blends can be as simple as a word swap (essentially, what Liter et al. (Reference Liter, Grolla and Lidz2022) suggest results in What…who… productions), they can also be splices, which consist of an initial substring from one target construction with a final substring from another target construction (Fay, Reference Fay and Cutler1982). In Jager (Reference Jager2005)’s account of blends in children’s speech she characterizes splices as cross over structures, which indicate the use of two constructions blended together. Crucially, these cannot always be explained as a simple word swap or omission.
Jager (Reference Jager2005) suggests that blends are the result of under-developed attention and planning abilities. While her data does not include any examples of blends with questions, there is no principled reason why blends could not occur in questions. We thus propose that those productions in our data which cannot be SeqQs are further examples of splices, specifically. For example, if in the process of production, a child’s processing mechanism produces both SeqQs (What does he think? Who should we ask?) and an LD structure (Who does he think we should ask?), but the child is unable to devote the necessary resources to choosing between the two (perhaps planning, attention, or inhibitory control), they might produce What does he think who we should ask?: the first substring coming from an SeqQ construction, the second coming from an LD construction. Similarly, constructions with repeated question words like Who does he think who can fix the fence? would be the opposite: the first half coming from an LD construction (Who does he think can fix the fence?), the second coming from an SeqQ construction (What does he think? Who can fix the fence?). It should be pointed out that, in 22 of 25 who…who… constructions, the second clause formed a perfect monoclausal question (with subject-auxiliary inversion and no errors), lending further support to this proposal. If this is the case in our task without a lead-in, the addition of a lead-in might further over-tax an under-developed planning mechanism, resulting in more frequent use of these blended structures (many using the same question word twice). One of Liter et al. (Reference Liter, Grolla and Lidz2022)’s primary findings was that inhibitory control abilities predicted productions with medial question words. While their explanation is different (though we would argue it is in the same vein), their finding is consistent with this explanation as well.
One further benefit of this analysis is that it explains productions that appeared in our no lead-in experiment, though not in previous work which included a lead-in. These are productions that are not examples of medial wh-productions, which otherwise would have no explanation from previous literature. Consider the following examples from our data:
(14) is an effort at an infinitival workaround while (15) is an example of what we have called lexical errors because the child used what instead of who. These two types of errors occurred in 23% of the productions we elicited, thus a sizable portion of the data. Both can be viewed as syntactic blends. (14) could be a blend of the target LD construction with an alternative infinitival construction (e.g., Who does Morp want to fix the engine?), while (15) could be a blend of the target LD and SeqQs where instead of a splice, the child has only done a word swap (of the first question word). The fact that these errors unrelated to the use of medial question-words could also be syntactic blends leads us to consider this explanation over others because it explains more of the data.
We have argued that essentially all non-target productions can be explained as either grammatical workarounds or examples of syntactic blends. It remains for us to provide further evidence that this is the case. It is difficult to conclusively identify which of the two strategies a child is resorting to in any given instance of a question with a medial question word. We cannot use subject-auxiliary inversion to distinguish syntactic blends involving embedding from SeqQs. English-speaking children often invert subjects and auxiliaries in the embedded clause, resulting in errors (27% of the time in Pozzan & Valian, Reference Pozzan and Valian2017). Prosodic cues such as an intonational break immediately preceding (or not) the medial question word cannot, in the absence of a separate targeted experimental study, reliably distinguish SeqQs from syntactic blends in these productions. Lutken et al. (Reference Lutken, Legendre and Omaki2020)’s preliminary prosodic analyses as well as our own discussions with multiple experts (see Footnote 4) suggest that children’s productions in the experimental task are too disjointed to provide reliable data. In particular, their productions include long pauses throughout (not just at clausal boundaries, for instance).
We conducted an analysis of disfluencies at the clausal boundary following Liter et al. (Reference Liter, Grolla and Lidz2022) and Lutken et al. (Reference Lutken, Legendre and Omaki2020) which found relationships between disfluencies and adult-like productions, such that participants who had fewer disfluencies in their productions were more likely to produce grammatical questions. We chose disfluencies based on McDaniel et al. (Reference McDaniel, McKee and Garrett2010). They included (1) an audible “umm”, (2) a restart (defined as returning to the beginning of either the entire sentence or the embedded clause), and (3) a stutter (defined as at least one repetition of the question word at the clausal boundary). Each utterance either did (1) or did not (0) include a disfluency. We submitted this data to a mixed-effects logistic regression model with ‘disfluency’ and Age (centered and scaled) as fixed effects and participant and item number as random effects, but we found no significant relationship between disfluencies and adult-like productions (p>.1). In short, we cannot use prosody or disfluencies in the existing data to determine whether children are producing SeqQs or a syntactic blend.
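The binary disfluency coding described above can be sketched as follows. This is only an illustration of the coding rule (an utterance is coded 1 if it contains any of the three disfluency types, 0 otherwise); the `BoundaryAnnotation` class, its field names, and the example annotations are hypothetical, and the sketch does not reproduce the mixed-effects model itself.

```python
from dataclasses import dataclass


@dataclass
class BoundaryAnnotation:
    """Hypothetical hand annotation of one utterance at the clausal boundary."""
    filled_pause: bool  # an audible "umm"
    restart: bool       # return to the start of the sentence or of the embedded clause
    stutter: bool       # at least one repetition of the question word at the boundary


def code_disfluency(annotation: BoundaryAnnotation) -> int:
    """Return 1 if any of the three disfluency types is present, otherwise 0."""
    return int(annotation.filled_pause or annotation.restart or annotation.stutter)


# Invented example annotations, not data from the study.
examples = [
    BoundaryAnnotation(filled_pause=False, restart=False, stutter=False),
    BoundaryAnnotation(filled_pause=True, restart=False, stutter=False),
    BoundaryAnnotation(filled_pause=False, restart=True, stutter=True),
]
codes = [code_disfluency(a) for a in examples]
print(codes)
```

Collapsing the three types into a single 0/1 value keeps the predictor simple, at the cost of not distinguishing which kind of disfluency occurred.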
Further examination of our production data reveals two groups of children: those who produce many utterances with medial question words and those who produce just a few. The production rate of the hypothesized SeqQs and blends is informative in light of the logic of Crain and Thornton (1998) that, while an error of competence should be consistent within a speaker, performance errors tend to occur more sporadically (Footnote 5). In our case, a child who is producing grammatical SeqQs “on purpose” should do so consistently and hence produce many utterances with medial question words, while a child who is producing a blend should do so only “by accident”. This is in line with the claim that SeqQs are performance-driven choices sanctioned by the children’s grammar (3–5-year-olds have generally mastered monoclausal wh-questions; Pozzan & Valian, 2017), while blends are production or performance errors. Productions that do not resemble SeqQs are, in turn, predicted to come from children who produce few medial question words and many other types of constructions. We can compute how many of the children’s utterances consist of two well-formed monoclausal questions and could therefore be SeqQs. This gives us an upper bound on the percentage of utterances that can be analysed as SeqQs. If the remaining utterances cannot be identified as SeqQs or some other workaround, we can, at least provisionally, characterize them as syntactic blends.
To examine this, we needed a measure of “medial-wh consistency” as well as an indicator of whether the constructions could be SeqQs. To that end, each construction containing a medial question word was given a binary code indicating whether it could (1) or could not (0) be SeqQs. Coding was performed by the first author and checked by a trained research assistant who was a native speaker of English. Any disagreement was discussed and resolved so that the two coders reached complete agreement. To count as potential SeqQs, an utterance had to have the word order of two monoclausal questions with no error. For instance, consider (16), which was produced by participant 41, age 6;6:
(16) was marked as a structure that could be an instance of SeqQs because What does Morp think? Where is the wing? would be an instance of SeqQs with no error. In contrast, consider (17) which also contains a medial question word but is not potentially SeqQs because of its initial who paired with think: Who does Morp think? is not a grammatical question.
For each utterance, we totaled the number of medial-wh productions made by the child who produced that utterance. This total became the medial-wh number for that utterance. Because participant 41 also produced 7 other structures that included a medial question word, the medial-wh number associated with (16) is 8. (17) was produced by participant 13, age 4;6, who did not produce any other structures with medial question words. Thus, the medial-wh number for this structure is 1. In this way, we were able to measure whether a single instance of a medial-wh production was anomalous or the norm for that participant.
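The computation of the medial-wh number can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the script used in the study: the record format, the utterance identifiers, and the function name are hypothetical, and the toy records simply mirror the two participants discussed above.

```python
from collections import Counter


def medial_wh_numbers(records):
    """Map each utterance to the medial-wh number of the child who produced it.

    `records` is a list of (participant_id, utterance_id) pairs, one pair per
    production containing a medial question word.
    """
    # Count medial-wh productions per participant.
    per_participant = Counter(participant for participant, _ in records)
    # Every utterance inherits the total of its producer.
    return {utterance: per_participant[participant]
            for participant, utterance in records}


# Toy records (invented identifiers): participant 41 contributes eight
# medial-wh productions, participant 13 contributes one.
toy_records = [(41, f"p41_u{i}") for i in range(1, 9)] + [(13, "p13_u1")]
numbers = medial_wh_numbers(toy_records)

print(numbers["p41_u1"], numbers["p13_u1"])
```

Because the number is assigned per utterance rather than per child, every production by a frequent medial-wh producer carries the same high value, which is what lets the measure separate consistent from sporadic use.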
We submitted these data to a mixed-effects linear regression model with medial-wh number and Age (centered and scaled) as fixed effects and participant and item number as random effects. We ran the maximally complex model that would converge, which included medial-wh number and Age as random slopes for participant as well as item number. The results suggested that medial-wh number is a significant predictor of whether a structure could be two sequential questions (p=.02). In other words, if a child produced many questions with a medial question word, they were significantly more likely to produce constructions that could be SeqQs. If a child only occasionally produced a medial question word, they were significantly more likely to produce a construction that could not be SeqQs. Both predictions are therefore borne out: that a child who is producing SeqQs as a workaround should produce them consistently, and that a child who is producing blends should do so only sporadically.
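For readers unfamiliar with the phrase “centered and scaled”, the following minimal sketch shows one common way to prepare an age predictor. It assumes ages are recorded in the years;months notation used above, converts them to months, and standardizes with the sample standard deviation. The helper names and the toy ages are hypothetical, and the choice of the sample standard deviation is an assumption rather than a detail reported in the text.

```python
from statistics import mean, stdev


def age_to_months(age):
    """Convert an age in years;months notation (e.g. "4;6") to months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD (n - 1 in the denominator)
    return [(v - m) / s for v in values]


# Toy ages (invented, apart from 4;6 and 6;6, which appear in the text).
toy_ages = ["4;0", "4;6", "5;3", "6;6"]
months = [age_to_months(a) for a in toy_ages]
scaled_age = center_and_scale(months)
```

After this transformation the predictor has mean zero and unit standard deviation, so its regression coefficient is expressed per standard deviation of age rather than per month.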
In short, these additional analyses provide suggestive evidence that children who produce medial question words might do so for more than one reason. The first case is that of a child who prefers SeqQs as their workaround and produces them consistently. The second case is that of a child who occasionally produces a blend of LD and SeqQs. Note that the productions reported in Liter et al. (2022) could also be explained as blends. For example, their (11cii, p. 13) Who do you think what popped the balloons? could be analyzed as the combination of Who/what do you think popped the balloons? and What/who popped the balloons?
Both SeqQs and syntactic blends are used by young speakers during the actual production of complex wh-questions and thus fall under strategies to cope with their limited executive control resources. To investigate further, we ran a mixed-effects linear regression model, which showed that WM predicted medial-wh number to a degree approaching significance (p=.07), possibly because relatively few participants produced medial questions. We also ran a Pearson’s correlation between medial-wh number and composite WM score (r=−.3, r²=.09). The negative correlation suggests that the higher a child’s WM, the lower their medial-wh number. Children with lower WM produce questions with medial question words consistently (which we suggest are SeqQs), while children with higher WM are more likely to produce medial question words only occasionally (though these productions are more often blends and ungrammatical). Why this might be is open to debate. It is conceivable that children with lower WM fail to fully plan and execute the comparatively complex question appropriately, or perhaps know they are likely to fail at more complex questions. Similarly, they might simply know that SeqQs will work and have not yet learned that they should use LD. It is also possible that, in an effort to relieve their already overworked WM, they offload half of the content (What does Morp think?) before they integrate the content of the embedded clause. Pulling these explanations apart would require further research.
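As a reminder of how the correlation statistics above relate to one another, the sketch below computes Pearson’s r and its square from paired scores. The function name and the toy scores are invented for illustration and are not the study’s data; only the arithmetic linking the reported r of −.3 to an r² of .09 comes from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(ss_x * ss_y)


# Toy scores (invented): higher WM paired with lower medial-wh numbers.
toy_wm = [1, 2, 3, 4, 5]
toy_medial_wh = [8, 7, 1, 3, 1]

r = pearson_r(toy_wm, toy_medial_wh)
r_squared = r ** 2  # proportion of variance shared by the two measures

# The reported values are related in the same way: (-0.3) ** 2 is 0.09.
reported_r_squared = round((-0.3) ** 2, 2)
```

An r² of .09 means that roughly 9% of the variance in medial-wh number is shared with composite WM, which is consistent with the cautious wording used for this result.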
A reviewer further points out that if children with lower WM are simply asking SeqQs whenever they can, the subject/object asymmetry in questions with medial question words is surprising. Like previous work, with the notable exception of Lutken et al. (2020), we found significantly more medial question word productions in questions with subject extraction (n=48) than in questions with object extraction (n=18) (Footnote 6). This is not predicted by either the use of SeqQs or syntactic blends per se, which are equally possible in both subject and object extractions. As shown in Table 3, productions with object extractions were primarily target-like, workarounds, or LD constructions with a lexical error, while most subject extractions were either target-like or medial-wh productions (with relatively few workarounds or LD constructions with lexical errors). We note that the asymmetry in the production data parallels an aspect of the difference in syntactic analysis: object questions involve movement to specCP, which the child must have mastered in order to produce overwhelmingly adult-like instances. In contrast, subject question words are in the canonical subject position at the clausal periphery (unless vacuous wh-movement is posited), making them vulnerable to other syntactic manipulations, particularly those resulting in medial-wh productions (see Table 3). In other words, it may be easier for a child to produce a periphery error while planning and/or executing their production of subject questions. We leave further exploration of this topic to future research.
In summary, we found a significant relationship both between WM and productions with medial question words and between WM and target-like productions. We suggested that productions with medial question words are not instances of WSM but rather one of two alternatives: first, grammatical SeqQs, and second, ungrammatical syntactic blends. We showed that children who produce utterances that could be adult-like SeqQs produce them consistently, while children who produce utterances that could not be adult-like SeqQs do so only rarely. Our suggestion that at least some of these productions are blend constructions, as described in children of this age by Jager (2005), provides further evidence for parallel processing of syntactic structures, and for its operation in speakers as young as age 4;0.
4.2. Processing in comprehension
A processing account of comprehension is comparatively simple. Following Phillips and Erehnhofer (2015), we assume comprehension is incremental and that each new word is integrated into the current interpretation of the utterance. This requires resource management as well as many different processes operating in parallel. For the sake of example, recall the story about Evil Steve, who tells Sherry in a letter that he will steal the queen’s ring, when in fact he steals the crown. After this story, children were asked a question such as (18) (see Appendix E for the full example).
Lutken et al. (2020) found that WSM-like responses (the ring) and responses to the second question (the crown) were at chance. They suggested that children were responding to the most recent question word but were at chance in their accuracy. However, we have found more WSM-like responses than second-question responses. We also know that a child with lower WM is more likely to give such responses, a pattern that does not point to an alternative-grammar explanation. According to the model presented by Phillips and Erehnhofer (2015), interpreting a question requires the child to recognize a question word and keep it in memory as they search for the relevant corresponding gap in the embedded structure. They must also keep similar lexical items from interfering with their interpretation, which requires strong inhibitory control. For our purposes, this would mean they must maintain the matrix question word as the question word with scope rather than allowing the medial question word (which is identical in form but not in meaning or function to an interrogative question word) to interfere. This is similar to the claim made by Cunnings and Felser (2013) that adults with lower WM abilities are more likely to mis-parse constructions with multiple potential antecedents for pronouns. They claim that noun phrases with properties similar to those of the target antecedent interfere and lead to the mis-parse (Footnote 7). Our explanation is that children with lower WM scores simply are not as good at keeping the matrix question word in memory as the question to be answered. They integrate the two clauses and thus know that the question is not about the embedded clause, but they allow the medial question word to interfere with their memory of the question word to be answered and thus give a WSM-like response.
For example, in processing (18), they know the question word has to do with Steve saying, but when they come to what, they allow it to replace the true question word, how, and end up answering a question about what Steve said rather than how he said it. This is further supported by the fact that errors in comprehension were significantly predicted by FDS rather than composite WM, suggesting that the crucial factor is children’s ability to hold the appropriate information in memory. While we did not test inhibitory control explicitly, we expect that it also plays a crucial role in children’s performance on this language task, and we leave this to future work.
4.3. The path to adult-like use
Under our account, children’s errors (including simpler production of SeqQs and WSM-like responses) generally stem from immature processing abilities. These abilities mature over time and, as they do, the child will make fewer errors of these types. However, our account does predict that even adults should continue to make errors under processing strain. This prediction is, in part, borne out by Lutken and Stromswold (2024), who found that adults gave WSM-like responses to questions with medial question words about 10% of the time when they were put under processing strain. In other words, while the child’s processing abilities will mature to the point that they no longer frequently make these errors, the potential to make such speech errors when overloaded never goes away entirely.
5. Conclusion
Our primary, novel finding is that both the production of medial question words and WSM-like responses to questions containing medial question words are predicted by a child’s WM. We have suggested that in production these structures with medial question words are one of two things: (1) SeqQs used as a workaround construction or simplification licensed by the grammar when the child is overtaxed, or (2) syntactic blends, directly resulting from children’s immature processing mechanisms. In comprehension, we have suggested that WSM-like responses are the result of children failing to keep the medial question word from interfering with the true question word, so that they answer a question that would give the medial question word scope over the whole structure. These findings are important for several reasons. First and foremost, we suggest that our data support a processing account of these errors. Second, we have added to the body of literature suggesting WM is relevant to both production and comprehension. Third, our study provides (we believe) the first evidence of syntactic blends in (monolingual) child questions.
In addition, our study emphasizes the importance of eliminating the lead-in in the experimental task. The lead-in does affect production, but only in that it leads to more structures with the same question word twice, which in turn affects the interpretation of the challenges children face and solutions they resort to when producing and processing biclausal questions. The lack of lead-in allows children more freedom to respond how they would naturally and, therefore, we have encountered more workarounds and more error types than previously reported. Replication and other studies are needed to further support the conclusions we have reached based on a WM-centered experiment.
Finally, we asked at the beginning of this paper just how adult-like children’s grammars are. Are all the errors the result of limited processing, or is there a possibility that children have yet to establish the adult-like grammar? We have concluded that there are, in fact, two groups of English-speaking children who produce utterances with medial question words: those who are producing syntactic blends and those who use SeqQs as their preferred workaround. We claim that children’s comprehension of questions with medial question words is limited by their ability to maintain the matrix question word as the one to answer. These suggestions merit further testing, but this work has led us to the conclusion that these errors stem not from the adoption of an alternative grammar but from children’s immature processing mechanisms.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000924000564.
Data availability statement
The data that support the findings of this study are available on request from the corresponding author Dr. Geraldine Legendre (legendre@jhu.edu) or Dr. C. Jane Lutken (clutken@wcu.edu).
Acknowledgements
This work was supported in part by an NSF Doctoral Dissertation Research Improvement Grant to the authors: BCS-1853297. We would like to acknowledge and thank the NSF for making this research possible. We must also acknowledge that the initial stage of this work was guided by the late Akira Omaki, without whom it could not have been the same. We gratefully acknowledge the help of Barbara Landau for advice on experimental methods with children. We further thank research assistants Caroline West and Laura Nugent, who assisted in coding and in creating our elicited production task. Further thanks to Eleanor Chodroff, Colin Phillips, and Matt Goldrick for discussions on the puzzle of children’s prosody. Conversations with audience members at BUCLD, LSA, and HSP since 2019 have helped us to improve our work and clarify our arguments. These include Jeff Lidz, Dana McDaniel, and Tom Roeper, among others. Thanks as well to members of multiple labs at Johns Hopkins (The Language Acquisition Lab, Language Processing and Development Lab, and the Language and Cognition Lab) as well as Karin Stromswold and members of the Language Acquisition and Processing Lab at Rutgers University for their helpful comments on this work. Finally, thank you to the children and parents who took the time to come “play science” with us and remind us that science is fun as well as fascinating!