1. Introduction
There is accumulating evidence that speakers recruit general inhibitory control when selecting among co-activated but context-inappropriate lexical representations during object naming (e.g. Korko et al., 2021; Shao et al., 2015) and when avoiding redundant referential expressions in pragmatic language use (e.g. Trude & Nozari, 2017; Wardlow, 2013). Similarly, a growing body of research shows the importance of domain-general inhibition in the resolution of syntactic conflict in language comprehension (e.g. Hsu et al., 2021; Kaushanskaya et al., 2017; Woodard et al., 2016; Ye & Zhou, 2009). However, it is less clear what cognitive control mechanisms underlie the production of syntax. This study examines the role of domain-general inhibitory processes in the selection of syntactic structures under high competition demands. In particular, we gauged the extent to which resolution of non-verbal interference arising at distinct stages of information processing predicted performance in an active-passive voice production task in which conflict was experimentally induced.
1.1. The scope for interference in syntactic selection
There are good reasons to believe that syntactic selection, by analogy to lexical selection, is a competitive process, with activated structural representations vying for the speaker’s attention. Interference in sentence production can take the form of underdetermined competition when a stimulus, such as a pictured transitive event, elicits multiple appropriate responses, for example, structural representations, only one of which can be selected. As syntactic choices unfold, settling on one particular structure or arrangement of items may become increasingly effortful and time-consuming (Ferreira & Engelhardt, 2006; Myachykov et al., 2011, 2013, 2018). In support of this claim, Myachykov et al. (2013) found that greater syntactic flexibility in the Russian language, which provides six word-order options to describe a transitive event, compared with only two in English, was associated with slower sentence onset times and longer referent-related eye-voice spans. The authors explained this pattern of results with a competition account, according to which the more alternatives in the speaker’s structural inventory that become simultaneously activated, the longer it takes to select the target structure. In a study by Hwang and Kaiser (2015), Korean speakers were similarly slower to begin their utterances when the verbs they were presented with were not case-marked, allowing for greater syntactic flexibility than when the verbs were case-marked and permitted only one syntactic option. Naturally occurring speech errors are another indication that multiple grammatical structures may become simultaneously activated, interfering with the production process. In the erroneous sentence ‘Do I have to put on my seatbelt on’ (Stemberger, 1985), it is evident that two message-compatible grammatical structures (put on my seatbelt and put my seatbelt on) were activated, intruding one onto another, which led to a syntactic slip.
Interference in sentence production may also take the form of prepotent competition, when an input gives rise to a dominant but context-inappropriate structural representation. In grammatical encoding, such interference may arise from a primed (recently heard or spoken) or high-probability (default) syntactic structure that does not fit the communicative context. For example, speakers tend to produce a more familiar active voice structure, only to correct it to a passive voice construction because the former is incompatible with the given context (Engelhardt et al., 2013). Speakers are also more inclined to repeat a less familiar but primed syntactic structure (passive voice), yet they do so at the cost of grammatical or semantic errors, for example ‘the girl was watered by the flower’ (Adams & Cowan, 2021).
Clearly, speakers face multiple challenges as they plan and produce their utterances in real time. What is less clear, however, is how they manage conflicting demands and what processes support their efforts to select target representations in the face of either multiple, equally felicitous syntactic structures (underdetermined competition) or one dominant but irrelevant syntactic frame (prepotent competition). Here, we focus on prepotent competition in an active-passive voice production task with contextual constraints. We hypothesise that under increased competition demands, speakers recruit domain-general inhibitory control to override a prepotent bias towards a grammatical structure that is readily accessible at the time of speaking, but that does not fulfil the contextual constraints. Although inhibitory control is often treated as a uniform construct, here we argue for multiple inhibitory control processes. We first present the rationale for why active-passive voice construction provides scope for interference. We then present theoretical and empirical arguments for a relationship between syntactic selection and domain-general cognitive control processes. Finally, we justify why inhibitory control should be conceptualised in terms of multiple inhibitory processes and not as a single executive function.
1.2. Conflicting demands of active-passive voice selection
Whether a speaker commits to an active (e.g. The pirate ate the cheese) or passive (e.g. The cheese was eaten by the pirate) voice is determined by multiple factors, such as structural familiarity (e.g. Dick & Elman, 2001), structural priming (e.g. Bock, 1986), referent animacy (e.g. Ferreira, 1994), and referent accessibility (e.g. Gleitman et al., 2007; Myachykov et al., 2018). English speakers are generally biased towards active sentences, especially when an event is novel and there are no contextual constraints (Dick & Elman, 2001). Roland et al. (2007) report that canonical subject–verb–object–active constructions account for 94% of agent–patient structures in English. It is conceivable, therefore, that active voice, as the default syntactic option to describe transitive events in English, may become a source of prepotent competition when the communicative context biases the use of the less familiar passive voice.
Recent experience of a specific sentence structure may similarly influence syntactic choice in a subsequent utterance (Bock, 1986). This syntactic priming effect has been demonstrated with active and passive transitive verbs, as well as prepositional and double-object dative sentences, and is interpreted in terms of activation-based systems (Bock, 1986). That is, procedures or operations responsible for the grammatical encoding of a message are assumed to be more activated than procedures responsible for an alternative syntactic form. Switching from procedures in a more active state to those in a less active state is associated with a cognitive cost that the speaker is generally unwilling to pay unless it is demanded by the communicative situation.
Other important factors influencing grammatical voice choice include animacy and accessibility. Animate entities (Ferreira, 1994) and lemmas or referents that are more readily available to the speaker, whether due to semantic priming (Bock, 1986) or visual attention-capture cues (Gleitman et al., 2007; Myachykov et al., 2018), tend to be placed in higher syntactic positions. So, when a patient referent is more salient than an agent referent, English speakers are more likely to produce passive voice sentences with patient entities occupying the sentence-initial subject position.
This study exploits these biases while also imposing contextual constraints to induce syntactic interference in a task in which speakers are asked to generate active or passive voice sentences. Speakers are instructed to use either the irregular past (e.g. ate) or past participle (e.g. eaten) verb form provided. In half of the task trials, the external verb form constraint conflicts with the structural frequency, referent animacy, and referent accessibility biases. We were particularly interested in trials that necessitate the use of the irregular past participle verb because, on such occasions, speakers may need to override a prepotent tendency to use a default active voice structure and to automatically assign a nominative role to a referent that is animate and more salient.
1.3. Syntactic selection and the cognitive control system
Given the scope for interference during grammatical encoding, how do speakers manage the conflicting demands of online utterance planning and execution? Here, we propose that syntactic selection, specifically under increased competition, is supported by general interference resolution mechanisms that are language-independent. Below, we present the rationale for why we think this may be the case.
A number of theorists have claimed syntactic or combinatorial aspects of language to be separable from other abilities or systems with which they interact (e.g. Hauser et al., 2002; Pinker, 1994). For Chomsky (1988), grammar exhibited domain-specific properties that were distinct from other aspects of cognition. In a review spanning three decades of neuropsychological research, Grodzinsky concluded that ‘the ability to create and analyse meaningful expressions through rule-based combination is sharply distinguished from other seemingly related mental capacities (such as arithmetic or general intelligence)’ (Grodzinsky, 2000, p. 2). Such modularity claims have found partial support from neuropsychological data. For example, Mesulam et al. (2021) argued for a modular architecture of syntactic production (or expressive syntax) by assessing the cortical thinning of neuronal tissue in patients with primary progressive aphasia (PPA). Cortical atrophy in these patients was correlated with poorer linguistic performance in a number of language tasks, including the Northwestern anagram test and the Northwestern Assessment of Verbs and Sentences test (NAT-NAVS) devised to assess the ability to construct non-canonical sentences, such as sentences with passive voice or extracted objects. The poorer ability to sequence lexical elements to fit the intended meaning in such sentences was associated with greater neuronal loss within the posterior part of the middle frontal gyrus (pMFG). The inferior frontal gyrus (IFG) was another site of peak atrophy associated with reduced ability to generate complex syntactic structures, particularly among patients with PPA of the agrammatic subtype (PPA-G) (Mesulam et al., 2014; Rogalski et al., 2011; Wilson et al., 2010).
While the strongly modular approach towards expressive syntax may be valid for some aspects of language use, one can imagine situations in which domain-specific explanations become troublesome. Indeed, a number of lines of theoretical and empirical evidence converge on a more domain-general view of syntactic abilities, according to which grammatical encoding, at least under some circumstances, relies on general information processing factors that are language-independent.
First, both neuroimaging and neuropsychological data appear to undermine the claim that syntactic production is localised to one specific brain area. Apart from the involvement of the frontal cortex (i.e. pIFG and pMFG), there is an association between expressive syntax and the posterior middle temporal gyrus (pMTG), specifically during the production of non-canonical sentences (e.g. The girl that the boy pushed is smart; Matchin & Hickok, 2020). Agrammatic production deficits, in the form of impoverished speech with omissions of function words and/or morphemes, have also been linked to lesions in the white matter tracts that connect the IFG to the posterior temporal lobe (Wilson et al., 2010). Paragrammatism, which manifests as disordered speech output, characterised by incorrect argument structure, agreement errors, and/or failures to correctly resolve syntactic dependencies (Wilson et al., 2010), may be observed following damage to the posterior temporoparietal lobes (Matchin & Hickok, 2020). In many PPA-G patients, cortical atrophy encompasses parts of the posterior frontal lobe and posterior temporal regions (Rogalski et al., 2011).
Second, the same cortical regions implicated in grammatical encoding are known to mediate other linguistic and non-linguistic functions. Despite revealing a modular landscape of language function, with individual neural clusters associated with different aspects of language, Mesulam et al. (2021) pointed out that only a small fraction of the variance in neuronal loss observed in PPA patients was explained by impaired task performance associated with particular language functions: 19% in the case of percentage of grammatically correct utterances and 32% in the case of non-canonical sentence production. This suggests that the affected regions play their part in other forms of linguistic and/or non-linguistic behaviours that were not directly tested in the study.
Apart from serving as a hub for the production of syntax, the IFG plays a pivotal role in the comprehension of sentences, especially non-canonical ones (Friederici et al., 2003; Grodzinsky et al., 2021). The area has also been implicated in phonological and semantic processing (e.g. Devlin et al., 2003), particularly in situations in which one has to select relevant semantic knowledge from among competing semantic representations (Badre et al., 2005; Thompson-Schill et al., 1997; Wagner et al., 2001).
In addition, studies have shown the IFG to support non-linguistic processes such as action observation, action imagery, action execution, and action imitation (e.g. Fazio et al., 2009; Pulvermüller & Fadiga, 2010). The inferior frontal area also appears to be a common substrate for processing grammatical and musical sequences (Maess et al., 2001; Patel, 2003). Furthermore, activity in this area has been linked to performance in tasks designed to tap non-linguistic cognitive control processes, such as interference resolution in the flanker task (Nee et al., 2007). The pMTG, another region strongly associated with grammatical encoding, appears to also be engaged in semantic processing. This region is particularly active in ambiguous contexts during the selection of the context-appropriate meaning of a word from all of its possible alternatives (e.g. Davey et al., 2016; Thompson-Schill et al., 1997; Wagner et al., 2001).
In sum, successful production of grammatical sentences appears to rely on a distributed network of interconnected brain regions, rather than on a single neural cluster centred in and around Broca’s area. At the same time, both the IFG and pMTG, the regions commonly associated with expressive syntax, appear to fulfil a wider functional role than previously believed, a role that extends not only beyond the production of grammatical utterances, but also beyond the language system.
Behavioural studies provide further evidence for the relationship between syntactic production and general information processing factors. Most focus on working memory capacity. For example, digit span has been found to be a good predictor of syntactic production measures, such as repetition of increasingly complex sentences with embedded clauses (Alloway et al., 2004; Delage & Frauenfelder, 2020) and proportion of complex sentences in spontaneous speech (Delage & Frauenfelder, 2020). Working memory capacity was also positively correlated with performance in the more cognitively demanding non-canonical word order condition of a sentence production task (Sung, 2015).
In dual-task experiments, speakers’ fluency and grammatical complexity of sentence structure suffered as a result of the additional strain on non-verbal working memory, such as when the participants had to perform concurrent finger tapping (Kemper et al., 2003) or the pursuit rotor task (Kemper et al., 2009). While generating spontaneous speech and performing the target tracking task, young adults used shorter, simpler sentences than they did in the baseline condition. The concurrent pursuit rotor task slowed older participants’ speech but did not otherwise affect their fluency, grammatical complexity, or propositional content. Adams and Cowan (2021) had preschool children emulate the form of speaking used by the experimenter, who used the passive voice, and later produce descriptions of pictured transitive events. Participants were more likely to use the less familiar but primed passive voice than to switch to the more familiar active voice when under a visual–spatial working memory load than in a ‘no load’ condition. Repeating the less familiar syntactic structure (passive voice) was deemed less cognitively taxing than using a more familiar one (active voice) because the latter would require reassigning agent and patient roles to noun constituents.
Slevc (2011, Experiment 3) tested accessibility effects while subjects produced prepositional object dative sentences (e.g. The pirate gave the book to the monk) to describe transitive events under verbal and spatial working memory load. On prime trials, subjects were cued with either the object (the monk) or the theme (the book). The accessibility effect, that is, a bias to produce the more accessible, or cued, word first to maximise one’s working memory capacity, was more pronounced under verbal than under spatial memory load, suggesting that the former played a greater part in syntactic selection. However, as there was no baseline condition in the experiment, there is a possibility that a more general form of cognitive control also played its part in the syntactic choice.
Very few studies have examined the relationship between syntactic production and domain-general inhibitory control processes. Using a latent variable approach, Engelhardt et al. (2013) found that inhibition accounted for 12% of the variance in the production of utterance repairs. In particular, those with reduced inhibitory control were more likely to correct their utterances by reassigning roles to noun constituents. Thothathiri et al. (2017) manipulated verb bias by training their subjects with dative verbs presented in different structural configurations. Some dative verbs appeared only in the double-object structure (e.g. Mike offered Carol the napkin; DO only condition), some in the prepositional object structure (e.g. Mike tossed the coin to Carol; PO only condition), and some appeared equally often in DO and PO structures (DO–PO–equipotent condition). Producing DO dative structures in the case of the equipotent verbs was associated with better inhibitory control. The authors concluded that choosing a DO dative structure in the context of equipotent verbs, when one could opt for a less cognitively taxing structure of a PO dative (PO datives are generally preferred over DO datives: Bock & Irwin, 1980; Bock & Loebell, 1990), required additional processing resources such as inhibition, which would bias attention towards a less dominant representation and away from the more strongly activated default option. Inhibition was also a significant predictor of number agreement production in ten- to twelve-year-olds (Veenstra et al., 2018). Children who experienced greater interference in inhibitory control tasks had higher subject–verb agreement error rates. Similarly, Nozari and Omaki (2018) reported that susceptibility to interference explained 20% of the variance in the production of agreement errors by adult speakers.
Cross-domain priming studies provide further evidence for the role of domain-general mechanisms in syntactic selection (Koranda et al., 2020; Pozniak et al., 2018; Scheepers, 2003; Scheepers et al., 2011, 2019; Van de Cavey & Hartsuiker, 2016). Syntactic choices during sentence completion were found to be affected by the structure of non-linguistic primes, such as mathematical expressions (Scheepers et al., 2011, 2019). Exposure to mathematical equations with a parenthetical grouping (e.g. 80 – (9 + 1) × 5) increased the likelihood of completing a sentence fragment with a high-attachment relative clause structure (e.g. I saw the lights of the room that were bright), whereas exposure to mathematical equations without the grouping (e.g. 80 – 9 + 1 × 5) increased the probability of using a low-attachment relative clause (e.g. I saw the lights of the room that was large). Reverse cross-domain priming, from language to mathematics, was further demonstrated by Scheepers and Sturt (2014), who found that if a primed linguistic structure was incongruent with that of a mathematical equation, participants were more likely to solve the equation incorrectly. Syntactic priming effects were also observed between music and language (Van de Cavey & Hartsuiker, 2016). The attachment of a relative clause to a sentence fragment was primed by pitch sequences with a similar structure, lending further support to the idea that certain combinatorial mechanisms may be shared across domains. Koranda et al. (2020) showed that word-ordering processes in utterance planning, such as choosing between a DO and a PO dative structure, had parallels in non-linguistic action planning, such as touching left and right items on the screen. The left-first action primes increased the likelihood of using PO datives in the language task relative to the right-first action condition. Conversely, when PO datives served as primes, left-first actions were more likely to be chosen in probe trials. Given the priming effects and the fact that left-first touches were the preferred pattern in the action task and that speakers generally prefer PO over DO dative structures (Bock & Irwin, 1980; Bock & Loebell, 1990; Thothathiri et al., 2017), it is possible that the two types of behaviour relied on the same cognitive mechanism that ‘offloads’ the more accessible material first to conserve ‘mental space’ for the more demanding content.
Together, the findings of correlational, dual-task, and cross-domain studies highlight the role of domain-general mechanisms in syntactic production. However, the evidence is not always clear-cut. First, some studies employed verbally based assessments of executive functions. For example, Hartsuiker and Barkhuysen (2006) found that in the low-span group, subjects produced more subject–verb agreement errors when they had to simultaneously remember linguistic stimuli. Thothathiri et al. (2017) assessed inhibitory control using a Stroop test, a task with a strong verbal component. Similarly, the digit span task, used to index working memory (Alloway et al., 2004; Delage & Frauenfelder, 2020; Sung, 2015), may entail tacit verbalisation of the to-be-remembered stimuli. Second, in some studies cognitive control measures formed latent variables or composite scores reflecting a mixture of linguistic and non-linguistic elements. In Engelhardt et al. (2013), inhibition was construed as a latent factor representing variance pooled across linguistic (Stroop task) and non-linguistic (stop-signal task and hyperactivity–impulsivity questionnaire) task scores. Third, general cognitive factors were not consistently shown to be involved in syntactic production. Veenstra et al. (2018) found that only the verbal component of working memory predicted subject–verb agreement errors. In Slevc (2011), verbal working memory load imposed greater constraints on syntactic selection than non-verbal load. Moreover, some syntactic production tasks, such as sentence repetition, may conflate grammatical, semantic or lexical, and memory processes. Others, such as subject–verb agreement, mix morphological and syntactic processes. Therefore, it is not always clear which aspects of cognitive control (linguistic or non-linguistic) are relevant to which aspects of sentence construction.
2. The multi-faceted nature of inhibitory control
This brings us to the construct of inhibitory control itself. The measures that are commonly used to assess inhibitory control not only conflate linguistic and non-linguistic elements, but may also reflect multiple processes that are functionally distinct. In Veenstra et al. (2018), for example, the latent variable of inhibition accounted for 80% of the variance in performance on the colour-shape-switching task and for only 20% of the variance in the flanker task. While both tasks are said to involve inhibition, performance on the colour-shape-switching task may rely more heavily on the ability to adapt to the changing context (e.g. Monsell, 2003). Therefore, the construct of inhibition examined by Veenstra et al. may have tapped into the ability to reconfigure one’s response set to the new task rather than the ability to inhibit an invalid response code per se. The latent variable of inhibition was also disproportionately represented by the manifest variables in Nozari and Omaki (2018). In fact, only the no-go trial scores loaded significantly on the latent variable of inhibition, with the flanker, picture Stroop, and Simon effects having no predictive value for number agreement performance. In all but one of the reported studies examining the role of inhibition in syntactic production, inhibition was construed as a global cognitive function – an estimate provided by a latent variable analysis. While the method provides a ‘purer’ measure of the construct of interest by extracting the variance common to the selected inhibitory control tasks and partialling out the variance due to task-specific processes, the observed differences in individual component loadings (Nozari & Omaki, 2018; Veenstra et al., 2018) and the modest or non-significant zero-order correlations between individual manifest variables (Engelhardt et al., 2013) suggest caution in the interpretation of these results. It may be that attributing variation in linguistic behaviour to one inhibitory control function is not fully warranted.
Indeed, earlier theoretical work (e.g. Kok, 1999; Nigg, 2000; Verbruggen et al., 2014) and more recent empirical findings (e.g. Chuderski et al., 2012; Friedman & Miyake, 2004; Pettigrew & Martin, 2014; Rey-Mermet et al., 2018; Stahl et al., 2014) have argued against a common inhibitory factor. For example, Stahl et al. (2014) provide empirical support for three distinct sources of interference based on a temporal locus criterion: stimulus interference at the input stage, when distracting information in the environment involuntarily captures one’s attention; proactive interference at the intermediate representational stage, in the form of goal-irrelevant cognitions or representations; and response interference at the output stage, in the form of involuntarily activated, task-irrelevant response options. Some authors (e.g. Aron, 2011; Nee et al., 2007) propose a further dissociation between inhibition at the response selection level, at which one has to choose between two or more equipotent response codes, and at the response execution level, at which one has to withhold, modify, or stop an already selected response.
3. The current study
This study has two aims: (1) to address the question of whether syntactic selection is supported by cognitive control processes that are not language-specific and (2) to assess the relevance of different forms of inhibitory control for syntactic selection. In contrast to previous syntactic production studies, which employed inhibitory control measures that conflate linguistic and non-linguistic components, here we provide a more focused, that is, non-verbal, assessment of inhibition. In addition, bearing in mind that the construct of inhibitory control may subsume functionally distinct processes, we adopt the temporal locus criterion framework after Stahl et al. (2014). For this reason, we employ three inhibitory control tasks, each described in the literature as involving conflict at different levels of information processing: the arrow flanker task, the Simon arrow task, and the anti-saccade task.
The arrow flanker task is employed to capture the resolution of representational conflict. The task may also induce conflict at the level of stimulus processing, some conflict at the level of response selection, but little to no conflict at the level of response execution (Nee et al., Reference Nee, Wagner and Jonides2007; van den Wildenberg et al., Reference van den Wildenberg, Wylie, Forstmann, Burle, Hasbroucq and Ridderinkhof2010). In the task, participants are asked to respond with a right- or left-button press according to the direction of the middle target arrow, which is flanked by either compatible arrows that face in the same direction as the target or incompatible arrows that face in the opposite direction to the target. In the critical incompatible trials, conflict may be induced at an early stage involving perception, at which the perceiver has to decide which of the displayed stimuli is the middle target arrow to be attended to, or at an intermediate stage, at which flankers activate a representation of direction that conflicts with that activated by the target arrow. Alternatively, conflict may arise between responses that are mapped onto targets and distracters. Finally, a wrongly selected motor response can be blocked at the output stage.
The effect in the Simon arrow task is commonly attributed to response selection processes (e.g. Lu & Proctor, Reference Lu and Proctor1995), and the task avoids interference associated with perceptual (e.g. van den Wildenberg et al., Reference van den Wildenberg, Wylie, Forstmann, Burle, Hasbroucq and Ridderinkhof2010) or representational conflict (Hommel, Reference Hommel2011). In fact, ‘the Simon effect is a particularly pure measure of the impact of a task irrelevant stimulus feature on response conflict’ (Hommel, Reference Hommel2011, p. 2). Participants are presented with a right- or left-pointing arrow on the right or left side of the computer screen. The goal is to indicate the direction of the arrow while ignoring its location by pressing the relevant button on the keyboard. In the critical conflict trials, the location of the target mismatches the direction of the arrow and the position of the hand that should be used to press the correct key. The location of the target is thought to elicit an automatic motor response of the hand corresponding to that location. In this task, unlike in the arrow flanker task, the difficulty lies not in establishing where the target is or which direction it represents but in overcoming a prepotent motor response elicited by the target location.
There is broad agreement that the anti-saccade task is a measure of inhibition of a prepotent response (Friedman & Miyake, Reference Friedman and Miyake2004; Pettigrew & Martin, Reference Pettigrew and Martin2014; Stahl et al., Reference Stahl, Voss, Schmitz, Nuszbaum, Tüscher, Lieb and Klauer2014), reflecting conflict resolution at the response execution level. The goal of the task is to identify a target letter presented briefly on a computer screen. In the critical anti-saccade trials, participants are instructed not to look in the direction of a saccade-eliciting cue presented either on the right or left side of the screen. The inability to halt an automatic saccade, that is eye-movement, in the direction of the cue interferes with target identification, resulting in slower and more erroneous responses.
Apart from the non-verbal inhibitory control tasks, we designed a language production task in which participants generate an active or passive voice sentence by assigning grammatical roles to a verb’s arguments. In addition, we assessed speakers’ vocabulary knowledge with the WAIS-III vocabulary subscale (Wechsler, Reference Wechsler1997), taking it as a proxy for language competence. The measure was used as a control variable following the rationale that syntactic selection may rely not only on efficient inhibitory control mechanisms, but also on the speaker’s language competence.
Since we use three inhibitory control tasks, each reflecting interference at a different stage of processing, we expect performance in these tasks to be largely unrelated. Importantly, if general cognitive factors play a role in syntactic selection, particularly in situations of increased competition, performance in non-verbal inhibitory control tasks should predict the ease with which a target syntactic structure is produced. Additionally, the three types of inhibitory control should differentially contribute to sentence production. In particular, if the same mechanism underlies the stopping of an eye-movement (inhibition in the oculomotor system) and the halting of one’s verbal output (speech motor control), better performance in the anti-saccade task should be associated with fewer overt errors and/or repairs in high-interference trials of the voice production task. If interference is resolved at an earlier stage and both the selection of the correct representation (the direction of the target arrow) over irrelevant competitors in the arrow flanker task and the selection of a correct grammatical role for the verb’s arguments share the same mechanism, then better performance in the flanker task should translate into fewer speech errors or repairs and shorter sentence onset latencies in high-interference trials of the voice production task. If the pictured referents in the production task elicit automatic speech motor programmes by analogy to the stimuli in the Simon task, and the ability to select a task-relevant motor programme over a competing one in both tasks is shared across the two systems (manual versus language), then it is possible that speakers who are more efficient at selecting the correct hand motor programme will also commit fewer errors and be quicker to generate the correct grammatical voice on the production task.
4. Method
4.1. Participants
Ninety-six participants (Nfemale = 78; Mage = 20.7, age range 18–43 years) were recruited at Middlesex University. All reported English to be their dominant language, but only those who were born in the UK or arrived in the country by the age of five years were included in the final analysis. All had normal or corrected-to-normal vision, with no history of neurological impairment and no cognitive deficits. Three participants had medical conditions that precluded them from completing all the tasks, two had missing data in one of the tasks, and four performed below the 50% accuracy level on at least one of the inhibitory control tasks; these nine participants were excluded from the analyses. The final sample therefore consisted of eighty-seven participants (Nfemale = 70; Mage = 20.6, age range 18–43 years).
4.2. General procedure
Participants were tested individually in a sound-attenuated room. After completing a brief demographic and language background questionnaire, they performed the inhibitory control tasks and the sentence production task, which were counterbalanced across participants according to the Latin square rule. The vocabulary test was always administered last. Participants received both oral and written instructions, which put equal emphasis on speed and accuracy of responding. Each experimental task was preceded by practice trials and an opportunity to ask questions for clarification. All tasks except the vocabulary test were administered using E-Prime 2.0 (Psychology Software Tools, Pittsburgh, PA). Responses from the inhibitory control tasks were collected using E-Prime. Responses from the production task and the vocabulary test were audio-recorded for later scoring. The data sets generated and analysed during this study are available in the Open Science Framework repository, under the link: https://osf.io/sjz6m/?view_only=c7637b99026744bc8bdb759557da6001.
5. Materials, design, and procedure for individual tasks
5.1. Inhibitory control tasks
5.1.1. Arrow flanker and Simon arrow tasks
We used the version of the task as described in Korko et al. (Reference Korko, Coulson, Jones and de Mornay Davies2021). There were two dependent measures: 1) the flanker effect quantified as the difference in mean RTs between the stimulus-incongruent, response-congruent condition and the stimulus-congruent, response-congruent condition and 2) the Simon effect quantified as the mean RT difference between stimulus-congruent, response-incongruent and stimulus-congruent, response-congruent conditions. Mean error rates were also calculated for these tasks. Larger effects denoted poorer inhibitory control.
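The two RT difference scores defined above can be illustrated with a minimal sketch. It is not the scoring script used in the study: the trial records, the condition labels (SC/SI for stimulus congruent/incongruent, RC/RI for response congruent/incongruent), and the helper function are hypothetical.

```python
from statistics import mean

# Hypothetical correct-trial records: (condition label, RT in ms).
# SC/SI = stimulus congruent/incongruent; RC/RI = response congruent/incongruent.
trials = [
    ("SC-RC", 420.0), ("SC-RC", 440.0),
    ("SI-RC", 470.0), ("SI-RC", 490.0),
    ("SC-RI", 450.0), ("SC-RI", 460.0),
]

def condition_mean(records, condition):
    """Mean RT over all records belonging to one condition."""
    return mean(rt for label, rt in records if label == condition)

# Both effects are computed against the same fully congruent baseline.
baseline = condition_mean(trials, "SC-RC")
flanker_effect = condition_mean(trials, "SI-RC") - baseline
simon_effect = condition_mean(trials, "SC-RI") - baseline
```

With these made-up values the flanker effect is 50 ms and the Simon effect is 25 ms; in both cases a larger positive difference corresponds to poorer interference resolution.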
5.1.2. Anti-saccade task
We used the version of the anti-saccade task from Ortells et al. (Reference Ortells, Noguera, Álvarez, Carmona and Houghton2016). The task design and the timing of trials are described in detail in Korko et al. (Reference Korko, Coulson, Jones and de Mornay Davies2021) and are presented in Fig. 1.
The two dependent variables were the anti-saccade effects quantified as 1) mean reaction time (RT) in the anti-saccade block minus mean RT in the pro-saccade block, and 2) mean error rate (ER) in the anti-saccade block minus mean ER in the pro-saccade block. Larger interference effects indicate poorer inhibitory control.
5.1.3. Active-passive voice production task
The active-passive voice production task was adapted from Altmann and Kemper (Reference Altmann and Kemper2006). Participants were required to produce a simple, meaningful, and grammatical sentence containing the names of the stimuli presented on the computer screen. The stimuli included two pictures of objects differing in animacy and one verb. The animate objects typically depicted a role (e.g. cleaner, soldier, and baby), while the inanimate objects were concrete items (e.g. key, bicycle, and house). The pictures were colour photographs taken from the Bank of Standardized Stimuli (BOSS) (Brodeur et al., Reference Brodeur, Guérard and Bouras2014) and the Internet. They were presented one after another, with a 1000-millisecond interval, and were followed by a verb. The latter could be either an irregular past tense form (e.g. ate, shook, and grew) requiring animate objects in the subject position, or an irregular past participle form (e.g. eaten, shaken, and grown) requiring inanimate objects in the subject position. Because the aim of the task was to elicit either an active or passive voice and the design of the task allowed for the production of alternative syntactic forms such as an active perfective (e.g. the baby had shaken a rattle), before each block participants were reminded to avoid using has, have, and had. An additional set of pictures depicting animate (e.g. gardener) and inanimate (e.g. ice cream) objects and a set of intransitive verbs in the past participle form (melted) were used as fillers. All stimuli are presented in Supplementary Table S1.
There were 96 trials divided across four experimental blocks, each containing 12 experimental and 12 filler trials. The order of trials was pseudo-randomised with the constraint that the experimental and filler trials alternated. The conditions were spread evenly across blocks, with six animate object first trials (three active and three passive) and six inanimate object first trials (three active and three passive) in each block.
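The block structure can be sketched as follows, assuming three trials per voice-by-order cell (so that each block holds 12 experimental trials) and strict alternation with fillers. Whether a block opened with an experimental or a filler trial is not stated, so starting with an experimental trial is an assumption, as are the function name and the seed.

```python
import random
from itertools import product

def build_block(rng):
    """One block: 12 experimental trials alternating with 12 fillers."""
    # 2 voices x 2 referent orders x 3 repetitions = 12 experimental trials.
    cells = list(product(("active", "passive"),
                         ("animate-first", "inanimate-first")))
    experimental = [cell for cell in cells for _ in range(3)]
    rng.shuffle(experimental)  # random order within the alternation constraint
    block = []
    for voice, order in experimental:
        block.append(("experimental", voice, order))
        block.append(("filler", None, None))
    return block

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
blocks = [build_block(rng) for _ in range(4)]
total_trials = sum(len(block) for block in blocks)
```

Four such blocks give the 96 trials reported, with every condition represented equally often in each block.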
Before the experiment began, participants were familiarised with the structure of the task and the experimental stimuli. Two examples of correct utterances were demonstrated by the experimenter. This was followed by a practice block in which ten practice trials, two of each condition, were randomly presented. The pictures and the verbs used in the practice block were not included in the experimental materials. Before each experimental block, participants were asked to name the experimental pictures (animate and inanimate objects) that would appear in that block. They were corrected only when the name used would not fit in the experimental context. The picture-naming practice was followed by four experimental blocks, which were separated by short breaks.
In half of the experimental trials, animate objects were presented first, followed by inanimate objects and a verb; on the other half, the order of animacy was reversed. The verb was always presented last together with a beep sound. Participants were asked to speak as soon as possible upon hearing the beep.
The stimuli (two pictures and one verb in the experimental trials and one picture and one verb in the filler trials) were presented randomly one after another in the four corners of the computer display. First, a blank screen appeared for 700 ms, followed by a fixation cross of a jittered duration (500 to 1000 ms). Immediately following this, the first object appeared, followed by the second object and then the verb with the beep sound. The stimuli succeeded one another at 1000-millisecond intervals. All remained on the screen until a vocal response was recorded. See Figs. 2 and 3 for the presentation and timing of the trials.
Participants’ responses were audio-recorded. Sentences containing errors, disfluencies, and silent pauses longer than 250 milliseconds were transcribed verbatim and coded for disfluency types. These included repairs (e.g. the cat broke… the vase was broken by the cat), sound prolongations (e.g. theee… farmer, and fffflew), hesitations or filled pauses (e.g. ehm… the grass…), and repetitions (e.g. the boy… the boy saw the ghost). Instances of each category of disfluency were counted in each condition. Incorrect responses included utterances containing has, have, or had (e.g. the clown had blown a bubble) or ungrammatical sentences (e.g. the clown blown a bubble) and responses that were initiated before the beep sound or lasted longer than 6000 milliseconds. Latency to begin speaking was measured manually using Audacity® 2.2.1 recording and editing software. The cursor was placed at the beginning of the beep sound, and distance was measured to the correct mention of the subject phrase. Ten per cent of the speech samples were randomly selected and coded by a second independent rater. The inter-rater agreement was excellent for repairs, hesitations, and repetitions (100%) and for silent pauses (91.7%) and acceptable for prolongations (80.6%).
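The agreement figures are reported as percentages. The sketch below assumes that they reflect simple percent agreement, that is, the share of double-coded items on which both raters gave the same code. The text does not specify the index, so this is an assumption, and the codes shown are invented.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items that two raters coded identically."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Hypothetical codes for ten utterances: 1 = prolongation present, 0 = absent.
rater_a = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
rater_b = [1, 0, 0, 1, 0, 0, 0, 0, 1, 1]
agreement = percent_agreement(rater_a, rater_b)
```

Here the raters disagree on two of ten utterances, giving 80% agreement. Percent agreement does not correct for chance, which matters most for rare categories.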
5.1.4. Vocabulary task
In the vocabulary subscale of WAIS-III (Wechsler, Reference Wechsler1997) participants provide definitions to a list of words (e.g. ‘Tell me what consume means’). Before the experiment, we had decided that the first seven items were not discriminating enough for a group of students and so the original list was shortened to 26 items. Participants were told that the task was not a speeded task and that there were no penalties for wrong answers. They were allowed to skip any unknown words. Participants’ responses were audio-recorded and scored according to the manual, with 2 points awarded for a correct and complete answer, 1 point for a correct but incomplete answer, and 0 points for an incorrect or no answer. The maximum score was 52 points.
5.2. Statistical analysis
To test for interference effects in the inhibitory control tasks, correct trial reaction times (RTs) and error rates (ERs) from the high- and low-interference conditions were analysed with repeated-measures t-tests. Untrimmed RTs were log-transformed to correct potential distributional problems. For ease of interpretation, however, untransformed data are reported and illustrated in the figures.
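A repeated-measures t-test on log-transformed RTs can be sketched with the standard library alone. The participant values are invented. For brevity the sketch log-transforms per-participant condition means, whereas the study may have transformed individual trials before averaging.

```python
import math
from statistics import mean, stdev

def paired_t(high, low):
    """Repeated-measures t statistic and degrees of freedom for paired samples."""
    diffs = [h - l for h, l in zip(high, low)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical per-participant mean RTs (ms) in each condition.
rt_high = [520.0, 480.0, 610.0, 455.0, 530.0]  # high-interference condition
rt_low = [470.0, 450.0, 560.0, 440.0, 480.0]   # low-interference condition

# Log-transform first, then test the paired differences.
log_high = [math.log(x) for x in rt_high]
log_low = [math.log(x) for x in rt_low]
t_stat, df = paired_t(log_high, log_low)
```

A positive t indicates slower responses under high interference. With the 87 participants of the final sample the test would have 86 degrees of freedom.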
Responses in the active-passive voice production task that contained errors, those initiated before the beep sound, or those with speech onsets longer than 6000 milliseconds were excluded from the analysis. This affected 8.7% of active animate-first trials, 6.1% of active inanimate-first trials, 10% of passive animate-first trials, and 11.1% of passive inanimate-first trials. Repetitions, hesitations, and vowel prolongations were aggregated into one category (‘other disfluencies’) because there were too few instances of each category in the transcribed speech samples. The remaining data were analysed with a series of repeated-measures analyses of variance (2 × 2 ANOVAs; see Footnote 1) for subjects and for items, with repairs, silent pauses, other disfluencies, and sentence onset latencies as dependent variables and grammatical voice (active vs passive) and ordering of referents (animate-first versus inanimate-first) as independent variables.
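In a fully within-subject 2 × 2 design, each effect has one degree of freedom, so its by-subjects F equals the squared one-sample t computed on a per-participant contrast score. The sketch below uses that equivalence. The latencies and variable names are invented, and the by-items analysis is not shown.

```python
import math
from statistics import mean, stdev

def contrast_f(scores):
    """F(1, n - 1) for a one-df within-subject contrast (squared one-sample t)."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t * t, (1, n - 1)

# Hypothetical onset latencies (ms) per participant, in the order:
# (active animate-first, active inanimate-first,
#  passive animate-first, passive inanimate-first)
data = [
    (900.0, 950.0, 1200.0, 1230.0),
    (850.0, 930.0, 1100.0, 1190.0),
    (1000.0, 1010.0, 1350.0, 1400.0),
    (920.0, 990.0, 1180.0, 1210.0),
]

# Per-participant contrast scores for each effect.
voice = [(pas_an + pas_in) / 2 - (act_an + act_in) / 2
         for act_an, act_in, pas_an, pas_in in data]
order = [(act_in + pas_in) / 2 - (act_an + pas_an) / 2
         for act_an, act_in, pas_an, pas_in in data]
interaction = [(pas_in - pas_an) - (act_in - act_an)
               for act_an, act_in, pas_an, pas_in in data]

f_voice, df_voice = contrast_f(voice)
f_order, df_order = contrast_f(order)
f_inter, df_inter = contrast_f(interaction)
```

With 87 participants the same computation yields the F1(1,86) degrees of freedom reported in the Results.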
Pearson’s bivariate correlation analyses were performed on all principal measures, that is vocabulary scores, interference effects across the inhibitory control tasks, and repairs and sentence onset latencies in the active-passive voice production task. Correlation coefficients are presented in Supplementary Table S2 for trimmed but non-transformed RT data and Supplementary Table S3 for untrimmed log-transformed RT data. A principal component analysis and multiple regression analyses were conducted on log-transformed RT data.
6. Results
6.1. Inhibitory control tasks
The interference effects obtained in the inhibitory control tasks were all statistically significant. Table 1 shows the means, standard deviations, t-statistics, p-values, and effect sizes for low- and high-interference conditions in each task. Mean RTs and ERs per condition per task are also presented in Fig. 4.
Abbreviations: Diff, difference between conditions; ER, error rate (in per cent); RT, reaction time (in milliseconds).
Table 2 presents means and standard errors of the rates of repairs, pauses, other disfluencies, and sentence onset latencies per condition in the active-passive voice production task, with differences between the conditions, and p-values (Bonferroni corrected) for pairwise comparisons (see also Fig. 5).
Note: %, per cent of occurrences; p-values for pairwise comparisons (after Bonferroni’s corrections).
Abbreviations: diff, difference between conditions; ms, milliseconds.
Analysis of repairs showed a main effect of grammatical voice [F1(1,86) = 127.9, p < .001, ηp2 = .598; F2(1,46) = 313.79, p < .001, ηp2 = .882], a main effect of referent ordering [F1(1,86) = 12.79, p = .001, ηp2 = .129; F2(1,46) = 8.84, p = .005, ηp2 = .174], but no interaction [F1(1,86) = 1.99, p = .162, ηp2 = .023; F2(1,46) = 4.03, p = .051, ηp2 = .088]. This replicates the results reported by Engelhardt et al. (Reference Engelhardt, Corley, Nigg and Ferreira2010), who also reported a higher proportion of repairs for inanimate-first past participle trials. Analysis of silent pauses revealed no main effect of grammatical voice [F1(1,86) = 1.53, p = .219, ηp2 = .018; F2(1,46) = 2.59, p = .115, ηp2 = .058], no main effect of referent ordering [F1(1,86) = 3.49, p = .065, ηp2 = .039; F2(1,46) = 1.53, p = .223, ηp2 = .035], but a significant interaction between grammatical voice and referent ordering for subjects and for items [F1(1,86) = 10.42, p = .002, ηp2 = .108; F2(1,46) = 8.27, p = .006, ηp2 = .164]. Analysis of aggregated disfluencies (repetitions, prolongations, and hesitations) showed a main effect of grammatical voice for subjects [F1(1,86) = 4.92, p = .029, ηp2 = .054], but not for items [F2(1,46) = 2.55, p = .118, ηp2 = .057], a main effect of referent ordering for subjects [F1(1,86) = 6.81, p = .011, ηp2 = .073], but not for items [F2(1,46) = 2.79, p = .103, ηp2 = .062], and a significant interaction for subjects [F1(1,86) = 19.54, p < .001, ηp2 = .185] and for items [F2(1,46) = 7.23, p = .010, ηp2 = .147]. Analysis of sentence onset latencies showed a main effect of grammatical voice [F1(1,86) = 172.11, p < .001, ηp2 = .667; F2(1,46) = 158.48, p < .001, ηp2 = .790], a main effect of referent ordering [F1(1,86) = 66.06, p < .001, ηp2 = .434; F2(1,46) = 15.94, p < .001, ηp2 = .275], but no significant interaction between the two [F1(1,86) = 1.08, p = .301, ηp2 = .012; F2(1,46) = 1.86, p = .180, ηp2 = .042].
7. Exploratory factor analysis
To test the multidimensional structure of inhibitory control, a principal component analysis (PCA) was conducted on the six measures of inhibitory control (flanker effect ER, flanker effect RT, Simon effect ER, Simon effect RT, anti-saccade effect ER, and anti-saccade effect RT) with direct oblimin rotation. Three factors emerged with eigenvalues above 1.00. Combined, the three factors explained 70.62% of the variance. Table 3 summarises the PCA results, including the factor loadings after rotation. The flanker effects (ER and RT) loaded onto the first factor, which we interpreted as representing the resolution of representational conflict. The Simon effects (ER and RT) clustered onto the second component, which we identified as resolution of response selection conflict. Finally, interference effects in the anti-saccade task (ER and RT) loaded onto the third component, which we interpreted as inhibition of response execution.
Note: Factor loadings above .40 appear in bold. The PCA was conducted on untrimmed log-transformed RT data.
The factors had fairly small relationships with each other, as shown in Table 4. This suggests that the factors represent constructs that are largely independent.
Note: Rotation method – oblimin with Kaiser normalisation.
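The extraction step of such an analysis can be sketched with NumPy: components are the eigenvectors of the correlation matrix, and those with eigenvalues above 1.00 are retained. The data are simulated rather than taken from the study, and the sketch stops at unrotated loadings. The direct oblimin rotation used here is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; data below are simulated

# Simulated scores: 87 participants x 6 measures, built from three
# independent latent factors so that the measures pair up two by two.
n = 87
latent = rng.normal(size=(n, 3))
noise = rng.normal(scale=0.5, size=(n, 6))
scores = np.repeat(latent, 2, axis=1) + noise

# PCA on the correlation matrix (i.e. on standardised variables).
corr = np.corrcoef(scores, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]  # sort components by descending eigenvalue
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

retained = eigenvalues > 1.0  # eigenvalue-above-one retention rule
explained = eigenvalues[retained].sum() / eigenvalues.sum()

# Unrotated loadings: eigenvectors scaled by the square root of the eigenvalue.
loadings = eigenvectors[:, retained] * np.sqrt(eigenvalues[retained])
```

Because the eigenvalues of a correlation matrix sum to the number of variables, `explained` is the proportion of total variance carried by the retained components, the quantity reported as a percentage in the text.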
8. Regression analyses
To assess the relationship between the production of syntax and different types of non-verbal interference resolution abilities, we entered the six inhibitory control measures as predictors into multiple regression analyses while controlling for vocabulary knowledge. We focused on the production of repairs and sentence onset latencies as outcome variables, as these differed consistently across the active and passive voice conditions of the production task and were therefore considered reliable markers of syntactic difficulties, in contrast to silent pauses and other disfluencies. We ran regression analyses for passive voice trials only, as none of the inhibitory control measures correlated with repairs or sentence onset latencies in active voice trials (see Supplementary Tables S2 and S3). In Table 5, we report the results of the four regression analyses.
Note: Regressions were conducted on untrimmed log-transformed RT data. Significant predictors appear in bold.
The WAIS vocabulary score and the inhibitory control measures together accounted for 28% (inanimate-first) and 21% (animate-first) of the variation in the production of repairs in passive voice sentences. The anti-saccade effect (RT), t(79) = 2.9, p = 0.005, and the flanker effect (ER), t(79) = 3.1, p = 0.003, were significant predictors of repairs in inanimate-first trials above and beyond vocabulary knowledge. This was also the case in animate-first trials, in which the anti-saccade effect (RT), t(79) = 3.5, p < .001, and the flanker effect (ER), t(79) = 2.05, p = 0.044, made unique contributions to the production of repairs. We observed that speakers who were slower to suppress their eye saccades towards irrelevant cues in the anti-saccade task made more overt grammatical role assignment errors that were spontaneously corrected. Also, those who were less efficient in dealing with conflicting representations of the arrows’ direction in the flanker task tended to repair their passive voice utterances more often. The Simon effect made no significant unique contribution to the model.
In terms of timing, the WAIS vocabulary score and the inhibitory control measures together accounted for 17% (inanimate-first) and 21% (animate-first) of the variation in passive sentence onset latencies. Despite significant correlations between the anti-saccade effect (RT), r = 0.222, the Simon effect (RT), r = 0.219, and sentence onset latencies in passive inanimate-first trials, and between the flanker effect (ER and RT), r = 0.302, r = 0.228, the Simon effect (RT), r = 0.259, and sentence onset latencies in passive animate-first trials, none of the inhibitory control measures reliably predicted the delay to begin formulating passive voice utterances above and beyond vocabulary knowledge (all ps > 0.05). The unique contribution of the flanker effect (ER) to the speed with which sentences were uttered was only marginally significant, t(79) = 1.7, p = 0.09.
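The structure of these models can be sketched as an ordinary least squares regression with seven predictors (vocabulary plus six inhibitory control measures). With 87 participants this leaves 87 − 7 − 1 = 79 residual degrees of freedom, matching the t(79) statistics reported. The data and coefficients below are simulated, and simultaneous entry of all predictors is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed; all data are simulated

n_participants, n_predictors = 87, 7  # vocabulary + six inhibition measures
X = rng.normal(size=(n_participants, n_predictors))
true_beta = np.array([0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.4])  # hypothetical
y = X @ true_beta + rng.normal(size=n_participants)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n_participants), X])
beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)

# Residual variance and standard errors give one t value per coefficient.
residuals = y - design @ beta
df_resid = n_participants - design.shape[1]  # 87 - 8 = 79
sigma2 = residuals @ residuals / df_resid
cov_beta = sigma2 * np.linalg.inv(design.T @ design)
t_values = beta / np.sqrt(np.diag(cov_beta))

# Proportion of outcome variance accounted for by the full model.
ss_total = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - (residuals @ residuals) / ss_total
```

Each t value tests a predictor's unique contribution with the others held constant, which is the sense in which a measure predicts the outcome above and beyond vocabulary knowledge.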
9. Discussion
The question of interest in the present study was whether domain-general inhibition would contribute to syntactic selection during active-passive sentence production. Additionally, having argued against a unitary inhibitory control function, we assessed the relative contribution of three different forms of inhibition to the ease with which active-passive sentences were generated: the ability to resolve representational conflict (i.e. to choose the correct representation over an incorrect but dominant one), the ability to resolve interference at the response selection stage (i.e. to select the correct response code over an incorrect but prepotent one), and the ability to inhibit response execution (i.e. to stop an initiated but irrelevant response).
In line with the proposal that interference can arise at different stages of information processing, from stimulus perception, through response preparation to response execution, we found that the assessed inhibitory control functions were largely independent. Critically, non-verbal measures of cognitive control predicted the speed and fluency of syntactic production, with two inhibitory functions uniquely contributing to grammatical voice selection under prepotent competition. Those participants who handled non-verbal interference more effectively, whether by inhibiting prepotent eye saccades towards a distracter in the anti-saccade task or suppressing competing representations of an arrow’s direction in the flanker task, made fewer utterance repairs. Participants who dealt more efficiently with conflicting information in the flanker task also tended to be quicker to begin their passive voice utterances, although this contribution was only marginally significant.
The fact that non-verbal measures of inhibition predicted the ease with which passives were produced provides support for the claim that under conditions of prepotent competition, whether syntactic or non-syntactic, a general interference resolution mechanism may be at work. This mechanism may allow the speaker to override a prepotent tendency to place animate entities in higher syntactic positions, just as it may allow a respondent looking at a row of arrows to overrule a prepotent but incorrect direction representation triggered by the flanking stimuli. Direct comparisons between the findings of the current study and previous studies investigating the link between general inhibition and expressive syntax are difficult to make, since syntax production research has to date relied either on verbal assessment of inhibition or on measures conflating linguistic and non-linguistic factors. Nevertheless, parallels can be drawn with studies investigating syntactic comprehension abilities. A number of such studies have linked non-verbal inhibition with syntactic processing of complex (e.g. Kaushanskaya et al., Reference Kaushanskaya, Park, Gangopadhyay, Davidson and Weismer2017) and ambiguous garden-path sentences (Woodard et al., Reference Woodard, Pozzan and Trueswell2016). In a neuroimaging study reported by Ye and Zhou (Reference Ye and Zhou2009), the left IFG appeared to respond to non-verbal conflict (flanker task) in a similar manner as it did to syntactic conflict while participants had to choose the correct meanings of ambiguous sentences. Interpretations of ambiguous sentences were also reliably more accurate when preceded by flanker-conflict trials compared with no-conflict trials (Hsu et al., Reference Hsu, Kuchinsky and Novick2021).
It is also possible to draw parallels between the present finding of the relationship between domain-general inhibition and expressive syntax and those of neuroimaging studies that showed the same brain areas (the IFG and MTG) to be involved in syntactic production and non-linguistic behaviours, particularly in conflicting situations. By inference, and in view of the current data, the same cognitive operations may be shared across syntactic and non-syntactic domains when computational demands are high. To draw on neural reuse theories (Anderson, Reference Anderson2010; Asano et al., Reference Asano, Boeckx and Fujita2022), it may be that in routine situations, speakers rely on habitual linguistic behaviours, using default structural options, such as the active voice, which are guided by language-specific mechanisms, but in situations of increased cognitive demands, for example, under prepotent competition, they deploy non-language-specific mechanisms shared across domains. As explicated by Asano et al. (Reference Asano, Boeckx and Fujita2022), the brain may become more specialised in certain skills in the course of its development, relying less and less on shared cognitive resources, but drawing on these resources under exceptional circumstances when things are unexpected, ambiguous, or the demands become too great.
Moreover, the present study revealed a unique contribution of inhibition of response execution to the production of repairs in passive sentences. In particular, individuals who were better at identifying targets in the anti-saccade task, and, hence, controlled their eye saccades more effectively, tended to produce fewer repairs in high-interference passive voice trials. Repairs in the production task are likely to reflect difficulties with grammatical function assignment (Engelhardt et al., Reference Engelhardt, Nigg and Ferreira2013). Self-corrected errors such as The burglar… the bike was stolen by the burglar or Clea… eeh… key was hidden by the cleaner suggest that speakers have failed to suppress the prepotent tendency to insert animate referents in sentence-initial subject position and begin to produce a context-incompatible utterance only to suspend it in midstream, reassign the grammatical roles, and move the animate referent to the lower syntactic position, that is that of an object. The positive association between inhibition at the response execution level and the production of utterance repairs above and beyond language competence and other forms of inhibition permits the following interpretation. Motor inhibitory abilities may be most critical when a highly activated referent or lemma that has been wrongly assigned the nominative role (a tendency driven by probabilistic use of the active voice and the animacy bias) reaches the output buffer and is either suppressed in time or articulated, leading to an overt error and its subsequent repair. Those who are faster to suppress an incorrect motor response, such as a saccade towards an irrelevant cue, may well halt their speech sufficiently early to avoid uttering and then repairing an incorrect phrase. As an aside, some speakers produced an active sentence with the past perfect had (e.g. The postman had forgotten the letter) when a passive was intended. Avoiding this disallowed form could also involve some inhibitory processes because the speaker would have to battle against the use of the more probable past perfect had, which conflicts with the situation-specific demands. This type of error was extremely rare, however, so using it as an outcome variable in a regression analysis would be unwarranted.
The unique contribution of the flanker effect to passive sentence repairs and onset latencies (see Footnote 2) when language competence and other types of inhibition have been controlled for, as shown in the present study, suggests that interference can also arise and be resolved at an internal representational level. Higher error rates and slower responses in the arrow flanker task are taken to reflect the poorer resolution of the stimulus-level conflict that arises due to a mismatch between representations of the target (the middle arrow) and the distractors (the flanking arrows). It could be that the co-activation of visual or conceptual representations of the flanking arrows dominates the activation of the middle arrow representation, delaying its selection as a target. By analogy, assignment of the nominative role to the most active but context-inappropriate lexical representation, that is automatically placing a dominant lemma in the subject position when the given verb form dictates otherwise, may slow down correct function assignment. From the current data, it is not possible to determine how the selection is accomplished, whether through lateral inhibition or Luce’s ratio principle, but it is reasonable to assume that some kind of interference resolution mechanism is in operation that facilitates the selection of the intended grammatical structure at a representational (intermediate) level of processing.
Contrary to predictions, inhibition of response selection, as indexed with the Simon arrow effect, did not reliably predict passive sentence production under prepotent competition. By analogy to interfering representations, a context-incompatible speech motor programme could, in fact, be automatically triggered by the display of animate referents and either suppressed in time or executed, leading to an overt error. A possible explanation for why no relationship was found between this inhibitory factor and the production of passives is that the task was not susceptible to the same level of interference as the other tasks. Indeed, the size of the Simon arrow effect was rather small. A larger sample would therefore be required to detect a relationship if it were to exist at the population level.
The conclusions drawn from the findings in this study are based on the premise that the inhibitory control tasks measure what they purport to measure, that is, resolution of conflict that arises at different stages of information processing. However, the fact that they have traditionally been used as measures of inhibition does not preclude the possibility that they tap other salient processes that may ultimately obscure inhibitory effects (see the ‘task impurity’ problem; Burgess, Reference Burgess and Rabbitt1997). Some control tasks, such as Stroop, flanker, and Simon, may engage processes that are primary to inhibition, for example, conflict monitoring. As a result, performance on these tasks may be affected by variations in conflict detection rather than inhibitory control per se. Indeed, it is possible that, despite having selected an incorrect response code, those who scrutinise their covert behaviour more closely may well detect and correct the erroneous response before it is executed. Efficient monitoring abilities would thus be associated with fewer overt corrections. Chevalier et al. (Reference Chevalier, Chatham and Munakata2014) highlighted the importance of monitoring for contextual cues in inhibiting an ongoing action. Their data revealed that training participants in context monitoring, so that they could more efficiently detect cues indicating the need for stopping, resulted in better performance on the stop-signal task than training participants in response stopping itself. It is important that future research teases apart these processes and assesses their relative contributions to the speed and ease with which grammatical utterances are produced.
To conclude, uttering a sentence in an active-passive voice production task is ostensibly very different from identifying a target by inhibiting an oculomotor response in an anti-saccade task or by controlling conflicting representations of arrows’ direction in a flanker task. Nevertheless, the ease with which syntax is generated, at least under increased competition, appears to rely on broad interference resolution mechanisms that are shared across syntactic and non-syntactic (non-verbal) domains. These mechanisms, as our study attests, should not be treated uniformly, but rather as separate component processes, each making its own unique contribution to the speed and fluency of speaking. One action may involve ‘slamming on the brakes’ when an erroneous response has been selected and is already on its way, and another may involve ‘blocking’ a competing representation before it is ready for release.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/langcog.2023.44.
Acknowledgments
The first author was supported by a Middlesex University PhD scholarship.
Data availability statement
The data sets generated and analysed during this study are publicly available in the Open Science Framework repository, under the link: https://osf.io/sjz6m/?view_only=c7637b99026744bc8bdb759557da6001.
Competing interest
The authors declare none.
Financial disclosure
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.