1. Introduction
It is well established that crosslinguistic influence (or transfer) from one’s first language (L1) can affect the acquisition of subsequently learned languages (L2s) at all linguistic levels, from phonology to discourse (Mitchell et al., 2013, p. 16). However, for specific language phenomena, it remains unclear under which circumstances transfer occurs. This is the case for so-called verb-second (V2) word order in the Germanic languages, as previous studies have found contradictory results (e.g. Håkansson et al., 2002; Sayehli, 2013; Bohnacker, 2006). It also remains unclear whether learners with L2 English (a non-V2 language) show less V2 transfer (Dahl et al., 2022). V2 word order is traditionally described as notoriously difficult for L2 learners to master (Bolander, 1990; Hagen, 1992), but it may not be equally challenging to all learners, as the learners’ language background and the sentential context are possible contributing factors (e.g. Johansen, 2008). To examine the complexity of V2 acquisition, we therefore focus on both factors in one study.
Here, we present the first large-scale study of V2 in written L2 Danish, focusing on the order of verb and subject in sentences where the subject is not in the first position. Our study is the first V2 corpus study that uses inferential statistics to disentangle the separate contributions of the learners’ language background (primarily their L1), their proficiency level and the length of the sentence constituents. The latter is considered a rough measure of complexity, as longer constituents, or constituents consisting of multiple words, are generally assumed to be associated with higher processing costs than short ones or single words (Rayner, 1998). The study uses a mixed-methods approach, as the quantitative analyses are supplemented by a qualitative analysis of the sentence constituents involved.
2. Background
2.1. V2 word order in Danish
Across the world’s languages, V2 is rare, but all Germanic languages (apart from modern English) have V2 word order. In V2 languages, ‘the finite verb is obligatorily the second constituent, either specifically in main clauses or in all finite clauses’ (Holmberg, 2015, p. 342). The Danish sentences below have different constituents in the first position. The subject of the main clause is in bold, and the finite verb in bold and underlined. The function of the constituent in the first position is given in parentheses. All except (3) are grammatical and have the verb in the second position. In (1), the order is subject-before-verb as in English, but in (2) and (4)–(6), the first position is occupied by a non-subject constituent. Since the verb must appear in the second position, the subject follows the verb.
Sentence (3) has ungrammatical V3 word order, because two constituents (both the adverbial and the subject) precede the verb. V3 word order is common in L2 production (L2 Norwegian: Brautaset, 1996; Hagen, 1992; Johansen, 2008; L2 Swedish: Bolander, 1989; Hammarberg & Viberg, 1977; Håkansson, 2004; L2 Danish: Holmen, 1994; Lund, 1997; Søby & Kristensen, 2019; L2 German: Bohnacker, 2006; Håkansson et al., 2002; Sayehli, 2013). Second language researchers have offered different explanations of why V2 should be hard to acquire. The influential Processability Theory (Pienemann, 1998) is based on generative grammar and argues that XVS word order involves a complicated movement that demands high cognitive capacity: SVX and ungrammatical *Adv-SVX (that is, V3) have a more basic word order, which is easier to process and thus easier to produce than XVS (V2). In Pienemann’s hierarchy of acquisition, V2 is acquired late, if ever. This theory is not compatible with a functional-cognitive theoretical basis such as ours (Engberg-Pedersen et al., 1996; Harder, 2006). From a functional-cognitive perspective, constituents are not moved from an original position in deep structure. Instead, the nonsubject is seen as occurring in the initial position for functional reasons, not as having been moved there from an underlying basic position. For instance, nonsubjects may occur in the first position to express linkage to previously mentioned referents (Kristensen, 2013), as in (5) and (6).
A functionalist alternative to Pienemann (1998) is the explanation by Lund (1997, p. 162), who argues that there is little communicative pressure to acquire V2 word order in declarative sentences compared to interrogative sentences, as V2 word order in declarative clauses does not have ‘any semantic or pragmatic function’. In Danish interrogative sentences, such as (7) and (8), there is more communicative pressure, as verb-before-subject order signals that the sentence is interrogative.
Some previous studies of crosslinguistic influence (CLI) from one V2 language to another suggest that CLI from the L1 may reduce the difficulties (Lund, 1997, on L2 Danish; Johansen, 2008, on L2 Norwegian), but these studies do not use inferential statistics, and Johansen (2008) only includes learners at an intermediate proficiency level. Other V2-to-V2 studies do not find clear evidence of initial L1 transfer (e.g. Dahl et al., 2022; Håkansson et al., 2002; Sayehli, 2013, on L2 German), possibly because transfer also occurred from other L2s. Indeed, influence from non-V2 L2s may increase the difficulties (e.g. Bohnacker, 2006, on L2 German). Studies of L2 Swedish and Norwegian indicate that producing correct V2 word order is not equally challenging in all sentential contexts (Swedish: Bolander, 1989; Norwegian: Brautaset, 1996; Hagen, 1992; Johansen, 2008). These studies call for further investigation of how clause complexity may affect V2 production. A better understanding of which sentential contexts are challenging to learners can help L2 teachers of V2 languages focus their teaching.
Teachers of L2 Danish, L2 Swedish and L2 Norwegian face similar challenges, as the three languages are mutually intelligible (perhaps with ‘some initial difficulty’; Vikør, 2015) and all make similar use of V2 word order, though with some variation in the grammaticality of V3. In Swedish, for instance, a few focalizing adverbials can be placed between the subject and the verb: Hun [S] bare [A] ville [V] låna min cykel ‘She just wanted to borrow my bike’ (Bohnacker, 2006, p. 455), and some Norwegian dialects, most notably in northern Norway, allow V3 word order in wh-questions (Westergaard et al., 2017). Such uses of V3 are ungrammatical in Danish, and the acquisition of V2 word order in L2 Danish may therefore differ slightly.
Although sentences with V3 are comprehensible despite the ungrammatical word order, correct V2 word order does seem important for smooth communication with L1 users. An error detection study by Søby et al. (2023a) found that L1 users notice incorrect V3 word order more frequently than other types of grammatical anomalies, such as confusion of verb inflections and missing gender agreement in NPs. V3 word order also disrupts L1 users’ online processing, as found for Swedish by means of EEG (Andersson et al., 2019; Sayehli et al., 2022; Yeaton, 2019) and for Norwegian by means of eye-tracking (Søby et al., 2023b).
2.2. Crosslinguistic influence and proficiency level
Crosslinguistic influence (CLI) is here used interchangeably with transfer, referring to ‘the ways in which a person’s knowledge of one language can affect his or her learning, knowledge and use of another language’ (Jarvis, 2017, p. 2). As mentioned in the introduction, it is well established that grammatical CLI occurs in L2 acquisition, and research on CLI has generally shifted towards the role of previously learned languages beyond the L1 (that is, L3 acquisition) (e.g. Bardel & Falk, 2007).
Since previous studies on V2-to-V2 transfer have found mixed results concerning L1 transfer, we find it relevant to examine the transfer of word order in this specific context. Furthermore, we wish to examine how large the effects of CLI on V2 production are compared to other contributing factors such as proficiency level and constituent length. Johansen (2008) examined sentences with non-initial subjects in texts written by 100 learners of Norwegian for a standardized test (The Language Test for Adult Immigrants) from the ASK test corpus (ASK, 2015), roughly corresponding to CEFR level B1 (Tenfjord et al., 2017, p. 3; Council of Europe, 2001). The learners generally had a high rate of V2 in these sentences (86.5%), but the 20 L1 users of Dutch and German had even higher success rates (98.4% and 100%, respectively). The other eight language groups ranged from 69.4% to 93.6%. In the only previous study on L2 Danish, Lund’s (1997) longitudinal study of six learners (two with L1 Dutch, two with English, one with Spanish and one with Portuguese), only the two Dutch speakers, whose L1 is a V2 language, achieved ‘some stability’ in producing V2 in declarative clauses during their first 5.5 months of Danish classes (Lund, 1997, p. 158). These findings support the idea that having V2 in one’s L1 makes it easier to produce V2 in an L2, potentially already in the early stages of learning. By contrast, Håkansson et al. (2002) found that Swedish pupils produced V3 in L2 German, although both are V2 languages. Bohnacker (2006) argued that the use of V3 by the pupils in Håkansson et al. (2002) could be due to syntactic transfer from English (which all the pupils had learned as an L2 before learning German).
To control for influence from English, Bohnacker (2006) compared oral production data from six adult Swedish learners of German: three monolingual learners and three with prior knowledge of English. The learners with no knowledge of English did not produce V3 in their German, but those with L2 English did. This suggests that, contrary to Håkansson et al. (2002), learners can transfer the property of V2 from their L1, but that ‘L2 knowledge of a non-V2 language (English) may obscure this V2 transfer’ (Bohnacker, 2006, p. 444). Using a different method, acceptability judgements, Dahl et al. (2022) examined transfer from L1 Norwegian and L2 English in the acquisition of verb placement, including XVS word order, in L3 German. Their results, however, ‘did not show evidence which would indicate that wholesale transfer had taken place from either L1 or L2 at the earliest stages of L3 acquisition’ (Dahl et al., 2022, p. 211). Sayehli (2013) examined L1 transfer of verb placement from Swedish to L3 German (L2 English) among 61 pupils (aged 12–16, across four school years), using a repetition task and an oral elicitation task. She found no evidence of L1 transfer of XVS in either task (Sayehli, 2013, p. 86).
Studies of transfer from V2 to non-V2 languages have found indications of transfer of verb-second syntax (XVS) in the opposite direction, that is, in L2 English from L1 German and Dutch (Rankin, 2012) and from L1 Norwegian (Westergaard, 2003). Finally, Stadt et al. (2020) found transfer of V2 from L1 Dutch to L2 French.
Since most learners in Scandinavia learn a V2 language after having acquired a non-V2 language (English), it is relevant to examine the role of L1 transfer in this specific language-learning context. Learners whose L1 has V2 order (e.g. Dutch or German) will almost certainly have some proficiency in English. How, then, does the interplay between CLI from the L1 and from the L2s affect the acquisition of XVS word order in a new V2 language? This is not clear from previous studies, especially for the early stages of V2-to-V2 acquisition.
Even though CLI can occur at all language levels from phonology to discourse, the odds of encountering CLI in learner data are greatest when ‘the target language is related to a language the learners have already mastered’ (including the L1) and when ‘the feature is frequent in the learners’ L1’ (Jarvis, 2017, p. 14). Both conditions apply to learners of L2 Danish with a Germanic V2 language as their L1, as Danish is typologically close to the other Germanic V2 languages, and clauses with non-initial subjects are common in, e.g., Danish, Swedish and German (Bohnacker & Rosén, 2008; Fabricius-Hansen & Solfjeld, 1994; Kristensen, 2013; Westman, 1974).
To our knowledge, there are no quantitative V2-to-V2 studies comparing the role of CLI from different L1s in V2 production at different proficiency levels. A few studies (Bolander, 1990; Brautaset, 1996) report a progression in V2 production with increasing proficiency, but they do not differentiate between learners with a V2 and a non-V2 background. In Johansen (2008), all 100 learner texts were at B1 level. The interplay between the learner’s proficiency level and language background is therefore unaccounted for.
2.3. The role of sentence complexity
Sentence-internal factors may also influence the production of V2 in learner language. Previous studies have investigated the role of the sentential context for L2 Swedish (Bolander, 1989, 1990; Hyltenstam, 1978) and L2 Norwegian (Brautaset, 1996; Hagen, 1992; Johansen, 2008), but not for L2 Danish. Comparing the results of these studies is difficult. Some studies use categories in which material (e.g. subordinate clause) and syntactic function (e.g. object) are intertwined (Bolander, 1989, 1990; Brautaset, 1996). Other studies are mainly qualitative (Hagen, 1992; Johansen, 2008), and it seems that only Hyltenstam (1978) has used inferential statistics to test the effects of a few complexity factors.
Except for Bolander’s studies, the studies are based on written production, both essays and elicited material (see Table 1). The largest studies are those of Hyltenstam (160 learners), Bolander (60 learners) and Hagen (38 learners). Johansen’s (2008) qualitative analysis only includes 19 learners. The learners in the studies are at different proficiency levels, and some studies compare the same learners at different stages (Brautaset, 1996; Hagen, 1992; Hyltenstam, 1978). As shown in Table 1, most studies either do not include learners with a V2 background (Bolander, 1990; Brautaset, 1996; Johansen, 2008) or do not report the L1s (Hagen, 1992). As these previous studies have not attempted to separate the contributions of sentence-external factors, such as the learner’s language background, from those of sentence-internal factors, such as sentence complexity, it is still an open question whether sentence complexity plays the same role for all learners.
Most studies have focused on the role of the constituents in the first three sentence slots when examining whether there are favorable and unfavorable contexts for producing XVS word order. They include a mixture of structural (syntactic functions, material, complexity of the material, frequency) and semantic descriptions.
As shown in Table 1, the studies differ in their principles for categorizing sentential constituents, in modality (written versus oral language) and in task type. Their conclusions also differ, probably due to methodological differences and the different approaches to categorization. It is, for instance, not clear whether nominal versus pronominal subjects affect the share of V3. Hagen (1992) proposes a general hypothesis: sentences with V3 generally include features that burden the language user’s information-processing capacity to a higher degree than sentences with V2 (Hagen, 1992, p. 34). In the following, we summarize some key findings regarding the first constituent, the subject and the verb.
2.3.1. The role of constituents in first position
Bolander (1989, p. 76) found a high share of correct V2 after objects in first position (82%). However, her data are from oral L2 Swedish, and many of the object-initial sentences are of the type det tror jag or det tyckar jag ‘That I think’, which could be learned as chunks. In her study, the most common first constituents (in sentences with non-initial subjects) are adverbs, which only have an accuracy rate of 36%. For subordinate clauses in first position, the share of V2 is much lower (19%). In the correct sentences, Brautaset (1996) found similar shares of adverbials consisting of subordinate clauses and of adverbial phrases (although with different tendencies at different proficiency levels). It should be noted that Brautaset only reports the distribution in correct sentences, leaving out important information about the sentences in which learners produced V3 word order.
Hagen (1992) almost exclusively found adverbials in first position in sentences with non-initial subjects. Based on his data, he hypothesized that learners produce more V3 when the constituent in first position is heavy (measured in number of words). Brautaset (1996) could not confirm this hypothesis, as she found a higher share of long adverbials (more than one word), 88.7%, than of short (one-word) adverbials, 86.3%, but again only in the correct V2 sentences.
2.3.2. The role of the subject
Previous studies have mainly compared pronominal and nominal subjects, and the results are inconclusive. Bolander (1990) reports the highest shares of V2 in speech when the subjects were NPs (56% V2) or first-person pronouns (43%), compared to, e.g., second- or third-person pronouns (22%). However, the shares may differ in written production. Brautaset (1996) found higher shares of pronominal subjects (89.8%) than nominal subjects (79.6%) in the V2 sentences across levels. Hyltenstam (1978) did not find differences between pronominal and nominal subjects. Finally, Hagen (1992) hypothesized that a subject which is either a pronoun or long and heavy favors V2. The last part seems to contradict his general hypothesis on processing load, but he does not explain it further.
2.3.3. The role of the verb
Previous studies have mainly compared single verbs to complex verbs (such as a modal followed by an infinitive). Hyltenstam (1978) did not find differences, but Brautaset (1996) generally found higher shares of single verbs than of complex verbs in the V2 sentences, except at the highest proficiency level.
Bolander (1990) reports that V2 is often found with verbs expressing opinion or belief, as in the examples with det tyckar jag ‘That I think’. These results may, however, be specific to oral V2 production, where frequent and fixed (chunk-like) OVS sequences are more dominant than in written language (Kristensen, 2013). Based on these findings, Hagen (1992) hypothesized that frequent or short verbs favor V2.
2.3.4. Sentence complexity across constituents
To conclude, methodological differences between the previous qualitative and quantitative V2 corpus studies make it difficult to characterize the role of sentence complexity across the board. Still, many observations concerning the first three sentence constituents in relation to the production of V2 versus V3 revolve around complexity (sometimes intertwined with frequency, cf. Johansen, 2008) and indicate that V3 is more frequent in complex XVS sequences. This idea resonates with Skehan’s (1998) Limited Capacity Hypothesis, which argues that learners, because of limited attentional resources, constantly have to balance their focus between the accuracy and the complexity of their output.
It is also not clear whether the tendencies are the same for non-V2 learners and for learners whose L1 is also a V2 language. It is therefore relevant to study the role of sentence complexity systematically, with unequivocal principles for categorization, and to compare its role for learners with a V2 versus a non-V2 background.
2.4. The current study
In the current cross-sectional study, we examine the production of V2 versus V3 in texts written by students at three different test levels enrolled in an official Danish language program. Sentences with initial subjects are not a specific challenge to learners of V2 languages, so the study only covers sentences with non-initial subjects. Using a statistical model, we test whether V2 production is affected by (1) the learners’ test level, (2) the learners’ L1 and (3) the length of the first constituent, the verb and the subject.
– Hypothesis 1. We expect to find a progression with increasing test level, reflected in a higher share of V2. This hypothesis is based on findings from Brautaset (1996) and Bolander (1990), who report a progression in V2 production with increasing proficiency.
– Hypothesis 2. We expect learners with another V2 language as their L1 (V2 learners) to have a higher share of V2 than learners whose L1 does not feature V2 (non-V2 learners). This hypothesis is based on the study by Johansen (2008), who found higher shares of V2 for V2 learners than for non-V2 learners.
– Hypothesis 3. We expect increasing complexity (operationalized as length) of the first three constituents to affect the share of V2 production negatively.
– 3.1. We expect the length (number of words) of the first constituent to affect V2 production negatively. From previous studies, it is unclear whether heavy constituents in general affect V2 production, but Bolander (1990) found lower shares of V2 after subordinate clauses.
– 3.2. We expect the length of the subject, operationalized as one word versus multiple words, to affect V2 production negatively. Previous studies have compared pronominal versus nominal subjects, with contradictory results. That comparison is to some extent intertwined with our comparison of one versus multiple words.
– 3.3. We expect the length of the verb, operationalized as single versus complex (multiple words), to affect V2 production negatively (based on findings from Brautaset, 1996).
3. Method
3.1. Data collection
All official Danish language programs consist of a series of modules. After each module, students need to pass a module test to continue in the program. We collected the written module tests at Copenhagen Language Center in 2017–2018 as part of a larger research project, which means that the data were not collected specifically for this V2 study. The written tests consisted of one or two writing assignments, which varied according to the test level (cf. Supplementary Table S6). The students’ handwritten texts were digitized and anonymized.
In total, texts from 217 students were collected (138 women; mean age 30.9 years, SD 7.2 years) (Søby, 2023; Søby & Kristensen, 2019). The participants had around 52 different L1s (cf. Supplementary Table S5). The five most common L1s were English (N = 38), Spanish (N = 16), German (N = 13), Portuguese (N = 12) and Russian (N = 11). All participants had knowledge of English, either as their L1 or as an additional language learned after the L1, so Danish was not always the second language they had acquired chronologically. Twenty-four students had another V2 language as their L1 (German (N = 13), Dutch (N = 7), Icelandic (N = 2), Afrikaans (N = 1) and German/Russian (N = 1)). Supplementary Table S5 also provides an overview of the learners’ Danish Program, module and test level. For simplicity, the learners’ texts are divided into three test levels. Level 1 was taken after approximately 5 months of teaching at the language school and should correspond to CEFR level A2 (N = 137) (MII, 2019). Level 2 was taken after around 8 months (N = 51). Level 3 was taken after around 12 months and should correspond to CEFR level B1 (N = 29). The distribution across test levels is skewed, as most tests were level 1. The module tests only give rough estimates of the students’ proficiency and do not necessarily reflect the learners’ actual levels. We do not know whether students passed their test, and students may also be more proficient than their current test level indicates.
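As a quick consistency check, the group sizes reported above can be tallied in a few lines of Python. The counts are copied from this section; the variable names are only illustrative.

```python
# Participants per test level, as reported in Section 3.1.
level_counts = {1: 137, 2: 51, 3: 29}

# L1s of the learners whose L1 is a V2 language, as reported above.
v2_l1_counts = {"German": 13, "Dutch": 7, "Icelandic": 2,
                "Afrikaans": 1, "German/Russian": 1}

total = sum(level_counts.values())
for level, n in level_counts.items():
    # Share of all participants at each test level.
    print(f"Level {level}: {n} participants ({n / total:.1%})")

print("Total participants:", total)                    # 217
print("V2 learners:", sum(v2_l1_counts.values()))      # 24
print("Largest group: level", max(level_counts, key=level_counts.get))  # 1
```

The level shares come out at 63.1%, 23.5% and 13.4%, which makes the skew towards level 1 explicit.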
4. Analysis
4.1. Markup principles and exclusion criteria
All declarative clauses with non-initial subjects (XVS/*XSV) and all interrogative sentences ((X)VS) were marked. Due to the creative and sometimes surprising nature of learner language, we based the markup on two principles. Firstly, we ignored morphological and orthographical anomalies, as well as anomalous word choice, and focused on the order of what we interpreted as a plausible verb and a plausible subject. For instance, examples (9) and (10) are both classified as correct V2 word order, in spite of the anomalous morphological form of the constituents. In (9), two non-finite verbs, the participles tabt ‘lost’ and haft ‘had’, are used instead of finite verbs (e.g. past tense tabte ‘lost’ and havde ‘had’), but they are correctly placed in second position. In (10), the subject os ‘us’ is in the oblique form instead of nominative vi ‘we’, but it is correctly placed after the verb.
Secondly, since the punctuation is not always consistent with the content of the sentence, we based the classification on content rather than punctuation. For example, (11) was tagged as an interrogative sentence despite the lack of a question mark, because the interrogative pronoun hvad ‘what’ is used.
Finally, we excluded 35 sentences from the analysis for two reasons. Sentences with the adverb måske ‘maybe’ in first position were excluded (N = 11), because this adverb can be followed by both verb–subject (12) and subject–verb word order (13) (Beijering, 2010; Boye, 2005). Thus, we cannot determine the success rate in this context.
Furthermore, 24 sentences were excluded because the prescribed word order could not be determined. The majority of these cases (N = 18) included the word så ‘so/then’, which can be used either as a conjunction (introducing main or subordinate clauses) or as an adverbial. When så is used as a coordinating conjunction to convey a result or a consequence, it is followed by subject–verb word order, as in (14). In (15), så is used as an adverb (‘then’) and is followed by verb–subject word order (both are non-authentic examples).
In some cases, we could not determine which of the two meanings the learner intended, and we were therefore unable to determine whether the word order was correct. Example (16) was written by an Icelandic learner, and here så is followed by verb–subject order. The intended meaning of så may either be to convey a consequence (in which case så should have been used as a coordinating conjunction with subject–verb order) or to convey the meaning ‘therefore’ or ‘then’ (in which case så is an adverb in first position requiring verb–subject word order). Due to this ambiguity, example (16) was excluded from the analysis.
In total, 491 declaratives and 158 interrogatives were included in the analysis. We included interrogatives when tagging word order because previous studies have compared learners’ use of VS in declaratives and interrogatives (e.g. Lund, 1997).
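The decision rules in this section can be summarized as a small tagging function. This is a minimal sketch under stated assumptions: it presumes that each clause has already been annotated by hand with the word positions of the plausible finite verb and the plausible subject (the actual markup was done manually), and `Clause`, its fields and `tag_word_order` are hypothetical names introduced only for illustration. In line with the first markup principle, the function looks only at positions and ignores morphology.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clause:
    """Hypothetical hand-annotated clause (indices count words from 0)."""
    words: List[str]
    verb_index: int             # position of the plausible finite verb
    subject_index: int          # position of the first word of the plausible subject
    undecidable: bool = False   # manual flag, e.g. ambiguous conjunction/adverb så


def tag_word_order(clause: Clause) -> Optional[str]:
    """Return 'V2' (XVS), 'V3' (*XSV) or None if the clause is not analysed."""
    # Subject-initial clauses are outside the scope of the analysis.
    if clause.subject_index == 0:
        return None
    # Exclusion: måske in first position allows both VS and SV order.
    if clause.words[0].lower() == "måske":
        return None
    # Exclusion: the prescribed word order could not be decided.
    if clause.undecidable:
        return None
    # Non-initial subject: the finite verb must precede the subject.
    return "V2" if clause.verb_index < clause.subject_index else "V3"


print(tag_word_order(Clause(["I", "går", "gik", "jeg", "hjem"], 2, 3)))  # V2
print(tag_word_order(Clause(["I", "går", "jeg", "gik", "hjem"], 3, 2)))  # V3
```

Keeping the exclusions as explicit early returns mirrors the order in which the criteria are described and makes it easy to count how many clauses each rule removes.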
4.2. Statistical models
Data were analyzed using generalized linear mixed models for binomial data in RStudio (R Core Team, 2021, ver. 2022.02.1), including the lme4 package (Bates et al., 2015, ver. 1.1.27.1). P-values were obtained using the lmerTest package (Kuznetsova et al., 2017, ver. 3.1.3) and this formula:
With this model, we examine the proportion of V2 out of the total number of topicalized sentences and how it is affected by the three independent variables of the study. The fixed effects were the length of the first constituent (number of words in first position), subject length (one word versus multiple words), verb complexity (single versus complex), test level (1–3) and L1 (non-V2 versus V2). For subject and verb length, we expected the main challenge for learners to be processing more than one word, but for first constituents (which vary in length and usually exceed one word), we expected that the length itself (and not just the categorical difference between one and multiple words) could have an effect. The model also included random intercepts for participants. Comparisons were coded using sum contrasts (Schad et al., 2020), so that non-V2 learners were coded as -0.5 and V2 learners as 0.5. For subject length, one word was coded as -0.5 and multiple words as 0.5. For verbs, non-complex was coded as -0.5 and complex as 0.5.
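The coding scheme above can be illustrated with a short Python sketch of how one sentence maps onto the fixed-effect predictors and how a binomial (logit-link) mixed model turns them into a probability of V2. It is an illustration only: the function names, the treatment of test level as a numeric predictor and the coefficient values are assumptions, not the authors’ code or estimates. In lme4 notation, a model with the structure described would look roughly like `glmer(v2 ~ first_len + subj_len + verb_complex + level + l1 + (1 | participant), family = binomial)`, with hypothetical variable names; this is not the authors’ own formula.

```python
import math
from typing import List

# Sum-contrast codes as described in Section 4.2.
L1_CODE = {"non-V2": -0.5, "V2": 0.5}
SUBJECT_CODE = {"one word": -0.5, "multiple words": 0.5}
VERB_CODE = {"single": -0.5, "complex": 0.5}


def encode(first_len: int, subject: str, verb: str, level: int, l1: str) -> List[float]:
    """Fixed-effect predictors for one sentence (intercept excluded).

    Assumption: first-constituent length and test level enter as numbers.
    """
    return [float(first_len), SUBJECT_CODE[subject], VERB_CODE[verb],
            float(level), L1_CODE[l1]]


def p_v2(beta0: float, betas: List[float], row: List[float],
         participant_intercept: float = 0.0) -> float:
    """P(V2) = inverse logit of the linear predictor plus the random intercept."""
    eta = beta0 + participant_intercept + sum(b * x for b, x in zip(betas, row))
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


# Made-up coefficients for illustration only; NOT the study's estimates.
beta0, betas = 1.0, [-0.1, -0.3, -0.2, 0.4, 1.2]
non_v2 = encode(2, "one word", "single", 2, "non-V2")
v2 = encode(2, "one word", "single", 2, "V2")
# With -0.5/+0.5 coding, the L1 coefficient equals the log-odds difference
# between V2 and non-V2 learners when all other predictors are held constant.
print(round(logit(p_v2(beta0, betas, v2)) - logit(p_v2(beta0, betas, non_v2)), 6))  # 1.2
```

With the ±0.5 coding, the intercept lies halfway between the two groups’ log-odds (at the zero point of the numeric predictors) rather than at one reference group.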
Twenty-two non-V2 learners indicated that one of their L2s was a V2 language (German, Swedish, Norwegian or Dutch), but they did not indicate their proficiency in these additional L2s. To test whether exposure to a V2 language other than the L1 improved V2 production in Danish, we carried out a post-hoc test comparing non-V2 learners with and without previous exposure to V2. We ran the same model as above on a dataset from which the V2 learners were excluded, replacing L1 (V2 versus non-V2) with knowledge of V2 as a fixed effect. No knowledge was coded as -0.5 and V2 knowledge as 0.5.
Both models were fitted using the same procedure: we first included all the variables mentioned above and then removed them one at a time to check whether the fit improved. The fit was best when all variables were included. Because there were few data for the V2 learners, we did not include an interaction between L1 and test level.
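The text does not state which fit criterion was used when variables were removed. The following sketch shows the one-at-a-time procedure under the assumption that fit is compared by AIC (lower is better). `fit_aic` is a hypothetical stand-in for refitting the mixed model with a given set of fixed effects and returning its AIC; it is not part of any library, and the toy criterion in the usage example uses invented numbers.

```python
from typing import Callable, Sequence, Tuple


def backward_eliminate(
    predictors: Sequence[str],
    fit_aic: Callable[[Tuple[str, ...]], float],
) -> Tuple[str, ...]:
    """Drop predictors one at a time for as long as doing so lowers the AIC.

    `fit_aic` is a hypothetical callback that refits the model with the given
    fixed effects and returns its AIC. Using AIC is an assumption.
    """
    current = tuple(predictors)
    best = fit_aic(current)
    improved = True
    while improved and current:
        improved = False
        for p in current:
            reduced = tuple(q for q in current if q != p)
            score = fit_aic(reduced)
            if score < best:
                # Keep the smaller model and restart the scan from it.
                current, best, improved = reduced, score, True
                break
    return current


def toy_aic(preds: Tuple[str, ...]) -> float:
    """Invented criterion in which every predictor improves the fit."""
    return 500.0 - 10.0 * len(preds)


full = ("first_length", "subject_length", "verb", "test_level", "L1")
print(backward_eliminate(full, toy_aic))
```

With the toy criterion, removing any predictor worsens the fit, so the full set is returned, mirroring the outcome reported above.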
5. Results
The first three subsections present the descriptive statistics for test level (5.1), language background (5.2) and sentence types (5.3). The results of the statistical model are presented in Section 5.4. The section concludes with a further analysis of sentence constituents (5.5) and an analysis of the overuse of V2 (5.6). The learners produce almost no V3 anomalies in interrogative clauses (99% V2), so the results focus on the use of V2 and V3 in declarative sentences.
5.1. Production patterns on test levels 1–3
Figure 1A shows the production patterns for all participants. On test level 1, a relatively high share of participants (38%) do not produce any declarative sentences with non-initial subjects, but this share decreases on level 2 (24%). On level 3, all participants produce XSV/XVS. Of the 217 participants, the most common production pattern is to produce only V2 (87 learners, 40%). The share of learners producing only V2 increases from test level 1 to 2 but decreases again from level 2 to level 3.
The share of participants who use both V2 and V3 is small on levels 1 and 2 (8% on both), but increases drastically on level 3 (59%), where all participants produce sentences with non-initial subjects. The share of participants who only use V3 is 19% on level 1 and gradually decreases at higher levels. On level 3, just one Greek learner produces only V3.
5.2. Individual variation and L1 patterns
Figure 1B shows the number of V2 and V3 sentences for each of the 153 participants who produced declarative sentences with non-initial subjects. For the non-V2 learners, there is individual variation in the number of XVS sentences (and the success rate) per participant, but generally, the number of V2 sentences increases drastically on test level 3.
Seven V2 learners do not produce sentences with non-initial subjects and are not represented in the figure. Of the remaining 17 V2 learners, only two produce V3. As seen in Figure 1B, one of them, a learner on level 3, produces five of the six V3 sentences produced by the entire V2 learner group. The general pattern of a large increase in the number of V2 sentences on level 3 is also found for this group.
Supplementary Table S7 shows the distribution on test level and L1 for the 153 learners producing sentences with non-initial subjects. The V2 group has a lower share of level 1 learners (35% versus 58%) and a higher share of learners on level 2 (41% versus 24%) and 3 (24% versus 18%).
5.3. Distribution of V2 and V3 on sentence types
Table 2 shows the share of V2 in declarative and interrogative clauses. As mentioned, the interrogative clauses have consistently high accuracy rates. Of the 491 declarative clauses with non-initial subjects, one in four has ungrammatical V3 word order.
Table 2 also provides an overview of the syntactic functions of the constituents found in first position in the corpus. The first constituent is almost always an adverbial (similar to Hagen’s (1992) findings for Norwegian). Only six sentences have objects in first position, most of them with V2. Although the share of V2 seems higher after objects than after adverbials, in line with Bolander’s (1989) findings, the numbers are small and any conclusion is uncertain. Finally, the category ‘After other’ contains nine sentences, all with V2. In three of these sentences, the first constituent is the complement of a preposition, as shown in Table 2. This category also contains sentence intertwinings (Poulsen, 2008) (N = 2), as seen in (17), where det is a topicalized constituent from a subordinate clause.
The remaining sentences in this category (N = 4) have a first constituent whose syntactic role cannot be determined, e.g. (18), in which Når i skolen ‘when in the school’ is anomalous in Danish. It may be intended as the subordinate clause Når vi er i skolen ‘When we are in the school’.
5.4. Model results: the role of L1, test level and constituent length
Table 3A shows the results of the statistical model. As expected, we found an effect of L1 background (hypothesis 2) (p < 0.01): V2 learners had higher accuracy than non-V2 learners. Contrary to our expectations, the effect of test level (hypothesis 1) did not reach statistical significance, although there was a trend (p = 0.055). As the estimates show, this nonsignificant effect of test level is small compared to the effect of L1. Transformed to probabilities, the estimates give a probability of V2 of 83% when all fixed effects are set to their baseline. The probability increases to 89% when test level increases from 1 to 3, and to 98% when L1 changes from non-V2 to V2 (with all other fixed effects at their baseline). Supplementary Figure S2 provides an overview of the distribution of V2 versus V3 sentences across test levels and L1s. The effect of L1 is clearly visible, as most sentences produced by V2 learners have V2 word order.
Note: Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1. (A) shows model results (number of observations = 491, participants = 153). (B) shows the share of correct V2 in declarative sentences. (C) shows model results for the post-hoc test, which only included the non-V2 learners (number of observations = 428, participants = 136).
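Estimates from a binomial mixed model are on the log-odds scale, and the probabilities reported in Section 5.4 follow from the inverse logit, p = 1 / (1 + e^(-x)). As a rough check in Python (the analysis itself was run in R), the sketch below runs the transformation backwards on the reported, rounded probabilities to recover the implied differences in log-odds. These are approximations derived from the text, not the estimates in Table 3A.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-x))


# Probabilities of V2 reported in the text (rounded to whole percentages).
p_baseline = 0.83   # all fixed effects at baseline
p_level_3 = 0.89    # test level 3 instead of 1
p_v2_l1 = 0.98      # V2 instead of non-V2 L1

delta_level = logit(p_level_3) - logit(p_baseline)  # about 0.51
delta_l1 = logit(p_v2_l1) - logit(p_baseline)       # about 2.31

# Going back through the inverse logit recovers the reported probability.
assert abs(inv_logit(logit(p_baseline) + delta_l1) - p_v2_l1) < 1e-9
print(round(delta_level, 2), round(delta_l1, 2))
```

On the log-odds scale, the change in L1 thus corresponds to a shift roughly four and a half times as large as the change from test level 1 to 3, which matches the comparison of the estimates in Section 5.4.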
As shown in Table 3A, we also found effects of constituent length. There was a small effect of the number of words in the first constituent (hypothesis 3.1) (p < 0.05): the more words there were in first position, the more V3 we found. To examine whether this effect was carried by subordinate clauses in first position, we reran the model on a dataset excluding the 81 sentences whose first constituent contained a subordinate clause. The effect disappeared, suggesting that length may not matter for first constituents that do not contain a subordinate clause (cf. Section 6.2; see model results in Supplementary Table S8). Moreover, we found the expected effect of subject length (hypothesis 3.2) (p < 0.05): more V3 sentences were produced when the subject consisted of multiple words rather than one word. The effect of verb length (hypothesis 3.3) was small and only trended toward significance (p = 0.072).
5.4.1. Post hoc test for L1 and test level
Table 3B shows the share of V2 for V2 learners versus non-V2 learners on the different test levels. For non-V2 learners, the share of V2 gradually increases from test level 1 to 3. For V2 learners, the share of V2 is generally high across levels, but the number of learners per level is small. Because Table 3B shows different patterns for V2 and non-V2 learners, we carried out a post-hoc test on the dataset without the V2 learners. We had two reasons for this test. First, we suspected that test level and verb length affected V2 production for the non-V2 learners, but not for the V2 learners, who had high accuracy in general. Second, we wanted to test whether non-V2 learners performed better when they had previous knowledge of a V2 language (e.g. from learning Swedish or German before Danish). Participants had indicated their L2s, but not their proficiency in these L2s, so it is unclear whether all of them had reached a level at which they could produce XVS in these languages. Table 3C shows the model results for this test. The effect of test level reached statistical significance (p < 0.05): the higher the level, the more V2 was produced. The p-value for verb length decreased to 0.060, which is still not significant. There was no effect of previous knowledge of an L2 with V2 word order.
5.5. Further analysis of constituents
As seen in Section 5.3, the first constituent is almost always an adverbial. In this section, we analyze the material of the first three constituents in declarative clauses (the first constituent, the subject and the finite verb) and how it relates to the distribution of V2 versus V3. We also examine the semantic content of the adverbials in first position.
5.5.1. Length of constituents in first position
Table 3A suggests that it is a challenge to produce V2 after long constituents containing a subordinate clause (N = 81). In (19), the adverbial subordinate clause is followed by V3. Fifty-one of the sentences with a subordinate clause in first position have V2 word order, that is 63%. In comparison, the share of V2 after constituents not containing a subordinate clause is 78% (N = 404, main clauses only). The V2 learners seemingly do not have challenges here, as only one V2 learner (level 1) produces a sentence with V3 in this context, out of 13 examples distributed across levels. Test level also seems to play a part, as the share of V2 after subordinate clauses increases on level 3 (76% of 45 sentences), compared to level 1 (48% of 21 sentences) and 2 (47% of 15 sentences).
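The per-level shares above can be checked with simple arithmetic. The counts of V2 sentences per level are not stated in the text; the values below are reconstructed from the reported percentages and totals (they are the only whole numbers that round to 48%, 47% and 76% of 21, 15 and 45 sentences), and they add up to the 51 V2 sentences out of 81 reported above.

```python
# (V2 sentences, all sentences) with a subordinate clause in first position,
# by test level. V2 counts are reconstructed from the reported percentages.
counts = {1: (10, 21), 2: (7, 15), 3: (34, 45)}


def share(v2: int, total: int) -> int:
    """Share of V2 in percent, rounded to a whole number."""
    return round(100 * v2 / total)


by_level = {level: share(v2, total) for level, (v2, total) in counts.items()}
v2_total = sum(v2 for v2, _ in counts.values())
n_total = sum(total for _, total in counts.values())

print(by_level)                                     # {1: 48, 2: 47, 3: 76}
print(v2_total, n_total, share(v2_total, n_total))  # 51 81 63
```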
5.5.2. Subject length
Table 4A provides an overview of the shares of V2 in sentences with one-word versus multiword subjects. For the V2 learners, subject length does not seem to affect accuracy. Most of the one-word subjects are personal pronouns (410 of all 491 subjects), so the comparison between one word and multiple words largely overlaps with a comparison between pronominal and non-pronominal subjects, although there are also one-word subjects like Anna, danskere ‘Danes’ and kommunen ‘the municipality’. The non-V2 learners do not produce many multiword subjects on test levels 1 and 2, but especially on level 3, the share of V2 is lower when subjects are longer. The multiword subjects range from two-word constituents like min bror ‘my brother’, mange folkene ‘many people’ and Champions league to longer constituents like en god transport systemer ‘a good transportation system’ and min polsk-dansk ordbog, rød plastiken peberfrugter og to nøglen ‘my Polish-Danish dictionary, red plastic peppers and two keys’.
Note: The total shows the number of declarative sentences (that is both correct and incorrect). (A) shows the share of V2 with one-word versus multi-word subjects (in %). (B) shows the share of V2 with single versus complex verbs (in %). (C) shows the share of V2 for sentences with adverbials in the first position (excluding sentences where the first position contains a subordinate clause).
5.5.3. Verb length
Table 4B shows the distribution of V2 in sentences with single finite verbs versus complex verbs. Complex verbs consist of a finite and a nonfinite verb, e.g. the present perfect or a modal verb plus an infinitive. In (20), the verb is complex: the finite verb is in second position, and the nonfinite verb occurs after the subject. In (21), the subject precedes the finite verb, resulting in V3.
As seen in Table 4B, both the V2 learners and non-V2 learners tend to have higher shares of V2 in sentences with single verbs than with complex verbs, but the effect did not reach statistical significance.
5.5.4. Semantics of adverbials and verbs
Table 4C shows an explorative analysis of the semantic content of the sentence-initial adverbials (excluding those with subordinate clauses). Typical examples of time/frequency expressions are nu ‘now’, i dag ‘today’, i 2017 ‘in 2017’, nogle gange ‘sometimes’, and combinations of næste/sidste gang/uge/mandag ‘next/last time/week/Monday’. Place expressions are e.g. her ‘here’, i Norge ‘in Norway’ and i parken ‘in the park’. Argumentative adverbials are those used when arguing for or against something, e.g. derfor ‘thus’ and på den ene/anden side ‘on the one/other hand’. Attitudinal adverbials denote one’s attitude towards something, such as heldigvis ‘luckily’ and desværre ‘unfortunately’. For V2 learners, there are few data, and semantic content does not seem to affect V2 production. For non-V2 learners, the share of V2 seems higher when the adverbials denote time/frequency or place than when argumentative or attitudinal adverbials are used, across test levels. One possible reason that attitudinal and argumentative adverbials like derfor ‘thus’ have a higher share of V3 for the non-V2 learners is that learners interpret them as conjunctions.
5.6. Overuse of V2: V1 word order
Another type of word order anomaly found in the corpus is overuse of V2, which in some cases may rather be V1 word order. There are 61 cases in total, all produced by non-V2 learners. Overuse of V2, that is, verb-before-subject order where subject-before-verb order is required, typically occurs in subordinate clauses (N = 42). In 40 of these 42 cases, the overuse occurs after words functioning as conjunctions. In (22), the learner produces verb-before-subject word order in the subordinate clause after the conjunction når ‘when’, but correct subject-before-verb word order after the main clause conjunction og ‘and’.
The overuse of V2 could be related to difficulties with distinguishing between adverbials in first position (which must be followed by the finite verb) and conjunctions (which do not occupy the first position). There are 19 cases of V1 word order in main clauses, primarily after conjunctions, but in 6 cases sentence-initially, as in (23).
6. Discussion
In our study of written L2 Danish, interrogative sentences had a consistently high share of V2, whereas one in four declarative sentences with non-initial subjects had incorrect V3 word order. Section 6.1 discusses the role of proficiency and learner L1 in the production of V3 in declarative sentences (hypotheses 1 and 2), while Section 6.2 addresses the effects of constituent length (hypotheses 3.1, 3.2 and 3.3). On the basis of our results, we argue that movement and transfer are not sufficient to explain the variation in the use of V2. Other frameworks are needed to explain why V3 is more common in specific contexts and with specific types of constituents.
6.1. Proficiency and L1: Are the effects caused by CLI of V2?
We found an effect of learners’ L1 (V2 versus non-V2). This is in line with previous studies on syntactic CLI of XVS (L2 Norwegian: Johansen, 2008; L2 English: Rankin, 2012; Westergaard, 2003; L2 French: Stadt et al., 2020), but in contrast to the findings of Håkansson et al. (2002), Sayehli (2013) and, to some extent, Bohnacker (2006). In our study, V2 learners had a lower share of V3 than non-V2 learners (supporting hypothesis 2), and overuse of V2 only occurred in texts by non-V2 learners. These effects were found even though most participants were beginners. To our knowledge, the only previous study of L1 influence on V2 production in L2 Danish is Lund (1997), who compared the oral and written V2 production of four non-V2 learners with that of two Dutch learners. Our study thus contributes to research on syntactic CLI in a new target language using statistical models. Interestingly, the fact that all learners in the corpus had learned English as an L1 or L2 before learning Danish did not seem to impede transfer of V2, as it may have done in Bohnacker (2006) and potentially in Håkansson et al. (2002) and Sayehli (2013). Our analysis was based on an existing corpus with inherent limitations. We found a significant effect of whether the L1 was a V2 language, yet the corpus was not balanced with respect to language background: only 24 learners had a V2 language as their L1. Learner level was not balanced in the corpus either, and it was estimated from the modules participants attended rather than from their actual test scores.
With these reservations in mind, our data indicate that proficiency level plays a minor role compared to language background, since test level had no significant effect on V2 production in the main model. V2 learners in the corpus were generally on higher levels, but test level did not seem to affect the share of V2 for this learner group. Testing this claim will require future studies based on corpora with a better balance of L1s and proficiency levels, and with more reliable proficiency measures. We did not have enough data to include an interaction between L1 and test level in our model; this would be possible with a larger and more balanced dataset.
The question of how early in L2 acquisition CLI can occur is interesting in light of the Developmentally Moderated Transfer Hypothesis (Håkansson et al., 2002). This hypothesis holds that all learners progress along general developmental trajectories and thus initially prefer XSV word order, but that this preference gradually shifts towards XVS, so that XSV and XVS are equally preferred at intermediate stages (Sayehli, 2013). It also predicts faster acquisition for V2 learners, but for transfer to occur, the learner’s development must still have reached the appropriate stage (Sayehli, 2013). Our data can neither verify nor falsify the existence of this stage, but our data from early learners suggest that, if such a stage exists, it is reached early.
We cannot rule out the possibility that the higher accuracy of V2 learners compared to non-V2 learners is due not to CLI but to a general benefit of learning a language closely related to one’s L1 (Jarvis, 2017). Future contrastive studies of V2 and non-V2 learners may address this question. If V2 learners have a general benefit, they should be more accurate than non-V2 learners not only on V2 but also on grammatical features of Danish that are not shared with their L1.
6.2. V2 is not always difficult – length of constituents
The patterns in our data suggest that V2 is not challenging in all contexts. In interrogative sentences, learners almost consistently produced the verb in second position. This suggests that the specific function of word order in the utterance is relevant to V2 production, or that CLI from e.g. English facilitates production, as English has a similar order of auxiliaries and subjects in interrogatives. Alternatively, learners may produce interrogative sentences by means of prefabricated, memorized chunks that circumvent syntactic processes (see e.g. Christiansen & Chater, 2016, on the role of chunking in sentence processing). In declarative sentences, chunking and semantic complexity may also affect V2 production, but these factors are difficult to operationalize in naturally occurring texts, as what is complex and what is an established chunk may vary from individual to individual. Instead, we focused on constituent length, which is easily operationalized and overlaps to some extent with semantic complexity and chunking. In line with previous studies of complex constituents in V2 languages (e.g. Bolander, 1989; Hagen, 1992), our study shows that the use of V3 increased in sentences with lengthy constituents. The share of V2 was negatively correlated with both the number of words in first position (hypothesis 3.1) and the length of the subject (hypothesis 3.2). The effect of verb length (hypothesis 3.3), however, only trended towards significance. The general decrease in accuracy for clauses with lengthy constituents lends empirical support to Skehan’s (1998) Limited Capacity Hypothesis, which holds that learners are less accurate when they produce complex output because their attentional resources are limited.
Even though our study is merely correlational, there may be a causal link between the complexity of the constituents and accuracy rates, as hypothesized by Skehan (1998). Our study contributes to the debate on the role of complexity by examining a new target language, including more participants and testing the effects of the length of the first three constituents with a statistical model. Further investigations are needed to test whether the challenges with heavy constituents in first position are driven by subordinate clauses rather than by long constituents in general. Our qualitative analysis of the material of the first constituents indicated that V2 after subordinate clauses was only challenging for non-V2 learners, and that test level might positively affect V2 production in this context. Previous studies of L2 Swedish have also found indications that sentence-initial subordinate clauses are particularly challenging for V2 production (Bolander, 1989), but this pattern was not replicated in L2 Norwegian (Brautaset, 1996). This may be due to Brautaset’s method, as she only examined the share of subordinate clauses versus adverbial phrases in correct V2 sentences. An efficient way to investigate the matter further could be a fill-the-gap task for non-V2 learners that contrasts short and long first constituents, with the long constituents being either subordinate clauses or phrases. A fill-the-gap task would also control for confounding variables in the current study, such as differences in prompts, time limits and text lengths, and it would address the small number of data points per participant in our study, where many participants produced only one XVS/XSV sentence.
Our subject length measure distinguished between one word and multiple words. Most of the one-word subjects were personal pronouns, and the comparison therefore resembles the comparison of pronominal subjects and nominal subjects found in previous studies. It may not be the length itself that increases the cognitive load, but the accessibility and content of the subject or the subject’s tendency to be processed as part of a verb-subject chunk. In our descriptive analysis, only the non-V2 learners were negatively affected by subject length, but the V2 learners produced very few complex subjects. Thus, further studies are needed to uncover this potential difference between the two learner groups.
Our study focused on constituent length only and did not investigate other relevant complexity measures, e.g. frequency or uniqueness (as used in e.g. Johansen, 2008). Both the frequency of the individual constituents and the frequency of the entire ‘chunk’, that is, how often the three constituents appear together, may affect V2 production. For the non-V2 learners, semantics seemed to play a part for the sentence-initial adverbials (subordinate clauses excluded), as temporal and spatial adverbials had higher shares of V2 than argumentative and attitudinal adverbials. This could be related to frequency: if temporal adverbials are frequent in first position (with V2) in learners’ input, it may be easier for learners to produce V2, particularly the more chunk-like the XVS sequence is. In practice, however, chunks are difficult to operationalize in learner language because of the lack of large learner corpora in Danish and highly individual patterns of chunking.
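One way to make the distinction between constituent frequency and chunk frequency concrete is to count both in a segmented corpus. The sketch below is hypothetical: the study did not compute these measures, the input is a handful of invented sentence openings rather than corpus data, and a real analysis would need counts from learners' input, which, as noted above, are difficult to obtain for Danish.

```python
from collections import Counter

# Invented sentence openings, segmented into their first three constituents
# in surface order. Illustrative only; not drawn from the corpus.
openings = [
    ("i dag", "går", "jeg"),
    ("i dag", "går", "jeg"),
    ("i dag", "jeg", "går"),    # V3: subject before finite verb
    ("derfor", "jeg", "tror"),  # V3
    ("nu", "bor", "jeg"),
]

# Frequency of the whole three-constituent sequence (the 'chunk').
chunk_freq = Counter(openings)
# Frequency of the first constituent on its own.
first_freq = Counter(first for first, _, _ in openings)

print(chunk_freq[("i dag", "går", "jeg")])  # 2
print(first_freq["i dag"])                 # 3
```

Here i dag ‘today’ occurs three times as a first constituent, but only the V2 sequence i dag går jeg recurs as a whole, which is the kind of difference a chunk-frequency measure would capture.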
6.3. Applications of the study
As mentioned in the introduction, V2 is traditionally described as notoriously difficult to master for L2 learners (Bolander, 1990; Hagen, 1992). The current study provides a more nuanced picture, showing that V2 word order in L2 Danish is not equally challenging for V2 and non-V2 learners. Furthermore, factors such as proficiency level and constituent length affect the accuracy of V2 production. This knowledge could be of didactic value to instructors of Danish as a second language, and potentially of other V2 languages as well.
Popular textbooks and teaching materials (e.g. Slotorub & Moreira, 2011, 2014; Thorborg & Riis, 2010, which were used at the language school) commonly introduce Danish word order as subject-before-verb, with sentences with non-initial subjects as an exception. The term inversion is often used, implying that XVS word order is a special case or an exception to word order in general, although XVS order is widespread in V2 languages. Instead of treating XVS as an exception, it may be beneficial to introduce V2 as the basic Danish word order. It might also be useful, especially for non-V2 learners, to emphasize that constituents in first position vary in complexity (phrases and, in particular, subordinate clauses). Likewise, subjects can be introduced as both pronouns and NPs of gradually increasing complexity, and verbs as both single and complex verbs. In relation to the overuse of V2, the difference between sentence-initial conjunctions and adverbials could also be a focus point for non-V2 learners.
7. Conclusion
The study shows that V2 is not inherently difficult for all learners or in all contexts. Learners with a V2 language as their L1 had higher accuracy than non-V2 learners, and for non-V2 learners, the share of V2 increased with proficiency. Finally, we found effects of the length of the first constituent (measured as the number of words) and subject length (one word versus multiple words): the share of V2 decreased significantly with increasing length. The study adds knowledge of didactic value to Danish language instructors by highlighting that V2 is not difficult for all learners and that the complexity of the constituents involved plays a part.
Supplementary material
To view supplementary material for this article, please visit http://doi.org/10.1017/S1366728924000518.
Data availability statement
The data and code for this study are openly available at https://github.com/ResearchXX/verb-second.
Acknowledgements
The study was financed by Independent Research Fund Denmark. Thanks to student assistants Julie Johanna Hansen, Kasper Rud Jensen, Caroline Ørum-Hansen and Maja Mittag for tagging the L2 corpus. Special thanks go to Byurakn Ishkhanyan for statistical support and to the anonymous reviewers for their helpful comments.
Competing interest
The authors declare none.