1. Aims of this study
It has been frequently argued that language users tend to reduce the cost-to-benefit ratio during language use (Gibson et al., 2019; Hawkins, 2004; Jaeger & Tily, 2011; Levshina, 2018; Levshina & Moran, 2021). An important strategy for using language more efficiently is to choose less costly forms to express more predictable (accessible, typical, frequent, discourse-given, etc.) information, and more costly forms to express less predictable information. One form is less costly than another if its duration is shorter and/or it requires less articulatory effort, which depends on the number of segments, the amount of articulatory detail and prosodic prominence. Note that duration and articulatory effort are correlated. Using less costly forms for predictable meanings is possible because language users know that they can rely on the interlocutor’s ability to infer relevant information from linguistic cues and context under the assumption of cooperative, efficient behaviour (Levinson, 2000; Levshina, 2018). This ability is based on the mechanisms of social cognition and theory of mind, although in many cases efficient choices become conventional and automatic as a result of repeated use (Bybee, 2010; Diessel, 2019).
Examples of efficient use of shorter and longer forms can be found in the lexicon (Mahowald et al., 2013; Piantadosi, Tily, & Gibson, 2011; Zipf, 1949), morphosyntax (Haspelmath, 2021; Kurumada & Jaeger, 2015; Levy & Jaeger, 2007) and phonology (Cohen Priva, 2008; Hall et al., 2018; Jaeger & Buz, 2017; Seyfarth, 2014). For example, one and the same person can be referred to with different referential expressions, such as she, the professor, Professor Smith or Professor Caroline Smith from the English department. The choice depends on how accessible the referent is at a given point in discourse (Ariel, 2001). In morphosyntax, studies of the use and omission of the complementiser and relativiser that have demonstrated that the predictability of the clause given the matrix verb or head noun/adjective increases the chances of that-omission (Jaeger, 2010; Kaatari, 2016; Wasow, Jaeger, & Orr, 2011). For instance, that is more likely to be omitted after hope than after show, because the former is more commonly followed by a complement clause than the latter, as in I hope (that) everything will be just fine (Jaeger, 2010). As for subject-auxiliary contractions, such as she’s or they’ve, their rate is also determined by diverse predictability measures, for example, the predictability of the subject given the auxiliary or the predictability of the next verb given the subject and auxiliary (Barth, 2019; Frank & Jaeger, 2008).
Phonological reduction is also determined by predictability, measured in very different ways. For example, Fowler & Housum (1987) found effects of repetition on the duration of content words in a narration. Bell et al. (2009) report significant effects of different types of conditional probability (given the previous context or the next item), as well as of word frequency and repetition, on the reduction of words. Cohen Priva (2008) shows that oral and nasal stop deletion in English is influenced by the phones’ average informativity (i.e., the negative log-transformed probability of a phone given all the phones that precede it in the same word, averaged across every instance of the phone in the corpus), even when frequency and context-specific predictability are controlled for (see also Seyfarth, 2014).
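The average informativity measure described above can be illustrated with a small Python sketch. It assumes a hypothetical toy lexicon that maps each word’s phone sequence to its token frequency. The probability of a phone given the preceding phones in the word is estimated as a ratio of prefix counts, and the resulting surprisals are averaged over all tokens in which the phone occurs. The base-2 logarithm and the treatment of word ends (which simply receive the remaining probability mass) are our own assumptions for illustration, not necessarily the choices made by Cohen Priva (2008).

```python
import math
from collections import defaultdict

def average_informativity(lexicon):
    """Average informativity (in bits) of each phone.

    `lexicon` is a hypothetical toy mapping from a word's phone tuple
    to its token frequency.
    """
    # Token counts of every word-initial prefix, including the empty prefix.
    prefix_counts = defaultdict(int)
    for phones, freq in lexicon.items():
        for i in range(len(phones) + 1):
            prefix_counts[phones[:i]] += freq

    total_info = defaultdict(float)
    n_instances = defaultdict(int)
    for phones, freq in lexicon.items():
        for i, phone in enumerate(phones):
            # P(phone | all preceding phones in the same word)
            p = prefix_counts[phones[:i + 1]] / prefix_counts[phones[:i]]
            total_info[phone] += freq * -math.log2(p)
            n_instances[phone] += freq
    # Average over every token instance of the phone
    return {ph: total_info[ph] / n_instances[ph] for ph in total_info}

toy_lexicon = {("k", "a", "t"): 3, ("k", "a", "p"): 1, ("t", "o", "p"): 4}
for phone, info in sorted(average_informativity(toy_lexicon).items()):
    print(phone, round(info, 3))
```

In this toy lexicon, /t/ is highly predictable word-finally in the first word but less predictable word-initially in the third, and its average informativity pools both contexts, weighted by token frequency.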
Importantly, the efficient use of variants presupposes their functional equivalence. However, this requirement contradicts the Principle of Contrast (Clark, 1987), which is also known as the Principle of No Synonymy (Goldberg, 1995) in Construction Grammar. It is closely related to the principle of isomorphism, or ‘one meaning, one form’ (Bolinger, 1977; Haiman, 1980). According to this principle, two formally distinct forms should also differ functionally; that is, they should not be fully interchangeable in a given context. More specifically, the difference can lie in register (e.g., buy vs. purchase), dialect (e.g., lorry vs. truck), connotation (e.g., curious vs. nosy), construal (e.g., The policewoman arrested the thief vs. The thief was arrested by the policewoman) and so forth (Goldberg, 2019, pp. 25–26). We will use Goldberg’s formulation in this paper because it includes stylistic and sociolinguistic differences, which are the main focus of our study.
The Principle of No Synonymy is based on pragmatic reasoning and enabled by statistical pre-emption in language acquisition (Clark, 1987; Goldberg, 1995, 2019). Speakers and addressees adhere to the principle ‘What is not said, is not’ (Levinson, 2000): in the presence of a salient alternative to some expression, a language user will not over-extend the meaning of this expression to include the meaning of that alternative. This is why the interpretation of ‘some chocolates’ is not extended to include ‘all chocolates’. According to Goldberg (2019, p. 26), the fact that the speaker does not need to choose between fully equivalent forms has advantages for language production, because unbiased decisions are more difficult to make.Footnote 1
There is abundant evidence that children always try to infer a contrast between two different lexical or grammatical forms. For example, a child may first use the word ‘dog’ to refer to cats, sheep, horses and other animals. But as soon as they learn the word ‘cat’, they automatically stop over-extending the label ‘dog’ to cats (Clark, 1987). In language change, if one source construction has two or more formally distinct variants, or if some variant appears in addition to the already existing one (due to phonological change, borrowing, etc.), the resulting constructions should divide their semantic or pragmatic ‘labour’ and go their own ways.
Unfortunately, there is no clear definition of what nuances qualify as a difference in meaning, and forms or structures can differ or overlap at different levels (cf. Uhrig, 2015). Laporte, Larsson, & Goulart (2021) suggest that the principle holds less reliably at low levels of formal description.Footnote 2 Moreover, linguistic variation is usually probabilistic, as has been demonstrated in numerous studies of grammatical alternations (e.g., Bresnan et al., 2007; Gries, 2003; Szmrecsanyi & Hinrichs, 2008), which means that there should be a certain degree of freedom, even if one form can be strongly preferred to another in a given context. Still, there is broad consensus that variation between alternate forms should be motivated and can therefore be analysed in terms of determining semantic, syntactic, stylistic and other factors. A good variational model is expected to discriminate well between the variants, which means it should have a low amount of random (‘residual’) variation.
At the same time, there is evidence that at least some alternations can be used in a communicatively efficient way as described above (e.g., Levshina, 2018), such that the less costly form is used when the meaning is more predictable from context, and the more costly form is preferred in situations where the meaning is less predictable. The assumption behind most studies of efficient communication is that the intended meaning should stay the same. But if different forms have different meanings, is this assumption tenable? This important question has not been addressed yet, as far as we know.
Importantly, formal length is not only determined by predictability. A major factor is stylistic variation. If the variants exhibit a length asymmetry, the longer variant is more likely to be preferred in formal communication and careful speech, whereas the shorter one will be more appropriate in informal communication and casual speech (e.g., Labov, 1966). Generally speaking, contracted forms are considered more appropriate in informal language, while full forms are regarded as typical of formal texts (Finegan & Biber, 2001), so contractions like I’ll, aren’t or they’d are less formal than I will, are not or they would (cf. Daugs, 2021; Nesselhauf, 2014; see also Biber et al., 1999, pp. 1128–1132). In Japanese, when asking someone for a favour, one says yoroshiku onegaiitashimasu in very formal situations, yoroshiku onegaishimasu in less formal situations, and simply yoroshiku when speaking to one’s friends (personal knowledge). Similarly, help followed by a bare infinitive is considered to be less formal than the variant with a to-infinitive (e.g., Rohdenburg, 1996, p. 159; see also Biber et al., 1999, pp. 736–737).
As for sociolinguistic variation, reduced forms are more common in the speech of younger people and men (Bell et al., 2003), although women may prefer reduced forms if these forms are more prestigious (Ernestus, 2014). Some reduced forms may also indicate an orientation towards a local identity (Hollmann & Siewierska, 2011; Tagliamonte & Roeder, 2009). Reduced forms often carry less overt prestige than full forms; for example, the present progressive suffix variant /ɪŋ/ (‘walking’) is generally associated with higher status and prestige than the lenited form /ɪn/ (‘walkin’’) (Campbell-Kibler, 2007; Trudgill, 1974, pp. 91–93).Footnote 3 On the other hand, reduced forms can carry covert prestige by indexing group belonging and solidarity. Since these values are often associated with masculinity, and since aberrant behaviour tends to be less accepted in women than in men, women have a stronger motivation to avoid non-standard reduced language (cf. Chambers & Trudgill, 1998, pp. 83–85; Romaine, 2003, pp. 103–105). The age effect can be explained by the overall more conservative linguistic behaviour of older speakers, probably due to the strength of exemplar representations of the previous forms in their memory. Moreover, young men are more prone to using non-conventional forms to mark an identity as people who do not depend on social norms and restrictions (cf. Eckert, 2008).
But this correlation between length on the one side and formality and prestige on the other side is not always observed. Although that-omission has been claimed to be more widely spread in informal speech than in formal language (Huddleston & Pullum, 2002, p. 953), no clear stylistic or social effects on that-omission were detected when numerous other factors were also controlled for (Staum, 2005; Tagliamonte, Smith, & Lawrence, 2005). Moreover, formal length can depend on the genre and text type in very specific ways. For example, in written articles with high lexical density, which can be measured as type-token ratio, journalists may prefer the shorter genitive variant with -’s to the longer of-genitive for reasons of space economy, trying to cram as much information as possible into a press text (Szmrecsanyi & Hinrichs, 2008).
In general, very little is known about the impact of style and sociolinguistic factors, as well as other functional differences, on the efficient use of variants. In this paper, we want to take a step towards understanding the paradox of efficient language use under the pressure of the tendency for distinct forms to be non-interchangeable, which is captured by the Principle of No Synonymy and the like. We hypothesise that efficient use of variants is possible when the functional differences between them are small. The greater and more salient the semantic, pragmatic and stylistic differences, the less likely it is that predictability will play a role in the choice between the variants. As for sociolinguistic variation, we can recall Labov’s (1972) degrees of indexicality, which include indicators (non-salient sociolinguistic variants), markers (variants salient inside a social group) and stereotypes (variants salient inside and outside of a social group, which language users are often aware of and which often have a negative value). Using this classification, we may expect the least salient indices to be the most available for efficiency considerations, and the most salient stereotypes the least available (see Hollmann & Siewierska, 2011, pp. 47–48, for a similar proposal regarding reduction due to high token frequency). Moreover, we can expect contractions to be less salient in informal speech than in formal texts, where they are perceived as inappropriate. This means that contractions are probably more likely to be recruited for efficiency purposes in informal speech, and less likely in formal language.
In this study, we focus on the variation between want to and wanna followed by an infinitive. This alternation is illustrated by an example from an old popular song, ‘Girls just want to have fun’.Footnote 4 Interestingly, the song’s official title includes ‘want to’, but the pronunciation in the recording is invariably ‘wanna’. For the sake of brevity, the alternation will be designated in this paper as WANT. The main research question is whether the variation is explained only by stylistic and sociolinguistic factors, or whether predictability also plays a role. Previous research has suggested that frequency-based measures correlate with the use of the variants (Flach, 2020; Levshina, 2018), while Krug (2000) and Lorenz (2013) show that the variation is constrained by numerous stylistic and social factors. However, these accounts have not been tested simultaneously before. We approach this question using data from the spoken component of the British National Corpus (BNC) and Bayesian mixed-effects logistic regression models. Admittedly, the BNC does not provide a full picture of sociolinguistic and stylistic variation, especially when it comes to identity construction, but its relatively large size allows us to compute the probabilistic measures required for testing the predictability effects.
Surprisingly, there has as yet been no study that provides a comprehensive, multivariate usage model of the WANT alternation in British English. Krug (2000) presents a thorough investigation based on the BNC, but with a more exploratory approach to the data. Others have focused on specific aspects and/or American English (see Section 2). A second goal of this study is to fill this gap. The previous studies show us what to look out for: effects related to speech and articulation, effects of register and speech situation, and factors of syntactic co-text.
We also pursue a methodological goal. Researchers are often confronted with missing values in their data. For example, the BNC contains files with full demographic information about the speakers, and others where this information is not available. In particular, for some instances of WANT in our dataset, we do not have information about the speaker’s gender and age. One faces a dilemma: either to discard the incomplete data points, or to exclude the variables with missing values. Both options are suboptimal. In this paper, we explore a solution known as data imputation, in which the missing values are estimated from the observed ones. We compare two Bayesian regression models. One is based on the smaller dataset with complete observations only. The other uses the full dataset with imputed gender, age group and speech rate (see more information below).
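To make the dilemma concrete, the Python sketch below contrasts complete-case filtering with a deliberately simple form of multiple imputation, in which each missing value is replaced by a random draw from the observed values of the same variable. The function names and the row format are hypothetical, and the random-draw approach is only a stand-in: unlike a regression-based Bayesian imputation, it does not model the missing values as a function of the other variables.

```python
import random

def complete_cases(rows, columns):
    """Complete-case analysis: keep only rows with no missing value in `columns`."""
    return [row for row in rows if all(row[c] is not None for c in columns)]

def impute_by_sampling(rows, column, m=5, seed=1):
    """Return m completed copies of `rows` (hypothetical helper).

    Each missing value (None) in `column` is replaced by a random draw
    from the observed values of that column.
    """
    rng = random.Random(seed)
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"no observed values in column {column!r}")
    datasets = []
    for _ in range(m):
        completed = []
        for row in rows:
            new_row = dict(row)  # copy, so the original data stay untouched
            if new_row[column] is None:
                new_row[column] = rng.choice(observed)
            completed.append(new_row)
        datasets.append(completed)
    return datasets

rows = [{"sex": "m", "age": "Ag1"}, {"sex": None, "age": "Ag2"}, {"sex": "f", "age": None}]
print(len(complete_cases(rows, ["sex", "age"])))   # complete-case analysis keeps one row
print(impute_by_sampling(rows, "sex", m=2))
```

In classical multiple imputation, each completed dataset would be analysed separately and the results pooled; a Bayesian model can instead treat the missing values as unknown parameters and estimate them jointly with the other model parameters.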
The remainder of the article is structured as follows. In Section 2, we summarise previous accounts of the alternation. Section 3 describes the data and variables, and provides a description of the imputation method and Bayesian regression modelling. Section 4 reports and compares the regression models with and without data imputation. Finally, Section 5 offers a discussion of the results. We used R, version 4.0.2 (R Core Team, 2020), for data analysis.
2. Previous research
The contraction of want to to wanna sits at the crossroads of phonetic reduction, morpho-syntactic restructuring and alternation of modal items. It has been viewed from all these angles, and in various theoretical frameworks. Most notably in generative grammar and trace theory, the syntactic conditions (i.e., ‘rules’) for the occurrence of wanna garnered much attention (cf. Falk, 2007; Lakoff, 1970; Postal & Pullum, 1982; Pullum, 1997). However, our focus will be on variation in variable contexts, namely constructions of the type want to/wanna VINF, where the implicit subject of VINF is coreferential with that of want or wanna. This perspective has been taken by usage-based studies such as Krug (2000) on British English and Lorenz (2013) on American English.
As a contracted item, wanna has its source in articulatory reduction, leading (perhaps gradually) to a realisation /wɒnə/ for the string want to. It can be seen as a case of extreme reduction of a specific sequence (‘special reduction’, Bybee, File-Muriel, & Napoleão de Souza, 2016), due to the high frequency and internal bondedness of want to (cf. Krug, 2000, p. 139). Morphosyntactically, the restructuring is from want + to-infinitive to wanna + bare infinitive, likewise following from the fusion of the invariant sequence want to, while the infinitive verb form remains an open slot in the construction (cf. Bybee, 2010, p. 43; Hudson, 2006, p. 609). With a bare infinitive complement (and its lack of inflected forms), wanna is structurally more ‘modal-like’, as also suggested by some of its usage tendencies (e.g., its dispreference after modals, cf. Krug, 2001).
Synchronically, the use of wanna is a choice between variants rather than a case of online reduction or contraction (cf. Broadbent & Sifaki, 2013; Sag & Fodor, 1994), though there is still some gradience of variants in speech, such as [wɒnə], [wɒnɾə], [wɒntə] (cf. Bolinger, 1981; Ellis, 2002, p. 331; Lorenz, 2013, pp. 101–102). The choice is that of a modal item expressing volition, which can extend into intention or obligation (cf. Krug, 2000, pp. 147–149). Thus, in Krug’s (2000) analysis, want to is an ‘emerging modal’ whose fusion into wanna is part of its grammaticalisation (cf. also Okazaki, 2002), and which starts forming a schema with other modal items of similar form, such as gonna and gotta. Lorenz (2013) views these contracted forms as undergoing a process of gradual ‘emancipation’ by which they become conceptually independent from the respective full forms. This means that they gradually lose the traits of reduced realisation variants and can take on stable functional and communicative properties that differentiate them from the full forms. In other words, they behave according to the pragmatic principles behind the Principle of No Synonymy.
Empirical findings on the use of wanna attest to its status as emerging and emancipating. In data from the ‘spoken’ section of the BNC, Krug (2000, p. 175) observes a strong frequency increase of wanna relative to want to in apparent time, from below 20% in the age group 60+ to just over 50% in the youngest speakers. In spoken American English (Santa Barbara Corpus), the distribution over age groups stabilises at around 75%–80% for the cohorts aged 49 and younger (Lorenz, 2013, p. 44). Changes in the factors of variation suggest that wanna is becoming a fully independent item, yet less emancipated than gonna or gotta, as some aspects of ease of articulation persist (e.g., the contraction being favoured in higher speech rates and disfavoured at phrase ends; Lorenz, 2013, pp. 104–105). Moreover, children overwhelmingly use wanna in variable contexts and even overuse it as a transitive verb (as in, *Who do you wanna play with you), suggesting that children might acquire wanna and want as separate items and subsequently learn their distributional differences (Getz, 2019).
Diachronically, it seems that the usage patterns of wanna vs. want to gradually converge with those of gonna and gotta, in particular on the level of socio-pragmatics and register (Lorenz, 2020). The most prominent and consistent property of the contractions then is to mark informality and colloquialness. In Boas’ (2004) constructional formalisation, ‘colloquial style’ is what specifies the meaning of wanna in addition to semantic features inherited from want and to.
To summarise, want to and wanna are entrenched as distinct units with different social and stylistic properties. Therefore, they are unlikely to be used interchangeably for efficiency purposes, at least, not by all speakers and not in all contexts.
At the same time, there are indications that some predictability measures do play a role in explaining the variation of WANT. For example, Levshina (2018) argues that verbs that have high attraction to and reliance on WANT (cf. Schmid, 2000) have higher chances of being used with wanna. Flach (2020) has also shown that measures of association with the following item (most clearly, predictability of the verb given the construction, and collostructional strength) can to an extent predict the use of contractions like wanna. Similarly, Mair (2017) has proposed that the token frequency of WANT + V, as well as priming through preceding contractions, play a role in the production of wanna. However, these studies did not measure the role of stylistic factors and sociolinguistic variables. The present study fills this gap, combining social and stylistic factors (in particular, age, gender, text type, speech rate and stylistic prosody of individual verbs) with different predictability measures, which are described in the next section.
3. Methodology
3.1. Corpus data and variables
The data for this study come from the spoken component of the BNC. A Python script was used to extract all instances of wanna, which is represented in the corpus by two tokens, wan and na, and all instances of want followed by to, together with diverse contextual information, such as wordforms, lemmas and part-of-speech tags of neighbouring words. This information helped us to annotate the data for 15 potential predictor variables. Sentences without an infinitive were disregarded. We also manually checked the sentences in which the verb occurs only once after wanna/want to, and excluded erroneous hits. For example, in the sentence I wanna packet Walker crisps, the word packet is erroneously annotated as an infinitive. Another example is Do you wanna big’un?, where big’un is also analysed as an infinitive in the corpus. Such examples were excluded. After the removal of irrelevant and problematic hits, the dataset included 9,123 observations.
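The extraction step can be sketched in Python as follows. The sketch assumes a hypothetical input format in which each sentence is a list of (word, tag) pairs carrying CLAWS C5 tags, as in the BNC; the set of infinitive tags (VVI, VBI, VDI, VHI) and the function name are our assumptions, not the script actually used for this study.

```python
# Hypothetical input format: a sentence is a list of (word, c5_tag) pairs.
# C5 infinitive tags: lexical verbs (VVI), BE (VBI), DO (VDI), HAVE (VHI).
INFINITIVE_TAGS = {"VVI", "VBI", "VDI", "VHI"}

def extract_want(sentence):
    """Return (index, variant, infinitive) for each WANT + infinitive sequence."""
    hits = []
    for i in range(len(sentence) - 2):
        first, second = sentence[i][0].lower(), sentence[i + 1][0].lower()
        if (first, second) == ("wan", "na"):      # wanna is split into two tokens
            variant = "wanna"
        elif (first, second) == ("want", "to"):
            variant = "want to"
        else:
            continue
        word, tag = sentence[i + 2]
        if tag in INFINITIVE_TAGS:               # keep only hits with an infinitive
            hits.append((i, variant, word))
    return hits

sentence = [("I", "PNP"), ("wan", "VVB"), ("na", "TO0"), ("go", "VVI"), (".", "PUN")]
print(extract_want(sentence))
```

Because the sketch trusts the corpus tags, a mis-tagged hit such as packet in I wanna packet Walker crisps would still be returned, which is why manual checking is necessary.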
The variables include structural variables, sociolinguistic variables, variables related to register and text type, and variables reflecting different types of predictability. They are described below, and also summarised in Table 1. The dataset is provided in the online repository (see Data Availability Statement).
3.1.1. Structural variables
The first variable is the response variable, which distinguishes wanna (n = 2,114) from want to (n = 7,009); it is called expression in the dataset. Second, we coded the infinitive that serves as the non-finite complement of wanna or want to. We also included a variable, neg_part, which reflects whether there is a negative particle before wanna or want to. Its values are ‘Yes’ (n = 2,034) and ‘No’ (n = 7,089). Another variable, labelled question, describes whether there is a question mark at the end of the sentence. Its values are ‘Yes’ (n = 1,784) and ‘No’ (n = 7,339). In addition, we coded the grammatical subject of wanna and want to (variable subject). The values were grouped into the following categories:
• I (including me, mine in children’s speech; n = 2,958).
• You (including yous and ya; n = 3,336).
• We (including us; n = 845).
• He_she (he and she; n = 309).
• They (including them; n = 745).
• PRON (other pronouns, e.g., everybody, many, who; n = 243).
• Other (common and proper nouns, numerals and other nominalisations; n = 395).
• Omitted (absent in the clause; n = 285).
• Unclear (when we could not determine the subject due to insufficient context; n = 7).
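A rough Python sketch of how neg_part, question and the subject categories could be coded for a single hit is given below. The input format, the function names and the two-token window for detecting a negative particle are hypothetical. The sketch relies on the C5 tag XX0 for not and n’t and on pronoun tags beginning with PN; it cannot reproduce the manual decisions behind categories such as Unclear, or the treatment of words like many, which are not tagged as pronouns.

```python
# Hypothetical helpers: a sentence is a list of (word, c5_tag) pairs and
# want_index is the position of want (or wan) in that list.
SUBJECT_GROUPS = {
    "i": "I", "me": "I", "mine": "I",
    "you": "You", "yous": "You", "ya": "You",
    "we": "We", "us": "We",
    "he": "He_she", "she": "He_she",
    "they": "They", "them": "They",
}

def group_subject(subject):
    """Map a (word, c5_tag) subject, or None if omitted, to a subject category."""
    if subject is None:
        return "Omitted"
    word, tag = subject
    if word.lower() in SUBJECT_GROUPS:
        return SUBJECT_GROUPS[word.lower()]
    if tag.startswith("PN"):  # other pronouns, e.g. everybody (PNI), who (PNQ)
        return "PRON"
    return "Other"            # nouns, numerals, etc.

def annotate(sentence, want_index, subject, window=2):
    """Code neg_part, question and subject for one WANT token."""
    preceding = sentence[max(0, want_index - window):want_index]
    neg = any(tag == "XX0" for _, tag in preceding)   # not / n't
    question = bool(sentence) and sentence[-1][0] == "?"
    return {
        "neg_part": "Yes" if neg else "No",
        "question": "Yes" if question else "No",
        "subject": group_subject(subject),
    }

s = [("I", "PNP"), ("do", "VDB"), ("n't", "XX0"), ("wan", "VVB"),
     ("na", "TO0"), ("go", "VVI"), ("?", "PUN")]
print(annotate(s, 3, ("I", "PNP")))
```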
3.1.2. Sociolinguistic variables
The sociolinguistic variables contain information about the speaker. One of them is called speakerID and represents the ID of the speaker, as provided in the corpus. The variable sex describes the speaker’s sex (‘m’ male, n = 4,381, or ‘f’ female, n = 3,058). The variable ageGroup represents the speaker’s age group with the following values:
• ‘Ag0’: 0–14 years (n = 739).
• ‘Ag1’: 15–24 years (n = 618).
• ‘Ag2’: 25–34 years (n = 1,034).
• ‘Ag3’: 35–44 years (n = 1,044).
• ‘Ag4’: 45–59 years (n = 1,545).
• ‘Ag5’: 60+ years (n = 641).
In many cases, age and sex were unspecified: sex is missing for 1,684 observations and age group for 3,502.
3.1.3. Variables related to style, text type and register
We included five variables related to style, text type and register. Two of them were already available in the BNC metadata. The first one was text type (variable textType), which had two values: conversations (‘CONVRSN’, n = 4,342) and other spoken text types, for example, lessons, sermons or meetings (‘OTHERSP’, n = 4,781). The other variable was settingID, the unique ID of a conversation between particular speakers at a specific time and place and during a certain activity.
We also added three other variables based on additional analyses. In particular, we computed speech rate (variable SpeechRate), measured as phones per second (phon/sec) in a recording, on the basis of the time-aligned Praat TextGrids for the audio edition of the Spoken BNC made available by the Oxford University Phonetics Laboratory (Coleman, 2019; Coleman et al., 2012). Stretches that are not annotated (muted or marked as ‘unclear’ in the transcript) were excluded; short pauses and silences were not counted as phones, but their duration was not discounted. We used the R package rPraat (Bořil & Skarnitzl, 2016) to work with the TextGrid files. Note that since we measured the rate for each recording as a whole, this speech rate is computed across the utterances in a conversation. It is not strictly an articulation rate, but it provides a measure of a conversation’s general pace, and hence of the time pressure on speech production.Footnote 5 Speech rate is an important factor that boosts phonetic reduction (Ernestus, 2014; Raymond, Dautricourt, & Hume, 2006).
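The speech-rate computation can be sketched in Python. The sketch assumes that the phone tier of a TextGrid has already been read into (start, end, label) tuples (in the study, rPraat was used for reading the files); the pause labels and the labels marking excluded stretches are hypothetical placeholders. Excluded stretches contribute neither phones nor time, whereas pauses contribute time but no phones, which is the asymmetry described above.

```python
import math

# Hypothetical label conventions; the actual TextGrids may use other labels.
PAUSE_LABELS = {"sp", "sil"}          # short pauses and silences
EXCLUDED_LABELS = {"", "unclear"}     # unannotated or unclear stretches

def speech_rate(intervals):
    """Phones per second for one recording.

    `intervals` is a list of (start, end, label) tuples from a phone tier.
    Excluded stretches are dropped entirely; pauses add time but no phones.
    """
    n_phones = 0
    duration = 0.0
    for start, end, label in intervals:
        if label in EXCLUDED_LABELS:
            continue                   # neither phones nor time
        duration += end - start        # pauses still count towards elapsed time
        if label not in PAUSE_LABELS:
            n_phones += 1
    return n_phones / duration if duration > 0 else math.nan

tier = [(0.0, 0.1, "DH"), (0.1, 0.2, "AH0"), (0.2, 0.5, "sp"),
        (0.5, 2.5, "unclear"), (2.5, 2.6, "K")]
print(speech_rate(tier))
```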
Finally, we also evaluated which text types the individual verbs occurring as infinitives after want to or wanna are associated with. This information helps to capture the stylistic prosody of the verbs, and provides a finer-grained and more local operationalisation of register and text type than that captured by the other variables at the global level of a speech recording. The corresponding variables are called Dim1 and Dim2. These dimensions are taken from a simple Correspondence Analysis of the associations between all verbs as lemmas and all text types in the BNC (Greenacre, 2007; Levshina, 2015, Ch. 19). A part of the space is shown in Fig. 1 (due to its large size, we cannot show the entire map). The horizontal dimension can be interpreted as a contrast between formal (left) and informal communication (right), while the vertical dimension can be interpreted as a contrast between informative language (bottom) and language for aesthetic purposes and entertainment (top). Examining additional dimensions did not yield any interpretable results. Most spoken text types are located in the bottom right quadrant. The values of the individual verbs are provided in the online repository. Among the verbs with the greatest positive values on the horizontal dimension and negative values on the vertical dimension are the contracted forms gonna and wanna, as well as obscene and slang terms, such as f***, bugger, shit, sod, snog and shag. We would therefore expect more instances of wanna with verbs that have similar scores, that is, verbs that are highly informal and do not represent aesthetic use.
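The following Python sketch shows one standard way of computing a simple correspondence analysis, via a singular value decomposition of the matrix of standardised residuals (cf. Greenacre, 2007). The toy verb-by-text-type table and the function name are hypothetical, and the sketch returns principal coordinates of the rows; the scaling used to derive Dim1 and Dim2 in this study may differ.

```python
import numpy as np

def simple_ca(table):
    """Simple correspondence analysis of a contingency table (rows x columns).

    Returns the row principal coordinates (one column per dimension) and
    the principal inertias (squared singular values), largest first.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses (e.g. verbs)
    c = P.sum(axis=0)                        # column masses (e.g. text types)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardised residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    return row_coords, sv ** 2

# Hypothetical toy counts: 4 verbs x 3 text types
toy = [[30, 5, 2], [10, 20, 5], [2, 8, 25], [1, 3, 30]]
coords, inertias = simple_ca(toy)
dim1, dim2 = coords[:, 0], coords[:, 1]
print(np.round(coords[:, :2], 3))
print(np.round(inertias / inertias.sum(), 3))   # share of inertia per dimension
```

The sign of each dimension returned by the decomposition is arbitrary, so an orientation such as formal on the left has to be fixed by inspecting the positions of known items.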
3.1.4. Variables reflecting predictability
The fourth and final group of variables are three corpus-based measures that reflect different types of predictability information. In particular, we can expect wanna to be preferred if WANT is highly probable given the left context or the right context (the infinitive). In addition, we should test whether the probability of wanna is higher if the infinitive is more probable after WANT.
In accordance with previous research on communicative efficiency and formal reduction, we used informativity measures, in which the negative logarithm of a conditional probability is taken. As a result, the measures represent ‘unpredictability’, also known as surprisal. Higher informativity means lower predictability, and vice versa. In efficient language use, high informativity is associated with longer and more effortful forms, whereas low informativity is associated with shorter and less effortful forms.
The first measure shows how unexpected an infinitive is as a complement of WANT. This variable is called Info_Verb_given_WANT. In order to compute this variable, we used the following formula:
where F(Verb, WANT) stands for the frequency of a given infinitive after want to or wanna in the data, whereas F(WANT) stands for the total frequency of want to + Infinitive and wanna + Infinitive in the spoken part of the BNC.
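We also computed how unexpected want to/wanna is given the infinitive, using the following formula:
$$\mathrm{Info\_WANT\_given\_Verb} = -\log \frac{F(\mathrm{Verb}, \mathrm{WANT})}{F(\mathrm{Verb})}$$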
where F(Verb) stands for the frequency of a given verb in the spoken part of the BNC. We included this variable because backward conditional probabilities often have an effect on form length (see the references in Section 1). This variable is added under the label Info_WANT_given_Verb.
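Finally, we coded how predictable want to or wanna is given the previous word. This variable is called Info_WANT_given_left and is computed as follows:
$$\mathrm{Info\_WANT\_given\_left} = -\log \frac{F(\mathrm{Word\_left}, \mathrm{WANT})}{F(\mathrm{Word\_left})}$$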
where F(Word_left, WANT) represents the joint frequency of the word followed by WANT, and F(Word_left) represents the frequency of the word in the spoken component of the BNC. Note that contractions like I'll and wouldn't were treated as single words, although they are analysed as two separate tokens in the BNC. In quite a few sentences, the first word was want or wanna, followed by the infinitive, so there was no left context in the same sentence. For those cases, we computed the probability of WANT occurring at the beginning of a sentence. We did this by dividing the number of sentences beginning with WANT (88) by the total number of sentences in the spoken corpus (1,145,450).
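The three informativity measures, including the fallback for sentence-initial WANT, can be sketched in Python as shown below. The base of the logarithm is assumed to be 2 (bits); any other fixed base would only rescale the measures. The function names mirror the variable names but are hypothetical. Apart from the sentence-initial counts (88 and 1,145,450), no real frequencies are used.

```python
import math
from typing import Optional

# Counts reported in the text for sentences without left context:
# 88 sentences begin with WANT, out of 1,145,450 spoken BNC sentences.
SENTENCE_INITIAL_WANT = 88
TOTAL_SENTENCES = 1_145_450


def informativity(joint: int, marginal: int, base: float = 2.0) -> float:
    """Surprisal -log(joint / marginal); higher means less predictable.

    The base (2, i.e. bits) is an assumption, not taken from the paper.
    """
    if joint <= 0 or marginal < joint:
        raise ValueError("need 0 < joint <= marginal")
    return -math.log(joint / marginal, base)


def info_verb_given_want(f_verb_want: int, f_want: int) -> float:
    # F(Verb, WANT) / F(WANT): how expected the infinitive is after WANT
    return informativity(f_verb_want, f_want)


def info_want_given_verb(f_verb_want: int, f_verb: int) -> float:
    # F(Verb, WANT) / F(Verb): backward probability of WANT given the infinitive
    return informativity(f_verb_want, f_verb)


def info_want_given_left(f_left_want: Optional[int], f_left: Optional[int]) -> float:
    # No left context in the sentence: fall back on the probability
    # that a sentence starts with WANT.
    if f_left is None:
        return informativity(SENTENCE_INITIAL_WANT, TOTAL_SENTENCES)
    return informativity(f_left_want, f_left)


print(round(info_want_given_left(None, None), 3))  # → 13.668
```

Under the base-2 assumption, the sentence-initial fallback comes to about 13.7 bits. This reflects how rarely WANT opens a sentence in the spoken BNC.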
We should mention here that different theories exist to explain predictability and frequency effects in formal reduction. One of them, mentioned in Section 1, assumes that the speaker/signer takes the perspective of the addressee. The speaker considers whether the addressee will be able to process the reduced form correctly on the basis of the available contextual information and pragmatic principles (Jaeger, 2013; Levshina, 2018). An important role in this account is played by information theory, which describes how a message can be encoded efficiently for transmission through a noisy channel (cf. Gibson et al., 2019).
In contrast, Bybee (2007, 2010) takes a predominantly speaker-centred perspective. According to her, reduction is boosted by the chunking of neighbouring units, which is driven purely by surface frequency. Each instance of use further automates the sequence and increases its fluency, leading to fusion of the units (Bybee, 2007, p. 324). For instance, Bybee & Scheibman (1999) show that reduction of the vowel and the consonants in don't in spoken English is particularly frequent after the pronoun I and before the verbs know and think. The reason is that this contraction occurs particularly frequently in the phrases I don't know and I don't think. Bybee (2010, p. 40) admits that the speaker controls the amount of reduction according to the listener's needs (see Footnote 6). Even so, it is the speaker-internal processes that drive the reduction. Another speaker-centred account has to do with the speaker 'buying time' to prepare a continuation with low accessibility (Ferreira & Dell, 2000).
These accounts are very difficult to disentangle. We do not yet know whether wanna emerged because want to followed by an infinitive was highly frequent or because it was highly predictable given the context (e.g., the presence of an infinitive). With regard to synchronic variation, if the chunking account is correct, we can expect wanna to be preferred more strongly after subjects that are used frequently with want to/wanna. Testing this would require comparing joint rather than conditional probabilities. This approach would also require testing the joint probabilities of WANT and verbs. However, F(WANT) is the same for all verbs, so these joint probabilities rank the verbs in exactly the same order as Info_Verb_given_WANT, with higher probability corresponding to lower informativity. The same holds for the 'buying time' account. If it is correct, wanna should be more frequent when the verb is highly probable after want to/wanna. At present, we cannot distinguish between these accounts because they generate the same predictions for our data. We hope that future research will be able to tease them apart.
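Joint probabilities of WANT and verbs add nothing beyond Info_Verb_given_WANT because F(WANT) and the corpus size N are constant across verbs. P(Verb, WANT) = F(Verb, WANT)/N and P(Verb | WANT) = F(Verb, WANT)/F(WANT) differ only by a constant factor, and −log is monotonically decreasing. The quick check below uses hypothetical counts, not BNC data.

```python
import math

# Hypothetical counts of WANT + infinitive (NOT BNC data)
f_verb_want = {"go": 900, "know": 600, "be": 300, "say": 150, "sing": 20}
f_want = sum(f_verb_want.values())  # F(WANT), identical for every verb
n_tokens = 10_000_000               # hypothetical corpus size N

joint_prob = {v: f / n_tokens for v, f in f_verb_want.items()}
info_verb_given_want = {v: -math.log2(f / f_want) for v, f in f_verb_want.items()}

# Highest joint probability first vs. lowest informativity first
by_joint = sorted(joint_prob, key=joint_prob.get, reverse=True)
by_info = sorted(info_verb_given_want, key=info_verb_given_want.get)
print(by_joint == by_info)  # True: both orders depend only on F(Verb, WANT)
```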
3.2. Data imputation
Unfortunately, the spoken component of the BNC does not provide the values of the sociolinguistic variables for all speakers. Moreover, some recordings were not accessible to us, so speech rate could not be computed for them. As a result, the proportions of missing values were very substantial: 18% for sex, 38% for ageGroup and 47% for SpeechRate. Missing values are more frequent in the 'other spoken' texts than in the conversations.
To solve this problem, we performed two analyses. The first one is based on the data without missing values, a small dataset with only 3,603 observations. The second one uses all data and imputes the missing values. If the missing values were few, one might get by with simply setting them to the median or reference level of the variable. Since some of our variables have many missing observations, however, a more fine-grained method is needed (see Harrell, 2015, pp. 47–57 for further discussion). For this purpose, we used multiple imputation as implemented in the R package mice (van Buuren & Groothuis-Oudshoorn, 2011). This approach predicts the missing values of each variable from the values of the other variables.
The imputation algorithm is based on so-called chained equations. A series of regression models is fitted, whereby each variable with missing data is modelled conditional upon the other variables in the data. For the binary variable (sex), the regression is logistic. For ageGroup, which is an ordered factor, it is an ordinal proportional odds model. For SpeechRate, it is a Bayesian linear model. The setting IDs, speaker IDs and infinitives were not used as predictors because their individual levels had very low frequencies (median frequency of each level = 2). This made the computations unreliable and prohibitively slow, even on a computer cluster. The imputed speech rates were restricted to the range between 0 and 15 phones per second because 99% of the observed values lay within this range.
In the next step, the imputed values of one of these variables are set back to missing and the procedure is repeated. This time, the imputed values of the other variables are used for prediction. This cycle is repeated several times; in our analysis, we used 50 iterations. The imputations converge, fluctuating randomly within a narrow range of values.
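The chained-equations cycle described above can be illustrated with the deliberately simplified Python sketch below. It is not the mice implementation. It handles only two numeric variables: a hypothetical numeric age standing in for the ordinal ageGroup, and SpeechRate. It fits ordinary least squares instead of the logistic, proportional-odds and Bayesian linear models used by mice. It also adds Gaussian noise to the predictions instead of drawing the regression parameters from their posterior. Each model is refitted only on rows where its target was originally observed, which mirrors "setting the imputed values back to missing". All names and toy values are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Tuple

Column = List[Optional[float]]


def fit_ols(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Intercept, slope and residual SD of the simple regression y ~ x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid_sd = statistics.pstdev([yi - intercept - slope * xi for xi, yi in zip(x, y)])
    return intercept, slope, resid_sd


def impute_step(target, miss, predictor, rng, bounds=None):
    """Refit target ~ predictor on originally observed rows, then overwrite
    the missing rows with a noisy prediction (stochastic regression)."""
    rows = [i for i, m in enumerate(miss) if not m]
    b0, b1, sd = fit_ols([predictor[i] for i in rows], [target[i] for i in rows])
    for i, m in enumerate(miss):
        if m:
            value = b0 + b1 * predictor[i] + rng.gauss(0.0, sd)
            if bounds is not None:  # e.g. speech rate kept within 0-15 phones/s
                value = min(bounds[1], max(bounds[0], value))
            target[i] = value


def chained_impute(a: Column, b: Column, n_iter: int = 50, seed: int = 1,
                   b_bounds: Tuple[float, float] = (0.0, 15.0)):
    rng = random.Random(seed)
    miss_a = [v is None for v in a]
    miss_b = [v is None for v in b]
    # Start: fill every gap with a random draw from the observed values
    cur_a = [rng.choice([v for v in a if v is not None]) if m else v for v, m in zip(a, miss_a)]
    cur_b = [rng.choice([v for v in b if v is not None]) if m else v for v, m in zip(b, miss_b)]
    for _ in range(n_iter):
        impute_step(cur_a, miss_a, cur_b, rng)            # a given current b
        impute_step(cur_b, miss_b, cur_a, rng, b_bounds)  # b given updated a, clipped
    return cur_a, cur_b


# Hypothetical toy data: numeric age (stand-in for ageGroup) and speech rate
age = [25.0, 34.0, None, 61.0, 45.0, None, 19.0, 52.0]
rate = [12.1, None, 11.0, 8.9, None, 10.2, 12.8, 9.5]
age_imp, rate_imp = chained_impute(age, rate)
print([round(v, 1) for v in age_imp])
print([round(v, 1) for v in rate_imp])
```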
The imputation algorithm returned five imputed datasets (the default option), which were identical in the non-missing values but differed in the imputed values. These differences reflect the uncertainty captured by the Bayesian probabilistic sampling used in multiple imputation. Next, the five datasets were averaged into one dataset with the same number of observations as the initial dataset, but with the imputed values instead of the missing ones. Finally, this dataset with imputed values was used to fit the second regression model.
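The averaging step can be sketched as shown below. Observed values are kept. Each missing numeric value is replaced by the mean of its five imputations. Averaging is not defined for categorical variables such as sex and ageGroup, so the sketch assumes, purely for illustration, that the most frequent imputed level is taken. The function names and toy data are hypothetical. With mice, a common alternative is to fit the model to each imputed dataset and pool the estimates (Rubin's rules). The sketch mirrors only the averaging described here.

```python
import statistics
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def average_numeric(original: Sequence[Optional[float]],
                    imputed_sets: Sequence[Sequence[float]]) -> List[float]:
    """Keep observed values; replace each missing value by the mean of its
    imputations (the same row across all imputed datasets)."""
    return [v if v is not None else statistics.fmean(ds[i] for ds in imputed_sets)
            for i, v in enumerate(original)]


def majority_categorical(original: Sequence[Optional[Hashable]],
                         imputed_sets: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Assumed categorical analogue: the most frequently imputed level."""
    return [v if v is not None else Counter(ds[i] for ds in imputed_sets).most_common(1)[0][0]
            for i, v in enumerate(original)]


# Hypothetical example with m = 5 imputed datasets (the mice default)
rate = [10.0, None, 12.0]
rate_sets = [[10.0, 9.0, 12.0], [10.0, 11.0, 12.0], [10.0, 10.0, 12.0],
             [10.0, 8.0, 12.0], [10.0, 12.0, 12.0]]
sex = ["female", None]
sex_sets = [["female", "male"], ["female", "male"], ["female", "female"],
            ["female", "male"], ["female", "female"]]
print(average_numeric(rate, rate_sets))     # [10.0, 10.0, 12.0]
print(majority_categorical(sex, sex_sets))  # ['female', 'male']
```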
3.3. Bayesian GLMM
We used Bayesian generalised linear mixed-effects models (GLMMs) with the logit link function (R package brms; see Bürkner, 2018). Bayesian modelling represents an attractive alternative to frequentist regression, which typically relies on maximum likelihood estimation. First, it allows the researcher to test the alternative hypothesis directly, rather than testing whether the null hypothesis can be rejected. Second, it does not involve binary decisions based on p-values. Third, generalised linear mixed models fitted by maximum likelihood often have convergence issues, which can be more easily solved in the Bayesian framework. For an explanation of the principles of Bayesian inference, see Kruschke (2011), McElreath (2016) and Nicenboim & Vasishth (2016). For a comparison of frequentist and Bayesian regression in models of language variation, see Levshina (in press).
In practical terms, the difference between frequentist and Bayesian regression is that the regression coefficients are estimated as the means of the posterior distributions of the parameters given the data, rather than as point estimates obtained by maximum likelihood estimation. The posterior distributions are approximated with Markov chain Monte Carlo (MCMC) sampling. In practice, however, the coefficient estimates of maximum likelihood models and those of Bayesian models are very similar. What differs is the process of inference. The MCMC samples enable us to compute 95% credible intervals. Given the model and the data, the parameter of interest (e.g., a regression coefficient) lies within such an interval with a probability of 95%. We can also compute the probability that a predictor has a positive or negative effect on the use of wanna vs. want to (cf. Nicenboim & Vasishth, 2016; Vasishth et al., 2013).
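The summaries used in this paper can be computed from a vector of MCMC draws for one coefficient, as in the sketch below. These are the posterior mean, the 95% credible interval and the probability of a positive effect, i.e. the share of positive draws. The sketch assumes an equal-tailed interval with linearly interpolated (type 7, R-default) quantiles. The simulated draws are hypothetical stand-ins for real posterior samples.

```python
import math
import random
import statistics
from typing import Dict, List


def quantile(sorted_draws: List[float], p: float) -> float:
    """Linear-interpolation quantile (type 7, R's default)."""
    h = (len(sorted_draws) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (h - lo) * (sorted_draws[hi] - sorted_draws[lo])


def summarise_posterior(draws: List[float]) -> Dict[str, float]:
    s = sorted(draws)
    return {
        "estimate": statistics.fmean(s),               # posterior mean
        "lower95": quantile(s, 0.025),                 # equal-tailed 95% credible interval
        "upper95": quantile(s, 0.975),
        "p_positive": sum(d > 0 for d in s) / len(s),  # share of draws above zero
    }


# Hypothetical stand-in for 6,000 MCMC draws of one coefficient
rng = random.Random(42)
draws = [rng.gauss(0.8, 0.3) for _ in range(6000)]
print(summarise_posterior(draws))
```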
The technical details of the Bayesian models and their goodness-of-fit measures are provided in the Supplementary Materials. The model selection process was as follows. First, all variables listed above were tested as fixed effects, with the exception of infinitives, subjects of WANT, setting IDs and speaker IDs, which served as random intercepts. We also tested models with fine-grained text categories as random intercepts, which partly overlap with the predictor textType. This variable did not make any substantial difference, so we chose the simpler models without this random effect. All pairwise interactions were tested. An interaction was retained if two conditions held: the model including it had a lower WAIC (Watanabe-Akaike Information Criterion) than the model without any interactions, and the difference exceeded one standard error. The model based on the small sample contained an interaction between the informativity of WANT given the word on the left (Info_WANT_given_left) and speech rate. This interaction was also relevant in the model based on the large sample with imputed values. In addition, the second model contained an interaction between two informativity measures. These reflect the predictability of WANT given the left context and given the infinitive: Info_WANT_given_left and Info_WANT_given_Verb.
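The selection criterion can be made concrete with the sketch below. It computes pointwise WAIC contributions from a matrix of log-likelihood values (posterior draws × observations) and then compares two models. It follows the standard WAIC definition, −2 × (lppd − p_WAIC). The standard error of the difference is taken as the square root of n times the variance of the pointwise differences. In R, the brms and loo packages provide `waic()` and `loo_compare()` for such comparisons. The function names and toy numbers here are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def pointwise_waic(loglik: List[List[float]]) -> List[float]:
    """loglik[s][i]: log-likelihood of observation i under posterior draw s.
    Returns the pointwise contributions -2 * (lppd_i - p_waic_i)."""
    n_draws, n_obs = len(loglik), len(loglik[0])
    out = []
    for i in range(n_obs):
        col = [loglik[s][i] for s in range(n_draws)]
        m = max(col)  # log-sum-exp trick for numerical stability
        lppd_i = m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        p_waic_i = statistics.variance(col)  # variance of log-lik across draws
        out.append(-2.0 * (lppd_i - p_waic_i))
    return out


def waic_difference(base: List[float], richer: List[float]) -> Tuple[float, float]:
    """WAIC(base) - WAIC(richer) and the standard error of that difference."""
    diffs = [b - r for b, r in zip(base, richer)]
    se = math.sqrt(len(diffs) * statistics.variance(diffs))
    return sum(diffs), se


def keep_interaction(base: List[float], richer: List[float]) -> bool:
    # Keep the interaction only if WAIC drops by more than one standard error
    delta, se = waic_difference(base, richer)
    return delta > se


# Hypothetical log-likelihoods: 2 draws x 2 observations per model
base = pointwise_waic([[-1.0, -2.0], [-1.0, -2.0]])
richer = pointwise_waic([[-0.5, -1.0], [-0.5, -1.0]])
print(waic_difference(base, richer), keep_interaction(base, richer))
```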
4. Results of Bayesian modelling
The results of our modelling are shown in Table 2, which displays the coefficients of the fixed effects and their 95% credible intervals. Estimates greater than 0 mean that the condition increases the chances of wanna, whereas negative values indicate that it increases the chances of want to. The right-hand column shows the posterior probability that each coefficient is positive, computed as the proportion of positive values among the 6,000 posterior samples. Each cell in the table contains two numbers. The upper number, in italics, is the estimate from the model based on the small dataset with complete observations. The lower number comes from the model based on the large dataset with imputed data. Overall, the estimates produced by the two models are similar with regard to the direction and strength of the effects, with the exception of the two informativity variables. This difference is due to the additional interaction term in the second model. Also, the 95% credible intervals are narrower in the second model, which is based on the large dataset. This is expected: Bayesian posteriors usually become narrower as more data become available.
Note. The upper values in a cell (in italics) are the coefficients of the first model. The bottom values in a cell (no italics) are the coefficients of the second model based on the imputed data. Positive values mean higher chances of wanna in comparison with want to. According to the results of model selection, the interaction term was absent in the first model (hence the NA values).
The coefficients and probabilities tell us that wanna occurs more often in conversations than in the other texts. The effect is very strong and highly credible, as we can see from the 95% credible intervals, which do not include zero in either of the models. Also, the posterior probabilities that conversations boost the chances of wanna are 100%.
The presence of the negative particle decreases the chances of wanna with sufficient credibility; in other words, it increases the chances of want to. Questions slightly increase the chances of wanna, but this effect is weak, and the posterior probabilities are below 90%.
As for the sociolinguistic variables, there is a clear effect of the speaker’s sex in both models. Male speakers use wanna more often than female speakers. Also, the use of wanna is less likely in the older groups than in the younger groups. Figure 2 shows the probabilities of wanna for different age groups in each model. It is a so-called conditional plot, so the probabilities of wanna are computed for some selected values of the other categorical variables (i.e., conversations, female speakers, no negative particle, not a question), and mean values of the continuous variables. The plot shows that the probability of wanna is higher in the three younger groups than in the three older groups.
The interaction between speech rate and informativity of WANT given the word on the left is visualised in Fig. 3. It is based on the model with complete observations, but the results for the model with imputed values are very similar. By default, the three values of the informativity variable chosen for visualisation are the mean (4.01, green line) plus one standard deviation (6.17, blue line) and minus one standard deviation (1.85, red line). The plot shows that the effect of predictability becomes obvious when the speech rate is higher. In fast speech, lower informativity of WANT (represented with the red line in Fig. 3) means higher chances of wanna, in comparison with middle-level informativity (the green line) and high informativity (the blue line). This means that contexts where WANT is more expected contain wanna more frequently than contexts where WANT is less expected, which can be regarded as efficient use of language. Crucially, this effect only becomes obvious in fast-paced interactions. In slow-paced communication, the chances of wanna are equally low for all types of contexts.
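In a conditional plot like Fig. 3, the other predictors are held fixed. The predicted probability of wanna is then traced across speech rate for three values of Info_WANT_given_left: 1.85, 4.01 and 6.17, i.e. the mean of 4.01 minus and plus one standard deviation of 2.16. The sketch below shows how an interaction term produces this kind of pattern. The coefficients are hypothetical, chosen only to mimic the qualitative shape described above. They are not the estimates in Table 2, and the two speech rates are also illustrative.

```python
import math

# HYPOTHETICAL logit-scale coefficients, chosen only to mimic the pattern
# described for Fig. 3; they are NOT the estimates reported in Table 2.
# All other predictors are held at fixed values absorbed into B0.
B0, B_RATE, B_INFO, B_RATE_X_INFO = -4.0, 0.4, 0.4, -0.08


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def p_wanna(rate: float, info: float) -> float:
    """Predicted probability of wanna with a rate x informativity interaction."""
    eta = B0 + B_RATE * rate + B_INFO * info + B_RATE_X_INFO * rate * info
    return inv_logit(eta)


# Mean of Info_WANT_given_left and mean -/+ 1 SD, as reported in the text
INFO_LEVELS = {"low (1.85)": 1.85, "mean (4.01)": 4.01, "high (6.17)": 6.17}

for rate in (5.0, 13.0):  # hypothetical slow vs. fast speech (phones per second)
    probs = {label: round(p_wanna(rate, info), 3) for label, info in INFO_LEVELS.items()}
    print(f"rate={rate}: {probs}")
```

With these made-up values, the slope of informativity is B_INFO + B_RATE_X_INFO × rate. This slope is zero at 5 phones per second and clearly negative at 13. The three lines therefore coincide in slow speech and fan out in fast speech.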
Some left-context words make WANT highly predictable (i.e., WANT has low informativity after them) and are particularly often followed by wanna. They include the frequent adverbs just and only and some contractions, such as she'll, d'ya and gonna. Consider examples (4–6) below. They represent very informal language, so informality and high predictability seem to be intertwined. We will return to this observation in the discussion.
(4) Cos you only wanna lose half as much you are eating twice as much as we said? (KPR, a conversation).
(5) Sinead where d’ya wanna go today? (KPE, a conversation).
(6) They gonna wanna turn out Sunday you know, or? (KSR, a conversation).
The two models also differ in the effects of the Correspondence Analysis dimensions. In the first model, the effect of Dimension 1 (informality vs. formality) is in the predicted direction (informality boosts wanna). However, the 95% credible interval includes zero, and the posterior probability of this positive effect is less than 90%. The posterior mean of the Dimension 2 coefficient is close to zero. In the second model, the coordinates of verbs on Dimension 1 have a stronger and more convincing positive effect on the chances of wanna. This means that verbs that occur in informal texts are also more frequently used with wanna in speech.
It is also useful to look at the random intercepts of the grammatical subjects of WANT. Figure 4 displays their distribution on the log-odds scale (positive values are adjustments in favour of wanna). In both models, the greatest positive adjustment in favour of wanna is given to the contexts in which the subject is omitted, followed by the contexts with the personal pronouns I and you as the subject. Consider example (7) of a sentence with an omitted subject:
(7) Just wanna wrap this up now erm by bringing in the erm example of Greece. (DCJ, a lecture at a college).
Usually, the implied subject is I or you.
Finally, the second model, based on the large dataset with imputed values, contains an additional interaction, which is shown in Fig. 5. The effect of the informativity of WANT given the left context depends on the informativity of WANT given the verb. It is negative when WANT is expected given the verb on the right, and slightly positive when WANT is not expected given the verb. That is, if a verb is faithful to this construction, the effect of the left context is strong and in the predicted direction. But if the verb is 'promiscuous' and occurs in many other contexts (e.g., be and say), the effect of the left context is positive. This can be regarded as an anti-efficiency effect. However, the very broad credible band associated with the promiscuous verbs tells us that we need more data to make a final judgement.
5. Conclusions
The main theoretical question we addressed was how the Principle of No Synonymy, which posits that all distinct forms also have distinct functions, can be reconciled with the bias towards efficient language use, by which the reduced and less costly form is chosen in more predictable contexts than the non-reduced and more costly one, and which assumes a certain degree of exchangeability of the forms. We presented two Bayesian mixed-effects models in order to test which contextual factors help us to predict the use of want to and wanna in the spoken component of the BNC. The factors included stylistic and sociolinguistic variables and different predictability measures, which served to test if the speakers used the variants efficiently. The first model was based on the small dataset without missing values, whereas the second model was based on the complete dataset with imputed missing values.
Before moving to the theoretical discussion, we should summarise our methodological findings. The first conclusion we can draw is that the results of the modelling based on complete observations and on the data with imputed values are very similar. The effect sizes and credibility levels of most predictors are nearly the same. One difference is that the larger sample with imputed values allowed us to discover an additional interaction between the informativity of WANT given the word on the left and the informativity of WANT given the infinitive. This interaction did not play an important role in the smaller sample. In addition, the large-sample model showed a clearer effect of a verb's coordinate on Dimension 1 of the Correspondence Analysis, which represents informality. Apparently, the larger sample, with a greater number of diverse verbs, makes it possible to detect these effects. Finally, the larger model had narrower credible intervals. This is not surprising: the more data a Bayesian model has (other things being equal), the more certain it is about the estimates.
In both models, most of the effects can be interpreted in terms of informality, which boosts the chances of wanna. In particular, wanna is more often preferred in conversations and with verbs that are commonly used in informal text types. In addition, the random effects suggest that wanna is most frequently used when the subject is a speech act participant (I or you), or is missing altogether, which serves as another indication of informality.
Wanna is also more frequent in fast speech. This finding can be interpreted in two ways. On the one hand, a high speech rate creates pressure for reduction. On the other hand, a high speech rate is a property of casual speech, so it may capture some additional stylistic variation.
We also find expected patterns of sociolinguistic variation. Young speakers are more likely to use wanna. These results largely match previous findings from American English (Lorenz, 2013, 2020), suggesting that the emancipation of wanna from want to is an ongoing process. Male speakers also use wanna more than female speakers do. This supports the observation that men use non-standard variants more often than women, who tend to prefer variants with high overt prestige (Labov, 1990; Romaine, 2003). The present data are coarse-grained in this respect, as the BNC is designed to represent British English as a whole. In smaller-scale speech communities, the WANT variable might have more specific indexical values, which this study cannot capture.
Moreover, the full variant want to is preferred if there is a negative particle before WANT. Negation can be interpreted as a sign of greater cognitive complexity (cf. Rohdenburg, 1996). Alternatively, negation is less frequent and therefore less expected than affirmation (Diessel, 2019, p. 228). This is why it can be more efficient to use the longer form want to in negative contexts. More research is needed to understand the origin of this effect.
We see thus that style and sociolinguistic variables play a central role, in accordance with previous research. Wanna has developed strong stylistic connotations, which now determine its usage. It is also strongly entrenched as a construction of its own with distinct properties, which nearly excludes its use as a pronunciation variant of want to in the same communicative context. Recall the title of the song discussed in Section 1, Girls just want to have fun. The variant want to appears in the official (written) title of the song. In the song itself, Cyndi Lauper pronounces wanna. So, the variants are interchangeable semantically, but not stylistically. They have strong associations with two different modalities.
This kind of divergence is most likely the reason why predictability has only a restricted effect on the use of the variants. Wanna is chosen more often when WANT is more likely to follow the preceding word, in accordance with our expectations. However, this effect is observed only when the speech rate is high. This means that wanna helps to save articulation effort in fast speech when WANT is more predictable. Moreover, as the other interaction term suggests, this effect is observed only for verbs that are more likely to be used after WANT. Taken together, these conditions may represent a persistence effect of the origin of wanna in phonetic reduction. Reductions that level morphological boundaries often require favourable conditions in communicative and articulatory terms, such as predictability and rapid speech (cf. Lorenz & Tizón-Couto, 2020). It seems that the choice of wanna is still boosted when these factors conspire.
The mechanism explaining the relationships between predictability effects and the Principle of No Synonymy could be as follows. First, reduced forms of a construction arise due to the pressure for efficient communication in contexts where the construction is highly predictable. Next, two developments are possible. If the variants are not perceived as formal alternatives, the Principle of No Synonymy does not come into operation. The variants are then used interchangeably depending on the predictability. But if the variants are perceived as alternatives, the Principle of No Synonymy will pull them apart functionally.
Whether or not two variants are perceived as alternatives seems to depend on how salient the formal differences between them are. In very fast speech, want to and wanna (or any intermediate form) can be difficult to distinguish, which is why reduction can be employed for efficiency reasons. Because of individual differences, some reduced uses will also emerge in slower speech. When a reduced form solidifies into a recognisably distinct variant and attains a certain frequency and salience, the variants are perceived as alternatives. Wanna is a reduced form and is used most often in fast speech, which is associated with informality. A natural path for the differentiation of the variants is therefore along the distinctions in the indexical field of stylistic and social stereotypes. This development is also strengthened by the secondary modal verb schema, which subsumes gonna, gotta and similar informal reduced forms (Krug, 2000; Diessel, 2019, Ch. 4).
More generally, the results support our hypothesis that predictability effects are less likely to be found if the variants have salient functional distinctions. When the variants are less stylistically and semantically contrastive, predictability effects can be easier to detect. For example, that-omission, which is not associated with salient stylistic or semantic differences, strongly depends on predictability of the complement clause (see Section 1). The WANT alternation represents the opposite pole: the variants are highly distinct, and display hardly any predictability effects. It would be interesting to see where other alternations are located on this continuum, and whether there is a trade-off between functional distinctiveness and the role of predictability.
Supplementary Materials
To view supplementary materials for this article, please visit <https://doi.org/10.1017/langcog.2022.7>.
Acknowledgments
The first co-author’s research presented in this paper was funded by the Netherlands Organisation for Scientific Research (NWO) under Gravitation grant ‘Language in Interaction’, Grant No. 024.001.006. We thank the reviewers for their detailed and constructive feedback, which has helped us improve the paper substantially. All remaining errors are ours.
Sources
The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Bodleian Libraries, University of Oxford, on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/.
Data availability statement
The datasets and R code are available in the following OSF online repository: https://osf.io/rd3jf/?view_only=1ff821da50d7420289b64fc275680797.