6.1 Introduction
6.1.1 General Introduction
There is a rich history of research on the English genitive alternation (Rosenbach 2014; also Chapter 9), i.e. the choice between the s-genitive and the of-genitive. The two options differ mainly in two respects: the ordering of the two constituents and the way they are linked. In the s-genitive, as in (1a), the possessor (Sri Lanka) precedes the possessum (self-interest), whereas in the of-genitive, as in (1b), it follows the possessum (image). To mark the genitive relation between the constituents, the clitic ’s (or simply ’ with plurals) is attached to the possessor phrase in s-genitives, whereas in of-genitives the constituents are connected by the preposition of. Additionally, s-genitives have no overt definite article (since the possessor phrase with ’s itself functions as a determiner), so if we wanted to transform (1a) into an of-genitive, we would have to add one (i.e. the self-interest of Sri Lanka).
(1) a. So let us now serve Sri Lanka[possessor]’s self-interest[possessum] by making use of the potential that this whole region has for massive economic development in the years to come <ICE-SL:S1B-051#109:1:A>
b. You had been manufacturing films to tarnish the image[possessum] of Sri Lanka[possessor] or photographs with the aim of promoting communism, terrorism or creating communal tension … <ICE-SL:S1B-054#102:1:A>
The present investigation takes a variationist approach to grammatical choice-making (e.g. Szmrecsanyi 2017) and therefore focuses on genitives that constitute ‘alternate ways of saying “the same” thing’ (Labov 1972: 188; for a discussion of sameness in the genitive alternation, see Rosenbach 2002). This entails excluding forms that cannot be expressed by the respective other variant (Rosenbach 2002). These include appositive genitives (e.g. the state of California), descriptive genitives (e.g. person of colour), double genitives (e.g. the friend of John’s), noun–noun genitives (e.g. satellite photographs; but see Szmrecsanyi et al. 2016), partitive genitives (e.g. one of my friends) and idiomatic genitives (e.g. Valentine’s Day). By focusing on the so-called choice context (i.e. on cases that can be expressed with both genitive variants), we seek to produce results that complement recent multifactorial studies of genitive choice (e.g. Grafmiller 2014; Heller et al. 2017a).
Linguists have long been aware that the choice between the s-genitive and the of-genitive is subject to multiple constraints, the most important of which is possessor animacy. In most studies, this refers to the binary distinction between animate (2a) and inanimate (2b) possessors. However, more fine-grained distinctions have been found to be important in explaining genitive choice (e.g. Wolk et al. 2013): collective (2c), locative (2d) and temporal (2e) possessors. In essence, the higher a possessor is on the animacy scale, the more likely it is to occur in an s-genitive.
(2) a. What form of government will then replace Saddam Hussein[possessor]’s dictatorship[possessum]? <ICE-GB:W2E-001#57:3>
b. The boiling point[possessum] of coconut oil[possessor] is more conducive for frying <ICE-SL:S1A-006#154:1:A>
c. Had the timely arrival[possessum] of US and British forces[possessor] not prevented this manoeuvre, some 5 million barrels a day of oil production might have been put at risk <ICE-GB:W2E-001#45:3>
d. Mahatma Gandhi called India Sri Lanka[possessor]’s nearest neighbour[possessum] <ICE-SL:S1B-051#112:1:A>
e. Mani, now today[possessor]’s question[possessum] again; let me repeat that … <ICE-SL:S1A-094#14:1:A>
Almost equally important is the length of the constituents (for a comparison of length and animacy, see Rosenbach 2005). If the possessor is particularly long (relative to the possessum), an of-genitive, in which the possessor follows the possessum, is more likely (3a). Conversely, if the possessum is particularly long, an s-genitive, in which the possessum again comes last, is more likely (3b). In other words, genitive choice follows the principle of end-weight (Behaghel 1909), according to which longer constituents tend to be placed last. Both the effect of possessor animacy and the effect of constituent length reflect a more general principle of linguistic choice-making, Easy First (MacDonald 2013). This principle states that constituents that are more easily retrievable from memory tend to be placed first, for example animate ones (see Bock 1982: 15 on egocentric bias) and/or short ones.
(3) a. Now Mr deputy speaker with the conclusion[possessum] of the conflict situation[possessor] <ICE-SL:S1B-056#26:1:A>
b. And also another interesting thing is Ernest Hemingway[possessor]’s last posthumously published novel[possessum] <ICE-SL:S2A-025#72:1:A>
Beyond possessor animacy and constituent length, the present study considers six additional constraints: three language-internal ones (i.e. sibilancy, definiteness and semantic relation) and three language-external ones (i.e. modality, variety and gender). Sibilancy refers to the presence of a sibilant at the end of the possessor phrase. If a sibilant ([s], [z], etc.) is present there, language users tend to avoid the s-genitive because the combination of the sibilant and the clitic ’s creates a repetitive sound sequence (4). Definiteness also refers to the possessor and simply distinguishes definite from indefinite possessors. If the possessor is definite (5a), s-genitive usage is more probable than with indefinite possessors (5b). Semantic relation is operationalised here as a binary distinction between prototypical relations (including part–whole, kinship and legal relations; e.g. 6a) and non-prototypical ones (6b). With prototypical semantic relations, the s-genitive is usually more frequent than with non-prototypical ones. The effects of these three language-internal constraints have been found and replicated in many studies (an overview can be found in the appendix to Rosenbach 2014).
(4) In Peter Lucius[possessor]’s terms[possessum] … <ICE-SL:S2B-049#29:1:A>
(5) a. we met auntie Ashani[possessor]’s daughter[possessum] … <ICE-SL:S1A-011#38:1:B>
b. six hundred years ago, the delicious flavour[possessum] of mushrooms[possessor] intrigued the Pharaohs of Egypt <ICE-SL:S1B-021#138:1:D>
(6) a. the slave owners[possessor]’ children[possessum] … <ICE-SL:S2A-024#115:1:A>
b. you know the underlying principles[possessum] of all the grammar issues[possessor] <ICE-SL:S1A-015#35:1:A>
The language-external predictors included are modality, variety and gender. Modality (spoken vs. written) is included as a control because it has been found to have an effect, especially in interaction with language-internal predictors (e.g. Grafmiller 2014). Szmrecsanyi and Hinrichs (2008) report more s-genitive usage in spoken texts, which they attribute to a difference in average formality (with spoken language usually being less formal). Low formality, in turn, has long been recognised as a factor that increases the use of the s-genitive (Altenberg 1982). Genitive choice across different varieties has only recently been studied in detail (Heller et al. 2017a; Szmrecsanyi et al. 2017; Heller 2018). The results show significant differences, especially for variety as a moderator of possessor animacy. In essence, possessor animacy was found to trigger the s-genitive more strongly in Inner-Circle varieties (e.g. British and Canadian English; Kachru 1985) than in Outer-Circle varieties (e.g. Heller et al. 2017b). Gender has – to our knowledge – not yet been studied as a predictor of genitive choice. A study of the (arguably comparable) dative alternation that included gender did not find a significant effect (Kendall et al. 2011).
The present study seeks to add to research on syntactic alternations across varieties by investigating genitive choice in Sri Lankan English (SriLE). SriLE is a postcolonial variety with a clear variety-specific structural profile spanning several linguistic levels (e.g. Meyler 2007; Künstler et al. 2009; Bernaisch 2012, 2015). Within Schneider’s Dynamic Model, SriLE has thus passed the nativisation phase and is arguably on its way towards endonormative stabilisation (see Mukherjee 2008: 361). In order to characterise genitive choice in SriLE, we compare it with its historical input variety, British English (BrE).
Thus far, there has been little research on gender differences in SriLE. In his dictionary of SriLE, Meyler (2007: 53) presents words that women use more often, such as child! as ‘a colloquial term of address’. Bernaisch (2012) investigates Sri Lankans’ attitudes towards American, British, Indian and Sri Lankan English without finding a statistically significant difference between female and male participants. Revis and Bernaisch (in press) report a significant effect of gender on the choice between filled and unfilled pauses in a conditional inference tree. In their generalised linear mixed-effects model, however, gender did not reach significance. Gunesekera (2005), in her analysis of the postcolonial identity of SriLE, found topicalisation to be more common in female than in male speech. Sports metaphors and swear words, by contrast, were – at least in public – limited to male speech (see Gunesekera 2005: 137).
With respect to genitive choice, SriLE has not yet received much scholarly attention (but see Heller et al. 2017a). Previous studies of genitive choice across varieties revealed that language-external factors might influence genitive choice only as moderators of language-internal constraints (e.g. variety, which was found to moderate the strength of possessor animacy). Gender-based differences in language use also have a rich history and importance as a field of study (see the introduction to this volume). Given both points, we believe that the systematic study of gender in genitive choice constitutes a research gap. The present study therefore seeks to complement the current body of research on the genitive alternation in two ways: first, by including gender as a language-external variable and, second, by investigating gender across varieties.
6.1.2 Overview of the Present Chapter
In the following section on methods (Section 6.2), we will first outline the data extraction process and show some descriptive statistics of the above-mentioned predictors (Section 6.2.1). In Section 6.2.2, we will present the details of the statistical analysis, whose results are then described and visualised in Section 6.3. In the final section, Section 6.4, we will provide a discussion and concluding remarks.
6.2 Methods
6.2.1 Data
Data were extracted from a 10% register-stratified sample of the British component of the International Corpus of English (ICE; Greenbaum 1996) and a 25% sample of the Sri Lankan component (ICE-SL). We used only 25% of ICE-SL because at the time of retrieval only 25% of its spoken component was available. Interchangeable genitives were extracted in several (partly automated) steps. First, all text units containing genitive markers (i.e. of, ’s and s’) were automatically extracted. Then, an automatic classification determined interchangeability. Predictors were annotated automatically where possible, based on computational work by Heller (2018). In every step, manual corrections were made where necessary. Speaker gender information for each case was taken from the respective metadata in the case of ICE-SL. For ICE-Great Britain (ICE-GB), it was taken from metadata made available by Martin Schweinberger.
Altogether, we ended up with 4,045 cases, distributed across variety and gender as shown in Table 6.1. There is an overall preference for of- over s-genitives, strong enough to raise initial concerns about the class imbalance problem. This is the problem that, if the distribution of the dependent variable is strongly skewed towards one of its (two) levels, it can be difficult to obtain good results from a regression model.
Table 6.1 Composition of the data with respect to variety and gender
variety | gender | genitive: of | genitive: s | Sum |
---|---|---|---|---|
Great Britain | Male | 305 | 104 | 409 |
Great Britain | Female | 48 | 19 | 67 |
Great Britain | Unknown | 434 | 153 | 587 |
Sri Lanka | Male | 1,185 | 298 | 1,483 |
Sri Lanka | Female | 580 | 310 | 890 |
Sri Lanka | Unknown | 500 | 109 | 609 |
Sum (baseline %) | | 3,052 (75.5%) | 993 (24.5%) | 4,045 |
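The two baselines against which the forest’s accuracy is evaluated in Section 6.3 can be recomputed directly from the counts in Table 6.1. The following is a minimal Python sketch; the names `TABLE_6_1`, `baselines` and `s_rates` are ours, for illustration. The majority baseline is the share of the more frequent variant. Random proportional guessing predicts each variant at its observed rate and is therefore correct with probability p_of² + p_s².

```python
# Cell counts transcribed from Table 6.1: (variety, gender) -> (of-genitives, s-genitives)
TABLE_6_1 = {
    ("Great Britain", "Male"): (305, 104),
    ("Great Britain", "Female"): (48, 19),
    ("Great Britain", "Unknown"): (434, 153),
    ("Sri Lanka", "Male"): (1185, 298),
    ("Sri Lanka", "Female"): (580, 310),
    ("Sri Lanka", "Unknown"): (500, 109),
}


def baselines(table):
    """Return the total N, the majority baseline and the random proportional-guessing baseline."""
    n_of = sum(of for of, _ in table.values())
    n_s = sum(s for _, s in table.values())
    n = n_of + n_s
    p_of, p_s = n_of / n, n_s / n
    majority = max(p_of, p_s)            # always predict the more frequent variant
    proportional = p_of ** 2 + p_s ** 2  # predict each variant at its observed rate
    return n, majority, proportional


def s_rates(table):
    """Observed proportion of s-genitives in each variety-gender cell."""
    return {cell: s / (of + s) for cell, (of, s) in table.items()}


n, majority, proportional = baselines(TABLE_6_1)
print(n, round(majority, 3), round(proportional, 3))  # prints: 4045 0.755 0.63
for cell, rate in s_rates(TABLE_6_1).items():
    print(cell, f"{rate:.1%}")
```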
In addition to the variables shown in Table 6.1, the data were annotated for the other predictors discussed in the introduction. The following is an overview of these predictors and their levels. The distributions of genitives across the levels of previously studied predictors are in line with previous research.
modality: spoken vs. written: our sample contains 1,878 genitives from spoken texts, 27.10% of which are s-genitives. Of the 2,167 genitives from written texts, only 22.34% are s-genitives. Data sparsity permits neither a more fine-grained division of the two modalities into different registers nor an analysis of how modality interacts with other predictors. However, we are not aware of studies of alternation phenomena in which modality interacted with other linguistic predictors in a way that reversed hypothesised effects.
animacy (of the possessor): animate vs. collective vs. locative vs. temporal vs. inanimate. In our sample, the distribution of possessor animacy is as follows: animate – 1,028, collective – 695, locative – 280, temporal – 184, inanimate – 1,858. The proportion of s-genitives is 59.44%, 32.81%, 19.64%, 33.15% and 2.05%, respectively.
sibilancy (of the final phoneme of the possessor): absent vs. present. In 3,053 cases, there is no final sibilant, but in 992 cases, a final sibilant is present; s-genitives are used in 28.15% and 13.53% of the cases, respectively.
definiteness (of the possessor): definite vs. indefinite. 2,621 possessors are definite, while 1,424 are not. With definite possessors, the s-genitive rate is 31.63%; with indefinite ones, it is 11.52%.
lengthDiff: log2 of possessor length minus log2 of possessum length, with length measured in words. This is an approximately normally distributed numeric predictor ranging from –4.52 to 4.71. In 224 genitives, the possessor and the possessum are equally long (i.e. lengthDiff = 0); the s-genitive is used in 31.70% of these cases. When the possessor is longer (i.e. lengthDiff > 0), the s-genitive is used in only 14.98% of the cases. When it is shorter (i.e. lengthDiff < 0), the s-genitive is used in 38.32% of the cases.
semRelation: prototypical vs. non-prototypical. In our data, we find 242 prototypical and 3,803 non-prototypical relations. When prototypical, the s-genitive is used in 52.90% of the cases and when not, in only 22.75%.
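The following small Python sketch shows how lengthDiff and per-level s-genitive rates like those above could be computed. It assumes that each case is a dict with a `genitive` field (`'s'` or `'of'`) and with possessor and possessum lengths in words. The function names and the toy cases are hypothetical; they are not the study’s data or annotation code.

```python
import math
from collections import defaultdict


def length_diff(possessor_words, possessum_words):
    """lengthDiff: log2(possessor length) - log2(possessum length), both lengths in words."""
    return math.log2(possessor_words) - math.log2(possessum_words)


def length_group(diff):
    """Collapse lengthDiff into the three groups reported above."""
    if diff > 0:
        return "possessor longer"
    if diff < 0:
        return "possessor shorter"
    return "equal length"


def s_rate_by_level(cases, predictor):
    """Proportion of s-genitives per level of a categorical predictor."""
    counts = defaultdict(lambda: [0, 0])  # level -> [number of s-genitives, number of cases]
    for case in cases:
        tally = counts[case[predictor]]
        tally[0] += case["genitive"] == "s"
        tally[1] += 1
    return {level: n_s / n for level, (n_s, n) in counts.items()}


# Made-up toy cases (NOT the study's data); lengths in words.
toy = [
    {"genitive": "s", "possessor": 1, "possessum": 3},
    {"genitive": "of", "possessor": 4, "possessum": 1},
    {"genitive": "of", "possessor": 2, "possessum": 2},
    {"genitive": "s", "possessor": 2, "possessum": 2},
]
for case in toy:
    case["lengthGroup"] = length_group(length_diff(case["possessor"], case["possessum"]))

print(s_rate_by_level(toy, "lengthGroup"))
# prints: {'possessor shorter': 1.0, 'possessor longer': 0.0, 'equal length': 0.5}
```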
6.2.2 Statistical Evaluation
An alternation question of this kind would prototypically be explored with a generalised linear (mixed-effects) model, but we did not proceed along that route, for two reasons. The first is the skewed distribution (towards the of-genitive) already mentioned in connection with Table 6.1. The second is that the potential random-effects structure looked as if it would be highly problematic. A fairly high number of speakers contributed only a few data points to the sample: 30% of the data points came from speakers contributing ten or fewer data points. This lowered the chances of our regression models converging properly and of the random effects being particularly relevant. We therefore decided to use random forests, an extension of classification and regression trees. Specifically, we used forests of conditional inference trees (Hothorn et al. 2006) as implemented in the R function party::cforest (see also Chapter 7). Random forests add two layers of randomness to a tree-based analysis. First, many different conditional inference trees are grown on different bootstrapped samples of the data. Second, each split in a tree may only choose from a randomly selected subset of the available predictors rather than from all of them. The predictions of the random forest are obtained by aggregating the votes of the many trees. Each case is predicted only by the trees for which it was out of bag (OOB), i.e. not part of the bootstrap sample. Typically, the user has to specify only two hyperparameters, i.e. parameters that are defined before a statistical analysis begins and affect how it is conducted. The first is the number of (randomly chosen) predictors that may be considered at each split of each tree, which we left at the default value of 5. The second is the number of trees grown, which we set to 2,000.
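The two layers of randomness described above, together with OOB prediction, can be illustrated with a deliberately simplified Python sketch. It is not party::cforest. It grows one-split trees (stumps) chosen by Gini impurity rather than conditional inference trees with permutation-test-based splits, and all names (`fit_stump`, `oob_forest`) and the toy data are hypothetical. It is meant only to show bootstrap sampling, a random subset of `mtry` candidate predictors per split, and OOB majority voting.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values()) if n else 0.0


def most_common(labels, default):
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, y, predictors, mtry, rng):
    """Grow a one-split tree, choosing only among `mtry` randomly sampled predictors (layer 2)."""
    candidates = rng.sample(predictors, min(mtry, len(predictors)))
    fallback = most_common(y, None)
    best = None
    for p in candidates:
        for level in sorted({r[p] for r in rows}):
            left = [lab for r, lab in zip(rows, y) if r[p] == level]
            right = [lab for r, lab in zip(rows, y) if r[p] != level]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, p, level, most_common(left, fallback), most_common(right, fallback))
    _, p, level, yes, no = best
    return lambda r: yes if r[p] == level else no


def oob_forest(rows, y, predictors, ntree=200, mtry=2, seed=1):
    """Grow `ntree` stumps on bootstrap samples; predict each case only with trees it was OOB for."""
    rng = random.Random(seed)
    n = len(rows)
    votes = [Counter() for _ in range(n)]
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]  # layer 1: bootstrap sample
        in_bag = set(boot)
        tree = fit_stump([rows[i] for i in boot], [y[i] for i in boot], predictors, mtry, rng)
        for i in range(n):
            if i not in in_bag:  # only trees that did not see case i vote on it
                votes[i][tree(rows[i])] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Made-up toy data: the variant depends only on animacy.
PREDICTORS = ["animacy", "sibilancy", "modality"]
toy_rows = [
    {"animacy": "animate" if i % 3 == 0 else "inanimate",
     "sibilancy": "present" if i % 2 == 0 else "absent",
     "modality": "spoken" if i % 5 < 2 else "written"}
    for i in range(200)
]
toy_y = ["s" if r["animacy"] == "animate" else "of" for r in toy_rows]
preds = oob_forest(toy_rows, toy_y, PREDICTORS, ntree=200, mtry=2, seed=1)
accuracy = sum(p == t for p, t in zip(preds, toy_y)) / len(toy_y)
print(f"OOB accuracy: {accuracy:.2f}")
```

With `mtry=2` of three predictors, about one third of the stumps never see animacy. OOB majority voting nevertheless recovers the pattern, because the stumps that do see animacy outvote the others.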
Several strategies are available for interpreting the results of a random-forest analysis. One that has been in use especially since Tagliamonte and Baayen (2012) has three steps: (i) perform a random-forest analysis on the data, (ii) report variable importance scores from the random forest to assess each predictor’s importance to the alternation and (iii) use a single classification/conditional inference tree fitted on the complete data to visualise the predictors’ effects. In this study, we do not follow this approach, for two main reasons that previous research has ignored. First, a random forest is a set of often 500 or even many more trees, fitted on randomly resampled data with different sampled predictors at every split. Interpreting it on the basis of a single tree fitted on all the data without resampling is highly problematic and can lead to misinterpretation of the patterns in the data. Second, the usual ways of interpreting random forests, namely variable importance scores and partial dependence scores, can fail dramatically to represent the nature of the effects in the data faithfully. They can both over- and underestimate variable importance, and they can misrepresent how predictors interact with each other. Space does not permit a more detailed discussion here. Suffice it to say that trees and random forests, which are supposed to be very good at detecting and visualising interactions, are not necessarily as good at this as they are widely believed to be (see Gries [in press] for more discussion and exemplification and Deshors and Gries [in press] for another application to varieties of English).
To address these issues, we follow the recommendations in Gries (in press). The first step of our statistical analysis consisted of manually creating a number of new predictors that represent what in a regression model would be interaction terms, i.e. new variables whose levels are all combinations of the levels of the predictors they consist of:
all two-way interactions of all predictors with gender and variety: gender:variety, gender:modality, variety:modality, gender:animacy, variety:animacy, gender:sibilancy, variety:sibilancy, gender:definiteness, variety:definiteness, gender:lengthDiff, variety:lengthDiff, gender:semRelation and variety:semRelation;
all three-way interactions involving gender and variety: variety:gender:modality, variety:gender:animacy, variety:gender:sibilancy, variety:gender:definiteness, variety:gender:lengthDiff and variety:gender:semRelation.
These were then added as predictors, alongside the main-effect predictors, to a forest of 2,000 conditional inference trees.
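Creating these interaction predictors amounts to pasting together the levels of the predictors involved. The following is a minimal Python sketch. It assumes that each case is a dict of categorical values and that lengthDiff has already been binned into levels. The chapter does not say how lengthDiff was handled for the interaction predictors, so the bin label below is hypothetical, as are the function names.

```python
# Predictors other than gender and variety (lengthDiff assumed to be binned into levels already).
OTHER_PREDICTORS = ["modality", "animacy", "sibilancy", "definiteness", "lengthDiff", "semRelation"]


def interaction_specs(others=OTHER_PREDICTORS):
    """The 13 two-way and 6 three-way combinations listed above."""
    specs = [("gender", "variety")]
    for p in others:
        specs += [("gender", p), ("variety", p)]
    specs += [("variety", "gender", p) for p in others]
    return specs


def add_interactions(case, specs):
    """Return a copy of `case` with one new categorical column per interaction spec."""
    out = dict(case)
    for spec in specs:
        out[":".join(spec)] = ":".join(str(case[p]) for p in spec)
    return out


example = {
    "variety": "SriLE", "gender": "female", "modality": "spoken", "animacy": "animate",
    "sibilancy": "absent", "definiteness": "definite", "lengthDiff": "short possessor",
    "semRelation": "prototypical",
}
enriched = add_interactions(example, interaction_specs())
print(enriched["variety:gender:animacy"])  # prints: SriLE:female:animate
```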
We then evaluated the forest in two ways. First, we computed the forest’s overall prediction accuracy, its precision and recall and its C-score to determine how well the forest identified structure in our data. Second, we computed regular variable importance scores as well as the alternative proposed by Janitza et al. (2013). The regular scores are based on the error rates of categorical predictions, which discards important probabilistic information. The alternative scores are instead based on the area under the ROC curve (AUC), which uses not only the categorical predictions but also their probabilistic strength.
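The difference between error-rate-based and AUC-based permutation importance shows up in a toy setting where a predictor shifts predicted probabilities without ever pushing them across the 0.5 threshold. The Python sketch below simplifies Janitza et al.’s (2013) measure. It permutes a predictor for one fixed, hypothetical model (`toy_predict`), whereas their measure computes the loss per tree on that tree’s OOB cases and averages over the forest. The `auc` function computes the concordance probability, which is also what a C-score reports. All names are ours.

```python
import random


def auc(y_true, scores):
    """AUC (= C-score): probability that a random positive case outranks a random negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def error_rate(y_true, scores, threshold=0.5):
    """Share of cases misclassified once probabilities are turned into categorical predictions."""
    return sum((s > threshold) != (y == 1) for y, s in zip(y_true, scores)) / len(y_true)


def permutation_importance(predict, rows, y, predictor, metric, higher_is_better, n_perm=20, seed=1):
    """Mean loss in `metric` when `predictor` is randomly permuted (larger = more important)."""
    rng = random.Random(seed)
    base = metric(y, [predict(r) for r in rows])
    losses = []
    for _ in range(n_perm):
        values = [r[predictor] for r in rows]
        rng.shuffle(values)
        score = metric(y, [predict(dict(r, **{predictor: v})) for r, v in zip(rows, values)])
        losses.append(base - score if higher_is_better else score - base)
    return sum(losses) / n_perm


# Hypothetical model: definiteness raises P(s-genitive), but never above 0.5.
def toy_predict(row):
    return 0.3 if row["definiteness"] == "definite" else 0.1


rows = [{"definiteness": "definite" if i < 20 else "indefinite"} for i in range(40)]
y = [1 if i < 6 or 20 <= i < 22 else 0 for i in range(40)]  # 1 = s-genitive (made-up labels)

err_imp = permutation_importance(toy_predict, rows, y, "definiteness", error_rate, higher_is_better=False)
auc_imp = permutation_importance(toy_predict, rows, y, "definiteness", auc, higher_is_better=True)
print(err_imp, round(auc_imp, 3))
```

Permuting definiteness leaves every categorical prediction at ‘of’, so the error-rate-based importance is exactly 0. The AUC-based importance is positive because the permutation destroys the model’s ranking of definite above indefinite possessors.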
As for evaluating the directions of effects, multiple options are theoretically available, and there seems to be little discussion, let alone consensus, as to what works best. One could explore effects on the basis of
the observed percentages of of- and s-genitives for each level of each predictor (main effects or interaction predictors alike);
the averages of the predicted percentages of s-genitives for every attested combination of each level of each predictor;
the weighted (by frequency of occurrence) averages of the predicted percentages of s-genitives for every theoretically possible combination of each level of each predictor.
The corpus-linguistic literature on random forests has rarely addressed this issue. After some consideration, we ultimately decided to go with the last option. The first approach is appealing in its simplicity, but it has the major disadvantage that it shows the differences between levels of a predictor too simplistically. It compares the levels of a predictor without controlling for, or holding constant, all other effects. Thus, one never knows to what degree the effect observed for one predictor is also (in part) due to others, which often leads to exaggerated and anticonservative results. (This, in fact, is the reason why multifactorial regression models should not be summarised with observed means.)
The second approach would be better in that it is based on predicted, not observed, probabilities; it is the logic behind so-called partial dependence statistics/plots (Friedman 2001). However, these averages still seem suboptimal because they do not weight predicted probabilities by the frequencies of predictor levels in the data (see Molnar 2018: Section 5.1). In unbalanced observational data such as ours, this might inflate the impact of infrequent combinations and deflate the impact of frequent ones.
The third approach is computationally more complex than the previous two, but it seems theoretically most sound. It is in fact the logic that underlies Fox’s (2003: 1) effect plots for regression models, where ‘values of other predictors [i.e. all those not currently being computed/visualised] are fixed at typical values: for example, a covariate could be fixed at its mean or median [we do not have any here since, for ease of representation, we factorise lengthDiff], a factor at its proportional distribution in the data’. This approach is more appropriate than simply reporting observed percentages. It also leads to results that are easier to interpret than regression tables, and it visualises intercepts, main effects and all interactions clearly, which is why we adopt such plots here. Applied to the genitive alternation, this means the following: for each combination of predictors of interest (such as variety:gender), we inspect the effects of that combination on genitives with otherwise typical values of the remaining covariates (in the sense of their typical distributions), such as animacy, sibilancy, etc.
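The three options can be contrasted for a single focal predictor in a short Python sketch. The model (`toy_predict`) and the cases are made up. The weighting in `weighted_effect`, the product of each remaining predictor’s level proportions, is our assumption about how ‘a factor at its proportional distribution’ carries over to a model that cannot take fractional dummy values, as is the case with a random forest. The chapter does not spell out the exact weights.

```python
from collections import Counter
from itertools import product


def level_props(cases, predictor):
    """Proportional distribution of a predictor's levels in the data."""
    counts = Counter(c[predictor] for c in cases)
    return {lvl: k / len(cases) for lvl, k in counts.items()}


def observed_effect(cases, focal):
    """Approach 1: observed share of s-genitives per level of the focal predictor."""
    totals, s_counts = Counter(), Counter()
    for c in cases:
        totals[c[focal]] += 1
        s_counts[c[focal]] += c["genitive"] == "s"
    return {lvl: s_counts[lvl] / totals[lvl] for lvl in sorted(totals)}


def unweighted_effect(cases, focal, others, predict):
    """Approach 2: plain mean of predictions over attested combinations of the other predictors."""
    combos = sorted({tuple(c[p] for p in others) for c in cases})
    return {lvl: sum(predict({focal: lvl, **dict(zip(others, combo))}) for combo in combos) / len(combos)
            for lvl in sorted({c[focal] for c in cases})}


def weighted_effect(cases, focal, others, predict):
    """Approach 3 (assumed weighting): all possible combinations, weighted by the
    product of each other predictor's level proportions."""
    props = {p: level_props(cases, p) for p in others}
    effect = {}
    for lvl in sorted({c[focal] for c in cases}):
        total = 0.0
        for combo in product(*(sorted(props[p]) for p in others)):
            weight = 1.0
            for p, v in zip(others, combo):
                weight *= props[p][v]
            total += weight * predict({focal: lvl, **dict(zip(others, combo))})
        effect[lvl] = total
    return effect


# Hypothetical fitted model and made-up cases, for illustration only.
def toy_predict(row):
    p = 0.6 if row["animacy"] == "animate" else 0.1
    return p - 0.1 if row["sibilancy"] == "present" else p


cases = (
    [{"animacy": "animate", "sibilancy": "absent", "genitive": g} for g in ("s", "s", "of")]
    + [{"animacy": "animate", "sibilancy": "present", "genitive": "of"}]
    + [{"animacy": "inanimate", "sibilancy": "absent", "genitive": "of"}] * 4
    + [{"animacy": "inanimate", "sibilancy": "present", "genitive": "of"}] * 2
)
print("observed_effect", observed_effect(cases, "animacy"))
for approach in (unweighted_effect, weighted_effect):
    print(approach.__name__, approach(cases, "animacy", ["sibilancy"], toy_predict))
```

In the toy data, sibilancy is absent in 70% of the cases. The weighted effect for animate possessors (0.57) therefore stays closer to the prediction for the frequent ‘absent’ level (0.6) than the unweighted mean over attested combinations (0.55) does.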
6.3 Results
The random forest of conditional inference trees resulting from the above analysis performed well on the data. The OOB prediction accuracy is 84.5%. This is significantly better than both the baseline of always choosing the more frequent of-genitive (75.5%) and the baseline one would arrive at by random proportional guessing (63%; both p < 10⁻⁴⁴). Precision and recall for s-genitives are not particularly high (71% and 62.1%, respectively), but this is in part due to the class imbalance: precision and recall for the of-genitive are much better (88.2% and 91.7%, respectively). With a value of 0.909, the C-score of the random forest exceeds the commonly used threshold value of 0.8.
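The reported metrics and significance tests can be sketched in Python. The chapter does not name the test used, so the one-sided exact binomial test below is our assumption. The number of correct predictions (3,418) is back-calculated from the rounded accuracy of 84.5%. `classification_metrics` and `binom_upper_tail` are hypothetical helper names, and the confusion-matrix counts in the test are toy values.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy plus per-class precision/recall from a 2x2 confusion matrix (s-genitive = positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision_s": tp / (tp + fp),
        "recall_s": tp / (tp + fn),
        "precision_of": tn / (tn + fn),
        "recall_of": tn / (tn + fp),
    }


def binom_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(logs)
    return math.exp(top) * sum(math.exp(v - top) for v in logs)


n, n_of, n_s = 4045, 3052, 993            # totals from Table 6.1
majority = n_of / n                        # 75.5% baseline
chance = (n_of / n) ** 2 + (n_s / n) ** 2  # 63% random proportional guessing
correct = round(0.845 * n)                 # back-calculated from the rounded 84.5% accuracy
p_majority = binom_upper_tail(correct, n, majority)
p_chance = binom_upper_tail(correct, n, chance)
print(correct, p_majority, p_chance)
```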
In terms of variable importance, the top 10 AUC-based variable importance values are shown in Table 6.2.
Table 6.2 AUC-based variable importance scores from the conditional inference forest with explicitly coded interaction variables
Before we look at some of these predictors’ effects, it is instructive to compare this set of variable importance values to those that result from a conditional random forest fitted without interaction predictors, as shown in Table 6.3.
Table 6.3 AUC-based variable importance scores from the conditional inference forest without explicitly coded interaction variables
Predictor | Var. imp. | Predictor | Var. imp. |
---|---|---|---|
animacy | 0.2598940 | lengthDiff | 0.0797186 |
sibilancy | 0.0094320 | gender | 0.0087491 |
definiteness | 0.0073567 | variety | 0.0062786 |
modality | 0.0043213 | semRelation | 0.0038330 |
This comparison is instructive because the forest without interaction variables does not encourage the analyst to explore the variable combinations/interactions that the forest with interaction variables clearly ranks very highly. To use the language of regression analysis, both rankings put animacy first. This might suggest exploring animacy as a main effect, for instance with a visual representation of the type in Figure 6.1. The ranking of the forest with interaction variables, however, immediately cautions against this, given that animacy also appears in interactions with other predictors.

Figure 6.1 Percentages of genitive: s for the levels of animacy (vertical dashed line: overall frequency of s-genitives, point sizes are proportional to level frequencies)
This is relevant here because the strongest main effect, animacy, as well as the main variables of interest in this analysis, variety and gender, are all involved in interactions with a high degree of importance. In fact, the interaction predictors either score more highly than the main effects of which they are made up or immediately follow a main-effect predictor (e.g. lengthDiff). For example, the main effects of variety and gender are not among the top ten predictors, but they feature in interaction predictors that are. A single tree fitted on all the data could, if it were unproblematic, reveal interactions in its sequence of splits. Since using such a tree to analyse a random forest is problematic, exploring interaction variables is a possible alternative (we will discuss other alternatives below).
In what follows, we discuss the following effects: variety:gender:animacy (Section 6.3.1), variety:gender:lengthDiff (Section 6.3.2), variety:gender:definiteness (Section 6.3.3) and variety:gender:sibilancy (Section 6.3.4). We focus on these because they have variable importance scores among the top ten and because each represents the highest-order interaction in which its predictor participates. For instance, animacy has the highest score, but it also participates in the second most important predictor, variety:animacy. These two predictors, as well as the third most important one, gender:animacy, are in turn all involved in the three-way interaction variety:gender:animacy, which is therefore the first effect to be discussed.
6.3.1 The Effect of Variety:Gender:Animacy
The first interaction is shown in Figure 6.2. The y-axis shows the predicted percentage of s-genitives. The x-axis shows the combinations of the five levels of animacy and two levels of gender (abbreviated forms of male and female; unknown is not shown). The two varieties are shown as filled circles: BrE in light grey and SriLE in dark grey.

Figure 6.2 The effect of variety:gender:animacy on genitive shows that gender differences are most pronounced for genitive choice with animate possessors
One immediately obvious result is that, with inanimate possessors, there is essentially no difference between varieties and/or genders: s-genitives are simply very strongly dispreferred (slightly more so in BrE). Another fairly strong result is that the interaction is most pronounced for animate possessors. With these, male speakers of both varieties use the s-genitive about half the time. Female SriLE speakers, however, use s-genitives much more than female BrE speakers, who in turn use them less than male speakers. In other words, female SriLE speakers exhibit the canonical/expected pattern more than female BrE speakers. Finally, collectives behave differently from the other animacy levels: (i) s-genitives are rarer in SriLE than in BrE and (ii) compared to the other combinations, the percentage of s-genitives among female BrE speakers is very high. This pattern should not be overinterpreted, given the small number of data points for exactly this combination. Still, female BrE speakers seem to be more advanced in adopting the expansion of the s-genitive to possessors that are lower on the animacy scale (see Wolk et al. 2013 for a diachronic account).
Another interesting finding is that, in most combinations of predictors, SriLE speakers use s-genitives more than BrE speakers: the dark grey dots are nearly always above the light grey dots. Also, the differences between the varieties are usually greater among female speakers. The slight exception is inanimate possessors, for which there is a near floor effect anyway. With locative and temporal possessors, the results are less marked: SriLE speakers again produce more s-genitives than BrE speakers, and this difference is much more noticeable among female speakers.
6.3.2 The Effect of Variety:Gender:LengthDiff
The second relevant interaction is represented in Figure 6.3.

Figure 6.3 The effect of variety:gender:lengthDiff on genitive (lengthDiff is factorised into four levels for expository purposes) is subject to stronger gender differences in SriLE (dark grey) than in BrE (light grey)
The most obvious result here is the very strong, expected main effect of lengthDiff: as the possessor becomes longer relative to the possessum, the s-genitive becomes less and less preferred across all combinations of variety and gender. The two outer quarters represent the rarer situations of large discrepancies between possessor and possessum length. They show smaller differences between genders and varieties. The middle two quarters of the plot, where most of the cases are located, are more interesting. They show that the differences between varieties are small, whereas those between men and women are more noteworthy, because women use many more s-genitives than men in both varieties.
6.3.3 The Effect of Variety:Gender:Definiteness
The third relevant interaction is represented in Figure 6.4. There is a main effect such that definite possessors (left panel) lead to higher proportions of s-genitives than indefinite possessors (right panel), as might be expected. There is also a second main effect such that, with one exception, women use more s-genitives than men. However, these main effects are qualified by crossover effects. First, there is the one exception to the main effect just mentioned: female BrE speakers use s-genitives less than male BrE speakers, but only with indefinite possessors. Second, the differences between men and women are more pronounced in SriLE than in BrE. This is especially true for definite possessors, where female SriLE speakers exhibit a much higher proportion of s-genitives than any other combination. This suggests that Sri Lankan women react more strongly to the grammatical cue of definiteness than BrE women. For them, the difference between definite and indefinite possessors is larger, whereas male speakers show less of an impact.

Figure 6.4 The effect of variety:gender:definiteness on genitive shows that SriLE speakers and females are more sensitive to the definiteness constraint
6.3.4 The Effect of Variety:Gender:Sibilancy
The final interaction to be discussed is shown in Figure 6.5. There is the expected overall main effect of sibilancy: a possessor-final sibilant (sibilancy: present, right panel) reduces the proportion of s-genitives, and when there is a sibilant, both genders use s-genitives correspondingly rarely. However, female speakers react more strongly to sibilancy than male speakers, especially in SriLE. Likewise, SriLE speakers react more strongly to sibilancy than BrE speakers, especially the female speakers.

Figure 6.5 The effect of variety:gender:sibilancy on genitive shows particular sensitivity of SriLE speakers and females
6.4 Discussion and Concluding Remarks
6.4.1 Implications for the Genitive Alternation
On the whole and on a general linguistic level, our results are largely compatible with previous studies. We find a strong effect of (possessor) animacy that is in line with previous findings, and the same is true of the strong effect of lengthDiff. While weaker, the effects of definiteness and sibilancy do not contradict prior research either. Reassuringly, lengthDiff seems to interact less with gender. This makes sense, since one would expect male and female speakers to have a very similar cognitive architecture and, thus, to react similarly to the higher processing pressure that arises from large length differences. However, there are also more noteworthy differences between the genders.
The results displayed in Figures 6.2–6.5 show that the language-external factors gender and variety moderate the effects of language-internal constraints on genitive choice. As stated in the introduction, this conforms to expectations derived from recent research, which uncovered how variety modulates the effect of animacy (e.g. Heller et al. 2017b). However, our results also go against and beyond expectations in two respects. First, the strength of the animacy effect across the varieties differs from what was expected (i). Second, the effect of gender on genitive choice had not yet been investigated in a multifactorial design (ii).
Regarding i), we find that the effect of animacy, the most important language-internal predictor of genitive choice, is stronger in SriLE than in BrE; more precisely, the difference between animate and inanimate possessors is stronger. This is unexpected because recent research (Heller et al. 2017b; Heller 2018) found the effect of animacy to be weaker in Outer-Circle varieties (i.e. Hong Kong, Indian, Jamaican, Philippine and Singapore English) than in Inner-Circle varieties (i.e. British, Canadian, Irish and New Zealand English). Since SriLE arguably qualifies as an Outer-Circle variety (e.g. Schneider 2011), the effect of animacy was expected to be weaker in SriLE. Counter-intuitively, however, Figure 6.2 shows bigger differences between the animate and the inanimate conditions in SriLE.
Regarding ii), we present first findings on how gender enters the equation. Gender moderates the effects of length difference and variety, and it further qualifies the interactions between variety and animacy (see the previous paragraph) and between variety and definiteness.

First, there appears to be a slight gender difference in the effect of length. Males appear more sensitive to the condition in which the possessum is much longer than the possessor (see the leftmost panel in Figure 6.3), and in this condition they use the s-genitive more frequently than females. In all other conditions, females seem to be more drawn to the s-genitive (see the other panels in Figure 6.3). Closer inspection of these differences reveals that, in aggregate, males respond to length difference in a more categorical fashion. Females respond to length difference fairly linearly (i.e. the stronger the cue, the stronger their reaction). Males, by contrast, mostly default to the of-genitive. Once possessum length exceeds possessor length by a certain threshold, however, they prefer the s-genitive and, within this extreme range, do so even more than females.

Second, gender also interacts with variety: in most of Figures 6.2–6.5, the light grey (BrE) m/f dots are closer to each other than the dark grey (SriLE) m/f dots. Depending on which predictor we interpret as focal, we can describe the observed preference for the s-genitive in two ways:
1) within SriLE, the preference is stronger among females, which amounts to a stronger gender difference in SriLE; or
2) among females, the preference is stronger in SriLE, which amounts to cross-varietal differences emerging more strongly in female language use.
In other words, Sri Lankan females use the s-genitive more than Sri Lankan males, and gender differences are more pronounced in SriLE.

Finally, gender moderates the variety–animacy and variety–definiteness interactions: females appear to be more sensitive to both.
Once gender is taken into account, we see that the stronger effect of animacy in SriLE is driven by female language users only; Sri Lankan males, by contrast, behave fairly similarly to BrE speakers. A similar effect emerges for the possessor definiteness constraint: SriLE-speaking females respond to definiteness more strongly than SriLE-speaking males or speakers of BrE. Thus, Sri Lankan females show higher sensitivity to both the animacy and the definiteness constraints.
How can we make sense of these patterns? Our study has obvious limitations (e.g. a partly exploratory design and a limited sample size), but it still seems reasonable to propose a contact-linguistic explanation of our findings. Our study enters uncharted territory by focusing on the role of gender in the probabilistic grammar of English genitive choice. For that reason it cannot inform any World Englishes models (and vice versa), because these models do not make predictions along the lines of gender (e.g. the models by Kachru, McArthur and Schneider or, more recently, by Mair and by Buschfeld and Kautzsch). Therefore, building on previous studies such as Brunner (2014), we turn to a more specific contact-linguistic explanation of the gender difference in the strength of the possessor animacy constraint.
The stronger inclination of (female) SriLE speakers to use the s-genitive might be caused by transfer of structures from Sinhala, Sri Lanka’s most prevalent native language. This transfer might work in two ways: directly and/or in a more abstract way.

In Sinhala, the possessor always precedes the possessum (see Chandralal 2010: 10), which corresponds to the order of the s-genitive. This constituent ordering might carry over directly from Sinhala to English. Such direct transfer would parallel the one found by Brunner (2014), who observed that noun phrase modification patterns in Singapore English and Kenyan English correlate with preferences in the countries’ respective native languages.

There could also be a more abstract transfer of cue strength; Rosenbach (2017) showed that this is indeed the case for genitive choice in the L2 English of Afrikaans speakers. However, the relation between Sinhala and SriLE is different because Sinhala has no genitive alternation, so a transfer of the animacy constraint is plausible only on a more abstract level. We propose that it might be the high salience of the constraint in Sinhala that carries over to English. Sinhala is special in that it has different inflectional morphology for animate/inanimate and definite/indefinite nouns (see Chandralal 2010: 45), so these distinctions are ubiquitous. Because the animacy and definiteness constraints are highly salient, speakers could more easily pick up on the corresponding constraints in English and also apply them more categorically. This would result in higher usage frequencies of the s-genitive with animate/definite possessors. High salience might also cause an overcorrective use of the animacy rule (i.e. using the s-genitive with animate possessors). This tendency to overcorrect might be further enhanced by the perceived high status of English in postcolonial societies (Schneider 2007).
However, it remains unclear why these cross-varietal differences in the effects of animacy and definiteness rest mostly on preferences found among Sri Lankan females. A tentative search for explanations might include societal factors such as gender equality. According to the United Nations Development Programme (2014), the societies of Sri Lanka and Great Britain differ vastly in terms of labour market participation. In Sri Lanka, only 35% of women participate in the labour market (males: 76.4%), whereas in the United Kingdom, 55.7% of women participate (males: 68.8%) (see United Nations Development Programme 2014: 172–3). The gender gap thus amounts to 41.4 percentage points in Sri Lanka but only 13.1 in the United Kingdom. Higher participation in the workforce might require more use of English and wider social networks, which might explain the more BrE-like patterns of Sri Lankan males. Sri Lankan women’s relative absence from the labour market might further explain why we observe less convergence between the genders than we do in Britain.² Lower participation in the workforce, in turn, is likely to be associated with less use of English and narrower social circles. This might pave the way for more influence from Sinhala in the English of Sri Lankan females.
6.4.2 Methodological Implications
This study has methodological implications as well. First, it is one of the first studies to follow Gries’ (in press) recommendations regarding classification trees and random forests (see also Deshors and Gries in press). We wanted to avoid two pitfalls: summarising a random forest on the basis of a single tree, and an interpretation biased towards main effects. To that end, we used Gries’ new random-forest protocol. It involves including interaction predictors and using AUC-based variable importance scores to determine whether interactions are relevant, too, and which predictors (main effects or interactions) to discuss. We believe that the comparison between Figures 6.1 and 6.2 clearly shows that main effects underestimate, or at least fail to highlight, the complexity of a dataset, even when they are the most highly ranked predictors of the analysis. (And, again, a single tree cannot reliably reveal the interaction structure of a whole random forest.)
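Gries’ protocol is not reproduced here, but its two ingredients can be illustrated with a minimal Python sketch under explicit assumptions. We assume that an interaction predictor is simply a new categorical variable whose values concatenate the levels of its components (e.g. variety:gender:animacy). We compute an AUC-based permutation importance as the mean drop in AUC (the area under the ROC curve, via its rank-based definition) when one predictor’s values are shuffled. The function names, the toy data and the `toy_forest` stand-in for a fitted forest are hypothetical. A real forest would compute importances on each tree’s out-of-bag cases; this sketch permutes the whole toy dataset for simplicity.

```python
import random

def interaction_predictor(rows, *names):
    """Combine categorical predictors into one interaction predictor,
    e.g. variety, gender, animacy -> 'SriLE:f:animate'."""
    return [":".join(str(row[n]) for n in names) for row in rows]

def auc(scores, labels):
    """AUC: probability that a random s-genitive (label 1) scores higher than
    a random of-genitive (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_auc_importance(predict, rows, labels, name, n_perm=100, seed=1):
    """Mean drop in AUC when the values of predictor `name` are shuffled."""
    rng = random.Random(seed)
    baseline = auc([predict(r) for r in rows], labels)
    drops = []
    for _ in range(n_perm):
        values = [r[name] for r in rows]
        rng.shuffle(values)
        permuted = [dict(r, **{name: v}) for r, v in zip(rows, values)]
        drops.append(baseline - auc([predict(r) for r in permuted], labels))
    return sum(drops) / n_perm

def toy_forest(case):
    """Hypothetical stand-in for a fitted forest: only animacy matters."""
    return 0.8 if case["animacy"] == "animate" else 0.2

# Invented toy data; label 1 = s-genitive, 0 = of-genitive.
rows = [
    {"variety": v, "gender": g, "animacy": a, "noise": n}
    for v, g, a, n in [
        ("BrE", "f", "animate", "x"), ("SriLE", "f", "animate", "y"),
        ("BrE", "m", "animate", "x"), ("SriLE", "m", "animate", "y"),
        ("BrE", "f", "inanimate", "x"), ("SriLE", "f", "inanimate", "y"),
        ("BrE", "m", "inanimate", "x"), ("SriLE", "m", "inanimate", "y"),
    ]
]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

print(interaction_predictor(rows, "variety", "gender", "animacy")[:2])
print(permutation_auc_importance(toy_forest, rows, labels, "animacy"))
print(permutation_auc_importance(toy_forest, rows, labels, "noise"))
```

A predictor that the model ignores ("noise" here) gets an importance of exactly zero, whereas shuffling animacy destroys the model’s discrimination. Under the protocol described above, the importance of an interaction predictor could then be compared directly with those of the main-effect predictors.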
Second, we are the first to implement an alternative to partial dependence scores, which are not yet readily available for the experimental forests of conditional inference trees that come with the party package for R. This alternative is an exploration of predicted probabilities that mirror those of the widely used effects plots à la Fox (2003) for regression modelling. We think that this is a useful way to proceed, given how easily observed percentages can lead to anticonservative overestimates of effects. This risk is greatest for infrequent (combinations of) predictor levels and in data with moderate to high collinearity, where the effect that a plot returns for one predictor also contains the effects of other, correlated predictors.
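A minimal sketch of such effects-type predictions follows. It does not reproduce the authors’ R code. We assume, purely for illustration, the following procedure. Each combination of focal predictor levels (here variety:gender:definiteness) is paired with typical values of the non-focal predictors, taken to be their most frequent levels. The forest’s predicted probability of the s-genitive is then read off for each such case. This mirrors how effects displays hold non-focal predictors constant. The `toy_forest` function and the data are hypothetical stand-ins; with a real forest, `predict` would wrap the forest’s probability predictions.

```python
from collections import Counter
from itertools import product

def typical_levels(rows, names):
    """Most frequent level of each non-focal predictor (our notion of 'typical')."""
    return {n: Counter(r[n] for r in rows).most_common(1)[0][0] for n in names}

def effects_predictions(predict, rows, focal, others):
    """Predicted P(s-genitive) for each combination of focal levels,
    with the non-focal predictors held at their typical levels."""
    fixed = typical_levels(rows, others)
    levels = [sorted({r[f] for r in rows}) for f in focal]
    preds = {}
    for combo in product(*levels):
        case = dict(fixed, **dict(zip(focal, combo)))
        preds[combo] = predict(case)
    return preds

def toy_forest(case):
    """Hypothetical stand-in for a fitted forest (invented effect sizes)."""
    p = 0.5
    p += 0.05 if case["variety"] == "SriLE" else 0.0
    p += 0.1 if case["gender"] == "f" else 0.0
    p -= 0.3 if case["definiteness"] == "indefinite" else 0.0
    p -= 0.1 if case["sibilancy"] == "present" else 0.0
    return round(p, 2)

rows = [
    {"variety": "BrE", "gender": "f", "definiteness": "definite", "sibilancy": "absent"},
    {"variety": "BrE", "gender": "m", "definiteness": "indefinite", "sibilancy": "present"},
    {"variety": "SriLE", "gender": "f", "definiteness": "definite", "sibilancy": "absent"},
    {"variety": "SriLE", "gender": "m", "definiteness": "indefinite", "sibilancy": "absent"},
]

preds = effects_predictions(toy_forest, rows,
                            focal=("variety", "gender", "definiteness"),
                            others=("sibilancy",))
for combo, p in sorted(preds.items()):
    print(":".join(combo), p)
```

Holding non-focal predictors at their modal levels is only one option. Averaging predictions over the observed values of the non-focal predictors is another common choice, and the two can differ when predictors are correlated.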
To see this effect in the present dataset, consider Figure 6.6, a somewhat complex extension of Figure 6.3. The filled circles are the points from Figure 6.3 (all with one default size). Crucially, the light grey and dark grey triangles represent the simple observed percentages of s-genitives in BrE and SriLE, respectively. In other words, the further a triangle is from the circle of the same shade, the more the simple observed percentage differs from the effects-type prediction computed here, which controls for the other predictors.

Figure 6.6 The effect of variety:gender:lengthDiff on genitive (circles) vs. observed percentages of s-genitives for variety:gender:lengthDiff (triangles)
The interpretation is relatively straightforward. In the middle two quarters of the plot, which represent length differences that are fairly frequent in the data, the two estimation methods lead to similar results. The observed percentages are a bit off, and sometimes the relation between BrE and SriLE they show is incorrect, but the differences are not too dramatic. The outer two quarters of the plot represent far fewer cases, because large discrepancies between possessor and possessum length are rarer. There, however, the triangles/observed percentages exaggerate the effects much more.
For instance, in the rightmost quarter, the observed percentages are all 0, whereas the effects-type predictions are (sometimes considerably) higher. There are indeed no s-genitives when the possessor is much longer than the possessum, which is what the triangles represent. It is important to realise, however, that the triangles/observed percentages are still not a good guide to understanding the effect of lengthDiff. This is because, for instance, these cases where the possessor is much longer than the possessum also have
a much higher number of indefinite possessors, which also favour of-genitives;
a much higher number of possessors with final sibilants, which also favour of-genitives.
Thus, the triangle positions at y = 0 reflect the effects of multiple variables, not just the effect of (variety:gender:)lengthDiff, as the figure caption might suggest.
The same is true of the cases on the left, in which the possessor is much shorter than the possessum. These involve many more definite and non-sibilant-ending possessors than the data overall. Here, too, the triangles/observed percentages overestimate what the graph invites a reader to attribute to (variety:gender:)lengthDiff. We believe that the effects-like computation for random-forest predictions that we pioneer here is a useful way of extending a tried-and-true logic from regression modelling to random forests. It leads to a better understanding of their effects, one that is comparable to the use of global surrogate models for random forests (see Gries in press).
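The confound described for the rightmost quarter of Figure 6.6 can be illustrated with invented numbers. The sketch assumes a hypothetical model in which a longer possessor, an indefinite possessor and a final sibilant each lower the probability of the s-genitive. In the toy data, the "possessor much longer" bin (bin 3) contains only indefinite, sibilant-final possessors. Under that model, a bin’s observed percentage is expected to equal the mean predicted probability of the cases actually in it, which mixes in the effects of definiteness and sibilancy. The controlled prediction instead holds these two predictors at their overall most frequent levels. All names and numbers are hypothetical.

```python
from collections import Counter

# Hypothetical baseline P(s-genitive) per lengthDiff bin
# (0 = possessor much shorter ... 3 = possessor much longer).
BASE = {0: 0.8, 1: 0.6, 2: 0.4, 3: 0.2}

def toy_model(case):
    """Invented model: indefinite possessors and final sibilants each
    halve the probability of the s-genitive."""
    p = BASE[case["length_bin"]]
    if case["definiteness"] == "indefinite":
        p *= 0.5
    if case["sibilancy"] == "present":
        p *= 0.5
    return p

def expected_observed_share(rows, b):
    """Expected observed s-genitive share in bin b: the mean probability over
    the cases that actually fall into the bin (other predictors included)."""
    cases = [r for r in rows if r["length_bin"] == b]
    return sum(toy_model(r) for r in cases) / len(cases)

def controlled_prediction(rows, b):
    """Prediction for bin b with definiteness and sibilancy held at their
    overall most frequent levels."""
    typical = {n: Counter(r[n] for r in rows).most_common(1)[0][0]
               for n in ("definiteness", "sibilancy")}
    return toy_model(dict(typical, length_bin=b))

def make(b, d, s, n):
    """n identical toy cases (hypothetical data)."""
    return [{"length_bin": b, "definiteness": d, "sibilancy": s}] * n

rows = (make(0, "definite", "absent", 2) + make(1, "definite", "absent", 3)
        + make(2, "definite", "absent", 3) + make(2, "indefinite", "absent", 1)
        + make(3, "indefinite", "present", 2))

for b in range(4):
    print(b, round(expected_observed_share(rows, b), 3),
          round(controlled_prediction(rows, b), 3))
```

With these toy values, bin 3 yields an expected observed share of 0.05 but a controlled prediction of 0.2. By contrast, the well-populated bin 1, whose cases all have the overall typical levels, yields about 0.6 under both methods.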
6.4.3 Where to Go from Here
While our study is an exercise in World Englishes scholarship, its results do not straightforwardly inform popular models in the field. These models do not make predictions about our subject matter: an abstract syntactic alternation, the slight probabilistic differences in its conditioning factors and, in particular, the interactions of these factors with gender. In fact, before our study, there had hardly been any indication that male and female speakers make different genitive choices. In this sense, our findings are equally compatible with all World Englishes models.
That being said, this study has shown that gender is an important determinant of probabilistic grammatical choice-making. We also offer a potential explanation for our strongest finding, the gender difference in the strength of the possessor animacy constraint, by referring to possible L1 transfer. L1 transfer, again, is compatible with basically all popular models of World Englishes and thus does not distinguish between them (e.g. the models by Kachru, McArthur and Schneider or, more recently, by Mair and by Buschfeld and Kautzsch). Nevertheless, these gender differences are instructive in three ways: in and of themselves, in terms of how gender qualifies other interactions, and in terms of how they may force the analyst to face the sociocultural realities on the ground and their impact on linguistic choices and language change. Further studies of genitive choice should thus do their best to take this influence into account.
In order to understand the role of gender in grammatical alternations more fully, researchers should also look at its effects on a larger scale. This might involve 1) looking at genitive choice in more than two varieties or 2) looking at the effect of gender in other positional alternations such as the dative alternation or the particle placement alternation. To facilitate this, we recommend using the additional ICE metadata provided by Martin Schweinberger (see above) or Beke Hansen (Hansen 2017).
We also see potential for research on contact-induced probabilistic differences in grammatical alternations across varieties. Previous studies have suggested a transfer of constituent order (Brunner 2014) and of probabilistic weights (Rosenbach 2017), but findings that show opposite patterns exist as well (Heller 2018). Moreover, the present study suggests that the high salience of certain distinctions might already spark probabilistic differences in English, a hypothesis that remains to be tested. This might be achieved by inspecting genitive choice across a range of varieties whose respective L1s differ in the salience and probabilistic weights of predictors like animacy and definiteness.
Finally, we do feel that the statistical methodology we applied here merits more attention and future use. Mixed-effects modelling in particular has been taking much of linguistics by storm, but applying it to observational data is still often quite difficult. It is therefore understandable that tree-based methods and/or random forests are becoming prominent alternatives. However, as briefly alluded to above, there are scenarios in which the deceptive simplicity of these methods is counterproductive and hides some of the interesting variability in the data. Exploring interactions and visualising effects are extremely rare in random-forest studies. We hope to have shown how and why this matters and what the discipline has to gain from such steps. The main effects of many phenomena are already well understood. The ability to add interactions with, for instance, speaker-level effects or other language-external factors is therefore one of the things we need to move the field to the next level.