In a video titled “Posh Plurals” published on TikTok in March 2022, British comedian Russell Kane commented on what he describes as a salient linguistic difference between working-class and middle-class speech in the United Kingdom.Footnote 1
The video is about the class connotations of a particular grammatical pattern: the presence or absence of overt plural marking on noun phrases, which Kane associates with middle- and working-class speech, respectively. What is interesting for the purposes of the present discussion is the way that Kane laminates his comments about this pattern with distinctly stylized middle- and working-class voices, each comprising a number of linguistic features that are stereotypically associated with social class in the UK. These include extensive TH-fronting and glottaling for the working-class voice (e.g., line 6: it’s [f]ree me[ʔə]) (Williams and Kerswill 1999; Beal 2014) and higher pitch and noticeably fronted sibilants for the middle-class voice (e.g., line 14: some meat[s̟:] and chee[z̟]e[z̟:]) (Levon 2014; Holmes-Elliott and Levon 2017b).
These stylized vocal differences are accompanied by distinct physical embodiments of working- versus middle-class personae (see fig. 1). When speaking in his working-class style, Kane uses much larger articulatory gestures, including greater movement of the lips, tongue, and jaw, not to mention a generally greater range of movement of the head. When animating the middle-class persona, in contrast, Kane is much more articulatory “focused.” The primary facial movements involve pursing of the lips and upward movement of the jaw, while the tongue remains mostly invisible within the half-closed mouth and the head remains level. These different articulatory styles draw on entrenched behavioral stereotypes in the UK, stereotypes that reflexively link larger and more pronounced forms of behavioral display to working-class figures and, in contrast, more tightly controlled bodily postures to middle-class ones (see Agha 2007, 197–99).
Kane’s video is one example of the way that enactments of class-linked personae are often materialized by both acoustic alternations and distinct physical embodiments. In this article, we explore the connections that exist between these two types of socially meaningful practice. Specifically, we are interested in how acoustic variation and differences of bodily posture come together to form a multichannel semiotic array (Agha 2007, 272–73; Calder 2019a) that is linked to ideologies of class-based distinction in the British context. In other words, why does Kane’s use of smaller articulatory gestures when embodying a middle-class persona co-occur with vocal features such as fronted /s/? What is the semiotic logic that unites these two forms of performative display? By asking these questions, we seek to investigate the links among socially relevant diacritics and to understand how they come to be emblematic of a given identity. We suggest that the iconicity of the body plays an important role in this process, and so we follow Bucholtz and Hall (2016) in attempting to rethink language through the body to derive a better understanding of how the social meanings of language are generated.
The Body, Variation, and Iconic Meaning
Studies of iconicity and variation have tended to focus on sound symbolism, or on meanings that are related to iconic properties of the speech signal itself (Hinton et al. 1994; Eckert 2017). The most common sound symbolic meaning discussed in the literature is what Ohala (1994) terms “magnitude symbolism,” or the iconic link between the frequency of a given sound and perceptions of size, where, by virtue of physical correlations between frequency and the size of different resonating chambers, higher-frequency sounds are linked to “smallness” and lower-frequency sounds to “bigness.” These iconic associations between size and frequency are then conventionalized in culturally specific ways (what Silverstein [1994] calls an affective engagement with sound symbolism), such that culturally relevant interpretations of “bigness,” for example, come to be associated with lower frequencies.
Eckert (2010) draws on this semiotic understanding of sound symbolism in her analysis of the backing (and raising) of the English lot and price vowels (in words like gosh and ride) in the speech of a group of preadolescent girls in Northern California. She demonstrates that the girls systematically vary their productions of lot and price depending on the kind of social persona they are enacting in a particular interaction. When the girls present themselves as “nice,” “friendly,” and “positive,” their productions of lot and price are significantly fronter (and lower) than when they present themselves as “cynical,” “negative,” or “having an attitude.” Eckert argues that this stylistic pattern can be seen as sound symbolic in nature. According to this interpretation, the girls affectively engage with the magnitude symbolic meaning of differences in F2 (corresponding to vowel fronting versus backing) so as to interpret vowel backing (= lower F2) as being linked to negative affect and, by extension, to things like “adulthood” and “maturity.” In contrast, front vowels (= higher F2) are taken by the girls to index positive affect, which they understand as related to “childhood” and “innocence.” The “big” versus “small” contrast has thus been reoperationalized by the girls in question to refer to a locally salient difference between things that are semiotically linked to being big (i.e., negativity, hence adults) and things that are small (i.e., positivity, hence kids).
Pratt’s (2020) discussion of linguistic materializations of the working-class-affiliated tech persona at CAPA, an arts-focused high school in Northern California, offers a similar opportunity for interpreting linguistic practice in terms of sound symbolism. Within CAPA, tech students are those who study the technical aspects of theatrical production, including set building and working with lighting and sound equipment, which sets them apart from their peers who study the more middle-class “fine” arts such as dance, theater, and music (e.g., Bull 2019). Yet the distinction is much more than a disciplinary one: tech students are strongly associated with a particular set of enregistered traits, notably toughness, that are ideologically linked to the types of manual labor in which they engage. In her analysis of their speech, Pratt notes that tech students produce lot vowels that are significantly higher (= lower F1), backer (= lower F2), and more labialized (= lower F3) than those of their peers in other disciplines. We can see this linguistic variation as another instantiation of magnitude symbolism, whereby lower formant frequencies are linked to bigness and, by extension, to the kinds of tough, working-class masculinity associated with the tech persona.
While this is a plausible explanation for lot, it does not work as well for the second variable that Pratt examines, namely, the velarization of word-initial /l/ (in words like light). Pratt finds that tech speakers at CAPA also produce more velarized articulations of /l/ than their peers and that this feature is a locally meaningful sign of distinction. This pattern is not one that magnitude symbolism can easily capture. To account for the origins of the variation she observes, Pratt discusses the fact that both raised and backed lot and velarized /l/ are produced by a greater degree of constriction in the postdorsal (i.e., back) portion of the tongue. Because of this common articulatory source, Pratt suggests that the kind of lingual constriction associated with raised/backed lot and velarized /l/ may itself be iconized within CAPA as an index of the toughness that characterizes the tech persona. In other words, Pratt does not rely on a sound symbolic account to derive the social meanings of the two linguistic features she examines. Rather, she argues that there exists a certain stylistic cohesion between the indexical meanings of the linguistic variants and the embodied posture (i.e., lingual constriction) that gives rise to them.
While Pratt’s analysis does not go beyond discussing variation and embodied postures in terms of stylistic alignment, we believe that her insights open the door to seeing embodiment not only as a co-occurring phenomenon but as a new avenue for generating sociolinguistic meaning. In short, we suggest that the adoption of a locally meaningful embodied posture (i.e., lingual constriction, in Pratt’s [2020] example) has nonarbitrary consequences for linguistic production, consequences that in turn lend specific linguistic variants (e.g., lot raising/backing) their indexical potential. Put another way, our suggestion is that the body mediates the meanings that come to be ascribed to variable patterns, providing an initial template for the elaboration of indexical meaning.
In making this suggestion, we build on Bourdieu’s (1977, 195) arguments about bodily hexis as a form of cultural memory, a way in which “a whole cosmology, an ethic, a metaphysic, [and] a political philosophy … are instilled in abbreviated and practical, i.e., mnemonic, form … through injunctions as insignificant as ‘stand up straight’ or ‘don’t hold your knife in your left hand.’” For Bourdieu, forms of bodily comportment are the physical manifestations of the values and principles of a given culture, ways in which that culture is materialized in the world. In his later writing, Bourdieu (1991, 86) describes language as a crucial component of this physical manifestation of culture: “Language is a body technique, and specifically linguistic, especially phonetic, competence is a dimension of bodily hexis in which one’s whole relation to the social world … [is] expressed. There is every reason to think that through the mediation of … ‘articulatory style,’ the bodily hexis characteristic of a social class determines the system of phonological features which characterizes a class pronunciation. … This ‘articulatory style,’ a life-style ‘made flesh,’ like the whole bodily hexis, welds phonological features—which are often studied in isolation—into an indivisible totality which must be treated as such.” In his exhortation to treat class-linked phonological features as part of a broader “indivisible totality,” Bourdieu argues for locating the social meaning of variation at the level of articulatory embodiment, not at the level of the isolated linguistic feature. Bourdieu’s argument in this regard is reminiscent of an earlier claim made by Labov (1963) in his famous study of socially meaningful variation on Martha’s Vineyard.
While the majority of Labov’s analysis is focused on patterns of centralization of the price and mouth vowels on the island, he nevertheless notes that “there are no less than 14 phonological variables which follow the general rule that the higher, or more constricted variants are characteristic of the up-island, ‘native’ speakers, while the lower, more open variants are characteristic of down-island speakers under mainland influence” (307). Labov goes on to state that we can therefore “reasonably assume that this ‘closed-mouth’ articulatory style is the object of social affect,” not the individual variables themselves. This is the crux of the argument that we explore here: that social meaning (what Labov calls “the object of social affect”) attaches to embodied articulatory styles, and that what drives the association of linguistic variables with social meanings may be the adoption of a meaningful form of embodiment with which those variants are linked (see also Podesva 2021). In the remainder of the article, we present two case studies of variation in Southern British English to briefly illustrate this proposal.
The Semiotics of Sibilants
The first case study we present relates to a variable that was already mentioned in relation to Kane’s video about plurals: the acoustic realization of the English voiceless sibilant fricative /s/. This feature has been studied widely in a number of English-speaking contexts, and in all cases scholars have agreed that more fronted realizations of /s/ (where the constriction between the blade of the tongue and the top of the mouth happens closer to the teeth, resulting in higher-frequency sibilant noise) are stereotypically associated with perceptions of femininity.Footnote 2 In many contexts, the link between /s/-fronting and femininity has also been shown to extend to perceptions of sexuality, and specifically to perceptions of gayness in men.Footnote 3 The standard analysis of this recurrent form-meaning relationship between /s/-fronting and gender/sexuality is that it is more than simply conventional, resting instead on a universal sound symbolic property that links high-frequency /s/ to smallness, which in turn links to perceptions of femininity and/or gayness.
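The articulatory-acoustic link described above can be made concrete with a standard textbook idealization. It is offered only as a rough sketch. Treat the cavity in front of the /s/ constriction as a uniform tube, closed at the constriction and open at the lips. Its lowest resonance is then approximately a quarter-wavelength resonance. The front-cavity lengths below are illustrative assumptions, not measurements from any study cited here.

```latex
\begin{aligned}
F_{\text{front}} &\approx \frac{c}{4L_{\text{front}}}, \qquad c \approx 35\,000\ \text{cm/s}\\
L_{\text{front}} = 2.0\ \text{cm} &\;\Rightarrow\; F_{\text{front}} \approx \frac{35\,000}{8.0} = 4\,375\ \text{Hz}\\
L_{\text{front}} = 1.5\ \text{cm} &\;\Rightarrow\; F_{\text{front}} \approx \frac{35\,000}{6.0} \approx 5\,833\ \text{Hz}
\end{aligned}
```

Under this simplification, moving the constriction toward the teeth shortens the front cavity and raises the frequency of the sibilant noise. This is the sense in which a fronted /s/ is acoustically "higher" because it is produced with a smaller resonating space.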
But more recent studies of /s/-fronting have also identified meanings for the feature that are less amenable to sound symbolic explanations. Calder (2019a, 2019b), for instance, has argued that for drag queens in San Francisco’s SoMa neighborhood, fronted /s/ is linked to expressions of “fierceness” and the construction of a “sickening” persona. Campbell-Kibler (2011) and Levon (2014) both show that fronting is linked to stances of “education” and “authority” in addition to gender and sexuality, while Steele (2019) and Calder and King (2022) have both documented the ways in which /s/-fronting is racialized in the US context. While it may be possible to construct particular interpretive logics linking meanings of “fierceness,” “authority,” and race to sound symbolism, the connection is not immediately evident. When it comes to social class, the sound symbolic source of any class-linked associations with /s/-fronting is even less clear. Paralleling Kane’s behavior in his video, Stuart-Smith (2007) identifies a correlation between /s/-fronting and social class positioning among women in Glasgow. Specifically, Stuart-Smith finds that young working-class women avoid /s/-fronting and, unlike their middle-class peers, produce articulations of /s/ that are similar to those of working-class men. Again, while it may be possible to derive a link between a sound symbolic origin for the meaning of /s/-fronting and its status as a class marker in Glasgow, whereby working-class subject positions are associated with masculinity and middle-class ones with effeteness (e.g., Connell 1995), the question remains whether another, more comprehensive analysis is possible, one that could account not only for the links between /s/-fronting and social class but also for the variety of different meanings that have been attested.
In our own prior work (Holmes-Elliott and Levon 2017b), we set out to examine the connections between /s/-fronting and classed subjectivities. To do this, we examined patterns of /s/ variation across two British reality television shows, which we use as proxies for relevant social class differences:
• The Only Way Is Essex (TOWIE), based in Essex in the suburbs east of London. This show represents a more traditional working-class lifestyle, and cast members speak with an accent that is typical of working-class East End/Cockney; and
• Made in Chelsea (MIC), based in the hyperaffluent district of Chelsea in west London. This show represents an upper-middle-class lifestyle, and cast members speak with a Standard Southern British accent (not unlike Received Pronunciation).
Both shows are so-called engineered reality television programs that follow a group of twenty-somethings in their day-to-day lives. While the scenarios on the shows are (obviously) staged, the interactions between cast members are not scripted, and the cast engage in spontaneous, naturally occurring speech. It is important to note that we do not take the speech of the cast members to be representative of, or an “accurate” reflection of, language practices in Essex and Chelsea more generally. Instead, we see the shows as performative enactments of class-based stereotypes, venues for cast members to strategically voice (and for the audience to decode) relevant enregistered personae (including the more working-class “Essex Girl” and the more middle-class “Sloane Ranger”). In this respect, then, we treat language use in the shows as a form of performance speech (Bauman 1992, 2005; Schilling-Estes 1998) in which particular classed ways of being are consolidated and put on display.
Our analyses of /s/ variation are based on the speech of central cast members in both shows. In total, we extracted 88 usable scenes from the first two seasons of both programs (approximately 6.5 hours of recorded speech) involving 24 different speakers (9 men and 15 women). Scenes were taken from downloaded high-definition files of the programs and were selected only if they did not contain any music or other background noise. From this corpus, we extracted 1,988 tokens of /s/ and, for each token, calculated its peak frequency (the frequency at which the token’s spectral energy is greatest, an acoustic measure that correlates with the front-back location of the constriction). Tokens were then subjected to quantitative regression analyses, which examined the extent to which peak frequency is conditioned by a range of linguistic, social, and stylistic factors (see Holmes-Elliott and Levon [2017b] for full details).
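Peak frequency is simply the frequency at which a token’s spectrum has its greatest amplitude. The sketch below computes it from a single frame of audio samples using a Hann-windowed discrete Fourier transform. It is a minimal illustration under our own assumptions. The function name `peak_frequency`, the 1,000–11,000 Hz search band, and the single-frame analysis are hypothetical choices, not the settings used in the study; Holmes-Elliott and Levon (2017b) report the actual procedure.

```python
import cmath
import math


def peak_frequency(samples, sample_rate, fmin=1000.0, fmax=11000.0):
    """Return the frequency (Hz) of the strongest spectral component
    between fmin and fmax in one frame of audio samples.

    Hypothetical helper: band limits, windowing, and frame length are
    illustrative assumptions, not the study's analysis settings.
    """
    n = len(samples)
    # A Hann window tapers the frame edges to reduce spectral leakage.
    windowed = [
        s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    best_freq, best_mag = None, -1.0
    # For real input, only bins from 0 up to the Nyquist frequency are needed.
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if freq < fmin or freq > fmax:
            continue
        # Naive DFT coefficient for bin k (adequate for a short frame).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)
        )
        if abs(coeff) > best_mag:
            best_freq, best_mag = freq, abs(coeff)
    return best_freq


if __name__ == "__main__":
    # Synthetic 20 ms frame: a strong 500 Hz component (outside the band)
    # plus a weaker 5,000 Hz component standing in for sibilant noise.
    sr = 22050
    frame = [
        2.0 * math.sin(2 * math.pi * 500 * t / sr)
        + math.sin(2 * math.pi * 5000 * t / sr)
        for t in range(441)
    ]
    print(peak_frequency(frame, sr))  # 5000.0
```

The search band matters. The low-frequency component is the strongest energy in the frame overall, but it is excluded, so the reported peak reflects the sibilant range. A real analysis would typically use a library FFT and average over several frames. The naive transform here only keeps the sketch dependency-free.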
The main quantitative difference in /s/ quality that we found was based on speaker gender, though the effect of gender differed across the two shows. An overview of our principal results is presented in figure 2. There, we see that in both middle-class MIC and working-class TOWIE, women (gray bars) have higher /s/ peak frequencies than men. This is the expected pattern, given prior research and stereotypical associations between high-frequency /s/ and femininity. Yet we also see that the difference between women’s and men’s peak frequencies is greater in TOWIE (on the right) than in MIC (on the left). It could be the case that individuals in TOWIE are more gender-normative and so engage in more gender-stereotypical practice (with women showing higher peak frequencies and men lower). But when we also consider how /s/ patterns in single- versus mixed-gender contexts, the picture becomes more complex. In MIC, both women and men have higher-frequency /s/ in mixed-gender talk than in single-gender talk. In TOWIE, in contrast, the pattern is different: women have higher-frequency /s/ in single-gender talk than in mixed-gender talk, whereas the men do not vary across the two.
The complexity of the relationship between /s/-fronting, gender, and context across the two shows (which, again, we take as proxies for social class) makes it difficult to argue for a stable, group-wide meaning (such as “femininity”) for the variable. Even if we could somehow account for why both women and men in MIC, for instance, produce higher-frequency (and so stereotypically more “feminine”) /s/ in mixed-gender contexts than in single-gender ones, we would need a different explanation for TOWIE, where the patterns across contexts are distinct. For this reason, we argue in Holmes-Elliott and Levon (2017b) that it is better to model /s/-fronting in terms of how it participates in the specific activities that take place in these different contexts. In particular, we argue for an examination of /s/-fronting as it relates to stance-taking in interaction (Kiesling 2009, 2022), which we suggest provides us with a more fine-grained tool for tracking the social meaning of variation.
For reasons of space, we do not reproduce our entire stance analysis here. For present purposes, we instead provide a single example of how stance and /s/ variation relate to one another and consider the role that embodiment could play. The example is drawn from an argument between Amy and Kirk, two cast members from TOWIE, during the first season of the show. Prior to this argument, Amy and Kirk were dating, and the scene reproduced in transcript 2, from “Totally Out of Order” (TOWIE, season 1, 2011), culminates in Amy ending their relationship.Footnote 4
Amy’s /s/ tokens during her argument with Kirk are plotted in figure 3 (with numbers on the x-axis corresponding to subscripted numbers in the transcript). In transcript 2, we see that Amy adopts four distinct conversational positions. She begins in turns 7 and 9 (tokens 1–3) by initiating a questioning sequence, asking Kirk why he had not told her he would be taking Sam, their mutual acquaintance, out for a drink. Kirk replies that his meeting with Sam was for business purposes and did not imply any infidelity on his part. Amy responds with a sort of resigned disbelief, noting that while she does not believe his explanation, she is willing to accept it. During this reply (turns 11–13), Amy adopts the “cool-headed” voice of someone refusing to get drawn into an argument and instead simply declares the consequences of Kirk’s actions. The /s/ tokens that accompany this “cool-headed” voice (tokens 4–8) are all fairly low in frequency, sitting well below Amy’s average peak frequency during the exchange (represented by the solid horizontal line in fig. 3). In turn 14, Kirk refuses to accept the closure that Amy is attempting to enact, responding (in turn 16) that she is now becoming irate. Amy ratifies Kirk’s continuation of the argument, moving into a new voice (turns 17–25), which we term the “reasonable negotiator.” During this portion of the interaction, Amy works to justify her reaction to Kirk, explaining why it has upset her. The tension between Amy and Kirk rises noticeably across these turns, with neither Amy nor Kirk acceding to the other’s point of view. In terms of /s/, we note that Amy’s tokens in her “reasonable negotiator” voice are generally higher than they were previously but still sit within the middle 50 percent of the overall frequency distribution (represented by the dashed horizontal lines in fig. 3).
Finally, in turn 29, Amy indicates that she has had enough and moves out of her “reasonable negotiator” voice into a final “exasperated” one, in which she firmly rebuts Kirk’s arguments. Her /s/ peak frequencies in this final portion of the interaction rise quickly and dramatically, resulting in a number of very high tokens that correlate with the expression of high levels of (negative) affect.
We suggest that the broad correlations between Amy’s /s/ peak frequency values and the different voices she adopts in her argument with Kirk can help us to account for the complexity of the patterns observed at the group level (i.e., between MIC and TOWIE, between women and men, between mixed- and single-gender contexts). It is clear from the frequency values depicted in figure 3 that variation in /s/ quality is linked to the different stances Amy adopts, with /s/ frequency rising as she becomes more animated and exasperated. But why might /s/ frequency be positively correlated with increasing exasperation? We do not think that sound symbolism can easily account for this. Instead, we suggest that changes in Amy’s /s/ quality are linked to emotion, and specifically to her move from a low-arousal emotional state (“cool-headed”) to a high-arousal one (“exasperated”). We know from research in phonetics that across a wide range of languages and cultures, high emotional arousal is often articulatorily materialized via specific forms of muscular tension, and in particular by more peripheral, frontward movement of the tongue during articulation (Erickson et al. 1998; Lee et al. 2005; Kim et al. 2020). We suggest that as Amy becomes progressively more exasperated in her interaction with Kirk (i.e., as her emotional arousal increases), this stance of exasperation is signaled by, among other things, fronter articulations of /s/. We contend, moreover, that fronted /s/ functions as a diacritic of exasperation for Amy in this context because of long-standing ideologies linking particular embodiments of emotion and social class in the UK.
As already noted in the prior discussion of Kane’s video, social class in Britain is linked to a particular ethnokinesics, a set of ideologies that stereotypically associates certain bodily postures and movements with different social class positions (see, e.g., Agha 2007, 272–77). According to this system, working-class personae are characterized by large and dynamic behavioral displays, practices that are interpreted as a form of bodily excess. Skeggs (1997) has described how this ethnokinesics is grounded within a cultural understanding of moral respectability and propriety, in which enacting a respectable persona involves refraining from visible manifestations of embodied affect (see also, e.g., Lawler 2005; Nicholls 2019). The linking of working-class identities to large and dynamic forms of bodily expression thus functions to exclude working-class behavioral styles from within the bounds of “respectability,” while simultaneously rendering the expression of embodied affect a naturalized emblem of the “authenticity” and “lack of artifice” to which working-class subjectivities are stereotypically linked (Skeggs 2005).
In her interaction with Kirk, Amy expresses her exasperation through an overt display of embodied muscular tension, which includes /s/-fronting. In doing so, we argue, Amy draws on the particular ethnokinesics of class in Britain, a system that renders her behavior legible as a classed (and gendered) way of displaying emotion. In the heat of the moment, Amy’s embodied behavior (tension) functions as a text-level emblem of exasperation, a demeanor indexical (Agha 2007, 240–41) of her emotional state (and one for which fronted /s/ serves as a salient diacritic). We believe that the legibility of Amy’s /s/-fronting as a text-level emblem of exasperation relies on class-based stereotypes of embodied affect. In other words, fronted /s/ successfully signals Amy’s exasperation with Kirk because of the iconic association between /s/-fronting and bodily tension, a form of embodiment that is itself imbued with class-based meaning. Speech and the body are thus linked via the creation of qualic congruence (Calder 2019a) between fronted /s/, exasperation, and working-class femininity.
We believe that reading variation through the body in this way provides us with a better account of the group-wide patterns of /s/-fronting noted above, that is, in MIC versus TOWIE, by women versus men, and in mixed- versus single-gender talk. Essentially, we argue that the differences we find are related to differences in how affective stance is embodied, and to the classed and gendered connotations that these embodiments carry. According to this account, fronted /s/ functions as a semiotic hitchhiker (Mendoza-Denton 2011), a feature whose indexical potential derives from the qualic links that are established between speech and the other semiotic cues with which speech co-occurs. This is not to say that /s/ itself cannot become rhematized (Gal and Irvine 2019) to function as a salient emblem of a given stance or identity (as indeed much research on the variable has shown). But we maintain that seeing the indexical meaning of /s/-fronting as emerging from specific forms of culturally meaningful affective embodiment allows us to provide a more unified account of the different meanings of fronted /s/ (e.g., “feminine,” “gay,” “fierce,” “authoritative”) that have been proposed to date.
Articulating Ease
While our first example of how the body could act as an indexical source for social meaning generation examines a text-level emblem that draws on broader stereotypes, our second example focuses directly on the level of broad stereotypes (what Agha [2007, 256] labels an enregistered emblem). Specifically, we examine the short front drag chain in Southern British English, a coordinated lowering and backing of three front vowels (kit, as in sit; dress, as in best; and trap, as in cap), presumably begun by the movement of trap, which then “dragged” the other vowels after it (e.g., Trudgill 1986; Torgersen and Kerswill 2004; Fabricius 2019). This pattern is also present in our MIC/TOWIE dataset, particularly among MIC speakers. This is not surprising: the short front drag chain has been attested in a number of English varieties around the world, including Canadian English (Boberg 2008), Californian English (D’Onofrio et al. 2019), Irish English (Hickey 2018), and South African English (Chevalier 2019), in addition to (Southern) British English. In all of these contexts, the shift is associated with young middle- and upper-class speakers, predominantly women.
The presence of this shift in such geographically and culturally disparate contexts has led scholars to argue that it is driven primarily by system-internal factors, that is, that it is an autonomous reorganization of the English vowel system governed by language-internal constraints (e.g., Becker 2019). But this account is unable to capture the fact that we see not only similar linguistic patterns across contexts but also similar social ones. A system-internal explanation cannot easily account for why the shift is consistently associated with specific social groups (notably, young upper-middle-class women) wherever it appears. This similarity in indexical meaning across contexts is what initially drew our attention to the shift and to the possibility of a system-external explanation for the patterns observed.
When we examine the behavior of kit, dress, and trap in our MIC/TOWIE dataset, we see evidence that the vowel changes in question may not be a “chain shift” after all. Figure 4 presents the relative height of the vowels kit, foot, dress, and trap for speakers in MIC and TOWIE in relation to the vowel fleece, which sits at the top of the vowel space. For speakers in TOWIE (gray bars), we see the traditional Southern British English pattern: kit (and foot) sit high in the vowel space, not far from fleece; trap sits at the bottom; and dress sits between the two, clearly distinguished from both kit and trap. In MIC (black bars), in contrast, we see that kit and dress are both lower than they are in TOWIE, with dress crowding into the space of trap. This is the hallmark pattern of the shift. Yet we also note that trap itself has not moved; it is in the same position in MIC as it is in TOWIE. If this were a drag chain initiated by the movement of trap, we would expect trap to sit lower (and/or further back) in MIC than in TOWIE, with dress moving down to fill the space that trap vacated. But this is not the case (and although fig. 4 plots only vowel height, our analyses show that trap is not backer in MIC than in TOWIE either). Instead, we seem to be seeing a convergence of dress and trap at the bottom of the vowel space among MIC speakers.
In fact, when we look at the entire vowel space (see fig. 5), we see that for MIC speakers (solid ovals) there is a general crowding in the bottom half of the vowel space, with dress, trap, palm, and strut all encroaching on one another. This is sharply different from what we find in TOWIE (dashed ovals), where these vowels are all kept separate. Measurements of the overall vowel space areas in MIC and TOWIE confirm that this compression of the lower vowels is linked to an overall reduction in the size of the vowel space in MIC, which shows more centralization and a generally smaller vowel space area than TOWIE.
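The article does not state how vowel space area was measured. One common approach is to treat a speaker's mean (F2, F1) values for a set of peripheral vowels as the corners of a polygon and compute its area with the shoelace formula. The sketch below assumes that approach; the choice of corner vowels in `CORNER_ORDER` and the input format are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (mean F2, mean F1) for one vowel


def polygon_area(points: Sequence[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula.
    Points must be listed in order around the perimeter, in either direction."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical choice of peripheral vowels, listed in perimeter order:
# high front -> low front -> low back -> high back.
CORNER_ORDER = ("FLEECE", "TRAP", "PALM", "THOUGHT")


def vowel_space_area(means: Dict[str, Point],
                     order: Sequence[str] = CORNER_ORDER) -> float:
    """means maps vowel label -> (mean F2, mean F1) for one speaker.
    Returns the polygon area in the units of the input (e.g., Hz squared)."""
    return polygon_area([means[v] for v in order])
```

Under this measure, a more centralized, compressed vowel space of the kind described for MIC would yield a smaller area than a dispersed one; in practice the formant values would usually be normalized first so that speakers can be compared.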
Interestingly, this kind of compression and reduction in the overall size of the vowel space has been described before, in connection with what Laver (1980, 155) termed a lax voice articulatory style, a way of speaking associated with “minimized radial movements of the tongue, a lowered larynx, … and a relatively immobile jaw.” Laver goes on to state, following Honikman (1964), that lax voice is typical of Received Pronunciation of British English, the variety associated with upper-middle-class speakers like those in MIC. A number of acoustic diagnostics can be used to identify the presence of lax voice; they involve examining how the high front, high back, and low vowels pattern with respect to the first, second, and third formants. Without going into the details here (see Holmes-Elliott and Levon 2017a), we report that we conducted three such diagnostic tests, all of which point to the presence of lax voice articulation in MIC and its absence in TOWIE. We therefore suggest that vowels in MIC are not moving autonomously for system-internal reasons. Instead, we argue that the in-crowding of the vowels we see in MIC is an acoustic fallout of MIC speakers’ adoption of a particular articulatory setting (i.e., lax voice).
Why might MIC speakers be adopting a lax articulatory setting? To answer this question, it is useful to consider how lax voice (and its characteristic open, immobile jaw) has been mediatized in the British context. Beginning in 2010, for example, British comedian Matt Lacey posted a series of parodic videos to YouTube titled “Gap Yah.” The videos depict the adventures of a young upper-class man (“Orlando”) who is traveling around the world during his “gap year” between university and the start of his career. In these videos, “Orlando” uses a very distinctive speech style featuring heavily backed and lowered short front vowels (including his pronunciation of the word “year” as [jɑ:]) accompanied by a lowered and immobile jaw. While clearly an exaggeration, Lacey’s parody (and the huge success that it enjoyed) demonstrates the relevance of systemic convergence in the lower half of the vowel space as a salient signal of young upper-class identity in the UK today. Similarly, a review of American actor Kristen Stewart’s performance as the title character in the recent Princess Diana biopic Spencer described her British accent as “entirely convincing, hitting the exact self-conscious, detached-jaw, pseudo-estuary drawl that posh people have adopted now that they realise how silly Received Pronunciation sounds” (Heritage 2021, emphasis added), once again pointing to an enregistered link between class positioning and jaw placement. And in August 2022, comedian Russell Kane posted another video on TikTok commenting on “posh people’s squashed vowels.”Footnote 5
As before, Kane’s comments indicate a keen awareness of an ideological association between articulation (my mouth is tight) and class formation (I’m tight, my emotions are tight), such that lax voice and its associated jaw setting function as salient cues of “poshness” in the UK. Why would that be the case? What is it about lax voice that gives it this class-linked meaning? We propose that the answer lies in the same British cultural understanding of class—the same ethnokinesics—that we described above for Amy and /s/-fronting, though in this case it concerns how the ethnokinesic system imagines “poshness.” As noted above, sociologists have argued that social class in the UK has historically been organized in terms of an ethics of restraint, such that dominant conceptualizations of “prestige” and “respectability” treat these constructs as negatively correlated with the expression of emotion (e.g., Skeggs 1997; Cannadine 1999; Lawler 2005). Recent research confirms the ongoing relevance of this ideal. In a study of elite government workers in Britain, for example, Friedman (2021) describes how senior civil servants are taught to enact a form of “studied neutrality,” a stance that they see as intimately tied to competence and authority (see also Ashley 2021). Friedman describes this ideal of neutrality as a form of embodied cultural capital, a somatic disposition that serves to legitimate civil servants’ claims to authority and prestige. In other words, Friedman argues that neutrality is a form of bodily hexis (Bourdieu 1977), a conventionalized mode of comporting oneself that is emblematic of elite status in British society.
This norm includes adopting a posture of stoicism, as evidenced by the high cultural value placed on tropes like the “stiff upper lip” and “‘controlled excitement,’ [where] strong emotions are cultivated but always kept under control” (Bull 2019). It also includes the corporeal enactment of disinterest. This was on display, for instance, in the case of Jacob Rees-Mogg, a Conservative Member of Parliament, who was lambasted in the press and on social media for reclining on the benches in the House of Commons during a debate about Brexit in September 2019. Anna Turley, a Member of Parliament from the opposition Labour Party, was quoted at the time as describing Rees-Mogg’s behavior as “the physical embodiment of arrogance and entitlement” (Rawlinson 2019). While Turley’s comments and those of others like her were clearly critical of Rees-Mogg’s behavior, they nevertheless demonstrate the strong cultural association that exists in the UK between embodied indifference and elite social status.
In the context of the present analysis, we argue that the adoption of lax voice by speakers in MIC—with its characteristic minimal movement of the tongue and immobile jaw—functions quite literally as a physical embodiment of British ideals of eliteness. Put simply, we propose that the vowel changes visible in MIC are driven by a common orientation among MIC speakers to a set of qualities associated with elite status: restraint, detachment, indifference. This orientation pushes MIC speakers to adopt a particular bodily posture (lax voice) that is iconically linked to these qualities, which, in turn, produces specific linguistic outcomes (compression in the lower half of the vowel space). According to this account, the positioning of the jaw and tongue among MIC speakers functions like any other form of bodily comportment and serves as a symbolic strategy for aligning bodily carriage with a culturally elite persona (Agha 2007). If this account is correct, we can view the changes in the MIC vowel system as an acoustic hitchhiker (Mendoza-Denton 2011) on this strategic embodiment, that is, a variable pattern that arises from, and gains its meaning through, the embodied posture with which it is associated. Clearly, in the case of the short front vowels, the variants themselves have been enregistered, leading to their use in parodic representations like those described here. But we contend that the variants’ link to embodiment is what provides the initial impetus for the development of indexical meaning (i.e., their baptismal essentialization; Silverstein 2003).
Treating the vowel changes we observe as iconically linked to embodiment would not only provide us with an account of what is happening in MIC. It would also provide us with a new way of understanding why the short front vowel shift arises in parallel across geographically distant varieties. The association of qualities such as restraint and indifference with elite status is by no means unique to the UK. In his study of upper-class Philadelphians, for example, Kroch (1996) describes the speech style of members of Philadelphia’s historically most elite families as characterized by a “relaxed articulation” that “conveys a strong sense of entitlement.” Linguistically, Kroch describes how this “relaxed” speech style correlates with a slower speaking rate (a so-called “drawling quality”) as well as a laryngealized voice quality, known locally as “Main Line Lockjaw.” Similarly, Pratt and D’Onofrio (2017) describe how parodic performances of elite Californians in the sketch television show Saturday Night Live rely on an open jaw articulatory setting as a way of signaling being “relaxed” and “laid-back,” qualities stereotypically associated with eliteness in California. Pratt and D’Onofrio also show that the linguistic consequence of this open jaw setting is increased backing and lowering of the performers’ vowels. Once again, then, we have an example of a specific bodily posture (open jaw) used to embody eliteness, resulting in a strikingly similar set of linguistic outcomes as in MIC.
In his discussion of contemporary articulations of privilege, Khan (2011) argues that elite status today is experienced as a sense of ease, an ability to maintain one’s composure and emotional detachment no matter the circumstances. We suggest that a lax voice articulatory setting (e.g., “open jaw,” “lockjaw”) is a quale (Harkness 2015) of ease, a practical materialization of this abstract sensuous quality. As a quale, lax voice articulatory setting circulates globally (in the US and the UK, and potentially also farther afield) as a commodified way of “doing” eliteness. In this way, the parallel shifts in vowel systems that we find in disparate English-speaking contexts can be said to result from a common orientation to an elite persona and the iconic way through which that persona is embodied in interaction.
Conclusion
Our principal argument in this article is that the social meanings of linguistic variation may not always be epiphenomenal. Instead, we suggest that culturally meaningful forms of embodiment can serve as a source for the indexical elaboration of variable meaning. For both of the variables we discuss, acoustic patterns are associated with specific articulatory postures: increased alveolar constriction and fronted tongue placement for higher-frequency /s/ and minimal movement of the tongue and an open, immobile jaw for vowel backing and lowering. These articulatory postures are themselves linked to a widely recognized ethnokinesics of social class in Britain (contrasting an animated working-class persona and a more indifferent middle-class one, respectively). We argue that it is possible to see the indexical meanings of these linguistic patterns as derived from the bodily techniques that give rise to them. In other words, we suggest that the body can act as a vehicle for generating social meaning.
In his discussion of the overarching goals of studies of semiotic meaning, Agha (2007, 259) argues that a key open question is tracing the process through which “a diverse range of indices performable in behaviour … [come to be] linked to readings of personhood in ways that are intelligible to social actors” (see also Eckert 2019). We argue that, at least in certain cases, the nonreferential linguistic sign can become such an index by virtue of being tied to particular forms of bodily enactment, a product of speakers adopting embodied interactional styles in order to differentiate themselves from others in the social landscape (Esposito and Gratton 2022). We acknowledge that our proposal in this regard is preliminary and that further empirical research is needed to support our claims. Nevertheless, we argue that studying embodiment and its linguistic correlates is a potentially very fruitful avenue for developing a more comprehensive understanding of the different ways in which variable meanings emerge.