In 2010, I was researching the history of ethnic relations in Laos before the 1975 socialist revolution. During an interview, I asked Noy,¹ a Lao-American woman in her mid-60s, about inter-ethnic tensions in Laos. She began to tell me a story, popular when she was a child in the capital city of Vientiane. Its central conceit was that ethnic Hmong people don’t bargain like Lao people. Noy told the story through represented discourse and, in the process, gave voice to an imagined Hmong man. As the story unfolded, Noy’s figure of the Hmong man gradually emerged in higher resolution and became more vivid. When she first quoted him, the voice was barely differentiated from its narrative surround. Later, Noy used altered speech, her arms, and her head to enact a caricatured figure that stood out starkly against her own voice.
Over the past several decades, represented discourse like this has been a major topic in linguistic anthropology (see Lempert 2014). The superficially niche topic has borne large fruit, informing how we understand the relation between speakers and those whose words they ‘take on’ (Tannen 2009), and allowing us to see how and why people perform figures of identity and alterity when they do (Hastings and Manning 2004).² This work has shown that the way in which a person represents discourse—for example, whether they ‘put on an accent’ or merely repeat attributed words—is crucial for understanding what sort of social action that person is undertaking (Bakhtin 1981, 259; Hill 1995; and many others). We know, for instance, to look toward Donald Trump’s shaking hands and gape-mouthed vowels to see that his enactment of the reporter Serge F. Kovaleski was not informative but mocking (see Hall, Goldstein, and Ingram 2016, 86–88; Figure 1).³ This work has also shown that represented discourse does not merely make use of ideologies of linguistic form, but circulates and (re)produces them, and thereby helps establish ties between kinds of language and kinds of persons, events, and interactional effects (Agha 2005, 48).
And yet, while much work has documented that the form of represented discourse is significant, our tools for talking about that form are still crude. What Goffman (1974, 530) wrote in Frame Analysis remains true today: our competence in recognizing the importance of form in represented discourse “is far ahead of our capacity to explicate the practices involved.” Analytically, we are mostly left to sieve such form through the binary of ‘direct’ and ‘indirect speech,’ a typology that is so basic and so susceptible to complication that, in the end, it leaves the majority of what is interesting in represented discourse out of its scope. The result is a mismatch between our sense of the social utility of discursive form and our lack of analytic resources for exploring how people put various formal elements to work.
In this paper, I offer a tool for talking about the form of represented discourse in the notion of figure composition. Simply put, a figure’s composition is defined as the formal semiotic elements that comprise that figure. These elements are united by a shared, transposed origo.⁴ When one person quotes another, she employs these formal elements to compose a figure.⁵
The notion of figure composition brings diverse formal elements of represented discourse, many of which have been analyzed separately in the literature, under one conceptual umbrella. As such, it offers a clearer language for describing how speakers compose figures and clarifies the relationship between this question and the problem of what makes represented discourse recognizable as represented discourse—what Agha (2005, 43) has called the ‘transparency’ of figures. Together, figure composition and figure transparency provide a comparative framework for describing how speakers use represented discourse to social effect, one which clarifies many of the core issues which—since the work of Bakhtin, Voloshinov, and Goffman, among others—have made represented discourse such an alluring subject for linguistic anthropologists. Broadly put, the notion of figure composition redirects discussion about the form of represented discourse. It invites one to shift from asking what kind of represented discourse any given stretch of represented discourse is to asking (1) What elements of that represented discourse appear to be coming from the quoted figure(s)? and (2) How are these elements used to produce interactional effects?
In what follows, I begin by introducing the notions of figure composition and figure transparency, sketching their advantages over and against other accounts of represented discourse. I next offer a range of formal elements that frequently occur in figure compositions, a heuristic catalogue of what previous research has found. I then return to Noy’s story to demonstrate how the notion of figure composition can enrich analysis, as I also shift from discussing what compositions are comprised of to what compositions can achieve by way of social action. Captioning these achievements ‘composition effects,’ I show that the gradient resolution of Noy’s Hmong figure diagrammed a supposed Hmong irrationality, as it also further tied certain phonetic forms to Hmong people generally. Finally, I conclude that tracing similar ‘composition effects’—and related ‘transparency effects’—is the key task for those trying to understand the roles of represented discourse in social action.
Figure composition and figure transparency
For represented discourse to be construed as represented discourse, it definitionally must present some element of itself as coming from a distinct time, place, person, “footing” (Goffman 1979; Holt 2007, 57), “vantage point” (Clark and Gerrig 1990, 786), et cetera. Deictics—which appear to mutate in the move from ‘indirect’ to ‘direct’ reported speech—make this transposition especially obvious,⁶ but such transposition characterizes represented discourse more generally (Hanks 1990, 205; 212; 215; 222). This suggests a simple definition for figure composition: a figure’s composition is defined as those elements of a voicing event that share a transposed origo or indexical ground (Hanks 1990, 222; Haviland 1993, 40; Agha 2005, 42; Agha 2007, 52; see also Nakassis 2020).
The notion of figure composition complements what Agha (2005, 43) has called the transparency of voicing contrasts. A figure’s transparency and its composition are related but distinct ideas (compare Hanks 1990, 222). Figure transparency characterizes the extent to which a represented figure is recognizable as such, and thus construable as distinct from the figure representing it. It is the obviousness or clarity of a figure’s transposed indexical ground, construed with reference to inference, metapragmatic signs, and the formal properties of the figure’s composition itself. The most recognizable example of a sign that makes a figure more transparent is the quotation mark (Klewitz and Couper-Kuhlen 1999). A figure’s composition, in contrast, consists in the transposed elements that comprise it—the words written within such quotation marks, for example. A figure’s transparency is a function of how thickly drawn the boundaries showing where represented discourse begins and ends are; its composition is what lies ‘within’ those boundaries.
As many have noted in a different technical language, a figure’s transparency and a figure’s composition are often formally entangled with one another and functionally related. On the one hand, for a given figure composition to be recognized as represented discourse, it must be transparently transposed.⁷ Sometimes this transposition is a function of signs distinct from the represented discourse. Verba dicendi are prototypical examples. In prefacing a quotation with a matrix clause that includes such a verbum dicendi, a speaker marks off what is to follow (see i in Table 1). Sometimes speakers achieve a similar transparency with prosodic shifts, such as intonation contours or pauses, that signal or ‘flag’ that reported speech may be coming or ending (Kvavik 1986, 356; Klewitz and Couper-Kuhlen 1999, 476; 479). Some of these transparency-increasing signs even co-occur with the elements of the figure but in a different modality (see ii). Sidnell (2006, 390), for example, found that an interactant might signal a reenactment by simultaneously gazing away from coparticipants. Here, gaze works alongside the bodily machinery of reenacting to draw the boundaries between what is representing and what is represented.
On the other hand, a figure’s composition itself is often construable as a reflexive sign of that figure’s transparency (iii).⁸ From noticeable shifts in prosody (Klewitz and Couper-Kuhlen 1999, 482; Couper-Kuhlen 1999; Bolden 2004), to changes in facial positioning, posture, or gaze, to choices of vocabulary, utterance-initial response cries, or linguistic code (Clark and Gerrig 1990, 774), actors and analysts can look toward “the likeness or unlikeness of co-occurring chunks of text” as a sign of the “sameness or difference of speaker” (Agha 2005, 40) and thus reconstruct transposed origos left unstated (Hanks 1990, 216). This capacity in part accounts for the possibility of “unintroduced dialog” (Tannen 1986, 318–319; cf. Mathis and Yule 1994), that is, quotation without anything of type (i). Think of the puppeteer who moves his wooden puppet’s mouth as his own lips remain shut—those same moving wooden lips are both a part of the represented discourse’s composition and a clue that makes it more transparent. Alongside these other mechanisms, construals of transparency can also depend upon inferences regarding patterns of turn taking, adjacency pairs, et cetera (iv; Mathis and Yule 1994, 66; Klewitz and Couper-Kuhlen 1999, 483) and inferences regarding the nature of the narrated event, the narrating event, and the figures within them, i.e., type (v).
Beyond Typologies that Mix Figure Composition and Figure Transparency
In linguistic anthropology, several scholars studying represented discourse have created typologies of it that combine dimensions of transparency and composition (see, for example, Urban 1989, 43; Hickmann 1993; and Agha 2005, 43). These typologies tend to be somewhat ambivalently overlaid on the distinction between direct and indirect discourse.⁹ Scholars are ambivalent about this overlaying because most regard the direct and indirect discourse distinction as, in and of itself, essentially inadequate (Günthner 1997; Holt and Clift 2006, 11; Good 2015, 573). Even classic statements in the study of reported speech—for example, Jespersen’s (2007, 290) discussion of two kinds of ‘indirect speech,’ or Voloshinov’s (1973, 125–140) comments on ‘modifications’—have pointed out the distinction’s limits, suggesting that the “boundary between direct and indirect discourse is fuzzy” (Tannen 2007, 103). Most contemporary linguistic anthropologists would probably agree that between the two poles of these two kinds of report lies “a range of blended alternatives” (Lucy 1993, 95). Many have aimed to illuminate this range with finer-grained distinctions in types, such as ‘free indirect discourse,’ and lists of the features that distinguish one type from another.
Notions of figure composition and transparency allow us to set aside the categories of direct and indirect speech entirely. This is worth doing for the obvious reason that has been noted repeatedly: the distinction does not capture how represented discourse works in the world. Instead, it forces us to reckon with evidence that betrays its inadequacy: innumerable hybrids and intermediaries (pace Partee 1973, 411). When one looks closely, it becomes clear that all utterances that include ‘represented discourse’ can appear ‘direct’ in one respect, but ‘indirect’ in other respects. The notion of ‘direct’ with which we are then left becomes merely a shorthand for referring to transposition of an origo regarding one element or another.¹⁰
Unlike the notion of composition, the notions of direct and indirect discourse have been used to describe utterances, not elements. Taking the utterance as a unit is fundamentally misleading, and a growing body of work makes the heterogeneity of represented discourse plain. A given stretch of represented discourse can have multiple elements that shift somewhat independently (Evans 2012). For example, a speaker might be ‘directly reporting’ some dimension of another’s speech, while laughing through the quotation and thus inserting a commentary ‘in her own voice’ (e.g., Goodwin 2007, 20). An element’s role in a composition can furthermore shift across and within events of represented discourse. For example, Klewitz and Couper-Kuhlen (1999, 473–474) found that sometimes the “prosodic formatting of a voice may ‘evolve’ during the stretch of speech being reported,” say, beginning with a sudden shift to a high prosodic register, but dropping as it continues.
The notions of figure composition and transparency invite us to explore this heterogeneity, to trace how the elements of represented discourse combine and transform within and across interactions. From the perspective of these notions, resolving, categorizing, or purifying the innumerable hybrids of indirect and direct speech that linguistic anthropologists have uncovered appears pointless, only a consequence of having the terms in the first place (cf. Latour 1993, 10–11). From their vantage, direct and indirect report reveal themselves as not categorical devices for represented discourse in isolation, but as fundamentally relational ideas, compelling because they capture the sense that some figure compositions involve not just transposition, but more transposition.
Common Elements of Compositions
Speakers can divide a quotation any way they are able to so long as they can get their addressees to recognize what they are doing.
—Herbert Clark and Richard Gerrig (1990, 779)
With this in mind, in this section I trace some of the multi-modal elements that have been shown to play roles in figure compositions cross-linguistically (Clark and Gerrig 1990, 775; Stec, Huiskes, and Redeker 2016, 5).¹¹ As I do so, I emphasize these elements’ relative independence from one another, because it is precisely that independence that has been obscured in descriptions of ‘direct’ versus ‘indirect’ discourse.
I offer these elements as heuristics meant to highlight a range of possibilities. That is, this list is not a limit on what may play a role in figure compositions but a suggestion as to what is likely to play such a role. When a researcher encounters a stretch of represented discourse, she might thus use this list as a guide for asking herself: how is the represented discourse in front of me composed? What are its significant elements?
A few further qualifications are in order. First, the elements that follow are not always capable of being cleanly distinguished one from another. Second, the medium of communication matters. People speaking in the Lao language while they stand face-to-face can represent discourse differently than people texting on phones or signing ASL through a glass window (see Jones and Schieffelin 2009; Klewitz and Couper-Kuhlen 1999, 471). Different media afford different figure compositions. Third, and relatedly, distinct languages have distinct grammatical forms, lexical items, and conventions of transposition that affect what figure compositions are possible (Coulmas 1986, 14; Rumsey 1990, 347; Evans 2012).¹² For example, some verbs of speaking or acting, which are primarily resources for increasing transparency, nevertheless signal or constrain the compositions they control: some require deictic transpositions (Coulmas 1986, 19); some ensure de dicto readings; and some prefigure certain kinds of spoken, gestural, or corporeal performances (e.g., Streeck 2002, 594–595). Fourth and finally, recognizing a certain element as part of a figure’s composition requires that element to meet a threshold of transparency, and this threshold is not always met. Researchers will thus inevitably find cases in which it is unclear to which figure a formal element belongs, or cases in which an element plausibly belongs to more than one figure (see Woolard 1998). These cases are not a problem for the notions of composition or figure transparency; rather, they are a particularly interesting part of social life that these notions help us identify and better characterize.
Deictics
Unlike the transposition of many elements of represented discourse,¹³ the transposition of deictic lexemes tends to be obvious (I refer to deictic lexemes here, but some deictics, of course, are affixal, gestural, ocular, et cetera). This obviousness clearly relates to their role in reference (see Silverstein 1981). That is, since deictic expressions such as this or I are used to “identify referential objects relative to indexical grounds” (Hanks 1990, 197), represented utterances that use them refer correctly only if those grounds are understood as transposed.¹⁴ If people misconstrue these transpositions, the utterances can become unintelligible; such errors in transposition thus tend to make themselves known in failures to refer.
Perhaps it is because of the salience of deictic transpositions that scholars have treated deictics as something of a litmus test for whether a stretch of speech is ‘direct’ or ‘indirect.’ And yet, while many (e.g., Silverstein 2014, 141) describe deictics as indexical anchors of the whole utterances in which they occur, utterances are not always cohesive ships to be anchored, and deictic transpositions can happen (or not happen) independently of other transpositions. Put simply, the deictics in a stretch of represented discourse (i) need not always be transposed at the same time (or in the same way), nor (ii) need they necessarily be transposed in order for other elements in the same stretch of represented discourse to be transposed. On the first point, sometimes “shifts in deictic origo do not map neatly onto clause boundaries” (Agha 2005, 43 on ‘free indirect speech’) and some languages have conventions of split deixis—take Russian, for example, where “in indirect speech, the pronoun deixis [is] adjusted to the report situation while temporal deixis by means of tense forms keeps its pivot in the reported situation” (Coulmas 1986, 19).¹⁵ On the second point, it is easy to find examples where speakers, for instance, take on or simulate the prosody of another without transposed deictics. As Klewitz and Couper-Kuhlen (1999, 470) put it (citing Günthner 1997, among other works), “expressive prosodic marking cuts across the canonical, grammatically based distinction between ‘direct’ and ‘indirect’ speech….” Contrary to how many scholars discuss deictics in relation to quotation, there does not seem to be an easy way to articulate a universal, implicational hierarchy in which deictics hold a privileged place in the transposition of represented discourse.
Suprasegmentals
As the last example from Klewitz and Couper-Kuhlen indicates, people can also transpose the suprasegmentals of their speech, shifting the indexical ground of a whole host of different formal features including loudness, duration, pitch, speech rate, timing, pausing, voice quality, stress, and phonetic realization of lexical tone.¹⁶ That suprasegmentals can appear to be coming from the speech of another in cases of quotation has frequently been remarked upon and, more recently, studied in depth (e.g., Tannen 1986; Clark and Gerrig 1990, 776; Mitchell-Kernan and Cohen 2017, 390; Günthner 1997; Couper-Kuhlen 1999; Günthner 1999; Klewitz and Couper-Kuhlen 1999). Sapir (1949, 193), for instance, recounted that, “The Nootka Indians of one tribe frequently imitate the real or supposed speech peculiarities of those belonging to other Nootka tribes, the stress being primarily laid not so much on peculiarities of vocabulary and grammatical form as on general traits of intonation or sound articulation (cf. our New England ‘nasal twang’ and Southern ‘drawl’).” The role of such suprasegmentals is perhaps most patent in long narratives and performances, such as puppet theatre, where a single voice actor speaks as if he were a host of different characters with distinct pitch ranges, speech rates, and intonation contours (Gross 1983, 300). This is also evident in Don Gabriel’s heteroglossic story of his son’s death, famously captured by Jane Hill (1995), in which Don Gabriel uses prosody, among other elements, to juxtapose different moral and biographical figures (Keane 2011).¹⁷ In both cases, these different voices flesh out the figures they depict as of certain kinds, as they also increase transparency as to who is speaking.¹⁸
As these studies have found, suprasegmental marking is, in contradiction to what is often presumed, “rather widespread” on apparent ‘indirect reports’ (Klewitz and Couper-Kuhlen 1999, 470–471; pace Jansen, Gregory, and Brenier 2001). For instance, Klewitz and Couper-Kuhlen (1999, 478) describe how a woman named Alina animates an “older guy” and a “young chick,” even as she is describing the “young chick” within otherwise non-transposed discourse:
The man’s voice … is accompanied by prosodic shifts to forte and allegro. The young girl’s speech (f1) in lines 16–17 coincides with a marked prosodic shift to high register, accompanied by a paralinguistic shift to nasal, breathy voice.
When we begin to inspect examples such as these, we take the distinction between so-called “direct” and “indirect” discourse to its limits. It leaves us with only tautologies and contradictions. If we try to maintain it, we must treat suprasegmentals as both constitutive features of the two kinds of report and variables that can occur within either kind.
Segmentals
As speakers can modify their suprasegmentals to achieve a transposed indexical ground, so too can they alter their segmentals. People do this frequently when they represent figures who, they purport, speak a different dialect or language. Take, for example, how one teenager portrayed the speech of a white youth appropriating Black speech (Bucholtz 2011, 258; Table 2):
Over the course of the portrayal, this young man altered his phonological palette away from his normal speech (along with the use of lexical, suprasegmental, and grammatical alterations). He, for instance, vocalized his postvocalic /r/ in “your ass” and realized the diphthong /aj/ in “my” as a monophthong [mɑ:].
Morpho-Syntactic
Morpho-syntax can also be transposed into figure compositions. Take gender indexicals in Koasati. Haas (1944, 145) described how when Koasati men quoted Koasati women they adopted women’s forms and vice versa (see Agha 2005, 57–58). Years later, Kimball (1987) found that Koasati speakers were now using those same ‘male’ forms, which they had otherwise set aside, only to report the speech of respected people from past generations, many of whom were deceased. Meek (2006, 100) likewise documents transposed morpho-syntax in her descriptions of representations of American Indian speech in American films (what Meek calls ‘Hollywood Injun English’). These depictions tend to exhibit unmarked tense and other non-standard English morpho-syntax—e.g., a character in Disney’s Peter Pan says, “Squaw no dance, squaw get-um firewood.”
Lexical
As Meek’s examples further demonstrate, speakers also use transposed lexemes in represented discourse (see Voloshinov 1973, 137; Hanks 1993, 136). When James Joyce, for instance, writes ‘moocow’ on the first page of Portrait of the Artist as a Young Man, the form is patently transposed into the mouth of the young narrator (see Banfield 1973, 32). Such transposed lexemes are often associated with the stances of particular individuals, kinds of persons, or linguistic registers (see Agha 2005). Like the other elements of figure compositions, they can give figures flesh.
That lexemes can be transposed into figure compositions also underlies a classic philosophical distinction between de dicto and de re utterances (Partee 1973). Banfield (1973, 5) explains the distinction with the sentence, ‘Oedipus said his mother was beautiful.’ This sentence has at least two readings: “(a) that Oedipus said that some one person who the speaker reporting his speech identifies as Oedipus’ mother was beautiful, or (b) that Oedipus said something like, ‘My mother is beautiful.’” (a) is its de re reading. Following that reading, the person representing the speech might be characterizing a host of utterances: for example, ‘Jocasta is beautiful,’ ‘My wife is beautiful,’ ‘The mother of my children is beautiful’ (these examples are from Coulmas 1986, 4). (b), in contrast, is the sentence’s de dicto reading, which implies that Oedipus had called Jocasta his “mother” in the original utterance. What is in question in deciding whether the meaning of the utterance is de re or de dicto is whether the lexeme mother is to be treated as originating from the figure of Oedipus himself. (Of course, the facts of the matter of what Oedipus actually said—presumably in Greek—are irrelevant here, as they are irrelevant in many discussions of represented discourse (Coulmas 1986, 6; see, e.g., Tannen 2007, 17 on “constructed dialogue”).)
The possibility of de dicto indirect speech—as in “Oedipus said his mother was beautiful”—underlines another respect in which the category indirect speech, as traditionally understood, can contain forms indexically tied to the origo of the quoted figure.¹⁹ And de dicto readings are not exceptional in discourse, but pervasive (see Coulmas 1986). In fact, to repurpose Partee’s (1973, 415) contention that “the quoted sentence always has a de dicto interpretation,” we might say that the elements of a figure composition always have a de dicto interpretation. This de dicto interpretation is their essence as quoted elements, isomorphic with the fact that they are anchored in a given narrated event that is in some way distinct from the speech event.
The Body
That the non-sonically resonating parts of the body can also take part in figure compositions is well documented. Just how much the body can add to a representation of speech is apparent, for example, in the comedian Sarah Cooper’s impersonations of Donald Trump (Figure 2). Cooper lip-syncs—but really, more accurately, eye-, eyebrow-, face-, hand-, and shoulder-syncs—Trump (and sometimes his interlocutors) as original audio from his speeches and interviews plays. The result is a vivid underlining of the absurd bits of Trump’s language, re-embedded in a new body.
Some studies still discuss represented discourse as if it were only either sonic or written, and many linguistic anthropologists continue to exclusively use audio recordings—rather than video recordings—of interactions, even when those interactions occur in environments where participants also have visual access to one another. But this is changing. In the last decade research on re-enactments, bodily-quoting, constructed action, and the multi-modality of reported speech has blossomed (e.g., Clark and Gerrig 1990; Haviland 1993; Streeck 2002; Sidnell 2006; Goodwin 2007; Keevallik 2010; Keevallik 2013; Sandlund 2014; Cormier, Smith, and Sevcikova-Sehyr 2015; Stec, Huiskes, and Redeker 2015; Stec, Huiskes, and Redeker 2016; Hodge and Cormier 2019). While some of this research draws a distinction between representations of bodily movements and spoken discourse, the concept of figure composition is in line with a growing consensus that “verbal and bodily quoting are essentially the same kinds of activities” (Keevallik 2010, 402).
Here is a simple example that shows speech and the non-sonic body working together: a speaker composes a figure—she says, “No!”—and, as she does so, she raises her hand up, palm facing outward, in a “please stop”-like gesture (Streeck 2002, 193), quoting the hand movement alongside the speech (e.g., Haviland 1993, 28–29). This gesture takes a “character viewpoint” (McNeill 1992, 190), transposing the indexical ground of the body such that it is treated as if it were emerging from the enacted figure. Sometimes such character viewpoint compositions include much more than the hands. In Keevallik’s (2010) examples of bodily quoting, for instance, a dance instructor corrects a student by demonstrating “the wrong way of leading the sugar push.” Evident in these examples is the sense that a person’s body is often the best stand-in for a figure’s body (Sweetser 2012, 13; cited in Stec, Huiskes, and Redeker 2016, 3).
But interactants also have the capability of producing “observer viewpoint” gestures, in which parts of the body (especially the hands) compose distinct parts of figures (e.g., not just hands): such gestures “take place at arm’s length from the observer, as if the hands were detached from the body, self-sufficient organs of representation” (Streeck 2009, 207; citing Sauer 1999, 221). For example, as a speaker describes a blob rising up a drainpipe, he moves his hands up, iconically presenting the blob and its trajectory (McNeill 1992, 191). While the relation between observer and character viewpoints has often been described as analogous to the distinction between direct and indirect report (e.g., Parrill 2012, 104), note that what distinguishes the two viewpoints here is not whether the origo of action has been transposed—it has, in both cases—but whether the gesturer’s body is portraying the figure’s body. These different “viewpoints” are thus not fundamentally distinct, but rather they use distinct principles of composition that align with the narrating and narrated environments in different ways (see Russell 2012; cf. Haviland 1993).
During interaction, speakers often alternate between these two perspectives, as they also compose dual viewpoint gestures or “chimeras” (McNeill 1992, 124; Parrill 2009). Some of these involve both observer and character viewpoint, while others involve compositions of multiple figures occurring simultaneously. Take the following example (McNeill 1992, 124; originally from McClave 1991), in which two character viewpoints are represented simultaneously. Here the speaker points to his own body as he reports, “[you] had your doctor go over to check out that person’s claim.” In doing so, his pointing hand stands in for the hand of the figure doing the pointing, as his body stands in as the figure being pointed to.²⁰
What is remarkable about the research into the bodily dimensions of represented discourse—that is, the issue which examples such as the above make so astonishingly clear—is that not only can the body play an integral part in figure compositions, but the body itself is divisible into different elements. That is, some parts of the body may play a role in a figure’s composition while other parts of the body are playing no such role; or, two body parts may play different roles. To capture this, many who work on bodily communication have analytically divided the body up into different articulators, for example, the head, face, eyes, arms, and torso (Cormier, Smith, and Sevcikova-Sehyr Reference Cormier2015, 1). These physically defined articulators, in turn, have been shown to have unique affordances. The eyes, for instance, are the only human organ capable of, and construable as, both giving and receiving visual information.
These semiotic affordances of different parts of the body are especially clear in studies of sign language (see Stec, Huiskes, and Redeker Reference Stec2016, 1), where the most careful work on multi-modal represented discourse (under the umbrella of ‘role shift’ or ‘constructed action’) is done, and where the languages being studied have the most developed semiotic resources for using the non-sonic parts of the body as elements in figure compositions (see Cormier, Smith, and Sevcikova-Sehyr Reference Cormier2015; Stec, Huiskes, and Redeker Reference Stec2016; Hodge and Cormier Reference Hodge2019 for discussion of this literature and its relation to work on represented discourse in spoken languages). I might have thus been justified in dividing this section into sub-sections on the eyes, the hands, the body, and so forth—as I did above regarding deictics, suprasegmentals, segmentals, et cetera.Footnote 21 If I were to have done so, the distinctions I drew among elements would have been as relatively arbitrary as those distinctions I drew above. In practice, whether it is worth distinguishing one element of a composition against others in relation to any stretch of empirical material always depends on the empirical facts, on whether these bodily components are being used in meaningful and relevant ways.
Recent work on the semiotic dimensions of the body shows the body’s import in many of the communicative environments linguistic anthropologists have studied. So much so that we might wonder what we have missed in classic studies of represented discourse. What, for instance, might Don Gabriel have done with his eyes, hands, and mouth as he gave voice to the many figures in his story of the death of his son? What might the two boys playing ping-pong that Hoyle (Reference Hoyle1993) describes and Agha (Reference Agha2005, 50) further analyzes have been doing with their bodies as they narrated their game as if they were sportscasters?
Orthography and Computer Media
Reflecting on the body’s role in figure compositions shows the flimsiness of the boundary between speech and other semiotic activities. As such, it opens our way toward thinking of how other modalities might afford represented discourse. The orthographies in which language is written provide an obvious example, as changes in medium, font, formatting, layout, and spelling, among other formal features, can all become elements of a figure’s composition in written or multi-modal discourse (Clark and Gerrig Reference Clark1990, 786; see Jones and Schieffelin Reference Jones2009; Hoffmann-Dilloway Reference Hoffmann-Dilloway2011). Emoji offer an obvious site of interest in this regard (Danesi Reference Danesi2016). We might also think of the represented discourse millions of people create in their own video productions that they later post to websites such as YouTube. From ticky-tacky effects filters that, say, make one’s face look like a raccoon or adorable bear to deep fakes that appear to capture the whole essence of a person, computer-mediated platforms offer a range of new possibilities for figure compositions with which people are currently experimenting.
Costume, Props, and Other Non-Corporeal Semiotics
Discussion of these less traditionally recognized elements of represented discourse also draws attention to a host of additional elements that can be used in compositions to make figures palpable. The use of physical props, makeup, and costumes is most obvious and well documented in theatre or film, but little shows with costumes and special effects, so to speak, occur in ordinary interaction as well, as people adjust their glasses or use props like napkins or pieces of paper to compose figures (Hall, Goldstein, and Ingram Reference Hall2016, 85; Goffman Reference Goffman1974). There are also extreme cases where what is an element in a figure composition and what is an element of a person are physically indistinguishable. When Christian Bale prepared to portray Dicky Eklund in the movie The Fighter, he shadowed the real-life Eklund to adopt his “distinctive mannerisms and speech patterns,” what some who knew Eklund called “Dickynese” (Lim Reference Lim2010). But he also lost a third of his weight to portray Eklund’s gaunt body, hollowed out from drug addiction. Is this latter weight loss an element of Bale’s composition of Eklund? One could argue for or against the idea, but the question brings to mind a host of other questions regarding not how we define the analytic of composition per se, but regarding how compositions integrate with social life. Bale’s case invites us to think of many of the things that people do which can blur the line between altering oneself and portraying another.
One Variety of Composition Effect: The Gradient Resolution of a Figure
The notion of figure composition is useful not just because it allows us a better language for describing the form of represented discourse, but because it offers a vantage from which we can inspect the relation between that form and social and semiotic action. To capture the various interactional entailments a figure’s composition can have I use the term composition effect. In this section, I return to Noy’s story to describe one variety of such an effect, in which a series of compositions gives the sense, intertextually derived, of the gradient resolution of a figure. Here the cross-modal architecture of figure compositions across events of represented discourse makes some figures appear lower and others higher resolution. Such gradient resolution of figures can serve as a diagrammatic icon of something else—the arc of a story, for instance. In fact, as many have found, narrators often incorporate more robust compositions at the end of a narrative, increasing vividness (Mathis and Yule Reference Mathis1994, 67). In Don Gabriel’s narration of his son’s death, for instance, he moves from less to more reported speech, and begins, as Hill (Reference Hill1995, 115) describes, to incorporate more and more “direct” reports (cf. Hymes Reference Hymes1981, 321 on “vocal realization”). A gradient figure might also be used as a diagrammatic icon of the competency or lack of competency at some skill: Keevallik (Reference Keevallik2010, 420), for instance, describes how dance instructors portray the incorrect stiff dancing of students and correct dancing with very different compositions: they move less stiffly, more fluidly in the correct demonstrations, accompanying their moves with on-time snapping and singing with a breathier and more passionate voice (see also Weeks Reference Weeks1996, 274). Or the effect might be used to contrast different figures in a story, whose compositions model dimensions of their characteristics. 
In her description of the story that a young medical resident told about his day working in the emergency room, for example, Tannen (Reference Tannen2007, 123–124) writes that:
The paralinguistically exaggerated role-play of Billy’s voice, and the slightly less marked animation of his friends’ voice, both emotion-filled, contrast sharply with the relatively ordinary quality of the voice in which the speaker/hospital staff dialogue is represented. These contrasting voices create the dramatic tension between the unreasonable behavior of ‘these three drunk guys’ and the reasonable behavior of the speaker/staff. This contrast highlights as well the central tension in the story: that the visual display of blood and the extremity of the boys’ emotional display were out of proportion to the severity of the wound.
The notion of gradient resolution is thus an umbrella term for a broad range of composition effects. As such, it reminds us that figures of personhood are not always treated interactionally as monolithic types (Agha Reference Agha2005), but as gradiently evocable, and that this gradience can itself be a tool for effective semiotic action.
This is exactly what happens in Noy’s story of the Hmong vendor: the gradient resolution of his figure composition comes to underline the structure of his irrationality.
Noy’s Story
Noy, along with most Lao now living in America, was a refugee in the early 1980s. Fleeing from Laos, her young children in tow, she moved into a series of camps before finally settling in the United States. At the time of the interview, she, along with a few other members in the Lao-American community, was teaching me to speak Lao, which gave the interview, conducted mostly in Lao, a tacit pedagogical frame.
Her story of the Hmong vendor in the market was situated against a background of inter-ethnic tension in Laos, both at the time she lived there and at the time of the interview (see Baird Reference Baird2010). Noy was sensitive to this tension, and voiced genuine sympathy for Hmong people, even as her story further circulated discourses of Hmong as irrational, uneducated, and linguistically incompetent. For example, she told me that when she was a child in Vientiane, Laos’s capital, “If a Lao and a Hmong person fought, it was always the Hmong person who was blamed.” She also preferred to use the term Hmong, rather than the offensive ethnonym “Meo,” which was common when she was a child. When she used the latter to represent her speech in the past, she corrected herself on a few occasions by repeating the ethnonym Hmong afterwards.
Noy vividly remembered Hmong marching into markets in Vientiane when she was a child, in single-file lines with baskets tied to their backs filled with brooms for sale. When young Lao children heard that the Hmong were coming, they would get excited, and Noy demonstrated this by inhabiting the figure of an ebullient child: she smiled, shook her arms to mimic running, and called out, “Come see the Meo! Come see the Meo!” She also remembered Lao children playing with the sound of “Meo,” which is similar to both the Lao word for cat and the onomatopoeia for a cat’s vocalization. The children would meeow meeow like cats at the Hmong broom sellers marching into the market. Most of the Hmong broom-sellers would walk by solemnly, but Noy remembered one young Hmong man who took a broom from his basket and hit one of the children.
It is such scenes of inter-ethnic exchange and tension that form the background of her story’s punchline: “When you go buy brooms from Hmong broom sellers, be careful!” The story purports that Hmong broom-sellers cannot understand a kind of bargaining we might call “generalized negotiation.” Generalized negotiation is the repetition and multiplication of a bargain already made. For example, imagine that a merchant agrees to sell two brooms for four dollars. If generalized negotiation is holding, then one could also buy four brooms for eight dollars or 400 brooms for 800 dollars: the deal scales. Hmong people, according to the story, neither allow nor understand this scaling. Two brooms for four dollars means only two brooms for four dollars. Any other brooms you might want to purchase would require further haggling. The heart of the bigoted story, as Noy elaborated, is that Hmong people were too literal-minded, undereducated, and hard to understand when they spoke Lao to practice generalized negotiation.
The Emerging Figure of a Hmong Man
Noy’s narrative begins in earnest with a representation of dialog between a figure of herself and the—as of then—minimally described generic Hmong person. She says, “Let’s say you ask Hmong people the price of a single broom, and they (khacaw4)Footnote 22 say that, for example, one broom is three kip [Lao currency].”
For Transcript [1] there are two signs of transparency, signs that Noy is speaking from the figure of a Hmong broom-seller in a market: a verbum dicendi that just precedes line [1a], and the contextual fit of the referential content—i.e., the broom-seller is expected to be the one providing the price of brooms. There are neither transposed deictics nor non-transposed deictics. Rather, the figure’s composition is comprised of the lexical choice of “3 kip.” Notably, the choice of price itself is anchored to the narrated scene’s origo, as it is historically a more likely price for brooms from the time when Noy was a child in Laos. At the time of the interview, one would not be able to find a broom for sale for 3,000 kip, let alone 3.
This line, “3 kip,” is, in comparison to the compositions to come later in the story, low-resolution. Notice, for instance, that Noy’s co-speech gesture represents the narrator’s action instead of the Hmong figure (Goffman Reference Goffman1979, 151). As she utters the word kiip5—as part of the phrase meaning ‘3 kip’—in [1a], she lifts both her hands up, palms facing upwards, with her arms slightly outstretched from the sides of her body. This gesture, a shrug, represents the arbitrariness of the number that Noy, in the role of narrator, has chosen. She follows it by saying that 3 kip is just “an example,” and then repeats “3 kip” and shrugs again.
In the utterance in line [5a] of Transcript 2, Noy introduces the Hmong figure’s interlocutor, some apparition of herself, the potential buyer of brooms in the story, with a frustrated response cry: “Qooj!” (Goffman Reference Goffman1978). The cry clearly has a transposed origo, emanating from the emotional state of the figure of Noy in the market, not Noy the narrator sitting across from me during the interview. It brings this new figure, the Hmong man’s interlocutor, into focus.
Immediately after this, in line [5b], the figure then addresses the Hmong man, using the term phòò1 siaw1, which creatively indexes him as a male and further ethnicized figure: as Noy explains to me, the term phòò1 siaw1—meaning ‘close friends’ father’—is how Hmong men prefer to be addressed.Footnote 24 As Noy says it in [5b], she also represents the emotional state of the figure of herself with a gesture: her right hand, with her index finger extended, moves from the top right of her gesture space to the bottom left, like a whip, bringing to life the stance of a frustrated negotiator, who, as it were, “crosses out” the Hmong man’s suggested price.
In comparison to Transcript [1] above, the figure of the Hmong man in Transcript [2] is more elaborately composed. In Transcript [3], this gradual elaboration continues.
Transcript [3] begins immediately after the figure of Noy asks the Hmong man if he would make her a deal and lower the price of the brooms from ten kip to five (confusingly, Noy apparently forgot the price of brooms in her story and changed it from three to ten kip each). In [17], the figure of the Hmong man responds to the figure of Noy’s suggestion of price with a definitive no. The Hmong man’s figure emerges with a robust array of semiotic resources: first, Noy represents his drawn-out, hesitancy-indexing response cry “oh!” and then she represents his speech, saying “[I] can’t [sell at that price]” [17]. As she says this, she shakes her head [17], iconically paralleling the negation in the Hmong figure’s words and producing an image of his head, moving back and forth in the hypothetical market.
After line [17], the narrator’s voice enters again for a moment and Noy clarifies that this is the Hmong man that she is quoting through two semi-redundant verba dicendi [18a-18b].Footnote 25 These verba dicendi reintroduce the figure of the Hmong man [18-18a], who repeats that “no, [he] will not sell his brooms for five kip.” Noy voices him twice in succession. As she does so, she uses first-person pronouns [19a-19b: haw2], and character viewpoint co-speech gestures—i.e., as she says one broom, she raises her left hand with one index finger extended, paralleling the number one in the figure of the Hmong man’s words [19b-19c].
In comparison to Transcript [1] above, Noy’s composition of the figure of the Hmong man is at something of a high resolution here: the response cry [17], the first-person pronouns [19a-19b], the head-shaking [17], and the raised finger [19b-19c] cross-modally voice the figure of the Hmong man and together comprise an elaborate semiotic architecture. In Transcript [4], she adds yet another layer to this figure composition by enacting and then inhabiting a “Hmong accent,” comprised of altered segmental and suprasegmental forms.
The crucial moment here is the segmental and suprasegmental disjuncture between lines [27] and [29]. In line [27], Noy voices the figure of the Hmong man much as she had done before,Footnote 26 with her head movement again paralleling the negation in the figure’s speech. In contrast, in [29], Noy makes segmental alterations, and her stress and pitch shift and her vocal cords creak [Figure 3]. As Noy shifts her voice, she moves her head and her hands to emphasize these shifts, bobbing them up and down, paralleling the oscillations in the sound from her mouth [Figures [4] and [5]]. After this display, Noy explains what she had just done with her demonstration: “They [i.e., Hmong people] would speak Lao incorrectly” [30]. As she says this, she moves her hand to her mouth and out again, creating a physical image of the sounds of Hmong speech.
By line [29], Noy puts the form of her language on display, highlighting the texture of the imagined Hmong man’s voice and emphasizing the palpability of his language. She does this through juxtaposing line [27] with line [29]—which have very similar lexical content, but quite different figure compositions. Following Roman Jakobson’s classic discussion of the poetic function, this is a moment where the hierarchy of linguistic functions has been reordered. Whereas the referential function is most dominant in [27], in [29] the poetic function—characterized by an orientation towards the form of language (or, in Jakobson’s vocabulary, the message)—is thrust into focus (Jakobson Reference Jakobson1956; Jakobson Reference Jakobson1960). To play on the title of Dell Hymes’s (Reference Hymes1981) classic paper, line [29] is a “breakthrough into poetics,” a moment where the contrastive individuation of the figure’s composition (Agha Reference Agha2005, 54) is made a thematic focus and brought to attention in and of itself.
This poetic breakthrough happens through both aural and visual modalities, which work in concert to stress the sound-shape of Noy’s language. Noy’s hand and her head move alongside her words, bouncing and emphasizing the rhythm and texture of her speech. In contrast to the beginning of [27], where Noy’s head movements emphasize the referential content of the Hmong figure’s negation, and thus form a part of the composition of that figure, by the end of line [27], Noy has already changed the primary function of her body’s movements. They are now mediators of attention (Streeck Reference Streeck2009): her head traces the final tonal contour of the last word of the line, thùn2. Noy’s corporeal poetics become still more exaggerated in line [29].
In Figures [4] and [5], I have mapped out the relationship between Noy’s speech and her bodily movements in line [29]. Figure [4], which was roughly traced from a still of the video and modified slightly to protect Noy’s identity, shows the axes on which Noy’s hands and head move. The small hands in Figure [5] likewise represent her alternately raised and lowered hand, with her pinky outstretched. The face on Figure [5] represents Noy’s up-and-down head movements. Notice that her head and her hand moved more or less in opposite directions, and that the correlation between the movements and the sound is inexact, both in the Figure and in the video. The movements happen in proportions of either 1:1 or 2:1 to the syllables and are relatively regular until the two final syllables khaat5 thùn2.
After line [29], when Noy characterizes the segmental and suprasegmental elements she has just performed—“They would speak Lao incorrectly,” line [30]—she thereby typifies her breakthrough into poetics as a contrastive enactment, in which putatively “correct” and “incorrect” pronunciation are juxtaposed cross-modally, and where the latter forms are explicitly tied, for me, the novice speaker, to both this specific represented Hmong figure in the story and Hmong people, generically. In this small moment, we see how the poetic underpinnings of figure compositions can be formed and circulate from person to person—here from Noy to me, a then novice Lao language learner. Such acts lay the groundwork for these formal features to later be presupposed in less obvious ways (Agha Reference Agha2005, 55) and used in figure compositions.
A High-Resolution Figure and a Second Story
In fact, throughout the rest of her story, Noy continues to fade in and out of the phonetic alteration demonstrated in her “breakthrough into poetics” as she voices the figure of the Hmong man. The “accent” is characterized by a slower rhythm, irregular creak, and lengthened and stressed syllables at the end of intonation units, as was the case in line [29].Footnote 28
With the addition of the phonetic element to the figure of the Hmong man’s voice, the composition of the figure is now in a relatively sustained high resolution: with a dense clustering of transposed deictics, character perspective gestures, and a differentiated phonetic form. As the story progresses, the two characters finally reach an agreement on the price of brooms: two brooms for eight kip. It is then that the figure of Noy tries to buy six brooms for 24 kip, following a kind of “generalized negotiation,” and the Hmong man emphatically rejects her offer. This is the denouement of the story, the point where the joke emerges: “Hmong people don’t get it.”
Notice the richness with which Noy composes the Hmong figure in lines [98-100b]. Every intonation unit is patterned with a co-speech gesture that represents the bodily movements and the content of the figure of the Hmong man. In addition, the segmentals and suprasegmentals of her speech are altered, contrasting starkly with her voice qua narrator.
After she tells the story, Noy explains to me that, until recently, she never believed its premise. Instead, she always thought that Lao people told it because they were racist and because they hated Hmong people. But when she bargained with a Hmong woman in an American market, to her surprise, she witnessed the stereotype come true.
As above, Noy tells this second story through represented discourse. The composition of the Hmong woman in this story, however, has lost the phonetic weight that the figure of the Hmong man had at the other story’s conclusion. In addition, Noy uses some co-speech gesture, but it is done with less vigor. The figure of the Hmong woman is less elaborate, less ornamented, in a lower resolution.
In this story, Noy bargains with the woman for bundles of lemongrass instead of brooms. The figure of the Hmong woman says that she will sell three bundles for two dollars. Noy agrees and takes nine bunches of lemongrass, planning to buy them for six dollars (following the logic of generalized negotiation). As she does this, her husband warns her that Hmong people do not bargain like that, and she tells him to be quiet. “They’re in America already,” she says and begins to hand the Hmong woman six dollars. At this moment, the figure of the Hmong woman seems to break out from Noy’s body: “No, I won’t sell [at that price].” The voice of the Hmong woman is brought back to the resolution it had in the previous story, albeit at a slightly higher pitch to match the figure’s new gender. The final syllables of Noy’s speech are lengthened. There are elaborate co-speech gestures and exaggerated prosodic contours. The figure’s composition is, in comparison to the figure at the beginning of this second story, a collage of heterogeneous elements.
It is no surprise that Noy’s high-resolution voice of the Hmong figures reappears at the same time that the supposedly illogical bargaining does. Her representation of the Hmong figures’ communication in that moment in both stories is hyper-contrastive with the voice of the figure of herself in the market and even more contrastive with her restrained narrative voice. This radical contrast between Noy’s Hmong figures and her other figures models the narrative arc of both of her stories. They are about how Hmong people are different: they talk differently, and they think differently; likewise, the figure of the Hmong man and the Hmong woman become something different, a foil for the “reasonable” Lao person. After the joke, Noy explains: “Hmong people are literal people, if you agree on something, they really stick to it.”Footnote 29
Composition and Transparency Effects
The gradient resolution of a figure is just one cluster of composition effects common enough to label,Footnote 30 but analytics of transparency and composition help us distinguish a host of such effects. For instance, take the composition effect that Voloshinov (Reference Voloshinov1973, 134) called particularized direct discourse, representations in which “the traits the author used to define a character cast heavy shadows on his directly reported speech.” In these quotidian cases, the elements of a figure composition are taken to be notably characteristic of figures (see Goffman Reference Goffman1974, 534–536 on “mockeries and say-fors”). Narrators can use these elements to project those figures “into particular social roles by putting particular emblems into their mouths” (Wortham and Locher Reference Wortham1994, 11; see also Couper-Kuhlen Reference Couper-Kuhlen1999, 15; Couper-Kuhlen Reference Couper-Kuhlen2007, 119). In their analysis of Trump’s mockery of his opponents, Hall, Goldstein, and Ingram (Reference Hall2016, 85) capture good examples of this, where Trump’s compositions are constructed with elements that are metonymic of their targets: Hillary Clinton’s bookishness is alluded to by Trump’s representation of her face buried into a piece of paper, Mitt Romney’s boring seriousness is indexed by a stiff body, and “low energy” Jeb Bush is characterized with his hands folded under his cheek as if he were falling asleep.Footnote 31 These characterizations through composition often work in concert with characterizations made by means other than represented discourse, such as explicit descriptions (e.g., ‘he’s a loser’), but they are formally distinct from these other means, as they also afford different kinds of social and semiotic action. As Clark and Gerrig (Reference Clark1990, 793) put it, “Many things are easier to demonstrate than describe.”
Analytics of composition and transparency also allow us to disentangle composition effects from transparency effects. For instance, they point the way toward distinguishing and thus better understanding two kinds of ‘double voicing’ that have been discussed in the literature. One is compositional, where the composition is so clearly selected—so, for example, patently parodic (Goffman Reference Goffman1974, 537; Voloshinov Reference Voloshinov1973, 136–137) or emotionally motivated—that the embedding indexical ground—namely the indexicality of speakership associated with the animator of the utterance (or, perhaps, some other responsible entity; see Irvine Reference Irvine1993)—comes to the fore. The other is a transparency effect, where who exactly is speaking or acting is less clear, and where the question is often whether represented discourse is happening at all.
Sometimes this latter opacity of origo is by design. That is, at times speakers aim to muddy the waters so as to incorporate a figure’s style into their own language. In her description of novice Nepali Sign Language (NSL) learners studying visual depictions of new signs, Hoffmann-Dilloway (Reference Hoffmann-Dilloway2020, 127) shows that as signers become better at using these signs of NSL, they also seem to shift to “performing them in ways aimed at yielding identification with the portrayed figures of personhood” in the images from which they learned. But there are also cases of transparency-based double-voicing where the line between performance and performer becomes porous in spite of efforts to portray the represented figure as distinct. How far, as Goffman (Reference Goffman1974, 539) put it, can something be mimicked “without the mimic becoming suspect?” How much taboo language can someone employ without becoming responsible for that taboo language in the first place? In the 2008 movie Tropic Thunder, in which Ben Stiller and Robert Downey Jr. both play actors playing actors, this issue comes to the foreground. In one scene, Ben Stiller’s character narrates how diving deeply into performing a mentally disabled person affected him—leaking into the way he brushed his teeth and the way he rode the bus, for instance. Robert Downey Jr.’s character responds that the role was a risky career move that may have dimmed Stiller’s character’s chances of winning an Oscar. “Everybody knows you never go full retard,” Downey Jr.’s character says. This scene, in a movie about making a movie, for which Downey Jr. himself was nominated for the Academy Award for Best Supporting Actor, has itself become the subject of controversy on exactly these same lines. As Downey Jr.’s character, a White Australian actor, talks to Stiller’s character, he is in fact wearing blackface and speaking with altered suprasegmentals, syntax, and vocabulary—performing his role as a Black soldier.
The most prominent comments on the YouTube video at the time of this writing referenced this blackface and reflected on whether it was fundamentally offensive or satirically funny, and how it should, should not, or might have led to Downey Jr.—the real, living actor—being “canceled” (for a related discussion, see Chun Reference Chun2004).Footnote 32 That this conversation is occurring highlights how some elements of figure compositions can be treated as unperformable for some classes of individuals even when done through represented discourse or alongside other metapragmatic efforts at containment (Irvine Reference Irvine2011). When an actor such as Downey Jr. puts on blackface to satirize a character using blackface, he is liable to be, in effect, construed as merging in responsibility with the repugnance of the character he is representing in the film. In this way, in contemporary American culture, blackface is treated as what Fleming (Reference Fleming2011) calls a ‘rigid performative,’ a form that keeps its effects no matter how people attempt to contextualize their uses of it.Footnote 33 In their rigidity, such performatives deny full transparency, at least insofar as responsibility is concerned.
As linguistic anthropologists have shown again and again, even language that is not marked-off as any kind of represented discourse often leaves open the question of which figure is speaking. It is this double-voiced dimension of much discourse that, as Bakhtin (Reference Bakhtin1981, 330) put it, “can never be exhausted … never extracted fully from the discourse—not by a rational, logical counting of the individual parts, nor by drawing distinctions between the various parts of a monologic unit of discourse (as happens in rhetoric), nor by a definite cut-off between the verbal exchanges of a finite dialogue, such as occurs in the theater.” It is also this same characteristic which gives Goffman’s (Reference Goffman1979) account of footing its expansiveness, as it involves not just cases of clearly demarcated represented discourse, but anything we say that keys different figures, participation frameworks, and production formats in ordinary discourse itself. Broadly, the subject of represented discourse shows us that everything has a bit of this capacity, that ordinary transpositions are an essential part of the fabric of normal discursive construction, whether anyone is quoting anyone else transparently or not.
Composition as Analytic
When one begins to consider figure transparency it can feel as if the floor comes out from under the notion of represented discourse entirely. Recognizing the pervasive heteroglossic opacity of speech foregrounds a fundamental instability as to who is speaking at any point, not just when someone is quoting another. But this uncertainty is core to semiotic processes generally, and the apparently solid floor of semiosis is always built on sand liable to shift. As many have shown, this instability comprises an especially interesting part of social life.
Figure composition is valuable in part because it offers a vocabulary for clarifying this instability, for specifying how a stretch of discourse might fail to meet a certain threshold of transparency. But the analysis of figure composition takes the interpretive instability of the indexical ground of represented discourse as a starting point, not its focus. Figure composition, as I sketch it here, is a tool for analyzing the form of represented discourse when this instability is less prominent, passing the threshold of transparency in some uncontroversial way. It is worth taking up because it allows us to think through how compositions of represented discourse effectuate social action in a manner that is tidier and more exacting than the dichotomy of ‘direct’ and ‘indirect’ report, because it enables us to attend to the multi-modality of represented discourse as we encounter it, and because it offers a tool for exploring the general finding that the form and semiotic organization of represented discourse—not the mere fact that it has happened—is key for understanding represented discourse’s role in social action.