1. Introduction
An important feature of discourse is that it is structured in a hierarchical way: that is, more central relations hold between segments at higher levels of the discourse structure, and these higher-level segments can be further subdivided into dependent segments at the lower levels of discourse hierarchy (Grosz & Sidner, 1986; Hobbs, 1979; Mann & Thompson, 1988; Polanyi, 1988; Van Kuppevelt, 1995). A simple example, adapted from Moore and Pollack (1992), can illustrate the point:
The short discourse in example (1) is part of a longer interaction in which S expresses the wish for the addressee to come home earlier. Consequently, coming home by 5:00 in (a) is the central and most important utterance in that discourse. Coming home earlier in (a) is motivated by going together to the hardware store in (b), where the latter is in turn motivated by the opportunity to finish the bookshelves in (c), which includes painting them (d).
The discourse relations can be left implicit, and then they are simply inferred from the ordering of the clauses in the text, as in examples (2) and (3), cited from Tylor (2013, p. 101), where their reordering leads to the opposite temporal and causal interpretations:
In (2), we infer from the order of the sentences that John’s banging of his head happened before his falling over and that the former event is likely to be the cause of the latter. In contrast, in (3), we infer that John first fell over and then banged his head, so that the fall could have led to the bang on his head.
In other cases, coherence relations can be made explicit by discourse markers or adverbials. In (4), as well as in (1) above, with the addition of the explicit marker then, the temporal relationship between the sentences becomes explicit. And if, instead, the temporal adverbial earlier was used, as in (5), the default temporal relationship would be reversed:
Examples (4) and (5) show that the conceptual or mental discourse hierarchy can be marked overtly (Cohen, 1984; Hirschberg & Litman, 1993; Hobbs, 1979; Matthiessen & Thompson, 1987; Reichman, 1985). Examples (2) and (3) show that the discourse hierarchy and its marking are not fully isomorphic; that is, the relations exist conceptually, but they are often inferred and not overt. The examples above show relations between two sentences only. We will see in the rest of this study that a discourse is typically divided into several different levels, going from broad or central topics of the discourse, down to narrower and narrower kinds of supplementary information, all arranged in a hierarchy. Levels can be covertly related to one another, as in examples (2) and (3), or overtly marked as related, as in examples (4) and (5). As we will see below, there are different ways to overtly mark the levels of a discourse hierarchy, ranging from lexical to prosodic to gestural.
Conveying and understanding the hierarchical relations in a discourse is pivotal for successful communication (e.g., Singer, 1990). However, little is known about the existence and marking of a discourse hierarchy in the early formation of a language, nor do we know whether the mental conception and the overt marking of discourse structure necessarily emerge in tandem.
In this paper, we ask three questions: (1) Is there a conceptual discourse hierarchy in the earliest stages of language emergence? (2) Are there identifiable overt cues to this hierarchy from the outset? (3) Can we trace the emergence of mapping between conceptual discourse structure and overt marking? In other words, are the conceptual discourse hierarchy and its overt manifestations inextricably intertwined in the very first stages? Here we probe these questions by examining a young, visual language, Israeli Sign Language (ISL).
The ideal, and, in fact, the only, empirical evidence we have for approaching these questions comes from sign languages (Footnote 1), for two reasons. First, sign languages can emerge at any time, providing a natural laboratory for investigating the course of language development in a community, and facilitating diachronic tracing of linguistic complexity (Dachkovsky et al., 2018; Meir & Sandler, 2020; Sandler, 2018; Sandler et al., 2005, 2011; Senghas & Coppola, 2001). Second, the visible connection between linguistic markers and their linguistic functions in sign languages makes the tracing of linguistic complexity more transparent, since there is often a direct correspondence between particular articulations of the hands, face, head and torso, and linguistic functions (Baker-Shenk & Padden, 1978; Fenlon et al., 2007; Liddell, 1980; Nespor & Sandler, 1999; Nicodemus, 2007). Together, these characteristics of sign languages provide a transparent view of the emergence of language complexity (Dachkovsky et al., 2018; Kocab et al., 2015; Sandler et al., 2011; Sandler, 2012, 2018).
Our study builds on earlier findings based on spoken languages that point to a motivated mapping between conceptual hierarchy and overt structural cues in discourse (Section 2). For example, in spoken languages, higher discourse levels with more central information are distinguished from lower levels by longer pauses and more prosodic cues (Price et al., 1991; Silverman, 1987). In Section 3, we turn to discourse structuring in signed languages. Research on the discourse hierarchy in signed languages is still in its infancy. An exception is a single study of American Sign Language (ASL) narratives by Gee and Shepard-Kegl (1983), which finds a correspondence between levels of discourse hierarchy and the length of pauses in ASL narratives (Section 3.1). The intricate interaction between conceptual and structural complexity in the process of language emergence has been addressed in research on emerging sign languages, yet mostly in utterance-level analyses and not discourse-level analyses (see Section 3.2).
The work we report here is motivated by studies of different young sign languages in Israel, which demonstrate the transparent relation between bodily articulations and linguistic structure, and show that each successive generation in a young sign language recruits additional articulators to mark linguistic functions (Sandler, 2012; Sandler et al., 2011). We found that different parts of the body come to be used for more specialized and refined functions over time (Dachkovsky, 2018; Dachkovsky et al., 2018). An interim summary discussing previous studies in sign languages and the predictions based on them is presented in Section 3.3.
Informed by findings from investigations into both spoken and signed languages, the present study for the first time combines two lines of linguistic research – discourse analysis and language emergence. Using the Apparent Time Construct (Labov, 1963), we track the emergence of the mapping between conceptual levels of discourse hierarchy and particular linguistic features by comparing the structure of the narratives produced by different generations of signers of a young sign language: Israeli Sign Language (ISL; Section 4). To this end, as the first stage of the analysis, we generate visual representations of conceptual hierarchies of 10 narratives produced by older and younger ISL signers in accordance with the Rhetorical Structure Theory (RST). We first determine the conceptual structure of the narratives, and then map these conceptual hierarchies to minutely coded and quantified prosodic bodily signals occurring at discourse boundaries.
The results of the study (Section 5) and the discussion (Section 6) demonstrate two phenomena. First, our RST analysis shows that the conceptual hierarchy of discourse structure is equally deep and complex in both older and younger signers’ narratives. This implies that signers conceive of the same complex layering of information in early and later stages of language emergence. However, our methodology reveals that overt linguistic marking, which explicitly distinguishes the discourse levels, comes later in the development of the language, as revealed by the younger signers’ data. Although the narratives of older signers have conceptual hierarchies as deep and complex as those of younger signers, the older signers’ narratives display few distinct and systematic signals distinguishing the levels of this discourse hierarchy. Such systematic signaling of discourse levels is found in the data of the younger signers, revealing the diachronic development of overt discourse marking in language emergence. Section 7 concludes by explicitly extending the claim that the body is a direct map of linguistic organization in sign languages (Sandler, 2012, 2018) to the level of discourse. The body provides the affordances of its multiple and varied articulators for the visible construal of hierarchical narrative structure.
2. Discourse organization: The mapping between conceptual hierarchy and multi-modal cues in spoken and signed languages
Discourse, like other levels of linguistic organization, represents an interface between conceptual (mental) representation and its overt manifestation. This idea is not new, of course. It goes back to the dichotomy between internal coherence and overt cohesion (e.g., Halliday & Hasan, 1976), and will be further elaborated in Section 2.1. Here we address each facet of discourse organization in turn.
2.1. Conceptual hierarchy in discourse
When we communicate, we segment the continuous flow of our perception of an experience into propositions, in what Chafe describes as the ‘island-like quality of verbalized experience’ (Chafe, 1994, p. 202). Yet, a well-formed discourse is more than just a series of well-formed propositions – it is hierarchically structured (Grosz & Sidner, 1986; Hobbs, 1979; Mann & Thompson, 1988; Polanyi, 1988; Van Kuppevelt, 1995). This is most clearly exemplified when we consider a story without hierarchical structure that lacks any sense of coherence, such as the following example from Tomlin (1997, p. 90), entitled ‘Toothache’:
This contrived text has a certain kind of mechanistic continuity: the sentences relate to each other one by one, by referring to the subject from the preceding sentence. Yet, this discourse unit as a whole does not exhibit any overall conceptual hierarchy, lacking what Grice calls a conceptual ‘mutually accepted direction’ (Grice, 1975, p. 45).
The conceptual organization of a text and the coherence between the sentences can be represented as a hierarchical structure, which has several dimensions depending on the type of discourse. In narration, the basis of the present study, temporal relations are one of the most important types of connectivity, because in the course of a narrative we usually reconstruct the chronological sequence of events. Apart from temporal relations, other types of coherence relations may connect individual units of various sizes to the overall structure and to one another (e.g., Mann & Thompson, 1988). For example, a proposition or a set of propositions can be a justification, a cause, or an elaboration in relation to another unit. Moreover, every narrative is organized around some main point, a “reportable” event, often a dramatic or unusual experience that the narrator has encountered in life and either resolved or did not resolve (Grice, 1975; Labov, 1997). This means that some discourse units can contain more important information (e.g., express a solution to a problem, or provide a summary of the preceding exposition) than others, which may simply be an elaboration or clarification (cf. Chafe, 2001; Labov, 1997; Mann & Thompson, 1988; Van Dijk, 1980, inter alia). That higher level sentences contain more important information than those at lower levels was demonstrated by Singer (1990), who showed that sentences at high positions in the hierarchy are better recalled than sentences at low positions.
Most models in the field of discourse studies represent hierarchical discourse structures graphically as fully connected trees with branches, the end nodes of which are individual propositions. Examples are Story Grammar (Thorndyke, 1977), intention-based analyses (Grosz & Sidner, 1986), the Linguistic Discourse Model (Polanyi & Scha, 1983), Segmented Discourse Representation Theory (Asher, 1993), and the type of analysis that we adopt here, Rhetorical Structure Theory (RST, Mann & Thompson, 1988). These analyses generate fully connected trees representing both the hierarchically organized structure of a text and the labeled relations between the branches of the tree. In the current study, we examine the mapping between the levels of discourse hierarchy and linguistic marking in sign language narratives (Footnote 2), and utilize RST as a tool for discourse parsing. This is the first time that RST has been applied to a sign language, an important step in exploring and comparing applications of RST in languages conveyed in both physical channels.
Rhetorical Structure Theory addresses text organization by means of relations that hold between parts of text, as represented by Fig. 1, adapted from Moore and Pollack (1992). Most of the RST relations are binary and hypotactic, such that they consist of two parts where one is considered to be more central (the “nucleus”), and the other more peripheral (the “satellite”). Some RST relations are multinuclear. RST posits 25 rhetorical relations, for example, background, motivation, sequence, contrast, circumstance, elaboration, summary, solutionhood, and so forth. For a complete list of relations and description of the analytic tool, see Mann and Thompson (1988).
Fig. 1 is a visual schema of the RST hierarchy for the short discourse presented in example (1). In the schema, relations between two parts of the discourse are represented by arrowed arcs, like the motivation relations in Fig. 1. Vertical lines signal the nucleus of each binary relation, whereas satellite segments are not marked by vertical lines. In example (1), repeated in Fig. 1, coming home by 5:00 (A) is the nucleus of the entire discourse: it is the action S wishes the addressee to perform. Therefore, it appears at the highest level of the hierarchy. The satellite B, going to the store, motivates the nucleus, whereas B in turn is jointly motivated by C and D – finishing the bookshelves and painting them – which appear at the lower discourse level. Segments C and D stand in a multinuclear joint relation, signaled by arrows: they are coordinate structures of equal importance.
Fig. 1. RST schema for the discourse in example (1).
(A) Come home by 5:00.
(B) Then we can go to the hardware store before it closes.
(C) That way we can finish the bookshelves tonight
(D) and paint them.
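The hierarchy described for example (1) can also be written down as a small tree data structure. The following is a minimal sketch in Python, offered only as an illustration and not as part of RST itself or of the coding procedure used in this study. The `Segment` class and the `depth_by_label` function are hypothetical names introduced here, and the multinuclear joint relation between C and D is simplified to two sibling segments that both depend on B.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One discourse segment in a simplified RST-style tree (hypothetical structure)."""
    label: str
    text: str
    relation: Optional[str] = None  # relation this segment bears to its parent
    dependents: List["Segment"] = field(default_factory=list)


def depth_by_label(root: Segment, depth: int = 0) -> Dict[str, int]:
    """Return each segment's depth; depth 0 is the top of the hierarchy."""
    depths = {root.label: depth}
    for child in root.dependents:
        depths.update(depth_by_label(child, depth + 1))
    return depths


# Example (1): C and D are coordinate segments that together motivate B,
# and B in turn motivates the nucleus A.
c = Segment("C", "That way we can finish the bookshelves tonight", "motivation")
d = Segment("D", "and paint them.", "motivation")
b = Segment("B", "Then we can go to the hardware store before it closes.",
            "motivation", [c, d])
a = Segment("A", "Come home by 5:00.", None, [b])

levels = depth_by_label(a)
print(levels)
```

Under this encoding, depth 0 corresponds to the highest discourse level (A), B sits one level below it, and C and D share the lowest level.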
The theory relies on conceptual relations among text units and is therefore independent of linguistic form, preventing circularity in the analysis of the conceptual relations and corresponding forms. The RST tool is applicable to any text, limited neither by size nor by content. Although it was originally developed for automatic text generation, RST has proven to be a reliable tool, successfully adopted for a variety of different goals, ranging from linguistic text analyses (Cui, 1985; Kumpf, 1986) to computational applications in language generation (Chakrabarty et al., 2020; den Ouden, 2004; Fox, 1987). The methodology of the RST-based parsing procedure applied to the signed texts in the present study is outlined in Section 4. First, in Section 2.2, we review the mapping between the discourse hierarchy and overt cues signaling hierarchical relations in discourse of spoken language.
2.2. Overt cues of discourse structure
The conceptual discourse hierarchy is often overtly cued by prosodic signals, discourse markers or conjunctions, as well as non-linguistic cues, like gestures (e.g., Sanders & Spooren, 2001). Such cues act as operating instructions, explicitly relating the content of connected segments in a specific type of relationship (Ducrot, 1980; Lang, 1984).
Although explicit connective signals are not obligatory, as illustrated in examples (2) and (3), there is ample psycholinguistic evidence showing their relevance as processing instructors (Cozijn, 2000; Millis & Just, 1994; Noordman & Vonk, 1998; Sanders & Noordman, 2000). Explicit marking of coherence relations improves discourse processing and mental text representation, as shown by better recall performance, faster and more accurate responses to prompted recall tasks, faster responses to verification tasks, and better answers on comprehension questions (see Sanders & Noordman, 2000; Zwaan & Rapp, 2006 for comprehensive overviews). Since the importance of discourse connectives provides not only the foundation, but also the motivation for the present study, we review the range of their functions below.
There is a growing body of discourse-oriented research that focuses on prosodic signals (den Ouden et al., 2009; Hirschberg & Grosz, 1992; Lehiste, 1975; Lin & Fon, 2009). In these analyses, basic prosodic units called intonational phrases (Footnote 3) roughly correspond to thought units, according to work by Chafe (1988) and Du Bois (1985). Consequently, groupings of prosodic units and intonational links between them reflect the corresponding conceptual groupings of discourse segments, and prosody plays an important role in our analysis.
Numerous cross-linguistic studies point toward a motivated relationship between prosody and discourse structure: Transitions between bigger/more central segments are characterized by longer/stronger prosodic signals, and segments with less important information are marked by shorter/weaker signals. The consistent finding in these studies is that breaks at higher levels of discourse correlate with longer pauses (Couper-Kuhlen, 1996; Swerts, 1998; Swerts et al., 1996; Wichmann, 2016, inter alia) and with a significant pitch change – lowered pitch before the break and pitch reset after the break (Mayer et al., 2006; Price et al., 1991; Silverman, 1987, for an overview). Conversely, the absence of a prominent intonational change (e.g., lower pitch at the onset of an utterance) usually signals that a discourse segment is part of a larger unit, for example, a reformulation of what has just been said (Chafe, 1994; McNeill et al., 2001; Swerts, 1998). In addition, there is a clear relation between the discourse boundary level and the number of prosodic cues: the general trend is that larger discourse boundaries are accompanied by more cues (e.g., de Pijper & Sanderman, 1994).
In addition to prosody, lexical discourse markers (Footnote 4) and expressions – such as now, well, after all and actually – are among the most common cues of discourse relations (see Maschler & Schiffrin, 2015, for an overview). For instance, discourse markers such as now or by the way tend to introduce larger rather than smaller discourse units (Horne et al., 2001). In contrast, the lexical discourse marker after all in example (7) contributes to lower-level discourse coherence, specifically, a justification for the belief that Moby the dog’s behavior is only temporary.
While early work on discourse structure focuses mostly on linguistic cues of the vocal tract, non-vocalic signals – hand gestures, facial expressions and head movements – play an important role in communication as well (e.g., McClave, 2000; McNeill et al., 2001; Perniss, 2018). As early as the 1970s, in his discussion of the use of the body for communication, Kendon (1972, p. 204) proposed that body movements should be viewed as a hierarchy of articulators exactly parallel to the conceptualized hierarchy of “text units”. His groundbreaking hypothesis, validated later by Cassell et al. (2001), relates the two hierarchies in a more direct and motivated way. First of all, the size of the discourse unit is claimed to correspond to the size of the articulator: higher-level boundaries are marked by body shifts, while head movements and hand gestures separate smaller discourse units (p. 205). Secondly, Kendon proposed that the number of articulators is also correlated with the discourse hierarchy; for example, the highest discourse levels tend to be marked by a change in the position of all the body articulators (Kendon, 2012). The rationale for the motivated relationship between bodily articulators and discourse hierarchy can be rooted in the physiological characteristics of different articulators. The tissues and muscles located in the torso are thicker than the tissues and muscles of the neck and, obviously, thicker than facial tissues (e.g., Baker-Shenk, 1983; Prendergast, 2013).
Consequently, the muscles of the face are the fastest to activate but the least suited to prolonged activation, whereas the opposite is true of torso movements and postures – they can spread over longer stretches of discourse.
In sum, the discourse hierarchy in spoken language is often overtly cued by both linguistically organized cues (lexical and prosodic) and by the more idiosyncratic, less systematic cues conveyed by bodily gestures and body movements. The motivated correlation between the conceptual discourse hierarchy and its overt signals is relative rather than absolute (den Ouden, 2004). All the signals available for communication constitute an interdependent, neatly orchestrated system (Cassell et al., 2001; Kendon, 1972; McNeill et al., 2001; Sandler, 2022). However, the principles which guide the emergence and operation of this natural orchestra are still not documented in a population, nor is the nature of its mapping to discourse. These issues can be resolved by examining a communicative system in which the articulations belong to the visual modality, one in which all the articulators are directly observable, and at the same time display a higher degree of flexibility and independence than the intricate articulators of the vocal tract. Such a system is represented by natural sign languages.
3. Sign languages: Visible discourse structuring and its emergence
Unlike spoken languages, which rely mostly on auditory signals, and some visual cues for various communicative functions, signed languages rely solely on visual signals, produced by a wide range of visibly perceivable articulations of the hands, face, head and torso. Signed languages mold some of the visual signals, which optionally and idiosyncratically accompany spoken languages, into systematic linguistic components (Pfau & Steinbach, 2006).
3.1. Signals of discourse structure in sign languages
There is a bifurcation between the roles of the hands and of other articulators in sign language structure (Dachkovsky & Sandler, 2009; Nespor & Sandler, 1999; Sandler, 2010; Sandler et al., 2020). Hands not only convey lexical information in sign languages, but fulfill another important role: changes in timing delineate boundaries of prosodic units (Nespor & Sandler, 1999). Specifically, according to Nespor and Sandler (1999) and subsequent research, in Israeli Sign Language (ISL), the final sign in an intonational unit is marked by lengthening, in one of five ways (see Fig. 2): larger signs (e.g., BAKE), longer duration (e.g., TASTY), reduplication (e.g., the indexing sign, IX), and in some cases by a hold (the sign is held in its final location) or a pause (a complete relaxation of the hands).
Movements of the face, head and torso contribute to discourse organization by aligning temporally with prosodic boundaries. So, in Fig. 2, the entire first prosodic unit is marked by raised eyebrows, squinted eyes, and forward head movement. Research across sign languages indicates that the phrasal boundary is also commonly marked by eyeblinks (for ASL, Baker-Shenk & Padden, 1978; Wilbur, 1994; for ISL, Nespor & Sandler, 1999) and a contrastive change in the head position (Dachkovsky, 2018; Nespor & Sandler, 1999) and torso position (Crasborn & Kooij, 2013; Puupponen, 2018; Sandler, 2018).
In addition to demarcating prosodic boundaries, facial expressions, head and torso movements also signal key grammatical functions in sign languages. Particular configurations accompany yes/no and content questions, information structure categories, as well as topic-comment and other complex relations (see Pfau & Quer, 2010 for an overview). We adopt the position supported by earlier research that facial expressions as well as head movements of this kind are akin to intonation in spoken language (Dachkovsky & Sandler, 2009; Nespor & Sandler, 1999; Sandler & Lillo-Martin, 2006; Reilly & McIntire, 1991).
Such prosodic signals not only demarcate the boundaries at the level of individual sentences, but also contribute to the parsing of connected discourse across sign languages (Brentari & Crossley, 2002; Fenlon et al., 2007; Nicodemus, 2007) (Footnote 6). For example, Nicodemus (2007) found that cues involving larger articulators, such as hand clasps and body leans, were the most frequent at boundaries in ASL interpreted lectures. Similar findings were reported for British Sign Language and Swedish Sign Language: signed narratives which contain a larger number of prosodic cues (e.g., dropped hands and holds), as well as a change in head positions, facilitate the detection of boundaries for signers and non-signers alike (Fenlon et al., 2007).
One early study considered how linguistic signals correlate with the discourse hierarchy in sign languages. Gee and Shepard-Kegl (1983) found a strong correlation between pause length and the narrative hierarchy in stories signed in ASL: longer pauses appeared at higher-level boundaries, while shorter pauses or holds tended to occur at lower levels (e.g., clause boundaries). The study suggests that, as in spoken languages, prosodic cues are clear signals of discourse hierarchy in signed languages. Yet, the analysis of discourse cues in that study was limited to pauses alone. As shown above, numerous articulators are involved in the production of linguistic signals in signed languages, and one study suggests that some of the bodily articulations explicitly mark discourse structure as a language emerges (Sandler, 2012). With this in mind, the present study aims to rigorously investigate the degree of overlap between discourse hierarchy and particular visual cues in the emergence of a young sign language, ISL.
3.2. Young sign languages: A natural laboratory for investigating language emergence
As explained in the introduction, there are two major advantages to studying sign languages in the context of narrative structure emergence. One is the correspondence between visible physical articulations and linguistic structures in sign languages, and the other is their youth. Due to these unique characteristics, linguists are able to observe the emergence of linguistic structure and complexity (Dachkovsky, Reference Dachkovsky2018; Dachkovsky et al., Reference Dachkovsky, Stamp and Sandler2018; Sandler, Reference Sandler2012). The burgeoning field of young and emerging sign languages has opened an even broader window into understanding how language arises (see Sandler et al., Reference Sandler, Aronoff and Padden2022).
Senghas (Reference Senghas1995) and Senghas et al. (Reference Senghas, Coppola, Newport, Supalla, Hughes, Greenhill and Hughes1997) were pioneers in this field. They claimed that there is evidence of rapid language development and change between cohorts of children in a deaf school in Nicaragua. Members of the first (older) cohort arrived at the school with no language model and with whatever home signing system they had developed with their hearing families. The second (younger) cohort at the school had the advantage of the older cohort as a language model. Among other studies on this language, the researchers examined the emergence of a particular discourse signal in Nicaraguan Sign Language (NSL) – referential shift devices – which are used to shift the perspective of the discourse. The researchers found a higher frequency in the use of spatial devices (e.g., indexical points to space, body shifts and spatially modulated signs), by the second-cohort signers (Kocab et al., Reference Kocab, Pyers and Senghas2015). Similarly, in her study of word order in a recently discovered sign language – Central Taurus Sign Language (CTSL), in Southern Turkey, Ergin (Reference Ergin2017) demonstrated that specified use of body articulators (‘body segmentation’) in signaling reciprocal argument relations in a sentence is more characteristic of the younger signers’ production.
Research on young sign languages in Israel has revealed and emphasized the role of the body in the earliest stages of language emergence, by tracking the gradual recruitment of articulators for signaling particular linguistic structures. This direction began with the study of Al-Sayyid Bedouin Sign Language (ABSL), a language that first emerged in an insular Bedouin village in the Negev desert in Israel about 90 years ago (see Sandler et al., 2005; Aronoff et al., 2008; Sandler et al., 2014 for overviews). Sandler (2012, 2013, 2018) proposed a model showing how the step-by-step recruitment of different articulators – the hands, face, head, torso, and nondominant hand – created an increasingly complex linguistic system over time in ABSL, by comparing a narrative of a first-generation ABSL signer to narratives produced by individuals in later generations of signers.
Specifically, the study found that the older signer used his hands for signs (words), but tended to use the rest of his body in a more holistic and mimetic way (see Footnote 7). In contrast, second-generation signers added head movements to delineate prosodic groupings, such as parenthetical information, while the third-generation signer in the study added torso shifts for different referents and different topics in the discourse, and use of the nondominant hand as a discourse topic marker (Sandler, 2012). This small study was ground-breaking in linking bodily articulations to the emergence of linguistic structure. Furthermore, the data from a third-generation signer indicated that, as the language matured, discourse constituents were explicitly marked, such as different referents across sentences, and topic continuity. Here we pursue these insights rigorously and quantitatively, by adopting a specific model of discourse structure (i.e., RST) as well as minute coding of the narratives of 10 ISL signers in two age groups.
Although ISL is about the same age as ABSL, it was formed under different social circumstances (Meir et al., 2012; Meir & Sandler, 2020). The consolidation of the Israeli deaf community began with the establishment of the first School for the Deaf in 1932 in Jerusalem. Immigrants from all over the world contributed their sign languages or home sign communication systems (see Footnote 8) to the emergence and conventionalization of ISL. As a result, a community-wide sign language evolved, and today, ISL is used in a wide range of educational and social settings, and displays a high degree of structural complexity (Meir & Sandler, 2008).
By comparing different generations of ISL signers, Dachkovsky (2018) demonstrated that the head and face articulations that systematically mark relative clauses prosodically in contemporary ISL (see Fig. 2) emerge gradually. While older signers did not mark the relevant structures systematically, younger signers showed significant regularity in their temporal and intonational marking. Crucially, however, both the older and younger signers in Dachkovsky’s study demonstrated understanding of the function of relative clauses by successfully identifying otherwise identical referents distinguished only by different modifying information in a picture matching task. This implies that conceptual organization of communicative messages precedes linguistic organization, an implication that provides one of the motivations for the present study. Spoken language research indicates that prosodic marking varies at different levels of the discourse hierarchy, the second finding that motivates our investigation of the emergence of discourse hierarchy marking in a young language. The fact that sign languages tend to mark prosodic structure with bodily articulations provides an essential methodological tool.
3.3. Interim summary and predictions
To sum up, the following observations emerge from the work on spoken and signed languages outlined here. First, in both language modalities there is a motivated mapping between conceptual discourse hierarchy and the prominence (strength, size and number) of its overt signals (den Ouden, 2004; Gee & Shepard-Kegl, 1983). The second insight, based on the research of young sign languages, is that, although language can express complex concepts from the outset, the systematic correspondence between concept and linguistic marking takes a few generations to emerge and develop (Dachkovsky, 2018; Dachkovsky et al., 2018; Sandler, 2012). Moreover, bodily signals play an essential part in this process, as the rate and the order of their recruitment reflect the order of emergence of linguistic structure. Taken together, these findings lead to the following hypotheses:
(I) ISL narratives produced by both older and younger signers will be organized hierarchically at the conceptual level.
(II) Levels of the discourse hierarchy in ISL will be marked with distinct linguistic signals only in the younger signers:
(a) The level of the discourse hierarchy will be reflected in the size of the articulator: larger articulators for higher discourse levels and smaller articulators for lower discourse levels.
(b) Higher-level boundaries will be characterized by a greater number of linguistic cues than lower-level boundaries.
4. Methodology
The following section describes the methodology that we applied in order to quantify the conceptual discourse structure and map it to its form, across two age groups, in a more objective and systematic way than in previous studies. To identify conceptual structure in our narratives, we adopted Rhetorical Structure Theory (RST). Raw data sheets and statistical outputs are available on OSF (https://osf.io/dfuwn/).
4.1. Participants: Representatives of the ISL community
The narratives of 10 deaf ISL signers were analyzed in this study (age range: 18–68, mean age: 42 years, 6:9 male: female). We divided participants into two age groups, five younger (18–54) and five older (55+ years). Members of the older group have more varied backgrounds than the younger group. Some older signers were born outside of Israel, immigrating to Israel in their youth, and others arrived a bit later. We did not control for heterogeneity (i.e., variation in terms of age of emigration, education and literacy or country of birth), precisely because it is this sort of variation that characterizes the language of this age group, and which was the model for the younger generations. Younger signers in our dataset are more homogeneous than older signers, as they were all exposed to peer and adult models of ISL from a young age, and attended school in deaf education frameworks. Yet, all signers use ISL as their preferred language. All participants consented to their involvement in this project, and were compensated for their time. Filming took place at the University of Haifa in the Sign Language Research Lab.
4.2. Task procedure
Participants were asked to tell a personal life story to a deaf native ISL research assistant. Their narratives ranged in length from 3 to 40 min. We analyzed either the complete narrative if under 2 min, or we selected a complete story out of a larger narrative, lasting 2–5 min (on average 3 min per narrative). A total of 36 minutes of data was analyzed as part of this study. While spontaneous data like these are less controlled than elicited data, spontaneous data have the advantage of being natural and more ecologically valid than elicited data in terms of narrative structuring and the linguistic marking associated with them.
4.3. Coding and analysis of the conceptual and formational discourse organization in signed narratives
In order to determine whether the conceptual depth of the narratives corresponded to the number and size of bodily cues in signed narratives, we first identified the conceptual hierarchical relations of the stories in the dataset, using RST as an analytic tool (4.3.1). The resulting conceptual hierarchies of the stories served as a basis for determining overt marking of levels (4.3.2). Specifically, each conceptual boundary was coded for the number and types of manual and non-manual cues accompanying it.
4.3.1. Step 1: Segmentation of texts using RST
All 10 narratives were divided into text units based on RST (Mann & Thompson, 1988). As mentioned in Section 2.1, RST characterizes the conceptual structure of a text in terms of labeled relations that hold between parts of that text, resulting in a tree-like structure, where the leaves of the tree are the elementary discourse units (typically propositions). Translations of the signed narratives into written texts as well as glossed versions of the signed narratives were used as the basis for the RST analyses in our study (see Table 1 for an example of a translated text and Appendix A for an example of glossed text from our dataset). Both translations and glossed versions of the signed narratives were produced by skilled sign language interpreters. Whereas glossed versions of the narratives are sign-by-sign annotations of the signed stories, translations represent an interpretation of the overall meaning.
The RST analysis of the 10 texts took place in two stages: the segmentation into elementary discourse units and the assignment of RST relations, in that order. The assignment of RST relations to a text can be done in two ways: locally or holistically (Marcu, 1999; Vis et al., 2010). The local approach is bottom-up, with the annotator building the tree structure in an incremental way. The holistic approach is top-down, with the annotator starting with the text as a whole, segmenting it into smaller units. Although many RST-based studies assign the coherence relations locally (e.g., den Ouden, 2004), others opt for the holistic approach (Vis et al., 2010). In the present case, the special characteristics of the signed narratives prompted our choice of the top-down approach. We found that it was not possible to understand these texts when reading gloss-by-gloss translations (see Appendix 2), since the sentence boundaries cannot be taken for granted in the signed modality. In addition, as explained above, function words, such as conjunctions, prepositions and other types of connectives, are less common in signed languages than in spoken languages, which makes the inference of the coherence relations more challenging at the local level. Therefore, in our study, it was necessary to read the entire glossed and translated texts to understand the signer’s intentions. We then selected the large textual segments holistically and subsequently analyzed these segments locally. Authors Dachkovsky and Stamp independently divided the stories into the largest conceptual units by finding boundaries in the glossed and translated texts, and establishing rhetorical relations holding between them. Fig. 3 illustrates the result of such a procedure implemented on the narrative of a younger signer, along with the text shown in Table 1.
As illustrated at the top of Fig. 3, the text is split into two large related units, connected by the background relation (see Table 1 for the whole text). In the text segment spanning lines 1–15, the signer describes her decision to study to become a veterinarian, and this segment serves as background for the remainder of the story in lines 16–54, in which the signer describes her experience and ultimate decision to cease her studies. Each segment in the background is split into smaller units, each dominated by a larger segment. For example, in the first segment of the background relation (lines 1–15) there are two smaller units related through the circumstance relation (lines 1–6 and 7–15, in Fig. 3). These text spans are decomposed further into smaller text spans, and rhetorical relations between them are labeled, until finally we reach the level of a proposition. Furthermore, the RST analysis ascribes distinct numeric scores to each level in the conceptual discourse hierarchy (on the left side of the figure), reflecting their information value for the text as a whole.
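The way hierarchy levels attach to boundaries in such a tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Span` class and the `boundary_levels` function are hypothetical names, not part of any RST toolkit or of the authors' procedure, and only the two top levels described above (the background and circumstance relations) are encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Span:
    """A text span in an RST-style tree (hypothetical structure, for illustration)."""
    start: int                      # first line of the span
    end: int                        # last line of the span
    relation: Optional[str] = None  # relation holding between the children
    children: List["Span"] = field(default_factory=list)


def boundary_levels(node: Span, level: int = 1) -> List[Tuple[int, int, Optional[str]]]:
    """Return (line after which the boundary falls, level, relation) triples."""
    found = []
    # A boundary falls after every child except the last one.
    for child in node.children[:-1]:
        found.append((child.end, level, node.relation))
    # Boundaries inside the children sit one level further down.
    for child in node.children:
        found.extend(boundary_levels(child, level + 1))
    return found


# Lines 1-15 are background for lines 16-54; lines 1-6 and 7-15 are
# related through the circumstance relation (as described for Fig. 3).
story = Span(1, 54, "background", [
    Span(1, 15, "circumstance", [Span(1, 6), Span(7, 15)]),
    Span(16, 54),
])

levels = boundary_levels(story)
for line, level, relation in levels:
    print(f"boundary after line {line}: level {level} ({relation})")
```

In this sketch the boundary after line 15 is assigned level 1 and the boundary after line 6 is assigned level 2, matching the description of the circumstance boundary as a second-level boundary.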
Inter-coder reliability was checked between the two coders using kappa statistics. The method measures pairwise agreement among a set of coders who make category judgments, correcting for agreement expected by chance. A kappa value greater than 0.8 reflects very high agreement and values between 0.6 and 0.8 reflect good agreement. More specifically, we checked reliability of the number of boundaries identified at each level of the hierarchy. The kappa value for this study was 0.88. Any disagreements were resolved after discussion. For coding of the types of cues at each boundary, two of the authors are Facial Action Coding System (FACS; Ekman & Friesen, 2002) certified and 10% of the data was further checked by a native ISL signer for consistency.
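As a rough illustration of the chance correction involved, the two-coder (Cohen's) form of kappa can be computed as follows. This is a minimal sketch, not the authors' procedure: the function name `cohen_kappa` and the toy judgments are hypothetical, and the exact unit of comparison used in the study is not specified beyond what is stated above.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders judging the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must judge the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single, identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments: "b" = boundary, "n" = no boundary (invented data).
coder_1 = ["b", "b", "n", "n", "b", "n", "n", "n", "b", "n"]
coder_2 = ["b", "b", "n", "n", "n", "n", "n", "n", "b", "n"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

With these invented judgments the coders agree on 9 of 10 items, but kappa is lower than 0.9 because part of that agreement is expected by chance.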
To simplify the following discussion and analysis, we merged the range of levels into three broader categories. Levels 1 and 2 were combined as the higher levels, 3–5 as the intermediate levels and 6 and 7 as the lower levels. Three categories were chosen (i.e., high, intermediate, low) in order to test whether there is a linear decrease in articulations down the hierarchy.
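The merging of levels described above amounts to a simple lookup, sketched here in Python. The function name `level_category` is a hypothetical label for illustration.

```python
def level_category(level: int) -> str:
    """Map an RST hierarchy level (1-7) to the broader category used in the analysis."""
    if level in (1, 2):
        return "high"
    if 3 <= level <= 5:
        return "intermediate"
    if level in (6, 7):
        return "low"
    raise ValueError(f"level outside the 1-7 range: {level}")


categories = [level_category(level) for level in range(1, 8)]
print(categories)
```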
4.3.2. Step 2: Coding and analysis of boundary cues
Once the RST analysis was completed, we turned to an analysis of articulations that corresponded to the conceptual discourse boundaries. As explained in Section 2.1, overlap between the conceptual hierarchy and the linguistic hierarchy is a research question in this study; therefore, it was important to analyze the two independently, to avoid circularity. In coding for visual cues, we examined each boundary, both in terms of changes that take place across the boundary (e.g., changes in facial expressions, head or torso movements) and cues that occur at the boundary (e.g., head nod, pause, blink), following the coding system used in previous studies (Dachkovsky et al., 2018). The coded cues comprised five different cue categories (see Table 2 for an exhaustive list of cue types): three types of bodily articulators (the torso, head or face), as well as timing cues (e.g., slower signs, pauses in signing, including drops of the hands or a hand clasp) and lexical cues (discourse markers which express an interpretation of propositions, such as ‘then’, or ‘finish’). A common occurrence was the production of multiple cues by the same articulator; for example, the head can be tilted to the side and thrust forward, and part of our analysis was to examine whether multiple articulations of the same articulator, and not only multiple articulators, also correlate with the level of the hierarchy. For the head and torso, we coded based on four different and independent movement types: turn, tilt, forward/back and up/down. For the face, we coded for eye and eyebrow movement (see Table 2). In sign languages, these articulators and articulations are independent of one another and can co-occur in different ways.
In previous studies, our team showed that individual articulators and individual movements (e.g., forward and back) can simultaneously signal separate functions in sign languages (Dachkovsky, 2018; Dachkovsky et al., 2018). We coded for explicit movements of the body and not for resets; for example, we coded an eyebrow raise if present at the boundary but not if the eyebrows lowered in order to return to neutral position. However, if the eyebrows lowered from neutral, this was coded. All coding was conducted using the video annotation software ELAN (Crasborn & Sloetjes, 2008). The raw data files and the statistical outputs are available on OSF (https://osf.io/dfuwn/); ELAN files are available upon request from the authors. In our analyses, we included all types of cues when we looked at the number of cues at each level; however, in order to examine how the size of the articulator is associated with the level of the discourse hierarchy, we only included face, head and torso cues (articulators representing different sizes).
In order to clarify our method of coding and analysis of formational cues, Fig. 4 presents an excerpt of our analysis, focusing on one discourse boundary – the boundary of circumstance (see the discussion of the circumstance relation in Section 5.2) at the second level of the discourse hierarchy. The second-level boundary between the two circumstance segments (lines 1–6 and 7–15) involves a visible change in the movements of the head (i.e., head thrust together with head turned left) and the torso (i.e., torso tilt right). In the next stage of the analysis, in order to examine the emergence of the mapping between the conceptual discourse hierarchy and its articulatory cues, we compared narratives produced by older and younger ISL signers – a comparison motivated by Labov’s Apparent Time Construct (Labov, 1963; Sankoff, 2006).
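The coding scheme of Section 4.3.2 can be pictured as a small data structure in which each boundary carries a list of cues, each belonging to one of the five categories. The following is a hedged sketch only: the names `Cue`, `count_cues` and `SIZE_ARTICULATORS` are hypothetical and do not reflect the ELAN tier structure actually used. The example boundary encodes the three cues described for the second-level circumstance boundary.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Articulators ordered by size; only these enter the articulator-size analysis.
SIZE_ARTICULATORS = ("torso", "head", "face")
ALL_CATEGORIES = SIZE_ARTICULATORS + ("timing", "lexical")


@dataclass(frozen=True)
class Cue:
    category: str  # one of ALL_CATEGORIES
    label: str     # free-text description, e.g. "tilt right"

    def __post_init__(self):
        if self.category not in ALL_CATEGORIES:
            raise ValueError(f"unknown cue category: {self.category}")


def count_cues(cues: Iterable[Cue], size_only: bool = False) -> Counter:
    """Count cues per category at one boundary.

    With size_only=True, timing and lexical cues are dropped, mirroring the
    restriction to face, head and torso cues in the articulator-size analysis.
    """
    counts = Counter(cue.category for cue in cues)
    if size_only:
        counts = Counter({c: n for c, n in counts.items() if c in SIZE_ARTICULATORS})
    return counts


# Cues described for the boundary between lines 1-6 and 7-15.
circumstance_boundary = [
    Cue("head", "thrust"),
    Cue("head", "turn left"),
    Cue("torso", "tilt right"),
]
counts = count_cues(circumstance_boundary)
total = sum(counts.values())
print(counts, total)
```

Counting per category rather than per articulator occurrence keeps multiple articulations of the same articulator (here, two head movements) visible in the totals.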
5. Results
Section 5.1 first compares the conceptual hierarchies of narratives produced by the two ISL age groups. Evidence for consistent and distinct linguistic marking of conceptual levels consists of the number of cues (Section 5.2) and types of cues (Section 5.3) associated with discourse boundaries.
5.1. Depth of conceptual hierarchy across age groups
Crucially, our results show no difference across the age groups in terms of the depth of the conceptual structure – the older and younger signers show equal depth of discourse hierarchy levels (see Fig. 5). All 10 narratives were checked for the degree of conceptual depth by segmenting the texts to the lowest point. All 10 narratives, from both older and younger signers, reached the same level (level 7) in terms of conceptual depth.
5.2. Number of cues correlated with discourse boundary level
We used a linear mixed model analysis to examine the relationship between the number of cues and two independent variables: age group and level (i.e., high, intermediate, low). Participant was included as a random intercept because cue counts from the same participant are not independent of one another. A generalized linear mixed model program within Statistical Analysis System software version 9.4 was used to perform the statistical analysis.
In the first analysis, all 10 participants were included. The results are shown in Tables 3 and 4. Because the total number of cues was not normally distributed, a log transformation was applied to it. Age was not found to be significant. The interaction between level and age was tested but was not significant. Level was found to be significant (F(2,16) = 28.31, p < 0.0001). A significant difference in the number of cues was found between the high and low levels (t(16) = 4.58, p = 0.0003), and between the high and middle levels (t(16) = 3.24, p = 0.0051). The specific types of such cues that cluster at each level will therefore be crucial to our hypotheses and analysis. Figure 6 shows the interaction between level and age for both age groups.
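As a reader's consistency check (not a reproduction of the SAS analysis), the reported p-value for the level effect can be recovered from the F statistic alone. When the numerator has 2 degrees of freedom, the upper-tail probability of the F distribution has the closed form p = (1 + 2F/df2)^(-df2/2), so no statistics library is needed. The function name below is hypothetical.

```python
def f_upper_tail_df1_2(f_stat: float, df2: float) -> float:
    """Upper-tail p-value of an F statistic with 2 numerator degrees of freedom.

    Uses the closed form P(F > f) = (1 + 2f/df2) ** (-df2/2), which holds
    only when the numerator degrees of freedom equal 2.
    """
    if f_stat < 0 or df2 <= 0:
        raise ValueError("f_stat must be non-negative and df2 positive")
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# Level effect reported above: F(2,16) = 28.31.
p_level = f_upper_tail_df1_2(28.31, 16)
print(f"{p_level:.2e}")
```

The resulting value lies well below 0.0001, in line with the reported threshold.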
5.3. Types of cues correlated with discourse boundary level
Although the interaction between age and level was not significant for number of signals, we hypothesized that the trends in the distribution would differ once the type of cue was taken into account. Therefore, we ran two separate analyses, one for each age group, to examine the relationship between the type of cues, the level and the number of boundary cues. We included level and cue type as independent factors. We selected three categories of cues: torso (largest cue type), head (middle), and face (smallest). A log-transformed value for the number of cues plus one was used as the dependent variable in the model.
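The "plus one" in the transformation matters because a boundary may carry no cue of a given type, and the logarithm of zero is undefined. A minimal sketch, assuming a natural logarithm (the base is not stated in the text) and using invented counts:

```python
import math


def log_plus_one(counts):
    """Return log(n + 1) for each cue count; a count of 0 maps to 0.0."""
    return [math.log1p(n) for n in counts]


# Hypothetical torso-cue counts at five boundaries (invented for illustration).
torso_counts = [0, 1, 0, 3, 2]
transformed = log_plus_one(torso_counts)
print([round(value, 3) for value in transformed])
```

The transformation preserves the ordering of the counts while compressing larger values, which is the usual motivation for applying it to skewed count data.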
The results are shown in Tables 5 and 6 below. For older signers, there was a significant effect of level (F(2,536) = 4.86, p = 0.008) and cue type (F(2,536) = 17.03, p < 0.001). No significant interaction was found between cue type and level (F(4,536) = 0.31, p = 0.87). Multiple comparisons for the level effect using Tukey–Kramer adjustments show that there is a significant difference in the number of cues between the low and high levels (T(536) = −2.42, adjusted-p = 0.042) and between the middle and high levels (T(536) = −3.12, adjusted-p = 0.006). No significant difference was found between the low and middle levels. The significant effect of cue type revealed that there was a significant difference between the number of torso and head cues (T(536) = 4.34, adjusted-p < 0.001) and between the number of torso and face cues (T(536) = 5.55, adjusted-p < 0.001). No significant difference was found between the number of head and face cues.
For younger signers, the picture was very different. We found a significant effect of level (F(2,599) = 21.3, p < 0.001), cue type (F(2,599) = 59.3, p < 0.001) and an interaction between level and cue type (F(4,599) = 2.54, p = 0.039). Multiple comparisons for the level effect using Tukey–Kramer adjustments show that there is a significant difference in the number of cues between the low and high levels (T(599) = −6.45, adjusted-p < 0.001), the low and middle levels (T(599) = −3.23, adjusted-p = 0.004), and the middle and high levels (T(599) = −4.81, adjusted-p < 0.001). The number of head cues was significantly different from the number of torso cues (T(599) = 10.06, adjusted-p < 0.001), and the number of torso cues was significantly different from the number of face cues (T(599) = 8.64, adjusted-p < 0.001). No significant difference was found between the number of head cues and the number of face cues. Figure 7 shows the interaction between level and cue type in both age groups.
We found a significant interaction between level and cue type for the young group only, in that:
• For head cues, there are significant differences between high level and low level (T(599) = 3.61, adjusted-p = 0.001) and between high level and middle level (T(599) = 3.22, adjusted-p = 0.004). No significant difference was found for head cues between low and middle.
• For torso cues, there are significant differences between high level and low level (T(599) = 5.27, adjusted-p < 0.001) and between high level and middle level (T(599) = 4.57, adjusted-p < 0.001). No significant difference was found between low and middle for torso cues.
• For face cues, there are significant differences between low level and middle level (T(599) = 3.33, adjusted-p = 0.003) and between low level and high level (T(599) = 2.44, adjusted-p = 0.040). No significant difference was found between middle and high levels for face cues.
In conclusion, the interaction shows that for head and torso cues, high level differs from low-mid levels, while for face cues, low level differs from mid-high levels.
6. Discussion
Segmenting narratives according to the RST model revealed that the same depth in the conceptual hierarchy (i.e., similar number of levels) underlies the narratives produced by both age groups. That is, from the earliest stages in language emergence (i.e., older signers), signers conceive of levels of complex hierarchical relations in their narratives. Recall that higher levels represent more central material, while lower levels in some way enhance or modify the higher levels. We found that the distribution of distinct bodily articulations distinguished different levels in the hierarchy in the younger group, but not in the older group.
6.1. Hypothesis 1: Conceptual depth of the discourse structure
As shown in Section 5, we found no difference in the depth of conceptual relations across the age groups; all participants, young and old, produced narratives with equally deep conceptual structures. This implies that contemporary humans communicate measurably complex narratives, even before the linguistic apparatus for distinguishing complex hierarchies overtly kicks in.
This finding is compatible with other studies of the same population at the sentence level. In a study that examined the number of linguistically marked relations within sentences across age groups in ISL (described in Section 2; Dachkovsky et al., 2018), the authors found fewer complex relations explicitly signaled by older signers than by younger signers. At the same time, there was no difference in the number of intonational breaks between phrases across age groups.
Chafe (1984) and Du Bois (1985) propose that intonational phrases correspond to thought units. This suggests that the equal numbers of intonational phrases in the two age groups reflect an equal number of thought units in the sentences of older and younger signers. But Dachkovsky et al. (2018) demonstrated that only the younger signers have developed linguistic devices to express the relations among the intonational units within sentences. That is, both groups separated phrases with timing and bodily signals (intonation breaks), but the nature of the relations between the phrases was marked only by the younger group.
In the current study, we pursue this insight at a higher level of structure – the level of discourse – with a tool that defines levels of the discourse hierarchy and relations between them: Rhetorical Structure Theory (RST). We will show that older signers have the same depth of conceptual structure in discourse as younger signers, but that only the younger signers have the linguistic means for distinguishing the relations between the levels of the hierarchy.
6.2. Hypothesis 2: The articulatory strength is in direct relationship with boundary level
We hypothesized that the strength of the boundary marking will correlate with the level of the discourse boundary. In our interpretation of sign language, this means that the boundaries of higher levels of the hierarchy will be signaled with a larger articulator (Hypothesis 2A) and more articulations (Hypothesis 2B). The analysis of the data supports Hypothesis 2A for younger ISL signers only. Hypothesis 2B is supported for both age groups.
6.2.1. Hypothesis 2A: The size of articulators is correlated with boundary level
Torso movements are common to older and younger signers, but are restricted mainly to higher-level boundaries only for younger signers. We can illustrate this finding with two examples, taken from the narratives of one older and one younger signer. The younger signer’s example, provided in Fig. 8, demonstrates the difference between a higher- and a lower-level boundary. A higher-level boundary divides the two segments of a circumstance relation, where the younger signer specifies the circumstances which preceded her choice of extracurricular activity. This boundary is signaled with multiple cues, including a clear torso movement forward. In the same figure, we see a lower-level boundary between the two segments conjoined by the cause relation, in which the signer explains the reasons for choosing the veterinary course. The lower-level boundary is manifested by a change only in head and facial movements, without any torso movement. Thus, for this younger signer, torso movements are indeed the most reliable indicator of the difference between discourse levels.
Unlike younger signers, older signers use the torso indiscriminately. This is illustrated in Fig. 9, extracted from a narrative in which an older signer describes the time when he and his brother were at an immigration office during their immigration to Israel, which he undertook as a child. At the highest level, the older signer produces a strong torso movement forward, dividing the narrative into its two largest segments, connected by the summary relation (lines 1–49, 50–56). The boundary is also signaled by a number of head movements as well as the lexical cue meaning ‘that’s it’. In the same figure, however, torso movement is also seen at the lower-level boundary of a relation that connects a sequence of three events. In fact, in this example, various torso and head movements accompany each individual sign in the figure and depict the chronological order of the events – of an official turning to them and calling the signer’s name, then the official asking about his deafness, and the signer sitting next to his brother. In this example, the torso in the older signers’ narratives is activated for mimetic depiction of events in chronological order, and not for signaling hierarchical organization. It has been found in many studies that younger signers use their bodies less mimetically than older signers (Kegl et al., 1999; Sandler et al., 2011; Stamp & Sandler, 2021).
Thus, these findings are compatible with Sandler’s (2012) study of ABSL (as discussed in Section 3.2), in which the torso is recruited for linguistic purposes (e.g., discourse-level functions) only later in language emergence (i.e., in the younger signers’ production). At the lower level in Fig. 9, the signer mimetically positions his body to represent the actual positions of the two referents in the story. These findings also tie in with the ABSL study and other studies which suggest that the body is utilized for pantomimic functions in early stages of language emergence (Kegl et al., 1999; Sandler, 2012, 2013; Sandler et al., 2011). Furthermore, our result supports the claim from spoken language research that there is a motivated relationship between the strength of boundary signals (prosodic and non-linguistic cues) and the levels of discourse structure, as described in Section 2.2. In younger signers’ narratives, the size of the articulator reflects the level of the boundary; larger articulators (i.e., torso) are activated for signaling higher levels of discourse structure, and smaller articulators (activation of facial muscles) occur at prosodic boundaries within text segments. The older signers use the largest articulator, the torso, at all levels of the hierarchy.
6.2.2. Hypothesis 2B: Cumulative cues reflect the level of discourse boundaries
Following spoken and signed language studies noted in Sections 2 and 3, we predicted that higher-level boundaries will be accompanied by a larger number of cues, and lower-level boundaries will be accompanied by fewer cues. The ‘cumulative cues’ hypothesis was supported by our data, this time for both age groups; that is, it is not an emergent property. The narratives produced by both older and younger signers display a gradual and systematic decrease in the number of cues as the narrative is segmented into smaller and smaller units (Fig. 6). This means that the differences between discourse levels are predictable and systematic from the very beginning in terms of number of cues.
7. Conclusions and directions for future research
There are two essential findings of this study. One is that both older and younger ISL signers produced narratives characterized by hierarchical organization of equal conceptual depth, that is, equal complexity at the conceptual level. The second finding is that only the younger signers systematically mark boundaries by different articulators, according to the level of the discourse hierarchy. Previous research established the linguistic nature of these markers (see Section 3). Our results track the emergence of these markers in relation to the conceptual organization of narratives. We show that hierarchically structured discourse can reflect conceptual organization semantically and pragmatically without overt linguistic organization, and that linguistic organization emerges gradually over time. In sign languages, linguistic organization is signaled by systematic recruitment of articulations of different parts of the body. In other words, the conceptual hierarchy inherent in discourse precedes the systematic linguistic organization of signals, and this phenomenon is seen clearly in visual languages.
As the language matures and conventionalizes, narratives of successive age groups start to display an increasingly systematic and motivated mapping between hierarchical discourse structure and its overt bodily marking. While the number of bodily cues increases with the discourse level for both age groups of ISL signers, the size of the articulator correlates with the discourse level only for younger signers. In other words, only younger participants imbue hierarchical relations across different discourse levels not only with a distinct number of signals but also with systematic patterns of distinct cues – a visible representation of this aspect of language emergence.
The changes we explored here do not represent a mere increase or decrease in the use of bodily signals for discourse purposes, but rather a change in the nature of the mappings between conceptual discourse organization and its overt linguistic expression. As we mentioned in Section 2.1, the conceptual organization of narratives involves an intricate interaction between a chronological representation of a story line and a hierarchy of coherence relations. While older signers’ narratives tend to accurately depict the chronology of events in a narrative, by ordering the events and occasionally introducing lexical discourse markers, the later stages of language emergence, seen in the younger signers’ mapping of linguistic cues, produce a finer, more precise representation of relations in a discourse hierarchy as well.
This study has implications not only for sign language but for language generally. We have provided empirical evidence from a natural human language that it is possible to conceive of complex, hierarchically structured information before systematic linguistic marking arises. This does not mean that systematic linguistic structuring does not aid communication – it certainly does, as implied by the more nuanced nature of the narratives of younger signers, exemplified above. There can be little doubt that the diachronic change implied by the present study enhances communicative efficiency in the language. However, this study shows that it is not necessary to wait for complex linguistic structuring to develop in order to conceive and convey complex messages. Future perception studies on the association between conceptual discourse levels and linguistic organization will shed additional light on the mapping between these two kinds of structuring, as well as on the relation between the intelligibility of the message and systematic overt linguistic form.
Supplementary Materials
To view supplementary material for this article, please visit http://doi.org/10.1017/langcog.2022.25.
Competing Interests
The authors declare none.
Data Availability Statement
The raw data files and the statistical outputs are available on OSF (https://osf.io/dfuwn/). The ELAN files are available upon direct request to the authors.
Appendix A. Example of glossed text (from a younger signer)