1. INTRODUCTION
The goal of this article is to provide a coherent picture of Swedish prosody from the prosodic word level to the intonation phrase level. We thoroughly review the prosodic phenomena that define the constituents of the prosodic hierarchy in Swedish. Much of the data below has been presented in previous work (Myrberg 2010, 2013a, 2015a, b; Riad 2012, 2014; Myrberg & Riad 2013, forthcoming), but not as a single, coherent and empirically oriented picture, which we provide here.
We base our discussion on assumptions that are generally accepted among researchers working on the prosodic hierarchy, so that our model can be translated relatively easily into theoretical approaches other than the one we apply here. Nevertheless, we take a number of theoretical assumptions as a point of departure for our discussion.
First, our model of the prosodic hierarchy of Swedish (starting from the prosodic word and moving upward) contains three categories: the prosodic word (ω), the prosodic phrase (φ),Footnote 1 and the intonation phrase (ι).Footnote 2 Each of these prosodic categories has a basic correspondence to some morphosyntactic category: the ω is assumed to correspond to morphosyntactic words, the φ to syntactic phrases, and the ι to clauses (Selkirk 2009, 2011; Itô & Mester 2012). The notions ‘word’, ‘phrase’ and ‘clause’ need proper definitions within syntactic theory, an undertaking that has proven non-trivial and cannot be reviewed here (but see e.g. Selkirk 2009, 2011; Myrberg 2013a; Hamlaoui & Szendrői 2015).
Secondly, prosodic categories sometimes fail to align perfectly with morphosyntactic categories, and alignment between morphosyntax and prosody is subject to substantial optionality and variation. Imperfect alignment between syntax and prosody is well attested for a vast number of languages (see e.g. Selkirk 1984, 1986; Nespor & Vogel 1986; Ladd 1986 for early discussion), as are several sources of such variation. For instance, prosodic structure tends to be flatter than syntactic structure (e.g. Chomsky & Halle 1968; Selkirk 1984, 1996; Nespor & Vogel 1986), and prosody may add signals of information structure influenced by discourse (e.g. Jackendoff 1972; Selkirk 1984, 1995; Truckenbrodt 1995). There is also variation in alignment between prosody and syntax related to speech rate, speech style and constituent length (e.g. Fougeron & Jun 1998; Jun 1998, 2003; Frota & Vigário 2007). We shall tacitly assume that the basic alignment between syntax and prosody, as well as the deviations from it, are the result of constraint interactions of the type assumed within Optimality Theory (OT; McCarthy & Prince 1993, Prince & Smolensky 1993) to account for the syntax–prosody interface (among many others Selkirk 1996, 2000, 2011; Truckenbrodt 1999; Féry & Samek-Lodovici 2006; Itô & Mester 2012).
For details of how these constraint interactions apply to Swedish data, we refer the interested reader to our previous work, primarily Riad (1998), Myrberg (2010, 2013a), and Myrberg & Riad (forthcoming).
For Swedish, we find misalignment between morphosyntax and prosody at all levels of structure, providing evidence that prosodic constituents exist independently of syntactic constituents (e.g. Nespor & Vogel 1986). Misalignment and optionality are important sources of information in the investigation of individual prosodic categories.
2. TONAL PROMINENCE IN SWEDISH
Before discussing the individual prosodic categories, we review the four most basic tonal contours in Swedish phonology, as exemplified for Stockholm Swedish in the first two columns of (1), set off in the table by the heavy surrounding line.
(1)
The separation of tonal events into these four basic contours is widely accepted in the research tradition on Swedish prosody, and is originally due to Bruce (1977, 1998). The table in (1) illustrates the shape of the four basic contours in Stockholm Swedish, the best-studied dialect of Swedish. However, the distinction extends more generally to Swedish as well as Norwegian dialects (e.g. Bruce 2007; Riad, forthcoming). It serves as a basis for our observations at all prosodic levels, ω, φ and ι, and is therefore best described independently of the discussion of any individual category.
Two properties occasion the four distinct contours. First, there is lexical tone in Swedish, yielding a binary surface melodic distinction between tone accent 2 (lexical tone + intonation) and tone accent 1 (intonation only), represented in the first two columns of (1). The lexical tone that yields accent 2 in a prosodic domain resides in many suffixes and some roots (Riad 2009, 2014; discussion in Section 3.2 below). The third column of the table in (1) describes the accent 2 contour when it appears in a word with several stressed syllables (accent 1 is not applied to such words in Stockholm Swedish). This contour is melodically identical to the accent 2 contour, but it is not triggered by a lexically specified tone; instead, it is assigned postlexically (Section 3.2).
Secondly, there is a separation of two levels of intonational prominence, expressed in both accent 1 and accent 2 (Bruce 1977). This is represented in the two rows of (1). We shall use a new term pair for the accent types belonging to the respective prominence categories, namely big accents and small accents. Big accents are bigger than small accents both in terms of scaling (they have larger fundamental frequency (f0) excursions) and in terms of being (in general) perceptually more prominent than small accents.
In much previous research, these two intonational categories have been referred to as sentence accent or focal accent and word accent or just accent (e.g. Bruce 1977, 1998, 2005, 2007; Heldner 2001; Hansson 2003; Ambrazaitis 2009; Roll, Horne & Lindgren 2010; Myrberg 2013a; Myrberg & Riad, forthcoming). These terms are problematic, as they imply a strong correlation between the form of the accents and their function. Since the introduction of these terms, we have learned that the functions of the accents are quite varied. Many big accents appear on material that is not focused, and not all words have their own small accent. This leads to confusing formulations (e.g. ‘focal accents that appear on given material’, ‘word accents that cue phrasal structures’).
The intention behind the terms big and small accent is to avoid this form/function problem. The terms should be flexible enough to be useful for the broad research community. To the best of our knowledge, big and small accent are not used to refer to any similar distinction elsewhere in prosodic theory, so confusion with empirical distinctions in other languages (e.g. minor/major accent) should be avoided. They are also relatively theory neutral (as opposed to e.g. focal/focus accent, phrase accent, head accent). Finally, big and small are not tied to the melodic shape of the accents (e.g. HL-accent or HLH-accent), which allows us to use these terms for any individual Swedish or Norwegian dialect, or for dialect typology, even though the shape of the basic contours is highly variable in the Scandinavian area.
In the following sections, subtypes of big and small accents will be discussed. These subtypes relate both to separate functions of the accents, and to their phonological and phonetic realization. Below, we discuss the category ω in Section 3, φ in Section 4, and ι in Section 5. We conclude the paper in Section 6.
3. THE PROSODIC WORD (ω)
The prosodic word clearly exhibits two levels in Swedish. The distinction is apparent when comparing forms containing one stress (simplex words and some derivations) with forms containing multiple stresses (compounds and some other derivations). We will analyze both these levels as aspects of the prosodic word, and will refer to them as the minimal prosodic word (ωmin) and the maximal prosodic word (ωmax), respectively. Both levels have one phonological prominence as their defining property (so-called culminativity). For the minimal ω, the defining characteristic is the presence of stress. The properties of the minimal ω can further be supported by rules and processes related to syllabification, distribution of /h/ and aspiration, truncation in coordinations, and the lexical assignment of tone (Sections 3.1–3.2). For the maximal ω, the defining characteristic is the presence of one tone accent (big or small).
In simplex words, as in (2), there is no distinction between the minimal and maximal ω. Simplex forms thus receive one stress and one accent. When the minimal and maximal ω are not distinct, we indicate this with ‘ωmin=max’.
(2)
The pattern in (2) differs from that of several other Germanic languages (Kager 1989, Hammond 1999, Kaltenbacher 1999, Zonneveld et al. 1999:503ff.) in that there is very little evidence of stable secondary stress within the ω in Swedish. Swedish forms are compared with cognates in American English in (3) and German in (4).Footnote 3
(3)
(4)
The contrast between minimal and maximal ω shows up in forms with more than one stress, and in forms with the unstressed prefixes för- and be-, as illustrated in (5).
(5)
The structure in (5a) is a compound consisting of three minimal ωs organized into one maximal ω. The examples in (5b–c) include forms with a stressed suffix and a stressed prefix, which get the same ω structure as compounds. Finally, (5d) exemplifies prefixation with för-, a pretonic prefix that adjoins prosodically to an adjacent minimal ω. Adjunction also results in a difference between minimal and maximal ω. We return to the accentual properties of these forms below.
The baseline in North Germanic tonal varieties is for most content words, simplex or compound, to form a maximal ω, and thereby to constitute a tonal domain. The general impression is that accents are more frequent in e.g. normal spoken Swedish than in corresponding West Germanic varieties, like English.
The notions of minimal and maximal ω come from the general model of the prosodic hierarchy given in Itô & Mester (2007, 2012). Itô & Mester propose a small set of prosodic categories (ω, φ, ι), which admits recursion and which applies universally. Phonological rules can refer separately to either the minimal (lowest) or the maximal (highest) projection of a category, or to all projections of a category simultaneously. For Swedish, the distinction between minimal and maximal is particularly useful for our analysis of the ω.
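The minimal/maximal distinction can be made concrete as a recursive tree. The Python sketch below is our own illustration, not part of Itô & Mester's formalism as such: it assumes a hypothetical `Omega` class whose children are either strings (material not parsed into a ω of its own, such as an adjoined prefix) or further ω nodes, and it labels a ω as minimal if it dominates no other ω and as maximal if no ω dominates it. The words are orthographic approximations chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Omega:
    """Hypothetical prosodic word node: children are strings or nested Omega nodes."""
    children: list[str | Omega] = field(default_factory=list)

    def is_minimal(self) -> bool:
        # Minimal projection: the node dominates no other ω.
        return not any(isinstance(child, Omega) for child in self.children)


def surface(node: str | Omega) -> str:
    """Concatenate the orthographic material under a node."""
    if isinstance(node, str):
        return node
    return "".join(surface(child) for child in node.children)


def classify(node: Omega, dominated_by_omega: bool = False) -> list[tuple[str, str]]:
    """Label every ω in the tree as ωmin, ωmax, ωmin=max, or plain ω (neither)."""
    is_min = node.is_minimal()
    is_max = not dominated_by_omega  # maximal projection: no ω dominates the node
    if is_min and is_max:
        label = "ωmin=max"
    elif is_min:
        label = "ωmin"
    elif is_max:
        label = "ωmax"
    else:
        label = "ω"
    labels = [(surface(node), label)]
    for child in node.children:
        if isinstance(child, Omega):
            labels.extend(classify(child, dominated_by_omega=True))
    return labels


# Simplex form, prefix adjunction (cf. (5d)) and a two-member compound.
print(classify(Omega(["tala"])))
print(classify(Omega(["för", Omega(["tala"])])))
print(classify(Omega([Omega(["bil"]), Omega(["ägare"])])))
```

In the adjoined structure för-tala, the outer node is a maximal ω that is not minimal, which is exactly the configuration discussed for prefixation below.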
Previous models of North Germanic intonation have included separate categories headed by an accent at the word level, like our maximal ω, e.g. the Tonal Foot (Fretheim & Nilsen 1989), the Accentual Phrase (Kristoffersen 2000, Abrahamsen 2003), and the Prosodic Word (Bruce 1998, Hansson 2003). However, these prosodic categories lack mechanisms for aligning prosody with morphosyntactic categories. From a typological perspective, Vigário (2010) proposes a category Prosodic Word Group (PWG) between the prosodic word and the prosodic phrase, and Vogel (2010) proposes the Composite Group (CompG).
3.1 Processes within the minimal ω
The arguments that provide evidence for the separation of the minimal ω from the maximal ω pertain to syllabification, distribution of /h/ and aspiration, and coordinatory truncation (Myrberg & Riad 2013, Riad 2014).
3.1.1 Syllabification
Like most Germanic languages, Swedish has the minimal ω as its domain of syllabification (e.g. Wiese 1996, Booij 1999). To show this, we may challenge the Onset principle across different types of morpheme boundaries and see whether or not a consonant is syllabified with a following vowel-initial morpheme. Most simplex forms (monomorphemic and inflected words) will constitute a single ω (where ωmin = ωmax), whereas true compounds will consist of more than one minimal ω, as illustrated in (6).
(6)
The Onset principle is clearly thwarted across a ω-boundary (6d), which shows that syllabification does not extend beyond the minimal ω. This intuition remains clear when stresses are further apart, as in (6e).
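The claim that syllabification is bounded by the minimal ω can be illustrated with a deliberately simplified syllabifier. The sketch below rests on assumptions of our own: orthographic vowels stand in for nuclei, only a single consonant is placed in an onset, and the input is already parsed into minimal ωs. The words bilen ‘the car’ (one ω) and bilägare ‘car owner’ (a compound of two ωs) were chosen by us for illustration.

```python
from __future__ import annotations

# Orthographic vowels as a stand-in for syllable nuclei (a simplification).
VOWELS = set("aeiouyåäö")


def syllabify_omega(segments: str) -> list[str]:
    """Syllabify one minimal ω, giving each non-initial nucleus a one-consonant onset."""
    nuclei = [i for i, ch in enumerate(segments) if ch in VOWELS]
    if len(nuclei) < 2:
        return [segments]
    starts = [0]
    for prev, nuc in zip(nuclei, nuclei[1:]):
        # Onset principle (simplified): the consonant right before a nucleus is its onset;
        # in hiatus (no consonant in between) the boundary falls directly before the vowel.
        starts.append(nuc - 1 if nuc - 1 > prev else nuc)
    starts.append(len(segments))
    return [segments[a:b] for a, b in zip(starts, starts[1:])]


def syllabify_word(minimal_omegas: list[str]) -> list[str]:
    """Syllabify each minimal ω separately, so no consonant crosses a ω-boundary."""
    syllables: list[str] = []
    for omega in minimal_omegas:
        syllables.extend(syllabify_omega(omega))
    return syllables


print(syllabify_word(["bilen"]))         # one ω: the /l/ becomes an onset
print(syllabify_word(["bil", "ägare"]))  # two ωs: the /l/ stays in the coda of bil
```

Feeding bilägare in as a single ω would instead yield bi.lä.ga.re; in this sketch, only the ω-boundary prevents the consonant from becoming an onset.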
3.1.2 Distribution of /h/ and aspiration
The main segmental indicator of the minimal ω is the distribution of aspiration, both in voiceless stops and in the phoneme /h/. Aspiration is prominent in the initial position of a ω and in the initial position of a stressed syllable, as illustrated with /t/ in (7).
(7)
In (7a) /t/ is initial in the ω and in a stressed syllable. In (7b) /t/ is in a stressed but non-initial syllable, and in (7c) it is initial in the ω, but in an unstressed syllable. In these situations, /t/ is aspirated. In (7d) /t/ is in the onset of a non-initial, unstressed syllable, and in (7e) it is final in the ω. In these situations, /t/ is unaspirated.
The same distribution holds for the realizations of the phoneme /h/, whose presence may therefore also serve as an indicator of stress (feet) and of left edges of ωs:
(8)
The phoneme /h/ is pronounced [h] in any stressed syllable, ω-initial or not, as in (8a–b), and in ω-initial unstressed syllables, as in (8c). However, [h] only optionally surfaces between two unstressed vowels, as seen in (8d), where it may be voiced, hence [ɦ] (Garlén 1984:39; Engstrand 2004:168).
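The distributions in (7) and (8) share a single condition, ‘ω-initial or stressed’, which the Python sketch below states as two predicates. The function names and the boolean encoding of position are our own assumptions, and the sketch ignores phonetic detail such as the degree of aspiration.

```python
from __future__ import annotations


def stop_is_aspirated(in_onset: bool, omega_initial: bool, stressed: bool) -> bool:
    """Voiceless stop: aspirated in the onset of a ω-initial or a stressed syllable, cf. (7)."""
    if not in_onset:
        return False  # e.g. ω-final position, cf. (7e)
    return omega_initial or stressed


def realize_h(omega_initial: bool, stressed: bool) -> set[str]:
    """Possible realizations of /h/ in a syllable onset, cf. (8); '' stands for zero."""
    if omega_initial or stressed:
        return {"h"}
    # Assumed here to be the context of (8d), between two unstressed vowels:
    # [h] surfaces only optionally and may be voiced.
    return {"h", "ɦ", ""}


# /t/ in a non-initial, unstressed onset, as in (7d):
print(stop_is_aspirated(in_onset=True, omega_initial=False, stressed=False))  # False
```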
With the exception of (8d), where the pronunciation of /h/ seems optional rather than prohibited, this distribution is by and large the same as in American English (Davis & Cho 2003 and references given there).
3.1.3 Coordinatory truncation
The minimal ω is also the minimal unit that can be truncated in coordination (Booij 1985 for Dutch, Wiese 1996 for German, and Riad 2014 for Swedish). Compound members and derivations with stressed suffixes permit truncation, see (9), whereas members of prosodically reduced compounds and unstressed derivational suffixes do not, see (10) and (11).
(9)
(10)
(11)
In (10) the disallowed truncations of weekdays (-dag) and berries (-bär) indicate that lexicalization has taken place, the phonological effect of which is the removal of stress. Reduction is also evident in the fact that postlexical tonal accent is assigned in (10b–c), which is accent 1 when there is one stress, and accent 2 when there are two stresses (see (14), below). Other compounds involving -dag, which do not belong to the closed set of weekdays, retain stress and permit truncation (10c). Similar cases can be found in Dutch and German (however, without the benefit of the accentual difference, which makes the case for reduction easier to argue in Swedish).
There are parallels here with cognate suffixes in German (-schaft/-skap, -ig/-ig), but there are also contrasts, e.g. -lich/-lig, where only the German suffix is a ω.
3.2 Lexical and postlexical assignment of tone accent
The assignment of tone accent provides a source of evidence for both the minimal and maximal ω, in that accent assignment is differently conditioned in them. Lexical tone accent takes the minimal ω as its domain, whereas postlexical tone accent assignment targets the maximal ω.
Accent 2 in simplex forms contains the marked, lexically represented tone, which is H in Central Swedish. Accent 1 is the melody of the unmarked pitch accent, i.e. the case when no lexical tone is present. Accent 2 resides in many suffixes, that is, the lexical specification for a tone comes with the suffix. The tone is assigned to a preceding primary stressed syllable, within the same minimal ω. In addition to large classes of suffixes, lexical tone is also inherent in some roots.Footnote 4 Lexical accent is marked by a lowered digit (2 for accent 2) at the end of the morpheme. The raised digit (1 or 2) marks the realized tone accent, at the primary stress.
(12)
In North Germanic, the tone-bearing unit (TBU) is the stressed syllable. The assignment of a lexical tone from a suffix is subject to a constraint on locality, normally meaning that the suffix has to be directly adjacent to the stressed syllable, as seen in (13a–b) below. Lexical accent assignment is also sensitive to whether the word begins with a stressed or unstressed syllable. In forms that begin with one or more unstressed syllables (so-called anacrusis), accent-2-inducing suffixes may fail to assign their lexical tone, even if locality is met. Indeed, suffixes fall into two classes (strong and weak) with respect to anacrusis, as seen in (13c–d).
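The two conditions on lexical accent assignment just described, locality and sensitivity to anacrusis, can be stated as a small predicate. This is a sketch under assumptions of our own: suffixes are represented by a hypothetical `Suffix` record, ‘may fail’ for weak suffixes is simplified to ‘fails’, and the suffix names are placeholders rather than actual Swedish suffixes; the real membership of the strong and weak classes is given in (13) and in Riad's work.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Suffix:
    """Hypothetical lexical entry for a suffix."""
    name: str
    lexical_tone: bool                   # carries the accent-2 tone (lowered digit 2)
    strength: Literal["strong", "weak"]  # behaviour under anacrusis, cf. (13c-d)


def assigns_lexical_tone(suffix: Suffix, intervening_syllables: int, anacrusis: bool) -> bool:
    """Does the suffix's lexical tone reach the primary stressed syllable?"""
    if not suffix.lexical_tone:
        return False
    # Locality: the suffix must directly follow the stressed syllable.
    if intervening_syllables > 0:
        return False
    # Anacrusis: weak suffixes fail to assign their tone (simplified from 'may fail').
    if anacrusis and suffix.strength == "weak":
        return False
    return True


STRONG = Suffix("SUFF-STRONG", True, "strong")  # placeholder, not a real suffix
WEAK = Suffix("SUFF-WEAK", True, "weak")        # placeholder, not a real suffix
print(assigns_lexical_tone(WEAK, 0, anacrusis=True))  # False
```

When the predicate is false, no lexical tone is expressed and accent falls to postlexical assignment.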
(13)
The reader is referred to Riad (2009, 2012, 2014, 2015) for a fuller discussion of the distributional properties.Footnote 5
The clearest indication of lexical accent assignment being limited to the minimal ω is the fact that more complex structures are clearly assigned accent without heed to lexical information, as will now become evident. If lexical tone cannot be expressed, accent will be postlexically assigned, according to the simple parameters in (14).
(14) Postlexical accent assignment
a. One stress yields accent 1 (i.e. tonal prominence pure and simple).
b. Two stresses yield accent 2 (i.e. tonal prominence preceded by a H tone).
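Rule (14) is a function of the number of stresses in the domain. The Python sketch below is a direct transcription under two assumptions: the input is the stress count of one maximal ω, and (14b) is read as ‘two or more stresses’, in line with the discussion of compounds that follows. Returning a single value mirrors culminativity: one accent per maximal ω.

```python
def postlexical_accent(stresses: int) -> int:
    """Postlexical tone accent (1 or 2) of one maximal ω, following (14)."""
    if stresses < 1:
        raise ValueError("a maximal ω contains at least one stress")
    # (14a) one stress -> accent 1; (14b) two (or more) stresses -> accent 2
    return 1 if stresses == 1 else 2


print(postlexical_accent(1))  # 1
print(postlexical_accent(3))  # 2
```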
Culminativity in the maximal ω is expressed by one accent, big or small. Thus, irrespective of the lexical tonal specifications included in the morphemes of a large maximal ω (e.g. a compound), only one accent can be realized. The accent may be determined lexically or postlexically (depending on size, construction, dialect) but it is invariably the case that only one accent is realized in a maximal ω.
Accent 1 is always postlexical. This is the assignment of tonal prominence, pure and simple, in the absence of a lexical tone. Accent 2 is either lexical, i.e. induced by a lexically marked root or suffix, or postlexical, namely when there are two or more stresses in the structure.Footnote 6 This is sometimes called a ‘compound rule’ (e.g. Gussenhoven 2004:214), since compounds typically contain two (or more) stresses in Germanic. However, the regularity is purely prosodic, extending to any structure that contains two stresses, including morphologically complex forms containing stressed prefixes or suffixes. This is illustrated in (15).
(15)
The phonological association pattern of big accent 2 in Central Swedish compounds makes the dependence on stress quite clear, as the tonal contour associates at two points, both of which are stressed syllables. The accent 2 marker H* goes to the first stressed syllable, and the prominence marker L*H goes to the last stressed syllable. This is illustrated in Figure 1. The assignment of postlexical accent is entirely insensitive to any lexical tonal specifications in Central Swedish.
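The association pattern just described can be sketched as a mapping from syllable positions to tones. This is our own simplification: a compound is treated as a flat list of syllables marked for stress, and only big accent 2 in Central Swedish is covered; the sketch says nothing about f0 timing or about intermediate stresses.

```python
from __future__ import annotations


def associate_big_accent_2(stressed: list[bool]) -> dict[int, str]:
    """Map syllable indices to tones for big accent 2 in a Central Swedish compound.

    H* (the accent 2 marker) links to the first stressed syllable and
    L*H (the prominence marker) to the last stressed syllable.
    """
    stress_positions = [i for i, is_stressed in enumerate(stressed) if is_stressed]
    if len(stress_positions) < 2:
        raise ValueError("postlexical accent 2 requires at least two stresses")
    return {stress_positions[0]: "H*", stress_positions[-1]: "L*H"}


# Four syllables with stress on the first and third:
print(associate_big_accent_2([True, False, True, False]))  # {0: 'H*', 2: 'L*H'}
```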
Another informative case of postlexical accent is constituted by forms derived with an unstressed prefix (be- or för-), which invariably get accent 1. These forms are interesting in that they may contain a suffix that is lexically marked for accent 2. The form (för-1(ˈtal-a2)ωmin)ωmax in (16c) below is a case in point, where we take the prefix to be adjoined (rather than incorporated) to the minimal ω. The infinitive suffix induces accent 2 in canonical forms without the prefix, i.e. in the simple verb 2(ˈtal-a2)ωmin=max, as in (12e) above. When a prefix is adjoined, a maximal ω is created which is no longer coextensive with the minimal ω. This renders the lexical tone invisible, and postlexical accent 1 is assigned, per (14a).
(16)
To make this point more forcefully, we can compare prefixation of för- with forms with a different type of unstressed prefix des-, which incorporates into the minimal ω.Footnote 7
(17)
The accent difference between the two prefixed forms is thus due to the difference in prosodic structure, adjoined vs. incorporated. In the incorporated structure, accent assignment is lexical, while in the adjoined structure it is postlexical, as we have seen, and this distinction serves as evidence for two levels of the ω.
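The argument of this section can be summarized as a decision procedure. The sketch assumes a hypothetical `MaximalWord` record that already encodes the relevant prosodic structure, and it follows the Central Swedish pattern described above. It shows only that the visibility of lexical tone depends on whether the maximal ω is coextensive with a minimal ω, as in tala versus adjoined för-tala in (16); an incorporated prefix such as des- leaves the maximal ω coextensive with the minimal one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MaximalWord:
    """Hypothetical summary of a maximal ω relevant to accent assignment."""
    stresses: int        # stressed syllables in the maximal ω
    is_minimal: bool     # True if the ωmax is coextensive with a single ωmin
    lexical_tone: bool   # an accent-2 tone in the ωmin that meets locality etc.


def tone_accent(word: MaximalWord) -> tuple[int, str]:
    """Return (accent, source) for a maximal ω."""
    # Lexical tone is visible only when ωmax = ωmin (simplex or incorporated prefix).
    if word.is_minimal and word.lexical_tone:
        return 2, "lexical"
    # Otherwise accent is assigned postlexically per (14); lexical tone is ignored.
    return (1 if word.stresses == 1 else 2), "postlexical"


tala = MaximalWord(stresses=1, is_minimal=True, lexical_tone=True)
fortala = MaximalWord(stresses=1, is_minimal=False, lexical_tone=True)  # adjoined för-
compound = MaximalWord(stresses=2, is_minimal=False, lexical_tone=False)

print(tone_accent(tala))      # (2, 'lexical')
print(tone_accent(fortala))   # (1, 'postlexical')
print(tone_accent(compound))  # (2, 'postlexical')
```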
3.3 Mismatches between words in morphosyntax and prosody
We have now looked at formations of minimal and maximal ω, where one word or morpheme in morphosyntax in principle corresponds to one minimal or maximal ω. There are, however, also cases where units that consist of more than one word in morphosyntax, correspond to a single word as regards prosodic structure. In a small number of cases, a single word in morphosyntax can also correspond to multiple prosodic words. In such cases, we may speak of mismatches between words in morphosyntax and words in prosody. Below, we review such mismatch cases, as they shed light on the general principles of alignment between prosody and morphosyntax.
3.3.1 Incorporation into a minimal ω
A first case in point is that a stressable pronoun can sometimes be incorporated at the right edge of a minimal ω. We can observe such incorporation with h-initial pronouns (e.g. han ‘he’, hon ‘she’, henne ‘her’), since the distribution of [h] indicates the minimal ω, see (8). In the cases in (18) there is loss of [h], and the pronoun is syllabified with the preceding word. Both these facts indicate the absence of a minimal ω boundary preceding the h-initial pronoun. Small caps mark the presence of a big accent (ωmin = max in all instances).
(18)
A similar observation can be made for d-initial pronouns (e.g. den, det ‘it’, du, dej ‘you’, de ‘they’, dem ‘them’), which undergo a rule of d-continuization (/d/ > [r]) when incorporated into the minimal ω (Teleman 2013, Riad 2014:99ff.).
(19)
The distribution of d-continuization indicates that the forms undergoing it are unstressed and incorporated into a minimal ω together with some stressed item nearby.
A third, slightly more complex case is derivation with the suffix -eri. This is a highly productive suffix, which always attracts primary stress by the phonological stress rule of Swedish. This type of suffix belongs to the class of unspecified morphemes (Riad 2012, 2014, 2015). What makes -eri particularly interesting is that it attaches to any type of root, i.e. not only to unspecified roots (typically non-native), but also to tonic (i.e. lexically stressed) roots (typically native Germanic), and it always forms a single syllabification domain with the preceding root morpheme. Thus, suffixation with -eri always leads to the formation of a minimal ω. With unspecified roots this is straightforward, as such roots combine with a large set of unspecified suffixes (-era and -ik in (20)). With tonic roots, however, culminativity is challenged, as both the root and the suffix require stress, yet must form a single minimal ω domain.Footnote 8 Combination of -eri with tonic (T) and unspecified (U) roots is exemplified in (20), where the quality and quantity of vowels under stress (left) are compared with the case where stress falls further to the right (right).
(20)
In the unspecified roots, vowel length is present only under stress, whereas in tonic roots some half-length is retained even when stress moves to the suffix -eri. This is, we assume, the result of the conflict that arises when lexical stress cannot be expressed at the surface, due to suffixation with -eri, whose stress attraction dominates. Thus, some phonetic properties of stress are retained, but stress itself is removed and placed on -eri, in accordance with the stress rules and with the requirement of culminativity within the minimal ω, which constitutes one syllabification domain.
3.3.2 Syntactic phrases that form a single maximal ω
In a number of cases, a syntactically complex unit forms a single maximal ω. First, the adverb för ‘too’ may adjoin prosodically to a minimal ω, e.g. 2ˈliten ‘small’, 2ˈmånga ‘many’ and 2ˈlänge ‘long’, in a manner that is parallel to the morphological case we reviewed in (16).
(21)
The evidence that the structures in (21) are cases of prosodic adjunction comes from tonal accent assignment. These forms would normally have accent 2, via lexical specification in the root or suffix, but here they get accent 1, which is invariably assigned postlexically, and which indicates that the maximal word is different from the minimal word. Factors like frequency and structure affect the likelihood of this type of adjunction in syntax, and it also takes place in a few other lexicalized cases besides för ‘too’, e.g. till 1ˈsalu 2 ‘for sale’, det 1ˈsamma 2 ‘the same’ (Myrberg & Riad 2013).
A second case is given in (22). In several North Germanic dialects (however not in standard Central Swedish), a so-called particle verb, or in some cases a verb plus an object, may exhibit compound prosody, where the verb and its particle/object form a maximal ω with the head to the left, on the verb. This is illustrated in the right-hand column of (22) with examples from East Norwegian (ENw) and northern varieties of Swedish (NSw). The regular prosody for particle verbs in dialects that do not have this option is an unaccented verb followed by the accented particle (the left-hand column in (22)).
(22)
We see from the above examples that what is basically word prosody sometimes extends beyond the morphosyntactic word. In all cases, there is some kind of semantic motivation for the prosodic formation, indicating that this use of word prosody is due to some notion of lexicalization. Adjunction occurs with the adverb för ‘too’, but not with the (homonymic) preposition för ‘for’. Compound formation occurs with particle verbs, i.e. verbs with a semantically closely connected particle, not regularly with e.g. verbs+object, which may often have the same prosody as particle verbs (i.e. with a deaccented verb). Still, (22c) is a verb+object structure, but one which is lexicalized.
A third, slightly different type of case is illustrated in (23), where we have lexicalized phrases that include more than one content word and where one or more of these content words are deaccented, though not destressed (underscored).
(23)
By definition, a content word that fails to receive an accent is not a maximal ω. Instead it must form part of an adjacent maximal ω. The structures in (23) above look much like compounds, but the important difference is the locus of the head, to which the pitch accent obligatorily associates. In compounds, the head is to the left, whereas in phrases it is to the right. We take this to be a reflection of the syntactic status and, to some extent, the diachronic formation of the structures in (23). It is well-known that the prosodic head (i.e. the strongest prominence) of a phrase generally appears close to the right phrase edge in Germanic languages (e.g. Selkirk 1995, Féry & Samek-Lodovici 2006). In (23), the elements in the phrases are prosodically grouped together, which is expressed by deaccentuation and incorporation into the same prosodic unit, which we take to be the (right-headed) maximal ω. If the semantic coherence of a phrase type becomes strong, the next step in prosodic lexicalization would be for the head to move to the left edge of the maximal ω, as with the particle verbs in (22). The same development is manifest historically in the formation of compounds from phrases (kungs gård ‘king's estate’ > kungsgård, see Strandberg 2014).Footnote 9
This analysis entails a strong structural affinity between lexicalized phrases and compounds and predicts that they should exhibit similar grammatical behaviour, in particular, that lexicalized phrases should behave like words. This prediction is borne out, as we can see by studying incorporation of phrases into compounds. Deaccented phrasal structures like the ones in (23) can occur in initial, medial and final position in compounds; see (24), where the relevant phrases are underscored.
(24)
Another type of deaccentuation structure, which we may call local deaccentuation, is exemplified in (25). The structures in (25) differ from those in (23) in that the domain of deaccentuation is a single word, rather than a whole morphosyntactic phrase. The deaccented word attaches to any immediately adjacent word that carries a word accent, even though these two words do not form a syntactic constituent.
(25) Local deaccentuation
a. ((ˈliten2)ω2(ˈsmuts-ig2)ω)ωmax2(ˈgryt-a2)ωmin=max
‘little dirty pot’
b. ((ˈlag-a2)ω (be-1(ˈgagn-ade2)ω)ω’)ωmax (kopia1ˈtorer)ωmin=max
‘repair used copying machines’
The nature, and to some extent even the scope, of this deaccentuation pattern remain to be understood, but it is clear that deaccentuation may be local.
3.4 One morphological word that forms two prosodic words
We have seen word prosody extending beyond the morphological word. In this section we look at the reverse situation, where single words are split into two ωs. The first pattern occurs regularly in a set of words, most of which end in -iv, and where the result is two maximal ωs. The word type exhibits three different prosodic shapes, and for the segmentally homophonic ones, the different prosodies are often associated with different word meanings. The first is final main stress and accent 1, as in (26a), the second has prosodic compound shape and accent 2, as in (26b). The third pattern contains two separate maximal ωs, exhibiting two separate instances of accent 1, as in (26c), which is the same prosody as real phrases like those in (26d).
(26)
The pattern in (26c) is mentioned in Bruce (1993), citing Kjellin (1978), who also makes the connection between the prosody of a two-word phrase like that in (26d) and the prosodic pattern in (26c). The similarity with a regular phrase is apparent in the regular plateau sandhi following big accents in Central Swedish (see Section 5.3 below; Bruce 1987, Myrberg 2010:102f.). When the words in (26c) carry a big accent, it appears on the initial syllable, and a plateau is created which stretches to the final stressed syllable.
It would appear that the phrasal shape of the -iv words in (26c) is related to the contrastive use of accents. One indication of this is that many of the words that have it come in pairs: aktiv/passiv, explicit/implicit, positiv/negativ. The same effect is in evidence with corrective accentuation in forms like (kapriˈfol)ωmin=max ‘honeysuckle’ (e.g. Did you say (ˈKIPri)ωmax(ˈfol)ωmax? No I said (ˈKAPri)ωmax(ˈfol)ωmax!).
A second pattern of words split into two ωs occurs at the level of the minimal ω, where the addition of an initial primary stress leads to the creation of a formal compound, always with postlexical accent 2; see (27).
(27)
This pattern (also exemplified in (26b) above) can be interpreted as a tendency to stress initially what is perceived as a semantically coherent unit.
4. THE PROSODIC PHRASE (φ)
The primary defining property of the prosodic phrase (φ) is the presence of a big accent. Specifically, every big accent is the head of some φ, making the φ a prosodic domain defined in terms of culminativity, like the minimal/maximal ω.Footnote 10
The big accent is a marker of the phrasal structure within clauses, and many clauses contain more than one big accent (e.g. (32)). An all-new sentence with a preverbal full NP subject normally contains at least two big accents, one within the preverbal XP and one within the VP. Further subdivision into more φs is possible, and sometimes preferred. The φ thus interfaces with the syntactic structure at the level of the syntactic phrase, XP (Itô & Mester 2012, Selkirk 2011), rather than with the clause, as was implied by the term sentence accent used in the original work by Bruce (1977).Footnote 11 The correlation between syntactic phrases and big accents constitutes the main argument for separating the φ from the intonation phrase (ι), which governs the distribution of a set of other clause-related phenomena (Sections 5.1–5.3).Footnote 12
The edges of φs are inferred primarily from three commonly assumed principles in prosody research: (i) prosodic prominences are heads of prosodic constituents (Selkirk 1984, Nespor & Vogel 1986, Truckenbrodt 1995), (ii) prosodic heads align with edges of their constituents (Féry & Samek-Lodovici 2006, Féry 2013), and (iii) syntactic constituents serve as a base for the formation of prosodic constituents (Itô & Mester 2012, Selkirk 2011). This means that the arguments for φ-edges are primarily theoretical. We assume no φ-edge tones (though it may turn out that an L boundary tone is possible or optional in the φ). Other phonetic correlates of φ-edges, such as final lengthening or pauses, have not been studied extensively for Swedish, and we therefore leave the correlates of φ-edges (beyond head alignment) for future research. At this point, we have no strong evidence from Swedish for recursion at the level of φ, and we will therefore not make use of a distinction between minimal and maximal φ. Within Itô & Mester's (2007, 2012) model, applied in Section 3, this means that there are no known phonological rules that apply separately in the minimal and maximal φ, respectively.
Below, we take a closer look at the distribution of big accents in Swedish, inducing from this some generalizations for φ-phrasing. This discussion is primarily based on an experiment described in Myrberg (2015b), where 1200 sentences of the type in (28) below were studied with respect to the distribution of big accents. Information structural focus was on the VP in half of the sentences, as marked by the subscript F in (28), and on the subject in the other half; see (33). The subscript G is to be interpreted as information structurally given. Small capitals in the examples here and henceforth mark words that carry big accents.
(28)
Subjects varied in length between two and five ωs. The sentence in (28) shows an example with four maximal ωs. These are marked with subscript ω, but without brackets for expository purposes. In most examples in Sections 4 and 5, the minimal and maximal ω are coextensive. Therefore, any word marked with ω has both a stressed syllable and a big or small accent. Words not marked with ω have no big or small accent. We assume that function words and deaccented words are incorporated into the following maximal ω, see (23) above. We shall use the example sentence in (28) many times (albeit with different prosodic structure), and exclude the glossing below.
4.1 Obligatory φ in the preverbal constituent
A full NP in the preverbal position (often called the fundament in the Scandinavian literature; Diderichsen 1946) obligatorily contains at least one big accent and thus necessarily forms a φ. Even when a full NP in preverbal position is minimal (a single ω), it obligatorily has a big accent, as seen in (29). The requirement for φ-formation is thus not a simple reflex of phrase length, but is at least partly triggered by the syntactic boundary between the preverbal constituent and the rest of the clause.
(29)
However, when the preverbal position contains a pronoun, or another word that does not have lexical stress and therefore does not form a minimal or maximal ω of its own, the preverbal constituent usually does not form a φ. Unstressed words (words not marked ω) in the preverbal position are instead incorporated in the φ together with the VP, as in (30a) below. It should be noted that it is possible for unstressed words to be assigned ω-status, i.e. to be realized with both stress and accent when called for by information structure (e.g. in a contrastive topic reading). In these cases, the accent will obligatorily be big, and the preverbal constituent is then analyzed as a φ, see (30b).
(30)
4.2 Left-headed vs. right-headed φ: Initiality accents
In longer preverbal XPs, we observe that the obligatory big accent may appear on the leftmost ω, as in (31a), or on the rightmost ω, as in (31b).
(31)
Prosodic heads are generally assumed to align with the edges of their prosodic constituents, rather than directly with the morphosyntactic structure (the indirect reference hypothesis, e.g. Inkelas 1989, Truckenbrodt 1999; also McCarthy & Prince 1993, Féry & Samek-Lodovici 2006). We characterize a φ in which the big accent appears on the leftmost ω as left-headed, and one where the big accent appears on the rightmost ω as right-headed. The possibility of having a left-headed φ appears to be particular to (some dialects of) North Germanic, as there are no reports of this phenomenon (beyond information structural focus) for West Germanic languages.
When the big accent appears on the leftmost ω, as in (31a), we refer to it as an initiality accent, following Myrberg (2010) (see also Roll, Horne & Lindgren 2009, who discuss the same phenomenon but refer to it as a high left-edge boundary tone).
A left-headed φ (i.e. one with an initiality accent) is the most common option for clauses that do not have a narrow focus in the subject. However, left-headed φ also alternates with right-headed φ, as in (31b), seemingly in free variation (Myrberg 2015b). The structure in (31b) is compatible with a contrastive topic reading on the subject, but does not necessarily invoke it, whereas the structure in (31a) is incompatible with a contrastive topic reading. Neither structure is compatible with a narrow focus reading on the subject.
4.3 Optionality in φ-phrasing
Long and/or syntactically complex preverbal XPs, as in (31), optionally form two or more φs. The alignment of XP-edges with φ-edges is thus subject to considerable variation (Myrberg 2013a, 2015b). The constraints that impose the alignment between the XP in syntax and the φ in prosody must therefore be assumed to interact with a number of other constraints.
The preverbal constituent in (31) consists of two XPs. These two XPs are frequently (but not obligatorily) separated prosodically, forming separate φs. In such cases, the first φ is usually left-headed, while the following φ or φs are right-headed, as in (32a). However, it is also possible to have two right-headed φs, as in (32b), or indeed two left-headed ones, as in (32c). For a detailed discussion of these phrasing options and their frequencies, see Myrberg (2015b).
(32)
4.4 The rest of the clause
We have looked at the preverbal constituent and should also comment briefly on the prosodic phrasing of the remainder of the clause, which contains the finite and nonfinite verbs, objects, adverbials, and sometimes also the subject (which appears to the right of the finite verb when not in the preverbal position). The verb and its complement(s) frequently form a single φ when complements are short. Long verbal complements may form a φ of their own. Adjuncts frequently also form their own φ.Footnote 13 Many details remain to be understood in this area, but in general it can be said that observations previously made for pitch accent distribution in West Germanic languages like German and English also apply to the distribution of big accents in Swedish (e.g. Selkirk 1984, Féry 1993). See further Myrberg (2010, 2013a, 2013b) and Myrberg & Riad (forthcoming).
4.5 Focus, givenness and φ-phrasing
In addition to constituent length and ω-status, φ-phrasing is also affected by information structural focus and givenness. To the right of a focus, there are normally no φ-heads, as illustrated by the absence of a big accent on parken ‘the park’ in (33a–c).
(33)
To the left of a focus, φ-phrasing is much less affected than to the right, as illustrated in (33b–c), where the subject contains two φs. Thus, a subject may be phrased as multiple φs, even when it contains a narrow focus.
In the absence of φ-heads, there is also no evidence for φ-phrasing in postfocal areas, i.e. following a narrow focus. Consequently, in our current state of knowledge, there is little evidence for choosing between the two rightmost brackets in (33a–c), marked with superscripts i and ii and set in grey to indicate that they are hypothetical (one, but not both, will be necessary to complete the prosodic structure).
In terms of f0, the postfocal area may contain either a high plateau followed by a sequence of downstepped small accents, or a sharp fall directly after the nuclear big accent followed by a sequence of downstepped small accents, compare the internal and external low areas in Figure 3 below (Section 5.3). In the former case, either location of the rightmost φ-edge (i or ii) is compatible with the evidence. However, in the latter case, we analyze the sharp fall following the big accent as a boundary tone licensed by the insertion of a ι-edge (as explained in Section 5.3). Under this analysis, the prosodic pattern that involves a fall directly after the big accent is only compatible with the innermost right bracket (marked i).
5. THE INTONATION PHRASE (ι)
There are three primary cues for the intonation phrase (ι). In Section 5.1 we introduce a distinction between prenuclear and nuclear big accents, where nuclear big accents are heads of ι. Any big accent preceding the rightmost big accent in the ι is prenuclear. In Section 5.2 we examine how the ι governs the sequencing of the φs it contains, primarily by limiting left-headed φs to the initial position of the ι. The initiality accent can thus be seen as a subtype of the prenuclear big accent. In Section 5.3 we show that the ι has a set of boundary tones at its right edge.
The three properties of the ι all have a basic correspondence with clauses in syntax, such that there is usually one nuclear accent per clause, left-headed φs usually appear initially in clauses, and the set of edge tones that belong to the ι usually appear at the right edge of a clause, and rarely in the middle of a clause. We therefore assume that the ι, which is responsible for the distribution of these three phenomena, interfaces with syntactic structure at the level of the clause.
5.1 The head of ι: The nuclear accent
The ι is always right-headed, i.e. the rightmost φ within an ι contains the head of that ι, which will be referred to here as the nuclear big accent. Technically, the nuclear accent is the head of a φ as well as an ι, and indeed also of a maximal ω.Footnote 14
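The definition of the ι-head can be stated procedurally. The sketch below is a minimal illustration, not an implementation from the studies cited here: it assumes an ι is given as a left-to-right list of words, each carrying a hypothetical accent label ("big", "small", or None for words without ω-status), and it labels the rightmost big accent nuclear and every earlier big accent prenuclear.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A word in an ι. The accent labels are a hypothetical annotation scheme."""
    form: str
    accent: Optional[str]  # "big", "small", or None (no ω of its own)


def label_big_accents(iota: List[Word]) -> List[str]:
    """Label each word as 'nuclear', 'prenuclear', or '' (not a big accent).

    The ι is right-headed, so its rightmost big accent is the nuclear accent;
    any big accent preceding it is prenuclear. Small accents get no label.
    """
    big_positions = [i for i, w in enumerate(iota) if w.accent == "big"]
    labels = [""] * len(iota)
    if not big_positions:
        return labels  # nothing to label in this toy model
    head = big_positions[-1]
    for i in big_positions:
        labels[i] = "nuclear" if i == head else "prenuclear"
    return labels


# Abstract placeholder words, not the article's example sentences.
iota = [Word("w1", "big"), Word("w2", None), Word("w3", "small"), Word("w4", "big")]
print(label_big_accents(iota))  # ['prenuclear', '', '', 'nuclear']
```

The function only encodes the positional definition; it says nothing about the phonetic scaling differences between the two accent types.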
The term nuclear accent has been widely used in the literature on West Germanic languages to refer to the rightmost accent of a sentence, i.e. in a manner similar to our use of the term (e.g. Chomsky & Halle 1968, Pierrehumbert 1980, Selkirk 1984). However, the term does not have a strong tradition in the literature on Swedish, where, since Bruce (1977), the primary distinction has been between big and small accents (or variants of these terms, e.g. ‘focal’ versus ‘nonfocal’ accents). Subtypes of the big and small accents have not received much attention.
We argue that there are empirical grounds for distinguishing nuclear from prenuclear big accents in Swedish. The nuclear accent is characterized by three distinct features, which are simultaneously fulfilled in canonical cases, such as (34), but not always (Myrberg 2015a, Myrberg & Riad, forthcoming).
First, just as in the West Germanic languages, there is a strong requirement that information structural focus be marked with a nuclear accent (Bruce 1977, Heldner 2001, Myrberg 2010, Myrberg & Riad, forthcoming), i.e. that the focus carry the rightmost big accent in the ι. Big accents to the right of a focus are therefore deaccented, as in (34b) below; see also (33) above. We can refer to this as postnuclear or postfocal deaccentuation, but note that only big accents are targeted: small accents remain unaffected. In (34), nuclear accents are indicated in bold small caps and prenuclear accents in plain small caps. Curly brackets represent ι-edges. Prosodic words are marked ω, but without brackets in these representations.Footnote 15
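Postnuclear deaccentuation can likewise be sketched as a function over accent labels. This is a simplified illustration under two assumptions that go beyond the text: the focused constituent is reduced to the index of its rightmost word, and a deaccented maximal ω is taken to surface with a small accent, since postnuclear areas are described as containing small accents (Section 5.3). The function name and label values are hypothetical.

```python
from typing import List, Optional


def deaccent_postfocal(accents: List[Optional[str]], focus_end: int) -> List[Optional[str]]:
    """Apply postnuclear deaccentuation within one ι.

    accents: per-word labels, left to right: "big", "small", or None.
    focus_end: index of the rightmost word of the focused constituent,
        which carries the nuclear (big) accent.
    Big accents to the right of the focus are demoted to small accents
    (an assumption); small accents and unaccented words are left unchanged.
    """
    if not 0 <= focus_end < len(accents):
        raise IndexError("focus_end is outside the ι")
    result = list(accents)
    result[focus_end] = "big"  # the focus is marked with a big (nuclear) accent
    for i in range(focus_end + 1, len(result)):
        if result[i] == "big":
            result[i] = "small"
    return result


# Narrow focus on the first word: later big accents are demoted, small ones kept.
print(deaccent_postfocal(["big", "small", "big", None, "big"], focus_end=0))
# ['big', 'small', 'small', None, 'small']
```

After this step the focused word carries the rightmost big accent in the ι, which is exactly the configuration in which it counts as nuclear.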
(34)
The nuclear accent that marks focus is obligatorily aligned with the right edge of the focused constituent. Consequently, a left-headed φ can only ever project to a nuclear accent under a narrow focus reading, as in (35a). The structure (35b) is illicit because the nuclear accent is left-aligned in the focused constituent. We conclude from this that left-alignment of the head, which is frequently observed at the φ-level, is ungrammatical at the ι-level in Swedish (unless forced by the presence of a narrow focus).Footnote 16
(35)
The second property of the nuclear accent is its obligatory position in the sentence. The nuclear accent cannot be moved to another word in the sentence without significantly changing the reading of the sentence. This is in contrast with the prenuclear accents, which were shown to exhibit distributional variation in Section 4.
Thirdly, nuclear accents exhibit phonetically higher scaling than prenuclear accents. Myrberg (2015b) shows that a prenuclear accent on ungar ‘kids’, as in (34a), has a lower f0 maximum than a nuclear accent in the same position in the sentence, as on ungar in (34b). This is illustrated in Figure 2, adapted from Myrberg (2015b).
Similarly, a nuclear accent that signals all-new focus, as on taxin ‘the taxi’ in (36a), is scaled lower than a nuclear accent that signals narrow or contrastive focus, as on taxin in (36b–c) (Myrberg 2013b).Footnote 17
(36)
To our knowledge, no study has shown that prenuclear and nuclear big accents are perceptually distinct in Swedish. It would seem likely that such a distinction could be found, but we must leave the testing of this hypothesis to future research.
5.2 φ-sequencing
In addition to controlling the distribution of edge tones and nuclear accents, the ι governs the sequencing of φs. The very existence of constraints on φ-sequencing suggests that there is a prosodic structure larger than the φ. These constraints thus constitute further evidence for the ι in Swedish.
Within an ι, φs can be left- or right-headed, but they may not occur in just any order (Myrberg 2015b). As illustrated in (32), partly repeated in (37), an ι may consist of three φs.
(37)
Of these, the initial one is often (but not obligatorily) left-headed, as in (37a). If the initial φ is left-headed, the medial φ may also be left-headed, as in (37b). However, (37c), where the initial φ is right-headed and the medial φ is left-headed, is not possible. It is also not possible to have a final left-headed φ, as in (37d), since the ι-final φ contains the ι-head; recall from (35) that an ι-head may not be projected from a left-aligned φ-head unless required by a narrow focus.
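The sequencing facts in (37) can be summarized as a well-formedness check over the headedness of the φs in an ι. The sketch below is a hedged generalization from the three-φ patterns just discussed: it assumes that non-final left-headed φs must form an unbroken run from the start of the ι, and that the final φ may be left-headed only under narrow focus. Longer sequences are not verified against data here, and the function name and the 'L'/'R' encoding are hypothetical.

```python
from typing import Sequence


def phi_sequence_ok(heads: Sequence[str], narrow_focus_final: bool = False) -> bool:
    """Check the headedness of the φs in one ι, left to right.

    heads: 'L' for a left-headed φ, 'R' for a right-headed φ.
    narrow_focus_final: True if a narrow focus forces the ι-final φ
        to be left-headed, as in (35a).
    """
    if not heads:
        raise ValueError("an ι must contain at least one φ")
    if any(h not in ("L", "R") for h in heads):
        raise ValueError("headedness must be 'L' or 'R'")
    *non_final, final = heads
    # The final φ contains the ι-head, which may not come from a
    # left-aligned φ-head unless a narrow focus requires it.
    if final == "L" and not narrow_focus_final:
        return False
    # Non-final left-headed φs must all precede any right-headed φ
    # (assumed generalization of (37b) versus (37c)).
    seen_right = False
    for h in non_final:
        if h == "R":
            seen_right = True
        elif seen_right:
            return False
    return True


for pattern in (["L", "R", "R"], ["L", "L", "R"], ["R", "L", "R"], ["L", "R", "L"]):
    print(pattern, phi_sequence_ok(pattern))
```

The first two patterns pass and the last two fail, in line with the kinds of sequences described for (37a–b) versus (37c–d).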
5.3 Right-edge boundary tones and postnuclear areas
In addition to the differences between ι-heads and φ-heads, these two phrase categories differ in terms of their edge cues. While the φ has no strong edge correlates beyond the alignment of φ-heads with φ-edges (indirectly evidenced via syntactic structure; Section 4), the ι has a rather rich set of options at its right edge. The claim that these edge options are associated with the ι implies that they are available to the speaker at ‘larger breaks’, i.e. ι-breaks. They are not available at φ-breaks, i.e. between any two big accents. A sentence may be divided into two ιs (commonly one ι for the preverbal constituent and one ι for the rest of the clause), or even more than two ιs in some types of speech. When there are two foci in one sentence, we also assume that the sentence is divided into two ιs, so that each focus has its own nuclear accent. In such cases, boundary tones can occur between big accents inside the sentence.
The options for the right edge of ι are summarized in Figure 3, from Myrberg (2010).Footnote 18 Each black dot represents the rightmost high tone in a nuclear big accent L*H or H*LH (in some previous literature this tone is referred to as the focus tone, e.g. Bruce 2007). The area following the dot is the postnuclear area, i.e. the area to the right of a nuclear accent. The triangles represent small accents inside the postnuclear area. The number of small accents depends on the number of maximal ωs contained in that area. The abstract examples in Figure 3 contain three such accents, but a given sentence may contain more or fewer.
We distinguish two right-edge boundary tones: H% (high or rising f0) and L% (low or falling f0). These are represented on the two rows in Figure 3. In addition, we distinguish between postnuclear areas that contain only one ι-boundary tone, and areas that contain two ι-boundary tones. We refer to areas with one boundary tone as internal areas (the postnuclear area is internal to the ι that contains the nuclear accent). These ιs always have a high plateau between the nuclear accent (H*LH or L*H) and the following small accent (H*L or HL*).
When there are two boundary tones, one appears immediately after the nuclear accent, and the other at the right edge of the postnuclear area. These types are marked external low and external high in Figure 3. The boundary tone directly following the nuclear accent can be H% or L%. The postnuclear area is always compressed in scaling, so that accents are realized with a generally smaller f0 range than in the prenuclear area. When the leftmost boundary tone is H%, compression is made from below (high area), and when it is L%, compression is made from above (low area).
The two boundary tones are inserted in response to a conflict between different requirements on the prosodic structure. On the one hand, there is a general pressure for the head of a constituent to appear at one edge of that constituent (see among many others Féry & Samek-Lodovici 2006), and also for a focus to be aligned with the edge of some constituent (Féry 2013). In Swedish, this pressure causes ι-heads to appear rightmost in the ι in the neutral case; see (34). On the other hand, there are requirements on the location of a nuclear accent that sometimes force it to be located away from the right edge of the ι. For example, nuclear accents must appear on focused information but be avoided on given material (Truckenbrodt 1995, Schwarzschild 1999, Féry & Samek-Lodovici 2006). To satisfy these conflicting constraints simultaneously, speakers sometimes insert an additional ι-edge directly following the nuclear accent, in addition to the final boundary tone. We assume that the bracketing that provides the two boundary tones is caused by recursion at the level of the ι.
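The inventory of right-edge options can be enumerated from this description. The sketch assumes that the two rows of Figure 3 correspond to the ι-final boundary tone and that the external areas are named after the tone directly following the nuclear accent; on these assumptions, one-tone internal areas and two-tone external areas yield six edge types in total. The function name and label strings are illustrative only.

```python
from typing import List, Tuple

BOUNDARY_TONES = ("H%", "L%")


def right_edge_types() -> List[Tuple[str, str, str]]:
    """Enumerate ι right-edge types as (area, tone string, postnuclear shape).

    Internal areas have one boundary tone and a high plateau after the
    nuclear accent. External areas have a tone directly after the nuclear
    accent plus a final tone; the postnuclear area is compressed from
    below after H% and from above after L%.
    """
    types = []
    for final in BOUNDARY_TONES:  # assumed to correspond to the two rows
        types.append(("internal", final, "high plateau"))
        for first in BOUNDARY_TONES:
            area = "external high" if first == "H%" else "external low"
            shape = "compressed from below" if first == "H%" else "compressed from above"
            types.append((area, first + final, shape))
    return types


for edge in right_edge_types():
    print(edge)
```

Under this encoding the all-low options are the internal L% type and the external low L%L% type.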
The idea that the insertion of an extra L% edge tone directly following the nuclear accent is a consequence of conflicting constraints in the grammar is at least partly supported by results from Myrberg (2013b). This study showed a tendency for internal areas to be used for all-new focus (38a), whereas narrow or contrastive focus was often realized with an extra L% edge tone directly following the nuclear accent (38b). This most likely means that the pressure for edge alignment (and therefore insertion of an extra L%) is related to a higher degree of prominence, and correlates with higher scaling and increased duration of the nuclear accent in narrow and contrastive focus than in all-new focus (Féry 2013). The distribution is, however, not obligatory, but probabilistic, and it must be noted that further research will be needed in order to confirm the findings in Myrberg (2013b).
(38)
Much remains to be understood regarding the function of the six edge types in Figure 3. Areas containing only low tones (L%L% and L%) are probably the most commonly used, being the neutral choice for statements as well as questions. Areas containing high tones are used for lists and to signal continuation. However, such areas are not consistently used to signal, for example, questions (House 2003, 2004), although this is common in other Germanic languages. The function and use of the different types of structures that end high or rising will also need to be identified by future research.
6. CONCLUSION
In this article, we have summarized the arguments for three levels of the prosodic hierarchy in Swedish: ω, φ and ι. While Swedish is a Germanic language, sharing many intonational features with West Germanic, the separation of stress from lexical accent and the distinction between big and small accents evidence a richness of structure that is lacking in West Germanic, and that, as we have shown, greatly affects the definitions of the prosodic categories. This complex structure necessitates a distinction between maximal and minimal ω, as well as between left- and right-headed φ. As regards the ι-level, Swedish appears to have fewer options than West Germanic languages (Gussenhoven 2004). There is in principle only one option for the ι-head, whereas many more are commonly taken to exist in West Germanic languages like English (e.g. Pierrehumbert 1980). For postnuclear areas in Swedish, a relatively rich set of options is available. However, the use of rising intonation is more limited than in West Germanic languages.
The differences between West Germanic and Swedish in terms of which phonological rules apply within ω, φ and ι do not, however, preclude deeper similarities between these language groups. Many resemblances are still evident in the strong interdependence between intonation and the expression of information structural focus and givenness, as well as in the signaling of different types of morphosyntactic structure.
ACKNOWLEDGEMENTS
We would like to acknowledge the generous support of the Swedish Research Council for the project ‘The Prosodic Hierarchy of Swedish’, within which much of the research reported here was carried out. We would also like to thank the editors and an anonymous reviewer for very useful feedback on an earlier version of this article. All errors are our own.