1. Introduction
This study examines the prosodic properties of surprise questions (SQs) in Estonian by comparing them to string-identical information-seeking questions (ISQs). The main goal of this research is to contribute to the understanding of the role of prosody in the expression of surprise and speech acts.
SQs are a little-studied kind of non-canonical question. According to Farkas (2022), non-canonical questions arise when the default contextual assumptions that accompany a canonical question are overridden either by a special context or by a special formal marker that adds special discourse effects. The default contextual assumptions of a canonical question include an ignorant speaker requesting an addressee she assumes to be competent and compliant to resolve an issue she raises. These default assumptions follow from the semantic content and conventional discourse effects of unmarked interrogatives, that is, simple interrogative sentences that do not contain special discourse markers, such as, for instance, again in the sentence ‘What’s your name again?’, which signals that the answer to the question was already known to the speaker at some point in the past (Sauerland & Yatsushiro 2017).
The SQs examined in the current study are syntactically and lexically canonical wh-interrogatives that convey mirativity, more concretely, the surprise of the speaker caused by her unprepared mind or counterexpectation (see Aikhenvald 2012). In terms of Farkas’ characterisation, they are unmarked interrogatives uttered in a special context in the sense that the goal of the speaker who utters an SQ is not to request information. However, in addition to the effect of the non-default context, we also expect SQs to be formally marked by prosody, because it is a long-standing observation that indirect speech acts are disambiguated by intonation (e.g. Sag & Liberman 1975).
SQs can be considered a special type of non-canonical question, distinct from rhetorical and exclamatory questions. Unlike rhetorical questions (RQs), they do not involve prior commitments to similar and obvious answers by the speaker and the addressee (Rohde 2006). SQs differ from exclamatory questions (i.e. exclamations in the form of an interrogative sentence) in that the wh-phrase is not a degree phrase as is typical for wh-exclamatives (Michaelis 2001; Rett 2011).
SQs are a kind of expressive question, as discussed in Celle et al. (2021). In terms of Celle et al.’s work, the SQs examined in this study can be taken to convey emotional expressivity (as opposed to iconic expressivity). They express an affective state – surprise – caused by a state of affairs that does not correspond to the speaker’s expectations.
As a way of expressing surprise, SQs form part of a larger category of mirativity markers. In many languages, mirativity is a morphologically realised grammatical category, often linked to evidentiality (DeLancey 1997, 2012; Aikhenvald 2012; Rett & Murray 2013; Peterson 2016). As a semantic category, mirativity can additionally be marked by a variety of mirative strategies, as defined by Aikhenvald (2012). These include morphological markers that primarily mark another grammatical category (see Aikhenvald 2012: 462–473), lexical markers (e.g. interjections), syntactic strategies (e.g. the exclamative sentence type; see also Celle et al. 2017: 223 for an overview of mirative syntactic constructions), and prosodic marking (e.g. exclamatory intonation; Rett & Sturman 2020). In Estonian, mirativity is not a grammatical category but is signalled by mirative strategies, which include, among others, SQs, the exclamative sentence type, and exclamatory intonation.
The meanings conveyed by mirativity markers involve sudden discovery, revelation or realisation, surprise, unprepared mind, counterexpectation, and new information (Aikhenvald 2012). As mirativity markers express rather than describe these meanings, they do not include lexemes like surprised (Celle et al. 2017).
A dynamic semantic account of mirativity markers has recently been proposed by Rett, covering the more general class of emotive markers, which mirativity markers are a part of (Rett 2021; Rett & Sturman 2020), as well as the class of expressives (Rett 2020). According to the account, emotive markers express (rather than describe) the speaker’s emotive attitude – in the case of mirativity markers, the speaker’s attitude of surprise or exceeded/violated expectation – toward a proposition that the speaker has recently learned and that is made salient by the utterance in which the marker occurs. In the case of interrogatives, this salient proposition is not provided by their denotation, but instead, it can be contributed by their existential presupposition or, in the case of polar interrogatives, a salient prejacent or highlighted alternative (Rett 2021: 333). The emotive attitude toward this salient proposition is treated as a kind of illocutionary content that is modelled as the speaker’s public discourse commitment (in the case of mirativity markers, of the form is-surprised_x(p)), the nature of which is taken to be encoded in the lexical or prosodic entry of the emotive marker.
In the SQs examined in the present study, we expect prosody to play the role of the emotive marker that signals their mirative meaning. The existing literature on the prosody of SQs mainly examines utterances that are lexically or syntactically marked interrogatives. Among questions conveying surprise cross-linguistically are declarative questions, i.e. declarative sentences that are prosodically (and contextually) marked as non-canonical statements, e.g. It was raining?
For English, this aspect of the prosody of declarative questions has received considerable attention (see e.g. Bartels 2013; Gunlogson 2003, 2008; Truckenbrodt 2012). For example, according to Truckenbrodt (2012), English declarative and polar questions are labelled with the intonational morpheme H-, which marks a salient proposition as put up for question by the speaker. Additionally, a declarative question may, but need not, express surprise. According to Gunlogson (2003), this is achieved by an expanded pitch range. Gunlogson argues that this is not specific to rising declaratives, referring to Ladd & Morton (1997) and Hirschberg & Ward (1992) to suggest that surprise can also be expressed by superimposing an expanded pitch range on the falling intonation of declaratives or the ‘rise-fall-rise’ contour. Thus, in English declarative SQs, prosody has two roles: rising intonation signals that the declarative utterance is not a canonical statement, while the expanded realisation of this rising intonation conveys the surprise of the speaker.
Another type of question that serves to express surprise cross-linguistically is echo questions, including echo wh-questions, which differ from canonical wh-questions in terms of syntax (the wh-phrase may appear in situ), information structure (narrow focus on the wh-phrase) and prosody, e.g. [Where did John go?] Where did WHO go?
While English canonical wh-questions usually share the falling intonation of declaratives, echo wh-questions are marked by a high phrase accent (H-), especially when they seek repetition or express surprise (Bartels 2013; Truckenbrodt 2012). For example, according to Truckenbrodt’s (2012) analysis of canonical wh-questions, the interrogative form of the sentence signals the questioning speech act, while the nuclear intonational morpheme H* instructs the addressee to add a salient proposition to the common ground of the speaker and addressee, the salient proposition being the implicature that there is a true answer. Echo wh-questions, by contrast, are marked with H- signalling that the echoed utterance has not become part of the common ground. Repp & Rosin (2015) have shown that emotionally expressive indignant echo wh-questions in German are phonologically identical to information-seeking and repetition-asking echo questions, but characterised by a greater pitch excursion. They draw parallels with the results of an earlier study by Bänziger & Scherer (2005), where a greater pitch range was associated with high levels of emotional arousal.
To our knowledge, the only study that directly addresses the prosody of syntactically unmarked interrogative sentences expressing surprise is Celle & Pélissier’s (2021) study on French, which revealed significant prosodic differences between syntactically similar ISQs and SQs: SQs exhibited lengthening, slower speech rate, and less frequent final rising contours (but no difference in mean pitch or pitch range). The lengthening observed in French SQs could also be associated with signalling emotional expressivity. The underlying contour, in turn, characterised by less frequent final rises than in ISQs, could indicate that the utterance is not a canonical question.
Somewhat similarly, prosody has been found to play two separate roles also in exclamatory questions. Rett & Sturman (2020) examined four types of English exclamations showing that they were all characterised by the L+H* pitch accent, extra-high tonal targets, and the insertion of additional intermediate phrases, which were interpreted to mark mirativity. They also found that each type of exclamation displayed additional prosodic properties, which they took to serve to maximally differentiate exclamations from their closest syntactic non-mirative counterparts. In particular, wh-exclamatives contrasted with canonical wh-questions in that their wh-phrase was highly prominent, while in canonical wh-questions this was not the case.
In summary, the literature predicts that prosody will be crucial in syntactically and lexically unmarked wh-interrogatives that express surprise. Prosody in Estonian is similarly expected to signal that such utterances are not canonical questions and additionally convey their emotional expressivity, which would be manifested in both a longer duration and a wider pitch range as in the languages reviewed above. As for the prosodic marking of interrogatives as non-canonical questions, however, we expect Estonian to differ from the languages described so far (primarily English). In most accounts, the prosodic expression of speech acts is associated with intonational morphemes in the form of a specific pitch accent, phrase accent, or boundary tone.
While Estonian is an intonation language like English, it has a relatively small inventory of pitch accents. Phrase accents are not used, and only a limited use of boundary tone contrast is needed (see Asu 2004 for the details). The transcription system for Estonian intonation follows a ToBI-like (Tones and Break Indices) system, such as ToDI for Dutch (Gussenhoven 2005). The most common pitch accent in Estonian is H*+L – a high tone aligned with the accented syllable followed by a low tone on the following unaccented syllable(s). There is an upstepped variant ^H*+L (not treated as a separate category) that is used when the high accented syllable in the bitonal pitch accent is higher in pitch than the preceding H*. Upstepped pitch accents are usually associated with emphasis marking, e.g. to signal narrow focus as opposed to broad focus (Sahkai, Mihkla & Kalvik 2015). An upstepped nuclear ^H*+L is characteristic of polar questions (Asu 2004). Another frequently occurring pitch accent in Estonian is H+L* – a low tone aligned with the accented syllable preceded by a high tone on the preceding unaccented syllable. Earlier findings suggest that H+L* is frequent in statements but absent from questions (Asu & Nolan 2007). In addition to these two most common (and some other less frequently used) bitonal pitch accents, the intonational inventory of Estonian also includes monotonal pitch accents H* and L*.
Intonational phrase (IP) boundaries in Estonian are by default left unmarked, as there are no important pitch events associated with the phrase boundary. After the default H*+L nuclear accent, the boundary ends on a low pitch and is labelled as 0%. In the case of rising intonation, the rising pitch movement takes place immediately before the IP boundary after the low tone on the nuclear accent (L*), and the boundary is marked as H%. IP-final rises in Estonian can be used in canonical questions, although they are not obligatory and their occurrence is extremely speaker-dependent. An IP can also end with a high plateau in which case the nuclear accent is H* followed by a plateau and the boundary is labelled as 0%.
Given the relatively small tonal inventory of Estonian, we expect speech acts to be signalled by other prosodic means than pitch accent types and boundary tones. This prediction is additionally supported by earlier findings on the prosodic comparison of Estonian statements and questions (e.g. Asu 2002; Asu 2004; Vende 1973) and ISQs and RQs (Asu, Sahkai & Lippus 2020).
Estonian canonical questions are similar to canonical statements in that they display a series of H*+L pitch accents (Asu 2004). Likewise, a comparison of RQs with string-identical ISQs revealed that RQs contained on average more pitch accents per utterance than ISQs but were otherwise phonologically identical to the latter (Asu, Sahkai & Lippus 2020).
A universal characteristic of canonical questions is higher pitch, either globally during the whole utterance or locally somewhere in the utterance (Haan 2001: 56). While higher pitch is a continuous phonetic parameter, it can be regarded as playing a meaningful role in signalling the questioning speech act. This has also been shown to be true for Estonian. For instance, declarative questions are distinguished from formally identical statements by a higher and later peak of the nuclear accent and an overall higher pitch (Vende 1973) as well as shallower declination (Asu 2002, 2004). Also, Estonian RQs are characterised by a significantly narrower pitch range and lower pitch than canonical questions (Asu, Sahkai & Lippus 2020; see similar findings on the phonetic realisation of speech acts in Braun et al. 2019).
It has been shown that voice quality can also play a role in signalling different speech acts. For instance, Braun et al. (2019) show that RQs in German are more frequently realised with breathy voice quality than ISQs. Preliminary observations from a pilot study on Estonian RQs (Asu, Sahkai & Lippus 2020) lead us to suggest that non-modal (creaky) voice quality can be used to additionally differentiate between canonical and non-canonical questions. Creaky voice quality is a common feature of Estonian speech (e.g. Aare, Lippus & Šimko 2017), which can be associated with various linguistic and communicative functions, the scope of which has, however, not been studied in depth.
Taking into account the previously reviewed literature, we make the following predictions:
1) SQs, as compared to ISQs, are realised with enhanced prosody, which is likely to be manifested in a longer total duration and a wider pitch range, including upstepped H*+L pitch accents.
2) Prosodic means signalling that the utterance is not a request for information but an expression of surprise include lower mean pitch, non-modal voice quality (creaky voice), and a different distribution of pitch accents, but not a different phonological inventory of pitch accents and boundary tones.
2. Materials and methods
The study compared the prosody of string-identical SQs and ISQs that were elicited by means of context descriptions. The design of the experiment as well as the types of interrogatives were inspired by Celle & Pélissier (2021). All data, analysis code, and research materials of the study are available at the Open Science Framework repository: https://osf.io/knygh/.
2.1 Materials
The materials included two sets of open interrogative sentences. Each set contained 12 sentences. All sentences consisted of a non-subject interrogative phrase (wh-phrase) followed by a pronominal subject and a finite verb.
In the first set of interrogative sentences (mis-interrogatives), the wh-phrase corresponded to the predicative complement and consisted of the interrogative determiner mis (‘what’) and a nominative singular noun. The noun was varied, being monosyllabic, CVVC, in six cases (e.g. pood ‘shop’) and disyllabic, CV.CV(C), in six cases (e.g. pidu ‘party’). The subject was always the demonstrative pronoun see (‘this’) in nominative singular, and the finite verb was always the present tense third person singular form of the copula olema ‘to be’ (on ‘be.3SG’), e.g. Mis kook see on? (‘What cake is this?’).
In the second set of interrogatives (mida-interrogatives), the wh-phrase corresponded to the object and consisted of the interrogative pronoun mida (partitive case of mis ‘what’). The subject was always the short form of the second person singular pronoun sa (‘you’) in the nominative case. The finite verb was in the present tense second person singular. The verb was varied, being monosyllabic (CVVC) in six cases (e.g. teed ‘do.2SG’) and disyllabic (CVV.CVC) in six cases (e.g. keedad ‘boil.2SG’), e.g. Mida sa keedad? (‘What are you boiling?’).
In order to elicit a surprise reading and an information-seeking reading of the structurally identical target sentences, two different contexts (situation descriptions) were created for each test sentence. The contexts for eliciting ISQs prompted the participant to imagine a situation that would require asking for information from a knowledgeable source. The contexts for eliciting SQs, on the other hand, prompted the participant to act surprised by including one of the following expressions: “you are surprised”, “you see/hear/find with surprise”, and “to your surprise”. In mis-SQs, the object of surprise was the identity or properties of the subject referent. In mida-SQs, the surprising element was the activity of the subject or the object of this activity. The surprise was caused by the speaker’s unprepared mind or counterexpectation regarding the object of surprise. In the case of counterexpectation, the context additionally described the expected situation and, in most cases, contained the adversative particle hoopis (‘instead’).
In addition to the two basic causes of surprise (the speaker’s unprepared mind or counterexpectation), the context descriptions contained two more features. In the case of mis-SQs, half of the contexts included the feature ‘incongruity’, i.e. in addition to being unexpected by the speaker or contrary to her expectation, the object of surprise was an atypical member of its category or the speaker was unable to categorise it. In the case of mida-SQs, half of the contexts explicitly specified that in addition to being surprised the speaker was also annoyed. It is possible that some of the descriptions may have implicitly induced some other attitudes. In particular, some of the contexts involving counterexpectation may have additionally induced a certain degree of disappointment or disapproval while others were more neutral.
Each context description also included an addressee who, in the case of mis-SQs, either shared the surprise of the speaker or participated in the situation that caused the speaker to be surprised and, in the case of mida-SQs, was the cause of surprise for the speaker due to his/her activity or the object of this activity. An example of a set of context descriptions can be found in Table 1.
In total, there were 48 contexts eliciting the target sentences (12+12 mis-interrogatives and 12+12 mida-interrogatives). Additionally, twice as many fillers were used, including requests and exclamations along with string-identical ISQs.
2.2 Informants
Twenty-one speakers of Standard Estonian participated in the recordings. They were all right-handed women between 20 and 32 years old. All the informants could speak at least one foreign language and only two informants had lived abroad longer than one year. The informants were remunerated for their participation.
2.3 Procedure
The recordings were made in the sound-treated recording booth of the phonetics laboratory of the University of Tartu using a Praat (Boersma & Weenink 2020) demo script. The recording sessions were self-paced and took on average 35 to 40 minutes to complete. The informants were asked to first silently read the context description that appeared on the computer screen, and when ready they could proceed to the next slide where the test sentence was displayed. They had five seconds to record the test sentence. If needed, the latest test sentence could be re-recorded. Each recording session was preceded by three trial contexts in order to make sure that the participant understood the test procedure. All the materials (144 situation descriptions, including 48 test items and 96 fillers) were presented to each participant in randomised order at one sitting. Each context and test item appeared only once.
2.4 Analysis
The recordings were segmented with a forced aligner using automatic speech recognition (Alumäe, Tilk & Asadullah 2018), and the segment boundaries were manually corrected. The data were annotated for intonational pitch accents and boundary tones following the transcription system for Estonian intonational phonology (Asu 2004).
The utterance duration in seconds and pitch in Hertz from 100 equidistant points were extracted using a Praat script. Additionally, the following F0 measures were calculated: utterance mean F0; pitch range between the 5% and 95% quantiles within the utterance (rather than absolute minimum and maximum values in order to minimise random measurement errors); and F0 in the beginning and end of the utterance as the mean of the first and the last vowels, respectively. All F0 measures were converted from Hertz to the semitone scale with reference to the speaker’s mean. The results were tested in R (R Core Team 2020) using the packages lme4 (Bates et al. 2015) and lmerTest (Kuznetsova, Brockhoff & Christensen 2017). The acoustic measures of duration and F0 were fitted with linear mixed models, with condition (levels ISQ and SQ) and type of interrogative (levels mis and mida) as fixed effects, and random intercepts and random slopes of condition for speaker and item. The durations were log-transformed in order to approach a normal distribution. The occurrence of creaky voice as a binary factor was tested with a logistic mixed model with the same fixed and random factors.
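The F0 measures described above can be illustrated with a short Python sketch. It is not the study's own Praat or R code (which is available in the OSF repository); the function names are hypothetical, and it assumes that unvoiced frames are coded as 0 Hz, that the utterance mean is taken over semitone values, and that quantiles are linearly interpolated.

```python
import math

def hz_to_semitones(f0_hz, ref_hz):
    """Semitone distance of f0_hz from ref_hz (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

def utterance_f0_measures(f0_hz, speaker_mean_hz):
    """Mean F0 and 5%-95% F0 range in semitones re the speaker's mean.

    Assumption: frames with f0 <= 0 are unvoiced and are skipped.
    """
    st = [hz_to_semitones(f, speaker_mean_hz) for f in f0_hz if f > 0]
    return {
        "mean_st": sum(st) / len(st),
        "range_st": quantile(st, 0.95) - quantile(st, 0.05),
    }
```

Using the 5% and 95% quantiles instead of the minimum and maximum means that a single octave-jump tracking error barely moves the range estimate, which is the motivation given in the text.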
3. Results
The results are presented in two parts: first, the analysis of intonational phonology in terms of phonological pitch accents and boundary tones as well as accentuation patterns, and second, the phonetic analysis of the various prosodic features.
3.1 Intonational features
3.1.1 Pitch accents
The most frequently occurring pitch accent was H*+L followed by its upstepped variant ^H*+L. The upstepped pitch accent occurred almost twice as frequently in SQs as in ISQs (117 vs. 60) and was particularly common in the nuclear position of mis-interrogatives (83 SQs vs. 43 ISQs).
The otherwise common H+L* was rare in the present data, occurring only in 18 instances, all mida-interrogatives (16 ISQs vs. 2 SQs), and only in the data of four speakers. Eleven of all the instances occurred in the data of one speaker.
3.1.2 Boundary tones
Most of the utterances ended on a low pitch; therefore, the final boundaries were labelled as 0%. Only 24 utterances ended with a high boundary (H%). The final rise was clearly more common with ISQs (19 vs. 5) and more common in mida-interrogatives than in mis-interrogatives (18 vs. 6). A high final plateau (H* 0%) occurred in only six cases and was also more common with mida-interrogatives (4 vs. 2).
3.1.3 Accentuation patterns
Table 2 gives an overview of the different accentuation patterns in the data (disregarding the type of pitch accent). Pitch accented constituents are marked with capital letters. All utterances were grouped into five accentuation patterns. The two most common patterns were pattern 2 (WH-subj-VERB, 651 test utterances) and pattern 3 (WH-SUBJ-verb, 218 utterances). The former is characteristic of mida-interrogatives in both speech act conditions and the latter is more common for mis-interrogatives, in particular mis-SQs.
In all the patterns, the wh-phrase was overwhelmingly accented in both speech act conditions in both mis- and mida-interrogatives. It was unaccented only in one pattern (pattern 5), including nine utterances – all mida-interrogatives (3 ISQs and 6 SQs) where the only pitch accent was on the subject. In mis-interrogatives, there was always a pitch accent within the wh-phrase. The accent was on the interrogative pronoun, the noun, or both (these different options are not represented in Table 2). The least frequent pattern among these was the one where only the wh-pronoun received a pitch accent and this type was more frequent in SQs (53 SQs vs. 21 ISQs).
The subject pronoun was unaccented in two patterns (patterns 1 and 2) and accented in three patterns (patterns 3 to 5). The accentuation of the subject pronoun was characteristic of SQs: it was accented in 130 (52%) mis-SQs and 36 (14%) mida-SQs, as opposed to 54 (21%) and 14 (6%) ISQs, respectively.
The verb was accented in two patterns (patterns 2 and 4) and unaccented in three patterns (patterns 1, 3, and 5). The deaccentuation of the verb was characteristic of SQs. The verb was unaccented in 185 (73%) mis-SQs and 31 (12%) mida-SQs, as opposed to 120 (48%) and 14 (6%) ISQs, respectively.
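The percentages reported for subject accentuation and verb deaccentuation can be cross-checked against the design. The Python sketch below assumes that each cell (interrogative type by condition) contains 21 speakers × 12 items = 252 utterances, i.e. that no utterances were excluded; this denominator is an inference from the design and is not stated in the text.

```python
SPEAKERS = 21
ITEMS_PER_CELL = 12
# Assumed denominator: 252 utterances per cell, with no exclusions.
N_PER_CELL = SPEAKERS * ITEMS_PER_CELL

# (feature, cell, reported count, reported percentage)
REPORTED = [
    ("subject accented", "mis-SQ", 130, 52),
    ("subject accented", "mida-SQ", 36, 14),
    ("subject accented", "mis-ISQ", 54, 21),
    ("subject accented", "mida-ISQ", 14, 6),
    ("verb unaccented", "mis-SQ", 185, 73),
    ("verb unaccented", "mida-SQ", 31, 12),
    ("verb unaccented", "mis-ISQ", 120, 48),
    ("verb unaccented", "mida-ISQ", 14, 6),
]

def rounded_percent(count, total):
    """Percentage of count in total, rounded to the nearest integer."""
    return round(100 * count / total)

def mismatches(rows, total):
    """Rows whose reported percentage differs from the recomputed one."""
    return [row for row in rows if rounded_percent(row[2], total) != row[3]]

print(mismatches(REPORTED, N_PER_CELL))
```

Under this assumption every reported percentage is reproduced, so the printed list of mismatches is empty.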
The nuclear accent can occur on any of the three constituents. It was least frequently on the wh-phrase and this placement did not distinguish the two speech act conditions (pattern 1). When the subject pronoun received an accent, this accent was nearly always also the last accent (see patterns 3 to 5). Consequently, SQs were characterised by nuclear accent placement on the subject pronoun. The finite verb was overall the most frequent location of the nuclear accent (patterns 2 and 4) except in mis-SQs, where the nuclear accent was most often on the subject pronoun.
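The five accentuation patterns can be summarised as a lookup from the accented constituents to a pattern number. In the Python sketch below, patterns 2 and 3 are given explicitly in the text; patterns 1, 4, and 5 are reconstructed from the description of which constituents are accented in which patterns, so their exact form is an inference rather than a quotation of Table 2. The function names are hypothetical.

```python
# Key: (wh-phrase accented, subject accented, verb accented)
PATTERNS = {
    (True, False, False): 1,  # WH-subj-verb (inferred)
    (True, False, True): 2,   # WH-subj-VERB (given in the text)
    (True, True, False): 3,   # WH-SUBJ-verb (given in the text)
    (True, True, True): 4,    # WH-SUBJ-VERB (inferred)
    (False, True, False): 5,  # wh-SUBJ-verb (inferred)
}

CONSTITUENTS = ("wh-phrase", "subject", "verb")

def accentuation_pattern(wh, subj, verb):
    """Pattern number for a combination of accents, or None if unattested."""
    return PATTERNS.get((wh, subj, verb))

def nuclear_constituent(wh, subj, verb):
    """The nuclear accent is the last pitch accent in the utterance."""
    accented = [name for name, flag in zip(CONSTITUENTS, (wh, subj, verb)) if flag]
    return accented[-1] if accented else None

for flags, number in sorted(PATTERNS.items(), key=lambda kv: kv[1]):
    print(number, nuclear_constituent(*flags))
```

The loop lists the nuclear position for each pattern: the wh-phrase in pattern 1, the verb in patterns 2 and 4, and the subject in patterns 3 and 5.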
3.2 Phonetic features
The analysis of the phonetic features is based on the whole data set except in the calculation of final F0, where the utterances ending with the high boundary tone and the high plateau and those containing creaky voice were excluded. The average pitch contours were based on utterances that were phonologically identical in terms of pitch accent distribution, pitch accent types, and boundary tone.
3.2.1 Duration
Figure 1 compares the utterance duration of ISQs and SQs in mis- and mida-interrogatives. SQs are significantly longer in duration than ISQs for both sets of interrogatives (β = 0.13, t = 5.88, p < 0.001), although the effect is slightly stronger for mida-interrogatives (β = 0.06, t = 2.11, p = 0.047). As mida-interrogatives contained three words as opposed to the four words of mis-interrogatives, there was a significant difference in their duration (β = –0.19, t = –6.2, p < 0.001).
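Because the durations were log-transformed, the coefficients can be read as approximate duration ratios. The Python sketch below is only an illustration: it assumes a natural-log transform and treatment coding with ISQ and mis-interrogatives as reference levels, neither of which is stated explicitly in the text.

```python
import math

BETA_CONDITION = 0.13    # reported effect of SQ vs. ISQ
BETA_INTERACTION = 0.06  # reported additional effect for mida-interrogatives

def ratio_from_log_coefficient(beta):
    """Multiplicative change implied by a coefficient on the natural-log scale."""
    return math.exp(beta)

# Assumed reading: mis is the reference level, so the interaction adds to it.
mis_ratio = ratio_from_log_coefficient(BETA_CONDITION)
mida_ratio = ratio_from_log_coefficient(BETA_CONDITION + BETA_INTERACTION)

print(f"mis-SQs:  about {100 * (mis_ratio - 1):.0f}% longer than mis-ISQs")
print(f"mida-SQs: about {100 * (mida_ratio - 1):.0f}% longer than mida-ISQs")
```

Under these assumptions, SQs would be roughly 14% longer than ISQs in mis-interrogatives and roughly 21% longer in mida-interrogatives.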
3.2.2 F0 mean and F0 range
The left panel of Figure 2 displays the comparison of F0 means (in semitones) in ISQs and SQs. Mean pitch is significantly lower for SQs (β = –0.9, t = –5.37, p < 0.001). It is significantly higher in mida-interrogatives for both ISQs and SQs (β = 0.69, t = 5.13, p < 0.001): 0.77 vs. 0.05 semitones in ISQs and –0.17 vs. –0.82 semitones in SQs.
The right panel of Figure 2 displays the comparison of F0 range (in semitones) in the two speech acts. There is a significant difference between ISQs and SQs for mida-interrogatives (β = 0.21, t = 4.1, p < 0.001), where the F0 range is wider in SQs, while this effect is not significant for mis-interrogatives (β = 0.02, t = 0.39, p = 0.69). The F0 range in mida-interrogatives is significantly narrower than in mis-interrogatives (β = –0.21, t = –6.31, p < 0.001): 6.94 vs. 8.58 semitones for ISQs and 8.85 vs. 8.92 semitones for SQs.
3.2.3 Utterance-initial and final F0
Figure 3 compares utterance-initial (left panel) and utterance-final (right panel) F0 (in semitones) in ISQs and SQs. For the initial F0, mis- and mida-interrogatives did not differ significantly, and the comparison is based on the presence or absence of an intonational pitch accent on the wh-pronoun. There is a significant difference between the utterances where the wh-pronoun is accented and those where it is not accented: utterances with accent on the interrogative word start higher (β = 1.23, t = 8.77, p < 0.001). Independent of the accentuation, the initial F0 in SQs is significantly lower than in ISQs (β = –0.42, t = –2.73, p = 0.013): 0.69 vs. 1.52 semitones for the non-accented condition and 2.41 vs. 2.70 semitones for the accented condition.
The right panel of Figure 3 compares the final F0 (in semitones) in the two speech acts. Utterances ending with H% and those containing creaky voice were excluded from this comparison. There is a significant difference between ISQs and SQs in the final F0 height: SQs end at a lower pitch than ISQs (β = –0.89, t = –4.75, p < 0.001): –4.06 vs. –3.01 semitones for mida-interrogatives and –3.80 vs. –2.50 semitones for mis-interrogatives. The difference between the final F0 of the two interrogatives is significant: mida-interrogatives end significantly lower (β = –0.3, t = –2.35, p = 0.025).
3.2.4 Voice quality (creaky voice)
Figure 4 displays the occurrence (left panel) and relative duration (right panel) of creaky voice in the data. Utterance-final creaky voice occurs significantly more often in SQs than in ISQs (β = 1.06, z = 4.61, p < 0.001): 42% vs. 22% of all the utterances. There is, however, a large inter-speaker variation as to the occurrence of creaky voice ranging between 4% and 73% of the utterances per speaker (mean 32%). In the utterances containing creaky voice, its duration is significantly longer in SQs (β = 0.17, t = 2.95, p = 0.003). There is no significant difference between the two wh-interrogatives with respect to the occurrence or duration of creaky voice.
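The logistic coefficient for creaky voice can be related to the raw proportions by converting both to odds ratios. The Python sketch below is a rough illustration only: the model estimate is conditional on the random effects, so it is not expected to match the ratio computed from the pooled percentages.

```python
import math

BETA_CREAK = 1.06  # reported log-odds coefficient for SQ vs. ISQ
P_SQ = 0.42        # reported share of SQs with utterance-final creak
P_ISQ = 0.22       # reported share of ISQs with utterance-final creak

def odds(p):
    """Odds corresponding to a probability strictly between 0 and 1."""
    return p / (1 - p)

model_odds_ratio = math.exp(BETA_CREAK)
raw_odds_ratio = odds(P_SQ) / odds(P_ISQ)

print(f"model-based odds ratio: {model_odds_ratio:.2f}")
print(f"raw odds ratio:         {raw_odds_ratio:.2f}")
```

Both figures point in the same direction: the odds of creaky voice are between two and three times higher in SQs than in ISQs.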
3.2.5 Average pitch contours
Figure 5 presents average time-normalised pitch contours for the two most frequently occurring accentuation patterns – pattern 2 (WH-subj-VERB) and pattern 3 (WH-SUBJ-verb) – that were identical in the two speech act conditions (as presented in Table 2 above) and where there were enough data for a meaningful comparison. The left panel shows average pitch contours of SQs and ISQs for the mis-interrogatives of pattern 2 (119 ISQs and 64 SQs), the middle panel for the mida-interrogatives of pattern 2 (207 ISQs and 209 SQs), and the right panel for the mis-interrogatives of pattern 3 (30 ISQs and 67 SQs). It should be noted that fewer test items could be included in the comparison for pattern 3 due to more variation in the placement of pitch accents within the wh-phrase; only such utterances were included where the noun received a pitch accent.
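Time normalisation of the kind used for the average contours can be sketched as follows. This is a minimal Python illustration, not the study's Praat script: it assumes linear interpolation between F0 samples, strictly increasing time stamps, and fully voiced input, and the function names are hypothetical.

```python
def time_normalise(times, f0, n_points=100):
    """Resample an F0 track at n_points equidistant times by linear interpolation."""
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two paired samples")
    start, end = times[0], times[-1]
    step = (end - start) / (n_points - 1)
    resampled = []
    seg = 0  # index of the left sample of the current interpolation segment
    for i in range(n_points):
        # Pin the last point to the final sample to avoid floating-point overshoot.
        t = end if i == n_points - 1 else start + i * step
        while seg < len(times) - 2 and times[seg + 1] < t:
            seg += 1
        t0, t1 = times[seg], times[seg + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append(f0[seg] * (1 - weight) + f0[seg + 1] * weight)
    return resampled

def average_contour(contours):
    """Point-wise mean of equally long, time-normalised contours."""
    return [sum(column) / len(column) for column in zip(*contours)]
```

Once every utterance is reduced to the same number of points, utterances of different durations can be averaged point by point, which is why only utterances with identical accentuation patterns are pooled in such a comparison.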
It can be seen from Figure 5 that SQs have a lower mean pitch than ISQs. They also have a lower initial and final F0 except in mis-SQs in pattern 3 (right panel) where the nuclear accent is on the subject pronoun. A wider F0 range of the wh-pronoun can be observed in mida-SQs (middle panel).
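The general idea behind average time-normalised contours such as those in Figure 5 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's actual procedure: it assumes that each utterance is represented as a plain list of F0 samples (in semitones) with unvoiced frames already removed, that time is normalised over the whole utterance rather than per word or segment, and the function names and the number of points are hypothetical.

```python
def resample(contour, n_points):
    """Linearly interpolate F0 samples onto n_points equally spaced
    positions in normalised time (0 = first sample, 1 = last sample)."""
    if len(contour) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(contour) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into contour
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def average_contour(contours, n_points=20):
    """Resample every contour to the same length, then average pointwise."""
    resampled = [resample(c, n_points) for c in contours]
    return [sum(vals) / len(vals) for vals in zip(*resampled)]

# Hypothetical toy contours of different lengths (semitones).
toy = [[0.0, 10.0], [2.0, 4.0, 6.0]]
print(average_contour(toy, n_points=3))
```

Resampling to a common number of points is what allows utterances of different durations, such as the longer SQs and the shorter ISQs, to be overlaid and averaged on a single normalised time axis.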
4. Discussion
The aim of the study was to investigate prosodic characteristics of Estonian SQs by comparing them to string-identical ISQs. Two predictions were made. First, SQs were expected to be realised with enhanced prosody manifested in a longer total duration, a wider pitch range, and more upstepped pitch accents compared to ISQs, and second, prosodic characteristics signalling that the utterance is not a request for information were expected to include lower mean pitch, non-modal voice quality, and a different distribution of pitch accents.
As predicted, we found evidence of enhanced prosodic realisation in SQs compared to ISQs: SQs had a significantly longer duration, and the pitch range over the entire utterance was significantly wider for SQs. The latter was, however, true only of mida-interrogatives and not of mis-interrogatives.
Additionally, the average pitch contour of mida-SQs shown in Figure 5 (middle panel) suggests that the wh-phrase of mida-SQs can receive an emphatic pre-nuclear accent. SQs also contained more upstepped accents, which can likewise be associated with emphasis in Estonian (Sahkai, Mihkla & Kalvik Reference Sahkai, Mihkla and Kalvik2015). All these prosodic characteristics convey the emotional expressivity of SQs.
As predicted, several prosodic characteristics of SQs differentiating them from ISQs can be used to signal that SQs are not canonical questions, as SQs were shown to have significantly lower mean, initial, and final pitch and a larger proportion of creaky voice quality. This is in accordance with earlier findings according to which Estonian ISQs are systematically distinguished from statements as well as from RQs by a higher pitch level. As mentioned before, high pitch is used to indicate the questioning speech act cross-linguistically (e.g. Haan Reference Haan2001). Consequently, the main prosodic feature signalling a request for information in Estonian can be taken to be a higher pitch level, while the absence of such a request is signalled by lower pitch, which can additionally be reinforced by creaky voice quality.
Also, as predicted, pitch accent types and boundary tones did not play a role in distinguishing SQs from ISQs. The H*+L pitch accent was the overwhelmingly predominant pitch accent in the data in both speech act conditions. The otherwise relatively frequent H+L* pitch accent only appeared in 18 instances. This is in accordance with earlier studies, which have shown that H+L* is frequent in statements but absent from questions (Asu & Nolan Reference Asu and Nolan2007). The present study suggests that H+L* is also absent from non-canonical questions (as also shown for RQs in Asu et al. Reference Asu, Sahkai and Lippus2020). Consequently, while non-canonical questions are similar to statements in terms of pitch level, they are different from statements in terms of the use of the H+L* pitch accent. The high boundary tone (H%) was more frequent in ISQs but overall rare in the data (only 24 instances). The realisation of intonation is extremely speaker specific, which is evidenced here by the occurrence of the high boundary tone in the data of only nine speakers and that of the H+L* pitch accent in the data of just four speakers.
Based on earlier findings regarding RQs in Estonian (Asu, Sahkai & Lippus Reference Asu, Sahkai and Lippus2020), it was also predicted that the difference between ISQs and SQs could be manifested in a different distribution of pitch accents. This prediction was borne out, in particular in mis-interrogatives. However, while RQs contained on average more pitch accents per utterance than ISQs, without change in the placement of the nuclear accent, SQs were distinguished by a different placement of the nuclear accent but not by the number of pitch accents. The most distinctive property of SQs in terms of accent distribution was nuclear accent placement on the subject pronoun, signalling narrow focus on the subject. In ISQs, the most frequent location for nuclear accent was the utterance-final finite verb. The subject pronoun was more often accented in mis-SQs than in mida-SQs. This can be associated with the fact that, in mis-SQs, the object of surprise, as described in the eliciting context, was the identity or properties of the referent of the subject pronoun, while in mida-SQs, it was the activity of the subject or the object of this activity. As can be seen from Table 3, the accentuation of the subject of mis-SQs was most frequently triggered by the contexts that contained the feature ‘incongruity’, that is, where the referent of the subject was an atypical member of its category in addition to being unexpected by or contrary to the expectation of the speaker. In such cases, the focus accent on the subject can be taken to contrast the observed referent with the speaker’s conception of the typical members of the relevant category. In other words, it can be taken to evoke an alternative set (in terms of Rooth Reference Rooth1992), including the typical members of the category. Moreover, the counterexpectation of the speaker, unaccompanied by incongruity, also favoured the accentuation of the subject. 
In these cases, the focus on the subject can be taken to contrast the observed entity with the one(s) expected by the speaker, or, put differently, it can be taken to evoke an alternative (set) that consists of the expected entity or entities.
In mida-SQs, the surprise of the speaker was not caused by the identity or properties of the subject referent but by the subject’s activity or the object of this activity. Nevertheless, in 14% of mida-SQs (36 instances), the subject was accented. This could suggest that the focus on the subject can occasionally create a contrast involving other unexpected elements in the situation besides the subject referent. In the case of mida-SQs, all of the elicitation contexts were of the counterexpectation type but differed with respect to whether the surprise was accompanied by disapproval or not. It can be observed that only three of the 36 mida-SQs with an accented subject were triggered by a context with the disapproval component. This is in accordance with the interpretation that the narrow focus on the subject signals the counterexpectation of the speaker, which is less in the foreground in the disapproving utterances.
ISQs, too, can have a narrow focus on the subject pronoun. This was the case in 54 mis-ISQs and 14 mida-ISQs. All ISQ-eliciting contexts that induced more than two renderings with an accented subject evoked a wider set of entities that included the subject referent and was present in the utterance situation.
In summary, the focused subjects in the data evoke alternatives that have different sources. In ISQs, the subject referent is part of a wider set of entities that are present in the context of the utterance. In mis-SQs, the alternatives to the subject referent consist of the entities expected by the speaker or of the typical members of the category to which the subject referent belongs. In mida-SQs, the alternatives are contributed by the speaker’s expectations regarding the wider situation: the activity of the subject or the object of this activity. The relationship between mirativity and focus has repeatedly been noted in earlier literature (see Cruschina Reference Cruschina2021 for an overview). Cruschina (Reference Cruschina2021) defines mirative focus in terms of conventional implicature. According to the definition, mirative focus implies that there is at least one focus alternative proposition that is more likely than the asserted proposition with respect to a contextually relevant modal base (the context set) shared by the speakers and a stereotypical ordering source defining the normal course of events. This account of mirative focus needs to be developed further before it can be applied to SQs, because these do not explicitly assert a proposition.
In addition to the various differences between SQs and ISQs, the results of the study also revealed differences between mis- and mida-interrogatives. Mida-interrogatives had a significantly higher mean pitch, lower final pitch, and narrower pitch range than mis-interrogatives. The higher mean pitch and narrower pitch range might be partly explained by the more frequent occurrence of utterance-final rise in mida-interrogatives included in this calculation. We do not, however, have an explanation for the lower final pitch in mida-interrogatives. Also, the features that distinguished SQs from ISQs turned out to be somewhat different in mis- vs. mida-interrogatives. Generally speaking, mis-SQs were more strongly distinguished phonologically, in terms of the placement of the nuclear accent, while mida-SQs showed more phonetic differences, in particular, in terms of pitch range, which did not distinguish between mis-SQs and mis-ISQs.
The results of the study further suggest that prosody distinguishes Estonian SQs not only from ISQs but also from RQs. Both SQs and RQs have a lower mean pitch and a larger proportion of creaky voice than ISQs, which could signal that they are not used to seek information. Likewise, both seem to be incompatible with the use of the H+L* pitch accent, which is restricted to statements. However, SQs tend to have a wider pitch range than ISQs, while RQs have a narrower pitch range than ISQs; this is consistent with the emotional expressivity of SQs that is absent from RQs. Both RQs and SQs have a longer duration than ISQs, although in RQs this can be associated with there being more pitch accents per utterance, which is not the case in SQs. Finally, SQs, unlike RQs, may have a characteristic information structure: contrastive focus signalling an alternative (set) that arises from the expectations of the speaker.
It remains to be studied whether SQs also differ from exclamatory questions. Recent studies on the prosody of exclamatory questions in German (Repp Reference Repp2020; Repp & Seeliger Reference Repp and Seeliger2020) and English (Rett & Sturman Reference Rett and Sturman2020) suggest that exclamatory questions differ from ISQs in ways similar to SQs: they are longer in duration, contain an emphatic pitch accent, and show a different distribution and different types of pitch accents.
Further study is also needed in order to establish whether there are any other phonetic characteristics that differentiate canonical and non-canonical questions. For instance, in the current study intonational peak alignment was not investigated, but, as SQs are longer in duration, they might exhibit different peak alignment compared to ISQs. Also, in addition to creaky voice, occurrences of breathy voice quality and laughter were observed in the data but not quantified for the present purposes.
5. Conclusions
The study examined the prosody of a little-studied type of non-canonical question in Estonian: canonical interrogative sentences expressing surprise on the part of the speaker. The study was carried out by comparing string-identical ISQs and SQs and was based on data elicited by means of context descriptions. On the basis of earlier cross-linguistic studies on the prosody of SQs, it was predicted that prosody would play two roles: first, to convey the emotional expressivity of SQs by a longer duration and a wider pitch range, and second, to signal that the utterances are not canonical questions by a lower pitch level, non-modal (creaky) voice quality, and a distinct distribution of pitch accents. Based on earlier studies on Estonian intonation, it was predicted that emphasis would additionally be manifested in a larger number of upstepped pitch accents. No differences were expected in terms of the phonological inventory of pitch accents and boundary tones.
These predictions were mostly borne out. SQs had a longer duration and a larger proportion of upstepped accents and tended to have a wider pitch range, suggesting that they were realised emphatically. They were also characterised by significantly lower mean, initial, and final pitch and a larger proportion of creaky voice quality, signalling that the utterance is not a canonical question. Additionally, it was shown that as a third role, prosody signalled a characteristic information structure that we associated with the expression of the speaker’s surprise caused by an incongruous or counterexpectational entity or state of affairs: we took a focal accent on a referring expression to evoke an alternative (set) arising from the speaker’s expectations.
As predicted, pitch accent types and boundary tones did not play a role in marking surprise. Still, both ISQs and SQs were characterised by the absence of the H+L* pitch accent, which is frequent in Estonian statements, implying that this might be a common feature of canonical and non-canonical questions.
The results of the study suggest that SQs differ not only from ISQs but also from RQs, in particular in terms of emphasis and information structure. Further study is needed to establish whether they also differ from exclamatory questions.