1. Introduction
Tima (tms), a language of Sudan, is typologically unusual in having a 12-vowel system made up of six pairs contrasting in advanced tongue root (ATR), shown in (1) (Dimmendaal 2009; Bashir 2010; Tabain & Schneider-Blum 2024).Footnote 1 The six pairs include a low central vowel pair as well as a high central vowel pair; both of these (but especially the high central pair) are unlikely contrasts in an ATR system. As we will show, all vowels participate fully in Tima’s ATR harmony process.
We present for the first time an acoustic study of Tima vowels, considering the results in light of questions of particular interest to phonologists. Our main goal is to understand how vowel contrasts might be maintained in Tima given the large number of vowels in combination with proposed constraints on ATR inventories. For example, it is unusual to find a [+ATR] counterpart of /a/ that is [+low], or to find central vowels in an ATR inventory. After surveying these proposed constraints and demonstrating the basic facts of vowel harmony in Tima, this study focuses on the acoustic realisation of these vowels, considering not just their place in the vowel formant space, but also their voice quality and duration characteristics. We show, among other things, that /ʌ/, the [+ATR] counterpart of /a/, patterns as a mid vowel; that the central vowels /ʌ/ and /ɘ/ overlap considerably but differ markedly in duration; and that, while F1 is the primary individual acoustic correlate of the Tima ATR contrast, as is typical of ATR languages, there are also systematic differences in voice quality between the [+ATR] and [−ATR] vowel classes. We contextualise the latter findings in terms of the Laryngeal Articulator Model of Esling et al. (2019).
In addition to the above, this study contributes to our general understanding of the acoustic correlates of ATR contrasts, and is notable for presenting voice quality measures for such a large vowel inventory. Finally, it makes a contribution to our descriptive understanding of Tima, a highly understudied and marginalised language.
2. Tima background and ATR harmony
2.1. Tima background
Tima is commonly assumed to be a Niger-Congo language (though see discussion in Güldemann 2018 for a comprehensive review of the classification of African languages). It is spoken in the Nuba Mountains of Sudan, in north-eastern Africa. The language is spoken in the home area of the Tima by roughly 7,000 people, and dialectal variation within the close-knit society is not attested.Footnote 2 Additional speakers are found in smaller communities in the bigger towns of Sudan such as Khartoum and Port Sudan (Meerpohl 2012: 23–24). Tima is part of the Katloid language cluster, which includes Katla, Julut and Tima, with Tima being the most distinct of the three (Dimmendaal 2018: 6). All three members of this group are regarded as endangered, mainly due to the spreading influence of Arabic in recent decades, but also due to greater speaker mobility (see Hashim et al. 2020: 175–176).
There is, broadly speaking, a decline in speaking fluency from the eldest to the youngest speakers of Tima. The Tima people are not only exposed to Arabic as the lingua franca and official language of Sudan, but also to English and Kiswahili, which were introduced into the school system as a result of the extremely difficult circumstances of the second civil war (1983–2005) (see Meerpohl 2012: §3.1). As Hashim et al. (2020) point out, these various languages are often not perceived as distinct units, but together form a system that exploits all of them to various degrees in various contexts. In contrast, most neighbouring languages do not appear to show any influence on the Tima lexicon, which itself is comparatively small, a fact which is compensated for by making use of metaphor, metonymy and synecdoche (see Schneider-Blum 2012; Schneider-Blum & Dimmendaal to appear). However, a good number of words from Arabic (or via Arabic) have entered the lexicon, with the new words mostly being morphophonologically adapted to the Tima system (see Hashim et al. 2020); such Arabic-origin roots were not considered in the present study.
Tima has the 21 consonants /p b t d ʈ c ɟ k ɡ ʔ ɓ m n ɲ ŋ r ɽ h l j w/.Footnote 3 This inventory is notable in lacking oral fricatives. Tima also maintains a two-way tonal contrast between high and low, as well as having downstepped high (↓H) realisations.
2.2. Tima vowels and vowel harmony
Our primary interest is in the vowel phonemes of Tima, which are shown in Table 1. Tima has 12 vowels, made up of six pairs contrasting in ATR (Dimmendaal 2009; Bashir 2010; Tabain & Schneider-Blum 2024). Table 1 shows these vowels as they are represented in the orthography of the language (first row) and as they would be represented using the International Phonetic Alphabet diacritics for advancement and retraction (second row). This table makes clear our assumptions about the featural representations of Tima vowels. Anticipating one of our results, we treat the vowel /ʌ/ as [−low], though it is the ATR counterpart of /a/.
Tima also has a vowel length contrast: every vowel except /ɘ/ can be long or short, for example, kɔ́yɔ̀ ‘do, make, produce’ vs. kɔ́ɔ́yɔ̀ ‘skin, pelt, fur’; /ɘ/ occurs only as a short vowel.
For typographical ease, and since they are largely familiar from the phonological literature on ATR, we use the orthographic vowel symbols in this article (top row of Table 1). In line with previous work on Tima, in citing forms we also use orthographic 〈t̪〉 for the voiceless denti-alveolar /t/, 〈t〉 for the retroflex plosive /ʈ/ and 〈y〉 for /j/.
The (near-)minimal pairs in Table 2 show the ATR contrast. The first four pairs agree with regard to their tonal pattern, showing that the ATR value is independent of the tonal melody.
According to Casali (2008: 497), ‘[…] the greatest concentration of languages unquestionably regarded as having ATR harmony occurs within the Niger-Congo and Nilo-Saharan language families of sub-Saharan Africa’. Tima is among such languages; ATR harmony is regular and pervasive in the language. Within roots, vowels must agree in their ATR specification, as can be seen in Table 2. In other words, all vowels in a root must be drawn either from the set /i ɨ u e o ʌ/ or from the set /ɪ ɘ ʊ ɛ ɔ a/. In addition, a large range of prefixes, suffixes, proclitics and enclitics agree with roots/stems in ATR. Our description of the vowel patterns here relies on Bashir (2010) and on the fieldwork of author Schneider-Blum, including corrections to some of Bashir’s data.
A large number of Tima affixes and clitics also alternate in other features, though we are not concerned here with such alternations. Most commonly, [back] and [round] spread together (subject to a prohibition on front rounded vowels). We call this ‘colour harmony’, following Padgett (1995, 2002). Within the Katloid cluster of Tima, Katla and Julut, only Tima has colour harmony, and we mention it here only because it is evident in much of the Tima data. Like [ATR] harmony, colour harmony spreads from root vowels to those of affixes and clitics. Following Dimmendaal (2009: 335), we assume that colour harmony targets vowels that are [+high, +back]. There are a number of [+high, −back] affixes and clitics which do not undergo colour harmony (including the plural/collective prefix i/ɪ-, shown in (2) below), while the data at our disposal suggest that there are none with [+high, +back] vowels that resist harmony.Footnote 4
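The root-internal agreement requirement can be illustrated with a small R sketch (R being the language of the analysis command in §4.2). The helper `root_atr()` is hypothetical and not part of the study; it assumes toneless input strings written in the orthographic symbols used in this article, and simply checks whether all vowels come from the [+ATR] set /i ɨ u e o ʌ/ or from the [−ATR] set /ɪ ɘ ʊ ɛ ɔ a/.

```r
# Hypothetical helper (not from the study): check root-internal ATR agreement.
# Vowel sets follow the text; roots are given without tone marks, and any
# character outside the two sets (e.g. consonants) is ignored.
plus_atr  <- utf8ToInt("i\u0268ueo\u028c")                 # i ɨ u e o ʌ
minus_atr <- utf8ToInt("\u026a\u0258\u028a\u025b\u0254a")  # ɪ ɘ ʊ ɛ ɔ a

root_atr <- function(root) {
  cps <- utf8ToInt(root)                 # Unicode code points of the root
  n_plus  <- sum(cps %in% plus_atr)
  n_minus <- sum(cps %in% minus_atr)
  if (n_plus == 0 && n_minus == 0) return(NA_character_)  # no vowel found
  if (n_minus == 0) return("+ATR")
  if (n_plus == 0) return("-ATR")
  "mixed"                                # would violate root-internal harmony
}

root_atr("k\u0254y\u0254")   # kɔyɔ 'do, make, produce', tones omitted: "-ATR"
root_atr("k\u026ato")        # invented disharmonic string: "mixed"
```

Working on code points rather than on tone-marked orthography keeps the sketch independent of locale settings; a real implementation would first need to strip or decompose tone diacritics.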
Table 3 shows ATR (and colour) harmony at work with the nominal prefix kV-, one of the prefixes indicating singular number. As can be seen, the vowel of the prefix is [+high] but is otherwise fully predictable from the first root vowel. Since ATR harmony rarely co-occurs with inventories having non-low central vowels (as discussed later), and often does not affect /a/, it is worth noting here that the Tima vowels /ɨ ɘ ʌ a/ participate fully in ATR harmony; that is, they undergo harmony in addition to triggering it.
The examples in (2) show ATR harmony affecting the plural (or collective) prefix i/ɪ-, triggered by front, central and back vowels.Footnote 5 On adjectives, this prefix marks plural agreement; on verbal nouns, it means ‘in several places/times’ (Bashir 2010: 170). Since the prefix is underlyingly [−back], it does not undergo colour harmony.
The examples in (3) show that ATR harmony extends from roots to suffixes as well, again from front, central or back vowels. These examples show a mix of suffixes: the transitive (-I), antipassive (-Ak) and middle voice (-Vl) suffixes.Footnote 6 The precise realisation of these suffix vowels, apart from their ATR values, is a matter of allomorphy or assimilation, depending on the suffix. For example, the antipassive suffix (-Ak) may be [o/ɔ] (depending on ATR harmony) after [o/ɔ] (e.g., tóɽòk ‘support’; ŋɔ́lɔ̀k ‘draw water’), and is [ʌ/a] otherwise (again depending on ATR harmony).
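The affix patterns in (2) and (3) can be sketched in the same way. The functions below are hypothetical illustrations, not the authors' or Bashir's analysis: `antipassive()` implements the description of -Ak given above (treating the copying of [o/ɔ] as obligatory, although the text says it 'may' occur), and `plural()` picks i- or ɪ- from the root's ATR value without colour harmony. Input roots are toneless, and segmentations such as toɽ-ok for tóɽòk are assumed.

```r
# Hypothetical sketch of two ATR-harmonising affixes described in the text.
# Roots are toneless strings; root/suffix splits such as toɽ-ok are assumed.
plus_atr <- utf8ToInt("i\u0268ueo\u028c")                      # i ɨ u e o ʌ
vowels   <- c(plus_atr, utf8ToInt("\u026a\u0258\u028a\u025b\u0254a"))

root_vowels <- function(root) {
  cps <- utf8ToInt(root)
  v <- cps[cps %in% vowels]
  if (length(v) == 0) stop("root contains no vowel")
  v
}

# Antipassive -Ak: copies a root-final o/ɔ; otherwise ʌ ([+ATR]) or a ([-ATR]).
antipassive <- function(root) {
  v <- root_vowels(root)
  last <- v[length(v)]
  suffix_v <- if (last %in% utf8ToInt("o\u0254")) {
    last
  } else if (last %in% plus_atr) {
    utf8ToInt("\u028c")
  } else {
    utf8ToInt("a")
  }
  intToUtf8(c(utf8ToInt(root), suffix_v, utf8ToInt("k")))
}

# Plural/collective prefix i/ɪ-: ATR from the root, always front (no colour harmony).
plural <- function(root) {
  prefix <- if (root_vowels(root)[1] %in% plus_atr) "i" else "\u026a"
  paste0(prefix, "-", root)
}

antipassive("to\u027d")      # "toɽok" (cf. tóɽòk 'support')
antipassive("\u014b\u0254l") # "ŋɔlɔk" (cf. ŋɔ́lɔ̀k 'draw water')
```

Since roots are ATR-uniform, either the first or the last root vowel suffices to fix the affix's ATR value; the last vowel is used for the suffix only because the [o/ɔ] copying depends on it.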
Bashir (2010) describes the behaviour of roughly three dozen affixes and clitics. Across these bound forms, ATR harmony holds regularly, with a couple of exceptions. First, certain pronominal enclitics do not harmonise, and their vowels may be either [+ATR] or [−ATR]. The examples in (4) show the first person exclusive and inclusive plural enclitics (underlined). This failure to harmonise is most likely a matter of the harmony domain, though an analysis of such facts is beyond the scope of this article.
Second, there are affixes and clitics in which the vowel /a/ fails to harmonise. (5) shows the verbal instrumental marker -aa. However, this is not a general property of /a/: this vowel always harmonises within roots, and, for example, the antipassive suffix -Ak seen in (3) harmonises. The first person enclitic -dA in (5), based on Bashir (2010: 198), can also surface as [da] or [dʌ] depending on the harmony environment. In the examples below, it surfaces as [da] in harmony with instrumental -aa. Thus, the suffix -aa is opaque to harmony and spreads its own ATR value rightward.
To summarise the most important point of this section: ATR harmony is very regular in Tima. It is root-controlled, and importantly for our purposes, all 12 Tima vowels participate fully. This includes the low and central vowels.
3. Questions posed by the Tima ATR system
3.1. Tongue root–body synergies
In (6a)–(6c), we show schematically three commonly described ATR phoneme inventories. Languages exemplifying these inventories include Yoruba, Kinande and Akan (see Casali 2008 and Rose 2018b for discussion). For comparison, we also repeat the Tima inventory in (6d).Footnote 7
As (6) might suggest, a consistent acoustic correlate of ATR contrasts in languages is F1 (Halle & Stevens 1969; Lindau 1979), the same formant responsible for perceived differences in vowel height. This is because an ATR contrast involves manipulation of the pharyngeal cavity volume, with a larger volume correlating with a lower F1. Tongue root advancement and retraction, characteristic of ATR languages, is one means of achieving differences in pharyngeal cavity volume, but it is not the only one. In some ATR languages, a [+ATR] vowel implies a lower larynx position compared to a [−ATR] vowel (see, e.g., Lindau 1979); some languages manipulate pharyngeal cavity walls in addition to the tongue root (see Tiede 1996 on Akan). Since tongue body height also affects F1, only an articulatory study can establish the relative contributions of tongue body height and pharyngeal cavity expansion to an ATR contrast. Certainly, the tongue body need not be a contributing factor (Ladefoged 1964; Lindau 1979; Allen et al. 2013; Hudu 2014).
Consider the asymmetries, or gaps, seen in (6a)–(6c). These reflect broad cross-linguistic tendencies in ATR systems: phoneme inventories often lack [−ATR] high vowels, as in (6a), or [+ATR] mid vowels, as in (6b). Low vowels that are [+ATR], missing from all of (6a)–(6c), are uncommon. These asymmetries are likely rooted in articulatory synergies between tongue root advancement and tongue body height. Ladefoged et al. (1972) suggest such a synergy between tongue root advancement and tongue body raising, and MacKay (1976: 104–105) argues that tongue root advancement is difficult for low vowels and notes a correlation between tongue root advancement and tongue body fronting. The essential observation is that the tongue root and body are part of the same mass, so that tongue root advancement tends to cause raising and fronting of the tongue body, and tongue body raising and fronting tends to pull the tongue root forward. In the phonological literature, Archangeli & Pulleyblank (1994) and Calabrese (1995) exploit such notions to explain phonological generalisations. For example, based on implicational generalisations over phoneme inventories, patterns of sound change, and other considerations, Calabrese argues for featural implicational generalisations like those seen in (7).Footnote 8 Calabrese further argues that if (7a) holds, then (7b) and (7c) must also hold, and if (7b) holds, (7c) must also hold. Put differently, low [+ATR] vowels are the most marked. Calabrese shows that violations of these implications can be repaired in various ways: for example, a vowel that is [+low, +ATR] can become [−ATR] (neutralising with /a/), or become [−low] (raising to mid), among other possibilities.
These articulatory synergies have implications for the acoustics and perception of vowels. As Casali (2008) points out, the vowels [i u] and [ɛ ɔ] (as well as [a]) are especially favoured because they synergise ATR and height. In contrast, the vowels [ɪ ʊ] and [e o] occupy an uncomfortable middle ground. Pursuing this idea, if we take their ATR values to be fixed, [ɪ ʊ] must be under pressure to lower, while [e o] are under pressure to raise. Indeed, in many languages, [ɪ ʊ] can be confused with [e o], respectively, at least by field workers (Casali 2008; Rose 2018a). On the other hand, [ɪ ʊ] can be confused with [i u] as well, and Rose (2018a) shows that these sets of vowels can have close F1 values. Either way, [ɪ ʊ e o] are perceptually vulnerable as well as articulatorily disfavoured, and these vowels are often absent from ATR systems.Footnote 9
Returning to Tima, its vowel inventory is unusual in flouting all of the markedness generalisations in (7). One of the goals of our study is to explore how the ATR contrasts are realised given the articulatory and perceptual facts discussed above. For example, it has been observed (Casali 2008) that for languages having a low vowel ATR contrast, the [+ATR] low vowel is often not actually phonetically low as depicted in (6d) above. Is this true of Tima? If so, where exactly is this vowel with respect to /ɨ/ and /ɘ/? Likewise, we are interested in how the contrast is maintained between /ɪ ʊ/ and both /i u/ and /e o/.
3.2. High central vowels and inventory crowding
The Tima inventory includes the high central pair /ɨ ɘ/ contrasting in ATR. Based on a survey of 615 African languages with an ATR contrast, Rolle et al. (2017) conclude that there is incompatibility between having ATR harmony and having non-low ‘interior’ vowels. The articulatory synergies discussed above cannot account for such a gap: [+ATR] (which favours fronting and raising) is more compatible with central /ɨ/ than it is with back /u/, and [−ATR] (which favours backing and lowering) is more compatible with /ɘ/ than it is with /ɪ/.
We suggest that the incompatibility is a matter of inventory size and crowding. Putting aside contrasts in length, nasality, laryngealisation, etc., languages with 12 (or more) vowel qualities make up only about 3% of the languages surveyed by Maddieson (1984), and 1% of those discussed in Kingston (2007), who describes a larger sample of languages. Evidence that the distance between vowels matters (if it is needed) comes from the fact that larger inventories take up more of the vowel space than smaller inventories do (Becker-Kristal 2010) and that smaller inventories are more likely than larger ones to lack the corner vowels /i/ and /u/ (Sanders & Padgett 2010).
The Tima vowel inventory is also very large in the context of ATR languages. In her survey of ATR harmony systems in the Nuba Mountains (where Tima is spoken), Rose (2018b) mostly finds languages with 10 vowels or fewer. Katla and Julut, closely related to Tima, have 11 vowels (Birgit Hellwig, p.c.; Nüsslein 2020: 41).Footnote 10 Systems with 11 or 12 vowels are reported by Morton (2012) and Vahoua (2011) for Anii (probably Kwa) and for Bété (kpɔ̍kɷ̀gbɷ̀/kpokolo, Eastern Kru), respectively.Footnote 11 In addition, Zogbo (2019: 728–730) lists several other Eastern Kru languages (Godié, Koyo, Guibéroua Bété and Gbawale) that have 13-vowel systems, with four central vowels participating in the ATR harmony system as well as neutral /a/. Thus, Tima is not unique. Yet all of these languages represent a relative extreme; as noted earlier, inventories with 7–10 vowels are much more common.
Given these considerations, our study is interesting for what it can tell us about how Tima’s vowels differ from each other. For example, does duration play any role in distinguishing among vowels? What about other phonetic properties, such as voice quality?
3.3. Voice quality
Though F1 distinctions are a consistent correlate of ATR contrasts, there have long been indications that at least some languages’ ATR contrasts involve distinctions of voice quality, not just vowel quality (see Casali 2008: 510 for discussion). Descriptions of ATR contrasts have often called on impressionistic terms like ‘hollow’, ‘breathy’, ‘muffled’, ‘deep’ or ‘dull’ to describe [+ATR] vowels, vs. ‘hard’, ‘creaky’, ‘brassy’, ‘harsh’ or ‘pressed’ to describe [−ATR] vowels. These voice quality properties have been argued to arise due to synergistic articulatory connections between the tongue root and epiglottis, the aryepiglottic folds and the larynx (Denning 1989; Edmondson et al. 2007; Moisik 2013; Esling et al. 2019).Footnote 12 As many have observed, such covariation in ATR languages between vowel quality and voice quality can resemble the clustering of properties found in languages having a so-called ‘register’ contrast, including languages of Southeast Asia.Footnote 13
By ‘voice quality’, we mean what Esling et al. (2019: 2) call ‘vocal quality’ and characterise as ‘[…] short term effects, or “register” effects, that originate within the larynx […]. They are generally syllabic in duration and linguistically contrastive’. In fact, Esling’s Laryngeal Articulator Model treats an ATR contrast fundamentally as one of the epilarynx, that is, the constrictor mechanism uniting the tongue root and epiglottis, the aryepiglottic folds and the larynx: ‘The key articulatory basis associating ATR-like systems with correlated phonatory and general voice quality effects (such as a raised-larynx voice quality) is a relationship mediated by the state of the epilarynx’ (Esling et al. 2019: 174). In the production of [−ATR] vowels, this laryngeal structure is constricted, causing the tongue root to retract, the larynx to raise and phonation to become more constricted. When the tongue root is advanced for [+ATR] vowels, the epilarynx is unconstricted, the larynx can drop and phonation can become more open (or breathy). Esling et al. (2019: 174) suggest that ATR is like register, only ‘vowel-oriented, not phonation-oriented’, which we understand to mean relying more on vowel quality than on voice quality for the contrasts. This is consistent with the general finding that F1 is the primary acoustic correlate of ATR, with voice quality effects playing a secondary role.
Casali (2008) suggests that correlations between ATR specification and voice quality are widespread in African ATR systems, but also notes that the voice quality distinctions are often subtle. Nevertheless, Casali proposes that voice quality features may help distinguish the ATR contrasts, particularly for the perceptually vulnerable vowels [ɪ ʊ]. Given how rich the Tima vowel system is, the question therefore arises whether the ATR distinctions come with voice quality distinctions, and whether listeners might rely on both when identifying vowels. Thus, a further goal of our article is to provide acoustic data not only on vowel quality but also on voice quality, with questions of contrast in mind. We also simply hope to add to our general understanding of how voice quality manifests in ATR systems.
There are still relatively few acoustic studies of ATR languages that explore voice quality. These include Hess (1992), Fulop et al. (1998), Guion et al. (2004), Przezdziecki (2005), Anderson (2006/2007), Starwalt (2008) and Remijsen et al. (2011). Based on these studies, measures of spectral tilt and also F1 bandwidth can correlate with ATR contrasts, with [+ATR] vowels having a greater spectral drop-off and narrower bandwidth.Footnote 14 Some of these studies also consider vowel duration, but results are mixed or even contradictory across languages. Taken together, these studies suggest that voice quality measures do not correlate with ATR as consistently or robustly as does F1. However, Olejarczuk et al. (2019) find that two measures of periodicity, harmonics-to-noise ratio (HNR) and cepstral peak prominence (CPP), also separate all [ATR] vowel pairs in Komo, a Nilo-Saharan language of Western Ethiopia. They therefore suggest that future ATR studies should incorporate such measures of voice quality differences, something we do here. The measures discussed here are defined in §4.2.
3.4. Goals of the acoustic study
In addition to providing a contribution to our understanding of ATR correlates, as well as a descriptive contribution on a highly understudied language, the goal of our study is to explore whether Tima is affected by the synergistic pressures affecting many ATR systems (involving the relationship between ATR and tongue height), and to learn about the means by which vowels are differentiated in such a rich ATR inventory. We can formulate the following hypotheses based on the above sections:
On the more exploratory side, given the vowel system crowding, we expect to find other voice quality measures that distinguish [+ATR] from [−ATR] vowels; duration may also play a role. Finally, /ɪ ɘ ʊ/, the high [−ATR] vowels, may be in close proximity to their [+ATR] counterparts /i ɨ u/ in the vowel space; alternatively, /ɪ ʊ/ may be close to /e o/. In either case, we may find that voice quality measures or duration contribute to these contrasts.
4. Method
4.1. Speakers, recordings and labelling
The recordings chosen for this study are from speakers who were recommended by the community as being among the best Tima speakers, meaning not only that they speak Tima fluently, but also that they were generally aware of which lexemes and more complex utterances are based on Arabic. We chose recordings from three male speakers who had contributed a relatively large number of recordings for a larger project on Tima, and whose recordings were of good audio quality for acoustic phonetic analysis. These speakers were born in 1968 (speaker HKD), 1943 (NAK) and ca. 1960 (KAM), and had no audible speech disorders.
The data for the present study were collected between 2007 and 2010, as part of a language documentation and dictionary project. Words were recorded in citation form. There were usually two productions of each word, of which one was used for this study.Footnote 15 Multiple morphosyntactic versions of a word were recorded at the same time (e.g., singular and plural), and any given word was usually recorded by only one speaker (more than one speaker was recorded if a pronunciation needed to be checked or compared). All words were discussed before recording. The metalanguage used was English with HKD and KAM, since it was the only language which both the researcher (author Schneider-Blum, a native German speaker) and the Tima speakers knew to a sufficient extent for the elicitation work. Elicitation with NAK was done using photos or, in the presence of HKD, via Arabic.
A word may be appropriate here on our methods. Many phonetic studies use highly controlled and balanced data. Our data are neither, but they do contain over 700 different words, meaning that our conclusions rest on a broad sample of the Tima lexicon rather than on a small set of test words. Our data were not originally collected for a phonetic study, but as we will see, there is a great deal we can still learn from them.
Recordings were made using an Edirol R-09 recorder and Beyerdynamic M 58 microphone. Files were saved in WAV format at a 48 kHz sampling rate and 16 bits per channel. (The original stereo files were subsequently converted to mono for the purposes of phonetic labelling and analysis.)
Transcriptions of the words (in Tima and English) were imported from a spreadsheet and used for preliminary phonetic segmentation with the Munich AUtomatic Segmentation system (MAUS; Kisler et al. 2017) pipeline function G2P → MAUS → PHO2SYL. Manual correction of the phonetic MAUS labelling (e.g., correcting vowel–stop boundaries in cases of voicing bleed into the stop closure) was conducted using the EMU Speech Database Management System (Winkelmann et al. 2017, 2019), interfaced with the R statistical software package (R Core Team 2020). This manual correction was carried out by author Gregory and verified by author Tabain.
Table 4 gives the number of tokens in the database. These tokens are taken from 712 different words; these words with their translations are given in the Supplementary Material. Tima has both long and short vowels; however, the long vowels are much less common, making up about 16% of the database for this study (410 out of a total of 2,492 tokens). Moreover, this database deliberately included as many long vowels as possible, and so the relative frequency of long vowels in the language is likely much lower. In addition, the high central [−ATR] vowel /ɘ/ occurs only as a short vowel. For all of these reasons, we collapse across long and short vowel tokens for our analyses (acknowledging that vowel length may affect our measures in ways not explored here), and our analyses of duration will be based only on short vowel tokens. The most common vowels by far are /a ʌ/, while the mid vowels /ɛ e/ and /ɔ o/ (and to a lesser extent the high vowels /ɪ i/ and /ʊ u/) are less common. This is in line with typological observations of the observed-to-expected frequency of vowels in a given language (Gordon 2016: ch. 13).
It should be noted that most of the vowel tokens are produced by one speaker (1,786 tokens for HKD, compared to 531 tokens for NAK and 175 tokens for KAM). This is indicative of the number of words recorded by each speaker.
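The proportions implied by the token counts given in this section can be checked directly; this is simple arithmetic on the reported figures, not a reanalysis of the data.

```r
# Token counts as reported in the text.
speaker_tokens <- c(HKD = 1786, NAK = 531, KAM = 175)
total_tokens   <- sum(speaker_tokens)                  # 2,492 tokens
speaker_pct    <- round(100 * speaker_tokens / total_tokens, 1)
long_pct       <- round(100 * 410 / total_tokens, 1)   # share of long-vowel tokens

speaker_pct   # HKD 71.7, NAK 21.3, KAM 7.0
long_pct      # 16.5
```

The per-speaker counts sum to the reported total, HKD alone contributes roughly 72% of all tokens, and the long-vowel share is roughly one-sixth of the database.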
4.2. Measures and analyses
Signal processing of the WAV files was conducted using VoiceSauce (Shue 2010; Shue et al. 2011). In addition to vowel duration, the measures extracted and used here were the following:
1. Vowel formants and bandwidth. Measured with the Snack signal processor (Sjölander 2014) within VoiceSauce. These data were sampled at the temporal midpoint of the vowel, in order to minimise consonant place effects from adjacent segments.
2. Root mean square (RMS) energy. This measures the energy of the output spectrum (source and filter), and is correlated with vowel height.
3. Strength of excitation (SoE). A measure of voicing intensity calculated over a short interval of time around each individual glottal closure, in order to isolate source energy.
4. Cepstral peak prominence (CPP). A particularly robust member of the broader class of harmonics-to-noise ratio (HNR) measures. HNR measures distinguish modal (i.e., periodic) signals from non-modal (i.e., aperiodic) signals, and are therefore indicative of voice quality contrasts when applied to comparable speech signals (e.g., vowels). Lower CPP values suggest a noisier voice quality, either breathy or creaky, relative to a modal voice quality.Footnote 16 Combining CPP with a spectral tilt measurement (in our case, H1*-H2*, given next) can make clearer whether the noise is due to breathiness or creakiness.
5. H1*-H2*. The difference in amplitude between the first and second harmonics, corrected for vowel formants. This is one of a class of spectral tilt measures assessing the relative amplitude of lower and higher frequencies in the spectrum; higher values suggest greater vocal fold spread. It is labelled as ‘H1H2c’ in our figures.
6. (‘Integrated’) spectral tilt. The spectral tilt (or slope, using regression) of the output vowel spectrum, based on a 20 ms Hamming-windowed Fast Fourier Transform (FFT) of the extracted audio samples, taken at the temporal midpoint of the vowel (see below). Whereas H1*-H2* is a measure purely of the laryngeal source signal, integrated spectral tilt is a measure of the output signal, which combines the laryngeal source and the supralaryngeal filter (vocal tract resonances).
Measures 2–5 were calculated as means across the total vowel duration. Measures 1–5 were extracted every 1 ms (i.e., at a frame rate of 1,000 Hz).
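A minimal sketch of how a frame-based measure can be reduced to one value per vowel, using RMS energy as the example. It assumes a 20 ms analysis window stepped every 1 ms and a plain arithmetic mean over the vowel; the study's values come from VoiceSauce, whose framing and weighting may differ, and the synthetic test signal is invented.

```r
# Hypothetical sketch: frame-wise RMS energy every 1 ms, averaged over a vowel.
# The 20 ms analysis window is an assumption; VoiceSauce's own framing may differ.
frame_rms <- function(x, fs, win_ms = 20, hop_ms = 1) {
  win <- round(fs * win_ms / 1000)    # window length in samples
  hop <- round(fs * hop_ms / 1000)    # 1 ms step, i.e. 1,000 frames per second
  if (length(x) < win) stop("signal shorter than one analysis window")
  starts <- seq(1, length(x) - win + 1, by = hop)
  vapply(starts, function(s) sqrt(mean(x[s:(s + win - 1)]^2)), numeric(1))
}

fs <- 48000
time <- (0:(0.1 * fs - 1)) / fs       # a 100 ms synthetic 'vowel'
x <- sin(2 * pi * 150 * time)         # 150 Hz sinusoid with amplitude 1
track <- frame_rms(x, fs)             # one RMS value per 1 ms frame
length(track)   # 81 frames
mean(track)     # 0.7071068, i.e. 1/sqrt(2)
```

Each 20 ms window spans exactly three periods of the 150 Hz test tone, so every frame returns the textbook RMS of a unit sinusoid and the vowel mean equals it too.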
For the (integrated) spectral tilt measure, we used the EMU Speech Database Management System (Winkelmann et al. Reference Winkelmann, Harrington and Jänsch2017, Reference Winkelmann, Jänsch, Cassidy and Harrington2019), interfaced with the R statistical software package (R Core Team 2020). Using the frequency range 100–1,000 Hz, we calculated a regression on the values returned by the FFT in order to obtain a spectral tilt value that considered the total spectral output (both source and filter), and was thus not dependent on any individual harmonic or formant. The spectral tilt measure therefore combines information from the vocal source with the formant output of the vocal tract filter. Since our male speakers’ f0 tended around 150 Hz, it can be expected that the frequency range 100–1,000 Hz included the first six to seven harmonics, together with F1 in this frequency range. This approach was adopted because the source spectral shape and the vowel quality are closely intertwined from the perspective of the listener, who does not have access to the separate source and filter signals. Indeed, the primary determinants of vowel quality (i.e., formants) can change markedly in amplitude as the source spectral shape is modified (see Kreiman et al. Reference Kreiman, Lee, Garellek, Samlan and Gerratt2021: §III.C for discussion of these issues, including the difference between narrow and broad views of voice quality).Footnote 17
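The regression-based tilt measure can be sketched in base R as follows. This is an illustration under stated assumptions, not the EMU-based code used in the study: the slope is expressed in dB per kHz, the window is centred on a given sample, and the test signals are synthetic harmonic series with f0 = 150 Hz. The hypothetical helper `harmonics_in_band()` also makes the arithmetic behind 'six to seven harmonics' explicit: at 150 Hz the harmonics up to 1,000 Hz are 150, 300, ..., 900 Hz, while a slightly lower f0 such as 140 Hz fits a seventh.

```r
# Sketch of an 'integrated' spectral tilt: linear regression of FFT magnitude
# (dB) on frequency over 100-1,000 Hz, for one 20 ms Hamming-windowed frame
# centred on sample `centre`. Base R only; not the EMU code used in the study.
spectral_tilt <- function(x, fs, centre, win_ms = 20, lo = 100, hi = 1000) {
  n <- round(fs * win_ms / 1000)
  frame <- x[(centre - n %/% 2) + 0:(n - 1)]
  hamming <- 0.54 - 0.46 * cos(2 * pi * (0:(n - 1)) / (n - 1))
  magnitude <- Mod(fft(frame * hamming))
  freq <- (0:(n - 1)) * fs / n            # bin centre frequencies (Hz)
  band <- freq >= lo & freq <= hi
  db <- 20 * log10(magnitude[band])
  khz <- freq[band] / 1000
  unname(coef(lm(db ~ khz))[2])           # slope in dB per kHz
}

# Hypothetical helper: number of harmonics of f0 between lo and hi (Hz).
harmonics_in_band <- function(f0, lo = 100, hi = 1000) {
  h <- f0 * seq_len(floor(hi / f0))
  sum(h >= lo)
}

harmonics_in_band(150)  # 6
harmonics_in_band(140)  # 7

# Synthetic signals: 20 harmonics of 150 Hz with amplitudes 1/k or 1/k^2.
fs <- 48000
time <- (0:(0.1 * fs - 1)) / fs
make_source <- function(power) {
  rowSums(sapply(1:20, function(k) cos(2 * pi * 150 * k * time) / k^power))
}
spectral_tilt(make_source(1), fs, centre = 2400)  # negative slope
spectral_tilt(make_source(2), fs, centre = 2400)  # steeper (more negative)
```

Because the regression runs over every FFT bin in the band, the slope reflects the overall spectral envelope rather than any single harmonic or formant, which is the property the text motivates.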
Plots for this study were generated using the R package ggplot2 (Wickham 2009).
A linear mixed effects (LME) analysis was conducted for the measures explored in this study, using the nlme package for R (Pinheiro et al. 2021). LME models allow us to include speaker and word as random effects, and are robust to unequal numbers of tokens across cells. Comparison of models using the Akaike Information Criterion suggested that it was best to include the independent variables ATR ([+ATR] or [−ATR]) and Height with an interaction term, rather than as main effects only.Footnote 18 The following command was used in R, where ‘DependentVariable’ was one of the various acoustic measures (such as F1, F2, BW1 and CPP) used in this study:
lme(DependentVariable~ATR*Height, data=data.df, random=~1|speaker/words)
For the purposes of these analyses, we created a binary factor Height, coding /ɪ i ɘ ɨ ʊ u/ as ‘High’ and /ɛ e a ʌ ɔ o/ as ‘Low’, though the latter are more precisely described as non-high from a phonological point of view. The questions we address are not about height per se but about height as it bears on ATR realisations; as we will see, for that purpose, the data do not support distinguishing mid and low vowels from each other. This feature of vowel Height will be represented in the box plots below, together with the [±ATR] feature.
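The binary coding just described, together with the ATR pairings /ɪ–i/, /ɘ–ɨ/, /ʊ–u/, /ɛ–e/, /a–ʌ/ and /ɔ–o/, can be expressed as a simple lookup. This is an illustrative Python sketch; the set and function names are hypothetical and not part of the original analysis scripts.

```python
# Factor coding of the 12 Tima vowels for the LME analyses.
# Vowel symbols are single Unicode IPA code points.
PLUS_ATR = set("iɨueʌo")
MINUS_ATR = set("ɪɘʊɛaɔ")
HIGH = set("ɪiɘɨʊu")  # 'Low' = non-high: ɛ e a ʌ ɔ o


def code_vowel(vowel):
    """Return the (ATR, Height) factor levels for a single vowel symbol."""
    if vowel in PLUS_ATR:
        atr = "+ATR"
    elif vowel in MINUS_ATR:
        atr = "-ATR"
    else:
        raise ValueError(f"not one of the 12 Tima vowels: {vowel!r}")
    height = "High" if vowel in HIGH else "Low"
    return atr, height


print(code_vowel("ʌ"))  # ('+ATR', 'Low')
print(code_vowel("ɘ"))  # ('-ATR', 'High')
```

Note that /ʌ/ is coded ‘Low’ here, consistent with the binary High/non-high factor, even though it is treated as mid in the discussion of the formant results.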
5. Results
The LME results are presented in Tables 5 and 6; these will be referred to during our discussion. In both tables, the reference for ATR is [−ATR], and the reference for vowel Height is High.
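Because the fixed-effects structure is a full 2 × 2 interaction with [−ATR] and High as reference levels, each coefficient can be read as a contrast between cell means. The Python sketch below shows this correspondence for an ordinary least-squares fit with treatment coding. It ignores the speaker and word random effects of the actual LME models, so LME estimates will generally differ somewhat; the function name, coefficient keys and toy values are hypothetical.

```python
from statistics import mean


def treatment_coefficients(rows):
    """rows: iterable of (atr, height, value), atr in {'-ATR', '+ATR'},
    height in {'High', 'Low'}. Returns the OLS coefficients of the saturated
    model value ~ ATR * Height (reference: -ATR, High), ignoring random effects."""
    cells = {}
    for atr, height, value in rows:
        cells.setdefault((atr, height), []).append(value)
    m = {key: mean(vals) for key, vals in cells.items()}
    intercept = m[("-ATR", "High")]
    b_atr = m[("+ATR", "High")] - intercept   # ATR effect among High vowels
    b_low = m[("-ATR", "Low")] - intercept    # Height effect among -ATR vowels
    # Interaction: how much the ATR effect for Low vowels differs from High.
    b_inter = (m[("+ATR", "Low")] - m[("-ATR", "Low")]) - b_atr
    return {"intercept": intercept, "atr_plus": b_atr,
            "height_low": b_low, "atr_plus_x_height_low": b_inter}


# Toy values only, not data from this study
rows = [("-ATR", "High", 400), ("-ATR", "High", 420),
        ("+ATR", "High", 360), ("+ATR", "High", 380),
        ("-ATR", "Low", 600), ("-ATR", "Low", 640),
        ("+ATR", "Low", 520), ("+ATR", "Low", 540)]
print(treatment_coefficients(rows))
```

The key point is that the main-effect coefficient for ATR describes the High vowels only; the ATR effect for Low vowels is the sum of that coefficient and the interaction term.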
Figure 1 shows the vowel formant results for our data. We removed from this plot 279 tokens with F2 higher than 2,200 Hz and/or F1 higher than 1,000 Hz, since these were assumed to be formant tracking errors. This left 2,213 vowel formant tokens. These are the tokens that are used as the basis for the LME results for both formant and bandwidth data. Short and long vowel data are pooled, for reasons discussed in §4.1.
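The exclusion criterion can be stated as a simple filter. This is a Python sketch; the token structure (dicts with ‘F1’ and ‘F2’ keys in Hz) is a hypothetical assumption, and a value exactly at a threshold is kept, since the criterion is ‘higher than’.

```python
def drop_tracking_errors(tokens, f1_max=1000.0, f2_max=2200.0):
    """Split tokens into (kept, removed). A token is removed if F1 > f1_max
    and/or F2 > f2_max (Hz), treating it as a likely formant tracking error."""
    kept, removed = [], []
    for tok in tokens:
        if tok["F1"] > f1_max or tok["F2"] > f2_max:
            removed.append(tok)
        else:
            kept.append(tok)
    return kept, removed


# Hypothetical tokens for illustration only
tokens = [{"vowel": "i", "F1": 320, "F2": 2100},
          {"vowel": "i", "F1": 310, "F2": 2450},   # F2 too high
          {"vowel": "a", "F1": 1150, "F2": 1400},  # F1 too high
          {"vowel": "a", "F1": 1000, "F2": 1500}]  # at threshold: kept
kept, removed = drop_tracking_errors(tokens)
print(len(kept), len(removed))
```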
The LME results confirm that [+ATR] vowels (brown) have a significantly lower F1 than [−ATR] vowels (dark blue), as expected. Of course, High vowels have a significantly lower F1 overall than Low vowels. (Recall that ‘Low’ combines mid and low vowels.) After Bonferroni correction for correlated measures (since formants are highly correlated within the vocal tract), there are no significant effects for F2.Footnote 19 Since it is possible that the significantly lower F1 for [+ATR] vowels could be primarily due to the /a/–/ʌ/ contrast, we ran a separate LME model excluding these two vowels. The results confirm that the significant result is not just being driven by these two vowels, but is in fact a general property of all of the ATR vowels ( $\textrm {Intercept} = 403~\textrm {Hz}$ , $\textrm {Beta} = 45~\textrm {Hz}$ for [+ATR], $\textrm {S.E.} = 5.0~\textrm {Hz}$ , $\textrm {D.F.} = 808$ , $t = 8.95$ , $p < 0.0001$ ).
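Bonferroni correction, as applied here to the correlated formant measures, can be sketched as follows. The text does not state the size of the family of tests, so the two invented p-values below are purely illustrative and are not results from this study.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction over a family of m tests: each p-value is
    multiplied by m (capped at 1) and compared with alpha. This is
    equivalent to comparing each raw p-value with alpha / m."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Invented p-values for two correlated measures (illustration only)
adjusted, significant = bonferroni([0.00001, 0.04])
print(adjusted, significant)
```

In this toy case the second test is nominally significant (p = 0.04) but not after correction, which is how an uncorrected effect can disappear once correlated measures are treated as one family.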
It can also be seen in Figure 1a that the [−ATR] vowels occupy a much larger space than do the [+ATR] vowels, since the bottom of the vowel space is raised for the [+ATR] vowels. In particular, the [+ATR] vowel /ʌ/ is comparable in its F1 values to the mid vowels, unlike its [−ATR] counterpart /a/. We will therefore treat it as mid in the following discussions. For example, we will compare [−ATR] /ɪ ɘ ʊ/ to [+ATR] /e ʌ o/ respectively.
Based on discussion in §2, we wondered whether the high [−ATR] vowels /ɪ ɘ ʊ/ would be close in the formant space to either their [+ATR] counterparts /i ɨ u/ or to the mid [+ATR] vowels /e ʌ o/. As can be seen in Figure 1b, which combines the [+ATR] and [−ATR] vowels, the peripheral and central vowels actually pattern differently. /ɪ ʊ/ are very close to /i u/ and not /e o/, while /ɘ/ is closer to (mid) /ʌ/ and not high /ɨ/. Impressionistically, both /ɘ/ and /ʌ/ sound quite schwa-like, and different from /ɨ/, which indeed sounds like IPA [ɨ].
Figure 2 shows the box plots of vowel duration plotted according to ATR and vowel Height. The plot only shows the short vowels, since /ɘ/ occurs only as a short vowel. In this and all subsequent box plots, [−ATR] and [+ATR] vowels are shown in different colours, and vowel Height is shown using different line types.Footnote 20 ATR status appears to have no general effect on vowel duration. For example, /i ɨ u/ are not different from /ɪ ɘ ʊ/ in this respect. On the other hand, it can be seen that High vowels have a shorter duration than Low vowels overall, as would be expected cross-linguistically. For example, though the vowels /i ɨ u/ seem no different from their [−ATR] counterparts /ɪ ɘ ʊ/, respectively, they are shorter than /e ʌ o/, respectively. Focusing on the central vowels, /ɘ ɨ/ are shorter than /a ʌ/; in particular, /ɘ/ is shorter than /ʌ/. While /ʌ/ has a mean value of 79 ms ($sd = 38.0$, $N = 258$), /ɘ/ has a mean value of 41 ms ($sd = 19.6$, $N = 118$); the latter is a very short vowel. The LME results confirm an effect of vowel height on duration values, and they also confirm that there is no effect of ATR on vowel duration.Footnote 21 Duration differences in Tima are therefore orthogonal to (i.e., statistically independent of) the ATR contrast.
We turn now to the various measures of voice quality. Figure 3 shows the box plots of the mean CPP across the vowel token. While there is no effect of ATR, there is a clear effect of Height, with High vowels having a lower CPP value than Low vowels (suggesting a noisier vowel quality for the High vowels). These observations are confirmed by the LME models. CPP may contribute to distinguishing /ɪ ɘ ʊ/ from /e ʌ o/ respectively, since the former are high and the latter non-high.
Figure 4 shows the box plots of the mean H1*-H2* (our figures use the VoiceSauce label ‘H1H2c’) across the vowel token. There is a clear effect of vowel Height, with High vowels having a greater H1*-H2* value than Low vowels, suggesting greater vocal fold spread for the High vowels. This is confirmed by the LME results. In addition, the LME results indicate a significant effect of ATR on this harmonic tilt value, with harmonic tilt being greater for [+ATR] vowels as hypothesised, though this is not true for /ɛ e/ or /ɔ o/, as seen in Figure 4. From these results, we may conclude that, for the [+high] vowel pairs and /a ʌ/, the [+ATR] member is produced with a more open glottis. Tentatively, we may also suggest that the High [+ATR] vowels have a more breathy voice quality, since the CPP values suggested that High vowels have a noisier spectrum than Low vowels.
Figure 5 shows the related measure of spectral tilt in the frequency range 0.1 to 1.0 kHz. Here the patterns are clear. [+ATR] vowels have a consistently more negative spectral tilt value than [−ATR] vowels, indicating a greater drop-off in spectral energy as frequency increases.Footnote 22 This suggests overall greater lossiness in the spectrum for [+ATR] vowels. Relatedly, High vowels also have a consistently more negative spectral tilt value than Low vowels; however, this may be at least partly due to the prominence of F1 in the lower frequency regions for High vowels as compared to Low vowels. Note also that there seems to be an effect of front–back on these results, in that ɪ/i has a steeper negative tilt than ɛ/e, and ʊ/u has a steeper negative tilt than ɔ/o – but ɛ/e and ʊ/u have similar tilt values. These results therefore reflect some contribution from the broader F2 bandwidth to the overall tilt. Note in addition that /ʌ ɘ ɨ/ (but not /a/) all pattern similarly using this measure. Given these results, spectral tilt may contribute to distinguishing /ɪ ɘ ʊ/ from both their high [+ATR] counterparts (with the possible exception of ɨ/ɘ) and the mid [+ATR] vowels. Overall, these spectral tilt results give the impression that voice quality may be a combination of effects from the voice source together with the filter.
Figure 6 shows the RMS energy of the spectral output (our figures use the label ‘mean energy’), and Figure 7 shows the strength of excitation at the glottal source. In Figure 6, it can be seen that overall, Low vowels have more energy than High vowels (confirmed by the LME models); this pattern is to be expected given the overall greater airflow and sonority of Low vowels. This observation extends to the contrast between /ɘ ɨ/ and /a ʌ/, with the former having less energy than the latter. More generally, /ɪ ɘ ʊ/ seem to be separated from /e ʌ o/ by this measure. By contrast, there is no overall effect of ATR on RMS energy, although the plots suggest that [+ATR] High vowels may have more energy than [−ATR] High vowels.
The results for mean SoE (Figure 7), by contrast, do not show any effect of vowel Height (confirmed by the LME models), but they show a clear effect of ATR, with the [+ATR] vowels having a greater strength of excitation than the [−ATR] vowels. One could hypothesise that the enlarged oro-pharyngeal cavity which may result from tongue root advancement facilitates voicing, given the aerodynamic voicing constraint, which requires subglottal pressure to be greater than supraglottal pressure for voicing to be maintained. Thus, voicing may be better facilitated for [+ATR] vowels as compared to [−ATR] vowels, in the same way that voicing is better facilitated for bilabial stops as compared to velar stops. SoE therefore provides another voice quality correlate of the ATR contrast.
Taken together, these various voice quality results suggest that [+ATR] vowels in Tima have a less constricted voice quality (more vocal fold spread), and also a stronger glottal source. It is not clear whether the [+ATR] vowels are breathy relative to modal [−ATR] vowels, or modal relative to creaky [−ATR] vowels, since the CPP measure was not informative in this respect. However, the fact that SoE is greater for the [+ATR] vowels suggests that it is more likely that the [+ATR] vowels are modal, rather than breathy, since voicing efficiency is greater for a modal voice quality than it is for a non-modal quality. At the same time, given the interaction between ATR and vowel height with regard to the voice quality measures, it is possible that there is a breathier voice quality for the High [+ATR] vowels, but not the Low ones. Note that we (the authors) find these differences difficult to discern auditorily; they are very subtle to our non-Tima ears. It could be that there are differences between speakers, or between individual vowel pairs, in how the voice quality difference is realised. This may be an area for further study as regards the interaction between tongue position and laryngeal state.
Finally, we consider BW1, the bandwidth of the first formant. Figure 8 (which contains 2,213 tokens due to removal of outliers as detailed for the vowel formant plot above) shows that the bandwidth is narrower for [+ATR] vowels than it is for [−ATR] vowels, though this does not seem to include /ɛ e/ and /ɔ o/. One could hypothesise that the difference by ATR is due to a shorter constriction for [+ATR] High vowels (see fn. 13), which leads to fewer losses arising from the constriction. It is notable that /ɘ ɨ/ are more similar to /a ʌ/ in their bandwidth values than to the other high pairs /ɪ i/ and /ʊ u/. Low vowels could be expected to have greater losses due to radiation at the lips, thanks to a more open jaw position. It is therefore not clear where precisely the losses arise for /ɘ ɨ/ – whether they arise mostly from a constriction, or mostly from radiation at the lips. These are questions which can only be answered by an imaging study.
After Bonferroni correction for correlated measures (since bandwidths are highly correlated within the vocal tract), the LME results confirm that [+ATR] vowels have a significantly lower BW1 than [−ATR] vowels (and therefore a narrower bandwidth). They also suggest a significantly higher BW1 for Low vowels (and therefore a greater bandwidth) – noting, however, that /ɘ ɨ/ pattern with the Low vowels in terms of raw BW1 values. There are no significant effects of ATR or Height for BW2, and for reasons of space, we do not plot these data here.
Table 7 presents the $\eta ^2$ results (calculated in R) for the acoustic measures examined above. $\eta ^2$ is an effect size measure which expresses the proportion of variance (a value between 0 and 1) accounted for by each of our two independent variables (i.e., ATR and Height), as well as by their interaction. It is calculated as $\eta ^2 = SS_{\textrm {effect}} / SS_{\textrm {total}}$, where $SS_{\textrm {effect}}$ is the sum of squares for an effect and $SS_{\textrm {total}}$ is the total sum of squares in an ANOVA (analysis of variance) model. Note that the $\eta ^2$ value therefore does not take speaker or word into account as random effects. Nonetheless, it is a useful indicator of the extent to which a particular acoustic measure is affected by the ATR or Height contrast. A value of around 0.01 may be considered a small effect size; a value of around 0.06, a medium effect size; and a value of 0.14 or higher, a large effect size.
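The $\eta ^2$ formula can be illustrated for a single grouping factor. This Python sketch computes a one-way, between-groups $\eta ^2$. For an unbalanced two-way ANOVA with an interaction, the value for each term depends on how the sums of squares are partitioned (e.g., sequential vs. adjusted), which is not specified here, so this illustrates the formula rather than reproducing Table 7. The labelling function applies the conventional 0.01 / 0.06 / 0.14 benchmarks; its name and the toy data are hypothetical.

```python
from statistics import mean


def eta_squared(values, groups):
    """One-way eta squared: SS_between / SS_total for the grouping `groups`."""
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2
                     for vs in by_group.values())
    return ss_between / ss_total


def effect_size_label(eta2):
    """Conventional benchmarks: ~0.01 small, ~0.06 medium, >= 0.14 large."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "negligible"


# Toy values, not data from this study
values = [1, 2, 3, 4]
groups = ["-ATR", "-ATR", "+ATR", "+ATR"]
e = eta_squared(values, groups)
print(e, effect_size_label(e))  # 0.8 large
```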
It can be seen that ATR has a medium-large effect on F1, and vowel Height has a large effect on the same measure. In addition, vowel Height has a medium effect on BW1. No other formant or bandwidth measures show anything greater than a minimal effect of ATR or Height, and Vowel Duration only shows an effect of Height.
Of the various voice quality measures, CPP and RMS energy show an effect only of Height (medium-strong and medium, respectively). H1*-H2*, SoE and spectral tilt all show a small-medium or medium effect of ATR, and H1*-H2* and spectral tilt also show an effect of Height (medium-strong and strong, respectively).
6. Discussion
6.1. Initial hypotheses
Hypothesis (8a): [+ATR] vowels have lower F1 values compared to their [−ATR] counterparts
This hypothesis was confirmed. In addition, of all of the measures we employed, F1 is the measure that distinguishes [+ATR] from [−ATR] with the largest effect size ( $\eta ^2$ ). In this respect at least, F1 is the primary acoustic correlate of the Tima ATR contrast.
Hypothesis (8b): [+ATR] vowels have a greater spectral drop-off and narrower bandwidth compared to their [−ATR] counterparts
This hypothesis was confirmed, though only partially for bandwidth. Our spectral tilt measure distinguishes [+ATR] from [−ATR], indicating a greater spectral drop-off for [+ATR] vowels (see fn. 20). In addition, H1*-H2* (another spectral tilt measure) also indicates a greater spectral drop-off for [+ATR] vowels, though only for [+high] vowels and /a-ʌ/. These spectral tilt results suggest that [+ATR] vowels have greater vocal fold spreading, leading to a greater drop-off in spectral energy over the 100–1,000 Hz range. Finally, [+ATR] vowels have narrower bandwidth values than [−ATR] vowels, with the exception of /ɛ-e/ and /ɔ-o/. (See §5 for more discussion of bandwidth.)
We also found that strength of excitation distinguishes [+ATR] from [−ATR] vowels. The greater strength of excitation of [+ATR] vowels may be due to an enlarged oro-pharyngeal cavity.
Hypothesis (8c): /ʌ/ has a lower F1 compared to its [−ATR] counterpart
This hypothesis was confirmed; indeed, our results support treating /ʌ/ as a mid vowel.
6.2. Duration
Duration differences do not support the ATR contrast in Tima. (Recall that Tima has contrastive vowel length, which may be relevant here.) Duration does play the expected role in distinguishing High from Low (non-high) vowels. Most interesting in this respect is the difference between /ɘ/ (which is very short) and /ʌ/, since these vowels are so close in the vowel space. We suggest that duration plays an important role in distinguishing this pair of central vowels.Footnote 23
6.3. Tongue root–body synergies
One of our goals was to explore whether Tima, in spite of its impressive symmetry in ATR, shows signs of the phonetic pressures involving ATR-height synergies. Recall that [+ATR] is antagonistic to lower tongue bodies and [−ATR] is antagonistic to higher tongue bodies, with the former pressure perhaps being stronger, given the relative dearth of [+ATR, +low] vowels in languages. Indeed, though we suggest that the [+ATR] counterpart of /a/ is [−low], at least at the surface, we do not find that the high [−ATR] vowels /ɪ ʊ/ are lowered; in fact, they seem very close to /i u/. The central high vowels behave differently, with both /ɨ/ and /ɘ/ being lower than their front or back counterparts. Overall, our study does not find support for an incompatibility between [−ATR] and [+high].
6.4. Inventory crowding
Another goal was to explore whether Tima employs other acoustic dimensions besides F1 in order to support contrasts in a crowded vowel space. As we have already seen, duration may play a role in distinguishing [−ATR] /ɘ/ from the [+ATR] raised counterpart of /a/, /ʌ/. In addition, our results raise the possibility that voice quality differences play an important and systemic role in Tima contrasts. The diagram in (9) repeats the schematically presented Tima inventory from earlier, except that /ʌ/ is now grouped with the other mid [+ATR] vowels. As we noted earlier in the article, a pressing question for ATR systems is how high [−ATR] vowels like /ɪ ɘ ʊ/ are distinguished from their [+ATR] counterparts like /i ɨ u/ and/or from mid [+ATR] vowels like /e ʌ o/ respectively.
In Tima, the vowels /ɪ ʊ/ are very close to their [+ATR] counterparts /i u/, but not to the mid [+ATR] vowels /e o/. However, the central high vowels /ɨ ɘ/ behave differently: they are not particularly close to each other, and [−ATR] /ɘ/ is instead very close to /ʌ/, which we treat as mid, as noted above.
Table 8 summarises the measures other than F1 that may help to maintain such contrasts based on our results. The measures shown in the top row are based on significant differences found for [+ATR] vs. [−ATR] vowels, or for high [+ATR] vs. high [−ATR] vowels. Those on the bottom are based on significant differences found between High and Low (i.e., non-high) vowels. Though we cannot verify that each of these measures is significant for each of the shown vowel pairs, that is not the point.Footnote 24 Our results make it clear that these measures could contribute to making such vowel distinctions in Tima, and only perceptual studies could determine the extent to which they indeed matter to listeners. At the least, our results suggest that such studies are worth doing.
One implication of Table 8 is worth stressing. The presence of voice quality cues may be meaningful not just for corresponding [−ATR] and [+ATR] vowel pairs such as /i ɪ/, but also for pairs like /ɪ e/, that is, [+high, −ATR] vowels and their corresponding [−high, +ATR] vowels. (As we saw in §3.1, the latter pairs are also often confusable in ATR systems, at least for linguists, and they can be very close in F1 values. Though they do not seem to be very close in Tima, one of the present authors (Schneider-Blum) nevertheless reports having trouble distinguishing the ɪ/e and ʊ/o pairs at times.) Our analyses found measures that distinguished between corresponding ATR pairs only if they were high, and others that distinguished primarily between high and non-high vowels. These distributions of correlates may initially seem like limitations for the purposes of contrast; but since /ɪ ɘ ʊ/ are the vowels whose contrasts are most vulnerable, and both types of correlate single out precisely these high vowels, this may not actually be the case.
7. Conclusion
The results of this study support the existence of the full set of contrasts posited for Tima (see (1)). They provide further evidence that an advanced tongue root is relatively incompatible with a low tongue body; they are also consistent with previous claims that this incompatibility is more severe than that between a retracted tongue root and a high tongue body. Perhaps of greatest interest, they provide new evidence for a connection between [ATR] and voice quality.
It has been suggested that a focus on the tongue root in ATR languages may be unhelpfully ‘linguocentric’ (Moisik 2013), given that previous descriptions of ATR contrasts have often remarked on associated voice qualities, as discussed earlier. Our results support a voice quality distinction as part of the ATR contrast, though they also suggest that F1 is the primary basis of the Tima contrast (given the $\eta ^2$ results). The voice quality features are each less influenced by [ATR] than F1 is, and they are modulated by vowel height (though, as we noted, this may not be as limiting as it seems).Footnote 25 This sensitivity to vowel quality is in line with the Laryngeal Articulator Model of Esling et al. (2019), which suggests a close anatomical relationship between vowel quality and laryngeal state. A link between ATR and voice quality may also be motivated by perceptual enhancement (Holt et al. 1997) and/or perceptual integration (Kingston et al. 1997).
In many respects, our results for Tima, and previous results for other ATR languages, are reminiscent of the register contrasts found in south-east Asian languages. These show correlations among tone, vowel quality, phonation, duration, consonantal voicing and other voice quality measures (Denning 1989). As discussed by Tạ et al. (2022), the High (or Tense, Clear) register is characterised by higher pitch, a tense (i.e., possibly creaky) or modal voice quality, lower (or more peripheral) vowels, a shorter voice onset time (VOT) and shorter vowels. This register is believed to have been derived from voiceless stops. By contrast, the Low (or Lax, Breathy) register is characterised by lower pitch, a lax or breathy voice quality, higher (or more centralised) vowels, a longer VOT and longer vowels. This register is believed to have been derived from voiced stops.
We can see the similarities here with ATR languages: [−ATR] vowels tend to be lower in the vowel space and to have a more constricted voice quality (i.e., modal to creaky), while [+ATR] vowels tend to be higher in the vowel space and to have a less constricted (i.e., modal to breathy) voice quality. It is interesting that under this taxonomy, vowel spaces for the Low register may have higher or more centralised vowels, and vowel spaces for the High register may have lower or more peripheral vowels. The similarity with the constraint against having a low central (and therefore peripheral) [+ATR] vowel is striking, although it must be stressed that there is no clear cross-linguistic pattern for ATR in terms of vowel centralisation or peripherality.
Moreover, the similarities are not quite as extensive as they may first appear, since there is no indication of a relationship between tone and ATR in Tima. Table 9 shows the proportion of tones by ATR (where contour tones are HL or LH).Footnote 26 It can be seen that there is no relationship between the ATR status of the vowel and the associated tone in Tima, with low tones being slightly more frequent overall in our database. Nor was there any effect of ATR on vowel duration in Tima, contra the trend in register languages. In addition, a brief examination of VOT values in our stop consonant data (not presented here) showed that although the trend may be in the right direction (with slightly longer VOT values for stops preceding [+ATR] vowels), the effect was very weak, and a much larger database would be needed to investigate any consonant effect related to ATR quantitatively (see, however, Local & Lodge 2004 for a qualitative examination of different effects of ATR on adjacent consonants).
Though this study, like many previous studies, finds that F1 is the most reliable individual measure of the ATR contrast, we believe it would be premature to conclude that voice quality correlates are unimportant or incidental in Tima or other ATR languages. First, some studies have shown significant variability between subjects in which measures correlate most robustly with ATR contrasts. For example, based on a random forest analysis of acoustic data, Olejarczuk et al. (2019) show that certain subjects rely even more on H1*-H2* or CPP than on F1 in producing a distinction between vowel pairs. A study that incorporates more Tima speakers might reveal similar speaker variation.Footnote 27 Second, linguists are still learning how best to measure voice quality differences, and some measures may be more reliable than others (cf. Kreiman et al. 2021). Though H1*-H2* only partially distinguished [+ATR] from [−ATR] vowels in the current study, the spectral tilt measure we employed distinguishes them across all vowel heights. In addition, as noted earlier, measures of periodicity have more recently been explored, and we in fact found that strength of excitation distinguishes ATR vowel pairs in Tima. Third, there is not a direct line from individual acoustic measures to perception of the ATR contrast. We know little about the relative perceptual importance of individual acoustic correlates of ATR, or about how they might combine perceptually (though see Kingston et al. 1997 on the latter). There is still little work on the perception of ATR contrasts, but what exists suggests we should be cautious about underrating voice quality correlates. Olejarczuk et al. (2019: 36) report that, based on preliminary results of a perceptual experiment, ‘some [Komo] speakers respond more than others to F1 manipulations in resynthesised stimuli, suggesting differences in the importance of this cue’. Fulop et al. (1998: 97) found that ‘Degema speakers do not classify their vowels very well using formant frequencies as the sole acoustic variable’. Finally, Rose et al. (2023) found, based on discrimination tasks, that Akan speakers fared poorly at distinguishing /ɪ ʊ/ from /e o/, respectively (i.e., [−ATR] high from [+ATR] mid), though this is a phonemic contrast. (A similar finding is reported by Ozburn et al. 2022, based on a different task.)
Given all of the above, we agree with other researchers cited in this article that we should look beyond the tongue root and vowel quality when considering ATR languages. A ‘linguocentric’ understanding of ATR perhaps comes too naturally to researchers whose native languages employ vowel quality differences signalled by F1. After analysing a register contrast in Chrau, a language of southern Vietnam, Tạ et al. (2022: 27) ask why previous researchers treated it (incorrectly) as a simple voicing contrast, and suggest that ‘[t]his could largely be due to the fact that the linguists who first described Chrau were not familiar with the concept of register and described it with the closest contrast available in their native language.’ Many of us may face similar limitations in dealing with ATR languages.
Supplementary material
The online-only supplementary material for this article provides a list of Tima words used in the present study and plots showing the data separately for the factors ATR and Height for selected measures. The supplementary material for this article can be found at https://doi.org/10.1017/S0952675724000125.
Acknowledgements
We would like to thank all of the Tima speakers we have worked with, and in particular Hamid Kafi Daldum, Nasraldeen Abdallah Korsha (d. 2018) and Kano Morto, who contributed recordings for the present study. We would also like to thank Pierre Badin for extensive discussion of losses in the vocal tract in relation to ATR; and Rosey Billington, Gerrit J. Dimmendaal, John Kingston, Grant McGuire, Avery Ozburn, Sharon Rose, Kimiko Tsukada and colleagues at a UC Santa Cruz Phonology Lunch meeting for feedback on early presentations of this work. Finally, we would like to thank the anonymous associate editor and three reviewers at Phonology for their thoughtful reviews. Akwaaɽɘkat̪aŋ /á↓kwááɽɘ́káátáŋ/!
Funding statement
This research was made possible through funding from the Volkswagen Programme DoBeS (Dokumentation bedrohter Sprachen; Documenting Endangered Languages). Research on Tima is currently continued under the umbrella of SFB 1252 (Project-ID 281511265) ‘Prominence in language’, funded by the German Research Foundation (DFG). We would like to thank both institutions for their generous support.
Competing interests
The authors declare no competing interests.