Introduction
Among the three main Eastern Yiddish dialects, only Central Yiddish (CY), once spoken in modern-day Poland, eastern Slovakia, northeastern Hungary, northwestern Romania, and parts of western Ukraine, consistently preserved the vocalic length feature inherited from Middle High German. However, the status of this feature, and its potential variability across CY territory, has never been examined. This study analyzes the acoustic correlates of the historical length contrast in the peripheral vowels /i/, /u/, and /a/ of CY and compares them across two regions: central Poland (roughly, an oval encompassing Warsaw, Łódź, Kielce, and Lublin) and the area south of the Carpathian Mountains known among Yiddish speakers as the Unterland. Figure 1 shows the geographical location of the Unterland, based on demarcations by Weinreich (1964) and Krogh (2012).Footnote 1

Figure 1. Approximate bounds of historical Unterland, following Weinreich (1964) and Krogh (2012).
While the Yiddish of the northern CY regions has been well described (see, e.g., Herzog, 1965; Jacobs, 1993, 2005), it has not been analyzed acoustically. The Yiddish of the Unterland is even less well understood. This region was settled by large numbers of Yiddish-speaking Jews only in the 19th century and was the site of extensive mobility, geopolitical upheaval, and language and dialect mixing in the century before World War II. These factors, along with the spatial and cultural disjunction between the Yiddish-speaking communities in the north (central Poland) and the south (Unterland), likely led to divergence of the Yiddish varieties spoken there.
Two earlier studies of the peripheral vowels revealed surprisingly small long/short ratios in the three pairs in the Unterland region, with no evidence of a shift in the perceptual cue for the contrast toward vowel quality (Nove, 2021b; Nove & Sadock, 2025). In Nove (2021b), female Unterland speakers also exhibited significantly smaller duration differences than male speakers in that region, and in Nove and Sadock (2025) the duration difference in the /i/ pair was significantly smaller among Unterland than among Polish speakers. Taken together, these findings suggest that the vowel length contrast in interwar Unterland Yiddish (UY) was tenuous and may have been undergoing change.
Studies demonstrate that language contact can impact the acoustic properties of vowel length contrasts in historically length-distinguishing systems. For example, research on New Zealand Māori reveals a progressive reduction in the distinction between short and long vowels in quality and quantity, presumably through contact with English, which lacks phonemic vowel length. Long vowels, except /aː/, have shortened, while short vowels have shifted to more peripheral positions (Harlow et al., 2009; King et al., 2010; Maclagan et al., 2013).
We hypothesized that UY would show signs of divergence from the Yiddish of central Poland (Polish Yiddish, or PY), which we view as the more stable, conservative dialect, and that the latter would exhibit greater long-short ratios in all three vowel pairs. We also predicted a gender effect like the one found in Nove (2021b) in UY but not in PY. To test these predictions, we used audio files extracted from recordings of 38 Holocaust testimonies in the newly developed Corpus of Spoken Yiddish in Europe, or CSYE (Bleaman & Nove, 2025), to compare the duration and quality of the peripheral vowels across the two regions. The results largely support our hypotheses: regression modeling indicates that duration differences in the long-short /i/ and /a/ vowel pairs are significantly smaller among Unterland speakers than among those from Poland. Moreover, as predicted, there is a within-region gender effect for these same vowel pairs showing Unterland females in the lead, but no analogous gender effect in the Polish group. Estimated model means of duration for these vowel pairs show that while central Poland meets or exceeds the 50-millisecond threshold for vowel identification posited by Labov and Baranowski (2006) (50 ms for /i/ and 55 ms for /a/), among Unterland speakers the distinctions fall below the threshold (27 ms for /i/ and 43 ms for /a/). In the /u/ pair, we find no cross-regional differences, nor is there a gender effect within either region. Furthermore, the estimated duration difference for both groups is notably small (29 ms for the entire group, 33 ms in Poland alone), a result we interpret as reflecting ambiguity in the phonemic status of the short vowel. Finally, analysis of vowel quality shows no evidence of divergence in quality of the long-short vowels in UY.
The destruction of European Yiddish-speaking communities created a profound gap in Yiddish scholarship. Tracing patterns in linguistic variation within and across speakers in border regions such as the Unterland is crucial for understanding the role of sociocultural dynamics (including historical events, political shifts, migration patterns, and language contact) in the development of Yiddish dialects. Perhaps more importantly, research on prewar Yiddish provides essential baselines for understanding developments in contemporary Yiddish varieties. Our interest in UY stems from its significance as the ancestral dialect of most modern Hasidic Jews, who constitute the majority of global Yiddish speakers. The scarcity of prewar UY studies leads linguists to use PY as a benchmark for tracking changes in Hasidic Yiddish, which may lead to mistaking inherited features for innovations, a concern detailed on pages 151 to 153 of Bleaman and Nove (2025). The data in the CSYE are a crucial link between the Yiddish of today and that of the prewar era, enabling the diachronic study of contemporary Yiddish.
The next section of this paper provides sociohistorical context on the Jewish communities of prewar central Poland and the Unterland, emphasizing factors that could contribute to dialect divergence. We then introduce the variables analyzed and the three long-short vowel pairs, contextualizing them within other Yiddish dialects. After outlining our data and methods, we present findings on vowel duration and quality. Finally, we discuss the results and draw conclusions about dialect variation in these communities.
Sociohistorical background
Central Poland and the Hungarian Unterland differ markedly in their histories of Jewish settlement and cultural development, which may explain the observed dialectal variation between these two regions. We provide historical context for both regions to ground our analysis of these differences.
Jewish communities in central Poland
There is reliable evidence of a Jewish presence in central Poland starting in the latter half of the 15th century. While neighboring lands persecuted and expelled Jews, Poland’s relative tolerance attracted migrants and enabled Jewish communities to flourish and become vital to the region’s cultural and economic development (Polonsky, 2010; Weinryb, 1973). By the 19th century, Poland was the epicenter of European Jewry. At the onset of World War II, Warsaw alone had about 350,000 Jewish inhabitants, nearly one-third of its population, constituting Europe’s largest Jewish community and second only to New York City worldwide (Kamusella, 2022). While most Jews were proficient in Polish and other languages of the region, Yiddish thrived here, as evidenced by approximately 80% of Jews reporting Yiddish as their mother tongue in the 1931 census (Prokop-Janiec, 2019). In traditional religious schools, secular folk schools, and even public schools with a Jewish majority, Yiddish was commonly taught (Marcus, 1983:148-149).
Jewish communities in the Unterland
The historical Unterland, situated at the convergence of modern-day Slovakia, Hungary, Ukraine, and Romania, exhibited distinct Jewish cultural and religious traditions (Krogh, 2012; Weinreich, 1964). Jewish communities there formed mainly through 19th-century migration from surrounding areas. Early arrivals represented at least three Yiddish dialects, a blend that left discernible traces in the Yiddish of the region (Weinreich, 1964). According to scholars studying new dialect formation (see, e.g., Kerswill, 2003, 2006; Kerswill & Trudgill, 2005; Kerswill & Williams, 2005), such mixing often leads to dialect leveling and mergers. Additionally, Unterland communities remained smaller, less densely populated, and more dispersed than those in Poland. The region retained a rural and peripatetic quality, serving as a sanctuary for Jewish refugees from nearby countries and a transit hub for those relocating or seeking to leave Central and Eastern Europe entirely (Jelinek, 2007; Magocsi, 1978:11; Magocsi & Petrovsky-Shtern, 2018:47; Sole, 1968; Švorc, 2020). Over time, it developed a distinct culture that set it apart from the homelands of the original settlers (Keren-Kratz, 2017, 2019).
Where the non-Jewish population of Poland was relatively homogeneous linguistically, the Unterland was more diverse, hosting Hungarian, Romanian, Slovakian, German, Ruthenian, Romani, and Czech (Beranek, 1936; Komoróczy, 2018; Shpirn, 1926; Weinreich, 1964). Unterland Jews became proficient in many of these co-territorial languages.Footnote 2 According to survivors’ reports, under government Magyarization policies, many Jews shifted from Yiddish to Hungarian, particularly women, who typically did not receive formal Jewish education (Jelinek, 2007:11-16, 83-89).Footnote 3 With Yiddish absent from public schools and secular folk schools rare in the Unterland, girls, who unlike boys did not attend kheyder ‘religious school’, became increasingly bilingual, making them the primary agents of language contact.
In summary, while the Jewish communities of Poland epitomized Eastern European Jewish culture due to their longevity, centrality, and size, the Unterland communities were culturally, geographically, and numerically peripheral. Moreover, the Unterland was a transnational and multilingual zone in which various place identities coexisted or overlapped. Cooper (2019:200) emphasized this, describing Carpathian Ruthenia as a metaphorical catch basin for Jewish ideologies from surrounding regions. Studies of CY thus tended to focus on central Poland and Galicia, largely ignoring UY. Weinreich termed the region linguistic “terra incognita” (1964:245).
A recent study by Schäfer (2022) underscores the impact of political and demographic changes on Eastern European Yiddish. Using advanced dialectometric methods to analyze data from the Language and Culture Atlas of Ashkenazic Jewry (Herzog et al., 2017), Schäfer concluded that Eastern Yiddish varieties were shaped primarily by political borders established in the 17th and 19th centuries, along with historical migration patterns, rather than by contact with dominant regional languages like Polish and Ukrainian.
Studies have shown that when borders shift, differences in speakers’ sense of place and regional allegiance can impact the proliferation of linguistic variants (see, e.g., Baker-Smemoe & Jones, 2014; Beal, 2010; Llamas, 2007; Llamas et al., 2009). Geographic mobility, characteristic of the Unterland, has also been found to influence linguistic behavior, with migrants introducing features from their homelands or more readily adopting innovative forms in their new environments (Gabriel & Kireva, 2014; Schleef et al., 2011; Urbatsch, 2015). Given all this, we hypothesized that Unterland CY developed independently from Polish CY in the pre-World War II century.
The variables: Central Yiddish long-short vowel pairs
Within the tripartite division of Eastern Yiddish into Northeastern (NEY), Southeastern (SEY), and CY, only the last preserves distinctive vowel length. The CY stressed vowel system comprises eight monophthongs /iː i uː u ɛ ɔ aː a/ and four diphthongs /eɪ oʊ ɔɪ aɪ/. Note that although the contrast between /uː/ and /u/ initially emerged through phonological conditioning, Yiddish linguists generally treat these as distinct phonemes on the assumption that they were ultimately phonologized (Herzog, 1965; Jacobs, 1990; Katz, 1982; Weinreich, 2008). However, Beider (2015; p.c., 2 April 2017) disputed the phonemic status of /u/ and omitted it from CY’s phonemic inventory. Our analysis of the three length-contrasting peripheral vowels /i a u/ maintains neutrality on this phonemic distinction, letting the data guide our conclusions.
Proto-Yiddish is reconstructed with five long-short vowel pairs, derived from Middle High German (MHG); however, some of the long vowels underwent diphthongization, leading to qualitative rather than quantitative contrasts (Weinreich, 1973). Through a series of vowel shifts, CY ultimately acquired the long-short vowel pairs that are the focus of this study:
1. {/iː/, /i/}: This contrast preserves both the MHG {/iː/, /i/} and {/uː/, /u/} length distinctions, following the merger of MHG back vowels with their front counterparts, through fronting and unrounding, in southern dialects (CY and SEY) during the 14th through 16th centuries. Minimal pairs for these vowels in CY include /ziːn/ (‘son,’ ‘sons’) and /zin/ (‘sun,’ ‘sense’), and /hiːt/ (‘protect’) and /hit/ (‘hat’).
2. {/aː/, /a/}: The short vowel in this pair derives from MHG /a/, while the long vowel is a reflex of proto-Yiddish /aɪ/, which monophthongized to /aː/ in this dialect. To the long class were added words with underlying /a/ that elongate in certain phonological contexts and recent German borrowings with /aː/. Minimal pairs include /haːnt/ (‘today’) and /hant/ (‘hand’), and /laːχt/ (‘easy, lightweight’) and /laχt/ (‘laughs’).
3. {/uː/, /u/}: This pair, the most recent addition to the class of long-short vowels, arose through the shortening of /uː/ (from proto-Yiddish /ɔː/ or /oː/, a reflex of MHG /aː/) before labial and velar consonants. Structural accounts attribute this split to internal pressure to restore symmetry in the vocalic system: a filling of the long-short gap left by the fronting of MHG {/uː/, /u/} to {/iː/, /i/} (Jacobs, 1990). Jacobs (1990:70), who referred to the split as “the birth of a phoneme,” offered some evidence for the phonologization of this pair; however, there are no minimal pairs to support such a claim. An oft-cited near minimal pair is /ʃtruːf/ (‘punish’) and /ʃluf/ (‘sleep’).
Figure 2 shows the changes described here, from proto-Eastern Yiddish to CY, on a vowel quadrilateral, with color coding and numbering to indicate the order of processes.

Figure 2. Schema of vocalic sound changes in Central Yiddish leading to the length contrasts in /i u a/, color coded and numbered to show chronology.
Another long-short vowel pair, not analyzed in this study because it is not a pan-CY contrast, is {/oː/, /o/}, which emerged in some CY dialect regions through the monophthongization of /ou/ to /oː/ (Herzog, 1965).
In NEY the length distinction was lost, and the long-short vowels merged. SEY reportedly lost length contrast, but the phonemic distinction in {/iː/, /i/} was preserved as a qualitative (tense-lax) distinction (/i/ versus /ɪ/) (see Glasser, 2017; Jakobson, 1953:77; Weinreich, 1963:339-342, 2008). Similarly, a merger of {/aː/, /a/} was avoided in parts of SEY through a qualitative shift in /a/, which raised to /o/ in some territories; in much of SEY, the reflex of proto-Yiddish /aɪ/ remained a diphthong. The split in /uː/ did not occur in SEY.
Data and methods
The data for this study come from the Corpus of Spoken Yiddish in Europe (CSYE) (Bleaman & Nove, 2025), which consists of Holocaust testimonies recorded in the 1990s on behalf of the USC Shoah Foundation Visual History Archive. The multi-hour interviews were originally recorded on videocassettes and broken into 30-minute segments. These personal narratives resemble spontaneous conversations used in sociolinguistic research and naturally include content like discussions of danger, injustice, and social norm violations that sociolinguists typically aim to elicit, as suggested by Labov (2013:4) (see Bleaman & Nove, 2025).
As “found” data not originally intended for linguistic analysis, the CSYE recordings present some limitations. Most informants had been removed from their homelands for over five decades, leading to expected code-switching and linguistic borrowing from post-immigration languages. Speakers were also not systematically paired with dialect-matching interviewers, some of whom were heritage speakers with varying degrees of linguistic competence. These circumstances may have influenced the data through interviewer accommodation, prestige dialect effects, adstratum influences, and lifespan changes.Footnote 4 Such limitations, discussed in detail in Cieri and Yaeger-Dror (2017), complicate the study of purely prewar Yiddish dialects. However, we believe, based on existing literature, that these factors do not fundamentally compromise our findings.Footnote 5
Data selection
Using the CSYE’s metadata, we selected 17 speakers (9 men, 8 women) from central Poland (hereafter “Poland” for conciseness) and 21 (12 men, 9 women) from the Unterland (Visual History Archive). Demographic details of speakers from both regions are in Appendix A (Poland) and Appendix B (Unterland), organized by gender and surname.Footnote 6 Figures 3 and 4 plot these speakers as points on maps of Poland and the Unterland, marking the locations where they were raised.

Figure 3. Map of central Poland with red points representing speakers at the locations where they were raised (jittered slightly for better viewing).

Figure 4. Map of the Unterland with red points representing speakers at the locations where they were raised (jittered slightly for better viewing).
Data extraction
The audio from the initial two videocassettes (approximately 60 min) was downloaded from each speaker’s testimony pages on the CSYE website, and the corresponding Praat files, in TextGrid format, were obtained from the “CSYE Transcripts” GitHub repository. The transcripts were tokenized, and unique words were added to a pronunciation dictionary, with each entry annotated for CY pronunciation by the first author. The dictionary was then used to create time boundaries for words and phonemes using the Montreal Forced Aligner train function (McAuliffe et al., 2017), and the output was spot-checked for accuracy.Footnote 7 Fast Track (Barreda, 2021),Footnote 8 a Praat plugin, was then used to extract all the stressed vowels and measure their durations and first and second formants. Fast Track takes formant measurements every two milliseconds and groups them into a designated number of bins using median values. We used five bins per vowel for our analysis.
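The binning step described above can be illustrated with a minimal sketch. It assumes that bins are consecutive, roughly equal-sized stretches of the formant track and that each bin is summarized by its median; Fast Track's own implementation may divide the track differently. The function name `bin_medians` and the F1 values are hypothetical.

```python
from statistics import median

def bin_medians(track, n_bins=5):
    """Split a formant track into n_bins consecutive bins and return each bin's median."""
    n = len(track)
    if n < n_bins:
        raise ValueError("track has fewer samples than bins")
    # Integer bin edges; any remainder is spread across the bins.
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [median(track[edges[i]:edges[i + 1]]) for i in range(n_bins)]

# Hypothetical F1 track (Hz): one value per 2 ms frame of a 40 ms vowel.
f1_track = [410, 420, 455, 470, 480, 495, 500, 505, 510, 512,
            515, 514, 510, 505, 498, 490, 470, 450, 430, 415]
print(bin_medians(f1_track))  # [437.5, 497.5, 513.0, 501.5, 440.0]
```

With five bins, the third value summarizes the 40-60% stretch of the vowel, the portion least affected by neighboring segments.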
Filtering
Vowels with a duration of under 10 milliseconds were automatically discarded by Fast Track during extraction. Vowels in incomplete or uncertain words and monosyllabic function words tagged as stop words due to reduction tendencies were also removed. The Mahalanobis distance criterion was used to exclude the most extreme 5% of outliers using the method implemented in Stanley (2020b). Additionally, after normalization, we removed the first five minutes of one speaker’s testimony due to his initial use of a more “standard” Yiddish pronunciation, as well as all his long and short /a/ tokens, given his near-consistent production of /aː/ as /aɪ/.Footnote 9
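As a rough illustration of the outlier criterion, the sketch below computes the squared Mahalanobis distance of each (F1, F2) token from the centroid of its group and drops the most distant 5%. It is not the cited implementation: the function names are hypothetical, the data are invented, and the assumption that filtering is applied within one speaker-by-vowel group at a time is ours.

```python
from statistics import mean

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (F1, F2) point from the sample centroid."""
    n = len(points)
    m1 = mean(p[0] for p in points)
    m2 = mean(p[1] for p in points)
    # Sample covariance matrix [[s11, s12], [s12, s22]].
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    out = []
    for f1, f2 in points:
        d1, d2 = f1 - m1, f2 - m2
        # d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
        out.append((s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det)
    return out

def drop_extreme(points, proportion=0.05):
    """Remove the given proportion of points with the largest distances."""
    d2 = mahalanobis_sq(points)
    n_keep = len(points) - int(len(points) * proportion)
    keep = sorted(range(len(points)), key=lambda i: d2[i])[:n_keep]
    return [points[i] for i in sorted(keep)]

# Hypothetical tokens of one vowel from one speaker, with one mistracked token.
tokens = [(300 + 5 * (i % 5), 2200 + 20 * (i // 5)) for i in range(19)] + [(600, 1200)]
cleaned = drop_extreme(tokens)
print(len(tokens), len(cleaned))  # 20 19
```

Because the distance accounts for the covariance of F1 and F2, a token is judged by how atypical it is for the shape of its own cloud rather than by raw distance in hertz.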
Normalization
The filtered data were normalized by speaker and vowel based on the median formant values of the third bin (40-60% of the vowel duration). Normalization was executed in R software (R Core Team, 2025) using the norm_anae function (Stanley, 2020a), which follows the modified Nearey method described in the Atlas of North American English (Labov et al., 2006).
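The logic of log-mean normalization of this kind can be sketched as follows. Each speaker receives a single scaling factor, exp(G - S), where S is the speaker's mean of the natural logs of F1 and F2 and G is a grand mean. This is a simplified illustration rather than a reimplementation of norm_anae: the function name is hypothetical, the data are invented, and G is computed here as the mean of the speaker means, which is an assumption.

```python
from math import exp, log
from statistics import mean

def log_mean_normalize(tokens):
    """tokens: list of (speaker, f1, f2). Returns (speaker, f1_norm, f2_norm) tuples."""
    # S: per-speaker mean of ln(F1) and ln(F2), pooled over that speaker's tokens.
    logs = {}
    for spk, f1, f2 in tokens:
        logs.setdefault(spk, []).extend([log(f1), log(f2)])
    s = {spk: mean(vals) for spk, vals in logs.items()}
    # G: grand mean, taken here as the mean of the speaker means (assumption).
    g = mean(s.values())
    # One uniform scaling factor per speaker, applied to both formants.
    return [(spk, f1 * exp(g - s[spk]), f2 * exp(g - s[spk])) for spk, f1, f2 in tokens]

# Two hypothetical speakers; B's formants are uniformly 20% higher than A's.
raw = [("A", 300, 2200), ("A", 700, 1300), ("B", 360, 2640), ("B", 840, 1560)]
for row in log_mean_normalize(raw):
    print(row[0], round(row[1], 1), round(row[2], 1))
```

In this toy example the two speakers' vowels coincide after scaling, which is the intended effect: differences attributable to vocal tract size are removed while the relative positions of each speaker's vowels are preserved.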
When the dataset was filtered to include only the six vowel classes targeted in this study /iː i aː a uː u/, a total of 50,978 tokens remained available for analysis, 22,053 from Poland and 28,925 from the Unterland. Tables 1 and 2 show a summary of the datasets by vowel, which comprises all stressed vowels extracted from the testimonies.
Table 1. Vowels analyzed from the Poland region

Table 2. Vowels analyzed from the Unterland region

Results
The outcomes of the analysis of vowel duration are presented separately for each vowel pair below, followed by the results for vowel quality.
Duration
To compare the duration differences of the long-short vowel dyads across Poland and Unterland, we fit linear mixed-effects models (LMMs) for each pair using the lmer() function from the lme4 package (Bates et al., 2025) in R (R Core Team, 2025), with decadic log-transformed duration (in milliseconds) as the response variable. As predictor variables we included vowel, region, a two-way interaction of vowel by region, and a three-way interaction of vowel by region by gender. To control for isochronic and coarticulatory effects on vowel duration, we included as fixed effects the number of sound segments in the word, as well as the preceding and following contexts (silence; vowel; consonant coded for voice, manner, and place of articulation; or unknown).Footnote 10 Additionally, the country where each interview took place was included as a fixed effect to account for potential influences of languages acquired postwar. Finally, a random slope was added for vowel by speaker to account for unsystematic cross-speaker differences in the pronunciation of the vowels, and random intercepts for speaker and word.Footnote 11 The Satterthwaite approximation in the lmerTest package (Kuznetsova et al., 2017) was used to calculate all p-values.
The two-way interaction (vowel by region) in the LMMs addresses the main research question about a difference in long-short vowel duration across the two regions. The three-way interaction (vowel by region by gender) tests the hypothesis regarding a differential gender effect in Poland versus Unterland (a gender effect in the latter and an absence of an effect in the former).
To visualize the model results, estimated marginal means (EMMs) of duration were extracted from each model, by vowel (long versus short) intersecting with region and with gender (respectively), using the emmeans package in R (Lenth, 2025).Footnote 12 The results are averaged over other fixed effects in the model. EMMs of each model are presented in table format and as visual plots.
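Because the response variable is the decadic log of duration, estimates on the model scale must be back-transformed before they can be read as milliseconds. The sketch below shows the arithmetic for a pair of estimated means; the function name and the input values are hypothetical and are not taken from the tables.

```python
def long_short_contrast(log10_long, log10_short):
    """Return (difference in ms, long/short ratio) from two log10(duration in ms) means."""
    long_ms = 10 ** log10_long
    short_ms = 10 ** log10_short
    return long_ms - short_ms, long_ms / short_ms

# Hypothetical estimated means on the log10 scale.
diff_ms, ratio = long_short_contrast(2.0, 1.8)
print(round(diff_ms, 1), round(ratio, 2))  # 36.9 1.58
```

The ratio depends only on the difference between the two log-scale means (here 10 to the power 0.2), so a model coefficient on this scale is best read as a multiplicative effect on duration, whereas the difference in milliseconds also depends on how long the vowels are overall.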
High front vowels: [iː] versus [i]
Linear mixed modeling
A summary of the results of the LMM for duration is shown in Table 3, with the variables of interest, the two-way interaction of vowel by region, and the three-way interaction of vowel by region by gender, highlighted in gray. Control variables mentioned in the methods (including phonological context, interview location, and word length) are not shown in this abbreviated table but can be found in the complete model output (Appendix C). There is a significant difference in the duration of [i] (versus [iː]), indicating that, overall, these vowels are indeed temporally distinct. For vowel by region, the model shows a significant positive effect for Unterland (versus Poland), signifying that the duration of the short vowel [i] is longer relative to [iː] for speakers of the Unterland (β = -0.11, SE = 0.01, t(37.46) = -7.33, p < 0.001). Additionally, while the three-way interaction is not significant in Poland, male Unterland speakers are shown to have a larger durational difference than females (M β = -0.03, SE = 0.01, t(35.44) = -2.34, p = 0.019).
Table 3. Results of linear mixed model assessing durational distinction for the high front vowels [iː] and [i], with variables of interest highlighted in gray

Estimated marginal means
The EMMs extracted from the LMM analyzing high front vowel duration, along with the standard errors, are shown in Table 4. Estimated differences between the long and the short vowels, and the long/short ratios, are shown in the right-hand columns in the short vowel sections. It is apparent that PY speakers, both male and female, have longer long vowels and shorter short vowels compared to UY speakers, with a mean estimated difference of 50 milliseconds and a long/short ratio above 1.5. Within the Unterland group, women have longer vowels overall than men, but a smaller difference between the long-short vowels (25 ms). These effects can be visualized in Figure 5, in which the EMMs of each vowel are plotted separately for each gender (colors) and regional (columns) group, with lines connecting the long and short vowels. The regional differences can be seen in the slopes of the Polish lines, which are visibly steeper than the Unterland lines. Also perceptible, albeit less conspicuous, is a difference in slope between male versus female Unterland speakers.

Figure 5. Estimated marginal means of duration LMM for [iː] versus [i] plotted with vowel on the x-axis and (log-transformed) duration on the y-axis, grouped by gender and faceted by region (rows), with colored lines connecting the vowels of each group (red is female).
Table 4. Estimated marginal means of [iː] versus [i] for vowel duration (in milliseconds) extracted from the linear mixed model, by vowel | region and vowel | gender, averaged over preceding and following context and country where interviewed, with standard error and degrees of freedom (method: Satterthwaite). Long-short differences and ratios included by gender and region

Low vowels: [aː] versus [a]
Linear mixed modeling
The results of the LMM for the low vowel pair {[aː] versus [a]} are shown in Table 5. Here too, [a] is significantly shorter than [aː] across the dataset. As in the high front vowel pair, the interaction of vowel by region is significant, with Unterland speakers showing a diminished duration difference (β = -0.05, SE = 0.01, t(37.76) = -3.68, p < 0.001), as is the three-way interaction of vowel by region by gender, which shows a smaller difference among Unterland females versus males (β = -0.05, SE = 0.01, t(35.36) = -3.85, p < 0.001). As with the previous pair, there is no gender effect among Polish speakers.
Table 5. Results of linear mixed model assessing durational distinction for the low vowels [aː] and [a], with variables of interest highlighted in gray

Estimated marginal means
EMMs of the model for the duration of the low vowels are shown in Table 6. Comparing only the female speakers in both regions, estimates show Polish speakers have longer long vowels and shorter short vowels. However, the same is not true for males, where both regional groups pattern similarly (although Unterland males have longer vowels overall), with estimated duration differences above 50 milliseconds. This effect is also visible in Figure 6, where the EMMs are plotted by vowel separately for each gender group, faceted by region. A stark gender effect is visible in the Unterland group, but the slope of the male speaker lines in both regions is similar. In fact, when the female speakers are removed from the LMM, the regional effect is no longer significant.
Table 6. Estimated marginal means of [aː] versus [a] for vowel duration (in milliseconds) extracted from the linear mixed model, by vowel | region and vowel | gender, averaged over preceding and following context and country where interviewed, with standard error and degrees of freedom (method: Satterthwaite). Long-short differences and ratios included by gender and region


Figure 6. Estimated marginal means of duration LMM for [aː] versus [a] plotted with vowel on the x-axis and (log-transformed) duration on the y-axis, grouped by gender and faceted by region (rows), with colored lines connecting the vowels of each group (red is female).
High back vowels: [uː] versus [u]
Linear mixed modeling
The results of the statistical analysis of duration for the high back vowels {[uː] versus [u]} are shown in Table 7. Our modeling approach for this vowel pair was adapted to account for their distribution in relation to following consonants. As described in the section on long-short vowel pairs in CY, [uː] is largely associated with following coronal consonants, while velar and labial consonants generally follow [u]. To avoid collinearity, we did not include the place category of the following segment, controlling only for voicing and manner.
Table 7. Summary results of linear mixed model assessing durational distinction for the high back vowels [uː] and [u], with variables of interest highlighted in gray

While the estimate of [u] indicates that the long and short vowels are significantly different in duration, the interactions that are of interest in this study, vowel by region and vowel by gender by region, are not significant, indicating that the groups do not vary systematically on the temporal dimension.
Estimated marginal means
The EMMs for the high back vowel pair are shown in Table 8. The estimated duration differences (and ratios) overall are considerably lower in this pair than in the other two, with an average difference of just under 33 milliseconds across both gender groups in Poland and a long/short ratio of 1.3. The plot in Figure 7 illustrates that all the slopes are shallower than those seen in the results of the analyses of the other vowels.
Table 8. Estimated marginal means of [uː] versus [u] for vowel duration (in milliseconds) extracted from the linear mixed model, by vowel | region and vowel | gender, averaged over preceding and following context and country where interviewed, with standard error and degrees of freedom (method: Satterthwaite). Long-short differences and ratios included by gender and region


Figure 7. Estimated marginal means of duration LMM for [uː] versus [u] plotted with vowel on the x-axis and (log-transformed) duration on the y-axis, grouped by gender and faceted by region (rows), with colored lines connecting the vowels of each group (red is female).
Summary of results for vowel duration
The high front and low (mid) vowel pairs exhibit several common patterns:
1. In the Polish group, the average estimated duration difference meets or exceeds 50 milliseconds.
2. Both pairs exhibit a significant cross-regional disparity, with Unterland speakers showing shorter duration differences than speakers from Poland.
3. In both pairs, there is a gender effect in the Unterland but not the Polish group, with female speakers exhibiting smaller duration differences than males.
The gender difference is larger in {[iː], [i]} versus {[aː], [a]}, with the regional effect appearing only among female speakers in the low vowel pair—suggesting a more recent change in the Unterland variety.
In contrast to the other two, the high back vowel pair {[uː], [u]} shows no significant effects of region or gender. Additionally, the duration differences between the long and short counterparts are remarkably small for a supposed length contrast. This may be due to the predictable nature of the distinction and the lack of minimal pairs. Visualizing the duration of all {[uː], [u]} tokens across different contexts (as shown in Figure 8), we see identical patterns for vowels before dorsal and labial consonants (the predicted shortening environments), while vowels before coronal consonants and other vowels show a broader range of medians and interquartile ranges.

Figure 8. Boxplot depicting vowel duration variation (in milliseconds) of all non-word-final tokens of [uː] and [u] by place of articulation of subsequent segment (V0 = unstressed vowel, V2 = vowel with secondary stress). Boxes show quartile range and median; whiskers extend to 1.5 times the interquartile range.
Another plausible reason for small duration ratios in the high back vowel pair is that they are sufficiently distinct in quality, rendering duration redundant. Similarly, given the small duration differences in Unterland {[iː], [i]} and among women in {[aː], [a]}, it is possible that in this region, cue weighting (see Footnote 13) has shifted from duration to quality for all three historical long-short vowel pairs. Such diachronic shifts are not uncommon cross-linguistically and can result from internal or external linguistic pressures, including changes in language contact (Abramson & Ren, Reference Abramson and Ren1990; see also Francis & Nusbaum, Reference Francis and Nusbaum2002; Hazan & Boulakia, Reference Hazan and Boulakia1993; Idemaru & Holt, Reference Idemaru and Holt2011; Kondaurova & Francis, Reference Kondaurova and Francis2010; MacKain et al., Reference MacKain, Best and Strange1981; Yamada & Tohkura, Reference Yamada, Tohkura, Tohkura, Vatikiotis-Bateson and Sagisaka1992). In the following section, we explore this possibility further by examining the results from our analysis of vowel quality.
Quality
In this section, we analyze vowel quality to understand the diminishing duration differences in the [iː] versus [i] and [aː] versus [a] pairs in UY, and the smaller-than-expected duration differences in the {[uː], [u]} pair overall. First, we mapped the acoustic space of the six target vowels based on normalized formant values from the center bin (40–60% of duration) of all vowels, as shown in Figure 9. The vowel systems of both regions appear similar. The plots indicate slightly more separation between [iː] and [i] in Poland on the F2 dimension, more distinction between [uː] and [u] in the Unterland, and the {[aː], [a]} pair positioned slightly higher in Poland.
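A measurement taken from the center bin of a vowel can be sketched as follows. This is a minimal Python illustration, not the extraction procedure used in the study: the function name, the sampling interval, and the F1 values are all hypothetical, and speaker normalization is omitted.

```python
from statistics import mean


def center_bin_mean(times_ms, values_hz, onset_ms, offset_ms,
                    lo_pct=40, hi_pct=60):
    """Average the formant samples that fall within the central
    lo_pct..hi_pct portion of a vowel's duration (bounds inclusive)."""
    duration = offset_ms - onset_ms
    start = onset_ms + duration * lo_pct / 100
    end = onset_ms + duration * hi_pct / 100
    inside = [v for t, v in zip(times_ms, values_hz) if start <= t <= end]
    if not inside:
        raise ValueError("no formant samples in the central window")
    return mean(inside)


# Hypothetical F1 track (Hz), sampled every 10 ms across a 100 ms vowel.
times = list(range(0, 101, 10))
f1 = [300, 310, 320, 330, 340, 350, 360, 350, 340, 330, 320]
f1_center = center_bin_mean(times, f1, onset_ms=0, offset_ms=100)
print(f1_center)
```

Restricting the measurement to the middle of the vowel reduces the influence of formant transitions into and out of neighboring segments.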

Figure 9. Vowel tokens (n = 50,978) plotted by F2 on the x-axis and F1 on the y-axis and faceted by region (Poland on the left and Unterland on the right). Square labels with IPA symbols represent the location of the vowel means, and ellipses show 95% confidence in the mean.
Statistical analyses
To analyze cross-regional spectral overlap differences in each long-short vowel pair, we calculated Pillai scores by fitting multivariate analysis of variance (MANOVA) models, with normalized F1 and F2 as the dependent variables and (decadic log-transformed) duration, as well as the preceding and following context (silence; vowel; consonant coded for voice, place, and manner; or unknown) as independent variables. For the {[uː], [u]} pair, only voicing and manner of the following segment were included.
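For readers unfamiliar with the statistic, the Pillai score (Pillai's trace) can be sketched in a simplified setting. The Python example below handles only two groups of (F1, F2) points and omits the duration and context covariates described above, so it is a simplified illustration rather than the model specification used here. The function names and the toy data are hypothetical.

```python
def mean_vector(points):
    """Mean of a list of (F1, F2) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def add_outer(matrix, d, weight=1.0):
    """Add weight * d d^T to a 2x2 matrix in place."""
    for i in range(2):
        for j in range(2):
            matrix[i][j] += weight * d[i] * d[j]


def pillai_two_groups(long_tokens, short_tokens):
    """Pillai's trace, trace(H (H + E)^-1), for a one-way design with
    two groups and two dependent variables (no covariates)."""
    everything = long_tokens + short_tokens
    grand = mean_vector(everything)
    H = [[0.0, 0.0], [0.0, 0.0]]  # between-group sums of squares/products
    T = [[0.0, 0.0], [0.0, 0.0]]  # total sums of squares/products (H + E)
    for group in (long_tokens, short_tokens):
        m = mean_vector(group)
        add_outer(H, (m[0] - grand[0], m[1] - grand[1]), weight=len(group))
    for p in everything:
        add_outer(T, (p[0] - grand[0], p[1] - grand[1]))
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    inv = [[T[1][1] / det, -T[0][1] / det],
           [-T[1][0] / det, T[0][0] / det]]
    return sum(H[i][k] * inv[k][i] for i in range(2) for k in range(2))


# Toy data: two identical clouds versus two well-separated clouds.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(x + 10.0, y + 10.0) for x, y in cloud]
overlapping = pillai_two_groups(cloud, list(cloud))
separated = pillai_two_groups(cloud, shifted)
print(overlapping, separated)
```

With two groups the score ranges from 0 (complete overlap of the two distributions) to 1 (complete separation), which is the scale on which the values in Table 9 are interpreted.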
The Pillai scores of [iː] versus [i], shown in Table 9, mirror the marginally greater separation of Polish long versus short vowels observed in Figure 9. For [aː] versus [a], the Pillai scores are close to zero, reflecting the near overlap of these two vowels in both groups. Finally, Pillai scores for [uː] versus [u] in both regions are below 0.1 and differ only marginally (0.021).
Table 9. Pillai scores by vowel pair and region

To further explore cross-regional differences in quality while also controlling for possible random effects (e.g., skew caused by a particular lexical item), LMMs were again fit for each vowel dyad, with normalized F1 and F2 each serving as the response variable in a separate model, and with the same fixed and random effects and interactions as were entered into the LMMs for duration. Here too, the two-way interaction of vowel by region aimed to test whether the distinctiveness of the short and long counterparts of each pair differs between the two groups. The interaction could reveal whether Unterland speakers compensate for shorter durations through vowel centralization, which would be reflected in higher F1 values. Laxer [i] would also manifest in lower F2 values, while laxer [u] would show an increase in F2. The three-way interaction (vowel by region by gender) investigates whether there is a within-region gender effect corresponding to the effects observed in vowel duration.
In the six LMMs fit for vowel quality (two for each vowel pair, with F1 and F2 as response variables), all models except for F2 of [a] indicate that the short vowel is significantly different from the long vowel overall, but the interactions vowel by region and vowel by region by gender are not significant in any of them. This suggests that these vowels do not differ systematically in quality across the two regions or across the gender groups within each region.
In summary, our analyses of vowel quality, based on normalized first and second formant measures extracted from the vowel nuclei, show no evidence of cross-regional difference. The absence of significant qualitative differences thus rules out an account in which the contrast in Unterland {[iː], [i]} and {[aː], [a]} is preserved via a shift in cue weighting from duration to quality. With these findings, we proceed to further interpret and discuss our results.
Discussion
Our analysis examines the vocalic length contrast in Yiddish, a feature preserved for centuries in CY’s northern (Polish) region. Comparing long-short vowel pairs of speakers from the relatively newer Unterland area with those from Poland, we find significantly smaller duration differences in Unterland {[iː], [i]} and {[aː], [a]}, indicating divergence of UY from the established CY variety spoken in Poland. The length contrast in the Unterland appears tenuous: the average duration difference for [iː] versus [i] is 27 milliseconds, and for [aː] versus [a] among women it is 35 milliseconds. Moreover, we find no spectral divergence corresponding to the reduced duration difference, ruling out a shift from duration to quality as the distinguishing cue.
While these disparities may reflect stable differences in the acoustic realization of vowel length across the two regions, we question whether the observed Unterland differences are sufficient to sustain a long-short vowel distinction given Labov and Baranowski’s (Reference Labov, Ash and Boberg2006) proposed 50-millisecond threshold for category identification. This raises the possibility that we are observing the initial stage of a phonemic change that could have led to a merger if the dialect had continued to develop in situ. However, without evidence that subsequent generations are continuing this trajectory, any conclusions about structural language changes must remain tentative.
Higher rates of bilingualism and linguistic assimilation among Unterland women might explain their prominent role in the observed cross-dialect differences. Moreover, that both male and female speakers participate in the {[iː], [i]} change, with females in the lead, while only females contribute to the cross-regional difference in {[aː], [a]}, suggests that the latter may be a more recent development. A key characteristic of the Yiddish /a/ pair is its orthographic distinction: /aː/ is written with a double yud (ײ), reflecting its diphthongal origins, and /a/ with an alef (א), whereas the other two vowel pairs lack separate written forms. Speakers are also aware of the underlying quality of /aː/, and dialects that pronounce it as a diphthong are perceived as more prestigious. The higher incidence of Yiddish formal education among Unterland men compared to women (as described above) might explain why the {[aː], [a]} change has not fully taken root among male speakers.
Interestingly, the high vowels {[iː], [i]} and {[uː], [u]} do not behave as a class regarding length. Specifically, the high back vowels {[uː], [u]} show a minimal estimated duration distinction across our dataset (mean 29 ms overall, 33 ms in Poland), with no significant regional or gender-based differences. As noted earlier, the {[uː], [u]} split is a more recent development, arising from the shortening of [uː] before labial and velar consonants. While most literature on CY includes both long and short /u/ in discussions of peripheral vowel length (Herzog, Reference Herzog1965; Jacobs, Reference Jacobs1990; Katz, Reference Katz1982; Weinreich, Reference Weinreich2008), Beider (Reference Beider2015, p.c., 2 April 2017) has challenged this classification, including only /uː/ in descriptions of CY. Our findings reflect the lack of scholarly consensus regarding the phonemic independence of [u], which remains largely in complementary distribution with [uː], consistent with its historical status. Although regression modeling confirms duration differences between [uː] and [u], the small magnitude of these differences (compared to the other two vowel pairs) may signal incomplete phonologization. Descriptions of sound change, such as that of Ohala (Reference Ohala, Joseph and Janda2003), note that such changes begin when deviations caused by automatic phonetic processes are perceived by some listeners as intentional. The so-called “birth of a phoneme,” as Jacobs described the splitting of /uː/, would thus have been a gradual process (Reference Jacobs1990:70). Our data suggest that the acoustic split in prewar CY [uː] may be at a stage where contextual pressures have not yet led to a consistent and stable reinterpretation by enough speakers, or where it has occurred in some contexts but not others.
This scenario reflects the approach of scholars such as Goldstein (Reference Goldstein1995), Hall (Reference Hall2013), and Renwick (Reference Renwick2014), who have challenged the binary treatment of phonemic contrast, advocating for a nuanced understanding that considers the influence of various factors including phonology, morphology, and frequency. Further phonetic analyses, beyond the scope of this study, may shed more light on this matter.
This study captures UY in what appears to be a transitional state, reflecting its social and political context in the interwar period. Collectively, the findings show a shift in vowel length contrast in the Unterland, particularly evident in {[iː], [i]}, and likely in {[aː], [a]}, as well, with potential implications for the entire vocalic system. Unterland Jewish communities were relatively new, having formed through the migration of Jews from various dialect regions in the 19th century. This early mingling, combined with the geopolitical, geographical, and social characteristics of the Unterland—including its smaller, more dispersed populations in the Carpathian Mountain valleys compared to Poland’s more urbanized communities—may have led to dialect leveling and the emergence of a novel variety within the region. The patterns observed in UY, including significant interspeaker and intraspeaker variation, align with new dialect formation theories (e.g., Kerswill, Reference Kerswill, Britain and Cheshire2003, Reference Kerswill, Mattheier, Ammon and Trudgill2006; Kerswill & Trudgill, Reference Kerswill, Trudgill, Auer, Hinskens and Kerswill2005; Kerswill & Williams, Reference Kerswill and Williams2005), which predict intermediate stages in koineization during which multiple variants coexist.
Because natural transmission of the language was disrupted by genocide and displacement of its remaining speakers, we can only speculate how these vowels might have developed had the speakers remained in the region. As outlined in the description of the variables above, research suggests that in SEY dialects, the phonemic contrast represented by {/iː/, /i/} was maintained through a qualitative tense-lax relationship, and a merger of {/aː/, /a/} was circumvented by raising /a/ to /o/ or maintenance of the historic diphthong. It is plausible that similar qualitative shifts might have occurred in UY over time had the language been left undisturbed.
In the introduction, we emphasized our aim to address the lack of research on UY, providing a baseline for understanding developments in contemporary dialects derived from it. Considering the results reported here, there emerges the intriguing question of what happened to the length contrast when UY was transplanted, via post-Holocaust immigration, to the United States and other countries around the world (including Israel, England, Canada, and Belgium), where it encountered new languages. Contrary to what might be expected given this trajectory toward merger, studies show that New York Hasidic Yiddish has maintained vowel contrasts through mechanisms that reflect the influence of contact with English (Nove, Reference Nove and Zimmer2021a, Reference Nove2021b). For the high vowels, the first New York-born generation shows more robust durational distinctions relative to prewar speakers, an apparent reinforcement of a tenuous contrast. This durational difference is then reanalyzed as a tense-lax distinction by subsequent generations, ultimately coming to resemble English {/i/-/ɪ/} and {/u/-/ʊ/}. Patterns in interspeaker covariation show regularity in the lowering of the short high vowels, indicating that the contact-induced effects are acting upon them as a class. For the low vowels, where English does not provide a model for contrast, the {/aː/-/a/} contrast has been maintained as a durational difference; however, apparent-time patterns show this difference is diminishing among younger speakers.
The linguistic dynamics of the Unterland region certainly warrant further investigation. We suggested that geopolitical and sociocultural circumstances created optimal conditions for linguistic change, but we have yet to understand why this feature was impacted and why in the observed direction. Analyzing speakers’ language backgrounds, including dominant languages and other languages spoken (where such information is available), could clarify this. It is also worth examining differences between speakers from major Unterland cities (e.g., Satu Mare, Sighetu Marmaţiei, Mukachevo [Yiddish: Satmer, Siget, Munkatsh]), where Hungarian dominated, and those from rural areas, where minority languages, some of which do not have vowel length (e.g., Ruthenian), were spoken. Comparing speakers from areas consistently under Hungarian rule versus those from politically variable regions could be enlightening as well. Additionally, information from survivor interviews might help identify speakers who were particularly mobile or from recently settled families to see whether they are leading these changes. The CSYE’s growing collection of testimonies will enable us to expand our dataset to include more speakers from the Unterland, as well as from additional regions, such as historical Galicia, which lies between central Poland and the Unterland. This will allow for the inclusion of additional variables and may provide some answers to these outstanding questions.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0954394525100604.
Acknowledgments
This material is based upon work supported by the National Science Foundation under Grant No. BCS 2142797. This project would not have been possible without Isaac Bleaman, whose vision and expertise helped bring the Corpus of Spoken Yiddish in Europe to fruition. We are grateful to the anonymous referees for their detailed and constructive comments, which greatly improved the quality of this work. We thank the audience at the LSA Annual Meeting 2023 for feedback on an early version of this study. Chaya R. Nove also thanks Bill Haddican (Queens College) and Juliette Blevins (Graduate Center, CUNY) for their invaluable input during the earlier stages of this project, the Linguistics Program at Brown University for providing the time and resources to develop the project fully, and Uriel Cohen-Priva (Brown University) for his guidance.
Competing interests
The authors declare none.




