Published online by Cambridge University Press: 01 January 2026
Sequences such as [mb, kp, ts] pattern as complex segments in some languages but as clusters of simple consonants in others. What evidence is used to learn their language-specific status? We present an implemented computational model that starts with simple consonants and builds more complex representations by tracking statistical distributions of consonant sequences. This strategy succeeds in a wide range of cases, both in languages that supply clear phonotactic arguments for complex segments and in languages where the evidence is less clear. We then turn to the typological parallels between complex segments and consonant clusters: both tend to be limited in size and composition. We suggest that our approach allows the parallels to be reconciled. Finally, we compare our model with alternatives: learning complex segments from phonotactics and from phonetics.
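As a rough illustration of what tracking the distribution of consonant sequences could look like, the following Python sketch scores each adjacent consonant pair by how rarely its two members occur apart. It is not the model presented here. The function names (`pair_scores`, `propose_complex_segments`), the scoring formula, the 0.5 threshold, the toy vowel inventory, and the toy lexicons are all assumptions made for the example.

```python
from collections import Counter

# Assumption: a toy vowel inventory; every other symbol counts as a consonant.
VOWELS = frozenset({"a", "e", "i", "o", "u"})


def pair_scores(words, vowels=VOWELS):
    """Score each adjacent consonant pair found in `words`.

    `words` is an iterable of tuples of segment symbols. The score,
    count(C1C2)^2 / (count(C1) * count(C2)), is an illustrative assumption:
    it equals 1.0 when C1 and C2 only ever occur together and approaches 0
    when they mostly occur apart.
    """
    seg_counts = Counter()
    pair_counts = Counter()
    for word in words:
        is_cons = [seg not in vowels for seg in word]
        for i, seg in enumerate(word):
            if not is_cons[i]:
                continue
            seg_counts[seg] += 1
            # Count the pair only when the next segment is also a consonant.
            if i + 1 < len(word) and is_cons[i + 1]:
                pair_counts[(seg, word[i + 1])] += 1
    return {
        (c1, c2): n * n / (seg_counts[c1] * seg_counts[c2])
        for (c1, c2), n in pair_counts.items()
    }


def propose_complex_segments(words, threshold=0.5, vowels=VOWELS):
    """Return the consonant pairs whose score meets the (hypothetical) threshold."""
    scores = pair_scores(words, vowels)
    return {pair for pair, score in scores.items() if score >= threshold}


# Toy lexicon A: [m] and [b] never occur apart.
lexicon_a = [("m", "b", "a"), ("a", "m", "b", "a"), ("m", "b", "i"), ("t", "a")]
# Toy lexicon B: [m] and [b] usually occur independently.
lexicon_b = [("m", "a"), ("b", "a"), ("a", "m", "b", "a"), ("m", "i"), ("b", "i")]

print(propose_complex_segments(lexicon_a))
print(propose_complex_segments(lexicon_b))
```

In lexicon A the pair [mb] would be proposed as a candidate complex segment, while in lexicon B it would remain a cluster of two simple consonants. A fuller learner would presumably re-segment the data with the newly proposed segments and repeat the count, which this sketch omits.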
We received useful feedback from Andries Coetzee, Michael Becker, Lisa Davidson, Gillian Gallagher, Maddie Gilbert, Donca Steriade, and an anonymous referee. Thanks to audiences at MIT, AMP 2019, and NYU, as well as Adam Albright for Latin paradigms, Michael Becker for Turkish materials, and Maxim Kisilier for the Modern Greek corpus. This work was supported in part by NSF BCS-1724753 to the first author.