Learning with hidden structure in Optimality Theory and Harmonic Grammar: beyond Robust Interpretive Parsing*

Gaja Jarosz

doi:10.1017/S0952675713000031

Published online by Cambridge University Press: 01 May 2013

Affiliation: Yale University

Abstract

This paper explores the relative merits of constraint ranking vs. weighting in the context of a major outstanding learnability problem in phonology: learning in the face of hidden structure. Specifically, the paper examines a well-known approach to the structural ambiguity problem, Robust Interpretive Parsing (RIP; Tesar & Smolensky 1998), focusing on its stochastic extension first described by Boersma (2003). Two related problems with the stochastic formulation of RIP are revealed, rooted in a failure to take full advantage of probabilistic information available in the learner's grammar. To address these problems, two novel parsing strategies are introduced and applied to learning algorithms for both probabilistic ranking and weighting. The novel parsing strategies yield significant improvements in performance, asymmetrically improving performance of OT learners. Once RIP is replaced with the proposed modifications, the apparent advantage of HG over OT learners reported in previous work disappears (Boersma & Pater 2008).

Type: Research Article
Information: Phonology, Volume 30, Issue 1, May 2013, pp. 27–71

DOI: https://doi.org/10.1017/S0952675713000031
Copyright: © Cambridge University Press 2013

Footnotes

* This work has benefited from discussion with a number of colleagues, including Joe Pater, Paul Boersma, Paul Smolensky, Colin Wilson, Jason Riggle, John McCarthy, Bob Frank and Jeff Heinz. I have also received valuable comments on portions of this work presented to audiences at NECPhon, University of Massachusetts Amherst, Mayfest, the University of Delaware Workshop on Stress and Accent, and the Yale Computational Linguistics research group (CLAY). Finally, I would also like to thank three anonymous reviewers and the associate editor for very thorough and thoughtful comments on an earlier version of this paper.

References

Akers, Crystal (2011). Commitment-based learning of hidden linguistic structures. PhD dissertation, Rutgers University.

Alderete, John, Brasoveanu, Adrian, Merchant, Nazarré, Prince, Alan & Tesar, Bruce (2005). Contrast analysis aids the learning of phonological underlying forms. WCCFL 24. 34–42.

Apoussidou, Diana (2006). On-line learning of underlying forms. Ms, University of Amsterdam. Available as ROA-835 from the Rutgers Optimality Archive.

Apoussidou, Diana (2007). The learnability of metrical phonology. PhD dissertation, University of Amsterdam.

Apoussidou, Diana & Boersma, Paul (2003). The learnability of Latin stress. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 25. 101–148.

Bane, Max & Riggle, Jason (2009). The typological consequences of weighted constraints. CLS 45:1. 13–27.

Bane, Max, Riggle, Jason & Sonderegger, Morgan (2010). The VC dimension of constraint-based grammars. Lingua 120. 1194–1208.

Biró, Tamás (to appear). Towards a Robuster Interpretive Parsing: learning from overt forms in Optimality Theory. Journal of Logic, Language and Information.

Boersma, Paul (1997). How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21. 43–58.

Boersma, Paul (2003). Review of Tesar & Smolensky (2000). Phonology 20. 436–446.

Boersma, Paul (2009). Some correct error-driven versions of the Constraint Demotion Algorithm. LI 40. 667–686.

Boersma, Paul & Hayes, Bruce (2001). Empirical tests of the Gradual Learning Algorithm. LI 32. 45–86.

Boersma, Paul & Levelt, Clara C. (2000). Gradual constraint-ranking learning algorithm predicts acquisition order. In Clark, Eve V. (ed.) Proceedings of the 30th Child Language Research Forum. Stanford: CSLI. 229–237.

Boersma, Paul & Pater, Joe (2008). Convergence properties of a Gradual Learning Algorithm for Harmonic Grammar. Ms, University of Amsterdam & University of Massachusetts, Amherst. Available as ROA-970 from the Rutgers Optimality Archive. To appear in McCarthy, John J. (ed.) Harmonic grammar and harmonic serialism. London: Equinox.

Chomsky, Noam (1981). Lectures on government and binding. Dordrecht: Foris.

Coetzee, Andries W. & Pater, Joe (2008). Weighted constraints and gradient restrictions on place co-occurrence in Muna and Arabic. NLLT 26. 289–337.

Coetzee, Andries W. & Pater, Joe (2011). The place of variation in phonological theory. In Goldsmith, John, Riggle, Jason & Yu, Alan (eds.) The handbook of phonological theory. 2nd edn. Malden, Mass. & Oxford: Wiley-Blackwell. 401–431.

Daelemans, Walter, Gillis, Steven & Durieux, Gert (1994). The acquisition of stress: a data-oriented approach. Computational Linguistics 20. 421–451.

Daland, Robert, Hayes, Bruce, White, James, Garellek, Marc, Davis, Andrea & Norrmann, Ingrid (2011). Explaining sonority projection effects. Phonology 28. 197–234.

Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39. 1–38.

Dresher, B. Elan (1999). Charting the learning path: cues to parameter setting. LI 30. 27–67.

Dresher, B. Elan & Kaye, Jonathan D. (1990). A computational learning model for metrical phonology. Cognition 34. 137–195.

Fischer, Marcus (2005). A Robbins-Monro type learning algorithm for an entropy maximizing version of Stochastic Optimality Theory. MA thesis, Humboldt University, Berlin.

Goldrick, Matthew (2011). Linking speech errors and generative phonological theory. Language and Linguistics Compass 5. 397–412.

Goldsmith, John A. (1994). A dynamic computational theory of accent systems. In Cole, Jennifer & Kisseberth, Charles (eds.) Perspectives in phonology. Stanford: CSLI. 1–28.

Goldwater, Sharon & Johnson, Mark (2003). Learning OT constraint rankings using a Maximum Entropy model. In Spenader, Jennifer, Eriksson, Anders & Dahl, Östen (eds.) Proceedings of the Stockholm Workshop on Variation within Optimality Theory. Stockholm: Stockholm University. 111–120.

Gordon, Matthew (2002). A factorial typology of quantity-insensitive stress. NLLT 20. 491–552.

Gupta, Prahlad & Touretzky, David S. (1994). Connectionist models and linguistic theory: investigations of stress systems in language. Cognitive Science 18. 1–50.

Hammond, Michael (2004). Gradience, phonotactics, and the lexicon in English phonology. International Journal of English Studies 4. 1–24.

Hayes, Bruce (1995). Metrical stress theory: principles and case studies. Chicago: University of Chicago Press.

Hayes, Bruce (2004). Phonological Acquisition in Optimality Theory: the early stages. In Kager, René, Pater, Joe & Zonneveld, Wim (eds.) Constraints in phonological acquisition. Cambridge: Cambridge University Press. 158–203.

Hayes, Bruce & Londe, Zsuzsa Cziráky (2006). Stochastic phonological knowledge: the case of Hungarian vowel harmony. Phonology 23. 59–104.

Hayes, Bruce & Wilson, Colin (2008). A maximum entropy model of phonotactics and phonotactic learning. LI 39. 379–440.

Hayes, Bruce, Zuraw, Kie, Siptár, Péter & Londe, Zsuzsa (2009). Natural and unnatural constraints in Hungarian vowel harmony. Lg 85. 822–863.

Heinz, Jeffrey (2009). On the role of locality in learning stress patterns. Phonology 26. 303–351.

Hulst, Harry van der, Goedemans, Rob & van Zanten, Ellen (eds.) (2010). A survey of word accentual patterns in the languages of the world. Berlin & New York: De Gruyter Mouton.

Hyde, Brett (2007). Non-finality and weight-sensitivity. Phonology 24. 287–334.

Jäger, Gerhard (2007). Maximum entropy models and Stochastic Optimality Theory. In Zaenen, Annie, Simpson, Jane, King, Tracy Holloway, Grimshaw, Jane, Maling, Joan & Manning, Chris (eds.) Architectures, rules, and preferences: variations on themes by Joan W. Bresnan. Stanford: CSLI. 467–479.

Jäger, Gerhard & Rosenbach, Anette (2006). The winner takes it all – almost: cumulativity in grammatical variation. Linguistics 44. 937–971.

Jarosz, Gaja (2006a). Rich lexicons and restrictive grammars: maximum likelihood learning in Optimality Theory. PhD dissertation, Johns Hopkins University.

Jarosz, Gaja (2006b). Richness of the Base and probabilistic unsupervised learning in Optimality Theory. In Wicentowski, Richard & Kondrak, Grzegorz (eds.) Proceedings of the 8th Meeting of the ACL Special Interest Group in Computational Phonology. New York: Association for Computational Linguistics. 50–59.

Jarosz, Gaja (2010). Implicational markedness and frequency in constraint-based computational models of phonological learning. Journal of Child Language 37. 565–606.

Jarosz, Gaja (to appear). Naive parameter learning for Optimality Theory: the hidden structure problem. NELS 40.

Jesney, Karen & Tessier, Anne-Michelle (2011). Biases in Harmonic Grammar: the road to restrictive learning. NLLT 29. 251–290.

Johnson, Mark (2002). Optimality-theoretic Lexical Functional Grammar. In Merlo, Paola & Stevenson, Suzanne (eds.) The lexical basis of sentence processing: formal, computational and experimental issues. Amsterdam & Philadelphia: Benjamins. 59–73.

Keller, Frank (2000). Gradience in grammar: experimental and computational aspects of degrees of grammaticality. PhD dissertation, University of Edinburgh.

Keller, Frank & Asudeh, Ash (2002). Probabilistic learning algorithms and Optimality Theory. LI 33. 225–244.

Legendre, Géraldine, Miyata, Yoshiro & Smolensky, Paul (1990). Can connectionism contribute to syntax? Harmonic Grammar, with an application. CLS 26:1. 237–252.

Legendre, Géraldine, Sorace, Antonella & Smolensky, Paul (2006). The Optimality Theory–Harmonic Grammar connection. In Smolensky & Legendre (2006: vol. 2). 339–402.

Liberman, Mark & Prince, Alan (1977). On stress and linguistic rhythm. LI 8. 249–336.

McCarthy, John J. (2003). OT constraints are categorical. Phonology 20. 75–138.

McCarthy, John J. & Prince, Alan (1993). Generalized alignment. Yearbook of Morphology 1993. 79–153.

Magri, Giorgio (2012). Convergence of error-driven ranking algorithms. Phonology 29. 213–269.

Martin, Andrew (2011). Grammars leak: modeling how phonotactic generalizations interact within the grammar. Lg 87. 751–770.

Merchant, Nazarré (2008). Discovering underlying forms: contrast pairs and ranking. PhD dissertation, Rutgers University.

Merchant, Nazarré & Tesar, Bruce (2008). Learning underlying forms by searching restricted lexical subspaces. CLS 41:2. 33–47.

Pater, Joe (2008). Gradual learning and convergence. LI 39. 334–345.

Pater, Joe (2009a). Review of Smolensky & Legendre (2006). Phonology 26. 217–226.

Pater, Joe (2009b). Weighted constraints in generative linguistics. Cognitive Science 33. 999–1035.

Pater, Joe (to appear). Canadian raising with language-specific weighted constraints. Lg.

Pearl, Lisa S. (2011). When unbiased probabilistic learning is not enough: acquiring a parametric system of metrical phonology. Language Acquisition 18. 87–120.

Potts, Christopher, Pater, Joe, Jesney, Karen, Bhatt, Rajesh & Becker, Michael (2010). Harmonic Grammar with linear programming: from linear systems to linguistic typology. Phonology 27. 77–117.

Prince, Alan (1990). Quantitative consequences of rhythmic organization. CLS 26:2. 355–398.

Prince, Alan (2002). Entailed ranking arguments. Ms, Rutgers University. Available as ROA-500 from the Rutgers Optimality Archive.

Prince, Alan (2010). Counting parses. Ms, Rutgers University. Available as ROA-1097 from the Rutgers Optimality Archive.

Prince, Alan & Smolensky, Paul (2004). Optimality Theory: constraint interaction in generative grammar. Malden, Mass. & Oxford: Blackwell.

Pruitt, Kathryn (2010). Serialism and locality in constraint-based metrical parsing. Phonology 27. 481–526.

Riggle, Jason (2009). The complexity of ranking hypotheses in Optimality Theory. Computational Linguistics 35. 47–59.

Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65. 386–408.

Rubach, Jerzy & Booij, Geert E. (1985). A grid theory of stress in Polish. Lingua 66. 281–319.

Smolensky, Paul (1996). The initial state and ‘Richness of the Base’ in Optimality Theory. Ms, Johns Hopkins University. Available as ROA-154 from the Rutgers Optimality Archive.

Smolensky, Paul & Legendre, Géraldine (eds.) (2006). The harmonic mind: from neural computation to optimality-theoretic grammar. 2 vols. Cambridge, Mass.: MIT Press.

Soderstrom, Melanie, Mathis, Don & Smolensky, Paul (2006). Abstract genomic encoding of Universal Grammar in Optimality Theory. In Smolensky & Legendre (2006: vol. 2). 403–471.

Tesar, Bruce (1995). Computational Optimality Theory. PhD dissertation, University of Colorado, Boulder.

Tesar, Bruce (1998). An iterative strategy for language learning. Lingua 104. 131–145.

Tesar, Bruce (2000). Using inconsistency detection to overcome structural ambiguity in language learning. Technical Report TR-58, Department of Computer Science, University of Colorado, Boulder. Available as ROA-426 from the Rutgers Optimality Archive.

Tesar, Bruce (2004). Using inconsistency detection to overcome structural ambiguity. LI 35. 219–253.

Tesar, Bruce (2006). Faithful contrastive features in learning. Cognitive Science 30. 863–903.

Tesar, Bruce (2008). Output-driven maps. Ms, Rutgers University. Available as ROA-956 from the Rutgers Optimality Archive.

Tesar, Bruce (2011). Learning phonological grammars for output-driven maps. NELS 39. 785–798.

Tesar, Bruce, Alderete, John, Horwood, Graham, Merchant, Nazarré, Nishitani, Koichi & Prince, Alan (2003). Surgery in language learning. WCCFL 22. 477–490.

Tesar, Bruce & Smolensky, Paul (1998). Learnability in Optimality Theory. LI 29. 229–268.

Tesar, Bruce & Smolensky, Paul (2000). Learnability in Optimality Theory. Cambridge, Mass.: MIT Press.

Tessier, Anne-Michelle (2009). Frequency of violation and constraint-based phonological learning. Lingua 119. 6–38.

Wexler, Kenneth & Culicover, Peter W. (1980). Formal principles of language acquisition. Cambridge, Mass.: MIT Press.

Wilson, Colin (2006). Learning phonology with substantive bias: an experimental and computational study of velar palatalization. Cognitive Science 30. 945–982.
