
A systematic review of unsupervised approaches to grammar induction

Published online by Cambridge University Press:  27 October 2020

Vigneshwaran Muralidaran
Affiliation:
School of English, Communication and Philosophy, Cardiff University, John Percival Building, Colum Drive, Cardiff, UK
Irena Spasić
Affiliation:
School of Computer Science and Informatics, Cardiff University, Queen’s Buildings, The Parade, Cardiff, UK
Dawn Knight*
Affiliation:
School of English, Communication and Philosophy, Cardiff University, John Percival Building, Colum Drive, Cardiff, UK
*Corresponding author. E-mail: knightd5@cardiff.ac.uk

Abstract

This study systematically reviews existing approaches to unsupervised grammar induction in terms of their theoretical underpinnings, practical implementations and evaluation. Our motivation is to identify the influence of functional-cognitive schools of grammar on language processing models in computational linguistics, and thereby to bridge the gap between grammatical theory and computational models of grammar induction. Specifically, the review aims to answer the following research questions: Which types of grammar theories have been the subjects of grammar induction? Which methods have been employed to support grammar induction? Which features have been used by these methods for learning? How were these methods evaluated? Finally, how do these methods compare to one another in terms of performance?

Forty-three studies were identified for systematic review, of which 33 described original implementations of grammar induction, three provided surveys and seven focused on theories and experiments related to the acquisition and processing of grammar in humans. The data extracted from the 33 implementations were stratified into seven aspects of analysis: theory of grammar; output representation; how grammatical productivity is processed; how grammatical productivity is represented; features used for learning; evaluation strategy; and implementation methodology. In most of the implementations considered, grammar was treated as a generative-formal system, autonomous and independent of meaning. Parser decoding was performed in a non-incremental, head-driven fashion, assuming that all words are available to the parsing model, and the output representation of the learnt grammar was hierarchical, typically a dependency or constituency tree. However, the theoretical and experimental studies considered suggest that a usage-based, incremental, sequential system of grammar is more appropriate than the formal, non-incremental, hierarchical view of grammar. This gap between the theoretical and experimental studies on the one hand and the computational implementations on the other should be addressed to enable further progress in computational grammar induction research.

Type
Survey Paper
Copyright
© The Author(s), 2020. Published by Cambridge University Press
