
Lexical acquisition and semantic space models: Learning the semantics of unknown words

Published online by Cambridge University Press: 05 March 2013

KOSTADIN CHOLAKOV*
Affiliation:
University of Groningen, Oude Kijk in 't Jatstraat 26, 9712EK Groningen, The Netherlands e-mail: k.cholakov@rug.nl

Abstract

Recent studies have shown that syntax-based semantic space models outperform models that represent context as a bag of words in several semantic analysis tasks. This advantage is generally attributed to the fact that syntax-based models use corpora that have been syntactically annotated by a parser and a computational grammar. However, if the corpora contain words that are unknown to the parser and the grammar, a syntax-based model may lose its advantage, since the syntactic properties of such words are unavailable. Bag-of-words models, by contrast, do not face this issue: they operate on raw, non-annotated corpora and are thus more robust. In this paper, we compare the performance of syntax-based and bag-of-words models on the task of learning the semantics of unknown words. In our experiments, unknown words are words that are not known to the Alpino parser and grammar of Dutch. We define the semantics of an unknown word by finding its most similar word in Cornetto, a Dutch lexico-semantic hierarchy. We show that, for unknown words, the syntax-based model performs worse than the bag-of-words approach. Furthermore, we show that if the syntactic properties of unknown words are first learned by an appropriate lexical acquisition method, the syntax-based model does outperform the bag-of-words approach. We conclude that, for words unknown to a given grammar, a bag-of-words model is more robust than a syntax-based model, but that the combination of lexical acquisition and syntax-based semantic models is best suited for learning the semantics of unknown words.
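
As an illustration of the bag-of-words idea only, the following minimal sketch builds context vectors from a raw, non-annotated corpus and picks the candidate word most similar to an unknown word. Everything in it is an assumption made for illustration: the toy English sentences, the window size, the use of raw co-occurrence counts, the cosine measure, and the helper names `context_vectors`, `cosine` and `most_similar`. The abstract does not state which features, weighting or similarity measure the paper uses, and the candidate list merely stands in for entries of a lexical resource such as Cornetto.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` tokens of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(left + right)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(target, candidates, vectors):
    """Return the candidate whose context vector is closest to the target's."""
    empty = Counter()
    return max(
        candidates,
        key=lambda w: cosine(vectors.get(target, empty), vectors.get(w, empty)),
    )


# Hypothetical toy corpus: "puppy" plays the role of the unknown word,
# and "dog" and "car" play the role of known lexicon entries.
sentences = [
    "the dog chased the ball".split(),
    "the puppy chased the ball".split(),
    "the car needs new tires".split(),
]
vectors = context_vectors(sentences)
print(most_similar("puppy", ["dog", "car"], vectors))  # → dog
```

No parser or grammar is involved at any step, which is the sense in which such a model stays usable for words a grammar does not know. A syntax-based variant would replace the window contexts with dependency relations produced by a parser, and so depends on the parser having lexical information for the word.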

Type
Articles
Copyright
Copyright © Cambridge University Press 2013 

