Exploring patterns in dictionary definitions for synonym extraction

Published online by Cambridge University Press:  11 July 2011

TONG WANG
Affiliation:
Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada e-mail: tong@cs.toronto.edu
GRAEME HIRST
Affiliation:
Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada e-mail: gh@cs.toronto.edu

Abstract

Automatic determination of synonyms and/or semantically related words has various applications in Natural Language Processing. The two mainstream paradigms to date, lexicon-based and distributional approaches, each have pros and cons with regard to coverage, complexity, and quality. In this paper, we propose three novel methods (two rule-based methods and one machine learning approach) to identify synonyms from definition texts in a machine-readable dictionary. Extracted synonyms are evaluated in two extrinsic experiments and one intrinsic experiment. Evaluation results show that our pattern-based approach achieves the best performance in one of the experiments and satisfactory results in the other, comparable to corpus-based state-of-the-art results.

Type
Articles
Copyright
Copyright © Cambridge University Press 2011
