Morphological disambiguation of Hebrew: a case study in classifier combination

Published online by Cambridge University Press: 26 July 2012

GENNADI LEMBERSKY, DANNY SHACHAM and SHULY WINTNER
Affiliation:
Department of Computer Science, University of Haifa, Haifa, Israel
e-mails: glembers@campus.haifa.ac.il, danny@shach.am, shuly@cs.haifa.ac.il

Abstract

Morphological analysis and disambiguation are crucial stages in a variety of natural language processing applications, especially when languages with complex morphology are concerned. We present a system which disambiguates the output of a morphological analyzer for Hebrew. It consists of several simple classifiers and a module that combines them under the constraints imposed by the analyzer. We explore several approaches to classifier combination, as well as a back-off mechanism that relies on a large unannotated corpus. Our best result, around 83 percent accuracy, compares favorably with the state of the art on this task.
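
To make the idea of combining classifiers under the analyzer's constraints concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `pick_analysis`, the feature names, the confidence values and the summed-score rule are all assumptions chosen for illustration. The only point it shows is that the combined choice is restricted to analyses the morphological analyzer actually proposed.

```python
from typing import Dict, List

# Hypothetical types: an analysis maps feature names to values, and each
# simple classifier contributes a confidence score per value of one feature.
Analysis = Dict[str, str]
Scores = Dict[str, Dict[str, float]]


def pick_analysis(candidates: List[Analysis], scores: Scores) -> Analysis:
    """Return the candidate analysis with the highest summed confidence.

    Only analyses proposed by the analyzer are considered, so the result
    is always a licensed combination of feature values.
    """
    if not candidates:
        raise ValueError("the analyzer returned no candidate analyses")

    def total(analysis: Analysis) -> float:
        # Sum the confidence each classifier assigns to this analysis's
        # value for its feature; unseen features or values score 0.0.
        return sum(
            scores.get(feature, {}).get(value, 0.0)
            for feature, value in analysis.items()
        )

    return max(candidates, key=total)


# Invented example: three analyses for one token, two simple classifiers.
candidates = [
    {"pos": "noun", "number": "singular"},
    {"pos": "verb", "number": "singular"},
    {"pos": "noun", "number": "plural"},
]
scores = {
    "pos": {"noun": 0.6, "verb": 0.4},
    "number": {"singular": 0.7, "plural": 0.3},
}
best = pick_analysis(candidates, scores)
```

Summing confidences is only one possible combination rule; the abstract states that several approaches were explored, and this sketch does not claim to reproduce any of them.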

Type
Articles
Copyright
Copyright © Cambridge University Press 2012 
