On morphological relatedness

AHMED KHORSI

doi:10.1017/S1351324912000071

Published online by Cambridge University Press: 10 February 2012

Affiliation: College of Computer and Information Science, Al-Imam Mohammad Ibn Saud Islamic University, Riyadh, Kingdom of Saudi Arabia
Email: amakhorsi@imamu.edu.sa, ahmed_khorsi@yahoo.fr

Abstract

In this paper, we discuss the results of a new unsupervised and computationally lightweight score of how closely two words are morphologically related. The measure is meant as an alternative to stemming, radical (root) extraction, and morphological analysis in a wide range of applications, especially those related to information extraction. Light stemming seems to be the most convenient approach for systems with efficiency concerns, but it unconditionally discards a prefix or a suffix. Our measure instead takes all letters of the word into account, each with a different weight, so that no significant letter is lost. Heavy stemming, morphological analysis, and radical extraction rely on dictionaries and compatibility databases, whereas our measure does not rely on any language-specific morphological knowledge. This makes our approach unsupervised, theoretically language independent, and computationally much lighter. Our tests targeted Arabic, a Semitic language recognized to have a complex morphology due to its highly inflectional lexicon.
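
The abstract does not give the scoring formula, so the following is only a toy sketch of the general idea of weighting every letter rather than stripping affixes outright. Everything in it is an assumption made for illustration and should not be read as the paper's measure: the names edge_weight and relatedness, the rule that letters near a word's edges weigh less, and the use of a weighted longest common subsequence.

```python
def edge_weight(i: int, n: int) -> float:
    """Hypothetical weight of the letter at index i in a word of length n.

    Letters near either edge (where affixes tend to sit) weigh less;
    the weight reaches 1.0 at the middle of the word.
    """
    d = min(i, n - 1 - i)            # distance to the nearest edge
    return (d + 1) / ((n + 1) // 2)


def relatedness(a: str, b: str) -> float:
    """Toy relatedness score in [0, 1]: weighted longest common subsequence.

    A matched letter pair contributes the smaller of its two position
    weights; the total is normalised by the heavier word's total weight.
    No letter is discarded outright, unlike affix stripping.
    """
    if not a or not b:
        return 0.0
    wa = [edge_weight(i, len(a)) for i in range(len(a))]
    wb = [edge_weight(j, len(b)) for j in range(len(b))]
    # dp[i][j] = best matched weight between a[:i] and b[:j]
    dp = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            if a[i - 1] == b[j - 1]:
                best = max(best, dp[i - 1][j - 1] + min(wa[i - 1], wb[j - 1]))
            dp[i][j] = best
    return dp[-1][-1] / max(sum(wa), sum(wb))
```

Called on transliterated forms such as relatedness("kataba", "maktub"), the function returns a value strictly between 0 and 1, while identical words score 1 and words with no letter in common score 0. Note that this fixed positional weighting needs no dictionary, but it is far cruder than a weighting learned from a corpus.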

Information

Type: Articles
Information: Natural Language Engineering, Volume 19, Issue 4, October 2013, pp. 537–555

DOI: https://doi.org/10.1017/S1351324912000071
Copyright: © Cambridge University Press 2012
