
18 - Arabic Computational Linguistics

from Part IV - Arabic Computational and Corpus Linguistics

Published online by Cambridge University Press: 23 September 2021

Karin Ryding, Georgetown University, Washington DC
David Wilmsen, American University of Beirut

Summary

This chapter presents an accessible overview of computational linguistics and of where Arabic fits into it. Computational linguistics, often referred to interchangeably as natural language processing (NLP) or human language technologies, is a large and growing interdisciplinary field of research at the intersection of linguistics, computer science, electrical engineering, cognitive science, psychology, pedagogy, and mathematics, among other fields. Research on Arabic computational linguistics has lagged behind research on English and other languages, despite a tremendous relative growth in Arabic NLP between 2012 and 2016. The slow start stems from a series of difficulties that Arabic presents to programmers: morphological richness, orthographic ambiguity, dialectal variation, orthographic noise, and resource poverty. Those problems have been or are being overcome, and a new generation of researchers has made great strides in the field. This progress is due partly to the growing interest in language technologies for opinion mining and translation in social media, which features dialectal Arabic more than Modern Standard Arabic (MSA). Another motivation is that commercial giants such as Apple and Google are interested in applications of Arabic as it is spoken.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021


