
Generating Arabic TAG for syntax-semantics analysis

Published online by Cambridge University Press: 24 March 2022

Cherifa Ben Khelil*
Affiliation:
LIFAT, Université de Tours, Tours 37200, France
Chiraz Ben Othmane Zribi
Affiliation:
RIADI, ENSI, Université La Manouba, La Manouba, Tunisia
Denys Duchier
Affiliation:
LIFO, Université d’Orléans, Orléans, France
Yannick Parmentier
Affiliation:
LORIA, Projet SYNALP, Université de Lorraine, Vandoeuvre-les-Nancy, France
*Corresponding author. E-mail: cherifa.bk@gmail.com

Abstract

Arabic presents many challenges for automatic processing. Although several studies have addressed some of these issues, electronic resources for processing Arabic remain relatively rare or not widely available. In this paper, we propose a tree-adjoining grammar (TAG) with a syntax-semantics interface. It is applied to Modern Standard Arabic, but it can be easily adapted to other languages. This grammar, named “ArabTAG V2.0” (Arabic Tree Adjoining Grammar), is semi-automatically generated from an abstract representation called a meta-grammar. To support its development, ArabTAG V2.0 benefits from a grammar testing environment that uses a corpus of phenomena. Further experiments were performed to check the coverage of this grammar as well as the syntax-semantics analysis. The results showed that ArabTAG V2.0 can cover the majority of syntactic structures and different linguistic phenomena with a precision rate of 88.76%. Moreover, we were able to semantically analyze sentences and build their semantic representations with a precision rate of about 95.63%.

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

