Syntax-ignorant N-gram embeddings for dialectal Arabic sentiment analysis

Hala Mulki; Hatem Haddad; Mourad Gridach; Ismail Babaoğlu

doi:10.1017/S135132492000008X

Published online by Cambridge University Press: 16 March 2020

Hala Mulki*: Department of Computer Engineering, Konya Technical University, Turkey
Hatem Haddad: RIADI Laboratory, National School of Computer Sciences, University of Manouba, Tunisia
Mourad Gridach: Department of Computer Science, University of Oxford, Oxfordshire, United Kingdom
Ismail Babaoğlu: Department of Computer Engineering, Konya Technical University, Turkey
*Corresponding author. E-mail: hallamulki@gmail.com

Abstract

Arabic sentiment analysis models have recently employed compositional paragraph or sentence embeddings as features to represent informal Arabic dialectal content. These embeddings are mostly composed via ordered, syntax-aware composition functions and learned within deep neural network architectures. Because Arabic dialects differ in syntactic structure and word order, a sentiment analysis system developed for one dialect may not perform well on the others. Here we present syntax-ignorant, sentiment-specific n-gram embeddings for sentiment analysis of several Arabic dialects. The novelty of the proposed model lies in both its features and its architecture: sentiment is expressed by embeddings that are composed via the unordered additive composition function and learned within a shallow neural architecture. To evaluate the generated embeddings, they were compared with state-of-the-art word and paragraph embeddings. This involved investigating their efficiency as expressive sentiment features, based on visualisation maps constructed for our n-gram embeddings and for word2vec/doc2vec. In addition, using several Eastern and Western Arabic datasets with single-dialect and multi-dialect content, the ability of our embeddings to recognise sentiment was investigated against models based on word and paragraph embeddings. This comparison was performed within both shallow and deep neural network architectures, employing two unordered composition functions. The results revealed that the introduced syntax-ignorant embeddings can efficiently represent single dialects as well as combinations of different dialects: our shallow sentiment analysis model, trained with the proposed n-gram embeddings, was able to outperform the word2vec/doc2vec models and rival deep neural architectures while requiring remarkably less training time.
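The central idea of the abstract, representing a text as an unordered (additive) combination of n-gram embeddings that is fed to a shallow network, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only and not the authors' implementation: the helper `ngrams`, the class `AdditiveNgramClassifier`, the embedding size, the unigram-plus-bigram range, the random initialisation, and the use of sum and mean as the "two unordered composition functions" (the abstract does not name them). No training loop is shown; in the described model the embeddings would be learned to be sentiment-specific.

```python
import math
import random

def ngrams(tokens, n_max=2):
    """Return every contiguous n-gram of length 1..n_max as a space-joined string."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

class AdditiveNgramClassifier:
    """Hypothetical shallow model: n-gram embeddings -> unordered sum/mean -> linear -> softmax."""

    def __init__(self, vocab, labels=("negative", "positive"), dim=8, seed=0):
        rng = random.Random(seed)
        self.labels = list(labels)
        self.dim = dim
        # Randomly initialised parameters; a real model would learn these from labelled data.
        self.emb = {g: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for g in vocab}
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in self.labels]
        self.b = [0.0] * len(self.labels)

    def compose(self, grams, mean=False):
        """Unordered composition: element-wise sum (or mean) of the known n-gram vectors."""
        known = [g for g in grams if g in self.emb]
        vec = [0.0] * self.dim
        for g in known:
            for k, x in enumerate(self.emb[g]):
                vec[k] += x
        if mean and known:
            vec = [x / len(known) for x in vec]
        return vec

    def predict_proba(self, grams, mean=False):
        """Single linear layer followed by a numerically stable softmax."""
        h = self.compose(grams, mean)
        scores = [sum(w * x for w, x in zip(row, h)) + b
                  for row, b in zip(self.W, self.b)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

# Toy input (placeholder tokens, not taken from the paper's datasets).
tokens = ["the", "service", "is", "very", "good"]
grams = ngrams(tokens)
model = AdditiveNgramClassifier(vocab=sorted(set(grams)))
p_forward = model.predict_proba(grams)
p_shuffled = model.predict_proba(list(reversed(grams)))  # same bag of n-grams, different order
```

Because the text vector is a sum over a bag of n-grams, permuting the n-grams leaves it unchanged (up to floating-point rounding), so `p_forward` and `p_shuffled` agree; word order survives only locally, inside each bigram. That is the sense in which such a representation is syntax-ignorant.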

Keywords

n-gram embeddings; unordered compositionality; Arabic dialects; sentiment analysis

Information

Type: Article
Published in: Natural Language Engineering, Volume 27, Issue 3, May 2021, pp. 315–338

DOI: https://doi.org/10.1017/S135132492000008X
Copyright: © Cambridge University Press 2020
