CO-graph: A new graph-based technique for cross-lingual word sense disambiguation

ANDRES DUQUE; LOURDES ARAUJO; JUAN MARTINEZ-ROMO

doi:10.1017/S1351324915000091

Published online by Cambridge University Press: 16 April 2015

Affiliation (all authors): Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid 28040, Spain
e-mail: aduque@lsi.uned.es, lurdes@lsi.uned.es, juaner@lsi.uned.es

Abstract

In this paper, we present a new method based on co-occurrence graphs for performing Cross-Lingual Word Sense Disambiguation (CLWSD). The proposed approach comprises the automatic generation of bilingual dictionaries and a new technique for constructing a co-occurrence graph, which is used to select the most suitable translations from the dictionary. Different algorithms that combine the dictionary and the co-occurrence graph are then used to select the final translations: techniques based on sub-graphs (communities) containing clusters of words with related meanings, techniques based on distances between nodes representing words, and techniques based on the relative importance of each node in the whole graph. The initial output of the system is enhanced with translation probabilities provided by a statistical bilingual dictionary. The system is evaluated using datasets from two competitions: Task 3 of SemEval 2010 and Task 10 of SemEval 2013. Results obtained by the different disambiguation techniques are analysed and compared with those obtained by the systems participating in the competitions. Our system achieves the best results among the unsupervised systems in most of the experiments, and even outperforms supervised systems in some cases.
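
As a rough illustration of the distance-based selection mentioned in the abstract, the following toy sketch builds a small co-occurrence graph and ranks candidate translations by how close they lie to the context words. It is not the authors' implementation: the abstract does not specify how the graph is built, so the plain co-occurrence counting, the `min_count` threshold, the hop-count distance, the function names, and the Spanish example documents are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(documents, min_count=1):
    """Link two words when they appear together in at least `min_count` documents.

    Assumption: raw document-level counts stand in for whatever
    co-occurrence criterion the paper actually uses.
    """
    counts = defaultdict(int)
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hop_distances(graph, source):
    """Breadth-first search: number of edges from `source` to each reachable word."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def rank_translations(graph, candidates, context):
    """Order candidates by mean hop distance to the context words, closest first."""
    if not context:
        return sorted(candidates)
    unreachable = len(graph) + 1  # penalty when no path exists

    def score(candidate):
        dist = hop_distances(graph, candidate)
        return sum(dist.get(word, unreachable) for word in context) / len(context)

    return sorted(candidates, key=lambda c: (score(c), c))


# Hypothetical target-language (Spanish) documents, already tokenised.
documents = [
    ["banco", "dinero", "préstamo"],
    ["banco", "cuenta", "dinero"],
    ["orilla", "río", "agua"],
    ["orilla", "río", "barca"],
    ["agua", "cuenta"],
]
graph = build_cooccurrence_graph(documents)

# Candidate translations of English "bank", as a bilingual dictionary might list them.
candidates = ["banco", "orilla"]
print(rank_translations(graph, candidates, ["dinero", "préstamo"]))  # ['banco', 'orilla']
print(rank_translations(graph, candidates, ["río", "agua"]))         # ['orilla', 'banco']
```

The community-based and node-importance techniques named in the abstract would replace only the scoring step; the graph construction and the candidate list from the dictionary would stay the same in a sketch like this one.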

Information

Type: Articles
Information: Natural Language Engineering, Volume 21, Special Issue 5: Graphs in NLP, November 2015, pp. 743–772

DOI: https://doi.org/10.1017/S1351324915000091
Copyright: © Cambridge University Press 2015
