SePaS: Word sense disambiguation by sequential patterns in sentences

MASOUD NAROUEI; MANSOUR AHMADI; ASHKAN SAMI

doi:10.1017/S1351324913000259

Published online by Cambridge University Press: 06 September 2013

MASOUD NAROUEI: Young Researchers and Elite Club, Zahedan Branch, Islamic Azad University, Zahedan, Iran (e-mail: naroee@cse.shirazu.ac.ir)
MANSOUR AHMADI: Young Researchers and Elite Club, Shiraz Branch, Islamic Azad University, Shiraz, Iran (e-mail: info@mahmadi.com)
ASHKAN SAMI: Department of Computer Science, Shiraz University, Shiraz, Iran (e-mail: asami@ieee.org)

Abstract

Word sense disambiguation (WSD) is an open problem in natural language processing. A word may have several meanings; WSD is the task of selecting the correct sense of a polysemous word based on its context. Proposed solutions are based on supervised and unsupervised learning methods. Most researchers in the area have focused on choosing the proper size of ‘n’ for the n-grams used in WSD. In this research, we take the concept further by using a variable ‘n’ and a variable-size window. The approach is based on iterative patterns extracted from the text. We show that this type of sequential pattern is more effective than many other solutions for WSD. Applying standard data mining algorithms to the extracted features, we significantly outperformed most monolingual WSD solutions. The state-of-the-art results had been obtained using external knowledge, such as various translations of the same sentence. Although our method uses only monolingual features, it improved on the accuracy of that multilingual system by more than 4 percent.
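The abstract contrasts fixed-size n-grams with variable-length patterns drawn from a variable-size window. The sketch below illustrates that contrast under stated assumptions: it treats each whole sentence as the context window (so the window size varies with the sentence), mines ordered, possibly gapped word subsequences of length 1 to `max_len` that occur in at least `min_support` sentences, and encodes each sentence as a binary vector of pattern presence. It is a simplified stand-in, not the authors' SePaS method or the iterative-pattern mining the abstract refers to; the function names and parameters (`ngrams`, `subsequences`, `mine_patterns`, `to_features`, `min_support`, `max_len`) and the example sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Contiguous n-grams of one fixed size n (the classic WSD feature)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def subsequences(tokens, max_len):
    """Ordered, possibly gapped word subsequences of length 1..max_len.

    Unlike n-grams, the length varies and the words need not be adjacent.
    Brute-force enumeration: only practical for short sentences.
    """
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(tokens)), k):
            found.add(tuple(tokens[i] for i in idx))
    return found


def mine_patterns(contexts, min_support, max_len=3):
    """Keep subsequences that occur in at least min_support contexts.

    Assumption (hypothetical): each context is one whole sentence containing
    the ambiguous word, so the window size varies with sentence length.
    """
    counts = Counter()
    for tokens in contexts:
        # subsequences() returns a set, so each pattern counts once per sentence
        counts.update(subsequences(tokens, max_len))
    return {p for p, c in counts.items() if c >= min_support}


def to_features(tokens, patterns, max_len=3):
    """Binary vector: 1 if a mined pattern occurs in this sentence, else 0."""
    present = subsequences(tokens, max_len)
    return [int(p in present) for p in sorted(patterns)]


if __name__ == "__main__":
    # Hypothetical toy contexts for the ambiguous word "bank"
    contexts = [
        "he sat on the bank of the river".split(),
        "the river bank was muddy".split(),
        "she deposited cash at the bank".split(),
    ]
    patterns = mine_patterns(contexts, min_support=2, max_len=2)
    print(sorted(patterns))
    print(to_features("the muddy river".split(), patterns, max_len=2))  # → [0, 1, 1, 0, 1]
    # The gapped pattern ('the', 'bank') is invisible to fixed-size bigrams:
    print(("the", "bank") in ngrams("the river bank".split(), 2))  # → False
```

`min_support` discards rare patterns, and `max_len` caps pattern length, which otherwise makes the number of candidate subsequences grow combinatorially. The resulting vectors could then be fed to any standard classifier trained per ambiguous word, in the spirit of the abstract's "standard data mining algorithms"; which algorithms the paper actually used is not stated here.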

Information

Type: Articles
Published in: Natural Language Engineering, Volume 21, Issue 2, March 2015, pp. 251–269

DOI: https://doi.org/10.1017/S1351324913000259
Copyright © Cambridge University Press 2013
