
Classifying Korean comparative sentences for comparison analysis

Published online by Cambridge University Press: 09 September 2013

SEON YANG
YOUNGJOONG KO*
Affiliation: Department of Computer Engineering, Dong-A University, 840 Hadan 2-dong, Saha-gu, Busan, 604-714, South Korea
e-mail: seony.yang@gmail.com, youngjoong.ko@gmail.com
* Corresponding author.

Abstract

Comparisons sort objects based on their superiority or inferiority and they may have major effects on a variety of evaluation processes. The Web facilitates qualitative and quantitative comparisons via online debates, discussion forums, product comparison sites, etc., and comparison analysis is becoming increasingly useful in many application areas. This study develops a method for classifying sentences in Korean text documents into several different comparative types to facilitate their analysis. We divide our study into two tasks: (1) extracting comparative sentences from text documents and (2) classifying comparative sentences into seven types. In the first task, we investigate many actual comparative sentences by referring to previous studies and construct a lexicon of comparisons. Sentences that contain elements from the lexicon are regarded as comparative sentence candidates. Next, we use machine learning techniques to eliminate non-comparative sentences from the candidates. In the second task, we roughly classify the comparative sentences using keywords and use a transformation-based learning method to correct initial classification errors. Experimental results show that our method could be suitable for practical use. We obtained an F1-score of 90.23% in the first task, an accuracy of 81.67% in the second task, and an overall accuracy of 88.59% for the integrated system with both tasks.
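
The two tasks described in the abstract can be pictured with a minimal sketch. Everything specific in it is an assumption made for illustration: the two lexicon entries, the type labels, and the single correction rule are hypothetical and are not taken from the paper, whose lexicon and seven types are not listed here. The machine learning step that filters non-comparative candidates is omitted, and the rule is written by hand rather than learned from annotated data as transformation-based learning would do.

```python
# Illustrative sketch only: lexicon entries, type labels, and the correction
# rule are hypothetical placeholders, not the paper's actual resources.

# Hypothetical comparison lexicon: keyword -> rough initial type label.
LEXICON = {
    "보다": "GREATER_OR_LESSER",  # particle roughly meaning "than"
    "가장": "SUPERLATIVE",        # adverb roughly meaning "most"
}


def extract_candidates(sentences):
    """Task 1 (first step): keep sentences containing any lexicon element.

    The paper then removes non-comparative candidates with a machine
    learning classifier; that step is not modeled here.
    """
    return [s for s in sentences if any(k in s for k in LEXICON)]


def initial_type(sentence):
    """Task 2 (first step): rough label from the first matching keyword."""
    for keyword, label in LEXICON.items():
        if keyword in sentence:
            return label
    return None


# One hand-written rule in the style of transformation-based learning:
# if the current label is `source` and `trigger` occurs in the sentence,
# rewrite the label to `target`.
RULES = [("GREATER_OR_LESSER", "가장", "SUPERLATIVE")]


def apply_rules(sentence, label):
    """Task 2 (second step): correct the initial label with ordered rules."""
    for source, trigger, target in RULES:
        if label == source and trigger in sentence:
            label = target
    return label


def classify(sentences):
    """Run candidate extraction, rough typing, and rule-based correction."""
    return [
        (s, apply_rules(s, initial_type(s)))
        for s in extract_candidates(sentences)
    ]


if __name__ == "__main__":
    examples = [
        "A는 B보다 빠르다",        # "A is faster than B"
        "오늘은 날씨가 좋다",      # "The weather is nice today"
        "무엇보다 A가 가장 좋다",  # "Above all, A is the best"
    ]
    for sentence, label in classify(examples):
        print(label, sentence)
```

The third example sentence contains both keywords, so the keyword pass assigns the first matching label and the rule then rewrites it; this is the kind of initial classification error the correction step is meant to handle.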

Type: Articles
Copyright © Cambridge University Press 2013

