A cross-corpus study of subjectivity identification using unsupervised learning

Published online by Cambridge University Press:  16 August 2011

DONG WANG and YANG LIU
Affiliation:
Department of Computer Science, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas. E-mail: dongwang@hlt.utdallas.edu, yangl@hlt.utdallas.edu

Abstract

In this study, we investigate unsupervised generative learning methods for subjectivity detection across different domains. We create an initial training set using simple lexicon information and then evaluate two iterative methods, each built on a base naive Bayes classifier, for learning from unannotated data. The first is self-training, which adds high-confidence instances to the training set in each iteration. The second is a calibrated EM (expectation-maximization) method, in which we calibrate the posterior probabilities from EM so that the class distribution is similar to that in the real data. We evaluate both approaches on three domains (movie data, news resources, and meeting dialogues) and find that in some cases the unsupervised methods achieve performance close to that of the fully supervised setup. We perform a thorough analysis of contributing factors, such as the self-labeling accuracy of the initial training set, the accuracy of the examples added during self-training, and the size of the initial training set for the different methods. Our experiments and analysis reveal inherent differences across domains and identify the factors that explain the models' behavior.
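The pipeline summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the toy lexicon `SUBJ_WORDS`, the seeding rule in `seed_labels`, the confidence threshold, and the rank-based `calibrate` helper are all assumptions made here for illustration, since the abstract names the techniques (lexicon seeding, naive Bayes, self-training, calibrated posteriors) but not their exact settings.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the abstract does not name the lexicon used.
SUBJ_WORDS = {"love", "hate", "great", "terrible", "awful", "wonderful"}

def seed_labels(docs, lexicon=SUBJ_WORDS, min_hits=2):
    """Assumed seeding rule: 1 (subjective) for >= min_hits lexicon words,
    0 (objective) for none, None (unlabeled) otherwise."""
    seeds = []
    for doc in docs:
        hits = sum(w in lexicon for w in doc)
        seeds.append(1 if hits >= min_hits else 0 if hits == 0 else None)
    return seeds

def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing."""
    vocab = {w for doc in docs for w in doc}
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    return vocab, counts, Counter(labels), alpha

def posterior_subj(model, doc):
    """P(subjective | doc); words outside the training vocabulary are skipped."""
    vocab, counts, n_docs, alpha = model
    total = sum(n_docs.values())
    logp = {}
    for y in (0, 1):
        denom = sum(counts[y].values()) + alpha * len(vocab)
        lp = math.log((n_docs[y] + alpha) / (total + 2 * alpha))
        for w in doc:
            if w in vocab:
                lp += math.log((counts[y][w] + alpha) / denom)
        logp[y] = lp
    m = max(logp.values())  # subtract the max for numerical stability
    e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
    return e1 / (e0 + e1)

def self_train(docs, seeds, threshold=0.9, max_iter=10):
    """Each round, retrain on the labeled pool and add unlabeled instances
    whose posterior is confident in either direction."""
    labels = list(seeds)
    for _ in range(max_iter):
        pool = [(d, y) for d, y in zip(docs, labels) if y is not None]
        model = train_nb([d for d, _ in pool], [y for _, y in pool])
        added = False
        for i, doc in enumerate(docs):
            if labels[i] is None:
                p = posterior_subj(model, doc)
                if p >= threshold or p <= 1 - threshold:
                    labels[i] = int(p >= 0.5)
                    added = True
        if not added:
            break
    return labels

def calibrate(posteriors, subj_prior):
    """One crude reading of calibration (an assumption, not the paper's
    procedure): mark the top subj_prior fraction by posterior as subjective."""
    n_subj = round(subj_prior * len(posteriors))
    order = sorted(range(len(posteriors)),
                   key=lambda i: posteriors[i], reverse=True)
    labels = [0] * len(posteriors)
    for i in order[:n_subj]:
        labels[i] = 1
    return labels

docs = [s.split() for s in [
    "i love this great movie",
    "what a terrible awful plot",
    "the film runs two hours",
    "the meeting starts at noon",
    "i love this plot",
]]
seeds = seed_labels(docs)
final_labels = self_train(docs, seeds)
print(seeds, final_labels)
```

In this toy run the last sentence contains only one lexicon word, so it starts unlabeled and is labeled only if the classifier trained on the seeds is confident enough. Lowering `threshold` admits more examples per round at the cost of noisier self-labels, which corresponds to the accuracy-of-added-examples factor that the abstract says the analysis examines.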

Type
Articles
Copyright
Copyright © Cambridge University Press 2011

