Improving shift-reduce constituency parsing with large-scale unlabeled data

Published online by Cambridge University Press:  19 June 2013

MUHUA ZHU, JINGBO ZHU and HUIZHEN WANG*
Affiliation:
Natural Language Processing Lab, Northeastern University, Shenyang 110819, China
e-mails: zhumuhua@gmail.com, zhujingbo@mail.neu.edu.cn, wanghuizhen@mail.neu.edu.cn
* Corresponding author.

Abstract

Shift-reduce parsing has been studied extensively for diverse grammars because of its simplicity and running efficiency. In constituency parsing, however, shift-reduce parsers lag behind state-of-the-art parsers. In this paper we propose a semi-supervised approach for advancing shift-reduce constituency parsing. First, we apply the uptraining approach (Petrov, S. et al. 2010. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 705–713) to improve part-of-speech taggers so that they provide better part-of-speech tags to subsequent shift-reduce parsers. Second, we enhance shift-reduce parsing models with novel features defined on lexical dependency information. Both stages depend on large-scale unlabeled data. Experimental results show that the approach achieves overall improvements of 1.5 percent and 2.1 percent on English and Chinese data respectively. Moreover, the final parsing accuracies reach 90.9 percent and 82.2 percent respectively, which are comparable with the accuracy of state-of-the-art parsers.

Type
Articles
Copyright
Copyright © Cambridge University Press 2013 

References

Bikel, D. 2004. On the Parameter Space of Generative Lexicalized Statistical Parsing Models. PhD thesis, University of Pennsylvania.
Carreras, X., Collins, M. and Koo, T. 2008. TAG, dynamic programming and the perceptron for efficient, feature-rich parsing. In Proceedings of the 12th Conference on Computational Natural Language Learning (CoNLL), Manchester, UK, pp. 9–16.
Charniak, E. 2000. A maximum-entropy-inspired parser. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), Washington, USA, pp. 132–9.
Charniak, E. and Johnson, M. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), University of Michigan, Ann Arbor, MI, USA, pp. 173–80.
Chen, W., Kazama, J., Uchimoto, K. and Torisawa, K. 2012. Exploiting subtrees in auto-parsed data to improve dependency parsing. Computational Intelligence Journal 28 (3): 426–51 (John Wiley).
Chen, W., Kazama, J., Zhang, M., Tsuruoka, Y., Zhang, Y., Wang, Y., Torisawa, K. and Li, H. 2012. Bitext dependency parsing with auto-generated bilingual treebanks. IEEE Transactions on Audio, Speech and Language Processing 20 (5): 1461–72.
Clark, S., Curran, J. and Osborne, M. 2003. Bootstrapping POS taggers using unlabeled data. In Proceedings of the 7th Conference on Computational Natural Language Learning (CoNLL), Edmonton, Canada.
Collins, M. 1996. A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL), California, USA.
Collins, M. 1997. Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL), Madrid, Spain.
Collins, M. 1999. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, University of Pennsylvania.
Collins, M. 2002. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithm. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, PA, USA, pp. 1–8.
Collins, M. and Roark, B. 2004. Incremental parsing with the perceptron algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain.
Eisner, J. and Satta, G. 1999. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), Maryland, USA.
Graff, D. 1995. North American News Text Corpus. Linguistic Data Consortium, Philadelphia, PA. LDC Catalog No. LDC95T21.
Hatori, J., Matsuzaki, T. and Tsujii, J. 2011. Incremental joint POS tagging and dependency parsing in Chinese. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand, pp. 813.
Huang, L. 2008. Forest reranking: discriminative parsing with non-local features. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), Ohio, USA, pp. 586–94.
Huang, L. Y. 2009. Improve Chinese parsing with Max-Ent reranking parser. Master Project Report, Brown University, Providence, RI.
Huang, Z., Eidelman, V. and Harper, M. 2009a. Improving a simple bigram HMM part-of-speech tagger by latent annotation and self-training. In Proceedings of Human Language Technology: Conference of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL), 2009, Colorado, USA, pp. 213–6.
Huang, Z. and Harper, M. 2009. Self-training PCFG grammars with latent annotations across languages. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, pp. 832–41.
Huang, Z., Harper, M. and Petrov, S. 2010. Self-training with products of latent variable grammars. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 12–22.
Huang, L., Jiang, W. and Liu, Q. 2009b. Bilingually constrained (monolingual) shift-reduce parsing. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, pp. 1222–31.
Huang, L. and Sagae, K. 2010. Dynamic programming for linear-time incremental parsing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, pp. 1077–86.
Katz-Brown, J., Petrov, S., McDonald, R., Och, F., Talbot, D., Ichikawa, H., Seno, M. and Kazawa, H. 2011. Training a parser for machine translation reordering. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), Edinburgh, UK, pp. 183–92.
Koo, T. and Collins, M. 2010. Efficient third-order dependency parsers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, pp. 1–11.
McClosky, D., Charniak, E. and Johnson, M. 2006. Effective self-training for parsing. In Proceedings of Human Language Technology Conference – North American Chapter of the Association of Computational Linguistics (HLT-NAACL), New York, USA, pp. 152–9.
Manning, C. D. 2011. Part-of-speech tagging from 97% to 100%: is it time for some linguistics? In Proceedings of Computational Linguistics and Intelligent Text Processing – 12th International Conference (CICLing), Tokyo, Japan, pp. 171–89.
Marcus, M. P., Santorini, B. and Marcinkiewicz, A. 1993. Building a large annotated corpus of English. Computational Linguistics 19 (2): 313–30 (MIT Press).
Merialdo, B. 1994. Tagging English text with a probabilistic model. Computational Linguistics 20 (2): 155–71 (MIT Press).
Nivre, J. 2004. Incrementality in deterministic dependency parsing. In Proceedings of the ACL Workshop on Incremental Parsing: Bringing Engineering and Cognition Together. Workshop at ACL, Barcelona, Spain.
Noord, G. 2007. Using self-trained bilexical preferences to improve disambiguation accuracy. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT), Prague, Czech Republic, pp. 1–10.
Petrov, S. 2010. Products of random latent variable grammars. In Proceedings of Human Language Technologies Conference – North American Chapter of the Association of Computational Linguistics (HLT-NAACL), California, USA, pp. 19–27.
Petrov, S., Chang, P., Ringgaard, M. and Alshawi, H. 2010. Uptraining for accurate deterministic question parsing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 705–13.
Petrov, S. and Klein, D. 2007. Improved inference for unlexicalized parsing. In Proceedings of Human Language Technology Conference – North American Chapter of the Association of Computational Linguistics (HLT-NAACL), New York, USA, pp. 404–11.
Ratnaparkhi, A. 1996. A maximum entropy part of speech tagger. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing (EMNLP), University of Pennsylvania.
Ratnaparkhi, A. 1997. A linear observed time statistical parser based on maximum entropy models. In Proceedings of the 1997 Conference on Empirical Methods in Natural Language Processing (EMNLP), Rhode Island, USA.
Sagae, K. and Lavie, A. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT), Vancouver, BC, Canada, pp. 125–32.
Sagae, K. and Lavie, A. 2006. A best-first probabilistic shift-reduce parser. In Proceedings of 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), Sydney, Australia, pp. 691–8.
Søgaard, A. 2010. Simple semi-supervised training of part-of-speech taggers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, pp. 205–8.
Suzuki, J. and Isozaki, H. 2008. Semi-supervised labeling and segmentation using giga-word scale unlabeled data. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), Ohio, USA, pp. 665–73.
Toutanova, K., Klein, D., Manning, C. and Singer, Y. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of Human Language Technology Conference – North American Chapter of the Association for Computational Linguistics (HLT-NAACL), Edmonton, Canada, pp. 252–9.
Tsuruoka, Y., Miyao, Y. and Kazama, J. 2011. Learning with lookahead: can history-based models rival globally optimized models? In Proceedings of the 15th Conference on Computational Natural Language Learning (CoNLL), Oregon, USA, pp. 238–46.
Tsuruoka, Y., Tsujii, J. and Ananiadou, S. 2009. Fast full parsing by linear-chain conditional random fields. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Athens, Greece, pp. 790–8.
Wang, W., Huang, Z. and Harper, M. 2007. Semi-supervised learning for part-of-speech tagging of Mandarin transcribed speech. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hawaii, USA.
Wang, M., Sagae, K. and Mitamura, T. 2006. A fast, accurate deterministic parser for Chinese. In Proceedings of 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), Sydney, Australia, pp. 425–32.
Xue, N., Xia, F., Chiou, F. and Palmer, M. 2006. The Penn Chinese Treebank: phrase structure annotation of a large corpus. Natural Language Engineering 11 (2): 207–38 (Cambridge University Press).
Yamada, H. and Matsumoto, Y. 2003. Statistical dependency analysis with support vector machines. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), Nancy, France, pp. 195–206.
Zhang, Y. and Clark, S. 2011. Shift-reduce CCG parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), Oregon, USA, pp. 683–92.
Zhang, Y. and Clark, S. 2009. Transition-based parsing of the Chinese treebank using a global discriminative model. In Proceedings of 11th International Conference on Parsing Technologies (IWPT), Paris, France, pp. 162–71.
Zhang, H., Zhang, M., Tan, C. and Li, H. 2009. K-best combination of syntactic parsers. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, pp. 1552–60.
Zhao, H., Song, Y., Kit, C. and Zhou, G. 2009. Cross language dependency parsing using a bilingual lexicon. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, Singapore, pp. 55–63.