
Adding semantic roles to the Chinese Treebank

Published online by Cambridge University Press: 01 January 2009

NIANWEN XUE
Affiliation:
Department of Linguistics and Center for Spoken Language Research, University of Colorado at Boulder, CO, U.S.A. e-mail: Nianwen.Xue@Colorado.EDU
MARTHA PALMER
Affiliation:
Department of Linguistics and Center for Spoken Language Research, University of Colorado at Boulder, CO, U.S.A. e-mail: Nianwen.Xue@Colorado.EDU

Abstract

We report work on adding semantic role labels to the Chinese Treebank, a corpus already annotated with phrase structures. The work involves locating all verbs and their nominalizations in the corpus, and semi-automatically adding semantic role labels to their arguments, which are constituents in a parse tree. Although the same procedure is followed, different issues arise in the annotation of verbs and nominalized predicates. For verbs, identifying their arguments is generally straightforward given their syntactic structure in the Chinese Treebank, as they tend to occupy well-defined syntactic positions. Our discussion focuses on the syntactic variations in the realization of the arguments as well as our approach to annotating dislocated and discontinuous arguments. In comparison, identifying the arguments for nominalized predicates is more challenging, and we discuss criteria and procedures for distinguishing arguments from non-arguments. In particular, we focus on the role of support verbs as well as the relevance of event/result distinctions in the annotation of the predicate-argument structure of nominalized predicates. We also present our approach to taking advantage of the syntactic structure in the Chinese Treebank to bootstrap the predicate-argument structure annotation of verbs. Finally, we discuss the creation of a lexical database of frame files and its role in guiding predicate-argument annotation. Procedures for ensuring annotation consistency and inter-annotator agreement evaluation results are also presented.

Type: Papers
Copyright © Cambridge University Press 2008

