Definitional and human constraints on structural annotation of English*

Published online by Cambridge University Press: 01 October 2008

GEOFFREY SAMPSON
Affiliation:
Department of Informatics, University of Sussex, Falmer, Brighton, BN1 9QJ, England e-mail: grs2@sussex.ac.uk
ANNA BABARCZY
Affiliation:
Department of Cognitive Science, Budapest University of Technology & Economics, 1111 Budapest, Stoczek utca 2, Hungary e-mail: babarczy@cogsci.bme.hu

Abstract

The limits on predictability and refinement of English structural annotation are examined by comparing independent annotations, by experienced analysts using the same detailed published guidelines, of a common sample of written texts. Three conclusions emerge. First, while it is not easy to define watertight boundaries between the categories of a comprehensive structural annotation scheme, limits on inter-annotator agreement are in practice set more by the difficulty of conforming to a well-defined scheme than by the difficulty of making a scheme well defined. Secondly, although usage is often structurally ambiguous, commonly the alternative analyses are logical distinctions without a practical difference – which raises questions about the role of grammar in human linguistic behaviour. Finally, one specific area of annotation is strikingly more problematic than any other area examined, though this area (classifying the functions of clause-constituents) seems a particularly significant one for human language use. These findings should be of interest both to computational linguists and to students of language as an aspect of human cognition.

Type
Papers
Copyright
Copyright © Cambridge University Press 2008
