
Some notes on the PARC 700 Dependency Bank

Published online by Cambridge University Press:  11 June 2007

TOMAS BY
Affiliation:
Institut für Informatik, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany. e-mail: tomas.by@in.tum.de

Abstract

The PARC 700 dependency bank is a potentially very useful resource for parser evaluation that has, so to speak, a high barrier to entry. Its tokenisation differs considerably from that of its source, the Penn Treebank, and it has no representation of word order, which produces an uncertainty factor of some 15%. There is also a small, but perhaps not insignificant, number of errors. When the dependency bank is used for evaluation, these factors are likely to inflate mismatch counts, so to obtain more accurate measurements it is desirable to eliminate them. The work reported here consists of an automatic conversion of the dependency bank into a Prolog representation in which word order is explicit, together with graphical representations of the dependency trees for all 700 sentences, generated automatically from the Prolog data. As a side effect of the transformation, errors were detected and corrected. It is hoped that this work will lead to more widespread use of the PARC 700 dependency bank for parser evaluation.
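To make the word-order problem concrete, here is a minimal sketch, not the conversion described in the paper. It assumes a simplified, hypothetical input: dependency triples `(relation, head, dependent)` over bare word forms (not the actual PARC 700 syntax) and a tokenised sentence. The hypothetical function `align_positions` tries to recover each word's position by form alone; words that occur more than once are ambiguous, and words absent from the token list (for example because of tokenisation differences) cannot be placed at all.

```python
from collections import Counter


def align_positions(tokens, triples):
    """Attach 0-based token positions to words in dependency triples.

    Hypothetical simplified format (not the real PARC 700 notation):
      tokens  -- list of word forms in sentence order
      triples -- list of (relation, head, dependent) with bare word forms

    Returns (aligned, ambiguous, unmatched):
      aligned   -- dict mapping each uniquely occurring word to its position
      ambiguous -- sorted words occurring more than once in the sentence,
                   whose position cannot be recovered from the form alone
      unmatched -- sorted words not found among the tokens at all
    """
    counts = Counter(tokens)
    aligned = {}
    ambiguous = set()
    unmatched = set()
    for _relation, head, dependent in triples:
        for word in (head, dependent):
            if counts[word] == 1:
                aligned[word] = tokens.index(word)
            elif counts[word] > 1:
                ambiguous.add(word)
            else:
                unmatched.add(word)
    return aligned, sorted(ambiguous), sorted(unmatched)
```

For "the dog saw the cat", both determiners map to the form `the`, so without explicit positions there is no way to tell which `the` depends on `dog` and which on `cat`; an explicit word-order representation removes exactly this kind of ambiguity.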

Type
Papers
Copyright
2007 Cambridge University Press
