A portable method for acquiring information extraction patterns without annotated corpora

Published online by Cambridge University Press: 04 August 2003

NEUS CATALÀ
Affiliation:
TALP Research Centre, Technical University of Catalonia, Jordi Girona 1-3, Campus Nord, C6. 08034 Barcelona, Spain e-mail: ncatala@talp.upc.es
NÚRIA CASTELL
Affiliation:
TALP Research Centre, Technical University of Catalonia, Jordi Girona 1-3, Campus Nord, C6. 08034 Barcelona, Spain e-mail: castell@talp.upc.es
MARIO MARTÍN
Affiliation:
Department of Software, Technical University of Catalonia, Jordi Girona 1-3, Campus Nord, C6. 08034 Barcelona, Spain e-mail: mmartin@lsi.upc.es

Abstract

The main issue when building Information Extraction (IE) systems is how to obtain the knowledge needed to identify relevant information in a document. Most approaches require expert human intervention at many steps of the acquisition process. In this paper we describe ESSENCE, a new method for acquiring IE patterns that significantly reduces the need for human intervention. The method is based on ELA, a learning algorithm specifically designed for acquiring IE patterns without tagged examples. The distinctive features of ESSENCE and ELA are that (1) they permit the automatic acquisition of IE patterns from unrestricted and untagged text representative of the domain, thanks to (2) their ability to identify regularities around concept-words that are semantically relevant to the IE task, (3) which they do using non-domain-specific lexical knowledge tools such as WordNet; and (4) they restrict human intervention to defining the task and to validating and typifying the set of IE patterns obtained. Because ESSENCE does not require a corpus annotated with the type of information to be extracted, and because it uses a general-purpose ontology and widely applied syntactic tools, it reduces the expert effort required to build an IE system and therefore also the effort of porting the method to a new domain. We present the results of applying ESSENCE to the acquisition of IE patterns in an MUC-like task.

Type
Research Article
Copyright
2003 Cambridge University Press
