A portable method for acquiring information extraction patterns without annotated corpora

NEUS CATALÀ; NÚRIA CASTELL; MARIO MARTÍN

doi:10.1017/S1351324902003042

Published online by Cambridge University Press: 04 August 2003

NEUS CATALÀ: Affiliation:
TALP Research Centre, Technical University of Catalonia, Jordi Girona 1-3, Campus Nord, C6. 08034 Barcelona, Spain e-mail: ncatala@talp.upc.es
NÚRIA CASTELL: Affiliation:
TALP Research Centre, Technical University of Catalonia, Jordi Girona 1-3, Campus Nord, C6. 08034 Barcelona, Spain e-mail: castell@talp.upc.es
MARIO MARTÍN: Affiliation:
Department of Software, Technical University of Catalonia, Jordi Girona 1-3, Campus Nord, C6. 08034 Barcelona, Spain e-mail: mmartin@lsi.upc.es

Abstract

The main issue when building Information Extraction (IE) systems is how to obtain the knowledge needed to identify relevant information in a document. Most approaches require expert human intervention in many steps of the acquisition process. In this paper we describe ESSENCE, a new method for acquiring IE patterns that significantly reduces the need for human intervention. The method is based on ELA, a specifically designed learning algorithm for acquiring IE patterns without tagged examples. The distinctive features of ESSENCE and ELA are that (1) they permit the automatic acquisition of IE patterns from unrestricted and untagged text representative of the domain, due to (2) their ability to identify regularities around semantically relevant concept-words for the IE task by (3) using non-domain-specific lexical knowledge tools such as WordNet, and (4) restricting the human intervention to defining the task, and validating and typifying the set of IE patterns obtained. Since ESSENCE does not require a corpus annotated with the type of information to be extracted and it uses a general purpose ontology and widely applied syntactic tools, it reduces the expert effort required to build an IE system and therefore also reduces the effort of porting the method to any domain. The results of the application of ESSENCE to the acquisition of IE patterns in an MUC-like task are shown.
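The idea of finding regularities around concept-words with the help of general lexical knowledge can be illustrated with a small sketch. This is not the ELA algorithm, whose details are not given in the abstract; it is a toy illustration under stated assumptions. The `TOY_HYPERNYMS` table is a hypothetical hand-written stand-in for WordNet, and the function names, window size, and frequency threshold are all invented for the example.

```python
from collections import Counter

# Hypothetical stand-in for a general-purpose lexical resource such as WordNet:
# each word maps to one hand-picked, more general term.
TOY_HYPERNYMS = {
    "bomb": "weapon",
    "grenade": "weapon",
    "guerrillas": "group",
    "terrorists": "group",
}


def context_windows(tokens, concept_words, size=2):
    """Yield up to `size` tokens on each side of every concept-word occurrence."""
    for i, tok in enumerate(tokens):
        if tok in concept_words:
            left = tuple(tokens[max(0, i - size):i])
            right = tuple(tokens[i + 1:i + 1 + size])
            yield left + (tok,) + right


def generalize(word):
    """Replace a word by its toy hypernym, or keep it if none is known."""
    return TOY_HYPERNYMS.get(word, word)


def candidate_patterns(sentences, concept_words, size=2, min_count=2):
    """Return generalized contexts seen at least `min_count` times in untagged text."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for window in context_windows(tokens, concept_words, size):
            counts[tuple(generalize(w) for w in window)] += 1
    return [pattern for pattern, n in counts.items() if n >= min_count]


if __name__ == "__main__":
    corpus = [
        "Guerrillas attacked the embassy with a bomb",
        "Terrorists attacked the embassy with a grenade",
    ]
    # Candidates like these would still need human validation and typing.
    print(candidate_patterns(corpus, {"attacked"}, size=1))
```

In this sketch the two sentences differ on the surface but collapse to the same generalized context once the subject nouns are replaced by a shared, more general term, which is the kind of regularity a person could then validate and typify as an extraction pattern.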

Information

Type: Research Article
Information: Natural Language Engineering, Volume 9, Issue 2, June 2003, pp. 151-179

DOI: https://doi.org/10.1017/S1351324902003042
