Published online by Cambridge University Press: 04 August 2003
The main issue when building Information Extraction (IE) systems is how to obtain the knowledge needed to identify relevant information in a document. Most approaches require expert human intervention at many steps of the acquisition process. In this paper we describe ESSENCE, a new method for acquiring IE patterns that significantly reduces the need for human intervention. The method is based on ELA, a learning algorithm specifically designed to acquire IE patterns without tagged examples. The distinctive features of ESSENCE and ELA are that (1) they acquire IE patterns automatically from unrestricted, untagged text representative of the domain; (2) they achieve this by identifying regularities around concept-words that are semantically relevant to the IE task, (3) using non-domain-specific lexical knowledge tools such as WordNet; and (4) they restrict human intervention to defining the task and to validating and typifying the resulting set of IE patterns. Because ESSENCE does not require a corpus annotated with the type of information to be extracted, and because it uses a general-purpose ontology and widely applied syntactic tools, it reduces the expert effort needed to build an IE system and, consequently, the effort of porting the method to other domains. We report the results of applying ESSENCE to the acquisition of IE patterns in a MUC-like task.
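The abstract does not specify how ELA works internally. The sketch below is therefore not ELA. It only illustrates the general idea of finding recurring contexts around semantically related concept-words in untagged text and passing the frequent ones to a human for validation.

It rests on several assumptions:

- A small hand-written synonym table (`TOY_SYNSETS`, hypothetical) stands in for WordNet.
- Plain whitespace tokenization stands in for the syntactic tools the method uses.
- A simple frequency threshold decides which patterns are candidates.
- The names `expand_concepts`, `context_patterns`, `frequent_patterns` and the toy corpus `SENTENCES` are illustrative inventions.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet synonym sets (assumption).
TOY_SYNSETS = {
    "kill": {"kill", "murder", "assassinate", "slay"},
    "bomb": {"bomb", "explosive", "device"},
}

# Invented toy corpus of untagged sentences (illustration only, not real data).
SENTENCES = [
    "rebels murder the mayor",
    "rebels assassinate the judge",
    "rebels kill the farmer",
    "police found the explosive device",
]


def expand_concepts(seeds, synsets):
    """Expand seed concept-words with their synonyms from the toy lexicon."""
    expanded = set()
    for seed in seeds:
        # Fall back to the seed itself if the lexicon has no entry for it.
        expanded |= synsets.get(seed, {seed})
    return expanded


def context_patterns(sentences, concepts, window=1):
    """Count (left context, <CONCEPT>, right context) patterns around concept-words."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for i, tok in enumerate(tokens):
            if tok in concepts:
                left = tuple(tokens[max(0, i - window):i])
                right = tuple(tokens[i + 1:i + 1 + window])
                counts[(left, "<CONCEPT>", right)] += 1
    return counts


def frequent_patterns(counts, min_count=2):
    """Keep patterns seen at least min_count times as candidates for human validation."""
    return sorted(p for p, c in counts.items() if c >= min_count)


if __name__ == "__main__":
    concepts = expand_concepts(["kill", "bomb"], TOY_SYNSETS)
    counts = context_patterns(SENTENCES, concepts)
    for pattern in frequent_patterns(counts):
        print(pattern)
```

On this toy corpus, "murder", "assassinate" and "kill" are grouped as one concept only through the synonym table. That grouping is what lets the context "rebels _ the" recur often enough to pass the threshold. The human role is then limited to accepting or rejecting such candidates, in the spirit of point (4) above.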