Published online by Cambridge University Press: 01 September 1997
This paper describes the approach to knowledge representation taken in the LaSIE Information Extraction (IE) system. Unlike many IE systems, which skim texts and use large collections of shallow, domain-specific patterns and heuristics to fill in templates, LaSIE attempts a fuller text analysis: it first translates individual sentences to a quasi-logical form, and then constructs a weak discourse model of the entire text, from which template fills are finally derived. Underpinning the system is a general ‘world model’, represented as a semantic net, which is extended during the processing of a text by adding the classes and instances described in that text. We describe the system's knowledge representation formalisms, their use in the IE task, and how the knowledge represented in them is acquired, including experiments to extend the system's coverage using the WordNet general-purpose semantic network. Preliminary evaluations of our approach, through the Sixth DARPA Message Understanding Conference, indicate performance comparable to that of shallower approaches. However, we believe its generality and extensibility offer a route towards the higher precision that is required of IE systems if they are to become genuinely usable technologies.
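The idea of a world model held as a semantic net and extended with the classes and instances found in a text can be illustrated with a minimal sketch. It is not LaSIE's actual formalism: the `SemanticNet` type, its method names, and the example classes and instance identifiers are all hypothetical, chosen only to show a general is-a hierarchy being extended during the processing of one text.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticNet:
    """Toy world model (hypothetical): an is-a class hierarchy plus instances."""
    parents: dict = field(default_factory=dict)    # class name -> parent class or None
    instances: dict = field(default_factory=dict)  # instance id -> class name

    def add_class(self, name, parent=None):
        # A new class must attach to a class already in the net (or be a root).
        if parent is not None and parent not in self.parents:
            raise KeyError(f"unknown parent class: {parent}")
        self.parents.setdefault(name, parent)

    def add_instance(self, ident, cls):
        if cls not in self.parents:
            raise KeyError(f"unknown class: {cls}")
        self.instances[ident] = cls

    def is_a(self, ident, cls):
        # Walk up the hierarchy from the instance's class towards the root.
        current = self.instances[ident]
        while current is not None:
            if current == cls:
                return True
            current = self.parents[current]
        return False


# General world model, in place before any text is processed (assumed classes).
world = SemanticNet()
world.add_class("entity")
world.add_class("organization", parent="entity")
world.add_class("person", parent="entity")

# Extension while processing one text: a class and instances the text describes.
world.add_class("company", parent="organization")
world.add_instance("e1", "company")
world.add_instance("e2", "person")
```

In this sketch, a later template-filling step could ask whether `e1` is an organization with `world.is_a("e1", "organization")`, relying on the inherited is-a link rather than on a domain-specific pattern.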