SUMMAC: a text summarization evaluation

Published online by Cambridge University Press:  17 June 2002

INDERJEET MANI
Affiliation:
The MITRE Corporation, 11493 Sunset Hills Rd., Reston, VA 22090, USA
GARY KLEIN
Affiliation:
The MITRE Corporation, 11493 Sunset Hills Rd., Reston, VA 22090, USA
DAVID HOUSE
Affiliation:
The MITRE Corporation, 11493 Sunset Hills Rd., Reston, VA 22090, USA
LYNETTE HIRSCHMAN
Affiliation:
The MITRE Corporation, 11493 Sunset Hills Rd., Reston, VA 22090, USA
THERESE FIRMIN
Affiliation:
Department of Defense, 9800 Savage Rd., Ft. Meade, MD 20755, USA
BETH SUNDHEIM
Affiliation:
SPAWAR Systems Center, Code D44208, 53140 Gatchell Rd., San Diego, CA 92152, USA

Abstract

The TIPSTER Text Summarization Evaluation (SUMMAC) has developed several new extrinsic and intrinsic methods for evaluating summaries. It has established definitively that automatic text summarization is very effective in relevance assessment tasks on news articles. Summaries as short as 17% of full text length sped up decision-making by almost a factor of 2 with no statistically significant degradation in accuracy. Analysis of feedback forms filled in after each decision indicated that the intelligibility of present-day machine-generated summaries is high. Systems that performed most accurately in the production of indicative and informative topic-related summaries used term frequency and co-occurrence statistics, and vocabulary overlap comparisons between text passages. However, in the absence of a topic, these statistical methods do not appear to provide any additional leverage: in the case of generic summaries, the systems were indistinguishable in accuracy. The paper discusses some of the tradeoffs and challenges faced by the evaluation, and also lists some of the lessons learned, impacts, and possible future directions. The evaluation methods used in the SUMMAC evaluation are of interest both to summarization evaluation and to the evaluation of other ‘output-related’ NLP technologies, where there may be many potentially acceptable outputs, with no automatic way to compare them.

Type
Research Article
Copyright
© 2002 Cambridge University Press
