COMPENDIUM: a text summarisation tool for generating summaries of multiple purposes, domains, and genres

ELENA LLORET; MANUEL PALOMAR

doi:10.1017/S1351324912000198

Published online by Cambridge University Press: 16 July 2012

Affiliation (both authors): Department of Software and Computing Systems, University of Alicante, Apdo. de correos 99, E-03080 Alicante, Spain. E-mail: elloret@dlsi.ua.es, mpalomar@dlsi.ua.es

Abstract

In this paper, we present COMPENDIUM, a text summarisation tool capable of generating the most common types of summaries. Regarding the input, it can produce single- and multi-document summaries; regarding the output, the summaries can be extractive or abstractive-oriented; and regarding their purpose, the summaries can be generic, query-focused, or sentiment-based. The proposed architecture of COMPENDIUM is divided into several stages, distinguishing between core and additional stages. The core stages form the backbone of the tool and are common to the generation of any type of summary, whereas the additional stages extend the capabilities of the tool. The main contributions of COMPENDIUM with respect to state-of-the-art summarisation systems are that (i) it explicitly addresses the problem of redundancy by means of textual entailment; (ii) it combines statistical and cognitive-based techniques to determine relevant content; and (iii) it proposes an abstractive-oriented approach to the challenge of abstractive summarisation. The evaluation, performed on different domains and textual genres comprising both traditional texts and texts extracted from the Web 2.0, shows that COMPENDIUM is highly competitive and suitable for use as a summarisation tool.
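The split between core stages shared by every summary type and optional additional stages can be illustrated with a minimal Python sketch. It is not COMPENDIUM's implementation. All function names (`split_sentences`, `remove_redundancy`, `rank_by_frequency`, `query_filter`, `summarise`) are hypothetical. The `entails` function is a crude word-overlap stand-in for a real textual entailment recogniser. Relevance is scored by term frequency alone, so the cognitive-based component is not modelled.

```python
import re
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical stage type: every stage maps a list of sentences to a list of sentences.
Stage = Callable[[List[str]], List[str]]


def split_sentences(text: str) -> List[str]:
    """Core stage (assumed): naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence: str) -> List[str]:
    """Lower-cased alphabetic tokens; a stand-in for real linguistic analysis."""
    return re.findall(r"[a-z]+", sentence.lower())


def entails(premise: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Crude proxy for textual entailment (an assumption, not a real TE system):
    the hypothesis counts as entailed if at least `threshold` of its distinct
    words already occur in the premise."""
    hyp = set(tokens(hypothesis))
    if not hyp:
        return True
    return len(hyp & set(tokens(premise))) / len(hyp) >= threshold


def remove_redundancy(sentences: List[str]) -> List[str]:
    """Core stage: drop a sentence if an already kept sentence 'entails' it."""
    kept: List[str] = []
    for sentence in sentences:
        if not any(entails(previous, sentence) for previous in kept):
            kept.append(sentence)
    return kept


def rank_by_frequency(sentences: List[str]) -> List[str]:
    """Core stage (statistical part only): rank sentences by the mean
    frequency of their words across the whole input."""
    freq = Counter(t for s in sentences for t in tokens(s))

    def score(sentence: str) -> float:
        toks = tokens(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # sorted() is stable, so ties keep their original order.
    return sorted(sentences, key=score, reverse=True)


def query_filter(query: str) -> Stage:
    """Example additional stage (hypothetical): keep sentences sharing a word
    with the query, falling back to all sentences if none do."""
    query_words = set(tokens(query))

    def stage(sentences: List[str]) -> List[str]:
        matching = [s for s in sentences if query_words & set(tokens(s))]
        return matching or sentences

    return stage


def summarise(text: str, max_sentences: int = 3,
              additional_stages: Sequence[Stage] = ()) -> List[str]:
    """Run the core stages, plugging any additional stages in before ranking."""
    sentences = remove_redundancy(split_sentences(text))
    for stage in additional_stages:
        sentences = stage(sentences)
    chosen = set(rank_by_frequency(sentences)[:max_sentences])
    # Extractive output: selected sentences in their original order.
    return [s for s in sentences if s in chosen]
```

For example, on a short text containing a near-duplicate sentence, `summarise(text, max_sentences=2)` first drops the duplicate and then keeps the two highest-scoring sentences. Passing `additional_stages=[query_filter("dogs")]` turns the same core pipeline into a query-focused one. Treating additional stages as plain functions inserted between redundancy removal and ranking is only one simple way to let a shared core serve several summary types.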

Information

Type: Articles
Information: Natural Language Engineering, Volume 19, Issue 2, April 2013, pp. 147-186

DOI: https://doi.org/10.1017/S1351324912000198
Copyright: © Cambridge University Press 2012
