
COMPENDIUM: a text summarisation tool for generating summaries of multiple purposes, domains, and genres

Published online by Cambridge University Press: 16 July 2012

ELENA LLORET and MANUEL PALOMAR
Affiliation: Department of Software and Computing Systems, University of Alicante, Apdo. de correos, 99, E-03080, Alicante, Spain
e-mail: elloret@dlsi.ua.es, mpalomar@dlsi.ua.es

Abstract

In this paper, we present COMPENDIUM, a text summarisation tool capable of generating the most common types of summaries. Regarding the input, it can produce both single- and multi-document summaries. Regarding the output, the summaries can be extractive or abstractive-oriented. Regarding their purpose, the summaries can be generic, query-focused, or sentiment-based. The proposed architecture of COMPENDIUM is divided into several stages, distinguishing between core and additional stages. The core stages form the backbone of the tool and are common to the generation of every type of summary, whereas the additional stages extend the capabilities of the tool. The main contributions of COMPENDIUM with respect to state-of-the-art summarisation systems are that (i) it specifically addresses the problem of redundancy by means of textual entailment; (ii) it combines statistical and cognitive-based techniques to determine relevant content; and (iii) it proposes an abstractive-oriented approach to tackle the challenge of abstractive summarisation. An evaluation carried out across different domains and textual genres, comprising traditional texts as well as texts extracted from the Web 2.0, shows that COMPENDIUM is very competitive and appropriate for use as a summary-generation tool.
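The abstract describes the core stages only at a high level. The following is a minimal, hypothetical sketch of how a relevance stage and a redundancy stage can be combined in an extractive pipeline; it is not COMPENDIUM's implementation. Two assumptions are made loudly here. First, relevance is approximated by plain term frequency only, so the cognitive-based component is not modelled. Second, textual entailment is replaced by a crude one-way lexical-overlap test (the hypothetical helper `lexically_entailed`, with an arbitrary threshold of 0.8). A real system would use a trained entailment recogniser instead.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list (assumption); a real system would use a fuller resource.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is",
             "are", "was", "were", "it", "its", "for", "with", "by", "as", "at"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence):
    """Lower-cased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def lexically_entailed(hypothesis, text, threshold=0.8):
    """Crude stand-in for textual entailment (assumption, not a real TE system):
    the hypothesis counts as entailed if at least `threshold` of its content
    words also occur in the text. Stopword-only hypotheses count as entailed."""
    hyp = set(content_words(hypothesis))
    if not hyp:
        return True
    return len(hyp & set(content_words(text))) / len(hyp) >= threshold


def summarise(documents, max_sentences=3):
    """Rank sentences from one or more documents by summed term frequency,
    skip candidates entailed by an already selected sentence, and return
    the selection in source order."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    tf = Counter(w for s in sentences for w in content_words(s))

    def score(s):
        # Statistical relevance only: sum of corpus frequencies of content words.
        return sum(tf[w] for w in content_words(s))

    # sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Redundancy stage: drop a candidate whose content a selected sentence already covers.
        if any(lexically_entailed(sentences[i], sentences[j]) for j in chosen):
            continue
        chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = "The cat sat on the mat. The cat sat on the mat today. Dogs bark loudly."
    print(summarise([doc], max_sentences=2))
```

In the usage example, the first sentence is dropped because every content word it contains already appears in the higher-scoring second sentence. Passing several strings in `documents` turns the same loop into a naive multi-document summariser. Keeping relevance scoring and redundancy filtering as separate steps mirrors the staged architecture the abstract describes, so either step could be swapped for a stronger component without touching the other.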

Type: Articles
Copyright © Cambridge University Press 2012

