It all starts with entities: A Salient entity topic model

Chuan Wu; Evangelos Kanoulas; Maarten de Rijke

doi:10.1017/S1351324919000585

Published online by Cambridge University Press: 22 November 2019

Chuan Wu*: School of Information Management, Wuhan University, Wuhan, China; Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
Evangelos Kanoulas: Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke: Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
*Corresponding author. Email: wuchuan114@gmail.com

Abstract

Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, they usually treat all entities equally, without considering whether they are salient. In this work, we propose a novel ETM, the Salient Entity Topic Model, which takes salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. We perform qualitative and quantitative analyses of the proposed model. Application to entity salience detection demonstrates the effectiveness of our model compared to state-of-the-art topic model baselines.
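
The idea of treating salient entities as an additional source of topics can be illustrated with a toy generative sketch. This is not the model from the article, whose exact generative process is not given in the abstract. It is only one possible reading, under loudly stated assumptions: each word picks its topic either from the document's topic distribution or from the topic distribution of a uniformly chosen salient entity, controlled by a hypothetical mixing probability `p_entity`. All names (`sample`, `generate_document`, `p_entity`) and all numbers are invented for illustration.

```python
import random


def sample(dist, rng):
    """Draw one key from a {item: probability} dictionary."""
    items = list(dist)
    weights = [dist[item] for item in items]
    return rng.choices(items, weights=weights, k=1)[0]


def generate_document(doc_topics, entity_topics, topic_words, n_words, p_entity, rng):
    """Toy generator: returns a list of (word, topic, source) triples.

    Assumption (not from the article): a coin flip with probability
    `p_entity` decides whether a word's topic comes from a salient entity
    or from the document-level topic distribution.
    """
    entities = list(entity_topics)
    generated = []
    for _ in range(n_words):
        if entities and rng.random() < p_entity:
            # Topic source is a salient entity, chosen uniformly at random.
            source = rng.choice(entities)
            topic = sample(entity_topics[source], rng)
        else:
            # Topic source is the document itself, as in a standard topic model.
            source = "document"
            topic = sample(doc_topics, rng)
        word = sample(topic_words[topic], rng)
        generated.append((word, topic, source))
    return generated


if __name__ == "__main__":
    # Invented toy distributions; none of these values come from the article.
    topic_words = {
        "sports": {"match": 0.5, "goal": 0.5},
        "politics": {"vote": 0.6, "law": 0.4},
    }
    doc_topics = {"sports": 0.3, "politics": 0.7}
    entity_topics = {"Ajax": {"sports": 0.9, "politics": 0.1}}
    rng = random.Random(0)
    for triple in generate_document(doc_topics, entity_topics, topic_words, 5, 0.5, rng):
        print(triple)
```

Setting `p_entity` to 0 reduces the sketch to a standard document-level topic mixture, which makes the contribution of the entity source easy to see by comparison.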

Keywords

Entity Salience; Entity Topic Model

Information

Type: Article
Information: Natural Language Engineering, Volume 26, Issue 5, September 2020, pp. 531–549

DOI: https://doi.org/10.1017/S1351324919000585
Copyright: © Cambridge University Press 2019
