Predicting the Utility of Scientific Articles for Emerging Pandemics Using Their Titles and Natural Language Processing

Published online by Cambridge University Press: 10 May 2024

Kinga Dobolyi*
Affiliation:
Department of Computer Science, George Washington University, Washington, DC, USA
Sidra Hussain
Affiliation:
Department of Computer Science, George Washington University, Washington, DC, USA
Grady McPeak
Affiliation:
Department of Computer Science, George Washington University, Washington, DC, USA
*Corresponding author: Kinga Dobolyi, PhD; Email: kinga@gwu.edu

Abstract

Objective:

Not all scientific publications are equally useful to policy-makers tasked with mitigating the spread and impact of diseases, especially at the start of novel epidemics and pandemics. The need for actionable, evidence-based information is urgent, but the nature of preprint and peer-reviewed articles published during these times is often at odds with such goals. For example, a lack of novel results and a focus on opinions rather than evidence were common in coronavirus disease (COVID-19) publications at the start of the pandemic in 2019. In this work, we seek to automatically judge the utility of these scientific articles, from a public health policy-making perspective, using only their titles.

Methods:

Deep learning natural language processing (NLP) models were trained on scientific COVID-19 publication titles from the CORD-19 dataset and evaluated against expert-curated COVID-19 evidence to measure their real-world feasibility at screening these scientific publications in an automated manner.
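
As a rough illustration of what title-only screening can look like, the sketch below trains a small multinomial naive Bayes classifier on labeled titles. It is a deliberately simplified stand-in and not the deep learning models described above. The function names, the labels "useful" and "not_useful", and the toy titles are all hypothetical, invented for illustration, and not drawn from CORD-19 or from the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def train(labeled_titles):
    """Fit a multinomial naive Bayes model on (title, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> token frequencies
    label_counts = Counter()            # label -> number of titles
    vocab = set()
    for title, label in labeled_titles:
        tokens = tokenize(title)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, title):
    """Return the label with the highest log-posterior for a title."""
    word_counts, label_counts, vocab = model
    total_titles = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_titles)
        total_tokens = sum(word_counts[label].values())
        for token in tokenize(title):
            if token not in vocab:
                continue  # ignore tokens never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_tokens + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy titles, invented for illustration only.
TOY_TITLES = [
    ("Randomized trial of mask use and household transmission", "useful"),
    ("Cohort study of transmission rates in school settings", "useful"),
    ("Opinion: reflections on life during the pandemic", "not_useful"),
    ("Commentary on the challenges of lockdown", "not_useful"),
]

model = train(TOY_TITLES)
print(predict(model, "Cohort study of household transmission"))
```

In a triage setting, a classifier of this shape would be applied to each incoming title, with low-scoring items set aside for later review. A pretrained transformer model would take the place of the naive Bayes scorer while the surrounding workflow stays the same.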

Results:

This work demonstrates that it is possible to judge the utility of COVID-19 scientific articles, from a public health policy-making perspective, based on their title alone, using deep natural language processing (NLP) models.

Conclusions:

NLP models can be successfully trained on scientific articles and used by public health experts to triage and filter the hundreds of new daily publications on novel diseases such as COVID-19 at the start of pandemics.

Type
Original Research
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Society for Disaster Medicine and Public Health, Inc
