Published online by Cambridge University Press: 09 September 2016
Emails constitute an important genre of online communication. Many of us face the daunting daily task of sifting through ever larger volumes of email. Keywords extracted from emails can help combat this information overload by allowing a systematic exploration of the topics the emails contain. Existing literature on keyword extraction has not covered the email genre, and no human-annotated gold standard datasets are currently available. In this paper, we introduce a new dataset for keyword extraction from emails and evaluate both supervised and unsupervised methods for this task. The results obtained with our supervised keyword extraction system (38.99% F-score) improve over the results obtained with the best-performing systems participating in the SemEval 2010 keyword extraction task.
We are grateful to the annotators who made this work possible. This material is based in part upon work supported by Samsung Research America under agreement GN0005468 and by the National Science Foundation under IIS award #1018613. Any opinions, findings, conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of Samsung Research America or the National Science Foundation. We also thank the anonymous reviewers whose insightful comments helped improve the draft substantially.