
Learning question classifiers: the role of semantic information

Published online by Cambridge University Press: 07 December 2005

XIN LI
Affiliation:
Department of Computer Science, University of Illinois at Urbana-Champaign, IL 61801, USA e-mail: xli1@uiuc.edu, danr@uiuc.edu
DAN ROTH
Affiliation:
Department of Computer Science, University of Illinois at Urbana-Champaign, IL 61801, USA e-mail: xli1@uiuc.edu, danr@uiuc.edu

Abstract

To respond correctly to a free-form factual question given a large collection of text data, one needs to understand the question well enough to determine some of the constraints it imposes on a possible answer. These constraints may include a semantic classification of the sought-after answer and may even suggest using different strategies when looking for and verifying a candidate answer. This work presents a machine learning approach to question classification. Guided by a layered semantic hierarchy of answer types, we develop a hierarchical classifier that classifies questions into fine-grained classes. This work also performs a systematic study of the use of semantic information sources in natural language classification tasks. It is shown that, in the context of question classification, augmenting the input of the classifier with appropriate semantic category information results in significant improvements to classification accuracy. We report accurate results on a large collection of free-form questions used in TREC 10 and 11.
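The abstract describes two ideas: a classifier that first picks a coarse answer type and then refines it within a layered hierarchy, and input features augmented with semantic category information. The sketch below illustrates only that general shape. The two-level hierarchy, the label names, the tiny semantic lexicon, the training questions and the count-based scorer are all hypothetical stand-ins chosen for illustration; they are not the authors' learner, feature set or answer-type taxonomy.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical two-level answer-type hierarchy: coarse label -> fine labels.
HIERARCHY = {
    "LOC": ["LOC:city", "LOC:country"],
    "HUM": ["HUM:individual", "HUM:group"],
    "NUM": ["NUM:date", "NUM:count"],
}

# Hypothetical semantic lexicon mapping words to semantic category tags.
SEMANTIC_LEXICON = {
    "city": "SEM:location", "capital": "SEM:location", "country": "SEM:location",
    "president": "SEM:person", "inventor": "SEM:person", "band": "SEM:group",
    "year": "SEM:time", "when": "SEM:time", "many": "SEM:quantity",
}


def features(question: str) -> list[str]:
    """Word features, augmented with semantic category tags from the lexicon."""
    words = question.lower().rstrip("?").split()
    feats = [f"w:{w}" for w in words]
    feats += [SEMANTIC_LEXICON[w] for w in words if w in SEMANTIC_LEXICON]
    return feats


class CountClassifier:
    """Stand-in learner: scores a label by summed feature-label co-occurrence counts."""

    def __init__(self) -> None:
        self.counts: dict[str, Counter] = defaultdict(Counter)

    def train(self, feats: list[str], label: str) -> None:
        for f in feats:
            self.counts[label][f] += 1

    def predict(self, feats: list[str], candidates: list[str]) -> str:
        # Only labels in `candidates` are considered.
        return max(candidates, key=lambda lab: sum(self.counts[lab][f] for f in feats))


class HierarchicalClassifier:
    """Coarse decision first; the fine decision is restricted to that coarse class's children."""

    def __init__(self, hierarchy: dict[str, list[str]]) -> None:
        self.hierarchy = hierarchy
        self.coarse = CountClassifier()
        self.fine = CountClassifier()

    def train(self, data: list[tuple[str, str]]) -> None:
        for question, fine_label in data:
            feats = features(question)
            self.coarse.train(feats, fine_label.split(":")[0])
            self.fine.train(feats, fine_label)

    def classify(self, question: str) -> str:
        feats = features(question)
        coarse_label = self.coarse.predict(feats, list(self.hierarchy))
        return self.fine.predict(feats, self.hierarchy[coarse_label])


# Hypothetical toy training questions labelled with fine answer types.
TRAIN = [
    ("What city hosted the 1996 Olympics?", "LOC:city"),
    ("What is the capital of France?", "LOC:city"),
    ("What country borders Spain?", "LOC:country"),
    ("Who was the first president of the USA?", "HUM:individual"),
    ("Who is the inventor of the telephone?", "HUM:individual"),
    ("What band recorded Abbey Road?", "HUM:group"),
    ("When did World War II end?", "NUM:date"),
    ("What year did the Titanic sink?", "NUM:date"),
    ("How many moons does Mars have?", "NUM:count"),
]

clf = HierarchicalClassifier(HIERARCHY)
clf.train(TRAIN)
print(clf.classify("What is the capital city of Peru?"))  # LOC:city
```

Restricting the fine decision to the children of the predicted coarse class is what makes the classifier hierarchical here; the semantic tags (for example `SEM:location`) let questions that share no surface words with a training question still share features with it.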

Type: Papers
Copyright © 2005 Cambridge University Press


Footnotes

This paper combines and extends earlier work by Li and Roth (2002) and Li, Small and Roth (2004).
Research supported by NSF grants IIS-9801638 and ITR IIS-0085836 and an ONR MURI Award.