Natural Language Engineering: Volume 14 - Issue 3

Robust parsing and spoken negotiative dialogue with databases
JOHAN BOYE, MATS WIRÉN
Published online by Cambridge University Press: 01 July 2008, pp. 289-312
This paper presents a robust parsing algorithm and semantic formalism for the interpretation of utterances in spoken negotiative dialogue with databases. The algorithm works in two passes: a domain-specific pattern-matching phase and a domain-independent semantic analysis phase. Robustness is achieved by limiting the set of representable utterance types to an empirically motivated subclass which is more expressive than propositional slot–value lists, but much less expressive than first-order logic. Our evaluation shows that in actual practice the vast majority of utterances that occur can be handled, and that the parsing algorithm is highly efficient and accurate.

Bootstrapping spoken dialogue systems by exploiting reusable libraries
GIUSEPPE DI FABBRIZIO, GOKHAN TUR, DILEK HAKKANI-TÜR, MAZIN GILBERT, BERNARD RENGER, DAVID GIBBON, ZHU LIU, BEHZAD SHAHRARAY
Published online by Cambridge University Press: 01 July 2008, pp. 313-335
Building natural language spoken dialogue systems requires large amounts of human transcribed and labeled speech utterances to reach useful operational service performances. Furthermore, the design of such complex systems consists of several manual steps. The User Experience (UE) expert analyzes and defines by hand the system core functionalities: the system semantic scope (call-types) and the dialogue manager strategy that will drive the human–machine interaction. This approach is extensive and error-prone since it involves several nontrivial design decisions that can be evaluated only after the actual system deployment. Moreover, scalability is compromised by time, costs, and the high level of UE know-how needed to reach a consistent design. We propose a novel approach for bootstrapping spoken dialogue systems based on the reuse of existing transcribed and labeled data, common reusable dialogue templates, generic language and understanding models, and a consistent design process. We demonstrate that our approach reduces design and development time while providing an effective system without any application-specific data.

A general feature space for automatic verb classification
ERIC JOANIS, SUZANNE STEVENSON, DAVID JAMES
Published online by Cambridge University Press: 01 July 2008, pp. 337-367
Lexical semantic classes of verbs play an important role in structuring complex predicate information in a lexicon, thereby avoiding redundancy and enabling generalizations across semantically similar verbs with respect to their usage. Such classes, however, require many person-years of expert effort to create manually, and methods are needed for automatically assigning verbs to appropriate classes. In this work, we develop and evaluate a feature space to support the automatic assignment of verbs into a well-known lexical semantic classification that is frequently used in natural language processing. The feature space is general – applicable to any class distinctions within the target classification; broad – tapping into a variety of semantic features of the classes; and inexpensive – requiring no more than a POS tagger and chunker. We perform experiments using support vector machines (SVMs) with the proposed feature space, demonstrating a reduction in error rate ranging from 48% to 88% over a chance baseline accuracy, across classification tasks of varying difficulty. In particular, we attain performance comparable to or better than that of feature sets manually selected for the particular tasks. Our results show that the approach is generally applicable, and reduces the need for resource-intensive linguistic analysis for each new classification task. We also perform a wide range of experiments to determine the most informative features in the feature space, finding that simple, easily extractable features suffice for good verb classification performance.
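
The "reduction in error rate over a chance baseline" reported above can be computed as in the following minimal sketch. The function name and the example figures are illustrative assumptions, not values or code taken from the paper.

```python
def error_rate_reduction(accuracy: float, baseline_accuracy: float) -> float:
    """Relative reduction in error rate with respect to a baseline.

    Both arguments are accuracies in [0, 1]; the baseline must be below 1,
    otherwise there is no baseline error left to reduce.
    """
    if not 0.0 <= baseline_accuracy < 1.0:
        raise ValueError("baseline_accuracy must be in [0, 1)")
    baseline_error = 1.0 - baseline_accuracy
    error = 1.0 - accuracy
    return (baseline_error - error) / baseline_error


# Hypothetical two-class task: chance accuracy 0.5, classifier accuracy 0.74.
reduction = error_rate_reduction(0.74, 0.5)
print(f"error rate reduction: {reduction:.0%}")
```

Under this definition, a classifier at 74% accuracy against a 50% chance baseline cuts the error from 50% to 26%, which is a 48% reduction.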

Using automatically labelled examples to classify rhetorical relations: an assessment
CAROLINE SPORLEDER, ALEX LASCARIDES
Published online by Cambridge University Press: 01 July 2008, pp. 369-416
Being able to identify which rhetorical relations (e.g., contrast or explanation) hold between spans of text is important for many natural language processing applications. Using machine learning to obtain a classifier which can distinguish between different relations typically depends on the availability of manually labelled training data, which is very time-consuming to create. However, rhetorical relations are sometimes lexically marked, i.e., signalled by discourse markers (e.g., because, but, consequently etc.), and it has been suggested (Marcu and Echihabi, 2002) that the presence of these cues in some examples can be exploited to label them automatically with the corresponding relation. The discourse markers are then removed and the automatically labelled data are used to train a classifier to determine relations even when no discourse marker is present (based on other linguistic cues such as word co-occurrences). In this paper, we investigate empirically how feasible this approach is. In particular, we test whether automatically labelled, lexically marked examples are really suitable training material for classifiers that are then applied to unmarked examples. Our results suggest that training on this type of data may not be such a good strategy, as models trained in this way do not seem to generalise very well to unmarked data. Furthermore, we found some evidence that this behaviour is largely independent of the classifiers used and seems to lie in the data itself (e.g., marked and unmarked examples may be too dissimilar linguistically and removing unambiguous markers in the automatic labelling process may lead to a meaning shift in the examples).
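
The automatic labelling procedure described above (find a discourse marker, assign the corresponding relation, then remove the marker) might be sketched as follows. This is only an illustration of the general idea: the marker-to-relation table, the function name `auto_label`, and the rule of skipping sentences with zero or several markers are assumptions made here, not details from the paper.

```python
import re

# Hypothetical marker-to-relation table (an assumption for illustration only).
MARKER_TO_RELATION = {
    "because": "explanation",
    "but": "contrast",
    "consequently": "result",
}

_MARKER_RE = re.compile(
    r"\b(" + "|".join(MARKER_TO_RELATION) + r")\b", re.IGNORECASE
)


def auto_label(sentence):
    """Return (left_span, right_span, relation) with the marker removed.

    Returns None when the sentence has no marker, more than one marker,
    or an empty span on either side of the marker.
    """
    matches = list(_MARKER_RE.finditer(sentence))
    if len(matches) != 1:
        return None  # unmarked or ambiguous: not usable as a training example
    match = matches[0]
    left = sentence[:match.start()].strip(" ,;")
    right = sentence[match.end():].strip(" ,;.")
    if not left or not right:
        return None
    return left, right, MARKER_TO_RELATION[match.group(1).lower()]


example = auto_label("She stayed home because it rained.")
print(example)
```

A classifier trained on such triples never sees the marker itself, which is the setting whose generalisation to genuinely unmarked examples the paper assesses.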

A new PPM variant for Chinese text compression
PEILIANG WU, W. J. TEAHAN
Published online by Cambridge University Press: 01 July 2008, pp. 417-430
Large alphabet languages such as Chinese are very different from English, and therefore present different problems for text compression. In this article, we first examine the characteristics of Chinese, then we introduce a new variant of the Prediction by Partial Match (PPM) model especially for Chinese characters. Unlike traditional PPM coding schemes, which encode an escape probability if a novel character occurs in the context, the new coding scheme encodes the order first and then the symbol, without having to output an escape probability. This scheme achieves excellent compression rates in comparison with other schemes on a variety of Chinese text files.
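
The contrast between escape-based coding and order-first coding can be illustrated with a toy sketch that only lists the events each scheme would hand to an entropy coder. It is one possible reading of the abstract, not the authors' algorithm: all function names are hypothetical, probabilities and arithmetic coding are omitted, and order -1 is assumed to stand for a symbol that no context predicts.

```python
from collections import defaultdict


def build_seen(history, max_order):
    """Map every context of length 0..max_order to the symbols seen after it."""
    seen = defaultdict(set)
    for i, sym in enumerate(history):
        for k in range(max_order + 1):
            if i - k >= 0:
                seen[history[i - k:i]].add(sym)
    return seen


def escape_style_events(seen, history, sym, max_order):
    """Traditional style: one ESC per seen context that fails to predict sym."""
    events = []
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx not in seen:
            continue  # context never occurred, so nothing is coded at this order
        if sym in seen[ctx]:
            return events + [sym]
        events.append("ESC")
    return events + [sym]  # symbol coded at order -1


def order_first_events(seen, history, sym, max_order):
    """Order-first style: state the coding order, then the symbol, no escapes."""
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if sym in seen.get(ctx, ()):
            return [("ORDER", k), sym]
    return [("ORDER", -1), sym]


history = "abracadabra"
seen = build_seen(history, max_order=2)
for next_symbol in "cdz":
    print(next_symbol,
          escape_style_events(seen, history, next_symbol, 2),
          order_first_events(seen, history, next_symbol, 2))
```

In this toy setting a completely novel symbol costs one escape per context tried under the escape-based scheme, whereas the order-first scheme always emits exactly two events.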
