Natural Language Engineering: Volume 26 - Issue 2

How to evaluate machine translation: A review of automated and human metrics
Eirini Chatzikoumi
Published online by Cambridge University Press: 11 September 2019, pp. 137-161
This article presents the most up-to-date, influential automated, semiautomated and human metrics used to evaluate the quality of machine translation (MT) output and provides the necessary background for MT evaluation projects. Evaluation is, as repeatedly admitted, highly relevant for the improvement of MT. This article is divided into three parts: the first one is dedicated to automated metrics; the second, to human metrics; and the last, to the challenges posed by neural machine translation (NMT) regarding its evaluation. The first part includes reference translation–based metrics; confidence or quality estimation (QE) metrics, which are used as alternatives for quality assessment; and diagnostic evaluation based on linguistic checkpoints. Human evaluation metrics are classified according to the criterion of whether human judges directly express a so-called subjective evaluation judgment, such as ‘good’ or ‘better than’, or not, as is the case in error classification. The former methods are based on directly expressed judgment (DEJ); therefore, they are called ‘DEJ-based evaluation methods’, while the latter are called ‘non-DEJ-based evaluation methods’. In the DEJ-based evaluation section, tasks such as fluency and adequacy annotation, ranking and direct assessment (DA) are presented, whereas in the non-DEJ-based evaluation section, tasks such as error classification and postediting are detailed, with definitions and guidelines, thus rendering this article a useful guide for evaluation projects. Following the detailed presentation of the previously mentioned metrics, the specificities of NMT are set forth along with suggestions for its evaluation, according to the latest studies. As human translators are the most adequate judges of the quality of a translation, emphasis is placed on the human metrics seen from a translator-judge perspective to provide useful methodology tools for interdisciplinary research groups that evaluate MT systems.

Finding next of kin: Cross-lingual embedding spaces for related languages
Serge Sharoff
Published online by Cambridge University Press: 04 September 2019, pp. 163-182
Some languages have very few NLP resources, while many of them are closely related to better-resourced languages. This paper explores how the similarity between the languages can be utilised by porting resources from better- to lesser-resourced languages. The paper introduces a way of building a representation shared across related languages by combining cross-lingual embedding methods with a lexical similarity measure which is based on the weighted Levenshtein distance. One of the outcomes of the experiments is a Panslavonic embedding space for nine Balto-Slavonic languages. The paper demonstrates that the resulting embedding space helps in such applications as morphological prediction, named-entity recognition and genre classification.
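The abstract above names a lexical similarity measure based on the weighted Levenshtein distance. As a minimal sketch only, the following shows how such a distance can be computed with dynamic programming. The substitution weights, the `CHEAP_PAIRS` table, the normalisation in `lexical_similarity` and all function names are illustrative assumptions, not the weights or the implementation used in the paper.

```python
from typing import Callable

def weighted_levenshtein(
    a: str,
    b: str,
    sub_cost: Callable[[str, str], float],
    indel_cost: float = 1.0,
) -> float:
    """Edit distance where substitutions are priced by sub_cost(x, y)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute ca -> cb
            ))
        prev = curr
    return prev[-1]

# Hypothetical table: character pairs assumed to be cheap to substitute.
CHEAP_PAIRS = {frozenset("oa"), frozenset("ie")}

def example_sub_cost(x: str, y: str) -> float:
    """Assumed weights: 0 for identical characters, 0.25 for cheap pairs, 1 otherwise."""
    if x == y:
        return 0.0
    if frozenset((x, y)) in CHEAP_PAIRS:
        return 0.25
    return 1.0

def lexical_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 minus distance divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_levenshtein(a, b, example_sub_cost) / longest
```

With these assumed weights, two strings that differ only in cheap vowel substitutions score as far more similar than they would under unit costs, which is the intuition behind using a weighted distance for closely related languages.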

Designing a virtual patient dialogue system based on terminology-rich resources: Challenges and evaluation
Leonardo Campillos-Llanos, Catherine Thomas, Éric Bilinski, Pierre Zweigenbaum, Sophie Rosset
Published online by Cambridge University Press: 15 July 2019, pp. 183-220
Virtual patient software allows health professionals to practise their skills by interacting with tools simulating clinical scenarios. A natural language dialogue system can provide natural interaction for medical history-taking. However, the large number of concepts and terms in the medical domain makes the creation of such a system a demanding task. We designed a dialogue system that stands out from current research by its ability to handle a wide variety of medical specialties and clinical cases. To address the task, we designed a patient record model, a knowledge model for the task and a termino-ontological model that hosts structured thesauri with linguistic, terminological and ontological knowledge. We used a frame- and rule-based approach and terminology-rich resources to handle the medical dialogue. This work focuses on the termino-ontological model, the challenges involved and how the system manages resources for the French language. We adopted a comprehensive approach to collect terms and ontological knowledge, and dictionaries of affixes, synonyms and derivational variants. Resources include domain lists containing over 161,000 terms, and dictionaries with over 959,000 word/concept entries. We assessed our approach by having 71 participants (39 medical doctors and 32 non-medical evaluators) interact with the system and use 35 cases from 18 specialities. We conducted a quantitative evaluation of all components by analysing interaction logs (11,834 turns). Natural language understanding achieved an F-measure of 95.8%. Dialogue management provided on average 74.3 (±9.5)% of correct answers. We performed a qualitative evaluation by collecting 171 five-point Likert scale questionnaires. All evaluated aspects obtained mean scores above the Likert mid-scale point. We analysed the vocabulary coverage with regard to unseen cases: the system covered 97.8% of their terms. 
Evaluations showed that the system achieved high vocabulary coverage on unseen cases and was assessed as relevant for the task.

A new approach for textual feature selection based on N-composite isolated labels
Samir Elloumi
Published online by Cambridge University Press: 29 April 2019, pp. 221-243
Textual Feature Selection (TFS) aims to extract the parts or segments of a text that are most relevant to the information it expresses. The selected features are useful for automatic indexing, summarization, document categorization, knowledge discovery and so on. Given the huge amount of electronic textual data published daily, TFS faces many challenges related to both semantics and processing efficiency. In this paper, we propose a new approach for TFS grounded in Formal Concept Analysis. Specifically, we propose to extract textual features by exploring the regularities in a formal context where isolated points exist. We introduce the notion of N-composite isolated points as a set of N words to be considered as a single textual feature. We show that a small value of N (between 1 and 3) allows extracting significant textual features compared with existing approaches, even when the initial formal context is not completely covered.

Emerging trends: Reviewing the reviewers (again)
Kenneth Ward Church
Published online by Cambridge University Press: 04 March 2020, pp. 245-257
The ACL-2019 Business meeting ended with a discussion of reviewing. Conferences are experiencing a success catastrophe. They are becoming bigger and bigger, which is not only a sign of success but also a challenge (for reviewing and more). Various proposals for reducing submissions were discussed at the Business meeting. IMHO, the problem is not so much too many submissions, but rather, random reviewing. We cannot afford to do reviewing as badly as we do (because that leads to even more submissions). Negative feedback loops are effective. The reviewing process will improve over time if reviewers teach authors how to write better submissions, and authors teach reviewers how to write more constructive reviews. If you have received a not-ok (unhelpful/offensive) review, please help program committees improve by sharing your not-ok reviews on social media.

NLE volume 26 issue 2 Cover and Front matter
Published online by Cambridge University Press: 04 March 2020, pp. f1-f2

NLE volume 26 issue 2 Cover and Back matter
Published online by Cambridge University Press: 04 March 2020, pp. b1-b2

Natural Language Engineering, Volume 26 - Issue 2 - March 2020

Survey Paper

How to evaluate machine translation: A review of automated and human metrics

Article

Finding next of kin: Cross-lingual embedding spaces for related languages

Designing a virtual patient dialogue system based on terminology-rich resources: Challenges and evaluation

A new approach for textual feature selection based on N-composite isolated labels

Emerging Trends

Emerging trends: Reviewing the reviewers (again)

Front Cover (OFC, IFC) and matter

NLE volume 26 issue 2 Cover and Front matter

Back Cover (IBC, OBC) and matter

NLE volume 26 issue 2 Cover and Back matter
