
A weighted finite state transducer translation template model for statistical machine translation

Published online by Cambridge University Press: 06 December 2005

SHANKAR KUMAR, YONGGANG DENG and WILLIAM BYRNE
Affiliation:
Center for Language and Speech Processing, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA email: skumar@jhu.edu, dengyg@jhu.edu, byrne@jhu.edu

Abstract

We present a Weighted Finite State Transducer Translation Template Model for statistical machine translation. This is a source-channel model of translation inspired by the Alignment Template translation model. The model attempts to overcome the deficiencies of word-to-word translation models by considering phrases rather than words as units of translation. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard finite state machine operations involving these transducers. One benefit of this framework is that it avoids the need to develop specialized search procedures, even for the generation of lattices or N-best lists of bitext word alignments and translation hypotheses. We report and analyze bitext word alignment and translation performance on the Hansards French-English task and the FBIS Chinese-English task under the Alignment Error Rate, BLEU, NIST and Word Error Rate metrics. These experiments identify the contribution of each of the model components to different aspects of alignment and translation performance. Finally, we discuss translation performance with large bitext training sets on the NIST 2004 Chinese-English and Arabic-English MT tasks.
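
The idea that translation reduces to standard finite state operations can be illustrated with a minimal Python sketch. It is not the paper's model: it is word-based rather than phrase-based, has no reordering or segmentation component, and every name, word and weight in it (`compose`, `shortest_path`, `sentence`, `lexicon`, `lm`, the toy costs) is a hypothetical chosen for illustration. Weights are treated as costs in the tropical semiring (think of negative log probabilities), so composition adds weights and the best hypothesis is the cheapest path. All machines are assumed epsilon-free.

```python
import heapq
from collections import defaultdict

# A machine is a dict: {"start": state, "finals": {state: cost},
#                       "arcs": {state: [(in_label, out_label, cost, next_state), ...]}}
# An acceptor is encoded as a transducer whose input and output labels are equal.

def compose(a, b):
    """Compose two epsilon-free weighted transducers (tropical semiring: costs add)."""
    start = (a["start"], b["start"])
    arcs, finals = defaultdict(list), {}
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a["finals"] and qb in b["finals"]:
            finals[state] = a["finals"][qa] + b["finals"][qb]
        for (i, mid, w1, na) in a["arcs"].get(qa, []):
            for (mid2, o, w2, nb) in b["arcs"].get(qb, []):
                if mid == mid2:  # output of a must match input of b
                    nxt = (na, nb)
                    arcs[state].append((i, o, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return {"start": start, "finals": finals, "arcs": dict(arcs)}

_SUPER_FINAL = object()  # sentinel reached by paying a final-state cost

def shortest_path(t):
    """Dijkstra search, assuming nonnegative costs. Returns (cost, inputs, outputs) or None."""
    tie = 0  # tie-breaker so the heap never has to compare states
    heap = [(0.0, tie, t["start"], (), ())]
    done = set()
    while heap:
        cost, _, q, ins, outs = heapq.heappop(heap)
        if q is _SUPER_FINAL:
            return cost, list(ins), list(outs)
        if q in done:
            continue
        done.add(q)
        if q in t["finals"]:
            tie += 1
            heapq.heappush(heap, (cost + t["finals"][q], tie, _SUPER_FINAL, ins, outs))
        for (i, o, w, nxt) in t["arcs"].get(q, []):
            if nxt not in done:
                tie += 1
                heapq.heappush(heap, (cost + w, tie, nxt, ins + (i,), outs + (o,)))
    return None

# Toy machines (all words and costs are made up for illustration).
sentence = {"start": 0, "finals": {2: 0.0},
            "arcs": {0: [("la", "la", 0.0, 1)], 1: [("maison", "maison", 0.0, 2)]}}
lexicon = {"start": 0, "finals": {0: 0.0},
           "arcs": {0: [("la", "the", 0.2, 0), ("la", "it", 1.5, 0),
                        ("maison", "house", 0.3, 0), ("maison", "home", 0.9, 0)]}}
lm = {"start": 0, "finals": {3: 0.0},
      "arcs": {0: [("the", "the", 0.5, 1), ("it", "it", 0.7, 2)],
               1: [("house", "house", 0.4, 3), ("home", "home", 1.0, 3)],
               2: [("house", "house", 2.0, 3), ("home", "home", 2.0, 3)]}}

# Translation = best path through sentence ∘ lexicon ∘ language model.
cost, source_words, target_words = shortest_path(compose(compose(sentence, lexicon), lm))
print(source_words, "->", target_words)
```

No search procedure specific to translation appears in the sketch: the hypothesis space is built by `compose` and searched by a generic shortest-path routine. In the same spirit, an N-best list would come from a generic N-shortest-paths routine over the composed machine rather than from a dedicated decoder.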

Type
Papers
Copyright
2005 Cambridge University Press
