Find the errors, get the better: Enhancing machine translation via word confidence estimation

Published online by Cambridge University Press:  07 March 2017

NGOC-QUANG LUONG
Affiliation:
Laboratoire d’Informatique de Grenoble, Campus de Grenoble 41, Rue des Mathématiques, BP53, F-38041 Grenoble Cedex 9, France e-mails: quangngocluong@gmail.com, Laurent.Besacier@imag.fr, benjamin.lecouteux@imag.fr
LAURENT BESACIER
BENJAMIN LECOUTEUX

Abstract

This paper presents two novel ideas for improving Machine Translation (MT) quality by applying word-level quality prediction in a second pass of decoding. The word scores estimated by word confidence estimation systems are used to reconsider the MT hypotheses and select a better candidate rather than accepting the current sub-optimal one. In the first approach, the selection scope is limited to the MT N-best list, in which our proposed re-ranking features are combined with those of the decoder for re-scoring. In the second, the search space is enlarged to the entire search graph, which stores many more hypotheses generated during the first pass of decoding. Over all paths containing words of the N-best list, we propose an algorithm to strengthen or weaken them depending on the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the official translation. The results show that both approaches improve MT quality over the one-pass baseline, and that search graph re-decoding achieves larger gains (in BLEU score) than N-best list re-ranking.
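
The N-best re-ranking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` class, the mean-confidence feature in `wce_feature`, and the single `wce_weight` are all assumptions made for illustration. The abstract states only that re-ranking features derived from word confidence estimation are combined with the decoder's features, without listing them here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    tokens: List[str]
    decoder_score: float           # first-pass model score (log domain)
    word_confidences: List[float]  # assumed WCE output: P(word is good) per token


def wce_feature(hyp: Hypothesis) -> float:
    """Hypothetical sentence-level feature: mean word confidence."""
    if not hyp.word_confidences:
        return 0.0
    return sum(hyp.word_confidences) / len(hyp.word_confidences)


def rerank(nbest: List[Hypothesis], wce_weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by decoder score plus a weighted WCE feature."""
    return sorted(
        nbest,
        key=lambda h: h.decoder_score + wce_weight * wce_feature(h),
        reverse=True,
    )


# Toy 2-best list: the decoder prefers the first candidate,
# but its last word receives a low confidence estimate.
nbest = [
    Hypothesis(["the", "house", "green"], -4.0, [0.9, 0.9, 0.3]),
    Hypothesis(["the", "green", "house"], -4.2, [0.9, 0.8, 0.9]),
]

best = rerank(nbest, wce_weight=2.0)[0]
print(" ".join(best.tokens))
```

With `wce_weight=0.0` the ranking reduces to the decoder's own ordering. In a real system the weight (or one weight per feature) would presumably be tuned on held-out data rather than set by hand.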

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 
