Results 1 - 10 of 583
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
, 2005
Cited by 246 (8 self)
Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year’s shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the metric and describes recent improvements in the metric. The latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.
Minimum Bayes-risk decoding for statistical machine translation
- IN PROCEEDINGS OF HLT-NAACL
, 2004
Cited by 179 (16 self)
We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of linguistic information from word strings, word-to-word alignments from an MT system, and syntactic structure from parse-trees of source and target language sentences. We report the performance of the MBR decoders on a Chinese-to-English translation task. Our results show that MBR decoding can be used to tune statistical MT performance for specific loss functions.
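The decision rule described in this abstract can be illustrated with a minimal sketch over an n-best list. The abstract does not give the loss functions in detail, so `unigram_f1_loss` below is a hypothetical stand-in for the paper's hierarchy of losses, and the function names, the n-best format, and the toy probabilities are all assumptions made for illustration.

```python
from collections import Counter


def unigram_f1_loss(hyp, ref):
    """Hypothetical loss (an assumption, not the paper's): 1 - unigram F1."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 1.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 1.0 - 2 * precision * recall / (precision + recall)


def mbr_decode(nbest, loss=unigram_f1_loss):
    """Pick the candidate with minimum expected loss under the n-best posterior.

    nbest: list of (tokens, probability) pairs; probabilities are
    renormalized over the list, so they need not sum to one.
    """
    total = sum(p for _, p in nbest)

    def expected_loss(candidate):
        # Every hypothesis in the list acts as a pseudo-reference,
        # weighted by its normalized posterior probability.
        return sum(p / total * loss(candidate, other) for other, p in nbest)

    return min((tokens for tokens, _ in nbest), key=expected_loss)


nbest = [
    ("the cat sat".split(), 0.4),
    ("a cat sat down".split(), 0.3),
    ("a cat sat".split(), 0.3),
]
print(" ".join(mbr_decode(nbest)))
```

In this toy list the most probable hypothesis is "the cat sat", but the sketch selects "a cat sat", because that candidate is closest on average to the rest of the probability mass. Swapping in a different `loss` changes which candidate is preferred, which is the sense in which the decoder is tuned to a loss function.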
A new string-to-dependency machine translation algorithm with a target dependency language model
- In Proc. of ACL
, 2008
Cited by 135 (7 self)
In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation. With this new framework, we employ a target dependency language model during decoding to exploit long-distance word relations, which are unavailable with a traditional n-gram language model. Our experiments show that the string-to-dependency decoder achieves a 1.48-point improvement in BLEU and a 2.53-point improvement in TER compared to a standard hierarchical string-to-string system on the NIST 04 Chinese-English evaluation set.
cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models
- In Proceedings of ACL System Demonstrations
, 2010
Cited by 134 (53 self)
We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.
Improving statistical machine translation using word sense disambiguation
- In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
, 2007
Cited by 128 (7 self)
We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task, and moreover never hurts performance on any test set, according not only to BLEU but to all eight most commonly used automatic evaluation metrics. Recent work has challenged the assumption that word sense disambiguation (WSD) systems are useful for SMT. Yet SMT translation quality still obviously suffers from inaccurate lexical choice. In this paper, we address this problem by investigating a new strategy for integrating WSD into an SMT system that performs fully phrasal multi-word disambiguation. Instead of directly incorporating a Senseval-style WSD system, we redefine the WSD task to match the exact same phrasal translation disambiguation task faced by phrase-based SMT systems. Our results provide the first known empirical evidence that lexical semantics are indeed useful for SMT, despite claims to the contrary.
Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability.
- for Computational Linguistics.
, 2011
Cited by 124 (15 self)
In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, he runs an experiment to evaluate the behavior of the two systems on held-out data. In this paper, we consider how to make such experiments more statistically reliable. We provide a systematic analysis of the effects of optimizer instability (an extraneous variable that is seldom controlled for) on experimental outcomes, and make recommendations for reporting results more accurately.
A survey of statistical machine translation
, 2007
Cited by 93 (6 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Tuning as ranking
- In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
, 2011
Cited by 93 (0 self)
We offer a simple, effective, and scalable method for statistical machine translation parameter tuning based on the pairwise approach to ranking (Herbrich et al., 1999). Unlike the popular MERT algorithm (Och, 2003), our pairwise ranking optimization (PRO) method is not limited to a handful of parameters and can easily handle systems with thousands of features. Moreover, unlike recent approaches built upon the MIRA algorithm of Crammer and Singer (2003) (Watanabe et al., 2007; Chiang et al., 2008b), PRO is easy to implement. It uses off-the-shelf linear binary classifier software and can be built on top of an existing MERT framework in a matter of hours. We establish PRO’s scalability and effectiveness by comparing it to MERT and MIRA and demonstrate parity on both phrase-based and syntax-based systems in a variety of language pairs, using large-scale data scenarios.
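The pairwise idea in this abstract can be sketched roughly as follows: sample pairs of candidate translations, label each pair by which candidate has the higher metric score, and train a linear binary classifier on the feature-difference vectors. This is a simplified illustration, not the authors' implementation. The function names, the sampling parameters, and the toy data are assumptions, and a plain perceptron stands in for the off-the-shelf classifier the abstract mentions.

```python
import random


def sample_pairs(candidates, n_samples=50, min_diff=0.05, rng=None):
    """Turn scored candidates into binary classification examples.

    candidates: list of (feature_vector, metric_score) pairs for one source
    sentence. n_samples and min_diff are illustrative values (assumptions).
    """
    rng = rng or random.Random(0)
    examples = []
    for _ in range(n_samples):
        (feats_a, score_a), (feats_b, score_b) = rng.sample(candidates, 2)
        if abs(score_a - score_b) < min_diff:
            continue  # near-ties carry little ranking information
        diff = [a - b for a, b in zip(feats_a, feats_b)]
        label = 1 if score_a > score_b else -1
        examples.append((diff, label))
        examples.append(([-d for d in diff], -label))  # mirrored example
    return examples


def train_perceptron(examples, dim, epochs=20):
    """Stand-in linear binary classifier (an assumption; any would do)."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            margin = y * sum(w * xi for w, xi in zip(weights, x))
            if margin <= 0:  # misclassified or on the boundary: update
                weights = [w + y * xi for w, xi in zip(weights, x)]
    return weights


# Toy k-best list: feature 0 helps the metric score, feature 1 hurts it.
candidates = [
    ([1.0, 0.0], 0.9),
    ([0.5, 0.5], 0.6),
    ([0.2, 0.8], 0.3),
    ([0.0, 1.0], 0.2),
]
weights = train_perceptron(sample_pairs(candidates), dim=2)
model_score = lambda feats: sum(w * f for w, f in zip(weights, feats))
best = max(candidates, key=lambda c: model_score(c[0]))
print(weights, best)
```

The learned weight vector is then used as the model's feature weights, so that ranking candidates by model score agrees with ranking them by the evaluation metric. Because each training example is just a difference vector, the number of features is limited only by what the classifier can handle.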
Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems
"... This paper describes Meteor 1.3, our submission ..."
Crowdsourcing Translation: Professional Quality from Non-Professionals
Cited by 53 (3 self)
Naively collecting translations by crowdsourcing the task to non-professional translators yields disfluent, low-quality results if no quality control is exercised. We demonstrate a variety of mechanisms that increase the translation quality to near professional levels. Specifically, we solicit redundant translations and edits to them, and automatically select the best output among them. We propose a set of features that model both the translations and the translators, such as country of residence, LM perplexity of the translation, edit rate from the other translations, and (optionally) calibration against professional translators. Using these features to score the collected translations, we are able to discriminate between acceptable and unacceptable translations. We recreate the NIST 2009 Urdu-to-English evaluation set with Mechanical Turk, and quantitatively show that our models are able to select translations within the range of quality that we expect from professional translators. The total cost is more than an order of magnitude lower than professional translation.