Hierarchical phrase-based translation with weighted finite state transducers and . . .
- In Proceedings of HLT/NAACL, 2010
Cited by 48 (20 self)
In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.
A unified framework for phrase-based, hierarchical, and syntax-based statistical machine translation
- In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), 2009
Cited by 31 (4 self)
Despite many differences between phrase-based, hierarchical, and syntax-based translation models, their training and testing pipelines are strikingly similar. Drawing on this fact, we extend the Moses toolkit to implement hierarchical and syntactic models, making it the first open source toolkit with end-to-end support for all three of these popular models in a single package. This extension substantially lowers the barrier to entry for machine translation research across multiple models.
Rule filtering by pattern for efficient hierarchical translation
- In Proceedings of the EACL, 2009
Cited by 25 (4 self)
We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on the number of non-terminals and the pattern, and various filtering strategies are then applied to assess the impact on translation speed and quality. Results are reported on the 2008 NIST Arabic-to-English evaluation task.
Syntax based reordering with automatically derived rules for improved statistical machine translation
- In Proc. of COLING’10, 2010
Cited by 8 (2 self)
Syntax based reordering has been shown to be an effective way of handling word order differences between source and target languages in Statistical Machine Translation (SMT) systems. We present a simple, automatic method to learn rules that reorder source sentences to more closely match the target language word order using only a source side parse tree and automatically generated alignments. The resulting rules are applied to source language inputs as a pre-processing step and demonstrate significant improvements in SMT systems across a variety of language pairs including English to Hindi, English to Spanish and English to French as measured on a variety of internal test sets as well as a public test set.
Parallel Treebanks in Phrase-Based Statistical Machine Translation
Cited by 4 (2 self)
Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT.
Long-distance reordering during search for hierarchical phrase-based SMT.
- In Proceedings of the Annual Conference of the European Association for Machine Translation (EAMT), 2012
Cited by 3 (1 self)
Long-distance reordering of syntactically divergent language pairs is a critical problem. SMT has had limited success in handling these reorderings during inference, and thus deterministic preprocessing based on reordering parse trees is used. We consider German-to-English translation using Hiero. We show how to effectively model long-distance reorderings during search. Our work is novel in that we look at reordering distances of up to 50 words, and conduct a detailed manual analysis based on a new gold standard.
Tree transformations and dependencies
Cited by 3 (1 self)
Several tree transformation devices that are relevant in natural language processing are presented with a focus on the dependencies that they are able to capture. In many cases, the consideration of the dependencies alone can be used to provide a high-level explanation of the shortcomings of tree transformation devices and allows surprising insights into their structure.
Grammar based statistical MT on Hadoop
- 2009
Cited by 3 (0 self)
An end-to-end toolkit for large scale PSCFG based MT
A detailed analysis of phrase-based and syntax-based machine translation: The search for systematic differences
- In Proc. AMTA, 2012
Cited by 1 (0 self)
This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models, and the relaxation of syntactic constraints to broaden translation rule coverage means that these models do not necessarily generate output which is more grammatical than the output produced by the phrase-based models. Although the systems generate different output and can potentially be fruitfully combined, the lack of systematic difference between these models makes the combination task more challenging.
Available at:
This thesis describes structural differences and translation asymmetries between English and Japanese. It describes possible ways to treat those phenomena in the framework of linguistics-based machine translation. As an application, the adaptability of the multilingual linguistics-based MT system Its-2 to English-Japanese translation is tested, describing the enhancement of the system and its lexical databases, and the positive effect of using statistically acquired linguistic data. The translation quality of the English-Japanese version of Its-2 is finally compared to that of two other MT systems on a set of five chosen sentences, showing strengths on some points and weaknesses on others. It is also shown that hierarchical statistical post-editing can improve the overall translation quality of the system for daily language translation when the training corpus is not too small.