Results 1 - 10 of 11
Edinburgh’s machine translation systems for European language pairs
- In Proceedings of the ACL 2013 Eighth Workshop on Statistical Machine Translation, 2013
Cited by 22 (10 self)
We validated various novel and recently proposed methods for statistical machine translation on 10 language pairs, using large data resources. We saw gains from optimizing parameters, training with sparse features, the operation sequence model, and domain adaptation techniques. We also report on utilizing a huge language model trained on 126 billion tokens. The annual machine translation evaluation campaign for European languages, organized around the ACL Workshop on Statistical Machine Translation, offers the opportunity to test recent advancements in machine translation in large data conditions across several diverse language pairs. Building on our own developments and external contributions to the Moses open source toolkit, we carried out extensive experiments that, by early indications, led to a strong showing in the evaluation campaign. We would like to stress two contributions in particular: the use of the new operation sequence model (Section 3) within Moses, and, in a separate unconstrained track submission, the use of a huge language model trained on 126 billion tokens with a new training tool (Section 4).
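On the query side, one common way to consult an n-gram model of this scale is through the KenLM toolkit. The minimal sketch below scores a hypothesis with the kenlm Python bindings; the model path and sentence are placeholders, and this illustrates large-LM scoring in general rather than the paper's training tool.

```python
import kenlm  # Python bindings for the KenLM language model toolkit

# Placeholder path: any ARPA or binarized KenLM model works here.
model = kenlm.Model("big.5gram.binary")

hypothesis = "the commission agreed to the proposal"
# Total log10 probability, with <s> and </s> sentence boundaries added.
print(model.score(hypothesis, bos=True, eos=True))
# Per-word perplexity is handy when comparing hypotheses.
print(model.perplexity(hypothesis))
```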
Fast and Adaptive Online Training of Feature-Rich Translation Models
Cited by 12 (7 self)
We present a fast and scalable online method for tuning statistical machine translation models with large feature sets. The standard tuning algorithm, MERT, only scales to tens of features. Recent discriminative algorithms that accommodate sparse features have produced smaller-than-expected translation quality gains in large systems. Our method, which is based on stochastic gradient descent with an adaptive learning rate, scales to millions of features and tuning sets with tens of thousands of sentences, while still converging after only a few epochs. Large-scale experiments on Arabic-English and Chinese-English show that our method produces significant translation quality gains by exploiting sparse features. Equally important is our analysis, which suggests techniques for mitigating overfitting and domain mismatch, and applies to other recent discriminative methods for machine translation.
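The adaptive learning rate the abstract refers to is in the AdaGrad family: each feature's step size shrinks with its accumulated squared gradient, so frequent dense features settle quickly while rare sparse features keep receiving large updates. The sketch below shows that core update in isolation, assuming a plain unregularized gradient; it is not the paper's full tuner.

```python
import numpy as np

def adagrad_step(w, grad, hist, eta=0.1, eps=1e-8):
    """One AdaGrad update. `hist` accumulates squared gradients per
    feature; dividing by its square root gives each feature its own
    decaying learning rate, which is what makes sparse features viable."""
    hist += grad ** 2
    w -= eta * grad / (np.sqrt(hist) + eps)
    return w, hist

# Usage: millions of features stay cheap because a sparse implementation
# only touches the nonzero coordinates of `grad`.
w = np.zeros(5)
hist = np.zeros(5)
w, hist = adagrad_step(w, np.array([0.0, 1.0, 0.0, -2.0, 0.0]), hist)
```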
A simple and effective weighted phrase extraction for machine translation adaptation
- In Proceedings of the 9th International Workshop on Spoken Language Translation (IWSLT ’12), 2012
Cited by 7 (2 self)
The task of domain adaptation attempts to exploit data mainly drawn from one domain (e.g. news) to maximize performance on the test domain (e.g. weblogs). In previous work, weighting the training instances was used to filter out dissimilar data. We extend this by incorporating the weights directly into the standard phrase training procedure of statistical machine translation (SMT). This allows the SMT system to decide whether to use a phrase translation pair, a more principled approach than discarding phrase pairs outright as filtering does. Furthermore, we suggest a combined filtering and weighting procedure that achieves better results while reducing the phrase table size. The proposed methods are evaluated in the context of Arabic-to-English translation under various conditions, where significant improvements are reported when using the suggested weighted phrase training. The weighting method also improves over filtering, and the combined filtering and weighting is better than standalone filtering. Finally, we experiment with mixture modeling, where additional improvements are reported when using weighted phrase extraction over a variety of baselines.
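To make the idea concrete, here is a minimal sketch of weighted phrase extraction under stated assumptions: in standard relative-frequency estimation every extracted phrase pair contributes a count of 1, while here it contributes its sentence's domain-similarity weight. The data layout and names are illustrative, not the paper's implementation.

```python
from collections import defaultdict

def weighted_phrase_probs(extracted, sent_weight):
    """Relative-frequency p(tgt | src) where each extracted phrase pair
    (src, tgt, sentence_id) contributes the weight of the sentence it
    came from instead of a raw count of 1."""
    pair_count = defaultdict(float)
    src_count = defaultdict(float)
    for src, tgt, sid in extracted:
        w = sent_weight[sid]      # e.g. similarity to the test domain
        pair_count[(src, tgt)] += w
        src_count[src] += w
    return {(s, t): c / src_count[s] for (s, t), c in pair_count.items()}
```

Filtering is the special case where sent_weight is 0 or 1; the combined procedure in the abstract roughly corresponds to zeroing out the most dissimilar sentences and weighting the rest.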
Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce
Cited by 4 (2 self)
We present an open-source framework for large-scale online structured learning. Developed with the flexibility to handle cost-augmented inference problems such as statistical machine translation (SMT), our large-margin learner can be used with any decoder. Integration with MapReduce using Hadoop streaming allows efficient scaling as the size of the training data increases. Although designed with a focus on SMT, our learner's decoder-agnostic design allows easy future extension to other structured learning problems such as sequence labeling and parsing.
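For context, the core of a large-margin (MIRA-style) online learner is a passive-aggressive update: change the weights just enough that the reference derivation outscores the hypothesis by a margin equal to its cost. The sketch below shows a simplified 1-best version of that update, not Mr. MIRA's actual code.

```python
import numpy as np

def pa_update(w, feats_ref, feats_hyp, cost, C=0.01):
    """One passive-aggressive step. `cost` is the hypothesis's error
    (e.g. 1 - sentence BLEU); C caps how aggressively w may move."""
    delta = feats_ref - feats_hyp        # feature difference vector
    loss = cost - w.dot(delta)           # cost-augmented hinge loss
    norm = delta.dot(delta)
    if loss > 0 and norm > 0:
        tau = min(C, loss / norm)        # smallest sufficient step size
        w = w + tau * delta
    return w
```

One common way to scale such a learner with MapReduce is to run updates like this on data shards in the mappers and average the resulting weight vectors in a reducer between epochs.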
Simulating discriminative training for linear mixture adaptation in statistical machine translation
- In MT Summit, 2013
Cited by 4 (3 self)
Linear mixture models are a simple and effective technique for performing domain adaptation of translation models in statistical MT. In this paper, we identify and correct two weaknesses of this method. First, we show that standard maximum-likelihood weights are biased toward large corpora, and that a straightforward pre-processing step that down-samples phrase tables can be used to counter this bias. Second, we show that features inspired by prototypical linear mixtures can be used to loosely simulate discriminative training for mixture models, with results that are almost certainly superior to true discriminative training. Taken together, these enhancements yield BLEU gains of approximately 1.5 over existing linear mixture techniques for translation model adaptation.
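For reference, a linear mixture sets p(t|s) as a weighted sum of the component models p_i(t|s), and the maximum-likelihood weights the abstract criticizes are usually fit with EM on an in-domain sample. The sketch below shows that standard estimation step; the paper's contribution concerns correcting its large-corpus bias, which this sketch deliberately does not include.

```python
import numpy as np

def em_mixture_weights(P, iters=50):
    """Maximum-likelihood mixture weights via EM.
    P[i, n] = p_i(t_n | s_n): probability of the n-th in-domain phrase
    pair under component model i (assumed nonzero for at least one i).
    Returns lambdas maximizing sum_n log sum_i lambda_i * P[i, n]."""
    k, n = P.shape
    lam = np.full(k, 1.0 / k)
    for _ in range(iters):
        resp = lam[:, None] * P                    # E-step: joint scores
        resp /= resp.sum(axis=0, keepdims=True)    # normalize to posteriors
        lam = resp.sum(axis=1) / n                 # M-step: new weights
    return lam
```

Intuitively, larger corpora cover more of the in-domain sample and so attract weight under this estimator, which is the bias the paper's down-sampling step counteracts.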
The UEDIN systems for the IWSLT 2012 evaluation
- In IWSLT, 2012
Cited by 3 (0 self)
This paper describes the University of Edinburgh (UEDIN) systems for the IWSLT 2012 Evaluation. We participated in the ASR (English), MT (English-French, German-English) and SLT (English-French) tracks.
The AMARA corpus: Building resources for translating the web’s educational content
- In Proceedings of the International Workshop on Spoken Language Translation (IWSLT ’13), 2013
Cited by 3 (3 self)
In this paper, we introduce a new parallel corpus of subtitles of educational videos: the AMARA corpus for online educational content. We crawl a multilingual collection of community-generated subtitles, and present the results of processing the Arabic–English portion of the data, which yields a parallel corpus of about 2.6M Arabic and 3.9M English words. We explore different approaches to align the segments, and extrinsically evaluate the resulting parallel corpus on the standard TED-talks tst-2010 test set. We observe that the data can be successfully used for this task, and also observe an absolute improvement of 1.6 BLEU when it is used in combination with TED data. Finally, we analyze some of the specific challenges when translating the educational content.
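The abstract mentions exploring different segment alignment approaches without detailing them. Purely as an illustration of one plausible baseline for subtitle data, the sketch below pairs segments by timestamp overlap; all names and the threshold are hypothetical, and this is not claimed to be one of the paper's methods.

```python
def align_by_time_overlap(src_subs, tgt_subs, min_overlap=0.5):
    """Pair subtitle segments whose time spans overlap sufficiently.
    Each segment is a (start_sec, end_sec, text) tuple."""
    pairs = []
    for s0, s1, s_text in src_subs:
        for t0, t1, t_text in tgt_subs:
            inter = max(0.0, min(s1, t1) - max(s0, t0))
            shorter = min(s1 - s0, t1 - t0)
            if shorter > 0 and inter / shorter >= min_overlap:
                pairs.append((s_text, t_text))
    return pairs
```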
Using comparable corpora to adapt MT models to new domains
- In Proceedings of the Workshop on Statistical Machine Translation (WMT), 2014
Cited by 1 (1 self)
In previous work we showed that when using an SMT model trained on old-domain data to translate text in a new domain, most errors are due to unseen source words, unseen target translations, and inaccurate translation model scores.
An open-source web-based tool for resource-agnostic interactive translation prediction
- The Prague Bulletin of Mathematical Linguistics, 2014