Results 1–10 of 64
Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources
Proceedings of the 20th International Conference on Computational Linguistics (Coling ’04), 2004
Cited by 214 (2 self)
Monolingual machine translation for paraphrase generation
In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004
Cited by 111 (6 self)
Abstract: We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextual replacements. Human evaluation shows that this system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches.
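The Alignment Error Rate mentioned in this abstract compares a hypothesized word alignment against human "sure" and "possible" link annotations. A minimal sketch of the standard AER formula, with hypothetical link sets chosen purely for illustration:

```python
def alignment_error_rate(hypothesis, sure, possible):
    """AER (Och & Ney): 1 - (|A∩S| + |A∩P|) / (|A| + |S|), with sure ⊆ possible."""
    a, s, p = set(hypothesis), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Toy alignment links over hypothetical word positions (src_idx, tgt_idx):
hyp = {(0, 0), (1, 2), (2, 3)}          # system output
sure = {(0, 0), (1, 1)}                  # annotator-certain links
poss = sure | {(2, 2), (2, 3)}           # sure links plus plausible ones
print(alignment_error_rate(hyp, sure, poss))  # → 0.4
```

Lower is better: an alignment matching every sure link and drawing only from possible links scores 0.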
A survey of statistical machine translation
2007
Cited by 93 (6 self)
Abstract: Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
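The decoding subproblem this survey describes is classically framed as the noisy-channel decision rule, ê = argmax_e p(f|e)·p(e). A toy sketch of that rule in log space, with made-up candidate translations and scores standing in for real channel and language models:

```python
# Hypothetical log-probabilities for candidates e of one fixed foreign sentence f.
channel_logprob = {"the house": -1.2, "house the": -1.0, "a house": -2.0}  # log p(f|e)
lm_logprob      = {"the house": -0.5, "house the": -4.0, "a house": -0.8}  # log p(e)

def decode(candidates):
    # Noisy-channel rule: maximize log p(f|e) + log p(e) over candidates.
    return max(candidates, key=lambda e: channel_logprob[e] + lm_logprob[e])

print(decode(channel_logprob))  # → "the house"
```

Note how the language model vetoes the fluent-looking channel favorite "house the": the combined score, not either model alone, picks the output.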
Structured prediction, dual extragradient and Bregman projections
Journal of Machine Learning Research, 2006
Cited by 59 (2 self)
Abstract: We present a simple and scalable algorithm for maximum-margin estimation of structured output models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem that allows us to use simple projection methods based on the dual extragradient algorithm (Nesterov, 2003). The projection step can be solved using dynamic programming or combinatorial algorithms for min-cost convex flow, depending on the structure of the problem. We show that this approach provides a memory-efficient alternative to formulations based on reductions to a quadratic program (QP). We analyze the convergence of the method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
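The projected-extragradient idea behind this paper can be illustrated on a tiny bilinear saddle-point problem. This is a sketch only: the matrix, box constraints, starting point, and step size are illustrative choices, not the paper's min-cost-flow projections:

```python
import numpy as np

def extragradient(A, eta=0.1, iters=500):
    """Projected extragradient for min_{w in [0,1]^n} max_{z in [0,1]^m} w^T A z."""
    n, m = A.shape
    w = np.linspace(0.1, 0.9, n)                  # arbitrary interior start
    z = np.linspace(0.9, 0.1, m)
    proj = lambda x: np.clip(x, 0.0, 1.0)         # Euclidean projection onto the box
    for _ in range(iters):
        # Prediction step: gradient move from the current point, then project.
        w_half = proj(w - eta * (A @ z))
        z_half = proj(z + eta * (A.T @ w))
        # Correction step: the same move, but with gradients taken at the
        # predicted point -- this is what distinguishes extragradient from
        # plain projected gradient descent-ascent.
        w = proj(w - eta * (A @ z_half))
        z = proj(z + eta * (A.T @ w_half))
    return w, z

A = np.array([[1.0, -1.0], [-1.0, 1.0]])          # toy zero-sum payoff matrix
w, z = extragradient(A)
print(w, z)  # both players' components equalize at the saddle point
```

For this matrix the payoff is (w1-w2)(z1-z2), so the iterates spiral in toward the interior saddle; plain gradient descent-ascent would spiral outward instead, which is why the prediction step matters.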
Empirical lower bounds on the complexity of translational equivalence
In Proceedings of ACL 2006, 2006
Cited by 39 (1 self)
Abstract: This paper describes a study of the patterns of translational equivalence exhibited by a variety of bitexts. The study found that the complexity of these patterns in every bitext was higher than suggested in the literature. These findings shed new light on why “syntactic” constraints have not helped to improve statistical translation models, including finite-state phrase-based models, tree-to-string models, and tree-to-tree models. The paper also presents evidence that inversion transduction grammars cannot generate some translational equivalence relations, even in relatively simple real bitexts in syntactically similar languages with rigid word order. Instructions for replicating our experiments are at …
Structured prediction via the extragradient method
Cited by 31 (2 self)
Abstract: We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. The estimation problem can be formulated as a quadratic program (QP) that exploits the problem structure to achieve a polynomial number of variables and constraints. However, off-the-shelf QP solvers scale poorly with problem and training sample size. We recast the formulation as a convex-concave saddle-point problem that allows us to use simple projection methods. We show the projection step can be solved using combinatorial algorithms for min-cost convex flow. We provide linear convergence guarantees for our method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
Faster beamsearch decoding for phrasal statistical machine translation
In Proceedings of MT Summit XI, 2007
Cited by 31 (6 self)
Abstract: Pharaoh is a widely used state-of-the-art decoder for phrasal statistical machine translation. In this paper, we present two modifications to the algorithm used by Pharaoh that together permit much faster decoding without losing translation quality as measured by BLEU score. The first modification improves the estimated translation model score used by Pharaoh to evaluate partial hypotheses, by incorporating an estimate of the distortion penalty to be incurred in translating the rest of the sentence. The second modification uses early pruning of possible next-phrase translations to cut down the overall size of the search space. These modifications enable decoding speedups of an order of magnitude or more, with no reduction in the BLEU score of the resulting translations.
Statistical machine translation with word and sentencealigned parallel corpora
Proceedings of the ACL, 2004
Cited by 26 (1 self)
Abstract: The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the BLEU score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher BLEU score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
Dependency tree translation: Syntactically informed phrasal SMT
In ACL, 2005
Cited by 26 (2 self)
Abstract: (Work done while at Microsoft Research.) We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. We depend on a source-language dependency parser and a word-aligned parallel corpus. The only target language resource assumed is a word breaker. These are used to produce treelet (“phrase”) translation pairs as well as several models, including a channel model, an order model, and a target language model. Together these models and the treelet translation pairs provide a powerful and promising approach to MT that incorporates the power of phrasal SMT with the linguistic generality available in a parser. We evaluate two decoding approaches, one inspired by dynamic programming and the …
Adaptive language and translation models for interactive machine translation
In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2004
Cited by 19 (3 self)
Abstract: We describe experiments carried out with adaptive language and translation models in the context of an interactive computer-assisted translation program. We developed cache-based language models which were then extended to the bilingual case for a cache-based translation model. We present the improvements we obtained in two contexts: in a theoretical setting, we achieved a drop in perplexity for the new models and, in a more practical situation simulating a user working with the system, we showed that fewer keystrokes would be needed to enter a translation.
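The cache-based language modeling this abstract refers to can be sketched as an interpolation between a fixed background model and a dynamically updated cache of recently seen words. The class below is a minimal illustrative unigram version with invented probabilities and interpolation weight, not the paper's actual model:

```python
from collections import Counter

class CachedUnigramLM:
    """p(w) = lam * p_cache(w) + (1 - lam) * p_background(w)."""

    def __init__(self, background, lam=0.3):
        self.background = background   # dict: word -> static unigram probability
        self.lam = lam                 # weight on the cache component
        self.cache = Counter()         # counts of recently observed words

    def observe(self, word):
        # In interactive translation, words the user has already typed
        # enter the cache and become more probable on the next prediction.
        self.cache[word] += 1

    def prob(self, word):
        p_bg = self.background.get(word, 1e-6)     # tiny floor for unseen words
        n = sum(self.cache.values())
        p_cache = self.cache[word] / n if n else p_bg
        return self.lam * p_cache + (1 - self.lam) * p_bg

lm = CachedUnigramLM({"translation": 0.01, "the": 0.05})
before = lm.prob("translation")
lm.observe("translation")
print(lm.prob("translation") > before)  # the cache boosts a just-seen word
```

This is the mechanism behind the reported keystroke savings: as the user types, the cache adapts the model toward the current document's vocabulary.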