Clause restructuring for statistical machine translation
In ACL, 2005
Cited by 65 (2 self)
We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word order than the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing an improvement from a BLEU score of 25.2% for a baseline system to 26.8% for the system with reordering, a statistically significant improvement.
Dependency tree translation: Syntactically informed phrasal smt
In ACL, 2005
Cited by 19 (1 self)
We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. We depend on a source-language dependency parser and a word-aligned parallel corpus. The only target language resource assumed is a word breaker. These are used to produce treelet (“phrase”) translation pairs as well as several models, including a channel model, an order model, and a target language model. Together these models and the treelet translation pairs provide a powerful and promising approach to MT that incorporates the power of phrasal SMT with the linguistic generality available in a parser. We evaluate two decoding approaches, one inspired by dynamic programming and the
Explorations in sentence fusion
In Proceedings of the 10th European Workshop on Natural Language Generation, 2005
Cited by 16 (1 self)
Sentence fusion is a text-to-text (revision-like) generation task which takes related sentences as input and merges these into a single output sentence. In this paper we describe our ongoing work on developing a sentence fusion module for Dutch. We propose a generalized version of alignment which not only indicates which words and phrases should be aligned but also labels these in terms of a small set of primitive semantic relations, indicating how words and phrases from the two input sentences relate to each other. It is shown that human labelers can perform this task with high agreement (F-score of .95). We then describe and evaluate our adaptation of an existing automatic alignment algorithm, and use the resulting alignments, plus the semantic labels, in a generalized fusion and generation algorithm. A small-scale evaluation study reveals that most of the resulting sentences are adequate to good.
Mapping Dependencies Trees: An Application to Question Answering
In Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics, Fort, 2004
Cited by 13 (0 self)
We describe an approach for answer selection in a free-form question answering task. In order to go beyond keyword-based matching in selecting answers to questions, one would like to incorporate both syntactic and semantic information in the question answering process. We achieve this goal by representing both questions and candidate passages using dependency trees, and incorporating semantic information such as named entities in this representation. The sentence that best answers a question is determined to be the one that minimizes the generalized edit distance between it and the question tree, computed via an approximate tree matching algorithm. We evaluate the approach on question-answer pairs taken from previous TREC Q/A competitions. Preliminary experiments show its potential by significantly outperforming common bag-of-words scoring methods.
Machine translation in the year 2004
In Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2005
Cited by 9 (0 self)
Increased availability of parallel data and recent progress in modeling, decoding, and evaluation have had a major impact on machine translation (MT) accuracy. This paper covers the basic elements of state-of-the-art statistical MT.
A discriminative model for tree-to-tree translation
In Proceedings of the EMNLP, 2006
Cited by 9 (0 self)
This paper proposes a statistical, tree-to-tree model for producing translations. Two main contributions are as follows: (1) a method for the extraction of syntactic structures with alignment information from a parallel corpus of translations, and (2) use of a discriminative, feature-based model for prediction of these target-language syntactic structures, which we call aligned extended projections, or AEPs. An evaluation of the method on translation from German to English shows similar performance to the phrase-based model of Koehn et al. (2003).
Robust Language Pair-Independent Sub-Tree Alignment
Cited by 5 (0 self)
Data-driven approaches to machine translation (MT) achieve state-of-the-art results. Many syntax-aware approaches, such as Example-Based MT and Data-Oriented Translation, make use of tree pairs aligned at sub-sentential level. Obtaining sub-sentential alignments manually is time-consuming and error-prone, and requires expert knowledge of both source and target languages. We propose a novel, language pair-independent algorithm which automatically induces alignments between phrase-structure trees. We evaluate the alignments themselves against a manually aligned gold standard, and perform an extrinsic evaluation by using the aligned data to train and test a DOT system. Our results show that translation accuracy is comparable to that of the same translation system trained on manually aligned data, and coverage improves.
An architecture for parallel corpus-based grammar learning
Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen. Beiträge zur GLDV-Tagung 2005 in Bonn, 2005
Cited by 2 (2 self)
This paper describes an architecture for exploiting implicit information about the grammar of the languages included in a parallel corpus. By initially applying statistical word alignment and defining an appropriate representation format for cross-linguistic structural correspondence, this implicit information can feed a system for bootstrapping grammars. The proposed architecture will be applied in the new PTOLEMAIOS project.
Removing the Distinction Between a Translation Memory, a Bilingual Dictionary and a Parallel Corpus
Cited by 2 (2 self)
This paper presents a prototype MT system which does not make the distinction between a dictionary, a sub-sentential aligned parallel corpus, and post-edited information (translators' output) like a translation memory. The system is based on the METIS approach (Vandeghinste et al., 2006), and uses an XML-based dictionary format in which not only simple word-to-word translations can be included, but which also contains complex dictionary entries, including discontinuous entries, like idioms and proverbs. The presented prototype is a system that automatically adapts its dictionary and target language corpus depending on the post-edited output as made by the users of the system, and will therefore have a learning curve in its performance.
EBMT by Tree-Phrasing
2006
Cited by 2 (0 self)
In this article, we present a study we conducted to build a repository storing associations between simple syntactic dependency treelets in a source language and their corresponding phrases in a target language. We assess the usefulness of this resource in two different settings. First, we show that it improves upon a standard sub-sentential translation memory. Secondly, we observe improvements in translation quality when a standard statistical phrase-based translation engine is augmented with the ability to exploit such a repository.

