The Mathematics of Statistical Machine Translation: Parameter Estimation
- Computational Linguistics, 1993
Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging
- Computational Linguistics, 1995
Cited by 924 (8 self)
In this paper, we will describe a simple rule-based approach to automated learning of linguistic knowledge. This approach has been shown for a number of tasks to capture information in a clearer and more direct fashion without a compromise in performance. We present a detailed case study of this learning method applied to part-of-speech tagging.
Europarl: A Parallel Corpus for Statistical Machine Translation
Cited by 519 (1 self)
We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, which reveal interesting clues into the challenges ahead.
Extracting paraphrases from a parallel corpus
- In Proc. of the ACL/EACL, 2001
Cited by 252 (6 self)
While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single word lexical paraphrases as well as syntactic paraphrases.
Automatic Identification of Word Translations from Unrelated English and German Corpora
- 1999
Cited by 244 (2 self)
Algorithms for the alignment of words in translated texts are well established. However, only recently new approaches have been proposed to identify word translations from non-parallel or even unrelated texts. This task is
Translating Collocations for Bilingual Lexicons: A Statistical Approach
- 1996
Identifying word correspondences in parallel texts
- In Proc. of Fourth DARPA Speech and Natural Language Processing Workshop, 1991
Cited by 181 (4 self)
Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts (also known as bilingual
Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web
- In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999
Cited by 154 (16 self)
This paper describes the use of a probabilistic translation model for cross-language IR (CLIR). The performance of this approach is compared with that of machine translation (MT). It is shown that, using a probabilistic model, we are able to obtain performance close to that of an MT system. In addition, we investigated the possibility of automatically gathering parallel texts from the Web in an attempt to construct a reasonable training corpus. The results are very encouraging: in several tests, such a training corpus is as good as a manually constructed one for CLIR purposes.
Text-translation alignment
- 1988
Cited by 150 (0 self)
We present an algorithm for aligning texts with their translations that is based only on internal evidence. The relaxation process rests on a notion of which word in one text corresponds to which word in the other text that is essentially based on the similarity of their distributions. It exploits a partial alignment of the word level to induce a maximum likelihood alignment of the sentence level, which is in turn used, in the next iteration, to refine the word level estimate. The algorithm appears to converge to the correct sentence alignment in only a few iterations.
1. The Problem
To align a text with a translation of it in another language is, in the terminology of this paper, to show which of its parts are translated by what parts of the second text. The result takes the form of a list of pairs of items (words, sentences, paragraphs, or whatever) from the two texts. A pair ⟨a, b⟩ is on the list if a is translated, in whole or in part, by b. If ⟨a, b⟩ and ⟨a, c⟩ are on the list, it is because a is translated partly by b, and partly by c. We say that the alignment is partial if only some of the items of the chosen kind from one or other of the texts are represented in the pairs. Otherwise, it is complete.
Using cognates to align sentences in bilingual corpora
- In Proceedings of the Fourth International Congress on Theoretical and Methodological Issues in Machine Translation, 1992