Abstract:
Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is noisy. This article presents methods for biasing statistical translation models to reflect these properties. Analysis of the expected behavior of these biases in the presence of sparse data predicts that they will result in more accurate models. The prediction is confirmed by evaluation against a gold standard: translation models that are biased in this fashion are significantly more accurate than a baseline knowledge-poor model. This article also shows how a statistical translation model can take advantage of various kinds of pre-existing knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical translation models that are informed by pre-existing knowledge about the model domain combine the best of both the rationalist and empiricist traditions.
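The two bitext properties named above can be made concrete with a small sketch. It assumes a precomputed table of word-pair association scores; the `score` dictionary, the `threshold` parameter and the function name `one_to_one_links` are hypothetical and are not taken from the article. A greedy best-first linker that lets each word take part in at most one link encodes the one-to-one bias, and a score threshold leaves weakly associated words unlinked, which is one simple way to tolerate noisy correspondence. This is an illustration of the biases, not the article's own algorithm.

```python
from typing import Dict, List, Tuple


def one_to_one_links(
    source: List[str],
    target: List[str],
    score: Dict[Tuple[str, str], float],
    threshold: float = 0.0,
) -> List[Tuple[int, int]]:
    """Greedily link word positions so each word is in at most one link.

    `score` (hypothetical) maps (source_word, target_word) to an association
    strength. Pairs scoring at or below `threshold` stay unlinked, so words
    with no plausible translation in the pair (noise) are simply skipped.
    """
    # Every (source position, target position) pair is a candidate link.
    candidates = [
        (score.get((s, t), 0.0), i, j)
        for i, s in enumerate(source)
        for j, t in enumerate(target)
    ]
    # Strongest associations first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_src, used_tgt = set(), set()
    links = []
    for strength, i, j in candidates:
        if strength <= threshold:
            break  # all remaining candidates are at least as weak
        if i in used_src or j in used_tgt:
            continue  # one-to-one bias: each word links at most once
        used_src.add(i)
        used_tgt.add(j)
        links.append((i, j))
    return sorted(links)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    scores = {
        ("the", "la"): 0.9,
        ("house", "maison"): 0.8,
        ("the", "maison"): 0.3,
        ("house", "la"): 0.2,
    }
    print(one_to_one_links(["the", "blue", "house"], ["la", "maison"], scores))
```

With these toy scores, "the" and "house" each claim their strongest partner, the weaker competing pairs are blocked because their words are already linked, and "blue" stays unlinked because it has no score above the threshold.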