This thesis examines the possibility of using comparable corpora to augment statistical models of translation. Treating comparable corpora as marginal samples from an aligned bilingual joint distribution, the estimation of translation models from a combination of bilingual parallel and comparable corpora is seen as a variation of the labelled-unlabelled problem [Seeger, 2000b]. Results on synthetic data confirm that successful re-estimation within the EM framework [Dempster et al., 1977] is highly dependent on the balance between complete and incomplete data [Nigam, 2001]. Here we show that the utility of re-estimation with additional incomplete data is highly dependent on the accuracy of initial parameters estimated from the complete data alone. We propose a method for constraining the re-estimation procedure in relation to the degree of comparability between marginal samples. This is seen to result in better conditional models when the assumption of comparability is valid. Finally, we consider how more complex marginal models could be used to further constrain the

4388 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference | Pearl | 1988
4345 | Maximum likelihood from incomplete data via the EM algorithm | Dempster, Laird, et al. | 1977
2103 | A tutorial on hidden Markov models and selected applications in speech recognition | Rabiner | 1989
904 | Local computations with probabilities on graphical structures and their application to expert systems (with discussion) | Lauritzen, Spiegelhalter | 1988
629 | Error bounds for convolutional codes and an asymptotically optimum decoding algorithm | Viterbi | 1967
589 | Information Theory and Statistics | Kullback | 1959
575 | Combining labeled and unlabeled data with co-training | Blum, Mitchell | 1998
575 | Algorithms on Strings, Trees, and Sequences | Gusfield | 1997
438 | Factor graphs and the sum-product algorithm | Kschischang, Frey, et al. | 2001
415 | A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains | Baum, Petrie, et al. | 1970
410 | A New View of the EM Algorithm that Justifies Incremental and Other Variants, Learning in Graphical Models | Neal, Hinton | 1993
394 | A statistical approach to machine translation | Brown, Cocke, et al. | 1990
391 | Class-Based n-gram Models of Natural Language | Brown, Della Pietra, et al. | 1992
371 | Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids | Durbin, Eddy, et al. | 1998
336 | Inducing Features of Random Fields | Della Pietra, Della Pietra, et al. | 1995
332 | Hidden Markov models in computational biology. Applications to protein modeling | Krogh, Brown, et al. | 1994
293 | Linear Pattern Matching Algorithms | Weiner | 1973
285 | Generalized iterative scaling for log-linear models | Darroch, Ratcliff | 1972
257 | A program for aligning sentences in bilingual corpora | Gale, Church | 1993
250 | Theory of point estimation | Lehmann | 1983
231 | Unsupervised Word Sense Disambiguation Rivaling Supervised Methods | Yarowsky | 1995
218 | A maximum entropy model for part-of-speech tagging | Ratnaparkhi | 1996
157 | Sequential updating of conditional probabilities on directed graphical structures | Spiegelhalter, Lauritzen | 1990
142 | I-divergence geometry of probability distributions and minimization problems, The Annals of Probability | Csiszár | 1975
110 | A comparison of algorithms for maximum entropy parameter estimation | Malouf | 2002
102 | Dimensions of Meaning | Schütze | 1992
101 | The mathematics of machine translation: Parameter estimation | Brown, Della Pietra, et al. | 1993
98 | Learning with Labeled and Unlabeled Data | Seeger | 2000
96 | Information geometry and alternating minimization procedures, Stat | Csiszár, Tusnády | 1984
79 | Information geometry of the EM and em algorithms for neural networks | Amari | 1995
77 | Unsupervised learning from dyadic data | Hofmann, Puzicha | 1998
72 | Maximum likelihood estimation via the ECM algorithm: a general framework | Meng, Rubin | 1993
65 | On a least-squares adjustment of a sampled frequency table when the expected marginal totals are known | Deming, Stephan | 1940
61 | Learning to paraphrase: An unsupervised approach using multiple-sequence alignment | Barzilay, Lee | 2003
56 | Markov fields and log-linear interaction models for contingency tables | Darroch, Lauritzen, et al. | 1980
56 | An IR Approach for Translating New Words from Nonparallel, Comparable Texts | Fung, Yee | 1998
51 | Statistical methods and linguistics | Abney | 1996
51 | Automatic Identification of Word Translations from Unrelated English and German Corpora | Rapp | 1999
39 | An Introduction to Probabilistic Graphical Models | Jordan, M. I.
36 | Identifying word translations in nonparallel texts | Rapp | 1995
34 | Using unlabeled data to improve text classification | Nigam | 2001
26 | Finding terminology translations from non-parallel corpora | Fung, McKeown | 1997
22 | Estimating Word Translation Probabilities From Unrelated Monolingual Corpora Using the EM Algorithm | Koehn, Knight | 2002
19 | A Statistical Word-Level Translation Model for Comparable Corpora | Diab, Finch | 2000
17 | A geometric interpretation of Darroch and Ratcliff's generalized iterative scaling | Csiszár | 1989
17 | Learning a Translation Lexicon from Monolingual Corpora | Koehn, Knight | 2002
16 | Continuation methods for mixing heterogeneous sources | Corduneanu, Jaakkola
15 | Alternating minimization and Boltzmann machine learning | Byrne | 1992
11 | Input-dependent regularization of conditional density models | Seeger | 2001
9 | Stable mixing of complete and incomplete information | Corduneanu, Jaakkola | 2001