  Learning Translations from Comparable Corpora

by David Talbot
http://www.inf.ed.ac.uk/publications/thesis/online/IM030067.pdf

Abstract:

This thesis examines the possibility of using comparable corpora to augment statistical models of translation. Treating comparable corpora as marginal samples from an aligned bilingual joint distribution, the estimation of translation models from a combination of bilingual parallel and comparable corpora is seen as a variation of the labelled-unlabelled problem [Seeger, 2000b]. Results on synthetic data confirm that successful re-estimation within the EM framework [Dempster et al., 1977] is highly dependent on the balance between complete and incomplete data [Nigam, 2001]. Here we show that the utility of re-estimation with additional incomplete data is highly dependent on the accuracy of initial parameters estimated from the complete data alone. We propose a method for constraining the re-estimation procedure in relation to the degree of comparability between marginal samples. This is seen to result in better conditional models when the assumption of comparability is valid. Finally, we consider how more complex marginal models could be used to further constrain the
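
The labelled-unlabelled setting described in the abstract can be illustrated with a small sketch. It is not the model from the thesis: it assumes a plain discrete joint table P(x, y), with paired counts standing in for the parallel (complete) data and two sets of marginal-only counts standing in for the comparable (incomplete) data. The function name `em_joint` and the parameters `smoothing` and `incomplete_weight` are hypothetical, introduced only for this illustration.

```python
def em_joint(pair_counts, x_counts, y_counts,
             iters=50, smoothing=1e-3, incomplete_weight=1.0):
    """EM estimate of a joint table P(x, y) from paired and marginal-only counts.

    pair_counts[i][j]: count of complete observations (x=i, y=j).
    x_counts[i]:       count of observations where only x=i was seen.
    y_counts[j]:       count of observations where only y=j was seen.
    incomplete_weight: hypothetical knob that scales the influence of
                       the marginal-only data (1.0 = plain EM).
    """
    K, M = len(pair_counts), len(pair_counts[0])

    def ratio(a, b):
        # Guard against empty rows or columns.
        return a / b if b > 0 else 0.0

    # Initialise from the complete data alone, lightly smoothed.
    total = sum(map(sum, pair_counts)) + smoothing * K * M
    p = [[(pair_counts[i][j] + smoothing) / total for j in range(M)]
         for i in range(K)]

    for _ in range(iters):
        row = [sum(p[i]) for i in range(K)]                       # P(x=i)
        col = [sum(p[i][j] for i in range(K)) for j in range(M)]  # P(y=j)

        # E-step: spread each marginal-only observation over the
        # missing variable using the current conditional distribution.
        expected = [[pair_counts[i][j]
                     + incomplete_weight * (
                         x_counts[i] * ratio(p[i][j], row[i])     # P(y=j | x=i)
                         + y_counts[j] * ratio(p[i][j], col[j]))  # P(x=i | y=j)
                     for j in range(M)]
                    for i in range(K)]

        # M-step: renormalise the expected counts into a joint table.
        n = sum(map(sum, expected))
        p = [[expected[i][j] / n for j in range(M)] for i in range(K)]
    return p


if __name__ == "__main__":
    # A few paired observations plus many marginal-only observations.
    table = em_joint([[8, 2], [2, 8]], [50, 50], [50, 50])
    for row_probs in table:
        print(["%.3f" % v for v in row_probs])
```

Because the starting table comes from the paired counts alone, a poor initial estimate is simply reinforced by the marginal-only data in the E-step, which is one way to see why the abstract stresses the accuracy of the initial parameters.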

Citations

4388 Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference – Pearl - 1988
4345 Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
2103 A tutorial on hidden markov models and selected applications in speech recognition – Rabiner - 1989
904 Local computations with probabilities on graphical structures and their application to expert systems (with discussion) – Lauritzen, Spiegelhalter - 1988
629 Error bounds for convolutional codes and an asymptotically optimum decoding algorithm – Viterbi - 1967
589 Information Theory and Statistics – Kullback - 1959
575 Combining labeled and unlabeled data with co-training – Blum, Mitchell - 1998
575 Algorithms on Strings, Trees, and Sequences – Gusfield - 1997
438 Factor graphs and the sum-product algorithm – Kschischang, Frey, et al. - 2001
415 A maximization technique occurring in the statistical analysis of probabilistic function of Markov chains – Baum, Petrie, et al. - 1970
410 A New View of the EM Algorithm that Justifies Incremental and Other Variants, Learning in Graphical Models – Neal, Hinton - 1993
394 A statistical approach to machine translation – Brown, Cocke, et al. - 1990
391 Class-Based n-gram Models of Natural Language – Brown, Pietra, et al. - 1992
371 Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids – Durbin, Eddy, et al. - 1998
336 Inducing Features of Random Fields – Pietra, Pietra, et al. - 1995
332 Hidden Markov models in computational biology. Applications to protein modeling – Krogh, Brown, et al. - 1994
293 Linear Pattern Matching Algorithm – Weiner - 1973
285 Generalized iterative scaling for log-linear models – Darroch, Ratcliff - 1972
257 A program for aligning sentences in bilingual corpora – Gale, Church - 1993
250 Theory of point estimation – Lehmann - 1983
231 Unsupervised Word Sense Disambiguation Rivaling Supervised Methods – Yarowsky - 1995
218 A maximum entropy model for part-of-speech tagging – Ratnaparkhi - 1996
157 Sequential updating of conditional probabilities on directed graphical structures – Spiegelhalter, Lauritzen - 1990
142 I-divergence geometry of probability distributions and minimization problems, The Annals of Probability – Csiszár - 1975
110 A comparison of algorithms for maximum entropy parameter estimation – Malouf - 2002
102 Dimensions of Meaning – Schütze - 1992
101 The mathematics of machine translation: Parameter estimation – Brown, Pietra, et al. - 1993
98 Learning with Labeled and Unlabeled Data – Seeger - 2000
96 Information geometry and alternating minimization procedures, Stat – Csiszár, Tusnády - 1984
79 Information geometry of the EM and em algorithms for neural networks – Amari - 1995
77 Unsupervised learning from dyadic data – Hofmann, Puzicha - 1998
72 Maximum likelihood estimation via the ECM algorithm: a general framework – Meng, Rubin - 1993
65 On a least-squares adjustment of a sampled frequency table when the expected marginal totals are known – Deming, Stephan - 1940
61 Learning to paraphrase: An unsupervised approach using multiple-sequence alignment – Barzilay, Lee - 2003
56 Markov fields and loglinear interaction models for contingency tables – Darroch, Lauritzen, et al. - 1980
56 An IR Approach for Translating New Words from Nonparallel, Comparable Texts – Fung, Yee - 1998
51 Statistical methods and linguistics – Abney - 1996
51 Automatic Identification of Word Translations from Unrelated English and German Corpora – Rapp - 1999
39 An Introduction to Probabilistic Graphical Models – Jordan
36 Identifying word translations in nonparallel texts – Rapp - 1995
34 Using unlabeled data to improve text classification – Nigam - 2001
26 Finding terminology translations from non-parallel corpora – Fung, McKeown - 1997
22 Estimating Word Translation Probabilities From Unrelated Monolingual Corpora Using the EM Algorithm – Koehn, Knight - 2002
19 A Statistical Word-Level Translation Model for Comparable Corpora – Diab, Finch - 2000
17 A geometric interpretation of Darroch and Ratcliff's generalized iterative scaling – Csiszár - 1989
17 Learning a Translation Lexicon from Monolingual Corpora – Koehn, Knight - 2002
16 Continuation methods for mixing heterogeneous sources – Corduneanu, Jaakkola
15 Alternating minimization and Boltzmann machine learning – Byrne - 1992
11 Input-dependent regularization of conditional density models – Seeger - 2001
9 Stable mixing of complete and incomplete information – Corduneanu, Jaakkola - 2001