Peter Brown, Stephen DellaPietra, Vincent DellaPietra, Robert Mercer, Arthur Nadas and Salim Roukos. A Maximum Penalized Entropy Construction of Conditional Log-Linear Language and Translation Models Using Learned Features and a Generalized Csiszar Algorithm. Unpublished IBM research report.
....language models very similar to conventional n-gram models within the maximum entropy framework. The maximum entropy models described in Section 1.1 are joint models; to create the conditional distributions used in conventional n-gram models we use the framework introduced by Brown et al. [23]. Instead of estimating a joint distribution $q(x)$ over samples $x$, we estimate a conditional distribution $q(y|x)$ over samples $(x, y)$. Instead of constraints as given by equation (2) we have constraints of the form $\sum_{x,y} p(x)\, q(y|x)\, f_i(x, y) = \sum_{x,y} p(x, y)\, f_i(x, y)$ (7). This can be ....
P. Brown, S. Della Pietra, V. Della Pietra, R. Mercer, A. Nadas, and S. Roukos, "A maximum penalized entropy construction of conditional log-linear language and translation models using learned features and a generalized csiszar algorithm," Internal IBM Report, 1992.
....methods for doing so: 1. Clustering by Linguistic Knowledge [Jelinek 89, Derouault and Merialdo 86]. 2. Clustering by Domain Knowledge [Price 90]. 3. Data Driven Clustering [Jelinek 89, appendix C], [Jelinek 89, appendix D], [Brown et al. 90b], [Kneser and Ney 91], [Suhm and Waibel 94]. See [Rosenfeld 94b] for a more detailed exposition. (Footnote 1: A smoothed unigram will have a slightly higher cross entropy.) 2.4 Intermediate Distance. Long-distance N-grams attempt to capture directly the dependence of the predicted word on (N-1)-grams which are some distance back. For example, a distance-2 trigram ....
....is that the event space $\{(h, w)\}$ is of size $O(V^{L+1})$, where $V$ is the vocabulary size and $L$ is the history length. For any reasonable values of $V$ and $L$, this is a huge space, and no feasible amount of training data is sufficient to train a model for it. A better method was later proposed by [Brown et al. ]. Let $P(h, w)$ be the desired probability estimate, and let $\tilde{P}(h, w)$ be the empirical distribution of the training data. Let $f_i(h, w)$ be any constraint function, and let $K_i$ be its desired expectation. Equation 17 can be rewritten as: $\sum_h \tilde{P}(h) \cdot \sum_w P(w|h) \cdot f_i(h, w) = K_i$ (21) ....
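The constraint in equation (21) of the excerpt above can be checked numerically. The following is a minimal sketch, not the cited authors' implementation: the toy data, the uniform conditional model, the indicator feature, and all function names are assumptions made for illustration. It also assumes that the desired expectation K_i is taken to be the empirical expectation of the feature, which the excerpt does not state.

```python
from collections import Counter

def empirical(pairs):
    """Empirical joint P~(h, w) and marginal P~(h) from (history, word) pairs."""
    n = len(pairs)
    joint = {hw: c / n for hw, c in Counter(pairs).items()}
    marginal = {h: c / n for h, c in Counter(h for h, _ in pairs).items()}
    return joint, marginal

def model_expectation(marginal, cond, vocab, f):
    """Left-hand side of (21): sum_h P~(h) * sum_w P(w|h) * f(h, w)."""
    return sum(
        p_h * sum(cond(w, h) * f(h, w) for w in vocab)
        for h, p_h in marginal.items()
    )

def empirical_expectation(joint, f):
    """Assumed choice of K_i: sum_{h,w} P~(h, w) * f(h, w)."""
    return sum(p * f(h, w) for (h, w), p in joint.items())

# Hypothetical toy training data: (history, word) pairs.
pairs = [("the", "cat"), ("the", "dog"), ("a", "cat"), ("the", "cat")]
vocab = ["cat", "dog"]
joint, marginal = empirical(pairs)

# Hypothetical feature: fires when the predicted word is "cat".
f = lambda h, w: 1.0 if w == "cat" else 0.0

# Hypothetical starting model: uniform over the vocabulary for every history.
uniform = lambda w, h: 1.0 / len(vocab)

lhs = model_expectation(marginal, uniform, vocab, f)
target = empirical_expectation(joint, f)
print(lhs, target)  # the uniform model does not yet satisfy the constraint
```

Because the outer sum is weighted by the empirical history distribution, only histories that occur in the training data contribute, so the computation never enumerates the full event space of (h, w) pairs.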
Brown, P., Della Pietra, S., Della Pietra, V., Mercer, R., Nadas, A., and Roukos, S. A maximum penalized entropy construction of conditional log-linear language and translation models using learned features and a generalized Csiszar algorithm. Unpublished IBM research report.