4 citations found.
Peter Brown, Stephen DellaPietra, Vincent DellaPietra, Robert Mercer, Arthur Nadas and Salim Roukos. A Maximum Penalized Entropy Construction of Conditional Log-Linear Language and Translation Models Using Learned Features and a Generalized Csiszar Algorithm. Unpublished IBM research report.

This paper is cited in the following contexts:
A Gaussian Prior for Smoothing Maximum Entropy Models - Chen, Rosenfeld (1999)   (25 citations)

....language models very similar to conventional n-gram models within the maximum entropy framework. The maximum entropy models described in Section 1.1 are joint models; to create the conditional distributions used in conventional n-gram models we use the framework introduced by Brown et al. [23]. Instead of estimating a joint distribution $q(x)$ over samples $x$, we estimate a conditional distribution $q(y|x)$ over samples $(x, y)$. Instead of constraints as given by equation (2) we have constraints of the form $\sum_{x,y} p(x)\, q(y|x)\, f_i(x, y) = \sum_{x,y} p(x, y)\, f_i(x, y)$ (7). This can be ....

P. Brown, S. Della Pietra, V. Della Pietra, R. Mercer, A. Nadas, and S. Roukos, "A maximum penalized entropy construction of conditional log-linear language and translation models using learned features and a generalized csiszar algorithm," Internal IBM Report, 1992.


A Maximum Entropy Approach to Adaptive Statistical Language.. - Rosenfeld (1996)   (61 citations)

....methods for doing so: 1. Clustering by Linguistic Knowledge [Jelinek 89, Derouault and Merialdo 86]. 2. Clustering by Domain Knowledge [Price 90]. 3. Data Driven Clustering [Jelinek 89, appendix C], [Jelinek 89, appendix D], [Brown et al. 90b], [Kneser and Ney 91], [Suhm and Waibel 94]. See [Rosenfeld 94b] for a more detailed exposition. (Footnote 1: A smoothed unigram will have a slightly higher cross entropy.) 2.4 Intermediate Distance. Long-distance N-grams attempt to capture directly the dependence of the predicted word on (N-1)-grams which are some distance back. For example, a distance-2 trigram ....

....is that the event space $\{(h, w)\}$ is of size $O(V^{L+1})$, where $V$ is the vocabulary size and $L$ is the history length. For any reasonable values of $V$ and $L$, this is a huge space, and no feasible amount of training data is sufficient to train a model for it. A better method was later proposed by [Brown et al. ]. Let $P(h, w)$ be the desired probability estimate, and let $\tilde{P}(h, w)$ be the empirical distribution of the training data. Let $f_i(h, w)$ be any constraint function, and let $K_i$ be its desired expectation. Equation 17 can be rewritten as: $\sum_h P(h) \cdot \sum_w P(w|h) \cdot f_i(h, w) = K_i$ (21) ....


A Maximum Entropy Approach - To Adaptive Statistical

No context found.


A Maximum Entropy Approach to Named Entity Recognition - Borthwick (1999)   (11 citations)

No context found.

Brown, P., Della Pietra, S., Della Pietra, V., Mercer, R., Nadas, A., and Roukos, S. A maximum penalized entropy construction of conditional log-linear language and translation models using learned features and a generalized Csiszar algorithm. Unpublished IBM research report.

CiteSeer.IST - Copyright Penn State and NEC