17 citations found.
H. Ney and U. Essen. On smoothing techniques for bigram-based natural language modelling. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 825--828, Toronto, May 1991.


This paper is cited in the following contexts:
Rational Interpolation Of Maximum Likelihood.. -.. (1997)

....between reliable predictors (i.e., the relevant word history has a reasonably large # value) and less reliable predictors. Thus, in numerous situations the contributions of ML predictors P(w|v) will not be weighted in an optimal way. Several authors proposed alternative interpolation schemes [10, 13] incorporating certain predictor weights into the model (Eq. 3), which are expected to appropriately reflect our confidence in the component probability estimators. These models, however, abandoned the use of interpolation along with its benefit of (cross-validation) data-driven adaptation of the ....

H. Ney and U. Essen. On Smoothing Techniques for Bigram-Based Natural Language Modelling. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 825--828, Toronto, 1991.
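
To make the interpolation issue in the snippet above concrete, here is a minimal Python sketch of a linearly interpolated bigram model with one global weight `lam`. The citing paper's Eq. 3 is not reproduced on this page, so this two-component form (bigram ML predictor plus unigram ML predictor) is an assumption, and the function names are hypothetical. It exhibits the weakness the snippet describes: `lam` is the same whether the history `v` was seen often or rarely.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram and bigram counts from a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def interpolated_bigram(w, v, unigrams, bigrams, lam):
    """Linearly interpolated estimate of P(w | v) (hypothetical helper).

    lam is one global weight for the bigram ML predictor; it does not
    depend on how reliable the history v is.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[w] / total  # ML unigram estimate P(w)
    # N(v): number of bigram tokens whose history is v
    n_v = sum(c for (h, _), c in bigrams.items() if h == v)
    if n_v == 0:
        return p_uni  # unseen history: fall back to the unigram estimate
    p_bi = bigrams[(v, w)] / n_v  # ML bigram estimate P(w | v)
    return lam * p_bi + (1.0 - lam) * p_uni


unigrams, bigrams = train_counts("a b a c a b".split())
print(interpolated_bigram("b", "a", unigrams, bigrams, lam=0.5))
```

Because both components are normalized over the vocabulary, the mixture sums to one for any seen history; in practice `lam` would be tuned on held-out data rather than fixed by hand.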


Mlp Emulation Of N-Gram Models As A First Step To.. - Castro, Prat.. (1992)

.... trigram models, maximum likelihood estimation of parameters can be unreliable (for instance, an important fraction of possible trigrams is usually not present in the available training data). In order to alleviate this problem, some techniques can be applied: Different smoothing techniques [1, 4, 5] are usually employed in order to assign reliable probability estimates to N-grams which are not frequent or not present in the training data. The number of parameters can be significantly reduced (and, hence, their estimation made more reliable) by clustering linguistic units into classes [1, ....

H. Ney and U. Essen. On Smoothing Techniques for Bigram-Based Natural Language Modeling. In Proc. ICASSP'91, pages 825-828, Toronto (Canada), 1991.


Dependency Language Modeling - Stolcke, Chelba, Engle, Jimenez.. (1997)   (8 citations)

....count for unigrams and trigrams was two. We refer to such a model as a 2-4-2 trigram. For comparison, the standard Switchboard backoff model is a 1-1-2 trigram. For smoothing in both backoff and ME models, the N-gram counts were discounted using the absolute discounting scheme described in [19]. Table 3 summarizes word error rates obtained for various N-gram models, of both backoff and maximum entropy types. All results were obtained using the same language model weight (12) and insertion penalty ( 10). For the bigram, backoff and ME models perform almost identically. In the case of ....

Hermann Ney and Ute Essen. On smoothing techniques for bigram-based natural language modelling. In Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, pages 825--828, Toronto, May 1991.
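
The snippet above only names the absolute discounting scheme. As a hedged illustration, the Python sketch below applies absolute discounting inside a back-off bigram model: every observed bigram count loses a constant `b`, and the freed mass is shared among successors not seen after `v`, in proportion to their unigram probabilities. The discount value `b=0.5`, the unigram lower-order distribution, and the function name are assumptions for illustration, not the configuration used in the cited experiments.

```python
from collections import Counter, defaultdict


def absolute_discount_backoff(tokens, b=0.5):
    """Build P(w | v) for a back-off bigram model with absolute discounting.

    Assumes 0 < b <= 1, so every observed count (>= 1) stays non-negative.
    """
    unigrams = Counter(tokens)
    total = sum(unigrams.values())
    p_uni = {w: c / total for w, c in unigrams.items()}

    successors = defaultdict(dict)  # successors[v][w] = N(v, w)
    for (v, w), c in Counter(zip(tokens, tokens[1:])).items():
        successors[v][w] = c

    def prob(w, v):
        seen = successors.get(v)
        if not seen:
            return p_uni.get(w, 0.0)  # unseen history: use the unigram model
        n_v = sum(seen.values())
        if w in seen:
            return (seen[w] - b) / n_v  # discounted relative frequency
        # Mass freed by subtracting b from each of the len(seen) observed counts
        freed = b * len(seen) / n_v
        # Renormalize the unigram distribution over successors not seen after v
        unseen_uni = 1.0 - sum(p_uni[x] for x in seen)
        if unseen_uni <= 0.0:
            return 0.0
        return freed * p_uni.get(w, 0.0) / unseen_uni

    return prob


prob = absolute_discount_backoff("a b a c a b".split(), b=0.5)
print(prob("b", "a"), prob("a", "a"))
```

Ney and Essen's own formulation can also interpolate the discounted estimate with the lower-order distribution instead of backing off; the back-off form is used here only because the snippet discusses backoff models.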


Symbolic Parsing and Probabilistic Decision Making. The Speech.. - Weber, Görz (1999)

....count values. Parameters are then estimated as $P_C(X) = \frac{r_C(X)}{r_C} \approx \frac{G(r_C(X))}{\sum_Y G(r_C(Y))}$ (7), where $r_C(X) = r(C;X)$ and $r_C = r(C)$ (8). All of the techniques developed for n-gram statistics can be used as smoothing procedures: relative, absolute or pooling discounting [NE91] and their simpler-context back-off variants [Kat87], as well as linear interpolation with simpler contexts [STHKN95]. The equations express that estimation and smoothing should apply in a context-specific manner. Such procedures can be used either with or without a given grammar. If no ....

H. Ney and U. Essen. On smoothing techniques for bigram-based natural language modelling. In IEEE International Conference on Acoustics, Speech and Signal Processing 1991, pages 825--828, 1991.
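
The snippet above lists relative discounting among the procedures that can be applied per context C. Reading "relative discounting" as linear discounting (an assumption: every observed count r_C(X) loses the same fraction `lam`), a minimal Python sketch in the snippet's notation might look like this. The simpler-context distribution `p_simpler`, the parameter `lam`, and the function name are hypothetical.

```python
def relative_discount(counts_c, p_simpler, lam):
    """Relative (linear) discounting inside one context C (hypothetical helper).

    counts_c:  dict X -> r_C(X), counts of events observed in context C
    p_simpler: dict X -> probability of X under a simpler (shorter) context,
               assumed to cover the whole event vocabulary
    lam:       fraction of every observed count that is discounted
    """
    r_c = sum(counts_c.values())  # r_C, total count of context C
    if r_c == 0:
        return dict(p_simpler)  # context never observed: use simpler context
    seen = {x for x, r in counts_c.items() if r > 0}
    unseen_mass = sum(p for x, p in p_simpler.items() if x not in seen)
    if unseen_mass == 0.0:
        # Nothing is unseen, so nothing to redistribute: plain relative frequency
        return {x: counts_c.get(x, 0) / r_c for x in p_simpler}
    return {
        x: (1.0 - lam) * counts_c[x] / r_c if x in seen
        else lam * p_simpler[x] / unseen_mass
        for x in p_simpler
    }


probs = relative_discount({"x": 3, "y": 1},
                          {"x": 0.4, "y": 0.3, "z": 0.2, "w": 0.1}, lam=0.2)
```

Seen events keep a fraction (1 - lam) of their relative frequency, and the removed mass lam is spread over unseen events according to the simpler context, so the result still sums to one.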


Dependency Language Modeling - Stolcke, Chelba, Engle, Jimenez.. (1997)   (8 citations)

....count for unigrams and trigrams was two. We refer to such a model as a 2-4-2 trigram. For comparison, the standard Switchboard backoff model is a 1-1-2 trigram. For smoothing in both backoff and ME models, the N-gram counts were discounted using the absolute discounting scheme described in [21]. Table 12 summarizes word error rates obtained for various N-gram models, of both the backoff and the maximum entropy varieties. All results were obtained using the same language model weight (12) and insertion penalty ( 10). For the bigram, backoff and ME models perform almost identically. In ....

Hermann Ney and Ute Essen. On smoothing techniques for bigram-based natural language modelling. In Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, pages 825--828, Toronto, May 1991.


MLP Emulation of N-Gram Models as a First Step to.. - Castro, Prat.. (1999)

.... trigram models, maximum likelihood estimation of parameters can be unreliable (for instance, an important fraction of possible trigrams is usually not present in the available training data). In order to alleviate this problem, some techniques can be applied: Different smoothing techniques [1, 4, 5] are usually employed in order to assign reliable probability estimates to N-grams which are not frequent or not present in the training data. The number of parameters can be significantly reduced (and, hence, their estimation made more reliable) by clustering linguistic units into classes [1, ....

H. Ney and U. Essen. On Smoothing Techniques for Bigram-Based Natural Language Modeling. In Proc. ICASSP'91, pages 825--828, Toronto (Canada), 1991.


Automatic Acquisition of Language Models for Speech Recognition - McCandless (1994)   (3 citations)

....Other approaches try to steal more probability mass from infrequent n-grams, for example back-off smoothing [27], which only removes probability mass from those n-grams that occurred fewer than a certain cutoff number of times. There are various other approaches, such as nonlinear discounting [33] and Good-Turing [21, 13]. When one uses an n-gram language model to generate random sentences, as shown in Tables 2.1, 2.2 and 2.3, very few of the resulting sentences could be considered valid. This indicates that n-gram models leave substantial room for improvement. The sentences for the ....

H. Ney and U. Essen. "On smoothing techniques for bigram-based natural language modeling", Proc. International Conference on Acoustics, Speech, and Signal Processing, 825--828, May 1991.
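
The snippet above names Good-Turing estimation without giving its formula. The sketch below uses the textbook form: with n_r the number of types seen exactly r times and N the number of tokens, an observed count r is replaced by r* = (r + 1) n_{r+1} / n_r, and a total mass of n_1 / N is reserved for unseen types. The function name and the fallback for a missing n_{r+1} are assumptions of this sketch.

```python
from collections import Counter


def good_turing(counts):
    """Good-Turing adjusted counts r* = (r + 1) * n_{r+1} / n_r.

    counts maps each observed type to its raw count r >= 1.
    Returns (adjusted counts, total probability mass reserved for unseen types).
    """
    n = Counter(counts.values())  # n[r]: number of types seen exactly r times
    total = sum(counts.values())  # N: number of tokens
    adjusted = {}
    for item, r in counts.items():
        if n[r + 1] > 0:
            adjusted[item] = (r + 1) * n[r + 1] / n[r]
        else:
            # n_{r+1} = 0 would give r* = 0; keep the raw count instead.
            # Practical systems smooth the n_r values first (e.g. Simple Good-Turing).
            adjusted[item] = float(r)
    unseen_mass = n[1] / total  # n_1 / N is shared by all unseen types
    return adjusted, unseen_mass


adjusted, p_unseen = good_turing({"a": 1, "b": 1, "c": 2, "d": 3})
```

Because of the raw-count fallback for the largest counts, the adjusted counts plus the unseen mass are not guaranteed to be exactly normalized; back-off schemes typically apply the adjustment only below a count cutoff, as the snippet's description of back-off smoothing suggests.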


Parsing N Best Trees from a Word Lattice - Weber, Spilker, Görz

....Grammars. Deep linguistic analysis based on word lattices is still a difficult problem. There exist a couple of parsing algorithms for context-free grammars, such as variants of generalized LR parsing [Tomita 1986], variants of Earley's parser [Paeseler 1988] and Cocke-Kasami-Younger parsers [Ney 1991], which are of cubic complexity (footnote 2). Some of these have been extended to unification grammars as in .... (Footnote 1: This work was funded by the German Federal Ministry for Research and Technology (BMFT) in the framework of the Verbmobil Project under Grant BMFT 01 IV 101 H 9. The responsibility for the ....)

....$\alpha\delta\gamma$, and the grammar rule i: $X \to \alpha\beta\gamma$ and the rule j: $\beta \to \delta$, where $\beta$ is the k-th nonterminal in the right-hand side of rule i, we say: rule j occurred, given i and k. So for each pair (i, k) we estimate a distribution $P_{i,k}: j \mapsto [0, 1]$. The model was smoothed using absolute discounting [Ney and Essen 1991], redistributing the saved probability mass equally to the unobserved events as a floor value (footnote 6). Section 5, Parsing a Lattice as a Chart: We parse the lattice using the standard mapping of frame numbers to chart edges. Our parsing schema is the well-known left-to-right active chart parsing with pruning on ....

Ney, Hermann and Essen, Ute. 1991. On smoothing techniques for bigram-based natural language modelling. IEEE Transactions ASSP, 39:825--828, February.
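
The snippet above describes absolute discounting of the rule distributions P_{i,k}, with the saved mass spread equally over unobserved rules as a floor. A minimal Python sketch for one (i, k) pair follows; the discount `b=0.5`, the candidate rule set `all_rules` (for example, all rules with the k-th nonterminal on their left-hand side), and the function name are hypothetical, since the snippet gives none of them.

```python
def discounted_rule_distribution(rule_counts, all_rules, b=0.5):
    """Absolute discounting for one (i, k) pair, with a uniform floor.

    rule_counts: dict j -> number of times rule j occurred, given i and k
    all_rules:   every rule j that could expand the k-th nonterminal of rule i
    b:           constant discount, assumed 0 < b <= 1
    """
    total = sum(rule_counts.get(j, 0) for j in all_rules)
    if total == 0:
        return {j: 1.0 / len(all_rules) for j in all_rules}  # no data: uniform
    observed = [j for j in all_rules if rule_counts.get(j, 0) > 0]
    unobserved = [j for j in all_rules if rule_counts.get(j, 0) == 0]
    if not unobserved:
        return {j: rule_counts[j] / total for j in all_rules}
    # Each observed rule gives up b counts; the saved mass is shared equally.
    floor = b * len(observed) / total / len(unobserved)
    dist = {j: (rule_counts[j] - b) / total for j in observed}
    dist.update({j: floor for j in unobserved})
    return dist


dist = discounted_rule_distribution({"j1": 3, "j2": 1}, ["j1", "j2", "j3", "j4"])
```

Unlike a back-off model, this variant does not consult a lower-order distribution: every unobserved rule receives the same floor probability.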


Ergodic Hidden Markov Models And Polygrams For.. - Kuhn, Niemann.. (1994)   (3 citations)

....of words. On the other hand, there are two drawbacks of the category model. First, a set of categories has to be defined, and second, the training corpus has to be tagged in advance. Both problems can be solved automatically using agglomerative clustering methods [8], iterative Viterbi alignment [12], or the Baum-Welch re-estimation procedure [11, 3]. In this paper we present two new methods for statistical language modeling. The concept of the Markov bigrams enables an unsupervised learning procedure of a bigram model based on word categories using an ergodic discrete-density HMM. The ....

H. Ney and U. Essen. On Smoothing Techniques for Bigram-Based Natural Language Modelling. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 825--828, Toronto, 1991.


Radiological Reporting Based on Voice Recognition - Antoniol, Fiutem, Flor, Lazzari (1993)

....words. As the system is speaker-dependent, a short training procedure for adapting the recognizer to the radiologist's voice is needed. The lexicon used by the recognizer is medium-sized (2500-5000 words) and it is integrated with a statistical language model based on bigram probabilities [10, 4]; run-time modification and user adaptation of the lexicon and language model (LM) are possible through the insertion of new words and new bigrams. When the system is turned on, it remains ready in a pause state, providing an empty report form. Variable text insertion by voice is made possible, ....

H. Ney and U. Essen. On Smoothing Techniques for Bigram-Based Natural Language Modelling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 825--828, Toronto, Canada, 1991. IRST Technical Report 9307-17, July 93.


Analyzing And Improving Statistical Language Models For Speech.. - Ueberla (1994)   (2 citations)

....the LOB corpus is an adequate choice. It contains about one million words and is small compared to, for example, the Wall Street Journal Corpus (50 million words) or the British National Corpus (100 million words). Moreover, many researchers working on language modeling also use the LOB [85], [109], [34], [106], [78], [110]. Even though we are building language models for speech recognition, the corpus is constructed from written text. This is common practice, mainly for practical reasons. Large quantities of written text are already in a format that can be used for a corpus, whereas the ....

H. Ney and U. Essen. On smoothing techniques for bigram-based natural language modeling. In Proceedings of International Conference on Acoustics, Speech and Signal Processing, pages 825--828. Toronto, ON, 1991.


An Overview of Corpus-Based Statistics-Oriented (CBSO).. - Su, Chiang, Chang (1996)

....in NLP tasks. For instance, semantic scores for disambiguating syntactic ambiguities have been exploited based on certain semantic classes [Su 88, Chang 92, Chiang 96], where classes are defined in terms of semantic categories. Automatic clustering is also of great interest in many frameworks [Ney 91, Brown 92, Chang 93]. In those approaches, automatic clustering has been carried out to optimize a characteristic function, such as mutual information or perplexity. The automatic clustering mechanisms used in these frameworks can be viewed as variants of the hierarchical clustering and dynamic ....

Ney, H. and U. Essen, "On Smoothing Techniques for Bigram-based Natural Language Modelling," in Proceedings of the IEEE 1991 International Conference on Acoustics, Speech, and Signal Processing, pp. 825-828, Toronto, 1991.


Log-Linear Interpolation of Language Models - Gutkin (2006)

No context found.

H. Ney and U. Essen. On smoothing techniques for bigram-based natural language modelling. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 825--828, Toronto, May 1991.


Log-Linear Interpolation of Language Models - Gutkin (2000)

No context found.

H. Ney and U. Essen. On smoothing techniques for bigram-based natural language modelling. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 825--828, Toronto, May 1991.


Language Modeling With Sentence-Level Mixtures - Iyer (1994)   (11 citations)

No context found.

H. Ney and U. Essen, "On Smoothing Techniques for Bigram-Based Natural Language Modeling", Proc. Int'l. Conf. on Acoust., Speech and Signal Proc., pp. 825-828, 1991.


Improving And Predicting Performance Of Statistical Language.. - Iyer (1998)   (2 citations)

No context found.

H. Ney and U. Essen. "On Smoothing Techniques for Bigram-Based Natural Language Modeling." Proceedings of the International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 825--828, 1991.


CiteSeer.IST - Copyright Penn State and NEC