A tutorial on hidden Markov models and selected applications in speech recognition
PROCEEDINGS OF THE IEEE, 1989
Abstract

Cited by 5892 (1 self)
Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First, the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.
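One basic computation for any hidden Markov model is the probability of an observation sequence given the model. The sketch below is a minimal, unscaled forward algorithm for a discrete HMM, written with the common lambda = (A, B, pi) notation; the function name `forward_probability` and the toy parameters are illustrative assumptions, not code from the paper. Real speech systems typically rescale or work in log space to avoid underflow on long sequences.

```python
from typing import Sequence


def forward_probability(
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> float:
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[j][k] : probability of emitting symbol k while in state j
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
            for j in range(n)
        ]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


# Toy 2-state, 3-symbol model (numbers made up for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(forward_probability(pi, A, B, [0, 1, 2]))
```

The recursion costs O(N^2 T) for N states and T observations, instead of enumerating all N^T state paths.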
A Systematic Comparison of Various Statistical Alignment Models
COMPUTATIONAL LINGUISTICS, 2003
The Mathematics of Statistical Machine Translation: Parameter Estimation
COMPUTATIONAL LINGUISTICS, 1993
An Empirical Study of Smoothing Techniques for Language Modeling
1998
Abstract

Cited by 1224 (21 self)
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
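Jelinek-Mercer smoothing, one of the compared methods, linearly interpolates a higher-order maximum-likelihood estimate with a lower-order smoothed model. The following is a minimal sketch under stated assumptions: a bigram model, a single fixed interpolation weight `lam` (in practice such weights are usually tuned on held-out data and may depend on the history), and a uniform floor over the vocabulary plus one unknown-word slot. The class name `JelinekMercerBigram` is hypothetical. The sketch also computes cross-entropy in bits per word, the evaluation measure named above.

```python
import math
from collections import Counter
from typing import List


class JelinekMercerBigram:
    """Bigram LM smoothed by recursive linear interpolation.

    P(w | v) = lam * P_ML(w | v) + (1 - lam) * P_uni(w)
    P_uni(w) = lam * P_ML(w)     + (1 - lam) / (V + 1)
    The extra "+1" reserves probability mass for a single unknown-word type.
    """

    def __init__(self, tokens: List[str], lam: float = 0.7) -> None:
        self.lam = lam
        self.unigrams = Counter(tokens)
        self.total = len(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.history = Counter(tokens[:-1])  # times each word precedes another
        self.vocab_size = len(self.unigrams)

    def unigram_prob(self, word: str) -> float:
        p_ml = self.unigrams[word] / self.total  # 0.0 for unseen words
        return self.lam * p_ml + (1 - self.lam) / (self.vocab_size + 1)

    def prob(self, prev: str, word: str) -> float:
        h = self.history[prev]
        if h == 0:
            # Unseen history: fall back entirely to the smoothed unigram.
            return self.unigram_prob(word)
        p_ml = self.bigrams[(prev, word)] / h
        return self.lam * p_ml + (1 - self.lam) * self.unigram_prob(word)

    def cross_entropy(self, tokens: List[str]) -> float:
        """Average bits per predicted token on held-out text."""
        bits = -sum(math.log2(self.prob(v, w)) for v, w in zip(tokens, tokens[1:]))
        return bits / (len(tokens) - 1)


train = "the cat sat on the mat the dog sat on the rug".split()
lm = JelinekMercerBigram(train, lam=0.7)
print(lm.cross_entropy("the cat sat on the rug".split()))  # bits per word
```

Because every level keeps some mass for the level below, no test bigram receives zero probability, so the cross-entropy stays finite even with unseen words.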
Class-Based n-gram Models of Natural Language
Computational Linguistics, 1992
Abstract

Cited by 986 (5 self)
We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
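A common way to write a class-based bigram model factors the word-to-word prediction through classes: P(w_i | w_{i-1}) = P(w_i | c_i) * P(c_i | c_{i-1}), where c_i is the class of w_i. The sketch below estimates both factors by relative frequency. It assumes the class mapping is supplied by hand (the `word_class` dictionary and the name `class_bigram_model` are hypothetical); the paper's algorithms for discovering classes from co-occurrence statistics are not shown.

```python
from collections import Counter
from typing import Callable, Dict, List


def class_bigram_model(
    tokens: List[str], word_class: Dict[str, str]
) -> Callable[[str, str], float]:
    """Estimate P(w_i | w_{i-1}) = P(w_i | c_i) * P(c_i | c_{i-1}) by counting.

    word_class is a fixed, hand-supplied mapping used only for illustration.
    """
    classes = [word_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    class_history = Counter(classes[:-1])  # times each class precedes another

    def prob(prev: str, word: str) -> float:
        c_prev, c_word = word_class[prev], word_class[word]
        if class_counts[c_word] == 0 or class_history[c_prev] == 0:
            return 0.0  # no training evidence for this class or transition
        p_word_given_class = word_counts[word] / class_counts[c_word]
        p_class_transition = class_bigrams[(c_prev, c_word)] / class_history[c_prev]
        return p_word_given_class * p_class_transition

    return prob


tokens = "the cat sat the dog ran a cat ran".split()
word_class = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN",
              "sat": "VERB", "ran": "VERB"}
prob = class_bigram_model(tokens, word_class)
print(prob("a", "dog"))  # "a dog" never occurs, yet gets nonzero probability
```

Sharing statistics across a class is what lets the unseen pair "a dog" borrow evidence from "the dog" and "a cat".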
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
1995
Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
IEEE Trans. Speech Audio Process., 1994
Inducing Features of Random Fields
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997
Abstract

Cited by 670 (10 self)
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classifica...
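The weight-estimation step can be illustrated on a sample space small enough to enumerate. The abstract does not spell out its iterative scaling update, so this sketch substitutes the classic generalized iterative scaling (GIS) of Darroch and Ratcliff and omits the greedy feature-induction loop entirely. The function name `gis`, the toy data, and the feature set are assumptions for illustration. At convergence the model's feature expectations match the empirical ones, which is the condition for minimizing the KL divergence from the empirical distribution to a model in this exponential family.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Outcome = Tuple[int, ...]
Feature = Callable[[Outcome], float]


def gis(samples: Sequence[Outcome], features: List[Feature],
        space: List[Outcome], iterations: int = 2000
        ) -> Tuple[List[float], Dict[Outcome, float]]:
    """Fit p(x) proportional to exp(sum_i w_i * f_i(x)) by generalized iterative scaling."""
    # GIS requires non-negative features whose sum is the same constant C for
    # every outcome, so append a "slack" feature that tops each sum up to C.
    C = max(sum(f(x) for f in features) for x in space)
    feats = features + [lambda x: C - sum(f(x) for f in features)]
    empirical = [sum(f(x) for x in samples) / len(samples) for f in feats]
    weights = [0.0] * len(feats)

    def model() -> Dict[Outcome, float]:
        scores = {x: math.exp(sum(w * f(x) for w, f in zip(weights, feats)))
                  for x in space}
        z = sum(scores.values())  # partition function over the finite space
        return {x: s / z for x, s in scores.items()}

    for _ in range(iterations):
        p = model()  # every update in this sweep uses the same current model
        for i, f in enumerate(feats):
            expected = sum(p[x] * f(x) for x in space)
            weights[i] += math.log(empirical[i] / expected) / C
    return weights, model()


# Toy data: 3 binary variables, single-variable features plus one pairwise feature.
space = list(itertools.product([0, 1], repeat=3))
samples = space + [(1, 1, 0), (1, 1, 1), (1, 1, 0), (0, 0, 1)]
features = [lambda x: x[0], lambda x: x[1], lambda x: x[2], lambda x: x[0] * x[1]]
weights, p = gis(samples, features, space)
```

Enumerating the whole space to compute the partition function only works for tiny examples; large fields need sampling or other approximations.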
Unsupervised word sense disambiguation rivaling supervised methods
IN PROCEEDINGS OF THE 33RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 1995
Abstract

Cited by 638 (4 self)
This paper presents an unsupervised learning algorithm for sense disambiguation that, when trained on unannotated English text, rivals the performance of supervised techniques that require time-consuming hand annotations. The algorithm is based on two powerful constraints (that words tend to have one sense per discourse and one sense per collocation), which are exploited in an iterative bootstrapping procedure. Tested accuracy exceeds 96%.
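The interplay of the two constraints can be shown with a deliberately simplified bootstrapping loop: seed collocations label some occurrences, each discourse's majority sense spreads to all of its occurrences, and context words that co-occur with only one sense become new collocations. This is a toy sketch, not the paper's procedure; the paper's decision-list scoring and thresholds are not reproduced, and the function name `bootstrap`, the purity rule, `min_count`, and the "plant" data are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

# (discourse id, context words around one occurrence of the ambiguous word)
Example = Tuple[str, Set[str]]


def bootstrap(examples: List[Example], seeds: Dict[str, str],
              min_count: int = 2, rounds: int = 10) -> List[Optional[str]]:
    """Iteratively label occurrences using both constraints."""
    rules: Dict[str, str] = dict(seeds)  # collocation word -> sense
    labels: List[Optional[str]] = [None] * len(examples)
    for _ in range(rounds):
        # One sense per collocation: vote with every known collocation in context.
        for i, (_, context) in enumerate(examples):
            votes = Counter(rules[w] for w in context if w in rules)
            labels[i] = votes.most_common(1)[0][0] if votes else None
        # One sense per discourse: every occurrence takes its discourse's majority.
        per_doc: Dict[str, Counter] = defaultdict(Counter)
        for (doc, _), sense in zip(examples, labels):
            if sense is not None:
                per_doc[doc][sense] += 1
        labels = [per_doc[doc].most_common(1)[0][0] if per_doc[doc] else None
                  for doc, _ in examples]
        # Grow the rule set: a context word seen with exactly one sense, at
        # least min_count times, becomes a new collocation for that sense.
        seen: Dict[str, Counter] = defaultdict(Counter)
        for (_, context), sense in zip(examples, labels):
            if sense is not None:
                for w in context:
                    seen[w][sense] += 1
        new_rules = {w: next(iter(c)) for w, c in seen.items()
                     if len(c) == 1 and sum(c.values()) >= min_count}
        new_rules.update(seeds)  # seed collocations are never overridden
        if new_rules == rules:
            break  # converged: no new collocations learned
        rules = new_rules
    return labels


examples = [
    ("d1", {"life", "green"}),
    ("d1", {"leaves"}),
    ("d2", {"manufacturing", "workers"}),
    ("d2", {"equipment"}),
    ("d3", {"leaves", "water"}),
    ("d4", {"workers"}),
]
seeds = {"life": "living", "manufacturing": "factory"}
print(bootstrap(examples, seeds, min_count=1))  # min_count=1 for this tiny corpus
```

Discourse propagation labels "leaves" in d1 without any seed, and that newly learned collocation then labels the occurrence in d3, which shares no seed word or discourse with the original seeds.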