## A tutorial on hidden Markov models and selected applications in speech recognition (1989)

### Download Links

- [www.stat.ucla.edu]
- [www-2.cs.cmu.edu]
- [www.gao.ece.ufl.edu]
- [www.cc.gatech.edu]
- [www.cs.cmu.edu]
- [astro.temple.edu]
- [www.mne.psu.edu]
- [www.dca.fee.unicamp.br]
- [www.ece.iastate.edu]
- [home.engineering.iastate.edu]
- [cs.utsa.edu]
- CiteULike

### Other Repositories/Bibliography

Venue: Proceedings of the IEEE

Citations: 5890 (1 self)

### Citations

1138 | A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. - Baum, Petrie, et al. - 1970 |

643 | Linear prediction: A tutorial review - Makhoul - 1975 |

631 | An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process. - Baum - 1972 |

477 | A Maximum Likelihood Approach to Continuous Speech Recognition. - Bahl, Jelinek, et al. - 1983 |

251 | An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition - Levinson, Rabiner, et al. - 1983 |

249 | Continuous speech recognition by statistical methods - Jelinek - 1976 |

223 | Maximum mutual information estimation of hidden Markov model parameters for speech recognition - Bahl, Brown, et al. - 1986 |
Citation Context: ...o alleviate this type of problem, at least two alternatives to the standard maximum likelihood (ML) optimization procedure for estimating HMM parameters have been proposed. The first alternative [32] is based on the idea that several HMMs are to be designed and we wish to design them all at the same time in such a way as to maximize the discrimination power of each model (i.e., each model's ab... |
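The contrast this context draws between standard ML training and the discriminative alternative can be written out. As a sketch, with notation assumed from the tutorial's discussion (V models \(\lambda_1,\dots,\lambda_V\), training sequence \(O^{v}\) for the v-th word):

```latex
% Standard ML: each model is trained only on its own data
\lambda_v^{*} = \arg\max_{\lambda_v} P\!\left(O^{v} \mid \lambda_v\right)

% MMI alternative: maximize the mutual information between O^v and the
% full model set, so each model also discriminates against the others
I_v = \log P\!\left(O^{v} \mid \lambda_v\right)
      - \log \sum_{w=1}^{V} P\!\left(O^{v} \mid \lambda_w\right),
\qquad
\lambda^{*} = \arg\max_{\lambda} \sum_{v=1}^{V} I_v
```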

210 | Speech analysis and synthesis by linear prediction of the speech wave - Atal, Hanauer - 1971 |

189 | Continuously variable duration hidden Markov models for automatic speech recognition - Levinson - 1986 |
Citation Context: ...or variable duration HMMs than for the standard HMM. One proposal to alleviate some of these problems is to use a parametric state duration density instead of the nonparametric p_j(d) used above [29], [30]. In particular, proposals include the Gaussian family, p_j(d) = N(d; mu_j, sigma_j^2), with parameters mu_j and sigma_j^2, or the Gamma family, p_j(d) = eta_j^nu_j d^(nu_j - 1) e^(-eta_j d) / Gamma(nu_j) (82)-(83), with parameters nu_j and eta_j, and with... |
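As a small sketch of the parametric duration idea in this context, the Gamma family can be evaluated directly. The function name and the example values below are illustrative, not from the paper:

```python
import math

def gamma_duration_density(d, nu, eta):
    """Gamma state-duration density p_j(d) with shape nu and rate eta.

    Mean duration is nu / eta; nu = 1 reduces to an exponential decay,
    analogous to the geometric durations implied by HMM self-transitions."""
    return (eta ** nu) * (d ** (nu - 1.0)) * math.exp(-eta * d) / math.gamma(nu)
```

With `nu = 1` the density is simply `eta * exp(-eta * d)`; larger `nu` produces a peaked duration distribution that a single self-transition probability cannot represent.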

140 | A probabilistic distance measure for hidden Markov models - Juang, Rabiner - 1985 |
Citation Context: ...ppear in IEEE TRANSACTIONS ON INFORMATION THEORY. effectively being used. None of the approaches, however, assumes that the source has the probability distribution of the model. F. Comparison of HMMs [34] An interesting question associated with HMMs is the following: Given two HMMs, lambda_1 and lambda_2, what is a reasonable measure of the similarity of the two models? A key point here is the similarity criterio... |
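One way to make the model-similarity question concrete is a Monte Carlo sketch in the spirit of the measure discussed here: generate observations from one model and compare per-frame log-likelihoods under both. The discrete-HMM representation (`pi`, `A`, `B` as plain lists) and the sign convention are my assumptions, not necessarily the paper's exact definition:

```python
import math
import random

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | lambda) for a discrete-symbol HMM."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    c = sum(alpha)
    loglik = math.log(c)
    alpha = [a / c for a in alpha]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
        c = sum(alpha)          # per-frame scaling avoids underflow
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
    return loglik

def hmm_distance(lam1, lam2, T=2000, seed=0):
    """Monte Carlo estimate of the per-frame log-likelihood gap:
    generate O^(2) from lam2, then compare how lam2 and lam1 score it."""
    pi2, A2, B2 = lam2
    rng = random.Random(seed)

    def draw(p):
        r, acc = rng.random(), 0.0
        for k, pk in enumerate(p):
            acc += pk
            if r < acc:
                return k
        return len(p) - 1

    s, obs = draw(pi2), []
    for _ in range(T):
        obs.append(draw(B2[s]))   # emit a symbol from the current state
        s = draw(A2[s])           # then transition
    return (forward_loglik(obs, *lam2) - forward_loglik(obs, *lam1)) / T
```

The estimate is zero when both arguments are the same model, and grows as `lam1` explains `lam2`'s output less well; averaging `hmm_distance(lam1, lam2)` and `hmm_distance(lam2, lam1)` gives a symmetrized version.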

137 | Fast sequential decoding algorithm using a stack - Jelinek - 1969 |

114 | Maximum likelihood estimation for multivariate observations of Markov sources - Liporace - 1982 |
Citation Context: ...sity function (pdf) to insure that the parameters of the pdf can be reestimated in a consistent way. The most general representation of the pdf, for which a reestimation procedure has been formulated [24]-[26], is a finite mixture of the form b_j(o) = sum_{m=1}^{M} c_jm N[o, mu_jm, U_jm], 1 <= j <= N (49), where o is the vector being modeled, c_jm is the mixture coefficient for the mth mixture in state j, and N[·] is any log-co... |
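The finite-mixture density in (49) can be sketched for the common diagonal-covariance Gaussian case (here N[·] is taken as a diagonal Gaussian; the function name is illustrative):

```python
import math

def mixture_density(o, c, mu, var):
    """b_j(o) = sum_m c[m] * N(o; mu[m], diag(var[m])): a finite Gaussian
    mixture with diagonal covariances, in the shape of eq. (49)."""
    total = 0.0
    for m in range(len(c)):
        log_n = 0.0
        for d in range(len(o)):
            # log of the d-th univariate Gaussian factor
            log_n -= 0.5 * (math.log(2.0 * math.pi * var[m][d])
                            + (o[d] - mu[m][d]) ** 2 / var[m][d])
        total += c[m] * math.exp(log_n)
    return total
```

The mixture coefficients `c` must be nonnegative and sum to one so that `b_j` integrates to one, which is exactly the stochastic constraint the reestimation formulas preserve.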

113 | Large-vocabulary speaker-independent continuous speech recognition: The SPHINX system - Lee - 1988 |
Citation Context: ...number of problems in speech recognition including the estimation of trigram word probabilities for language models [13], and the estimation of HMM output probabilities for trigram phone models [37], [38]. Another way of handling the effects of insufficient training data is to add extra constraints to the model parameters to insure that no model parameter estimate falls below a specified level. Thus, ... |

95 | Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition - Russell, Cook - 1987 |
Citation Context: ...cult for variable duration HMMs than for the standard HMM. One proposal to alleviate some of these problems is to use a parametric state duration density instead of the nonparametric p_j(d) used above [29], [30]. In particular, proposals include the Gaussian family, p_j(d) = N(d; mu_j, sigma_j^2), with parameters mu_j and sigma_j^2, or the Gamma family, p_j(d) = eta_j^nu_j d^(nu_j - 1) e^(-eta_j d) / Gamma(nu_j) (82)-(83), with parameters nu_j and eta_j, an... |

88 | Linear predictive hidden Markov models and the speech signal - Poritz - 1982 |

82 | The Harpy Speech Understanding System - Lowerre, Reddy - 1979 |
Citation Context: ...iterion is to use a separate training sequence of observations O^v to derive model parameters for each model lambda_v. Thus the standard ML optimization yields (84). The proposed alternative design criterion [31] is the maximum mutual information (MMI) criterion in which the average mutual information I between the observation sequence O^v and the complete set of models lambda = (lambda_1, lambda_2, ..., lambda_V) is maximized. One ... |

79 | Design of a linguistic statistical decoder for the recognition of continuous speech. - Jelinek, Bahl, et al. - 1975 |

77 | Maximum likelihood estimation for multivariate mixture observations of Markov chains - Juang, Levinson, et al. - 1986 |
Citation Context: ...ion procedure. This is the case because any HMM parameter set to zero initially will remain at zero throughout the reestimation procedure (see (44)). A. Continuous Observation Densities in HMMs [24]-[26] All of our discussion, to this point, has considered only the case when the observations were characterized as discrete symbols chosen from a finite alphabet, and therefore we could use a discrete pr... |

76 | Mixture autoregressive hidden Markov models for speech signals - Juang, Rabiner - 1985 |
Citation Context: ...ortion of the observation vector accounted for by the kth mixture component. A similar interpretation can be given for the reestimation term for the covariance matrix U_jk. B. Autoregressive HMMs [27], [28] Although the general formulation of continuous density HMMs is applicable to a wide range of problems, there is one other very interesting class of HMMs that is particularly applicable to speech proc... |

74 | Maximum likelihood estimation for mixture multivariate stochastic observations of Markov chains - Juang, Levinson, et al. - 1986 |

66 | A segmental k-means training procedure for connected word recognition - Rabiner, Wilpon, et al. - 1986 |
Citation Context: ...us density model are comparable. Finally, Table 1 shows that the autoregressive density HMM gives poorer performance than the standard mixture density model. VII. CONNECTED WORD RECOGNITION USING HMMs [59]-[63] A somewhat more complicated problem of speech recognition, to which HMMs have been successfully applied, is the problem of connected word recognition. The basic premise of connected word recogn... |

57 | Growth functions for transformations on manifolds - Baum, Sell - 1968 |

56 | On the use of bandpass liftering in speech recognition - Juang, Rabiner, et al. - 1987 |
Citation Context: ...ere Q > p, and Q = 12 in the results to be described later in this section. 6) Cepstral Weighting: The Q-coefficient cepstral vector c_l(m) at time frame l is weighted by a window W_c(m) of the form [55], [56], W_c(m) = 1 + (Q/2) sin(pi m / Q), 1 <= m <= Q (115), to give the weighted cepstrum c_hat_l(m) = c_l(m) * W_c(m) (116). 7) Delta ... |
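Assuming the standard raised-sine form of the lifter, W_c(m) = 1 + (Q/2) sin(pi m / Q), the cepstral-weighting step can be sketched as follows (function names are illustrative):

```python
import math

def lifter_window(Q):
    """Raised-sine lifter W_c(m) = 1 + (Q/2) sin(pi * m / Q), m = 1..Q.

    De-emphasizes low- and high-order cepstral coefficients, which are
    the most sensitive to channel and speaker variability."""
    return [1.0 + (Q / 2.0) * math.sin(math.pi * m / Q) for m in range(1, Q + 1)]

def weight_cepstrum(c):
    """Apply the lifter elementwise to a cepstral vector c of length Q."""
    w = lifter_window(len(c))
    return [cm * wm for cm, wm in zip(c, w)]
```

The window peaks at m = Q/2 (value 1 + Q/2) and falls back to 1 at m = Q, matching the shape implied by (115).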

54 | Recognition of isolated digits using hidden Markov models with continuous mixture densities - Rabiner, Juang, et al. - 1985 |
Citation Context: ...the mixture gains c_jm as well as the diagonal covariance coefficients U_jm(r, r) to be greater than or equal to some minimum values (we use in all cases). F. Segmental k-Means Segmentation into States [42] We stated earlier that good initial estimates of the parameters of the b_j(O_t) densities were essential for rapid and proper convergence of the reestimation formulas. Hence a procedure for providing g... |

48 | Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition - Bahl, Jelinek - 1975 |

40 | A minimum discrimination information approach for hidden Markov modeling - Ephraim, Dembo, et al. - 1989 |
Citation Context: ...ond alternative philosophy is to assume that the signal to be modeled was not necessarily generated by a Markov source, but does obey certain constraints (e.g., positive definite correlation function) [33]. The goal of the design procedure is therefore to choose HMM parameters which minimize the discrimination information (DI) or the cross entropy between the set of valid (i.e., which satisfy the measu... |

28 | BYBLOS: The BBN Continuous Speech Recognition System - Chow - 1987 |

27 | Continuous Speech Word Recognition via Centi-Second Acoustic States - Bakis - 1976 |

26 | A speaker-independent, syntax-directed, connected word recognition system based on hidden Markov models and level building - Rabiner, Levinson - 1985 |

26 | Structural Methods in Automatic Speech Recognition - Levinson - 1985 |

23 | Some properties of continuous hidden Markov model representations - Rabiner, Juang, et al. - 1985 |
Citation Context: ...the B parameters, experience has shown that good initial estimates are helpful in the discrete symbol case, and are essential (when dealing with multiple mixtures) in the continuous distribution case [35]. Such initial estimates can be obtained in a number of ways, including manual segmentation of the observation sequence(s) into states with averaging of observations within states, maximum likelihood s... |

22 | Integration of Acoustic Information in a Large Vocabulary Word Recognizer - Gupta, Lennig, et al. - 1987 |
Citation Context: ...ive to be as thorough or as complete in our descriptions of what was done as we were in describing the theory of HMMs. The interested reader should read the material in [6], [10], [12], [13], [39]-[46] for more complete descriptions of individual systems. Our main goal here is to show how specific aspects of HMM theory get applied, not to make the reader an expert in speech recognition technology. A... |

18 | Speaker-Stress Resistant HMM Isolated Word Recognizer - Paul - 1987 |

17 | An improved word-detection algorithm for telephone-quality speech incorporating both syntactic and semantic constraints - Wilpon, Rabiner, et al. - 1984 |

16 | Speaker-dependent connected speech recognition via dynamic programming and statistical methods - Bourlard, Kamp, et al. - 1985 |

10 | Application of hidden markov models to automatic speech endpoint detection - Wilpon, Rabiner - 1987 |

9 | A weighted cepstral distance measure for speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 35 - Tohkura - 1987 |
Citation Context: ...nt, where Q > p, and Q = 12 in the results to be described later in this section. 6) Cepstral Weighting: The Q-coefficient cepstral vector c_l(m) at time frame l is weighted by a window W_c(m) of the form [55], [56], W_c(m) = 1 + (Q/2) sin(pi m / Q), 1 <= m <= Q (115), to give the weighted cepstrum c_hat_l(m) = c_l(m) * W_c(m) (116). 7) ... |

4 | Context-dependent phonetic Markov models for large vocabulary speech recognition - Derouault - 1987 |

4 | The Viterbi algorithm - 1967 |

3 | Vector quantization and Markov source models applied to speech recognition - Billi - 1982 |

3 | Multistyle Training for Robust Isolated Word Speech Recognition - Lippmann, Martin - 1987 |

3 | Experiments with the TANGORA 20,000 word speech recognizer - Averbuch, et al. - 1987 |

3 | A model-based connected digit recognition system using either hidden Markov models or templates - Rabiner, Wilpon, et al. - 1986 |

3 | An introduction to hidden Markov models. IEEE ASSP Magazine - Rabiner, Juang - 1986 |

2 | Statistical inference for probabilistic functions of finite state - Baum, Petrie - 1967 |

2 | Speech recognition with very large size dictionary - Merialdo - 1987 |

1 | On the application of vector quantization and hidden Markov models to speaker-independent isolated word recognition, Bell Syst - Rabiner, Levinson, et al. |
Citation Context: ...ing parameters are rescaled so that the densities obey the required stochastic constraints. Such post-processor techniques have been applied to several problems in speech processing with good success [39]. It can be seen from (112) that this procedure is essentially equivalent to a simple form of deleted interpolation in which the model lambda' is a uniform distribution model, and the interpolation value E... |
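The equivalence noted in this context, interpolating an estimated distribution with a uniform model so no probability falls below a floor, can be sketched as follows (the `eps` value and function name are illustrative):

```python
def interpolate_uniform(b, eps):
    """b_hat(k) = (1 - eps) * b(k) + eps / M: mix an estimated discrete
    distribution b with a uniform model over its M symbols.

    The result still sums to one, and every entry is at least eps / M,
    so no parameter estimated from sparse data remains exactly zero."""
    M = len(b)
    return [(1.0 - eps) * p + eps / M for p in b]
```

This is the post-processor view: estimate freely, then rescale toward the uniform model by a small interpolation weight.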

1 | On the use of hidden Markov models for speaker-independent recognition of isolated words from a medium-size vocabulary - 1984 |

1 | Isolated word recognition - Poritz, Richter - 1986 |

1 | Analysis-synthesis telephony based upon the maximum likelihood method - Saito - 1968 |

1 | On the use of instantaneous and transitional spectral information in speaker recognition - Soong, Rosenberg - 1986 |
Citation Context: ...derivative of the sequence of weighted cepstral vectors is approximated by a first-order orthogonal polynomial over a finite length window of (2K + 1) frames, centered around the current vector [57], [58]. (K = 2 in the results to be presented; hence a 5-frame window is used for the computation of the derivative.) The cepstral derivative (i.e., the delta cepstrum vector) is computed as Delta c_t(m) = ... |
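A minimal sketch of the delta-cepstrum computation described here, with K = 2 (a 5-frame window). The normalization G = 1 / sum(k^2) below is the least-squares slope of the first-order polynomial fit, one common choice rather than necessarily the paper's exact gain:

```python
def delta_cepstrum(frames, K=2):
    """Approximate the time derivative of each cepstral coefficient by a
    first-order orthogonal polynomial fit over a (2K + 1)-frame window.

    frames: list of cepstral vectors c_t (equal length Q).
    Returns delta vectors for t = K .. len(frames) - K - 1 (edges dropped)."""
    G = 1.0 / sum(k * k for k in range(-K, K + 1))  # least-squares slope gain
    deltas = []
    for t in range(K, len(frames) - K):
        deltas.append([G * sum(k * frames[t + k][m] for k in range(-K, K + 1))
                       for m in range(len(frames[t]))])
    return deltas
```

On a linear ramp of cepstral values the window recovers the true slope exactly, which is the sanity check one expects of a regression-based derivative.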

1 | Global connected digit recognition using Baum-Welch algorithm - Wellekens - 1986 |

1 | Vector quantization in speech coding, Proc - Makhoul, Roucos, et al. - 1985 |

1 | Discrete-state Markov processes, Chapter 5 in Fundamentals of Applied Probability Theory - Drake - 1967 |

1 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. |

1 | Context-dependent modeling for acoustic-phonetic recognition of continuous speech - Amsterdam - 1980 |

1 | A speaker stress resistant HMM isolated word recognizer, ICASSP - 1987 |

1 | Experiments with the TANGORA 20,000 word speech recognizer - Averbuch, et al. - 1987 |

1 | Global connected digit recognition using Baum-Welch algorithm - Wellekens - 1986 |
