by Yoshihiko Gotoh, Michael M. Hochberg, Harvey F. Silverman
http://www.dcs.shef.ac.uk/~yg/./papers_ps1/sap98.ps.gz
Abstract:
Typically, parameter estimation for a hidden Markov model (HMM) is performed using an expectation-maximization (EM) algorithm with the maximum-likelihood (ML) criterion. The EM algorithm is an iterative scheme that is well defined and numerically stable, but convergence may require a large number of iterations. For speech recognition systems that use large amounts of training material, this results in long training times. This paper presents an incremental estimation approach that speeds up the training of HMMs without any loss of recognition performance. The algorithm selects a subset of data from the training set, updates the model parameters based on that subset, and then repeats the process until the parameters converge. The advantage of this approach is a substantial increase in the number of EM iterations per training token, which leads to faster training. In order to achieve reliable estimation from a small fraction of the complete data set at each iteration, two training criteria are studied: ML and maximum a posteriori (MAP) estimation. Experimental results show that training with the incremental algorithms is substantially faster than with the conventional (batch) method and suffers no loss of recognition performance. Furthermore, the incremental MAP
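The subset-update loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's procedure: it swaps the HMM for a one-dimensional, two-component Gaussian mixture with equal weights and a fixed unit variance, and it assumes a simple MAP-style update in which the previous mean acts as a prior worth `tau` pseudo-observations. The function names, `subset_size`, `tau`, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def responsibilities(x, means, var=1.0):
    """E-step for one point: posterior probability of each component."""
    # Assumption: equal mixture weights and a shared, fixed variance.
    logs = [-(x - m) ** 2 / (2.0 * var) for m in means]
    top = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def incremental_map_em(data, means, subset_size=50, tau=20.0, n_iters=200, seed=0):
    """Re-estimate the component means from one random subset per iteration."""
    rng = random.Random(seed)
    means = list(means)
    for _ in range(n_iters):
        subset = rng.sample(data, subset_size)
        occ = [0.0] * len(means)  # soft counts per component
        acc = [0.0] * len(means)  # responsibility-weighted sums of x
        for x in subset:
            for k, g in enumerate(responsibilities(x, means)):
                occ[k] += g
                acc[k] += g * x
        # Assumed MAP-style M-step: the previous mean is a prior
        # weighted as tau pseudo-observations, which damps subset noise.
        means = [(tau * m + a) / (tau + o) for m, a, o in zip(means, acc, occ)]
    return means


def make_data(n=2000, seed=1):
    """Synthetic data: equal mixture of N(-2, 1) and N(3, 1)."""
    rng = random.Random(seed)
    return [
        rng.gauss(-2.0, 1.0) if rng.random() < 0.5 else rng.gauss(3.0, 1.0)
        for _ in range(n)
    ]
```

Calling `incremental_map_em(make_data(), [-1.0, 1.0])` performs 200 parameter updates while touching only 50 points per update, which is the trade the abstract describes: more updates per training token. Setting `tau=0` in this sketch gives a pure ML update from each subset, which is noisier when subsets are small.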