by Geoffrey Zweig, Stuart Russell
http://www.cs.berkeley.edu/~russell/papers/aaai98-speech.ps
Abstract:
Dynamic Bayesian networks (DBNs) are a useful tool for representing complex stochastic processes. Recent developments in inference and learning in DBNs allow their use in real-world applications. In this paper, we apply DBNs to the problem of speech recognition. The factored state representation enabled by DBNs allows us to explicitly represent long-term articulatory and acoustic context in addition to the phonetic-state information maintained by hidden Markov models (HMMs). Furthermore, it enables us to model the short-term correlations among multiple observation streams within single time-frames. Given a DBN structure capable of representing these long- and short-term correlations, we applied the EM algorithm to learn models with up to 500,000 parameters. The use of structured DBN models decreased the error rate by 12 to 29% on a large-vocabulary isolated-word recognition task, compared to a discrete HMM; it also improved significantly on other published results for the same task. This is the first successful application of DBNs to a large-scale speech recognition problem. Investigation of the learned models indicates that the hidden state variables are strongly correlated with acoustic properties of the speech signal.
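To make the idea of a factored state representation concrete, the following is a minimal sketch and not the model from the paper. It assumes a toy DBN with two hidden variables per frame, a phonetic state Q and a single context variable C, each with its own transition table, and a discrete observation that depends on both. The independence of the two chains, the table sizes, and all names (random_cpt, forward_likelihood, brute_force_likelihood) are illustrative assumptions. The sketch computes the likelihood of an observation sequence with the forward recursion and compares the number of free table entries against a flat HMM over the joint state.

```python
import itertools
import numpy as np


def random_cpt(rng, *shape):
    """Random conditional probability table; the last axis sums to 1."""
    table = rng.random(shape)
    return table / table.sum(axis=-1, keepdims=True)


def forward_likelihood(obs, prior_q, prior_c, trans_q, trans_c, emit):
    """P(obs) for the toy two-variable DBN via the forward recursion.

    alpha[q, c] holds P(o_1..o_t, Q_t = q, C_t = c).
    trans_q[i, k] = P(Q_t = k | Q_{t-1} = i), likewise for trans_c.
    emit[q, c, o] = P(O_t = o | Q_t = q, C_t = c).
    """
    alpha = np.outer(prior_q, prior_c) * emit[:, :, obs[0]]
    for o in obs[1:]:
        # Sum out the previous frame's (Q, C) pair using both transition tables.
        alpha = np.einsum("ij,ik,jl->kl", alpha, trans_q, trans_c)
        alpha = alpha * emit[:, :, o]
    return float(alpha.sum())


def brute_force_likelihood(obs, prior_q, prior_c, trans_q, trans_c, emit):
    """Same quantity by enumerating every hidden path (tiny models only)."""
    n_q, n_c, n_frames = len(prior_q), len(prior_c), len(obs)
    total = 0.0
    for qs in itertools.product(range(n_q), repeat=n_frames):
        for cs in itertools.product(range(n_c), repeat=n_frames):
            p = prior_q[qs[0]] * prior_c[cs[0]] * emit[qs[0], cs[0], obs[0]]
            for t in range(1, n_frames):
                p *= trans_q[qs[t - 1], qs[t]] * trans_c[cs[t - 1], cs[t]]
                p *= emit[qs[t], cs[t], obs[t]]
            total += p
    return total


# Hypothetical sizes: 3 phonetic states, 2 context values, 4 observation symbols.
N_Q, N_C, N_O = 3, 2, 4
rng = np.random.default_rng(0)
prior_q, prior_c = random_cpt(rng, N_Q), random_cpt(rng, N_C)
trans_q, trans_c = random_cpt(rng, N_Q, N_Q), random_cpt(rng, N_C, N_C)
emit = random_cpt(rng, N_Q, N_C, N_O)
obs = [0, 3, 1, 2]

fast = forward_likelihood(obs, prior_q, prior_c, trans_q, trans_c, emit)
slow = brute_force_likelihood(obs, prior_q, prior_c, trans_q, trans_c, emit)

# Table entries in the transition model: factored versus flat joint-state HMM.
factored_entries = N_Q * N_Q + N_C * N_C
flat_entries = (N_Q * N_C) ** 2
```

In this toy setting the factored transition model needs 13 table entries where a flat HMM over the 6 joint states needs 36, and the gap widens quickly as more context variables are added. The brute-force routine is there only as a sanity check on the recursion.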