This publication can be retrieved by anonymous ftp to publications.ai.mit.edu.

We present a framework for learning in hidden Markov models with distributed state representations. Within this framework, we derive a learning algorithm based on the Expectation-Maximization (EM) procedure for maximum likelihood estimation. Analogous to the standard Baum-Welch update rules, the M-step of our algorithm is exact and can be solved analytically. However, due to the combinatorial nature of the hidden state representation, the exact E-step is intractable. A simple and tractable mean field approximation is derived. Empirical results on a set of problems suggest that both the mean field approximation and Gibbs sampling are viable alternatives to the computationally expensive exact algorithm.
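As a rough illustration of why a distributed state representation makes the exact E-step expensive, the sketch below counts joint hidden states. It assumes the hidden state is split into several chains with the same number of states each; the function names and the chain and state counts are illustrative assumptions, not taken from the abstract. The operation count is for a naive forward-backward pass over the flattened joint state space, not for the algorithm derived in the paper.

```python
def joint_state_count(num_chains: int, states_per_chain: int) -> int:
    """Number of joint hidden states when each chain has the same state count."""
    return states_per_chain ** num_chains


def naive_forward_backward_ops(num_chains: int, states_per_chain: int, seq_len: int) -> int:
    """Rough operation count for forward-backward on the flattened joint state space.

    Each time step touches every (previous state, next state) pair, so the
    cost grows with the square of the joint state count.
    """
    n = joint_state_count(num_chains, states_per_chain)
    return seq_len * n * n


if __name__ == "__main__":
    # Hypothetical settings: 3 states per chain, sequence length 100.
    for num_chains in (1, 2, 5, 10):
        states = joint_state_count(num_chains, 3)
        ops = naive_forward_backward_ops(num_chains, 3, 100)
        print(f"chains={num_chains:2d}  joint states={states:6d}  approx ops={ops}")
```

The joint state count grows exponentially with the number of chains, which is the motivation for approximations such as mean field or Gibbs sampling in the E-step.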
|
4388
|
Probabilistic Reasoning in Intelligent Systems
– Pearl
- 1988
|
|
4364
|
Elements of Information Theory
– Cover, Thomas
- 1991
|
|
4344
|
Maximum likelihood from incomplete data via the EM algorithm
– Dempster, Laird, et al.
- 1977
|
|
2322
|
Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images
– Geman, Geman
- 1984
|
|
2138
|
UCI Repository of Machine Learning Databases
– Merz, Murphy
- 1996
|
|
2103
|
A tutorial on hidden Markov models and selected applications in speech recognition
– Rabiner
- 1989
|
|
904
|
Local computations with probabilities on graphical structures and their applications to expert systems
– Lauritzen, Spiegelhalter
- 1988
|
|
655
|
UCI Repository of Machine Learning Databases [machine-readable data repository]
– Murphy, Aha
- 1992
|
|
629
|
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
– Viterbi
- 1967
|
|
593
|
Hierarchical Mixtures of Experts and the EM algorithm
– Jordan, Jacobs
- 1993
|
|
545
|
An introduction to hidden Markov models
– Rabiner, Juang
- 1986
|
|
520
|
Generalized Linear Models
– McCullagh, Nelder
- 1989
|
|
489
|
Neural networks and the bias/variance dilemma
– Geman, Bienenstock, et al.
- 1992
|
|
415
|
A maximization technique occurring in statistical analysis of probabilistic functions of Markov chains
– Baum, Petrie, et al.
- 1970
|
|
410
|
A New View of the EM Algorithm that Justifies Incremental and Other Variants, in Learning in Graphical Models
– Neal, Hinton
- 1993
|
|
404
|
Statistical Analysis of Finite Mixture Distributions
– Titterington, Smith, et al.
- 1985
|
|
332
|
Hidden Markov models in computational biology. Applications to protein modeling
– Krogh, Brown, et al.
- 1994
|
|
323
|
Probabilistic inference using Markov Chain Monte Carlo methods
– Neal
- 1993
|
|
312
|
A model for reasoning about persistence and causation
– Dean, Kanazawa
- 1989
|
|
295
|
A learning algorithm for Boltzmann Machines
– Ackley, Hinton, et al.
- 1985
|
|
236
|
The calculation of posterior distributions by data augmentation
– Tanner, Wong
|
|
211
|
Learning and relearning in Boltzmann machines
– Hinton, Sejnowski
- 1986
|
|
195
|
Learning Bayesian Networks
– Heckerman, Geiger
|
|
182
|
The wake-sleep algorithm for unsupervised neural networks
– Hinton, Dayan, et al.
- 1995
|
|
171
|
Unsupervised learning
– Barlow
- 1989
|
|
150
|
Connectionist learning of belief networks
– Neal
- 1992
|
|
146
|
Probabilistic independence networks for hidden Markov probability models
– Smyth, Heckerman, et al.
|
|
134
|
Stochastic simulation algorithms for dynamic probabilistic networks
– Kanazawa, Koller, et al.
- 1995
|
|
114
|
Bayesian updating in recursive graphical models by local computations. Computational Statistics Quarterly
– Jensen, Lauritzen, et al.
- 1990
|
|
111
|
Hidden Markov models of biological primary sequence information
– Baldi, Chauvin, et al.
- 1994
|
|
109
|
Hidden Markov model induction by Bayesian model merging
– Stolcke, Omohundro
- 1993
|
|
108
|
Statistical field theory
– Parisi
- 1988
|
|
102
|
Mean field theory for sigmoid belief networks
– Saul, Jaakkola, et al.
- 1996
|
|
91
|
An input/output HMM architecture
– Bengio, Frasconi
- 1996
|
|
91
|
Autoencoders, minimum description length and Helmholtz free energy
– Hinton, Zemel
- 1994
|
|
77
|
Exploiting tractable substructures in intractable networks
– Saul, Jordan
- 1996
|
|
75
|
Soft Competitive Adaptation: Neural Network Learning Algorithms based on Fitting Statistical Mixtures
– Nowlan
- 1991
|
|
69
|
Applications of a general propagation algorithm for probabilistic expert systems
– Dawid
- 1992
|
|
66
|
A multiple cause mixture model for unsupervised learning
– Saund
- 1995
|
|
57
|
A Minimum Description Length Framework for Unsupervised Learning
– Zemel
- 1994
|
|
56
|
Mixtures of controllers for jump linear and non-linear plants
– Cacciatore, Nowlan
- 1994
|
|
55
|
Boltzmann chains and hidden Markov models
– Saul, Jordan
- 1995
|
|
44
|
Hidden Markov models for speech recognition
– Juang, Rabiner
- 1991
|
|
43
|
Factorial Learning and the EM Algorithm
– Ghahramani
- 1995
|
|
42
|
Hidden Markov decision trees
– Jordan, Ghahramani, et al.
- 1997
|
|
28
|
Multiple viewpoint systems for music prediction
– Conklin, Witten
- 1995
|
|
19
|
Mean field networks that learn to discriminate temporally distorted strings
– Williams, Hinton
- 1990
|
|
15
|
Learning fine motion by Markov mixtures of experts
– Meila, Jordan
- 1996
|
|
15
|
Mixed memory Markov models (in press)
– Saul, Jordan
|
|
9
|
Supervised factorial learning
– Redlich
- 1993
|