This report describes a new technique for inducing the structure of Hidden Markov Models from data, based on the general "model merging" strategy (Omohundro 1992). The process begins with a maximum likelihood HMM that directly encodes the training data. Successively more general models are produced by merging HMM states. A Bayesian posterior probability criterion is used to determine which states to merge and when to stop generalizing. The procedure may be considered a heuristic search for the HMM structure with the highest posterior probability. We discuss a variety of possible priors for HMMs, as well as a number of approximations that improve the computational efficiency of the algorithm. We studied three applications to evaluate the procedure. The first compares the merging algorithm with the standard Baum-Welch approach in inducing simple finite-state languages from small, positive-only training samples. We found that the merging procedure is more robust and accurate, particularly with a small amount of training data. The second application uses labelled speech data from the TIMIT database to
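The merging loop described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the count-based likelihood (scoring the samples along their original paths), the per-parameter `penalty` standing in for a structure prior, the exhaustive pairwise search, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations

START, END = "START", "END"  # hypothetical non-emitting boundary states

def initial_model(samples):
    """Maximum likelihood model: one fresh chain of states per sample."""
    trans, emit = Counter(), Counter()
    next_id = 0
    for sample in samples:
        prev = START
        for symbol in sample:
            state = next_id
            next_id += 1
            trans[(prev, state)] += 1
            emit[(state, symbol)] += 1
            prev = state
        trans[(prev, END)] += 1
    return trans, emit

def merge(model, a, b):
    """Merge state b into state a by pooling transition and emission counts."""
    trans, emit = model

    def rename(q):
        return a if q == b else q

    new_trans, new_emit = Counter(), Counter()
    for (p, q), c in trans.items():
        new_trans[(rename(p), rename(q))] += c
    for (q, symbol), c in emit.items():
        new_emit[(rename(q), symbol)] += c
    return new_trans, new_emit

def log_likelihood(counts):
    """Sum of c * log(c / total), with totals taken per source state."""
    totals = Counter()
    for (q, _), c in counts.items():
        totals[q] += c
    return sum(c * math.log(c / totals[q]) for (q, _), c in counts.items())

def log_posterior(model, penalty):
    """Assumed prior: a fixed log-penalty per nonzero parameter."""
    trans, emit = model
    n_params = len(trans) + len(emit)
    return -penalty * n_params + log_likelihood(trans) + log_likelihood(emit)

def states(model):
    """All emitting states of the model, in sorted order."""
    return sorted({q for (q, _) in model[1]})

def merge_greedily(samples, penalty=1.0):
    """Apply the best merge repeatedly until no merge raises the posterior."""
    model = initial_model(samples)
    score = log_posterior(model, penalty)
    while True:
        candidates = [merge(model, a, b)
                      for a, b in combinations(states(model), 2)]
        if not candidates:
            return model
        best = max(candidates, key=lambda m: log_posterior(m, penalty))
        best_score = log_posterior(best, penalty)
        if best_score <= score:  # stop generalizing
            return model
        model, score = best, best_score
```

Calling `merge_greedily(["ab", "abab"])` starts from six emitting states and merges as long as the assumed posterior improves. A larger `penalty` favors smaller models, which mirrors the role the prior plays in deciding when to stop.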
Cover, Thomas (1991). Elements of Information Theory.
Dempster, Laird, et al. (1977). Maximum likelihood from incomplete data via the EM algorithm.
Hopcroft, Ullman (1979). Introduction to Automata Theory, Languages and Computation.
Cooper, Herskovits (1992). A Bayesian method for the induction of probabilistic networks from data.
Viterbi (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm.
Rabiner, Juang (1986). An introduction to hidden Markov models.
Katz (1987). Estimation of probabilities from sparse data for the language model component of a speech recognizer.
Baum, Petrie, et al. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains.
Brown, Della Pietra, et al. (1992). Class-Based n-gram Models of Natural Language.
Redner, Walker (1984). Mixture densities, maximum likelihood and the EM algorithm.
Bourlard, Morgan (1994). Connectionist Speech Recognition: A Hybrid Approach.
Cutting, Kupiec, et al. (1992). A practical part-of-speech tagger.
Rissanen (1983). A universal prior for integers and estimation by minimum description length.
Quinlan, Rivest (1989). Inferring decision trees using the minimum description length principle.
Angluin, Smith (1983). Inductive inference: theory and methods.
Lari, Young (1990). The estimation of stochastic context-free grammars using the inside-outside algorithm.
Jelinek, Mercer (1980). Interpolated Estimation of Markov Source Parameters from Sparse Data.
Cheeseman, Kelly, et al. (1988). AutoClass: A Bayesian classification system.
Pereira, Schabes (1992). Inside-outside reestimation from partially bracketed corpora.
Wallace, Freeman (1987). Estimation and inference by compact coding.
Buntine (1991). Theory refinement on Bayesian networks.
Stolcke, Omohundro (1993). Hidden Markov model induction by Bayesian model merging.
Buntine (1993). Learning classification trees.
Ron, Singer, et al. (1994). The Power of Amnesia.
Gull (1988). Bayesian inductive inference and maximum entropy.
Horning (1969). A study of grammatical inference.
Garofolo (1988). Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database.
Riley (1991). A statistical model for generating pronunciation networks.
Jurafsky, Wooters, et al. (1994). The Berkeley Restaurant Project.
Omohundro (1992). Best-first model merging for dynamic learning and recognition.
Chen (1990). Identification of contextual factors for pronunciation networks.
Gauvain, Lee (1991). Bayesian learning of Gaussian mixture densities of hidden Markov models.
Baldi, Chauvin, et al. (1993). Hidden Markov models in molecular biology: new algorithms and applications.
Tomita (1982). Dynamic construction of finite automata from examples using hill-climbing.
Thomason, Granum (1986). Dynamic programming inference of Markov networks from finite set of sample strings.
Cleeremans (1991). Mechanisms of Implicit Learning. A Parallel Distributed Processing Model of Sequence Acquisition.
Feldman (1991). Learning automata from ordered examples. Machine Learning 7:109-138.
Reber (1969). Implicit learning of artificial grammars.