  Best-first model merging for hidden Markov model induction (1994) [76 citations — 5 self]

by Andreas Stolcke, Stephen M. Omohundro
ftp://ftp.icsi.berkeley.edu/pub/techreports/1994/tr-94-003.ps.gz

Abstract:

This report describes a new technique for inducing the structure of Hidden Markov Models from data, based on the general "model merging" strategy (Omohundro 1992). The process begins with a maximum likelihood HMM that directly encodes the training data. Successively more general models are produced by merging HMM states. A Bayesian posterior probability criterion is used to determine which states to merge and when to stop generalizing. The procedure may be considered a heuristic search for the HMM structure with the highest posterior probability. We discuss a variety of possible priors for HMMs, as well as a number of approximations which improve the computational efficiency of the algorithm. We studied three applications to evaluate the procedure. The first compares the merging algorithm with the standard Baum-Welch approach in inducing simple finite-state languages from small, positive-only training samples. We found that the merging procedure is more robust and accurate, particularly with a small amount of training data. The second application uses labelled speech data from the TIMIT database to
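
The merging loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: the model is stored as transition and emission count tables, the likelihood is approximated from those counts along the merged sample paths, the prior is a simple stand-in that charges a fixed `penalty` per distinct transition and emission, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

START, END = "START", "END"  # non-emitting boundary states (an assumption)

def initial_model(samples):
    """Maximum-likelihood start: one private chain of states per sample."""
    trans, emit = Counter(), Counter()
    next_id = 0
    for sample in samples:
        prev = START
        for symbol in sample:
            q = next_id
            next_id += 1
            trans[(prev, q)] += 1
            emit[(q, symbol)] += 1
            prev = q
        trans[(prev, END)] += 1
    return trans, emit

def merge(model, keep, drop):
    """Merge state `drop` into state `keep` by pooling their counts."""
    trans, emit = model
    rename = lambda q: keep if q == drop else q
    new_trans, new_emit = Counter(), Counter()
    for (a, b), c in trans.items():
        new_trans[(rename(a), rename(b))] += c
    for (q, symbol), c in emit.items():
        new_emit[(rename(q), symbol)] += c
    return new_trans, new_emit

def log_likelihood(table):
    """Count-based log likelihood with maximum-likelihood parameters."""
    totals = Counter()
    for (q, _), c in table.items():
        totals[q] += c
    return sum(c * math.log(c / totals[q]) for (q, _), c in table.items())

def log_posterior(model, penalty=1.0):
    """Unnormalized log posterior; the structure prior is a stand-in."""
    trans, emit = model
    log_prior = -penalty * (len(trans) + len(emit))
    return log_prior + log_likelihood(trans) + log_likelihood(emit)

def emitting_states(model):
    return sorted({q for (q, _) in model[1]})

def best_first_merge(samples, penalty=1.0):
    """Greedily apply the best merge until the posterior would decrease."""
    model = initial_model(samples)
    score = log_posterior(model, penalty)
    while True:
        candidates = [merge(model, a, b)
                      for a, b in combinations(emitting_states(model), 2)]
        if not candidates:
            return model
        best = max(candidates, key=lambda m: log_posterior(m, penalty))
        best_score = log_posterior(best, penalty)
        if best_score < score:
            return model
        model, score = best, best_score
```

For the input `["ab", "ab"]`, the two duplicate chains collapse into a single two-state chain, and the final merge of the two remaining states is rejected because it lowers the score. Evaluating every pair of states at each step is quadratic in the number of states; the sketch makes no attempt at the efficiency approximations the abstract mentions.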

Citations

4364 Elements of Information Theory – Cover, Thomas - 1991
4345 Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
2771 Introduction to Automata Theory, Languages and Computation – Hopcroft, Ullman - 1979
675 A Bayesian method for the induction of probabilistic networks from data – Cooper, Herskovits - 1992
629 Error bounds for convolutional codes and an asymptotically optimum decoding algorithm – Viterbi - 1967
545 An introduction to hidden Markov models – Rabiner, Juang - 1986
518 Estimation of probabilities from sparse data for the language model component of a speech recognizer – Katz - 1987
415 A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains – Baum, Petrie, et al. - 1970
391 Class-Based n-gram Models of Natural Language – Brown, Pietra, et al. - 1992
374 Mixture densities, maximum likelihood and the EM algorithm – Redner, Walker - 1984
317 Connectionist Speech Recognition: A Hybrid Approach – Bourlard, Morgan - 1994
283 A practical part-of-speech tagger – Cutting, Kupiec, et al. - 1992
280 A universal prior for integers and estimation by minimum description length – Rissanen - 1983
265 Inferring decision trees using the minimum description length principle – Quinlan, Rivest - 1989
256 Inductive inference: theory and methods – Angluin, Smith - 1983
248 The estimation of stochastic context-free grammars using the inside-outside algorithm – Lari, Young - 1990
236 Interpolated Estimation of Markov Source Parameters from Sparse Data – Jelinek, Mercer - 1980
213 AutoClass: A Bayesian classification system – Cheeseman, Kelly, et al. - 1988
213 Inside-outside reestimation from partially bracketed corpora – Pereira, Schabes - 1992
156 Estimation and inference by compact coding – Wallace, Freeman - 1987
131 Theory refinement on Bayesian networks – Buntine - 1991
109 Hidden Markov model induction by Bayesian model merging – Stolcke, Omohundro - 1993
104 Learning classification trees – Buntine - 1993
67 The Power of Amnesia – Ron, Singer, et al. - 1994
57 Bayesian inductive inference and maximum entropy – Gull - 1988
50 A study of grammatical inference – Horning - 1969
44 Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database – Garofolo - 1988
43 A statistical model for generating pronunciation networks – Riley - 1991
28 The Berkeley Restaurant Project – Jurafsky, Wooters, et al. - 1994
27 Best-first model merging for dynamic learning and recognition – Omohundro - 1992
19 Identification of contextual factors for pronunciation networks – Chen - 1990
19 Bayesian learning of Gaussian mixture densities of hidden Markov models – Gauvain, Lee - 1991
13 Hidden Markov models in molecular biology: new algorithms and applications – Baldi, Chauvin, et al. - 1993
13 Dynamic construction of finite automata from examples using hill-climbing – Tomita - 1982
9 Dynamic programming inference of Markov networks from finite set of sample strings – Thomason, Granum - 1986
2 Mechanisms of Implicit Learning. A Parallel Distributed Processing Model of Sequence Acquisition – Cleeremans - 1991
2 Learning automata from ordered examples. Machine Learning 7:109-138 – Feldman - 1991
2 Implicit learning of artificial grammars – Reber - 1969