The power of amnesia: learning probabilistic automata with variable memory length (1996) [116 citations, 14 self-citations]

by Dana Ron, Yoram Singer
Machine Learning
http://www.cs.huji.ac.il/~singer/papers/psa.ps.gz

Abstract:

We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis output by the learning algorithm can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first, we apply the algorithm to construct a model of the English language and use this model to correct corrupted text. In the second, we construct a simple stochastic model for E. coli DNA.
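The abstract's central idea, that the next symbol depends on a context whose length varies, can be illustrated with a small variable-memory Markov predictor. The sketch below is not the authors' PSA learning algorithm and carries none of its guarantees: it is a simplified suffix-growing heuristic, and the function names and the thresholds `min_count`, `ratio`, and `gamma_min` are hypothetical. A longer context is kept only when its smoothed next-symbol distribution differs from that of its parent (the context with its oldest symbol dropped) by at least a fixed ratio, and prediction uses the longest stored suffix of the history. `kl_divergence` computes the KL-divergence, the error measure named in the abstract, but only between two single next-symbol distributions, not between whole processes.

```python
from collections import Counter, defaultdict
import math


def context_counts(text, max_depth):
    """Count next-symbol occurrences after every context of length 0..max_depth."""
    counts = defaultdict(Counter)  # context string -> Counter of following symbols
    for i in range(len(text)):
        for d in range(min(max_depth, i) + 1):
            counts[text[i - d:i]][text[i]] += 1
    return counts


def next_dist(counter, alphabet, gamma_min):
    """Empirical next-symbol distribution, smoothed so every symbol gets >= gamma_min."""
    total = sum(counter.values())
    scale = 1.0 - len(alphabet) * gamma_min  # assumes len(alphabet) * gamma_min < 1
    return {a: scale * counter[a] / total + gamma_min for a in alphabet}


def learn_vmm(text, alphabet, max_depth=3, min_count=5, ratio=1.5, gamma_min=0.01):
    """Hypothetical simplified learner: grow contexts backwards from the empty one,
    keeping a child only if it is frequent and changes some prediction by >= ratio.
    Only kept contexts are extended, so the stored set is closed under suffixes."""
    counts = context_counts(text, max_depth)
    model = {"": next_dist(counts[""], alphabet, gamma_min)}
    frontier = [""]
    while frontier:
        parent = frontier.pop()
        if len(parent) == max_depth:
            continue
        for a in alphabet:
            child = a + parent  # extend the context one symbol further into the past
            c = counts.get(child)
            if not c or sum(c.values()) < min_count:
                continue
            dist = next_dist(c, alphabet, gamma_min)
            p = model[parent]
            if any(max(dist[s] / p[s], p[s] / dist[s]) >= ratio for s in alphabet):
                model[child] = dist
                frontier.append(child)
    return model


def predict(model, history, max_depth):
    """Return the distribution of the longest suffix of history stored in the model."""
    for d in range(min(max_depth, len(history)), -1, -1):
        ctx = history[len(history) - d:]
        if ctx in model:
            return model[ctx]


def kl_divergence(p, q):
    """KL(p || q) in bits for two distributions given as dicts over one alphabet."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


def avg_log_loss(model, text, max_depth):
    """Average negative log2-probability per symbol of text (lower is better)."""
    loss = 0.0
    for i in range(len(text)):
        loss -= math.log2(predict(model, text[:i], max_depth)[text[i]])
    return loss / len(text)


if __name__ == "__main__":
    model = learn_vmm("ab" * 50, "ab")
    print(sorted(model))  # prints ['', 'a', 'b']
```

On the strictly alternating string `"ab" * 50`, the sketch keeps only the contexts `""`, `"a"`, and `"b"`: one symbol of memory already determines the next symbol, so the longer contexts `"ab"` and `"ba"` add nothing and are pruned. Unlike the algorithm analyzed in the paper, this sketch never explores a context whose parent was rejected.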

Citations (each cited work is preceded by its CiteSeer citation count)

4364 Elements of Information Theory – Cover, Thomas - 1991
4344 Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
2103 A tutorial on hidden Markov models and selected applications in speech recognition – Rabiner - 1989
1397 Dynamic Programming – Bellman - 1957
545 An introduction to hidden Markov models – Rabiner, Juang - 1986
481 Compression of individual sequences via variable-rate coding – Ziv, Lempel - 1978
415 A maximization technique occurring in statistical analysis of probabilistic functions of Markov chains – Baum, Petrie, et al. - 1970
361 An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov Processes – Baum - 1972
279 Self-Organized Language Modelling for Speech Recognition – Jelinek - 1985
224 Practical Prefetching via Data Compression – Curewitz, Krishnan, et al. - 1993
207 Prediction and entropy of printed English – Shannon - 1951
122 Learning decision trees using the Fourier spectrum – Kushilevitz, Mansour - 1993
110 A universal data compression system – Rissanen - 1983
71 On the computational complexity of approximating distributions by probabilistic automata – Abe, Warmuth - 1992
67 The Power of Amnesia – Ron, Singer, et al. - 1994
64 On the learnability of discrete distributions – Kearns, Mansour, et al. - 1994
60 Conductance and convergence of Markov chains: A combinatorial treatment of expanders (FOCS) – Mihail - 1989
58 A fast sequential decoding algorithm using a stack – Jelinek - 1969
58 The context tree weighting method: Basic properties – Willems, Shtarkov, et al. - 1995
55 Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process (The Annals of Applied Probability) – Fill - 1991
53 Markov Source Modeling of Text Generation – Jelinek - 1985
51 Complexity of strings in the class of Markov sources – Rissanen - 1986
49 On the learnability and usage of acyclic probabilistic finite automata – Ron, Singer, et al. - 1998
47 A hidden Markov model that finds genes in E. coli DNA – Krogh, Mian, et al. - 1994
42 Efficient learning of typical finite automata from random walks – Freund, Kearns, et al. - 1993
33 Optimal Prediction for Prefetching in the Worst Case – Krishnan, Vitter - 1994
33 Discrete sequence prediction and its applications – Laird - 1994
32 Estimation of probabilities in the language model of the IBM speech recognition system – Nadas - 1984
28 A Sequential Algorithm for the Universal Coding of Finite Memory Sources – Weinberger, Lempel, et al. - 1992
23 Learning and robust learning of product distributions – Höffgen - 1993
18 Part-of-speech tagging using a variable memory Markov model – Schütze, Singer - 1994
11 Inference and minimization of hidden Markov chains – Gillman, Sipser - 1994
4 Error bounds for convolutional codes and an asymptotically optimal decoding algorithm – Viterbi - 1967
3 Maps, genes, sequences, and computers: An Escherichia coli case study - 1993
3 An adaptive cursive handwriting recognition system – Singer, Tishby - 1995
2 Statistics of language: Introduction – Good - 1969
1 Applications of DAWGs to data compression – Blumer