N. Abe and M. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.
....# defined by M # ij = log i log j is singular if m 2. Hence, so is any matrix obtained by multiplying one or more rows of M # by scalars. 5 Conclusion When choosing how to compare two probability distributions, one important consideration is how efficiently a given distance can be computed. In [1] a number of commonly used measures of distances between probability distributions are listed: the χ², variation, quadratic, Hellinger and Kullback-Leibler distances. In [13] we show how to compute the quadratic distance between the probability distributions of two hidden Markov models in ....
N. Abe and M. K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.
....[pseudocode for fictitiousMarkovGame garbled in extraction] Table 5.8: Approximating the value of a Markov game by fictitious play. for each s ∈ S. However, the value of this game is V(s) so if we knew V, there would be no point in ....
....[pseudocode for fictitiousMarkovGame garbled in extraction] Table 5.8: Approximating the value of a Markov game by fictitious play. for each s ∈ S. However, the value of this game is V(s) so if we knew V, there would be no point in solving this ....
[Article contains additional citation context not shown here]
N. Abe and M. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.
....as a separate misclassification. • We call S = S [ S0 the input sample of MinDis(H) or simply (but somewhat ambiguously) the sample. • If C is a concept class, we say that S is C-legal if C contains a concept c which is consistent with S. The following result has been already observed in [2] (although the authors state it in a technically slightly different setting). Theorem 2.2 If RP ≠ NP and MinDis(H) restricted to C-legal input samples is NP-hard, then C is not PAO learnable by H. Proof. We show that a PAO learning algorithm L for C; H can be converted into a Monte Carlo ....
....It is however not quite clear which additional tools are available for the purposes of neural learning. Our notions of learnability are not new, except for limited degradation and heuristical learning with constant reject rate. Robust learning has already been used by Abe, Takeuchi and Warmuth in [1, 2]. Their notion of robust learning includes also stochastic rules (without a deterministic distinction between positive and negative examples) and allows cost measures different from the prediction error. Notice however that negative results are stronger for the more specific situation, whereas ....
[Article contains additional citation context not shown here]
N. Abe and M. K. Warmuth, On the computational complexity of approximating distributions by probabilistic automata, in "Proceedings of the 3rd Annual Workshop on Computational Learning Theory, 1990," pp. 52--66.
....# defined by M # ij = log i log j is singular if m 2. Hence, so is any matrix obtained by multiplying one or more rows of M # by scalars. 5 Conclusion When choosing how to compare two probability distributions, one important consideration is how efficiently a given distance can be computed. In [1] a number of commonly used distance measures between probability distributions are listed: the χ², variation, quadratic, Hellinger and Kullback-Leibler distances. In [13] we show how to compute the quadratic distance between the probability distributions of two HMMs in polynomial time. The ....
N. Abe and M. K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.
....learning according to the prediction model that is optimal to within a factor of 1 + o(1). 1 Introduction Many important applied problems can be modeled as learning from random examples. Examples include text categorization [21], handwritten character recognition [15, 4, 26], speech recognition [2, 1], and virtual circuit holding times in IP over ATM networks [20, 17, 14]. In this paper, we present improved sample complexity bounds according to two fundamental learning models. 1.1 Improved bounds for Haussler's learning model. Haussler [9], building on the work of Valiant [23], Vapnik [24] ....
N. Abe and M. K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9(2--3):205--260, 1992.
....described a polynomial learning algorithm [7]. Similar models were studied in information theory as well ([6], [8], [9]). Other models in the learning theory literature include Hidden Markov Models [5] and probabilistic finite automata. They both have severe theoretical drawbacks. Abe and Warmuth [1] proved that Hidden Markov Models are not learnable in time polynomial in the alphabet size unless RP=NP. Kearns et al. [4] proved that Probabilistic Finite Automata are not learnable in the PAC sense [2] either, unless noisy parity functions are learnable. We will proceed as follows. In Section 2 we will ....
N. Abe and M. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.
....to extensive mathematical treatment, have critical drawbacks for practical use. The Markov chain model suffers from exponential growth in the number of states for a non-trivial memory length, and poor source approximation at low order. The HMMs family suffers from known learnability hardness results [Abe and Warmuth 1992, Gillman and Sipser 1994] and consequently, the derived model is not guaranteed to be optimal (this may suggest that for diverged families a high quality multiple alignment of the input sequences is required to obtain a reliable model). The probabilistic suffix trees are inspired by the same ....
Abe, N. and Warmuth, M. (1992). On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9, 205-260.
....when such knowledge is available [PGS], we could combine our prefetcher with a logical prefetcher based on the semantics of the application, so as to get the best of both worlds. Similar techniques for cache replacement appear in [FKL]. The framework of Abe and Warmuth [AbW], who investigated a quite different learning problem related to FSAs, suggests a static PAC learning framework for prefetching, in which the prefetcher is trained on several independently generated sequences of a particular length. A harder model is to assume that the prefetcher is trained on one ....
N. Abe and M. Warmuth, "On the Computational Complexity of Approximating Distributions by Probabilistic Automata," UCSC, UCSC-CRL-90-63, December 1990.
....We have constructed a universal prefetcher P based on the Ziv-Lempel data compression algorithm that prefetches optimally in the limit for almost all sequences emitted by a Markov source. Some practical issues regarding prefetching are addressed in Section 6. The framework of Abe and Warmuth [AbW], who investigated a quite different learning problem related to FSAs, has led us to propose a static PAC learning framework for prefetching, in which the prefetcher is trained on several independently generated sequences of a particular length generated by a source, and the prefetcher should ....
N. Abe & M. Warmuth, "On the Computational Complexity of Approximating Distributions by Probabilistic Automata," UCSC, UCSC-CRL-90-63, December 1990.
....to extensive mathematical treatment, have critical drawbacks for practical use. The Markov chain model suffers from exponential growth in the number of states for a non-trivial memory length, and poor source approximation at low order. The HMMs family suffers from known learnability hardness results [Abe & Warmuth 1992, Gillman & Sipser 1994] and consequently, the derived model is not guaranteed to be optimal (this may suggest that for diverged families a high quality multiple alignment of the input sequences is required to obtain a reliable model). The probabilistic suffix trees are inspired by the same ....
Abe, N. & Warmuth, M. (1992). On the computational complexity of approximating distributions by probabilistic automata. Machine Learning 9, 205-260.
....and normally is used in practice. Other ways to estimate the model parameters are: clustering techniques, underlying facts (e.g. the use of cryptography in character recognition) or simple heuristic considerations. A rigorous performance criterion for training algorithms of HMMs can be found in [1], whose main result is that, under certain conditions, any finite class of HMM constraints is polynomially trainable. Useful results can also be found in [12]. Given that the internal parameters are all defined, the evaluation problem can be solved by using the forward-backward ....
N. Abe and M. K. Warmuth, "On the computational complexity of approximating distributions by probabilistic automata," Machine Learning, vol. 9, pp. 205--260, 1992.
....challenge. A plausible and well-defined model of the statistical dependencies among the hidden variables is however not in general sufficient, since the problem of setting the corresponding conditional probabilities from observable linguistic material is in most cases computationally intractable (Abe & Warmuth, 1992). Nevertheless, those intractability results have not precluded significant algorithmic and experimental progress with carefully designed model classes and learning methods such as EM and variants, especially in speech processing (Baum & Petrie, 1966; Baker, 1979). In particular, the learning ....
Abe, N., & Warmuth, M. (1992). On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9, 205-260.
....and extensions of HMMs discussed here also include language models [38, 39, 13], econometrics [14, 15, 40], time series [41], and signal processing. An analysis of the sample and computational complexity of approximating a distribution using an HMM or a probabilistic automaton has been done [42] using tools from the PAC learning paradigm [43]. See also [44] for an analysis of the case of hidden Markov chains with deterministic emissions, which shows that some classes of Markovian learning problems are hard while others are polynomial in the number of samples required. In the case of ....
....the choice of a model class that fits well to the data distribution and the efficiency of training such a model. Similar trade-offs (between generality of the model and intractability of the learning algorithm) have been described for variants of HMMs and other finite state learning algorithms in [42, 44, 45]. 8 Challenges for Future Research Hidden Markov models are powerful models of sequential data which have already been successfully used in several applications, notably speech recognition. They could be applied in many other domains. Many extensions and related models have been proposed in ....
N. Abe and M. Warmuth, "On the computational complexity of approximating distributions by probabilistic automata," Machine Learning, vol. 9, July 1992.
....on X is assumed. It is often called parameter estimation when specific parametric probability models are used. One example of this in the computational learning theory literature is the recent investigation of Abe and Warmuth into the complexity of learning the parameters in a hidden Markov model [1]. Our purpose here is twofold. First, we propose an extension of the PAC model, based on the work of Vapnik and Chervonenkis [129] and Pollard [104, 106] that addresses these and other issues. Second, we use this extension to obtain distribution independent upper bounds on the size of the ....
....considerable insight into the particular problem domain. Finally, the third practical problem is the computational complexity of the method we use to produce our decision rule from the training examples. This issue has been addressed extensively in the PAC literature, and is also addressed in [66, 1]. Of these three important issues, here we examine only the first. This issue is referred to as the problem of estimating the sample complexity of the learning problem in the PAC literature [40]. The number of random training examples needed to avoid overfitting depends critically on the nature ....
[Article contains additional citation context not shown here]
N. Abe and M. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. In Proceedings of the 3rd Workshop on Computational Learning Theory, pages 52--66. Morgan Kaufmann, 1990.
....is equivalent to a stochastic regular grammar used as string generator. Thus, $\sum_{x \in \Sigma^*} P_A(x) = 1$. Note that some works on the learning of discrete distributions use distributions defined on $\Sigma^n$ (that is, $\sum_{x \in \Sigma^n} P(x) = 1$ for any $n \geq 1$) instead of $\Sigma^*$ (see for instance [1, 7]). The probabilistic automaton $A/\pi$ denotes the automaton derived from the probabilistic automaton $A$ with respect to the partition $\pi$ of $Q$, also called the quotient automaton $A/\pi$. It is obtained by merging states of $A$ belonging to the same subset in $\pi$. When $q$ results from the merging of the states ....
N. Abe and M. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.
No context found.
N. Abe and M. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.
No context found.
N. Abe and M. K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.
No context found.
N. Abe and M. K. Warmuth. 1992. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260.
No context found.
Abe, N., and Warmuth, M. K. 1992. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning 9:205--260.
No context found.
N. Abe and M. Warmuth, "On the computational complexity of approximating distributions by probabilistic automata," in Proceedings of the Third Workshop on Computational Learning Theory. Morgan Kaufmann, 1990, pp. 52--66.
No context found.
N. Abe and M. Warmuth, "On the computational complexity of approximating distributions by probabilistic automata," Machine Learning, vol. 9, pp. 205--260, 1992.
No context found.
N. Abe and M. Warmuth, "On the computational complexity of approximating distributions by probabilistic automata," in Proceedings of the Third Workshop on Computational Learning Theory. Morgan Kaufmann, 1990, pp. 52--66.
No context found.
Naoki Abe and Manfred K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9(2--3):205--260, 1992.
No context found.
N. Abe and M. Warmuth, "On the Computational Complexity of Approximating Distributions by Probabilistic Automata", Machine Learning 9, 205-260, 1992.
No context found.
N. Abe and M. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205--260, 1992.