Naoki Abe, Jun-ichi Takeuchi, and Manfred K. Warmuth. Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence. In Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 277--289, August 1991.
....distance metric by showing that the reverse is true. In particular, we prove the following lemma in the Appendix. Lemma 2 A class of probability distributions over the domain $\{0, 1\}^n$ that is PAC learnable under the variation distance metric is PAC learnable under the KL distance measure. The lemma is proved using a method related to the $\epsilon$-Bayesian shift of Abe and Warmuth [3]. Note that the result requires a discrete domain of support for the target distribution, such as the domain $\{0, 1\}^n$ which we use here. The rest of this section is organised as follows: Subsection 1.1 discusses previous work related to the General Markov Model of Evolution, and the relationship ....
N. Abe and M.K. Warmuth, Polynomial Learnability of Probabilistic Concepts with Respect to the Kullback-Leibler Divergence, Proceedings of the 1992.
....2, N: $d_{KL}(p, q) = \sum_{i=1}^{N} p_i \ln(p_i / q_i)$ (8), where $N$ is the number of possible outcomes. Indeed, if $\hat{p}$ is the empirically observed distribution for data samples $s_i$, $1 \le i \le M$, and $h$ is a hypothesis (candidate probability distribution for the underlying true distribution) then [1] $d_{KL}(\hat{p}, h) = \sum_i \hat{p}(s_i) \ln(\hat{p}(s_i) / h(s_i)) = \sum_i \hat{p}(s_i) \ln(\hat{p}(s_i)) - \sum_i \hat{p}(s_i) \ln(h(s_i))$ (9). Therefore, minimizing the KL distance with respect to the empirically observed distribution is equivalent to finding the maximum likelihood solution $h^*$ of $\sum_i \ln(h(s_i))$. Since the ....
N. Abe, J. Takeuchi, and M. Warmuth, "Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence," in Proceedings of the 1991.
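The equivalence stated in the context above (minimizing the KL distance from the empirical distribution selects the maximum likelihood hypothesis) can be checked numerically. The following is a minimal sketch, assuming a finite outcome set and a small hand-picked set of candidate hypotheses. The function names, the sample data, and the candidates are illustrative assumptions and do not come from the cited papers.

```python
import math
from collections import Counter


def empirical_distribution(samples):
    """Relative frequency of each outcome in the sample (hypothetical helper)."""
    counts = Counter(samples)
    m = len(samples)
    return {s: n / m for s, n in counts.items()}


def kl_divergence(p, h):
    """d_KL(p, h) = sum_s p(s) * ln(p(s) / h(s)), over outcomes with p(s) > 0."""
    return sum(p_s * math.log(p_s / h[s]) for s, p_s in p.items() if p_s > 0)


def log_likelihood(samples, h):
    """Sum of ln h(s_i) over the observed samples."""
    return sum(math.log(h[s]) for s in samples)


# Illustrative data: six samples over three outcomes (an assumption).
samples = ["a", "a", "b", "a", "c", "b"]

# Illustrative candidate hypotheses, each a full distribution (an assumption).
candidates = {
    "uniform": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
    "skewed": {"a": 0.5, "b": 0.3, "c": 0.2},
    "wrong": {"a": 0.1, "b": 0.1, "c": 0.8},
}

p_hat = empirical_distribution(samples)

# The candidate closest to p_hat in KL distance ...
best_by_kl = min(candidates, key=lambda k: kl_divergence(p_hat, candidates[k]))
# ... is the same one that maximizes the log-likelihood of the samples.
best_by_ll = max(candidates, key=lambda k: log_likelihood(samples, candidates[k]))

print(best_by_kl, best_by_ll)
```

The two selections agree because the KL distance differs from the negated average log-likelihood only by a term that depends on the empirical distribution alone, not on the hypothesis.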
....e.g. [11, 14, 18, 23, 32]. We have chosen to use the empirical KL divergence $D(\hat{f}_N \| f)$ instead of $D(f \| \hat{f}_N)$ since the former is finite (with probability 1) and therefore simplifies the asymptotic expansion. Results similar to ours can be obtained for $D(f \| \hat{f}_N)$ by use of bounded approximations [1] for the divergence measure. For the OOBN learning to be meaningful, we will initially assume that the domain is in fact object oriented, such that the CPTs of one instantiation of a class are identical to the corresponding CPTs of any other instantiation of that class. We call this the OO ....
....made us choose the BMA setup. To examine the effect of the BMA setup more closely, we performed a simple example with a class containing only one binary variable $X$. The class has two instantiations, with $P(X = 1) = (1 + \delta)/2$ in the first instantiation, and $P(X = 1) = (1 - \delta)/2$ in the other; $\delta \in [0, 1]$ defines the difference between the two instantiations. Note that the OO assumption is violated as long as $\delta \neq 0$. We calculated the degree of belief in the model to be object oriented by using equation (12). The results are shown for different data sizes in figure 10. The calculation scheme is ....
N. Abe, M.K. Warmuth and J. Takeuchi, Polynomial learnability of probabilistic concepts with respect to the Kullback--Leibler divergence, in: Proceedings of the 4th Annual Workshop on Computational Learning Theory (COLT 1991.
....It is however not quite clear which additional tools are available for the purposes of neural learning. Our notions of learnability are not new, except for limited degradation and heuristical learning with constant reject rate. Robust learning has already been used by Abe, Takeuchi and Warmuth in [1, 2]. Their notion of robust learning includes also stochastic rules (without a deterministic distinction between positive and negative examples) and allows cost measures different from the prediction error. Notice however that negative results are stronger for the more specific situation, whereas ....
N. Abe, J. Takeuchi, and M. K. Warmuth, Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence, in "Proceedings of the 4th Annual Workshop on Computational Learning Theory, 1991," pp. 277--289.
....sense, as $D(f \| g) = D(g \| f)$ does not hold in general. The term is here used in the everyday meaning of the phrase. .... asymptotic expansion. Results similar to ours can be obtained for $D(f \| \hat{f}_N)$ by use of bounded approximations [1] for the divergence measure. For the OOBN learning to be meaningful, we will initially assume that the domain is in fact object oriented, such that the CPTs of one instantiation of a class are identical to the corresponding CPTs of any other instantiation of that class. We call this the OO ....
Naoki Abe, Manfred K. Warmuth, and Jun-ichi Takeuchi. Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence. In Proceedings of the Fourth Annual Workshop on Computational Learning Theory (COLT 1991), pages 277--289, San Mateo, CA, 1991. Morgan Kaufmann.
....$\mathrm{cov}(x, y')\,\mathrm{cov}(y, x')$. Therefore, either $|\mathrm{cov}(x, y')| \ge (3\epsilon_2/4)$ or $|\mathrm{cov}(y, x')| \ge (3\epsilon_2/4)$ must hold. In either case, we have connected $x$ and $y$. $\Box$ We can now use these Lemmas to prove that V(M;M 0 ) 2. Proof of Lemma 2.47: We begin by showing that there is a set of edges $\{e[1], \ldots, e[t]\}$ in $M$ such that for every $e[i]$ and every pair of leaves $x$ and $y$ whose connecting path contains $e[i]$, $|\mathrm{cov}(x, y)| \le 3\epsilon_2/4$. Disconnecting all of the $e[i]$ edges gives a partition of the leaf set of $M$. Every related set $C$ is the union of one or more of the sets in this ....
....in KL distance. This result is not as strong as the result in the opposite direction, because the proof depends on altering the original hypothesis returned by the learning algorithm for variation distance. The Lemma is proved using a method related to the Bayesian shift of Abe and Warmuth [1]. Lemma 2.52 When the hypothesis class for a learning problem is general enough, a class of probability distributions over the domain $\{0, 1\}^n$ that is PAC learnable under the variation distance metric is also PAC learnable under the KL distance measure. Proof: Let $K$ be a polynomial in three ....
N. Abe, J. Takeuchi, and M. K. Warmuth. "Polynomial Learnability of Probabilistic Concepts with Respect to the Kullback-Leibler Divergence". In Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 277--289, (1991).
....Corollary 1. After iteration $T$, the sample error rate of GeoLev's master hypothesis is bounded by $\prod_{t=1}^{T} \left(1 - \frac{r_t^2}{\|h_t\|_2^2\, m} \sin^2(\theta_t)\right)$ (24). The recurrence of Theorem 2 is somewhat difficult to analyze, but we can apply the following lemma from Abe et al. [1]. Lemma 4. Consider a sequence $\{g_t\}$ of non-negative numbers satisfying $g_{t+1} \le g_t - c g_t^2$, where $c > 0$ is a positive constant. If $f_t = \frac{1}{c\left(t + \frac{1}{g_0 c}\right)}$, then $g_t \le f_t$ for all $t \in \mathbb{N}$. Given a lower bound $r$ on the $r_t$ values and an upper bound $H_2$ on $\|h_t\|_2$, then we can ....
N. Abe, J. Takeuchi, and M. K. Warmuth. Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence. In Proc. 4th Annu. Workshop on Comput. Learning Theory, pages 277--289, San Mateo, CA, 1991. Morgan Kaufmann.
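The lemma quoted in the context above bounds any non-negative sequence with $g_{t+1} \le g_t - c g_t^2$ by $f_t = 1 / (c(t + 1/(g_0 c)))$. The following is a minimal numeric sketch of that statement, assuming the extreme case where the recurrence holds with equality and assuming $c g_0 \le 1$ so that the sequence stays non-negative. The function names and parameter values are illustrative and not taken from the cited papers.

```python
def recurrence_sequence(g0, c, steps):
    """Iterate g_{t+1} = g_t - c * g_t**2, the extreme case of the inequality."""
    seq = [g0]
    for _ in range(steps):
        g = seq[-1]
        seq.append(g - c * g * g)
    return seq


def closed_form_bound(g0, c, t):
    """f_t = 1 / (c * (t + 1 / (g0 * c))), the bound stated in the lemma."""
    return 1.0 / (c * (t + 1.0 / (g0 * c)))


def bound_holds(g0, c, steps, tol=1e-12):
    """Check g_t <= f_t for t = 0..steps, with a small floating-point tolerance."""
    seq = recurrence_sequence(g0, c, steps)
    return all(g <= closed_form_bound(g0, c, t) + tol for t, g in enumerate(seq))


# Illustrative parameters (assumptions): g0 = 0.5, c = 1.0, 100 iterations.
print(bound_holds(0.5, 1.0, 100))
```

The check works because $1/g_{t+1} \ge 1/g_t + c$ whenever $0 < c g_t < 1$, so $1/g_t$ grows at least linearly in $t$, which is exactly what $f_t$ encodes.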
....by Theorem 2.3, Yamanishi [31] shows that an equivalent problem is to find, in polynomial time with high probability and for given $\epsilon$, a hypothesis $h$ such that $E_{x \in D}\left[\left(\sqrt{h(x)} - \sqrt{c(x)}\right)^2\right] \le \epsilon$. This quantity is known as the Hellinger distance. Finally, Abe, Takeuchi and Warmuth [1] have shown that all of these problems are equivalent (modulo polynomial time computation) to the problem of finding, with high probability and for given $\epsilon$, a hypothesis with small Kullback-Leibler divergence, i.e. a hypothesis $h$ for which $E_{x \in D}\; c(x) \lg \frac{c(x)}{h(x)} + (1 - c(x))$ ....
Naoki Abe, Jun-ichi Takeuchi, and Manfred K. Warmuth. Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence. In Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 277--289, August 1991.
....lead to NP-hard minimization problems). It seems however that the empirical estimations, actually used in our paper, minimize the empirical log loss of the $D[x|y]$ function and, therefore, also the empirical log loss of the $D[y|x]$ function. Using methods like the $\epsilon$-Bayesian shift (see [ATW91]) and a conversion of Kullback-Leibler divergence to the variation or Hellinger distance (described in [Yam90]), it seems that pab decidability for our applications can be obtained from the general methods, however with sample sizes being much larger than those obtained from our elementary ....
N. Abe, J. Takeuchi, and M. Warmuth. Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence. In The Workshop on Computational Learning Theory, pages 277--289. Morgan Kaufmann, San Mateo, CA, 1991.
....lemma in the Appendix. Lemma 2 A class of probability distributions over the domain $\{0, 1\}^n$ that is PAC learnable under the variation distance metric is PAC learnable under the KL distance measure. The lemma is proved using a method related to the $\epsilon$-Bayesian shift of Abe and Warmuth [3]. Note that the result requires a discrete domain of support for the target distribution, such as the domain $\{0, 1\}^n$ which we use here. The rest of this section is organised as follows: Subsection 1.1 discusses previous work related to the General Markov Model of Evolution, and the relationship ....
N. Abe and M.K. Warmuth, Polynomial Learnability of Probabilistic Concepts with Respect to the Kullback-Leibler Divergence, Proceedings of the 1992 Conference on Computational Learning Theory, (1992) 277--289.
....distributions mentioned in the introduction.[10] All our sample complexity bounds with respect to the Kullback-Leibler divergence rely on Hoeffding's inequality and thus grow with $1/\epsilon^2$. Is the $1/\epsilon^2$ growth in the sample complexity really necessary? [Footnote 10: This has recently been done in [ATW91] for various classes of probabilistic concepts with respect to both the Kullback-Leibler divergence and the quadratic distance.] We showed in Section 6 that s-state HMMs can easily be simulated by s-state PAs. How can HMMs be used to simulate PAs? There are many open problems related to the ....
N. Abe, J. Takeuchi, and M. K. Warmuth. Polynomial learnability of probabilistic concepts with respect to the Kullback-Leibler divergence. In Proceedings of the 1991 Workshop on Computational Learning Theory. Morgan Kaufmann, San Mateo, California, August 1991.