| R. El-Yaniv, S. Fine, and N. Tishby. Agnostic classification of Markovian sequences. Advances in Neural Information Processing Systems, 10:465-471, 1997. |
....subsection 4) for Gaussian components with diagonal covariances 16. The author calls Q a stochastic equivalence predicate. He is interested in distance learning, does not apply his method to kernel machines, and does not give a Bayesian interpretation. OTHER REL. WORK: Minka [14], El-Yaniv et al. [3], Tipping [25], Rattray [17]. We have presented a general framework for kernel learning and described a powerful yet efficient implementation for high-dimensional semi-supervised learning. Some preliminary ideas for future work are collected in section D of the appendix. Furthermore, we will apply ....
Ran El-Yaniv, Shai Fine, and Naftali Tishby. Agnostic classification of Markovian sequences. In Advances in NIPS 10. MIT Press, 1997.
....denote the corresponding measure by $D_{JS}$. This measure is symmetric and ranges between 0 and 1, where the score for identical distributions is 0. It is proportional to the minus logarithm of the probability that the two empirical distributions represent samples from the same (common) source (El-Yaniv et al. 1997). While a statistical measure estimating the probability that two distributions represent the same source distribution seems appropriate for the comparison of profiles, a major ingredient is ignored: the a priori probability of the source distribution. This information can help to assess the ....
El-Yaniv, R., Fine, S. & Tishby, N. (1997). Agnostic classification of Markovian sequences. Advances in Neural Information Processing Systems 10, 465-471.
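The properties quoted in the context above (symmetry, a range of 0 to 1, and a score of 0 for identical distributions) can be checked numerically. The following is a minimal sketch, not the citing authors' code. It assumes equal weights of 1/2 on the two distributions and base-2 logarithms, which is one common convention that yields the 0-to-1 range. The function names `kl_divergence` and `js_divergence` are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits. Terms with p[i] == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Equal-weight Jensen-Shannon divergence in bits (assumed convention)."""
    # The mixture is strictly positive wherever p or q is, so KL stays finite.
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


if __name__ == "__main__":
    a = [0.7, 0.2, 0.1]
    b = [0.1, 0.3, 0.6]
    print(js_divergence(a, a))                       # identical distributions
    print(js_divergence(a, b), js_divergence(b, a))  # same value in both orders
    print(js_divergence([1.0, 0.0], [0.0, 1.0]))     # disjoint supports
```

Under these assumptions the divergence reaches its upper bound only when the two distributions have disjoint supports.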
....denote the corresponding measure by $D_{JS}$. This measure is symmetric and ranges between 0 and 1, where the score for identical distributions is 0. It is proportional to the minus logarithm of the probability that the two empirical distributions represent samples from the same (common) source [El-Yaniv et al. 1997]. While a statistical measure estimating the probability that two distributions represent the same source distribution seems appropriate for the comparison of profiles, a major ingredient is ignored: the a priori probability of the source distribution. This information can help to assess the signi ....
El-Yaniv, R., Fine, S. & Tishby, N. (1997). Agnostic classification of Markovian sequences. Advances in Neural Information Processing Systems 10, 465-471.
No context found.
R. El-Yaniv, S. Fine, and N. Tishby. Agnostic classification of Markovian sequences. Advances in Neural Information Processing Systems, 10:465-471, 1997.
.... T ) has been discussed by Lin [11] as the Jensen-Shannon divergence $D_{JS}$ among the distributions P(W). Unlike the Kullback-Leibler divergence [2] (the standard choice for measuring dissimilarity among distributions), the Jensen-Shannon divergence is symmetric and bounded (see also [12]). Moreover, $D_{JS}$ can be used to bound other measures of similarity, such as the optimal or Bayesian probability of identifying correctly the origin of a sample. We find that information about identity is accumulating at a more or less constant rate well before the undersampling limits of the ....
El-Yaniv, R., Fine, S. & Tishby, N. Agnostic classification of Markovian sequences, NIPS 10 pp. 465-471 (MIT Press, 1997).
.... ) $I(T; Y)$ and represent each $x$ by $p(x, y)$. The greedy merging criterion is known from the Agglomerative Information Bottleneck (AIB) algorithm [14, 17]. Specifically, in this context we get $d(x, t) = (p(x) + p(t)) \cdot JS(p(y|x), p(y|t))$ (4), where $JS(p, q)$ is the Jensen-Shannon divergence [8, 4], defined as $JS(p, q) = \pi_1 KL(p \| \bar{p}) + \pi_2 KL(q \| \bar{p})$, where in our context $\{p, q\} = \{p(y|x), p(y|t)\}$, $\{\pi_1, \pi_2\} = \{\frac{p(x)}{p(x) + p(t)}, \frac{p(t)}{p(x) + p(t)}\}$, and $\bar{p} = \pi_1 p(y|x) + \pi_2 p(y|t)$ (5). Notice that any given partition $T$ defines some membership ("hard") probability $p(t|x)$, which in turn defines $p(y|t)$ ....
R. El-Yaniv, S. Fine, and N. Tishby. Agnostic classification of Markovian sequences. In Advances in Neural Information Processing (NIPS-97), pages 465-471, 1997.
....information loss. In AIB the clusters that are merged at each step are those which minimize the following distance, $d_{i,j} = (\pi_i + \pi_j) \, JS_{\pi_i, \pi_j}[p(N|F=i), p(N|F=j)]$ (4), where $\pi_i = p(F = i)$ is the a priori probability of being in cluster $i$, and JS denotes the Jensen-Shannon divergence [7][4], a natural information theoretic distance between distributions defined as $JS_{\pi_1, \pi_2}[p_1, p_2] = H[\pi_1 p_1 + \pi_2 p_2] - \pi_1 H[p_1] - \pi_2 H[p_2]$ (5). Movements are more likely to fall into the same partition (cluster) if the distributions $p(N|m_1)$ and $p(N|m_2)$ are close under the JS ....
.... $H[\pi_1 p_1 + \pi_2 p_2] - \pi_1 H[p_1] - \pi_2 H[p_2]$ (5). Movements are more likely to fall into the same partition (cluster) if the distributions $p(N|m_1)$ and $p(N|m_2)$ are close under the JS divergence, which means that they have a high likelihood of being generated by the same statistical source [4]. Since the algorithm proceeds by merging partitions, it produces mappings, $F$, with cardinality ranging from $|M|$ to one. As the cardinality of $F$ grows, $I(F; N)$ increases but $I(F; M)$ also increases. The cardinality of $F$ should be chosen to be the minimum value that still captures a significant ....
R. El-Yaniv, S. Fine, and N. Tishby. Agnostic classification of Markovian sequences. In Advances in neural information processing systems (Vol. 11). 1997.
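The AIB merging criterion quoted in the contexts above can be illustrated with a small sketch. This is not the citing authors' implementation. It assumes that the weights inside the JS term are the cluster priors renormalized to sum to one, and that logarithms are base 2. All function names (`entropy`, `weighted_js`, `merge_cost`, `cheapest_merge`) are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def weighted_js(w1, p1, w2, p2):
    """JS divergence in entropy form: H[w1*p1 + w2*p2] - w1*H[p1] - w2*H[p2]."""
    mixture = [w1 * a + w2 * b for a, b in zip(p1, p2)]
    return entropy(mixture) - w1 * entropy(p1) - w2 * entropy(p2)


def merge_cost(prior_i, p_i, prior_j, p_j):
    """Cost of merging clusters i and j: (pi_i + pi_j) * JS.

    Assumption: the JS weights are the priors renormalized to sum to one.
    """
    total = prior_i + prior_j
    return total * weighted_js(prior_i / total, p_i, prior_j / total, p_j)


def cheapest_merge(priors, conditionals):
    """Return (cost, i, j) for the pair of clusters that is cheapest to merge."""
    best = None
    for i in range(len(priors)):
        for j in range(i + 1, len(priors)):
            cost = merge_cost(priors[i], conditionals[i],
                              priors[j], conditionals[j])
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best


if __name__ == "__main__":
    priors = [0.4, 0.3, 0.3]
    conditionals = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
    print(cheapest_merge(priors, conditionals))
```

In this toy input the first two clusters share the same conditional distribution, so merging them loses essentially no information and that pair is selected.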
....important information theoretic measure of the class conditional distributions $p(x|y_i)$ called the Jensen-Shannon divergence. This measure plays an important role in our context. The Jensen-Shannon divergence of $M$ class distributions $p_i(x)$, each with a prior $\pi_i$, $1 \le i \le M$, is defined as [6, 4] $JS_\pi[p_1, p_2, \ldots, p_M] = H[\sum_{i=1}^{M} \pi_i p_i(x)] - \sum_{i=1}^{M} \pi_i H[p_i(x)]$ (5), where $H[p(x)]$ is Shannon's entropy, $H[p(x)] = -\sum_x p(x) \log p(x)$. The concavity of the entropy and Jensen's inequality guarantee the non-negativity of the JS divergence. 1.2 The hard clustering limit ....
R. El-Yaniv, S. Fine, and N. Tishby. Agnostic classification of Markovian sequences. In Advances in Neural Information Processing (NIPS'97), 1998.
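The citing papers listed on this page write the Jensen-Shannon divergence in two forms: as a difference of entropies, and as a weighted sum of Kullback-Leibler divergences to the mixture. The short derivation below is a sketch showing that the two agree. The symbol $\bar{p}$ for the mixture and the assumption that the priors $\pi_i$ sum to one are notation chosen here for illustration.

```latex
\begin{align*}
\bar{p}(x) &= \sum_{i=1}^{M} \pi_i \, p_i(x), \qquad \sum_{i=1}^{M} \pi_i = 1, \\
\sum_{i=1}^{M} \pi_i \, KL(p_i \,\|\, \bar{p})
  &= \sum_{i=1}^{M} \pi_i \sum_x p_i(x) \log p_i(x)
   - \sum_{i=1}^{M} \pi_i \sum_x p_i(x) \log \bar{p}(x) \\
  &= -\sum_{i=1}^{M} \pi_i \, H[p_i] - \sum_x \bar{p}(x) \log \bar{p}(x) \\
  &= H[\bar{p}] - \sum_{i=1}^{M} \pi_i \, H[p_i].
\end{align*}
```

Since each Kullback-Leibler term is non-negative, this also gives the non-negativity of the divergence directly.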
.... ) has been discussed by Lin [11] as the Jensen-Shannon divergence $D_{JS}$ among the distributions $P_i(W)$. Unlike the Kullback-Leibler divergence [2] (the standard choice for measuring dissimilarity among distributions), the Jensen-Shannon divergence is symmetric and bounded (see also [12]). Moreover, $D_{JS}$ can be used to bound other measures of similarity, such as the optimal or Bayesian probability of identifying correctly the origin of a sample. We find that information about identity is accumulating at a more or less constant rate well before the undersampling limits of the ....
El-Yaniv, R., Fine, S. & Tishby, N. Agnostic classification of Markovian sequences, NIPS 10 pp. 465-471 (MIT Press, 1997).
No context found.
El-Yaniv, R., Fine, S. & Tishby, N. (1998) Agnostic Classification of Markovian Sequences. In M. Jordan et al., eds., Neural Information Processing Systems 10, MIT Press.