65 citations found.
J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1): 145--151, 1991.


This paper is cited in the following contexts:


Kernel-Based Object Tracking - Comaniciu, Ramesh, Meer (2003)   (16 citations)  (Correct)

.... the distance between two discrete distributions as $d(y) = \sqrt{1 - \rho[\hat{p}(y), \hat{q}]}$ (6), where we chose $\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u}$ (7), the sample estimate of the Bhattacharyya coefficient between p and q [43]. The Bhattacharyya coefficient is a divergence-type measure [49] which has a straightforward geometric interpretation. It is the cosine of the angle between the m-dimensional unit vectors $(\sqrt{\hat{p}_1}, \ldots, \sqrt{\hat{p}_m})^\top$ and $(\sqrt{\hat{q}_1}, \ldots, \sqrt{\hat{q}_m})^\top$. The fact that p and q are distributions is thus explicitly taken into account by representing them on the unit hypersphere. At the same time, we can interpret ....

J. Lin, "Divergence Measures Based on the Shannon Entropy," IEEE Trans. Information Theory, vol. 37, pp. 145-151, 1991.
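
The geometric reading in the excerpt above can be checked numerically. The following is a minimal sketch, assuming the usual definition of the Bhattacharyya coefficient as the sum over bins of sqrt(p_u * q_u) and the derived distance sqrt(1 - rho); the function names and the example histograms are illustrative and are not taken from the cited paper.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum over bins of sqrt(p_u * q_u) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """d = sqrt(1 - rho); the clamp guards against tiny negative round-off."""
    rho = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - rho))


def cosine_of_sqrt_vectors(p, q):
    """Cosine of the angle between (sqrt(p_1), ...) and (sqrt(q_1), ...)."""
    a = [math.sqrt(x) for x in p]
    b = [math.sqrt(x) for x in q]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-bin histograms, each summing to 1.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
rho = bhattacharyya_coefficient(p, q)
cosine = cosine_of_sqrt_vectors(p, q)
```

Because each histogram sums to 1, its square-root vector has unit length, so `rho` and `cosine` agree up to floating-point error.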


An Algorithm for Data-Driven Bandwidth Selection - Dorin Comaniciu   (Correct)

....mean of the bandwidth matrices computed at x H #1 h 3 where the weights jH jH j . D. Comaniciu is with the Real-Time Vision and Modeling Department, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540. E-mail: comanici@scr.siemens.com. Manuscript received 18 Mar. 2002; revised 19 July 2002; accepted 25 July 2002. Recommended for acceptance by S. Sclaroff. For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number 116109. 1. The terms bandwidth and scale will be considered ....

....(12) the sparse data needs attention. The local sample size should be sufficiently large for inference. The approach we take is based on the Effective Sample Size [10] which computes the kernel weighted count of the number of points in each window ESSx;H i1 KH KH 0 #0 20 ; 0 ; 18 Using the binomial rule of thumb, we cancel the inference when ESSx; H 5. 4.5 Bandwidth Selection Examples A first example for a bimodal data set generated with equal probability from 1 is presented in Fig. 4. The standard deviation for each distribution (measured before amalgamating the ....

[Article contains additional citation context not shown here]

J. Lin, "Divergence Measures Based on the Shannon Entropy," IEEE Trans. Information Theory, vol. 37, pp. 145-151, 1991.


Enhanced Word Clustering for Hierarchical Text Classification - Dhillon, Mallela, Kumar   (4 citations)  (Correct)

....word clusters at a high computational cost. In this paper, we first derive a global criterion that captures the optimality of word clustering in an information-theoretic framework. This leads to an objective function for clustering that is based on the generalized Jensen-Shannon divergence [20] among an arbitrary number of probability distributions. In order to find the best word clustering, i.e. the clustering that minimizes this objective function, we present a new divisive algorithm for clustering words. This algorithm is reminiscent of the k-means algorithm but uses Kullback-Leibler ....

....; p2) 1. In contrast, the Jensen-Shannon (JS) divergence between $p_1$ and $p_2$, defined by $JS_\pi(p_1, p_2) = \pi_1 KL(p_1, \pi_1 p_1 + \pi_2 p_2) + \pi_2 KL(p_2, \pi_1 p_1 + \pi_2 p_2) = H(\pi_1 p_1 + \pi_2 p_2) - \pi_1 H(p_1) - \pi_2 H(p_2)$, where $\pi_1 + \pi_2 = 1$, $\pi_i \ge 0$, is clearly a measure that is symmetric in $\{\pi_1, p_1\}$ and $\{\pi_2, p_2\}$, and is bounded [20]. The JS divergence can be generalized to measure the distance between any finite number of probability distributions as: $JS_\pi(\{p_i : 1 \le i \le n\}) = H\left(\sum_{i=1}^{n} \pi_i p_i\right) - \sum_{i=1}^{n} \pi_i H(p_i)$ (1), which is symmetric in the $\{\pi_i, p_i\}$'s ($\sum_i \pi_i = 1$, $\pi_i \ge 0$). Let Y be another random variable with ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1), 1991.
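
The weighted Jensen-Shannon divergence discussed in the excerpt above has two standard forms: the entropy of the mixture minus the weighted entropies, and the weighted sum of Kullback-Leibler divergences to the mixture. The following is a minimal sketch that computes both and lets a reader confirm they agree. The function names, the choice of base-2 logarithms, and the example distributions are assumptions made for illustration, not details from any of the cited papers.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits (q > 0 wherever p > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mixture(dists, weights):
    """Weighted average of the distributions, bin by bin."""
    return [sum(w * d[k] for w, d in zip(weights, dists))
            for k in range(len(dists[0]))]


def js_entropy_form(dists, weights):
    """H(sum_i w_i p_i) - sum_i w_i H(p_i)."""
    m = mixture(dists, weights)
    return entropy(m) - sum(w * entropy(d) for w, d in zip(weights, dists))


def js_kl_form(dists, weights):
    """sum_i w_i KL(p_i || mixture); zero-weight components are skipped."""
    m = mixture(dists, weights)
    return sum(w * kl(d, m) for w, d in zip(weights, dists) if w > 0)


# Hypothetical example: two distributions with equal weights.
p1 = [0.5, 0.5, 0.0]
p2 = [0.0, 0.5, 0.5]
js_h = js_entropy_form([p1, p2], [0.5, 0.5])
js_k = js_kl_form([p1, p2], [0.5, 0.5])
```

Both functions accept any finite number of distributions, which corresponds to the generalized form; with base-2 logarithms and two distributions the value lies between 0 and 1 bit.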


Link Analysis Ranking Algorithms Theory And Experiments - Borodin, Roberts.. (2004)   (Correct)

....5 38.56 6157 5 shakespeare 4383 3660 1247 13575 2 3.70 1199 6 table tennis 1948 1489 803 5465 2 3.67 745 6 weather 8011 6464 2852 34672 3 5.36 2775 9 vintage cars 3460 2044 1920 12796 3 6.26 1580 5 Table 1: Query statistics .... entries is 1, and then taking the Jensen-Shannon divergence [36] of the two distributions. This corresponds to the information we lose about the entries of the vectors if we merge them [50]. Any other hierarchical algorithm for clustering binary vectors would also be applicable. Executing the algorithm on the rows of the matrix produces a tree, where each node ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1): 145--151, 1991.


Noise and Information in Neural Codes - Schneidman   (Correct)

....flies would generate the word W . Thus, we measure how well we can discriminate between one individual and a mixture of all the other individuals in the ensemble, or effectively how far each individual is from the mean of her conspecifics. The measure I T (W ; identity) has been discussed by Lin [101] as the Jensen-Shannon divergence D JS among the distributions P (W ) namely D JS (P (W ) P (W ) P (W ). We recall that the problem of finding a measure of similarity among distributions is not simple; obvious choices such as the Kullback-Leibler [35] divergence are not ....

....that the problem of finding a measure of similarity among distributions is not simple; obvious choices such as the Kullback-Leibler [35] divergence are not symmetric, and may have spurious technical requirements such as absolute continuity of one distribution with respect to the others. Lin [101] and Guttman [58] proposed $D_{JS}$ as a way of getting around these difficulties, and showed that $D_{JS}$ can be used to bound other measures of similarity, such as the optimal or Bayesian probability of identifying correctly the origin of a sample (as in forced-choice psychophysical discrimination ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.


Kernel-Based Object Tracking - Comaniciu, Ramesh, Meer (2003)   (16 citations)  (Correct)

.... define the distance between two discrete distributions as $d(y) = \sqrt{1 - \rho[\hat{p}(y), \hat{q}]}$ (6), where we chose $\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u}$ (7), the sample estimate of the Bhattacharyya coefficient between p and q [43]. The Bhattacharyya coefficient is a divergence-type measure [49] which has a straightforward geometric interpretation. It is the cosine of the angle between the m-dimensional unit vectors $(\sqrt{\hat{p}_1}, \ldots, \sqrt{\hat{p}_m})^\top$ and $(\sqrt{\hat{q}_1}, \ldots, \sqrt{\hat{q}_m})^\top$. The fact that p and q are distributions is thus explicitly taken into account by representing them on the unit ....

J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Information Theory, vol. 37, pp. 145--151, 1991.


Construction of a Shared Secret Key Using Continuous Variables - Cardinal, Van Assche (2003)   (Correct)

.... $D_k = \int (\langle P, \log P \rangle - \langle P, \log f_k \rangle)\, g(P)\, dP = -E_k[H(P)] - \langle E_k[P], \log f_k \rangle$ (13), but since $f_k = E_k[P]$ we obtain $D_k = H(E_k[P]) - E_k[H(P)]$ (14). When $g(\cdot)$ is actually a probability mass function, this expression is known as the generalized Jensen-Shannon divergence [17]. We conclude that minimization of the average K-L divergence or the average Jensen-Shannon divergence within a cluster are similar problems. E Quantization with ideal correction rate constraint Instead of fixing K 2 K = f1; 2; Ng, we can simply let K = N and solve the constrained ....

J. LIN, Divergence measures based on the Shannon entropy, IEEE Trans. Inform. Theory, 37 (1991), pp. 145--151.


Enhanced Word Clustering for Hierarchical Text Classification - Dhillon, Mallela, Kumar (2002)   (4 citations)  (Correct)

....sub-optimal word clusters at a high computational cost. In this paper, we first derive a global criterion that captures the optimality of word clustering in an information-theoretic framework. This leads to an objective function for clustering that is based on the generalized Jensen-Shannon divergence [20] among an arbitrary number of probability distributions. In order to find the best word clustering, i.e. the clustering that minimizes this objective function, we present a new divisive algorithm for clustering words. This algorithm is reminiscent of the k-means algorithm but uses Kullback-Leibler ....

....KL(p#;p #) #. In contrast, the Jensen-Shannon (JS) divergence between $p_1$ and $p_2$, defined by $JS_\pi(p_1, p_2) = \pi_1 KL(p_1, \pi_1 p_1 + \pi_2 p_2) + \pi_2 KL(p_2, \pi_1 p_1 + \pi_2 p_2) = H(\pi_1 p_1 + \pi_2 p_2) - \pi_1 H(p_1) - \pi_2 H(p_2)$, where $\pi_1 + \pi_2 = 1$, $\pi_i \ge 0$, is clearly a measure that is symmetric in $\{\pi_1, p_1\}$ and $\{\pi_2, p_2\}$, and is bounded [20]. The JS divergence can be generalized to measure the distance between any finite number of probability distributions as: $JS_\pi(\{p_i : 1 \le i \le n\}) = H\left(\sum_{i=1}^{n} \pi_i p_i\right) - \sum_{i=1}^{n} \pi_i H(p_i)$ (1), which is symmetric in the $\{\pi_i, p_i\}$'s ($\sum_i \pi_i = 1$, $\pi_i \ge 0$). Let Y be another random variable with probability ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1), 1991.


On Fitting Mixture Models - Figueiredo, Leitão, Jain (1999)   (5 citations)  (Correct)

.... m 1 and m 2 as (m 1 #m 2 ) arg min (i#j) bff i ff j ) fY (yj j ) # i # (17) s [f Y (yj j ) is the symmetric Kullback-Leibler (KL) divergence [12], the standard dissimilarity measure between probability densities [12]. The Jensen-Shannon divergence (see [13]) would be a natural candidate, because it allows weighting differently the two probability functions being compared; however, it does not have a closed-form expression for Gaussian densities and so we settled for the KL divergence. In the Gaussian case, the symmetric KL divergence is [12] yj ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Information Theory, 37:145--151, 1991.


CLUSEQ: Efficient and Effective Sequence Clustering - Yang, Wang (2003)   (1 citation)  (Correct)

....of sequences is to compute the difference between the corresponding conditional probability distributions. There have been many methods to assess the difference between two probability distributions, among which the variational distance and the Kullback-Leibler divergence are the most popular ones [18]. Given two probability distributions ($P_1$ and $P_2$) of the variable $x$, the variational distance is defined as $V(P_1, P_2) = \sum_x |P_1(x) - P_2(x)|$, while the Kullback-Leibler divergence is defined as $J(P_1, P_2) = I(P_1, P_2) + I(P_2, P_1) = \sum_x (P_1(x) - P_2(x)) \log \frac{P_1(x)}{P_2(x)}$, where is the ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. on Information Theory, vol. 37, no. 1, pp. 145-151, 1991.


Information Theoretic Feature Clustering for Text.. - Dhillon, Mallela, Kumar   (4 citations)  (Correct)

....word clusters at a high computational cost. In this paper, we first derive a global criterion that captures the optimality of word clustering in an information-theoretic framework. This leads to an objective function for clustering that is based on the generalized Jensen-Shannon divergence [21] among an arbitrary number of probability distributions. In order to find the best word clustering, i.e. the clustering that minimizes this objective function, we present a new divisive algorithm for clustering words. This algorithm is reminiscent of the k-means algorithm but uses Kullback-Leibler ....

.... between $p_1$ and $p_2$, defined by $JS_\pi(p_1, p_2) = \pi_1 KL(p_1, \pi_1 p_1 + \pi_2 p_2) + \pi_2 KL(p_2, \pi_1 p_1 + \pi_2 p_2) = H(\pi_1 p_1 + \pi_2 p_2) - \pi_1 H(p_1) - \pi_2 H(p_2)$, where $\pi_1 + \pi_2 = 1$, $\pi_i \ge 0$, is clearly a measure that is symmetric in $\{\pi_1, p_1\}$ and $\{\pi_2, p_2\}$, and is bounded [21]. The JS divergence can be generalized to measure the distance between any finite number of probability distributions as: $JS_\pi(\{p_i : 1 \le i \le n\}) = H\left(\sum_i \pi_i p_i\right) - \sum_i \pi_i H(p_i)$ (1), which is symmetric in $\{\pi_i, p_i\}$'s ($\sum_i \pi_i = 1$, $\pi_i \ge 0$). Let Y be another random variable with ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1):145-151, 1991.


Quantization with an Information-Theoretic Distortion Measure - Cardinal (2002)   (Correct)

.... $C_k$: $D_k = \int (\langle P, \log P \rangle - \langle P, \log f_k \rangle)\, g(P)\, dP$ (13) $= -E_k[H(P)] - \langle E_k[P], \log f_k \rangle$ (14), but since $f_k = E_k[P]$ we obtain $D_k = H(E_k[P]) - E_k[H(P)]$ (15). When $g(\cdot)$ is actually a probability mass function, this expression is known as the generalized Jensen-Shannon divergence [7]. We conclude that minimization of the average K-L divergence or the average Jensen-Shannon divergence within a cluster are similar problems. 5 Rate-Constrained Quantization Just as in standard vector quantization [2] we can define a rate-constrained algorithm that does not restrict the range ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1):145-151, January 1991.


Empirical Multiresolution Models Applicable to Gray-Level.. - García-Salinas, al. (1997)   (Correct)

.... therefore, with a loss of validity for reaching specific objectives in image processing. In order to assess the degree to which the model corresponds to a well-defined structure, the Jensen-Shannon (JS) divergence measure between probability distributions has been used [5] (other measures are in [3, 4]). This measure shows the heterogeneity in the set of gray-level histograms. The (unweighted) JS divergence is . where $H(p) = -\sum_k p_k \log p_k$ is the Shannon entropy. Applied to model columnsL js(3 ( 3( 2) 3 ) m(Qm, m, l( Nm,t, y , H( t) 3) i=1 There ....

J. Lin, "Divergence measures based on the Shannon entropy", IEEE Trans. Inform. Theory, vol. 37, no. 1, January 1991, pp. 145-151.


Universality and Individuality in a Neural Code - Schneidman, Brenner, Tishby, ..   (Correct)

.... i, P is the probability that fly i will generate (at some time) the word W in response to the stimulus movie, and P (W ) is the probability that any fly in the whole ensemble of flies would generate this word, W ) W ) 2) The measure I(W identity; T ) has been discussed by Lin [11] as the Jensen-Shannon divergence $D_{JS}$ among the distributions P (W ). Unlike the Kullback-Leibler divergence [2] (the standard choice for measuring dissimilarity among distributions) the Jensen-Shannon divergence is symmetric, and bounded (see also [12]). Moreover, $D_{JS}$ can be used to ....

Lin, J., Divergence measures based on the Shannon entropy, IEEE Trans. Inf. Theory, 37, 145-151 (1991).


Unsupervised Document Classification Using Sequential.. - Slonim, Friedman, Tishby (2002)   (12 citations)  (Correct)

.... ) I(T ; Y ) and represent each x by p(x; y). The greedy merging criterion is known from the Agglomerative Information Bottleneck (AIB) algorithm [14, 17]. Specifically, in this context we get $d(x, t) = (p(x) + p(t)) \cdot JS(p(y|x), p(y|t))$ (4), where $JS(p, q)$ is the Jensen-Shannon divergence [8, 4] defined as $JS(p, q) = \pi_1 KL(p \| \bar{p}) + \pi_2 KL(q \| \bar{p})$, where in our context $\{p, q\} \equiv \{p(y|x), p(y|t)\}$, $\{\pi_1, \pi_2\} \equiv \{\frac{p(x)}{p(x)+p(t)}, \frac{p(t)}{p(x)+p(t)}\}$, $\bar{p} = \pi_1 p(y|x) + \pi_2 p(y|t)$ (5). Notice that any given partition T defines some membership ("hard") probability $p(t|x)$ which in turn defines $p(y|t)$ ....

J. Lin. Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information theory, 37(1):145-151, 1991.


Iterative Double Clustering for Unsupervised And.. - El-Yaniv, Souroujon (2001)   (Correct)

.... I( S, T ) The reduction in I( S, T ) due to a merge of two clusters $s_i$ and $s_j$ is shown to be $(p(s_i) + p(s_j))\, D_{JS}[p(t|s_i), p(t|s_j)]$ (1), where, for any two distributions p(x) and q(x) with priors $\pi_p$ and $\pi_q$, $\pi_p + \pi_q = 1$, $D_{JS}[p(x), q(x)]$ is the Jensen-Shannon divergence (see [7, 4]): $D_{JS}[p(x), q(x)] = \pi_p D_{KL}(p \| \frac{p+q}{2}) + \pi_q D_{KL}(q \| \frac{p+q}{2})$. Here, $\frac{p+q}{2}$ denotes the distribution $(p(x) + q(x))/2$ and $D_{KL}(\cdot \| \cdot)$ is the Kullback-Leibler divergence [3]. This agglomerative algorithm is of course only locally optimal, since at each step it greedily merges the two most similar ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.


Measuring HMM Similarity with the Bayes Probability of.. - Bahlmann, Burkhardt (2001)   (4 citations)  (Correct)

....as in (2) and (3) D (Tg k , Tg k ) D (Tg k , Tg k ) N 1 [R R Nd[ xt, i) nn D, k) 6) The component which has to be redefined is the local distance measure (4) of the now two pdfs. Several choices are possible, e.g. $\chi^2$, Kullback-Leibler [5] or Jensen-Shannon divergence [8]. However, we want to interpret the model similarities in the context of experimentally gained classification errors. A measure for the probability of a classification error in a two-class problem is the Bayes probability of error (or Bayes error) [12] Pe (P (x) P2 (x) min rp (x) r2P2 ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory, 37(1):145-151, Jan. 1991.


Statistical Analysis and Modeling of Brain cells' Activity - Gat (1999)   (Correct)

....Recently, more attention has been given to the empirical problem, and the method presented in this section is one of the optimal implementations of such a technique. The technique described here was first shown by Gutman [48] and is also referred to as the Jensen-Shannon (JS) divergence [64]. Using this technique, a statistic is computed. It has been shown [48] that this statistic is not only optimal in a certain sense, but will also, given enough training and testing data, produce error probabilities with the same exponential rate as the likelihood ratio classifier that ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Info. Th., 37:145-151, 1991.


Measuring HMM Similarity with the Bayes Probability of.. - Bahlmann, Burkhardt (2001)   (4 citations)  (Correct)

....very similar way as in (2) and (3) i) R lk (i) 5) 6) The component which has to be redefined is the local distance measure (4) of the now two pdfs. Several choices are possible, e.g. $\chi^2$, Kullback-Leibler [5] or Jensen-Shannon divergence [8]. However, we want to interpret the model similarities in the context of experimentally gained classification errors. A measure for the probability of a classification error in a two-class problem is the Bayes probability of error (or Bayes error) [12] P e (p 1 (x) p 2 (x) # 1 p 1 (x) ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1):145--151, Jan. 1991.


Enhanced Word Clustering for Hierarchical Text Classification - Dhillon, Mallela, Kumar (2002)   (4 citations)  (Correct)

....word clusters at a high computational cost. In this paper, we first derive a global criterion that captures the optimality of word clustering in an information-theoretic framework. This leads to an objective function for clustering that is based on the generalized Jensen-Shannon divergence [20] among an arbitrary number of probability distributions. In order to find the best word clustering, i.e. the clustering that minimizes this objective function, we present a new divisive algorithm for clustering words. This algorithm is reminiscent of the k-means algorithm but uses Kullback-Leibler ....

....2 ) 1. In contrast, the Jensen-Shannon divergence between $p_1$ and $p_2$, defined by $JS_\pi(p_1, p_2) = \pi_1 KL(p_1, \pi_1 p_1 + \pi_2 p_2) + \pi_2 KL(p_2, \pi_1 p_1 + \pi_2 p_2) = H(\pi_1 p_1 + \pi_2 p_2) - \pi_1 H(p_1) - \pi_2 H(p_2)$, where $\pi_1 + \pi_2 = 1$, $\pi_i \ge 0$, is clearly a symmetric measure and is bounded [20]. The Jensen-Shannon divergence can be generalized to measure the distance between any finite number of probability distributions as: $JS_\pi(\{p_i : 1 \le i \le n\}) = H\left(\sum_{i=1}^{n} \pi_i p_i\right) - \sum_{i=1}^{n} \pi_i H(p_i)$ (1), which is symmetric in the $p_i$'s ($\sum_i \pi_i = 1$, $\pi_i \ge 0$). Let Y be another random variable ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1):145-151, 1991.


Agnostic Classification of Markovian Sequences - El-Yaniv, Fine, Tishby (1998)   (Correct)

....$(p_x, p_y)$ of two samples x and y, with empirical distributions $p_x$ and $p_y$ respectively, is defined as $d(x, y) \equiv D_\lambda(p_x, p_y) = \lambda D_{KL}(p_x \| M) + (1 - \lambda) D_{KL}(p_y \| M)$ (4), where M is the mixture of $p_x$ and $p_y$. The function $D_\lambda(p, q)$ is an extension of the Jensen-Shannon divergence [Lin91] and satisfies many useful analytic properties, such as symmetry and boundedness on both sides by the $L_1$ norm, in addition to its clear statistical meaning. See [Lin91, EFT97] for a more complete discussion of this measure. 2.2 Estimating the $D_\lambda$ similarity measure. The key component of our ....

....$(p_y \| M)$ (4), where M is the mixture of $p_x$ and $p_y$. The function $D_\lambda(p, q)$ is an extension of the Jensen-Shannon divergence [Lin91] and satisfies many useful analytic properties, such as symmetry and boundedness on both sides by the $L_1$ norm, in addition to its clear statistical meaning. See [Lin91, EFT97] for a more complete discussion of this measure. 2.2 Estimating the $D_\lambda$ similarity measure. The key component of our classification method is the estimation of $D_\lambda$ for individual finite sequences, without an explicit model distribution. Since cross entropies, $D_{KL}$, express code length differences, ....

Lin, Jianhua. 1991. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151.


An Information Divergence Measure For Isar Image Registration - He, Hamza, Krim (2001)   (2 citations)  (Correct)

....check that the Jensen-Rényi divergence is nonnegative for $\alpha \in (0, 1)$. It is also symmetric and vanishes if and only if the probability distributions $p_1, p_2, \ldots, p_n$ are equal, for all $\alpha > 0$. When $\alpha \to 1$, the Jensen-Rényi divergence is exactly the generalized Jensen-Shannon divergence [7]. Unlike other entropy-based divergence measures such as the well-known Kullback-Leibler divergence, the Jensen-Rényi divergence has the advantage of being symmetric and generalizable to any finite number of probability distributions, with a possibility of assigning weights to these ....

J. Lin, "Divergence Measures Based on the Shannon Entropy," IEEE Trans. Information Theory, vol. 37, no. 1, pp. 145--151, 1991.


On Fitting Mixture Models - Figueiredo, Leitão, Jain (1999)   (5 citations)  (Correct)

.... (i;j) n (bff i b ff j ) D s h fY (yj b i ) fY (yj b j ) i ; i 6= j o ; 17) where D s [f Y (yj b i ) fY (yj b j ) is the symmetric Kullback-Leibler (KL) divergence [12], the standard dissimilarity measure between probability densities [12]. The Jensen-Shannon divergence (see [13]) would be a natural candidate, because it allows weighting differently the two probability functions being compared; however, it does not have a closed-form expression for Gaussian densities and so we settled for the KL divergence. In the Gaussian case, the symmetric KL divergence is [12] D s ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Information Theory, 37:145--151, 1991.


Agglomerative Multivariate Information Bottleneck - Slonim, Friedman, Tishby (2001)   (2 citations)  (Correct)

....only the probability distributions that involve $t^l_j$ and $t^r_j$ directly. Generalizing the results of [11] for the original IB, we now develop a closed-form formula for $L(t^l_j, t^r_j)$. To describe this result we need the following definition. The Jensen-Shannon (JS) divergence [7, 3] between two probabilities $p_1, p_2$ is given by $JS_\Pi(p_1, p_2) = \pi_1 KL[p_1 \| \bar{p}] + \pi_2 KL[p_2 \| \bar{p}]$, where $\Pi = \{\pi_1, \pi_2\}$ is a normalized probability and $\bar{p} = \pi_1 p_1 + \pi_2 p_2$. The JS divergence is equal to zero if and only if both its arguments are identical. It is upper bounded and ....

J. Lin. Divergence Measures Based on the Shannon Entropy. IEEE Trans. Info. Theory, 37(1):145-151, 1991.


The Power of Word Clusters for Text Classification - Slonim, Tishby (2001)   (8 citations)  (Correct)

....C) and I( W after ; C) are the information values before and after the merger, respectively. After a little algebra [17] one can see that $\delta I(w_i, w_j) = (p(w_i) + p(w_j)) \cdot D_{JS}[p(c|w_i), p(c|w_j)]$ (7), where the functional $D_{JS}$ is the Jensen-Shannon (JS) divergence (see [13], [4]) defined as $D_{JS}[p_i, p_j] = \pi_i D_{KL}[p_i \| \bar{p}] + \pi_j D_{KL}[p_j \| \bar{p}]$ (8), where in our case $\{p_i, p_j\} \equiv \{p(c|w_i), p(c|w_j)\}$, $\{\pi_i, \pi_j\} \equiv \{\frac{p(w_i)}{p(w)}, \frac{p(w_j)}{p(w)}\}$, $\bar{p} = \pi_i p(c|w_i) + \pi_j p(c|w_j)$ ....

J. Lin. Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information theory, 37(1):145--151, 1991.


Document Clustering using Word Clusters via the Information.. - Slonim, Tishby (2000)   (27 citations)  (Correct)

....X before ; Y ) and I( Xafter ; Y ) are the information values before and after the merger, respectively. After a little algebra [25] one can see that $\delta I(x_i, x_j) = (p(x_i) + p(x_j)) \cdot D_{JS}[p(y|x_i), p(y|x_j)]$ (6), where the functional $D_{JS}$ is the Jensen-Shannon (JS) divergence (see [13], [7]) defined as $D_{JS}[p_i, p_j] = \pi_i D_{KL}[p_i \| \bar{p}] + \pi_j D_{KL}[p_j \| \bar{p}]$ (7), where in our case $\{p_i, p_j\} \equiv \{p(y|x_i), p(y|x_j)\}$, $\{\pi_i, \pi_j\} \equiv \{\frac{p(x_i)}{p(x)}, \frac{p(x_j)}{p(x)}\}$, $\bar{p} = \pi_i p(y|x_i) + \pi_j p(y|x_j)$ (8). The JS divergence ....

....the $L_1$ norm (or the variational distance) defined as $L_1(p_i, p_j) = \sum_{y \in Y} |p_i(y) - p_j(y)|$ (9). Unlike the JS divergence, the $L_1$ norm is a distance measure satisfying all the metric properties, including the triangle inequality. It also approximates the JS divergence for close distributions [13]. Our second clustering algorithm therefore used the following distributional similarity measure: $d_{i,j} = (p(x_i) + p(x_j)) \cdot L_1(p(y|x_i), p(y|x_j))$ (10). Note that multiplication by the weight of the clusters to be merged is crucial. Otherwise there is a strong bias for assigning all ....

J. Lin. Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information theory, 37(1):145--151, 1991.


Using Unlabeled Data to Improve Text Classification - Nigam (2001)   (10 citations)  (Correct)

....disadvantage of vote entropy is that it does not consider the confidence of the committee members' classifications, as indicated by the class probabilities Pm (c j jd i ; from each member. As an improvement, we measure committee disagreement for each document using Jensen-Shannon divergence (Lin, 1991). Unlike vote entropy, which compares only the committee members' top-ranked class, Jensen-Shannon divergence measures the strength of the certainty of disagreement by calculating differences in the committee members' class distributions, $P_m(C|d_i)$. Each committee member produces a posterior ....

Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145--151.


Distributional Clustering of Movements Based on Neural.. - Globerson, Chechik, Tishby (2001)   (1 citation)  (Correct)

....information loss. In AIB the clusters that are merged at each step are those which minimize the following distance, $d_{i,j} = (\pi_i + \pi_j)\, JS_{\pi_i, \pi_j}[p(N|F=i), p(N|F=j)]$ (4), where $\pi_i = p(F = i)$ is the a priori probability of being in cluster i, and JS denotes the Jensen-Shannon divergence [7][4], a natural information-theoretic distance between distributions defined as $JS_{\pi_1, \pi_2}[p_1, p_2] = H[\pi_1 p_1 + \pi_2 p_2] - \pi_1 H[p_1] - \pi_2 H[p_2]$ (5). Movements are more likely to fall into the same partition (cluster) if the distributions $p(N|m_1)$ and $p(N|m_2)$ are close under the JS ....

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145-151, 1991.


Journal of Machine Learning Research 3 (2003) 1265-1287.. - Algorithm For Text   (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1): 145--151, 1991.


Active Feedback -- UIUC TREC-2003 HARD Experiments - Shen, Zhai (2003)   (Correct)

No context found.

Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.


Active Feedback in Ad Hoc Information Retrieval - Shen, Zhai (2005)   (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.


View Selection for Volume Rendering - Udeepta Bordoloi Han-Wei   (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. on Information Theory, 37(1):145--151, January 1991.


A Divisive Information-Theoretic Feature Clustering.. - Dhillon, Mallela, Kumar (2003)   (5 citations)  (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37(1): 145--151, 1991.


An Exploration of Formalized Information Retrieval Heuristics - Hui Fang Tao   (Correct)

No context found.

Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151.


Detecting Change in Data Streams - Kifer, Ben-David, Gehrke (2004)   (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.


Detecting Anomalous Human Interactions Using Laser Range-finders - Panangadan, al. (2004)   (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.


PhotoTOC: Automatic Clustering for Browsing Personal.. - Platt, Czerwinski, Field (2002)   (5 citations)  (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Trans. Info. Theory, 37(1):145--151, 1991.


Image Registration and Segmentation by Maximizing the.. - Departmentof..   (Correct)

No context found.

J. Lin, "Divergence Measures Based on the Shannon Entropy," IEEE Trans. Information Theory, vol. 37, no. 1, pp. 145-151, 1991.


Construction of a Shared Secret Key Using Continuous Variables - Cardinal, Van Assche (2003)   (Correct)

No context found.

J. LIN, Divergence measures based on the Shannon entropy, IEEE Trans. Inform. Theory, 37 (1991), pp. 145--151.


Compression of Side Information - Cardinal (2003)   (1 citation)  (Correct)

No context found.

J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Inform. Theory, vol. 37, no. 1, pp. 145--151, Jan. 1991.


Geodesic Object Representation and Recognition - Ben Hamza And (2003)   (2 citations)  (Correct)

No context found.

J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Information Theory, vol. 37, no. 1, pp. 145-151, 1991.


Comparing Dissimilarity Measures for Probabilistic Symbolic.. - Malerba, Monopoli   (Correct)

No context found.

Lin, J.: Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information theory, 37(1):145--151, 1991.


A Generalized Divergence Measure for Robust Image Registration - He, Hamza, Krim (2003)   (2 citations)  (Correct)

No context found.

J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Inform. Theory, vol. 37, pp. 145--151, Jan. 1991.


Document Clustering using Word Clusters - Via The Information   (Correct)

No context found.

J. Lin. Divergence Measures Based on the Shannon Entropy. IEEE Transactions on Information theory, 37(1):145--151, 1991.


A New Adaptive Algorithm for the - Polygonization Of Noisy (2001)   (Correct)

No context found.

Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, January 1991.


Kernel-Based Object Tracking - Comaniciu, Ramesh, Meer (2003)   (16 citations)  (Correct)

No context found.

J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Information Theory, vol. 37, pp. 145--151, 1991.


Stochastic Modeling for Efficient Computation of Information.. - Schreibman (2000)   (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37, No. 1:145-151, 1991.


Analysing Six Types of Protein-Protein Interfaces - Ofran, Rost (2003)   (Correct)

No context found.

Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Trans. Inform. Theory, 37, 145--151.


Two Applications of Information Complexity - Jayram, Kumar, Sivakumar (2003)   (1 citation)  (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.


An Information Statistics Approach to Data Stream.. - Bar-Yossef.. (2003)   (2 citations)  (Correct)

No context found.

J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145--151, 1991.


CiteSeer.IST - Copyright Penn State and NEC