
## Clustering with Bregman Divergences (2005)


### Download Links

- [www.lans.ece.utexas.edu]
- [www.cs.utexas.edu]
- [www.ideal.ece.utexas.edu]
- [staff.icar.cnr.it]
- [dns2.icar.cnr.it]
- [www2.cs.uh.edu]
- [www.lix.polytechnique.fr]
- [www.jmlr.org]
- [jmlr.csail.mit.edu]
- [people.ee.duke.edu]
- [jmlr.org]
- [hercules.ece.utexas.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Machine Learning Research

Citations: 435 (59 self)

### Citations

11690 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...iterative relocation scheme of Euclidean kmeans [14]. The popularity of this algorithm stems from its simplicity and scalability. The corresponding soft clustering algorithm, obtained by applying EM [9] to a mixture model of Gaussians with identical, isotropic covariances, is also popular and can be scaled to large data sets [6]. Underlying both hard and soft Euclidean kmeans is a Gaussian "noise" m...

5222 | Convex Analysis
- Rockafellar
- 1970
Citation Context ...convexity of ψ implies that ∇ψ is monotonic and it is possible to define the inverse function (∇ψ)⁻¹ : Θ∗ ↦ Θ, where Θ∗ = int(dom(ψ∗)). If the pair (Θ, ψ) is of Legendre type, then it can be shown (Rockafellar, 1970) that (Θ∗, ψ∗) is also of Legendre type, and (Θ, ψ) and (Θ∗, ψ∗) are called Legendre duals of each other. Further, the gradient mappings are continuous and form a bijection between the two open...

2968 | Some methods for classification and analysis of multivariate observations
- MacQueen
- 1967
Citation Context ...of parametric clustering problems have been developed over the years. Among the hard clustering algorithms, the most well-known is the iterative relocation scheme for the Euclidean kmeans algorithm (MacQueen, 1967; Jain and Dubes, 1988; Duda et al., 2001). Another widely used clustering algorithm with a similar scheme is the Linde-Buzo-Gray (LBG) algorithm (Linde et al., 1980; Buzo et al., 1980) based on the I...
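The iterative relocation scheme cited here carries over verbatim to arbitrary Bregman divergences, which is the paper's central observation: the assignment step uses d_φ, while the re-estimation step is always the plain arithmetic mean. A minimal sketch of that loop (our own illustration, not code from the paper; the function name and toy data are invented):

```python
import random

def bregman_hard_cluster(points, k, d_phi, iters=100, seed=0):
    """Kmeans-style iterative relocation for any Bregman divergence d_phi.
    The optimal representative of a cluster is its arithmetic mean
    regardless of which Bregman divergence is used."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: send each point to its nearest centroid under d_phi.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: d_phi(x, centroids[j]))
            clusters[j].append(x)
        # Re-estimation step: each centroid becomes the mean of its cluster.
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids

# Squared Euclidean distance recovers classical kmeans on a toy data set.
sq = lambda x, y: sum((a - b) ** 2 for a, b in zip(x, y))
pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
print(sorted(bregman_hard_cluster(pts, 2, sq)))
```

Swapping `sq` for KL-divergence makes the same loop cluster probability distributions, which is how the paper recovers information-theoretic clustering as a special case.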

2761 | Pattern Classification
- Duda, Hart, et al.
- 2001
Citation Context ...have been developed over the years. Among the hard clustering algorithms, the most well-known is the iterative relocation scheme for the Euclidean kmeans algorithm (MacQueen, 1967; Jain and Dubes, 1988; Duda et al., 2001). Another widely used clustering algorithm with a similar scheme is the Linde-Buzo-Gray (LBG) algorithm (Linde et al., 1980; Buzo et al., 1980) based on the Itakura-Saito distance, which has been use...

2749 | Algorithms for Clustering Data
- Jain, Dubes
- 1988
Citation Context ...algorithm for all Bregman divergences. 1 Introduction Data clustering is a fundamental "unsupervised" learning procedure that has been extensively studied across varied disciplines over several decades [14]. It has produced several parametric clustering methods which partition the data into a pre-specified number of partitions with a cluster representative corresponding to every cluster, such that a well...

2099 | Vector Quantization and Signal Compression
- Gersho, Gray
- 1992
Citation Context ...three broad and overlapping ideas. First, an information theoretic viewpoint of the clustering problem is invaluable. Such considerations occur in several techniques, from classical vector quantization (Gersho and Gray, 1992) to information theoretic clustering (Dhillon et al., 2003) and the information bottleneck method (Tishby et al., 1999). In particular, the information theoretic hard clustering (Dhillon et al., 2003...

1621 | An algorithm for vector quantizer design
- Linde, Buzo, et al.
- 1980
Citation Context ...simplicity and scalability of kmeans but can cater to a much larger class of distortion functions? A hint towards an affirmative answer to this question is provided by the Linde-Buzo-Gray (LBG) algorithm [17] based on the Itakura-Saito distance, which has been used in the signal-processing community for clustering speech data. The more recent information theoretic clustering algorithm [10] for clustering ...

1430 | The EM Algorithm and Extensions
- McLachlan, Krishnan
- 1996
Citation Context ...are assumed to be constant for the distributions. ...likelihood function, then the algorithm will converge to a local maximum of the likelihood. For a detailed proof and other related results, please see [18]. As stated earlier, the Bregman soft clustering problem is to estimate the maximum likelihood parameters for a mixture model of the form given in (4.8). Applying the EM algorithm to this problem give...
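The E-step of the resulting Bregman soft clustering algorithm is cheap to state in code: the posterior over mixture components depends on the data only through the Bregman divergence to each cluster representative. A hedged sketch (our own illustration; the function name and toy numbers are invented, and we use scalar squared loss for brevity):

```python
import math

def soft_assignments(x, priors, centroids, d_phi):
    """E-step of Bregman soft clustering: posterior p(h|x) is proportional
    to pi_h * exp(-d_phi(x, mu_h)), reflecting the paper's bijection with
    exponential-family mixture models."""
    w = [p * math.exp(-d_phi(x, mu)) for p, mu in zip(priors, centroids)]
    z = sum(w)
    return [wi / z for wi in w]

sq = lambda x, y: (x - y) ** 2   # scalar squared loss for illustration
post = soft_assignments(0.9, [0.5, 0.5], [0.0, 1.0], sq)
print(post)  # a point at 0.9 is assigned mostly to the centroid at 1.0
```

The M-step then reduces to weighted means of the data, which is the computational simplification the efficient EM scheme discussed here provides.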

1019 | Text classification from labeled and unlabeled documents using EM
- Nigam, McCallum, et al.
- 2000
Citation Context ...the possibility of designing annealing schemes for Bregman soft clustering interpreting 1/β as the temperature parameter. 5 Experiments There are a number of experimental results in existing literature [17, 10, 20, 16] that illustrate the usefulness of Bregman divergences and the Bregman clustering algorithms in specific domains. The classical kmeans algorithm, which is a special case of the Bregman hard clustering...

784 | Graphical models, exponential families, and variational inference
- Wainwright, Jordan
- 2008
Citation Context ...such that 〈a, t(ω)〉 = c (a constant) ∀ω ∈ Ω, then this representation is said to be minimal. For a minimal representation, there exists a unique probability density f(ω; θ) for every choice of θ ∈ Θ (Wainwright and Jordan, 2003). Fψ is called a full exponential family of order d in such a case. In addition, if the parameter space Θ is open, i.e., Θ = int(Θ), then Fψ is called a regular exponential family. It can be easily s...

687 | Mixture densities, maximum likelihood and the EM algorithm
- Redner, Walker
- 1984
Citation Context ...data, we revisit EM for mixture model estimation for this class of problems. We show that, with proper representation, the bijection gives an alternative interpretation of a well known efficient EM scheme [22] applicable in this case. The scheme simplifies the computationally intensive maximization-step of the EM algorithm, resulting in a general soft-clustering algorithm for all members of the exponential ...

677 | A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models - Bilmes - 1998

584 | Cluster ensembles – a knowledge reuse framework for combining multiple partitions - Strehl, Ghosh

536 | The information bottleneck method
- Tishby, Pereira, et al.
- 1999
Citation Context ...theoretic viewpoint is very insightful. Such considerations occur in several techniques, from classical vector quantization to information theoretic clustering [10] and the information bottleneck method [26]. In particular, the information theoretic clustering [10] approach solved the problem of distributional clustering with a formulation involving loss in Shannon's mutual information. In this paper, we...

479 | The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming
- Bregman
- 1967
Citation Context ...function f, when well-defined, is denoted by f⁻¹. 2. Preliminaries In this section, we define the Bregman divergence corresponding to a strictly convex function and present some examples. Definition 1 (Bregman, 1967; Censor and Zenios, 1998) Let φ : S ↦ R, S = dom(φ), be a strictly convex function defined on a convex set S ⊆ R^d such that φ is differentiable on ri(S), assumed to be nonempty. The Bregman divergenc...

465 | Methods of Information Geometry - Amari, Nagaoka - 1993 |

401 | Rate-Distortion theory: a mathematical basis for data compression
- Berger
- 1971
Citation Context ...of this viewpoint. We restrict our attention to regular exponential families and regular Bregman divergences in this section. 6.1 Rate Distortion Theory for Bregman Divergences Rate distortion theory (Berger, 1971; Berger and Gibson, 1998) deals with the fundamental limits of quantizing a stochastic source X ∼ p(x), x ∈ X, using a random variable X̂ over a reproduction alphabet X̂ typically assumed to embed t...

368 | Parallel Optimization: Theory, Algorithms, and Applications
- Censor, Zenios
- 1997
Citation Context ...well-defined, is denoted by f⁻¹. 2. Preliminaries In this section, we define the Bregman divergence corresponding to a strictly convex function and present some examples. Definition 1 (Bregman, 1967; Censor and Zenios, 1998) Let φ : S ↦ R, S = dom(φ), be a strictly convex function defined on a convex set S ⊆ R^d such that φ is differentiable on ri(S), assumed to be nonempty. The Bregman divergence dφ : S × ri(S) ↦ [0, ∞)...

299 | Scaling Clustering Algorithms to Large Databases
- Bradley, Fayyad, et al.
- 1998
Citation Context ...The corresponding soft clustering algorithm obtained by applying EM [9] to a mixture model of Gaussians with identical, isotropic covariances, is also popular and can be scaled to large data sets [6]. Underlying both hard and soft Euclidean kmeans is a Gaussian "noise" model, which corresponds to a squared-Euclidean distortion function [15]. This dis...

275 | Computation of channel capacity and rate distortion functions
- Blahut
- 1972
Citation Context ...The rate distortion problem is a convex problem that involves optimizing over the probabilistic assignments p(x̂|x) and can be theoretically solved using the Blahut-Arimoto algorithm (Arimoto, 1972; Blahut, 1972; Csiszár, 1974; Cover and Thomas, 1991). However, numerical computation of the rate distortion function through the Blahut-Arimoto algorithm is often infeasible in practice, primarily due to lack of ...

266 | Information and Exponential Families in Statistical Theory
- Barndorff-Nielsen
- 1978
Citation Context ...x) given by g(x; θ) = exp(〈θ, x〉 − ψ(θ)) p₀(x) (6) is such that f(ω; θ)/g(x; θ) does not depend on θ. Thus, x is a sufficient statistic (Amari and Nagaoka, 2001) for the family, and in fact, can be shown (Barndorff-Nielsen, 1978) to be minimally...

239 | Why least squares and maximum entropy? An axiomatic approach for linear inverse problems - Csiszár - 1991

200 | Impact of similarity measures on webpage clustering
- Strehl, Ghosh, et al.
- 2000
Citation Context ...data mining literature. However, in many data mining applications, this distortion function is not a good match with the data, and consequently kmeans performs poorly as compared to other approaches [25]. In fact, in such situations kmeans often becomes a convenient strawman to show the superiority of a competing technique! This has also led to the search for more appropriate distance functions for s...

188 | Information geometry and alternating minimization procedures
- Csiszár, Tusnády
- 1984
Citation Context ...bounds in the online learning setting extensively use this framework [3]. In the unsupervised learning setting, use of this framework typically involves development of alternate minimization procedures [8]. For example, [21, 27] analyze and develop iterative alternate projection procedures for solving unsupervised optimization problems involving objective functions based on Bregman divergences under va...

166 | Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions
- Berg, Christensen, et al.
- 1984
Citation Context ...bijection between regular exponential families and regular Bregman divergences. The crux of the argument relies on results in harmonic analysis connecting positive definiteness to integral transforms (Berg et al., 1984). In particular, we use a result due to Devinatz (1955) that relates exponentially convex functions to Laplace transforms of bounded non-negative measures. Theorem 5 (Devinatz (1955)) Let Θ ⊆ R^d be ...

151 | A generalization of principal component analysis to the exponential family
- Collins, Dasgupta, et al.
- 2001
Citation Context ...exponential distribution can be written as the sum of a Bregman divergence and a function that does not depend on the parameters, i.e., (3.6) log p(x|θ) = −D_φ(x, μ) + f(x). This result was used later on by [7] to extend PCA to exponential families. We make the relationship between Bregman divergences and exponential families exact by showing that there is actually a bijection between Bregman divergences an...

149 | Relative loss bounds for on-line density estimation with the exponential family of distributions
- Azoury, Warmuth
Citation Context ...functions and their corresponding Bregman divergences. Bregman divergences have several interesting and useful properties, such as non-negativity, convexity in the first argument, etc. For details see [3] and [5]. 2.1 Bregman Information The dual formulation of Shannon's celebrated rate distortion problem [13] involves finding a coding scheme with a given rate, such that the expected distortion between ...

134 | A divisive information-theoretic feature clustering algorithm for text classification
- Dhillon, Mallela, et al.
Citation Context ...(LBG) algorithm [17] based on the Itakura-Saito distance, which has been used in the signal-processing community for clustering speech data. The more recent information theoretic clustering algorithm [10] for clustering probability distributions also has a flavor similar to kmeans. This algorithm uses the KL-divergence as the distortion function and is well suited for various clustering tasks in the anal...

129 | Convex Analysis
- Rockafellar
- 1970
Citation Context ...clustering problem. We begin by defining the Bregman divergence [21]. Let φ : S ↦ R be a strictly convex function defined on a convex set S ⊆ R^d, such that φ is differentiable on int(S), the interior of S [23]. The Bregman divergence D_φ : S × int(S) ↦ [0, ∞) is defined as D_φ(x, y) = φ(x) − φ(y) − 〈x − y, ∇φ(y)〉, where ∇φ is the gradient of φ. Table 1 contains a list of some convex functions and their correspo...
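The definition just quoted can be checked mechanically: plugging φ(x) = Σ x_j² and φ(p) = Σ p_j log p_j into D_φ(x, y) = φ(x) − φ(y) − 〈x − y, ∇φ(y)〉 recovers the squared Euclidean distance and the KL-divergence. A small sketch of this (ours, not from the paper; names and numbers are illustrative):

```python
import math

def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <x - y, grad_phi(y)>."""
    g = grad_phi(y)
    inner = sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))
    return phi(x) - phi(y) - inner

# phi(x) = sum x_j^2 generates the squared Euclidean distance.
phi_sq  = lambda x: sum(v * v for v in x)
grad_sq = lambda x: [2 * v for v in x]

# phi(p) = sum p_j log p_j (negative entropy) generates the KL-divergence
# between probability vectors on the simplex.
phi_ne  = lambda p: sum(v * math.log(v) for v in p)
grad_ne = lambda p: [math.log(v) + 1 for v in p]

x, y = [1.0, 2.0], [0.0, 0.5]
print(bregman_divergence(phi_sq, grad_sq, x, y))  # 3.25 = (1-0)^2 + (2-0.5)^2
p, q = [0.2, 0.8], [0.5, 0.5]
print(bregman_divergence(phi_ne, grad_ne, p, q))  # equals KL(p || q)
```

For the negative-entropy case the linear terms Σ(p_j − q_j) cancel because both vectors sum to 1, leaving exactly Σ p_j log(p_j/q_j).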

126 | An algorithm for computing the capacity of arbitrary discrete memoryless channels
- Arimoto
- 1972
Citation Context ...n of X and X̂. The rate distortion problem is a convex problem that involves optimizing over the probabilistic assignments p(x̂|x) and can be theoretically solved using the Blahut-Arimoto algorithm (Arimoto, 1972; Blahut, 1972; Csiszár, 1974; Cover and Thomas, 1991). However, numerical computation of the rate distortion function through the Blahut-Arimoto algorithm is often infeasible in practice, primarily d...

118 | Information geometry of the EM and em algorithms for neural networks
- Amari
- 1994
Citation Context ...function and it uniquely determines the exponential family Fψ. Further, given an exponential family Fψ, the log-partition function ψ is uniquely determined up to a constant additive term. It can be shown [2] that Θ is a convex set in R^d and ψ is a strictly convex and differentiable function on int(Θ). 3.2 Expectation parameters and Legendre duality Consider a d-dimensional real random variable X following an ...

113 | Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory - Grünwald, Dawid - 2002

101 | Lossy source coding
- Berger, Gibson
- 1998
Citation Context ...viewpoint. We restrict our attention to regular exponential families and regular Bregman divergences in this section. 6.1 Rate Distortion Theory for Bregman Divergences Rate distortion theory (Berger, 1971; Berger and Gibson, 1998) deals with the fundamental limits of quantizing a stochastic source X ∼ p(x), x ∈ X, using a random variable X̂ over a reproduction alphabet X̂ typically assumed to embed the source alphabet X, i....

99 | An information-theoretic analysis of hard and soft assignment methods for clustering (in Learning in Graphical Models)
- Kearns, Mansour, et al.
- 1998
Citation Context ...covariances, is also popular and can be scaled to large data sets [6]. Underlying both hard and soft Euclidean kmeans is a Gaussian "noise" model, which corresponds to a squared-Euclidean distortion function [15]. This dis... (Footnote: In soft clustering, data points can have non-zero probabilities of belonging t...)

88 | Diversity and dissimilarity coefficients: a unified approach. Theoretical Population Biology
- Rao
- 1982
Citation Context ...On a larger context, there has been research in various fields that has focussed on generalized notions of distances and on extending known methodologies to the general setting (Rao, 1982). Grünwald and Dawid (2004) recently extended the ‘redundancy-capacity theorem’ of information theory to arbitrary discrepancy measures. As an extension of Shannon’s entropy (Cover and Thomas, 1991),...

77 | A Modern Approach to Probability Theory
- Fristedt, Gray
Citation Context ...bijection result, we need to review the following background material. 3.1 Exponential families Consider a family F of probability densities on a measurable space (Ω, B), where B is a σ-algebra on the set Ω [12]. Suppose every probability density p ∈ F is parameterized by d real-valued variables θ = {θ_j}_{j=1}^d, so that F = {p_θ = f(ω; θ) : ω ∈ Ω, θ ∈ Θ ⊆ R^d}. Then, F is called a d-dimensional parametric m...

45 | On the optimality of conditional expectation as a Bregman predictor - Banerjee, Guo, et al. |

45 | A mapping approach to rate-distortion computation and analysis
- Rose
- 1994
Citation Context ...practice, primarily due to lack of knowledge of the optimal support of the reproduction random variable. An efficient solution for addressing this problem is the mapping approach (Banerjee et al., 2004a; Rose, 1994), where one solves a related problem that assumes cardinality k for the support of the reproduction random variable. In this setting, the optimization is over the assignments as well as the support s...

42 | Feature weighting in k-means clustering
- Modha, Spangler
- 2003
Citation Context ...solving the problem. There has also been work on learning algorithms that involve minimizing loss functions based on distortion measures that are somewhat different from Bregman divergences. For example, [19] presents the convex-kmeans clustering for distortion measures that are always non-negative and convex in the second argument, using the notion of a generalized centroid. Bregman divergences, on the ot...

33 | Towards Systematic Design of Distance Functions
- Aggarwal
- 2003
Citation Context ...situations kmeans often becomes a convenient strawman to show the superiority of a competing technique! This has also led to the search for more appropriate distance functions for specific applications [1, 25]. Is it possible to devise an algorithm which has the simplicity and scalability of kmeans but can cater to a much larger class of distortion functions? A hint towards an affirmative answer to this quest...

31 | Generalized projections for non-negative functions
- Csiszár
- 1995
Citation Context ...each of the partitions such that the expected Bregman divergence of the data... 2. Note that F(·) is a function and it is possible to extend the notion of Bregman divergences to the space of functions (Csiszár, 1995; Grünwald and Dawid, 2004). 3. For x ∈ {0, 1} (Bernoulli) and y ∈ (0, 1) (posterior probability for x = 1), we have x log(x/y) + (1 − x) log((1 − x)/(1 − y)) = log(1 + exp(−f(x)g(y))), i.e., the logistic loss...

29 | Duality and Auxiliary Functions for Bregman Distances
- Pietra, Pietra, et al.
- 2001
Citation Context ...a clustering algorithm that is a generalization of the kmeans algorithm and is guaranteed to converge to a local minimum of the Bregman hard clustering problem. We begin by defining the Bregman divergence [21]. Let φ : S ↦ R be a strictly convex function defined on a convex set S ⊆ R^d, such that φ is differentiable on int(S), the interior of S [23]. The Bregman divergence D_φ : S × int(S) ↦ [0, ∞) is de...

25 | Maximum likelihood and the information bottleneck
- Slonim, Weiss
- 2002
Citation Context ...exponential family corresponding to KL-divergence, i.e., the multinomial family (Collins et al., 2001). Further, the iterative IB algorithm is the same as the EM algorithm for multinomial distributions (Slonim and Weiss, 2002), and also the Bregman soft clustering algorithm using KL-divergence. 7. Experiments There are a number of experimental results in existing literature (MacQueen, 1967; Linde et al., 1980; Buzo et al.,...

22 | Text classification from labeled and unlabeled documents using EM
- Nigam, McCallum, et al.
Citation Context ...the possibility of designing annealing schemes for Bregman soft clustering interpreting 1/β as the temperature parameter. 5 Experiments There are a number of experimental results in existing literature [17, 10, 20, 16] that illustrate the usefulness of Bregman divergences and the Bregman clustering algorithms in specific domains. The classical kmeans algorithm, which is a special case of the Bregman hard clustering ...

22 | On the computation of rate-distortion functions
- Csiszár
- 1974
Citation Context ...distortion problem is a convex problem that involves optimizing over the probabilistic assignments p(x̂|x) and can be theoretically solved using the Blahut-Arimoto algorithm (Arimoto, 1972; Blahut, 1972; Csiszár, 1974; Cover and Thomas, 1991). However, numerical computation of the rate distortion function through the Blahut-Arimoto algorithm is often infeasible in practice, primarily due to lack of knowledge of th...

20 | Relative expected instantaneous loss bounds - Forster, Warmuth |

14 | Stationary covariances associated with exponentially convex functions - Ehm, Genton, et al. - 2003

12 | Kolmogorov complexity and information theory; with an interpretation in terms of questions and answers
- Grünwald, Vitányi
- 2003
Citation Context ...useful properties, such as non-negativity, convexity in the first argument, etc. For details see [3] and [5]. 2.1 Bregman Information The dual formulation of Shannon's celebrated rate distortion problem [13] involves finding a coding scheme with a given rate, such that the expected distortion between the source random variable and the decoded random variable is minimized. The achieved distortion is called ...

10 | The representation of functions as Laplace–Stieltjes integrals - Devinatz - 1955 |

9 | Spectral distance measures between Gaussian processes
- Kazakos, Papantoni-Kazakos
- 1980
Citation Context ...spectra F(e^{jθ}) and G(e^{jθ}) and can also be interpreted as the I-divergence (Csiszár, 1991) between the generating processes under the assumption that they are equal mean, stationary Gaussian processes (Kazakos and Kazakos, 1980). Table 1 contains a list of some common convex functions and their corresponding Bregman divergences. Bregman divergences have several interesting and useful properties, such as non-negativity, conv...

9 | On entropy rates of dynamical systems and Gaussian processes
- Paluš
- 1997
Citation Context ...signal f(t), then the functional φ(F) = −(1/2π) ∫_{−π}^{π} log(F(e^{jθ})) dθ is convex in F and corresponds to the negative entropy rate of the signal assuming it was generated by a stationary Gaussian process (Paluš, 1997; Cover and Thomas, 1991). The Bregman divergence between F(e^{jθ}) and G(e^{jθ}) (the power spectrum of another signal g(t)) is given by dφ(F, G) = (1/2π) ∫_{−π}^{π} [−log(F(e^{jθ})) + log(G(e^{jθ}...
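Discretized over frequency bins, the spectral divergence above is the Itakura-Saito distance, i.e., the Bregman divergence generated by φ(x) = −log x applied per bin: d_φ(x, y) = x/y − log(x/y) − 1. A quick numerical sketch (our own, with made-up spectra):

```python
import math

def itakura_saito(F, G):
    """Bregman divergence of phi(x) = -log x, summed over spectrum bins:
    d(F, G) = sum_j [ F_j/G_j - log(F_j/G_j) - 1 ]."""
    return sum(f / g - math.log(f / g) - 1 for f, g in zip(F, G))

F = [1.0, 2.0, 0.5]
print(itakura_saito(F, F))                      # 0.0: divergence of a spectrum to itself
print(itakura_saito(F, [1.0, 1.0, 1.0]) > 0.0)  # True: non-negativity
```

Expanding φ(x) − φ(y) − (x − y)φ′(y) with φ(x) = −log x gives log(y/x) + x/y − 1 per bin, matching the closed form in the comment.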

8 | An information theoretic analysis of maximum likelihood mixture estimation for exponential families - Banerjee, Dhillon, et al. - 2004 |

7 | The EM Algorithm - Collins - 1997

5 | Learning continuous latent variable models with Bregman divergences - Wang, Schuurmans

3 | Learning latent variable models with Bregman divergences
- Wang, Schuurmans
Citation Context ...learning setting extensively use this framework [3]. In the unsupervised learning setting, use of this framework typically involves development of alternate minimization procedures [8]. For example, [21, 27] analyze and develop iterative alternate projection procedures for solving unsupervised optimization problems involving objective functions based on Bregman divergences under various kinds of constrai...

2 | Optimal Bregman prediction and Jensen’s equality - Banerjee, Guo, et al. - 2004 |

1 | Optimization approach to generating families of k-means like algorithms
- Kogan, Teboulle, et al.
- 2003
Citation Context ...the possibility of designing annealing schemes for Bregman soft clustering interpreting 1/β as the temperature parameter. 5 Experiments There are a number of experimental results in existing literature [17, 10, 20, 16] that illustrate the usefulness of Bregman divergences and the Bregman clustering algorithms in specific domains. The classical kmeans algorithm, which is a special case of the Bregman hard clustering ...

1 | The Classical Moment Problem and some related questions in analysis - Akhiezer - 1965

1 | Kolmogorov complexity and information theory with an interpretation in terms of questions and answers - Grünwald, Vitányi - 2003

1 | Speech coding based upon vector quantization - Buzo, Gray, et al. - 1980

1 | Generalized projections for non-negative functions. Acta Mathematica Hungarica
- Csiszár
- 1995
Citation Context ...based on ideas from Shannon’s rate distortion theory. Then, we motivate the... 2. Note that F(·) is a function and it is possible to extend the notion of Bregman divergences to the space of functions (Csiszár, 1995; Grünwald and Dawid, 2004). [Table 1: Bregman divergences generated from some convex functions. Columns: Domain, φ(x), dφ(x, y), Divergence. First rows: R, x², (x − y)², squared loss; R+, x...]


1 | Cluster ensembles – a knowledge reuse framework for combining partitionings - Strehl, Ghosh