Results 1 – 7 of 7
The Cauchy-Schwarz Divergence and Parzen Windowing: Connections to Graph Theory and Mercer Kernels
, 2006
Abstract

Cited by 10 (2 self)
This paper contributes a tutorial-level discussion of some interesting properties of the recent Cauchy-Schwarz (CS) divergence measure between probability density functions. This measure brings together elements from several different machine learning fields, namely information theory, graph theory, and Mercer kernel and spectral theory. These connections are revealed when estimating the CS divergence nonparametrically using the Parzen window technique for density estimation. An important consequence of these connections is that they enhance our understanding of the different machine learning schemes relative to each other.
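The Parzen-window estimate of the CS divergence described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code; the function name and the Gaussian window width sigma are illustrative choices. Convolving two Gaussian windows of width sigma yields a Gaussian of width sigma*sqrt(2), and the kernel normalization constants cancel in the CS ratio, so they are omitted.

```python
import numpy as np

def cs_divergence(X, Y, sigma=1.0):
    """Parzen-window estimate of the Cauchy-Schwarz divergence
    D_CS(p, q) = -log( int(p*q) / sqrt(int(p^2) * int(q^2)) )
    for samples X ~ p and Y ~ q (rows are points)."""
    def mean_kernel(A, B):
        # mean of Gaussian kernels of width sigma*sqrt(2) over all pairs
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (4.0 * sigma ** 2)).mean()
    cross = mean_kernel(X, Y)
    return -np.log(cross / np.sqrt(mean_kernel(X, X) * mean_kernel(Y, Y)))
```

Identical samples give a divergence of zero, and well-separated samples give increasingly positive values, matching the behavior of a divergence measure between pdfs.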
Estimating the Information Potential with the Fast Gauss Transform
 in Proc. of ICA
, 2006
Abstract

Cited by 8 (5 self)
In this paper, we propose a fast and accurate approximation to the information potential of Information Theoretic Learning (ITL) using the Fast Gauss Transform (FGT). We exemplify the case of the Minimum Error Entropy criterion to train adaptive systems. The FGT reduces the complexity of the estimation from O(N^2) to O(pkN), where p is the order of the Hermite approximation and k the number of clusters utilized in the FGT. Further, we show that the FGT converges to the actual entropy value rapidly with increasing order p, unlike the Stochastic Information Gradient, the present O(pN) approximation used to reduce the computational complexity in ITL. We test the performance of these FGT methods on system identification, with encouraging results.
An Information Theoretic Approach to Machine Learning
, 2005
Abstract

Cited by 8 (2 self)
In this thesis, theory and applications of machine learning systems based on information theoretic criteria as performance measures are studied. A new clustering algorithm based on maximizing the Cauchy-Schwarz (CS) divergence measure between probability density functions (pdfs) is proposed. The CS divergence is estimated nonparametrically using the Parzen window technique for density estimation. The problem domain is transformed from discrete 0/1 cluster membership values to continuous membership values. A constrained gradient descent maximization algorithm is implemented. The gradients are stochastically approximated to reduce computational complexity, making the algorithm more practical. Parzen window annealing is incorporated into the algorithm to help avoid convergence to a local maximum. The clustering results obtained on synthetic and real data are encouraging. The Parzen window-based estimator for the CS divergence is shown to have a dual expression as a measure of the cosine of the angle between cluster mean vectors in a feature space determined by the eigenspectrum of a Mercer kernel matrix. A spectral clustering …
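The discrete-to-continuous membership relaxation mentioned above can be illustrated with a softmax reparameterization, one common way to keep soft memberships on the probability simplex during gradient steps; the thesis's own constrained gradient scheme may differ from this sketch.

```python
import numpy as np

def soft_memberships(Z):
    """Map unconstrained auxiliary variables Z (N points x K clusters)
    to soft memberships: each row is positive and sums to 1, so a
    gradient step on Z can never leave the feasible region."""
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# illustrative: random auxiliary variables for 6 points, 3 clusters
rng = np.random.default_rng(1)
M = soft_memberships(rng.normal(size=(6, 3)))
```

Hard 0/1 labels are recovered at the end of the optimization by taking the argmax over each row.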
LEGClust – A Clustering Algorithm Based on Layered Entropic Subgraphs
, 2008
Abstract

Cited by 4 (1 self)
Hierarchical clustering is a stepwise clustering method usually based on proximity measures between objects or sets of objects from a given data set. The most common proximity measures are distance measures. The derived proximity matrices can be used to build graphs, which provide the basic structure for some clustering methods. We present here a new proximity matrix based on an entropic measure, and also a clustering algorithm (LEGClust) that builds layers of subgraphs based on this matrix and uses them, together with a hierarchical agglomerative clustering technique, to form the clusters. Our approach capitalizes on both a graph structure and a hierarchical construction. Moreover, by using entropy as a proximity measure, we are able, with no assumption about the cluster shapes, to capture the local structure of the data, forcing the clustering method to reflect this structure. We present several experiments on artificial and real data sets that provide evidence of the superior performance of this new algorithm when compared with competing ones.
Information Cut for Clustering using a Gradient Descent Approach
, 2006
Abstract

Cited by 2 (1 self)
We introduce a new graph cut for clustering which we call the Information Cut. It is derived using Parzen windowing to estimate an information theoretic distance measure between probability density functions. We propose to optimize the Information Cut using a gradient descent-based approach. Our algorithm has several advantages compared to many other graph-based methods in terms of determining an appropriate affinity measure, computational complexity, memory requirements, and coping with different data scales. We show that our method may produce clustering and image segmentation results comparable to, or better than, the state-of-the-art graph-based methods.
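As a point of reference, the generic graph-cut quantity that such graph-based methods evaluate can be sketched as below, using a Gaussian affinity. This is a plain cut for illustration only; the paper's Information Cut is instead derived from a Parzen-estimated divergence, and the sigma parameter here is an illustrative choice.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), rows of X are points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cut_value(W, labels):
    """Sum of affinities between points assigned to different clusters;
    dividing by 2 counts each cross-cluster pair once."""
    labels = np.asarray(labels)
    across = labels[:, None] != labels[None, :]
    return W[across].sum() / 2.0
```

A good partition of well-separated groups yields a small cut, since only weak affinities are severed.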
Entropy-Inspired Competitive Clustering Algorithms
Abstract

Cited by 1 (0 self)
In this paper, the well-known competitive clustering algorithm (CA) is revisited and reformulated from the point of view of entropy minimization. That is, the second term of the objective function in CA can be seen as quadratic or second-order entropy. Following this novel explanation, two generalized competitive clustering algorithms inspired by Renyi entropy and Shannon entropy, i.e. RECA and SECA, are respectively proposed in this paper. Simulation results show that CA requires a large number of initial clusters to obtain the right number of clusters, while RECA and SECA require small and moderate numbers of initial clusters, respectively. Also, RECA and SECA need fewer iteration steps than CA. Further, CA and RECA are generalized to CAp and RECAp by using the p-order entropy and Renyi's p-order entropy in CA and RECA, respectively. Simulation results show that the value of p has a great impact on the performance of CAp, whereas it has little influence on that of RECAp. Key words: competitive clustering; fuzzy c-means; optimal number of clusters; cluster validity; entropy minimization
Maximum Within-Cluster Association
Abstract
This paper addresses a new method and aspect of information-theoretic clustering which exploits the minimum entropy principle and the quadratic distance measure between probability densities. We present a new minimum entropy objective function which leads to the maximization of within-cluster association. A simple implementation using the gradient ascent method is given. In addition, we show that the minimum entropy principle leads to the objective function of k-means clustering, and that maximum within-cluster association is closely related to spectral clustering, which is an eigendecomposition-based method. This information-theoretic view of spectral clustering leads us to use the kernel density estimation method in constructing an affinity matrix.
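The closing observation, that a kernel-density view suggests building the affinity matrix from a Gaussian kernel and then eigendecomposing a normalized version of it, can be sketched as follows; the bandwidth sigma and the symmetric normalization are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def kde_affinity(X, sigma=1.0):
    """Affinity matrix from the Gaussian (Parzen) kernel:
    A[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def top_eigenvectors(A, k):
    """Symmetrically normalize A as D^{-1/2} A D^{-1/2}, then return the
    k leading eigenvectors, as in spectral clustering."""
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, -k:]
```

The rows of the returned eigenvector matrix serve as the low-dimensional embedding on which a simple clustering step (e.g. k-means) is run.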