Results 1–10 of 19
Supervised Learning of Quantizer Codebooks by Information Loss Minimization, 2007
"... This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic fo ..."
Abstract

Cited by 71 (0 self)
 Add to MetaCart
(Show Context)
This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic for its class label Y. We derive an alternating minimization procedure for simultaneously learning codebooks in the Euclidean feature space and in the simplex of posterior class distributions. The resulting quantizer can be used to encode unlabeled points outside the training set and to predict their posterior class distributions, and has an elegant interpretation in terms of lossless source coding. The proposed method is extensively validated on synthetic and real datasets, and is applied to two diverse problems: learning discriminative visual vocabularies for bag-of-features image classification, and image segmentation.
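The alternating minimization described in this abstract lends itself to a compact sketch. The rendering below is hypothetical and simplified, not the paper's exact objective: the assignment step minimizes squared Euclidean distance plus a weight `lam` times the KL divergence from a point's class posterior to the cell's posterior codeword, and the update step averages features and posteriors within each cell. The function name `joint_quantize` and the trade-off weight `lam` are illustrative assumptions.

```python
import numpy as np

def joint_quantize(X, P, k, lam=1.0, iters=20, seed=0):
    """Sketch: learn k paired codewords (m_j, q_j), with m_j in feature
    space and q_j in the probability simplex.
    Assignment cost for point x with posterior p:
        ||x - m_j||^2 + lam * KL(p || q_j).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    idx = rng.choice(n, k, replace=False)
    M, Q = X[idx].copy(), P[idx].copy()   # init codebooks from data points
    eps = 1e-12                            # guard for log(0)
    for _ in range(iters):
        # assignment step: joint feature + information cost
        d_feat = ((X[:, None, :] - M[None]) ** 2).sum(-1)
        kl = (P[:, None, :]
              * (np.log(P[:, None, :] + eps) - np.log(Q[None] + eps))).sum(-1)
        a = np.argmin(d_feat + lam * kl, axis=1)
        # update step: Euclidean mean and (KL-optimal) mean posterior per cell
        for j in range(k):
            if np.any(a == j):
                M[j] = X[a == j].mean(0)
                Q[j] = P[a == j].mean(0)
    return M, Q, a
```

The mean posterior is the KL-optimal representative of a cell, mirroring how the Euclidean mean is the squared-error-optimal one.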
On the Performance of Clustering in Hilbert Spaces
"... Abstract—Based on � randomly drawn vectors in a separable Hilbert space, one may construct a �means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector ˆ from the set of cluster ..."
Abstract

Cited by 28 (0 self)
 Add to MetaCart
Based on n randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is O(1/√n). Since clustering in high- (or even infinite-)dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson–Lindenstrauss-type random projections. Our results reflect a trade-off between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes. Index Terms—Clustering, empirical risk minimization, Hilbert space, k-means, random projections, vector quantization.
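The project-then-cluster strategy analyzed in this entry can be sketched as follows. A Gaussian matrix scaled by 1/√d_low is one standard Johnson–Lindenstrauss-type construction, and plain Lloyd iterations stand in for the empirically optimal quantizer; the function names are illustrative assumptions, not the paper's own.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd iterations (a stand-in for empirical risk minimization)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        a = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(0)
    return C, a

def kmeans_after_projection(X, k, d_low, seed=0):
    """Project to d_low dimensions with a Gaussian (Johnson–Lindenstrauss-type)
    random matrix, then run k-means in the cheaper low-dimensional space."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], d_low)) / np.sqrt(d_low)
    return kmeans(X @ R, k, seed=seed)
```

Each Lloyd iteration now costs O(n k d_low) instead of O(n k D), which is the accuracy/complexity trade-off the abstract refers to.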
Efficient Adaptive Algorithms and Minimax Bounds for Zero-Delay Lossy Source Coding, 2003
Estimation of Intrinsic Dimensionality Using High-Rate Vector Quantization
"... We introduce a technique for dimensionality estimation based on the notion of quantization dimension, which connects the asymptotic optimal quantization error for a probability distribution on a manifold to its intrinsic dimension. The definition of quantization dimension yields a family of estimati ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
We introduce a technique for dimensionality estimation based on the notion of quantization dimension, which connects the asymptotic optimal quantization error for a probability distribution on a manifold to its intrinsic dimension. The definition of quantization dimension yields a family of estimation algorithms, whose limiting case is equivalent to a recent method based on packing numbers. Using the formalism of high-rate vector quantization, we address issues of statistical consistency and analyze the behavior of our scheme in the presence of noise.
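One hypothetical way to turn the quantization-dimension idea into an estimator: under the high-rate scaling D(k) ≈ c·k^(−2/d) for the optimal k-point squared-error distortion on a d-dimensional support, the slope of log D against log k is approximately −2/d. The helper below approximates optimal quantizers with Lloyd's algorithm; the function names and codebook sizes `k1`, `k2` are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def kmeans_distortion(X, k, iters=50, seed=0):
    """Mean squared distance to the nearest of k Lloyd-trained codewords."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        a = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(0)
    a = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
    return ((X - C[a]) ** 2).sum(-1).mean()

def quantization_dimension(X, k1=8, k2=64, seed=0):
    """Estimate intrinsic dimension d from D(k) ~ k**(-2/d):
    the slope of log-distortion versus log-k is about -2/d."""
    D1 = kmeans_distortion(X, k1, seed=seed)
    D2 = kmeans_distortion(X, k2, seed=seed)
    return -2 * np.log(k2 / k1) / np.log(D2 / D1)
```

On data lying on a curve embedded in a higher-dimensional space, the estimate tracks the intrinsic (not ambient) dimension, which is the point of the construction.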
Fast rates for empirical vector quantization, Electronic Journal of Statistics, 2012
"... We consider the rate of convergence of the expected loss of empirically optimal vector quantizers. Earlier results show that the meansquared expected distortion for any fixed distribution supported on a bounded set and satisfying some regularity conditions decreases at the rate O(log n/n). We pro ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
(Show Context)
We consider the rate of convergence of the expected loss of empirically optimal vector quantizers. Earlier results show that the mean-squared expected distortion for any fixed distribution supported on a bounded set and satisfying some regularity conditions decreases at the rate O(log n/n). We prove that this rate is actually O(1/n). Although these conditions are hard to check, we show that well-polarized distributions with continuous densities supported on a bounded set fall within the scope of this result.
Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms, 2011
"... Motivated by the problem of effectively executing clustering algorithms on very large data sets, we address a model for large scale distributed clustering methods. To this end, we briefly recall some standards on the quantization problem and some results on the almost sure convergence of the competi ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
Motivated by the problem of effectively executing clustering algorithms on very large data sets, we address a model for large-scale distributed clustering methods. To this end, we briefly recall some standards on the quantization problem and some results on the almost sure convergence of the competitive learning vector quantization (CLVQ) procedure. A general model for linear distributed asynchronous algorithms, well adapted to several parallel computing architectures, is also discussed. Our approach brings together this scalable model and the CLVQ algorithm; we call the resulting technique the distributed asynchronous learning vector quantization (DALVQ) algorithm. An in-depth analysis of the almost sure convergence of DALVQ is performed. A striking result is that the multiple versions of the quantizers, distributed among the processors of the parallel architecture, asymptotically reach a consensus almost surely. Furthermore, these versions converge almost surely towards the same nearly optimal value of the quantization criterion.
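For reference, the serial CLVQ procedure this paper builds on can be sketched as an online stochastic gradient method on the squared-distortion criterion: each sample moves only its nearest quantizer, with a decreasing step size. The schedule c/(t+1) is one common choice and the function name is illustrative; the distributed asynchronous version analyzed in the paper is not reproduced here.

```python
import numpy as np

def clvq(samples, w0, c=1.0):
    """Competitive learning vector quantization (CLVQ), serial sketch."""
    w = np.array(w0, dtype=float)
    for t, x in enumerate(samples):
        i = ((w - x) ** 2).sum(-1).argmin()   # competition: nearest quantizer
        w[i] += (c / (t + 1)) * (x - w[i])    # learning: move it toward x
    return w
```

Each update is a noisy gradient step on half the squared distance to the winning quantizer, which is why the almost sure convergence theory for stochastic approximation applies.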
Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning
"... Faced with massive data, is it possible to trade off (statistical) risk, and (computational) space and time? This challenge lies at the heart of largescale machine learning. Using kmeans clustering as a prototypical unsupervised learning problem, we show how we can strategically summarize the da ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
Faced with massive data, is it possible to trade off (statistical) risk against (computational) space and time? This challenge lies at the heart of large-scale machine learning. Using k-means clustering as a prototypical unsupervised learning problem, we show how to strategically summarize the data (control space) in order to trade off risk and time when data is generated by a probabilistic model. Our summarization is based on coreset constructions from computational geometry. We also develop an algorithm, TRAM, to navigate the space/time/data/risk trade-off in practice. In particular, we show that for a fixed risk (or data size), as the data size (resp. risk) increases, the running time of TRAM decreases. Our extensive experiments on real data sets demonstrate the existence and practical utility of such trade-offs, not only for k-means but also for Gaussian mixture models.
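A minimal sketch of the summarize-then-cluster idea, assuming a uniformly sampled, reweighted subset as a crude stand-in for the sensitivity-based coreset constructions the paper actually uses (the TRAM algorithm itself is not reproduced; all names below are illustrative):

```python
import numpy as np

def weighted_kmeans(X, w, k, iters=50, seed=0):
    """Lloyd iterations where each point carries a weight."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        a = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            m = a == j
            if np.any(m):
                C[j] = np.average(X[m], axis=0, weights=w[m])
    return C

def coreset_kmeans(X, k, m, seed=0):
    """Summarize X by m uniformly sampled points, each reweighted by n/m,
    then run weighted k-means on the summary only (controlling space/time)."""
    rng = np.random.default_rng(seed)
    S = X[rng.choice(len(X), m, replace=False)]
    w = np.full(m, len(X) / m)   # each sampled point represents n/m originals
    return weighted_kmeans(S, w, k, seed=seed)
```

Shrinking m reduces space and time at the cost of extra risk, which is exactly the trade-off surface the paper navigates.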
Operator Norm Convergence of Spectral Clustering on Level Sets
"... Following Hartigan (1975), a cluster is defined as a connected component of the tlevel set of the underlying density, that is, the set of points for which the density is greater than t. A clustering algorithm which combines a density estimate with spectral clustering techniques is proposed. Our alg ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Following Hartigan (1975), a cluster is defined as a connected component of the t-level set of the underlying density, that is, the set of points for which the density is greater than t. A clustering algorithm which combines a density estimate with spectral clustering techniques is proposed. Our algorithm is composed of two steps. First, a nonparametric density estimate is used to extract the data points for which the estimated density takes a value greater than t. Next, the extracted points are clustered based on the eigenvectors of a graph Laplacian matrix. Under mild assumptions, we prove the almost sure convergence in operator norm of the empirical graph Laplacian operator associated with the algorithm. Furthermore, we give the typical behavior of the representation of the data set into the feature space, which establishes the strong consistency of our proposed algorithm.
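A simplified sketch of the two-step algorithm, with several assumptions not in the abstract: a Gaussian kernel density estimate, a fully connected Gaussian-similarity graph in place of the paper's graph construction, and exactly two clusters recovered from the sign of the Fiedler vector. The function name and parameters are illustrative.

```python
import numpy as np

def level_set_spectral(X, t, bw=0.5, sigma=1.0):
    """Step 1: keep points whose kernel density estimate exceeds t.
    Step 2: split the survivors by the sign of the Fiedler vector of an
    unnormalized graph Laplacian (two clusters only, for simplicity)."""
    n, d = X.shape
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)    # pairwise squared distances
    # Gaussian KDE with bandwidth bw (self-term included)
    dens = np.exp(-d2 / (2 * bw**2)).mean(1) / ((2 * np.pi * bw**2) ** (d / 2))
    keep = dens > t
    Y = X[keep]
    d2k = ((Y[:, None] - Y[None]) ** 2).sum(-1)
    W = np.exp(-d2k / (2 * sigma**2))             # Gaussian similarity graph
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W                     # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
    labels = (vecs[:, 1] > 0).astype(int)         # sign of the Fiedler vector
    return keep, labels
```

The density threshold discards low-density outliers before the spectral step, so the graph is built only on the estimated t-level set.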
Joint universal lossy coding and identification of stationary mixing sources with general alphabets, IEEE Trans. Inform. Theory
"... Abstract — We consider the problem of joint universal variablerate lossy coding and identification for parametric classes of stationary βmixing sources with general (Polish) alphabets. Compression performance is measured in terms of Lagrangians, while identification performance is measured by the ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
(Show Context)
We consider the problem of joint universal variable-rate lossy coding and identification for parametric classes of stationary β-mixing sources with general (Polish) alphabets. Compression performance is measured in terms of Lagrangians, while identification performance is measured by the variational distance between the true source and the estimated source. Provided that the sources are mixing at a sufficiently fast rate and satisfy certain smoothness and Vapnik–Chervonenkis learnability conditions, it is shown that, for bounded metric distortions, there exist universal schemes for joint lossy compression and identification whose Lagrangian redundancies converge to zero as √(Vn log n/n) as the block length n tends to infinity, where Vn is the Vapnik–Chervonenkis dimension of a certain class of decision regions defined by the n-dimensional marginal distributions of the sources; furthermore, for each n, the decoder can identify the n-dimensional marginal of the active source up to a ball of radius O(√(Vn log n/n)) in variational distance, eventually with probability one. The results are supplemented by several examples of parametric sources satisfying the regularity conditions. Index Terms—Learning, minimum-distance density estimation, two-stage codes, universal vector quantization, Vapnik–Chervonenkis dimension.
On Codecell Convexity of Optimal Multiresolution Scalar Quantizers for Continuous Sources
"... Abstract—It has been shown by earlier results that for fixed rate multiresolution scalar quantizers and for mean squared error distortion measure, codecell convexity precludes optimality for certain discrete sources. However it was unknown whether the same phenomenon can occur for any continuous sou ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
Earlier results have shown that, for fixed-rate multiresolution scalar quantizers under the mean squared error distortion measure, codecell convexity precludes optimality for certain discrete sources. However, it was unknown whether the same phenomenon can occur for any continuous source. In this paper, examples of continuous sources (even with bounded continuous densities) are presented for which optimal fixed-rate multiresolution scalar quantizers cannot have only convex codecells, proving that codecell convexity precludes optimality also for such regular sources. Index Terms—Clustering methods, codecell convexity, continuous density function, mean squared error methods, multiresolution, optimization methods, quantization, rate distortion theory, source coding.