Results 11–20 of 443
Supervised Learning of Quantizer Codebooks by Information Loss Minimization
, 2007
"... This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic fo ..."
Abstract

Cited by 71 (0 self)
 Add to MetaCart
(Show Context)
This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic for its class label Y. We derive an alternating minimization procedure for simultaneously learning codebooks in the Euclidean feature space and in the simplex of posterior class distributions. The resulting quantizer can be used to encode unlabeled points outside the training set and to predict their posterior class distributions, and has an elegant interpretation in terms of lossless source coding. The proposed method is extensively validated on synthetic and real datasets, and is applied to two diverse problems: learning discriminative visual vocabularies for bag-of-features image classification, and image segmentation.
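The alternating minimization described above can be sketched in toy form: assign each point to the codeword minimizing a combined cost (squared-Euclidean on features, KL on posteriors), then re-estimate both codebooks from the assignments. The weighting `lam` and all names here are our assumptions, not the paper's notation.

```python
import numpy as np

def info_loss_quantize(X, P, K, lam=1.0, iters=20, seed=0):
    """Toy alternating minimization in the spirit of the paper: jointly
    quantize features X (n, d) and class posteriors P (n, c).
    Per-point cost for region k: ||x - mu_k||^2 + lam * KL(p || nu_k)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    idx = rng.choice(n, K, replace=False)
    mu, nu = X[idx].copy(), P[idx].copy()
    assign = np.zeros(n, dtype=int)
    for _ in range(iters):
        # assignment step: nearest codeword under the combined cost
        d_feat = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        kl = (P[:, None, :] * (np.log(P[:, None, :] + 1e-12)
                               - np.log(nu[None] + 1e-12))).sum(-1)
        assign = np.argmin(d_feat + lam * kl, axis=1)
        # update step: mean feature and mean posterior per region
        # (the mean posterior is the exact minimizer of sum_i KL(p_i || nu))
        for k in range(K):
            m = assign == k
            if m.any():
                mu[k] = X[m].mean(0)
                nu[k] = P[m].mean(0)
    return assign, mu, nu
```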
On Bregman Voronoi Diagrams
 in "Proc. 18th ACMSIAM Sympos. Discrete Algorithms
, 2007
"... The Voronoi diagram of a point set is a fundamental geometric structure that partitions the space into elementary regions of influence defining a discrete proximity graph and dually a wellshaped Delaunay triangulation. In this paper, we investigate a framework for defining and building the Voronoi ..."
Abstract

Cited by 61 (28 self)
 Add to MetaCart
(Show Context)
The Voronoi diagram of a point set is a fundamental geometric structure that partitions the space into elementary regions of influence, defining a discrete proximity graph and, dually, a well-shaped Delaunay triangulation. In this paper, we investigate a framework for defining and building Voronoi diagrams for a broad class of distortion measures called Bregman divergences, which includes not only the traditional (squared) Euclidean distance but also various divergence measures based on entropic functions. As a byproduct, Bregman Voronoi diagrams allow one to define information-theoretic Voronoi diagrams in statistical parametric spaces based on the relative entropy of distributions. We show that for a given Bregman divergence, one can define several types of Voronoi diagrams related to each other.
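A minimal illustration of the class of distortion measures involved: the generic Bregman divergence D_F(x, y) = F(x) - F(y) - <grad F(y), x - y> recovers the squared Euclidean distance and the KL divergence for suitable generators, and first-argument nearest-site assignment defines the cells. A sketch; the function names are ours.

```python
import numpy as np

def bregman(F, gradF, x, y):
    """Bregman divergence D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    return F(x) - F(y) - np.dot(gradF(y), x - y)

# two classic generators (standard in the literature)
sq = (lambda x: np.dot(x, x), lambda x: 2 * x)     # yields squared Euclidean distance
kl = (lambda x: np.sum(x * np.log(x)),             # negative entropy: yields KL
      lambda x: np.log(x) + 1)                     # ... for probability vectors

def bregman_cell(point, sites, F, gradF):
    """Index of the site whose (first-argument) Bregman Voronoi cell
    contains `point`, i.e. argmin_k D_F(point, site_k)."""
    return int(np.argmin([bregman(F, gradF, point, s) for s in sites]))
```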
A Unified View of Matrix Factorization Models
"... Abstract. We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, EPCA, MMMF, pLSI, pLSIpHITS, Bregman coclustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as m ..."
Abstract

Cited by 58 (0 self)
 Add to MetaCart
(Show Context)
We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods, such as incorporating row and column biases, and adding or relaxing clustering constraints.
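Under squared Frobenius loss (the Bregman divergence generated by F(x) = x^2/2), the alternating projection of point (i) reduces to alternating least squares with closed-form half-steps; other generators (e.g. KL for NMF) would swap in a different per-step solver. A minimal sketch, with a small ridge term added for numerical stability (our assumption, not part of the paper):

```python
import numpy as np

def als_factorize(X, rank, iters=50, reg=1e-6):
    """Alternating projections for X ~ U V^T under squared Frobenius loss.
    Each half-step is a closed-form least-squares 'projection' onto one
    factor with the other held fixed."""
    rng = np.random.default_rng(0)
    n, m = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    I = reg * np.eye(rank)
    for _ in range(iters):
        U = X @ V @ np.linalg.inv(V.T @ V + I)    # project onto row factors
        V = X.T @ U @ np.linalg.inv(U.T @ U + I)  # project onto column factors
    return U, V
```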
On the optimality of conditional expectation as a Bregman predictor
, 2003
"... Given a probability space (Ω, F,P), a Fmeasurable random variable X, anda subσalgebra G ⊂ F, it is well known that the conditional expectation E[XG] isthe optimal L 2predictor (also known as “the least mean square error ” predictor) of X among all the Gmeasurable random variables [8, 11]. In t ..."
Abstract

Cited by 47 (4 self)
 Add to MetaCart
(Show Context)
Given a probability space (Ω, F, P), an F-measurable random variable X, and a sub-σ-algebra G ⊂ F, it is well known that the conditional expectation E[X|G] is the optimal L^2 predictor (also known as the "least mean square error" predictor) of X among all the G-measurable random variables [8, 11]. In this paper, we provide necessary and sufficient conditions under which the conditional expectation is the unique optimal predictor. We show that E[X|G] is the optimal predictor for all Bregman Loss Functions (BLFs), of which the L^2 loss function is a special case. Moreover, under mild conditions, we show that BLFs are exhaustive: namely, if the infimum of E[F(X, Y)] over all the G-measurable random variables Y is attained at the conditional expectation E[X|G] for any random variable X, then F is a BLF.
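The optimality claim is easy to check numerically in the simplest setting (G trivial, so predictors are constants): for any convex generator φ, the constant minimizing the empirical mean of the Bregman loss D_φ(x_i, y) is the sample mean. A sketch; the particular generators and the grid search are our choices.

```python
import numpy as np

def bregman_loss(phi, dphi, x, y):
    """Pointwise Bregman loss D_phi(x, y) for scalar y and array x."""
    return phi(x) - phi(y) - dphi(y) * (x - y)

def empirical_risk_minimizer(phi, dphi, x, grid):
    """Grid-search the constant predictor y minimizing mean_i D_phi(x_i, y)."""
    risks = [bregman_loss(phi, dphi, x, y).mean() for y in grid]
    return grid[int(np.argmin(risks))]
```

Whatever convex φ is plugged in (squared loss, Itakura-Saito, a KL-type loss), the minimizer lands on the sample mean, illustrating that the conditional expectation is optimal for every BLF.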
Unsupervised Learning on Kpartite Graphs
, 2006
"... Various data mining applications involve data objects of multiple types that are related to each other, which can be naturally formulated as a kpartite graph. However, the research on mining the hidden structures from a kpartite graph is still limited and preliminary. In this paper, we propose a g ..."
Abstract

Cited by 46 (4 self)
 Add to MetaCart
Various data mining applications involve data objects of multiple types that are related to each other, which can be naturally formulated as a k-partite graph. However, research on mining hidden structures from a k-partite graph is still limited and preliminary. In this paper, we propose a general model, the relation summary network, to find the hidden structures (the local cluster structures and the global community structures) of a k-partite graph. The model provides a principled framework for unsupervised learning on k-partite graphs of various structures. Under this model, we derive a novel algorithm to identify the hidden structures of a k-partite graph by constructing a relation summary network that approximates the original k-partite graph under a broad range of distortion measures. Experiments on both synthetic and real data sets demonstrate the promise and effectiveness of the proposed model and algorithm. We also establish connections between existing clustering approaches and the proposed model, providing a unified view of these clustering approaches.
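For the special case k = 2, approximating the graph by a summary under squared distortion reduces to an alternating co-clustering of a relation matrix: block means play the role of the summary network's edge weights. A toy sketch; the names and the choice of squared distortion are our assumptions.

```python
import numpy as np

def bipartite_coclusters(A, kr, kc, iters=15, seed=0):
    """Toy 2-partite case: summarize a relation matrix A (rows x cols) by
    kr row clusters and kc column clusters chosen to approximate A under
    squared distortion, via alternating reassignment against block means."""
    rng = np.random.default_rng(seed)
    r = rng.integers(kr, size=A.shape[0])
    c = rng.integers(kc, size=A.shape[1])
    for _ in range(iters):
        # block means = the 'summary' edge weights between cluster nodes
        B = np.array([[A[r == i][:, c == j].mean()
                       if ((r == i).any() and (c == j).any()) else 0.0
                       for j in range(kc)] for i in range(kr)])
        # reassign each row/column to its best-fitting profile of block means
        r = np.argmin(((A[:, None, :] - B[None, :, c]) ** 2).sum(-1), axis=1)
        c = np.argmin(((A.T[:, None, :] - B.T[None, :, r]) ** 2).sum(-1), axis=1)
    return r, c
```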
Differentiable Sparse Coding
"... Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability t ..."
Abstract

Cited by 43 (0 self)
 Add to MetaCart
(Show Context)
Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior, such as a Laplacian (L1) prior, that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A Posteriori (MAP) estimate, making it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance.
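A hedged sketch of the two ingredients, using a pseudo-Huber smoothed sparsity prior as a stand-in for the paper's smooth priors (the prior, step sizes, and names are our assumptions): gradient-descent MAP estimation, then the Jacobian of the MAP code with respect to the input obtained by implicitly differentiating the stationarity condition.

```python
import numpy as np

def map_codes(x, D, lam=0.1, eps=1e-3, steps=500, lr=0.01):
    """MAP sparse code under a smoothed (pseudo-Huber) sparsity prior:
    minimize ||x - D z||^2 + lam * sum_i sqrt(z_i^2 + eps).
    Smoothness of the prior is what makes the estimate differentiable."""
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = -2 * D.T @ (x - D @ z) + lam * z / np.sqrt(z ** 2 + eps)
        z -= lr * grad
    return z

def dz_dx(z, D, lam=0.1, eps=1e-3):
    """Implicit differentiation: the stationarity condition grad_z L = 0
    gives dz*/dx = H^{-1} (2 D^T), with H the Hessian of L in z."""
    hpp = lam * eps / (z ** 2 + eps) ** 1.5  # second derivative of the prior
    H = 2 * D.T @ D + np.diag(hpp)
    return np.linalg.solve(H, 2 * D.T)
```

Because H is evaluated only at the optimum, the Jacobian costs one linear solve rather than differentiating through the optimization trajectory.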
A Probabilistic Framework for Relational Clustering
 KDD'07
"... Relational clustering has attracted more and more attention due to its phenomenal impact in various important applications which involve multitype interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. In this paper, we propose a probab ..."
Abstract

Cited by 41 (1 self)
 Add to MetaCart
Relational clustering has attracted more and more attention due to its impact on various important applications that involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. In this paper, we propose a probabilistic model for relational clustering, which also provides a principled framework to unify various important clustering tasks, including traditional attribute-based clustering, semi-supervised clustering, co-clustering, and graph clustering. The proposed model seeks to identify cluster structures for each type of data object and interaction patterns between different types of objects. Under this model, we propose parametric hard and soft relational clustering algorithms under a large number of exponential family distributions. The algorithms are applicable to relational data of various structures and at the same time unify a number of state-of-the-art clustering algorithms: co-clustering algorithms, k-partite graph clustering, and semi-supervised clustering based on hidden Markov random fields.
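For the Gaussian member of the exponential family, the soft variant corresponds to soft clustering under squared Euclidean divergence; one EM-style step might look like the following (a sketch of the general idea, not the paper's algorithm; `beta` and the names are assumptions).

```python
import numpy as np

def soft_cluster_step(X, mu, beta=1.0):
    """One soft (EM-style) step with squared Euclidean divergence, the
    Gaussian member of the exponential family: responsibilities
    r_ik ~ exp(-beta * ||x_i - mu_k||^2), then weighted mean update."""
    d = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
    r = np.exp(-beta * (d - d.min(1, keepdims=True)))  # stabilized softmax
    r /= r.sum(1, keepdims=True)
    mu_new = (r.T @ X) / r.sum(0)[:, None]
    return r, mu_new
```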
A General Model for Multiple View Unsupervised Learning
, 2008
"... Multiple view data, which have multiple representations from different feature spaces or graph spaces, arise in various data mining applications such as information retrieval, bioinformatics and social network analysis. Since different representations could have very different statistical properties ..."
Abstract

Cited by 41 (2 self)
 Add to MetaCart
Multiple-view data, which have multiple representations from different feature spaces or graph spaces, arise in various data mining applications such as information retrieval, bioinformatics, and social network analysis. Since different representations can have very different statistical properties, learning a consensus pattern from multiple representations is a challenging problem. In this paper, we propose a general model for multiple-view unsupervised learning. The proposed model introduces the concept of a mapping function to make patterns from different pattern spaces comparable, so that an optimal pattern can be learned from the multiple patterns of multiple representations. Under this model, we formulate two specific models for
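One crude way to realize a shared pattern across views is a multi-view k-means with a single shared assignment and per-view centroids; this toy (the names, the per-view weights, and the squared-distortion choice are our assumptions) stands in for the paper's mapping-function idea.

```python
import numpy as np

def multiview_kmeans(views, K, weights=None, iters=20, seed=0):
    """Toy consensus clustering: one shared assignment over all views,
    per-view centroids, minimizing a weighted sum of per-view squared
    distortions. `views` is a list of (n, d_v) arrays over the same n objects."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    if weights is None:
        weights = np.ones(len(views))
    assign = rng.integers(K, size=n)
    for _ in range(iters):
        cents = []
        for X in views:
            # per-view centroids; empty clusters reseeded from a random point
            C = np.vstack([X[assign == k].mean(0) if (assign == k).any()
                           else X[rng.integers(n)] for k in range(K)])
            cents.append(C)
        cost = sum(w * ((X[:, None, :] - C[None]) ** 2).sum(-1)
                   for w, X, C in zip(weights, views, cents))
        assign = np.argmin(cost, axis=1)  # shared consensus assignment
    return assign
```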
Information, Divergence and Risk for Binary Experiments
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2009
"... We unify fdivergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROCcurves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all ..."
Abstract

Cited by 41 (8 self)
 Add to MetaCart
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves, and statistical information. We do this by systematically studying integral and variational representations of these various objects, and in so doing identify their primitives, all of which are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
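For finite distributions, the standard representation D_f(P || Q) = sum_i q_i f(p_i / q_i) makes the unification concrete: different convex generators f yield the KL divergence, the variational (total variation) divergence, and so on. A minimal sketch; the names are ours.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for finite distributions
    p, q with q_i > 0, where f is convex with f(1) = 0."""
    return float(np.sum(q * f(p / q)))

kl = lambda t: t * np.log(t)        # generates the KL divergence
tv = lambda t: 0.5 * np.abs(t - 1)  # generates the total variation distance
```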