Results 1-10 of 156
Consistency of spectral clustering
, 2004
Cited by 572 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
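To make the two classes named in this abstract concrete, here is a minimal sketch of one common way to form the unnormalized and the symmetric normalized graph Laplacian and to split a graph with the second eigenvector. The function names, the toy weight matrix, and the choice of the symmetric normalization are assumptions for illustration; the paper's own definitions may differ.

```python
import numpy as np

def graph_laplacians(W):
    """Return (unnormalized, symmetric normalized) Laplacians of a weight matrix W."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # vertex degrees
    L = np.diag(d) - W                     # unnormalized: L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)          # assumes every vertex has positive degree
    # symmetric normalized: L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return L, L_sym

def two_way_split(L):
    """Split vertices by the sign of the eigenvector of the second-smallest eigenvalue."""
    _, vecs = np.linalg.eigh(L)            # eigenvalues come back in ascending order
    return vecs[:, 1] > 0

# Hypothetical toy graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

L, L_sym = graph_laplacians(W)
labels = two_way_split(L_sym)
print(labels)
```

On a graph this small both Laplacians give the same split; the consistency results in the abstract concern the large-sample limit, which this sketch does not attempt to reproduce.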
Spectral hashing
, 2009
Cited by 284 (4 self)
Semantic hashing [1] seeks compact binary codes of datapoints so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP-hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel datapoint. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state of the art.
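The "thresholded eigenvectors" step in this abstract can be sketched as follows. This covers only in-sample codes on a tiny dataset, under assumed choices (a Gaussian affinity, a bandwidth of 2.0, thresholding at zero); it does not implement the paper's out-of-sample extension via Laplace-Beltrami eigenfunctions, and the function names are hypothetical.

```python
import numpy as np

def binary_codes(X, n_bits, bandwidth=1.0):
    """In-sample binary codes from thresholded graph Laplacian eigenvectors."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # squared distances
    W = np.exp(-sq / bandwidth ** 2)                          # assumed Gaussian affinity
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                            # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)                               # ascending eigenvalues
    # Skip the constant eigenvector (eigenvalue 0); threshold the next n_bits at zero.
    return vecs[:, 1:n_bits + 1] > 0

def hamming(a, b):
    """Number of differing bits between two codes."""
    return int(np.count_nonzero(a != b))

# Hypothetical data: two small groups of nearby points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
codes = binary_codes(X, n_bits=1, bandwidth=2.0)
print(hamming(codes[0], codes[1]), hamming(codes[0], codes[3]))
```

Points in the same group receive the same bit and points in different groups differ, which is the sense in which Hamming distance tracks similarity here.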
SPECTRAL CLUSTERING AND THE HIGH-DIMENSIONAL STOCHASTIC BLOCKMODEL
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 105 (7 self)
Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The Stochastic Blockmodel (Holland, Laskey and Leinhardt, 1983) is a social network model with well-defined communities; each node is a member of one community. For a network generated from the Stochastic Blockmodel, we bound the number of nodes “misclustered” by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. In order to study spectral clustering under the Stochastic Blockmodel, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a “population” normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
 Journal of Machine Learning Research
, 2006
Cited by 92 (10 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI) comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; (vi) finally, a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
Diffusion Maps, Spectral Clustering and Reaction
 Applied and Computational Harmonic Analysis: Special issue on Diffusion Maps and Wavelets
, 2006
Cited by 90 (14 self)
A central problem in data analysis is the low dimensional representation of high dimensional data, and the concise description of its underlying geometry and density. In the analysis of large scale simulations of complex dynamical systems, where the notion of time evolution comes into play, important problems are the identification of slow variables and dynamically meaningful reaction coordinates that capture the long time evolution of the system. In this paper we provide a unifying view of these apparently different tasks, by considering a family of diffusion maps, defined as the embedding of complex (high dimensional) data onto a low dimensional Euclidean space, via the eigenvectors of suitably defined random walks defined on the given datasets. Assuming that the data is randomly sampled from an underlying general probability distribution p(x) = e^{-U(x)}, we show that as the number of samples goes to infinity, the eigenvectors of each diffusion map converge to the eigenfunctions of a corresponding differential operator defined on the support of the probability distribution. Different normalizations of the Markov chain on the graph lead to different limiting differential operators.
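As a rough illustration of "eigenvectors of suitably defined random walks", the sketch below builds a Gaussian kernel on the data, optionally renormalizes it by a power alpha of the kernel density estimate, forms the row-stochastic random walk, and returns its leading nontrivial eigenpairs. The kernel form exp(-d^2/eps), the parameter names, and the sample data are assumptions made for this sketch, not the paper's exact construction.

```python
import numpy as np

def diffusion_map(X, eps, n_coords, alpha=0.0):
    """Leading nontrivial eigenvalues and right eigenvectors of a random walk on X."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq / eps)                      # assumed Gaussian kernel
    q = K.sum(axis=1)                          # kernel density estimate
    K = K / np.outer(q, q) ** alpha            # alpha selects the normalization
    d = K.sum(axis=1)
    # P = D^{-1} K is row-stochastic; S = D^{-1/2} K D^{-1/2} is symmetric
    # and has the same eigenvalues, so eigh can be used.
    S = K / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]              # sort eigenvalues in descending order
    lam, V = lam[order], V[:, order]
    psi = V / np.sqrt(d)[:, None]              # right eigenvectors of P
    # Drop the trivial pair (eigenvalue 1, constant eigenvector).
    return lam[1:n_coords + 1], psi[:, 1:n_coords + 1]

# Hypothetical data: 20 points on a line segment.
X = np.linspace(0.0, 1.0, 20)[:, None]
lam, coords = diffusion_map(X, eps=0.05, n_coords=2)
print(lam)
```

Changing `alpha` changes how strongly the sampling density influences the embedding, which corresponds loosely to the abstract's remark that different normalizations lead to different limiting operators.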
Wavelets on graphs via spectral graph theory
, 2009
Cited by 90 (8 self)
We propose a novel method for constructing wavelet transforms of functions defined on the vertices of an arbitrary finite weighted graph. Our approach is based on defining scaling using the graph analogue of the Fourier domain, namely the spectral decomposition of the discrete graph Laplacian L. Given a wavelet generating kernel g and a scale parameter t, we define the scaled wavelet operator T_g^t = g(tL). The spectral graph wavelets are then formed by localizing this operator by applying it to an indicator function. Subject to an admissibility condition on g, this procedure defines an invertible transform. We explore the localization properties of the wavelets in the limit of fine scales. Additionally, we present a fast Chebyshev polynomial approximation algorithm for computing the transform that avoids the need for diagonalizing L. We highlight potential applications of the transform through examples of wavelets on graphs corresponding to a variety of different problem domains.
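The operator T_g^t = g(tL) and its localization at a vertex can be sketched directly from the eigendecomposition of L. This is the naive route that the abstract's Chebyshev approximation is designed to avoid, so it is only suitable for small graphs. The kernel g(x) = x e^{-x}, the path graph, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def wavelet_operator(L, t, g):
    """Return g(tL) computed from the eigendecomposition L = U diag(lam) U^T."""
    lam, U = np.linalg.eigh(L)
    return (U * g(t * lam)) @ U.T          # scales column j of U by g(t * lam[j])

def kernel(x):
    """Hypothetical wavelet generating kernel with g(0) = 0."""
    return x * np.exp(-x)

# Hypothetical graph: unweighted path on 5 vertices.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

T = wavelet_operator(L, t=0.5, g=kernel)
delta = np.zeros(n)
delta[2] = 1.0                             # indicator of the center vertex
psi = T @ delta                            # wavelet localized at vertex 2
print(psi)
```

Because this kernel vanishes at zero, the constant eigenvector is annihilated and the resulting wavelet sums to zero over the vertices.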
Semi-supervised learning in gigantic image collections
 In Advances in Neural Information Processing Systems 22
, 2009
Cited by 76 (3 self)
With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. “Clean labels” can be manually obtained on a small fraction, “noisy labels” may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. We combine this with a label sharing framework obtained from Wordnet to propagate label information to classes lacking manual annotations. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images with 74 thousand classes.
Graph construction and b-matching for semi-supervised learning
 In International Conference on Machine Learning
Cited by 65 (13 self)
Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems. A crucial step in graph-based SSL methods is the conversion of data into a weighted graph. However, most of the SSL literature focuses on developing label inference algorithms without extensively studying the graph building method and its effect on performance. This article provides an empirical study of leading semi-supervised methods under a wide range of graph construction algorithms. These SSL inference algorithms include the Local and Global Consistency (LGC) method, the Gaussian Random Field (GRF) method, the Graph Transduction via Alternating Minimization (GTAM) method as well as other techniques. Several approaches for graph construction, sparsification and weighting are explored including the popular k-nearest neighbors method (kNN) and the b-matching method. As opposed to the greedily constructed kNN graph, the b-matched graph ensures each node in the graph has the same number of edges and produces a balanced or regular graph. Experimental results on both artificial data and real benchmark datasets indicate that b-matching produces more robust graphs and therefore provides significantly better prediction accuracy without any significant change in computation time.
From graph to manifold Laplacian: The convergence rate
, 2006
Cited by 61 (8 self)
The convergence of the discrete graph Laplacian to the continuous manifold Laplacian in the limit of sample size N → ∞ while the kernel bandwidth ε → 0 is the justification for the success of Laplacian-based algorithms in machine learning, such as dimensionality reduction, semi-supervised learning and spectral clustering. In this paper we improve the convergence rate of the variance term recently obtained by Hein et al. [From graphs to manifolds - Weak and strong pointwise consistency of graph
Data fusion and multicue data matching by diffusion maps
 IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 57 (5 self)
Data fusion and multicue data matching are fundamental tasks of high-dimensional data analysis. In this paper, we apply the recently introduced diffusion framework to address these tasks. Our contribution is threefold: First, we present the Laplace-Beltrami approach for computing density-invariant embeddings, which are essential for integrating different sources of data. Second, we describe a refinement of the Nyström extension algorithm called “geometric harmonics.” We also explain how to use this tool for data assimilation. Finally, we introduce a multicue data matching scheme based on nonlinear spectral graph alignment. The effectiveness of the presented schemes is validated by applying them to the problems of lipreading and image sequence alignment. Index Terms: Pattern matching, graph theory, graph algorithms, Markov processes, machine learning, data mining, image databases.