Results 1–10 of 110
Consistency of spectral clustering
, 2004
Cited by 572 (15 self)

Abstract: Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
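The distinction the paper draws between the two Laplacians can be made concrete with a minimal sketch (the toy graph and all variable names here are ours, not from the paper):

```python
import numpy as np

# Toy example (illustrative, not from the paper): two triangles joined
# by one weak edge. W is the symmetric weighted adjacency matrix.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1  # weak bridge between the triangles

d = W.sum(axis=1)
L = np.diag(d) - W                                   # unnormalized Laplacian
L_sym = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)  # normalized Laplacian

# The sign pattern of the second-smallest eigenvector (the Fiedler
# vector) of the normalized Laplacian recovers the two triangles.
_, vecs = np.linalg.eigh(L_sym)
labels = (vecs[:, 1] > 0).astype(int)
```

On this toy graph both variants behave well; the paper's point is that only the normalized construction is guaranteed to keep behaving well as the sample size grows.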
Computing communities in large networks using random walks
 J. of Graph Alg. and App.
, 2004
Cited by 226 (3 self)

Abstract: Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them, however, is generally expensive. We propose here a measure of similarity between vertices based on random walks which has several important advantages: it captures the community structure in a network well, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute the community structure of a network efficiently. We propose such an algorithm, called Walktrap, which runs in time O(mn²) and space O(n²) in the worst case, and in time O(n² log n) and space O(n²) in most real-world cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones concerning the quality of the obtained community structures and that it stands among the best ones concerning the running time.
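The random-walk vertex distance that drives Walktrap can be sketched as follows (the 4-node graph and the walk length t = 3 are our own illustrative choices):

```python
import numpy as np

# Illustrative sketch of a Walktrap-style vertex distance.
# Nodes 0, 1, 2 form a triangle; node 3 hangs off node 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
P = W / d[:, None]                    # random-walk transition matrix
Pt = np.linalg.matrix_power(P, 3)     # 3-step transition probabilities

def walk_distance(i, j):
    # Vertices in the same community "see" the rest of the graph
    # similarly, so their t-step distributions are close.
    return np.sqrt(np.sum((Pt[i] - Pt[j]) ** 2 / d))

# Nodes 0 and 1 share a triangle, so they are closer than 0 and 3.
assert walk_distance(0, 1) < walk_distance(0, 3)
```

The agglomerative stage of the actual algorithm repeatedly merges the pair of communities whose merger least increases these distances; that bookkeeping is what yields the stated complexity bounds.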
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
 IEEE Transactions on Knowledge and Data Engineering
Cited by 194 (19 self)

Abstract: This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted, undirected graph. It is based on a Markov-chain model of a random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of the paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel (it contains inner products closely related to commute times). A procedure for computing the subspace projection of the node vectors of the graph that preserves as much variance as possible in terms of the commute-time distance, a principal components analysis (PCA) of the graph, is also introduced. This graph PCA provides a nice interpretation of the "Fiedler vector", widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities and, more generally, could be applied to machine-learning and pattern-recognition tasks involving a database.
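The commute-time quantity at the heart of this abstract is short enough to sketch directly; the 3-node path graph below is our own toy example, not from the paper:

```python
import numpy as np

# Average commute time from the pseudoinverse L+ of the Laplacian:
#   n(i, j) = vol(G) * (L+[i,i] + L+[j,j] - 2 * L+[i,j])
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # path graph 0 - 1 - 2
d = W.sum(axis=1)
L = np.diag(d) - W                       # graph Laplacian
Lp = np.linalg.pinv(L)                   # Moore-Penrose pseudoinverse
vol = d.sum()                            # volume = sum of degrees

def commute_time(i, j):
    return vol * (Lp[i, i] + Lp[j, j] - 2 * Lp[i, j])
```

For this path graph the sketch gives a commute time of 4 steps between adjacent nodes and 8 between the endpoints, consistent with the classical identity "commute time = 2m × effective resistance"; the abstract's observation that its square root is a Euclidean distance follows from L+ being positive semidefinite.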
From graph to manifold Laplacian: The convergence rate
, 2006
Cited by 61 (8 self)

Abstract: The convergence of the discrete graph Laplacian to the continuous manifold Laplacian in the limit of sample size N → ∞ while the kernel bandwidth ε → 0 is the justification for the success of Laplacian-based algorithms in machine learning, such as dimensionality reduction, semi-supervised learning and spectral clustering. In this paper we improve the convergence rate of the variance term recently obtained by Hein et al. [From graphs to manifolds—Weak and strong pointwise consistency of graph ...
Clustering and Embedding using Commute Times
 IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 56 (5 self)

Abstract: This paper exploits the properties of the commute time between nodes of a graph for the purposes of clustering and embedding, and explores its applications to image segmentation and multibody motion tracking. Our starting point is the lazy random walk on the graph, which is determined by the heat kernel of the graph and can be computed from the spectrum of the graph Laplacian. We characterize the random walk using the commute time (i.e. the expected time taken for a random walk to travel between two nodes and return) and show how this quantity may be computed from the Laplacian spectrum using the discrete Green's function. Our motivation is that the commute time can be anticipated to be a more robust measure of the proximity of data than the raw proximity matrix. In this paper, we explore two applications of the commute time. The first is to develop a method for image segmentation using the eigenvector corresponding to the smallest eigenvalue of the commute-time matrix. We show that our commute-time segmentation method has the property of enhancing intra-group coherence while weakening inter-group coherence, and is superior to the normalized cut. The second application is to develop a robust multibody motion-tracking method using an embedding based on the commute time. Our embedding procedure preserves commute time and is closely akin to kernel PCA, the Laplacian eigenmap and the diffusion map. We illustrate the results both on synthetic image sequences and real-world video sequences, and compare our results with several alternative methods.
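A commute-time-preserving embedding of the kind this abstract describes can be sketched from the Laplacian eigenpairs (the 4-node graph and all names are ours, assumed for illustration):

```python
import numpy as np

# Sketch: embed node i with coordinates sqrt(vol / lambda_k) * phi_k(i)
# over the nonzero Laplacian eigenpairs (lambda_k, phi_k). Squared
# Euclidean distances in this space equal commute times.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
L = np.diag(d) - W
vol = d.sum()

vals, vecs = np.linalg.eigh(L)
nz = vals > 1e-10                        # drop the zero eigenvalue
coords = np.sqrt(vol) * vecs[:, nz] / np.sqrt(vals[nz])  # rows = nodes

# Cross-check against the pseudoinverse formula for commute time:
Lp = np.linalg.pinv(L)
ct = vol * (Lp[0, 0] + Lp[3, 3] - 2 * Lp[0, 3])
assert abs(np.linalg.norm(coords[0] - coords[3]) ** 2 - ct) < 1e-8
```

The 1/sqrt(lambda_k) scaling is what distinguishes this embedding from a plain Laplacian eigenmap and is why the abstract relates it to kernel PCA on the Laplacian pseudoinverse.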
Unsupervised Co-Segmentation of a Set of Shapes via Descriptor-Space Spectral Clustering
Cited by 51 (9 self)

Abstract: We introduce an algorithm for unsupervised co-segmentation of a set of shapes so as to reveal the semantic shape parts and establish their correspondence across the set. The input set may exhibit significant shape variability, where the shapes do not admit proper spatial alignment and the corresponding parts in any pair of shapes may be geometrically dissimilar. Our algorithm can handle such challenging input sets since, first, we perform co-analysis in a descriptor space, where a combination of shape descriptors relates the parts independently of their pose, location, and cardinality. Secondly, we exploit a key enabling feature of the input set, namely that dissimilar parts may be "linked" through third parties present in the set. The links are derived from the pairwise similarities between the parts' descriptors. To reveal such linkages, which may manifest themselves as anisotropic and nonlinear structures in the descriptor space, we perform spectral clustering with the aid of diffusion maps. We show that with our approach we are able to co-segment sets of shapes that possess significant variability, achieving results that are close to those of a supervised approach. Keywords: co-segmentation, shape correspondence, spectral clustering, diffusion maps.
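The diffusion-map step used here can be sketched generically (the "descriptors" below are random points in two blobs, purely assumed for illustration; sigma and t are free parameters):

```python
import numpy as np

# Illustrative diffusion-map clustering in a descriptor space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 4)),
               rng.normal(1.0, 0.1, (5, 4))])   # 10 fake part descriptors

sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-sq / 1.0)                     # Gaussian affinity, sigma^2 = 1
Dh = 1.0 / np.sqrt(K.sum(axis=1))
M = K * Dh[:, None] * Dh[None, :]         # symmetric conjugate of D^-1 K
vals, vecs = np.linalg.eigh(M)            # real spectrum, ascending order
psi = Dh[:, None] * vecs                  # eigenvectors of the Markov chain

t = 2
coord = (vals[-2] ** t) * psi[:, -2]      # first nontrivial diffusion coord
labels = (coord > np.median(coord)).astype(int)
```

Raising the eigenvalue to the power t is the diffusion-map weighting: coordinates along slowly mixing directions, which is where the abstract's anisotropic, nonlinear part linkages would show up, keep most of their weight.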
Symmetry in 3D Geometry: Extraction and Applications
, 2012
Cited by 28 (6 self)

Abstract: The concept of symmetry has received significant attention in computer graphics and computer vision research in recent years. Numerous methods have been proposed to find and extract geometric symmetries and exploit such high-level structural information for a wide variety of geometry processing tasks. This report surveys and classifies recent developments in symmetry detection. We focus on elucidating the similarities and differences between existing methods to gain a better understanding of a fundamental problem in digital geometry processing and shape understanding in general. We discuss a variety of applications in computer graphics and geometry that benefit from symmetry information for more effective processing. An analysis of the strengths and limitations of existing algorithms highlights the plenitude of opportunities for future research, both in terms of theory and applications.
An experimental investigation of graph kernels on a collaborative recommendation task
 Proceedings of the 6th International Conference on Data Mining (ICDM 2006)
, 2006
Cited by 27 (7 self)

Abstract: This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time kernel, the random-walk-with-restart similarity matrix, and finally, three graph kernels introduced in this paper: the regularized commute-time kernel, the Markov diffusion kernel, and the cross-entropy diffusion matrix. The kernel-on-a-graph approach is simple and intuitive. It is illustrated by applying the nine graph kernels to a collaborative-recommendation task and to a semi-supervised classification task, both on several databases. The graph methods compute proximity measures between nodes that help study the structure of the graph. Our comparisons suggest that the regularized commute-time and the Markov diffusion kernels perform best, closely followed by the regularized Laplacian kernel.
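Two of the kernels surveyed here lend themselves to a compact sketch (the 3-node graph and the value of alpha are our own assumptions for illustration):

```python
import numpy as np

# Minimal sketch of two of the surveyed graph kernels.
W = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # star graph on 3 nodes
L = np.diag(W.sum(axis=1)) - W           # graph Laplacian

alpha = 0.5                              # assumed regularization parameter
K_reg = np.linalg.inv(np.eye(3) + alpha * L)  # regularized Laplacian kernel
K_ct = np.linalg.pinv(L)                      # commute-time kernel L+

# Both are symmetric positive semidefinite, i.e. valid kernel matrices.
assert np.all(np.linalg.eigvalsh(K_reg) >= -1e-10)
assert np.all(np.linalg.eigvalsh(K_ct) >= -1e-10)
```

Positive semidefiniteness is what makes these matrices usable as kernels: I + alpha*L is positive definite because L is positive semidefinite, so its inverse is too, and the pseudoinverse of L inherits L's nonnegative spectrum.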
Fundamental Limitations of Spectral Clustering
 in Advances in Neural Information Processing Systems 19, B. Schölkopf et al.
, 2007
Cited by 26 (4 self)

Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local-to-global approach. We show that, based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion-based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, is scale-free, and can determine coherent clusters at all scales. We present both synthetic examples and real image-segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure finds the expected clusters at all scales.
Gene regulatory networks: A coarse-grained, equation-free approach to multiscale computation
 J. Chem. Phys.
, 2006
Cited by 24 (10 self)

Abstract: We present computer-assisted methods for analyzing stochastic models of gene regulatory networks. The main idea that underlies this equation-free analysis is the design and execution of appropriately initialized short bursts of stochastic simulations; the results of these are processed to estimate coarse-grained quantities of interest, such as mesoscopic transport coefficients. In particular, using a simple model of a genetic toggle switch, we illustrate the computation of an effective free energy Φ and of a state-dependent effective diffusion coefficient D that characterize an unavailable effective Fokker-Planck equation. Additionally, we illustrate the linking of equation-free techniques with continuation methods for performing a form of stochastic "bifurcation analysis"; estimation of mean switching times in the case of a bistable switch is also implemented in this equation-free context. The accuracy of our methods is tested by direct comparison with long-time stochastic simulations. This type of equation-free analysis appears to be a promising approach to computing features of the long-time, coarse-grained behavior of certain classes of complex stochastic models of gene regulatory networks, circumventing the need for long Monte Carlo simulations.