Results 1 - 10 of 112
Consistency of spectral clustering
, 2004
"... Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spe ..."
Abstract
-
Cited by 567 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
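To make the normalized/unnormalized distinction above concrete, here is a minimal sketch (my illustration, not the authors' code) that builds both graph Laplacians from a similarity matrix and bipartitions by the sign of the second eigenvector in place of a full k-means step; the Gaussian-kernel toy data and the sign-split heuristic are assumptions made for brevity.

```python
import numpy as np

def spectral_bipartition(W, normalized=True):
    """Split a graph with symmetric similarity matrix W into two clusters."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                                     # unnormalized Laplacian L = D - W
    if normalized:
        d_inv_sqrt = 1.0 / np.sqrt(d)
        L = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]  # L_sym = D^-1/2 L D^-1/2
    _, eigvecs = np.linalg.eigh(L)                         # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                                # second-smallest eigenvector
    return (fiedler > 0).astype(int)                       # sign split stands in for 2-means

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0.0, 0.3, 20), rng.normal(3.0, 0.3, 20)])
    W = np.exp(-(x[:, None] - x[None, :]) ** 2)            # Gaussian similarities
    print(spectral_bipartition(W, normalized=True))
    print(spectral_bipartition(W, normalized=False))
```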
Computing communities in large networks using random walks
- J. of Graph Alg. and App.
, 2004
"... Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advan ..."
Abstract
-
Cited by 224 (3 self)
Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them, however, is generally expensive. We propose here a measure of similarity between vertices based on random walks which has several important advantages: it captures the community structure of a network well, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute the community structure of a network efficiently. We propose such an algorithm, called Walktrap, which runs in time O(mn²) and space O(n²) in the worst case, and in time O(n² log n) and space O(n²) in most real-world cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones in the quality of the obtained community structures and stands among the best in running time.
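The random-walk distance underlying Walktrap (Pons and Latapy) is r_ij = sqrt( Σ_k (P^t_ik − P^t_jk)² / d(k) ), where P is the transition matrix and d the degree vector. The sketch below is an illustrative dense-matrix version of that distance only; the agglomerative merging and the efficient updates of the actual algorithm are omitted.

```python
import numpy as np

def walktrap_distance(A, t=4):
    """Pairwise random-walk distances r_ij for adjacency matrix A."""
    d = A.sum(axis=1)                        # vertex degrees
    P = A / d[:, None]                       # one-step transition matrix
    Pt = np.linalg.matrix_power(P, t)        # t-step transition probabilities
    diff = Pt[:, None, :] - Pt[None, :, :]   # compare probability profiles of i and j
    return np.sqrt((diff ** 2 / d[None, None, :]).sum(axis=2))

if __name__ == "__main__":
    # two triangles joined by a single edge: a toy two-community graph
    A = np.array([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]], dtype=float)
    print(np.round(walktrap_distance(A, t=3), 3))  # within-community distances are smaller
```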
From graph to manifold Laplacian: The convergence rate
, 2006
"... The convergence of the discrete graph Laplacian to the continuous manifold Laplacian in the limit of sample size N →∞ while the kernel bandwidth ε → 0, is the justification for the success of Laplacian based algorithms in machine learning, such as dimensionality reduction, semi-supervised learning a ..."
Abstract
-
Cited by 67 (8 self)
The convergence of the discrete graph Laplacian to the continuous manifold Laplacian in the limit of sample size N → ∞ while the kernel bandwidth ε → 0 is the justification for the success of Laplacian-based algorithms in machine learning, such as dimensionality reduction, semi-supervised learning and spectral clustering. In this paper we improve the convergence rate of the variance term recently obtained by Hein et al. [From graphs to manifolds—Weak and strong pointwise consistency of graph Laplacians] …
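As a toy illustration of the setting described above (my own example, not the paper's), one can sample the unit circle, build a Gaussian-kernel graph Laplacian with bandwidth ε, and check that applying it to f(t) = cos t reproduces the Laplace-Beltrami result −cos t up to a kernel-dependent constant; the bandwidth, sample size, and normalization convention below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps = 1500, 0.01
t = rng.uniform(0, 2 * np.pi, N)
X = np.stack([np.cos(t), np.sin(t)], axis=1)      # N samples on the manifold S^1
f = np.cos(t)                                     # smooth test function on the circle

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (4 * eps))                       # Gaussian kernel with bandwidth eps
P = W / W.sum(axis=1, keepdims=True)              # random-walk normalization D^-1 W
Lf = (P @ f - f) / eps                            # discrete generator applied to f

# Compare against the continuous Laplacian -cos(t), up to a constant factor:
c = np.dot(Lf, -np.cos(t)) / np.dot(np.cos(t), np.cos(t))
resid = np.linalg.norm(Lf - c * (-np.cos(t))) / np.linalg.norm(Lf)
print("fitted constant:", round(float(c), 3), "relative residual:", round(float(resid), 3))
```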
Unsupervised Co-Segmentation of a Set of Shapes via Descriptor-Space Spectral Clustering
"... We introduce an algorithm for unsupervised co-segmentation of a set of shapes so as to reveal the semantic shape parts and establish their correspondence across the set. The input set may exhibit significant shape variability where the shapes do not admit proper spatial alignment and the correspondi ..."
Abstract
-
Cited by 55 (11 self)
We introduce an algorithm for unsupervised co-segmentation of a set of shapes so as to reveal the semantic shape parts and establish their correspondence across the set. The input set may exhibit significant shape variability where the shapes do not admit proper spatial alignment and the corresponding parts in any pair of shapes may be geometrically dissimilar. Our algorithm can handle such challenging input sets since, first, we perform co-analysis in a descriptor space, where a combination of shape descriptors relates the parts independently of their pose, location, and cardinality. Second, we exploit a key enabling feature of the input set, namely, that dissimilar parts may be “linked” through third parties present in the set. The links are derived from the pairwise similarities between the parts’ descriptors. To reveal such linkages, which may manifest themselves as anisotropic and non-linear structures in the descriptor space, we perform spectral clustering with the aid of diffusion maps. We show that with our approach we are able to co-segment sets of shapes that possess significant variability, achieving results that are close to those of a supervised approach. Keywords: co-segmentation, shape correspondence, spectral clustering, diffusion maps.
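Since the co-analysis above hinges on a diffusion-map embedding of part descriptors, a compact generic sketch of diffusion maps may help; random points stand in for the shape-descriptor vectors, and the kernel width and diffusion time are assumptions, not values from the paper.

```python
import numpy as np

def diffusion_map(X, sigma=1.0, t=2, n_components=2):
    """Embed rows of X with diffusion-map coordinates lambda_k^t * psi_k."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))             # Gaussian affinities in descriptor space
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))                # symmetric conjugate of P = D^-1 K
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(d)[:, None]            # right eigenvectors of P
    # drop the trivial constant coordinate, scale by eigenvalue^t (diffusion time t)
    return (eigvals[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.concatenate([rng.normal(0, 0.2, (30, 3)), rng.normal(2, 0.2, (30, 3))])
    Y = diffusion_map(X, sigma=0.5)
    print(Y.shape)   # (60, 2) coordinates on which spectral clustering can be run
```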
Clustering and Embedding using Commute Times
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
"... This paper exploits the properties of the commute time between nodes of a graph for the purposes of clustering and embedding, and explores its applications to image segmentation and multi-body motion tracking. Our starting point is the lazy random walk on the graph, which is determined by the heatke ..."
Abstract
-
Cited by 55 (4 self)
This paper exploits the properties of the commute time between nodes of a graph for the purposes of clustering and embedding, and explores its applications to image segmentation and multi-body motion tracking. Our starting point is the lazy random walk on the graph, which is determined by the heat kernel of the graph and can be computed from the spectrum of the graph Laplacian. We characterize the random walk using the commute time (i.e., the expected time taken for a random walk to travel between two nodes and return) and show how this quantity may be computed from the Laplacian spectrum using the discrete Green’s function. Our motivation is that the commute time can be anticipated to be a more robust measure of the proximity of data than the raw proximity matrix. In this paper, we explore two applications of the commute time. The first is to develop a method for image segmentation using the eigenvector corresponding to the smallest eigenvalue of the commute time matrix. We show that our commute time segmentation method has the property of enhancing the intra-group coherence while weakening inter-group coherence and is superior to the normalized cut. The second application is to develop a robust multi-body motion tracking method using an embedding based on the commute time. Our embedding procedure preserves commute time, and is closely akin to kernel PCA, the Laplacian eigenmap and the diffusion map. We illustrate the results both on synthetic image sequences and real-world video sequences, and compare our results with several alternative methods.
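The spectral quantity the paper builds on can be reproduced in a few lines: with the Moore-Penrose pseudoinverse L⁺ of the graph Laplacian (closely related to the discrete Green's function mentioned above), the commute time is C(u, v) = vol(G) · (L⁺_uu + L⁺_vv − 2 L⁺_uv). The sketch below is a direct dense computation for illustration, not the authors' implementation.

```python
import numpy as np

def commute_times(A):
    """Commute-time matrix of a connected weighted graph with adjacency matrix A."""
    d = A.sum(axis=1)
    vol = d.sum()                                # volume of the graph (2m for unweighted graphs)
    L = np.diag(d) - A
    Lp = np.linalg.pinv(L)                       # pseudoinverse of the Laplacian
    diag = np.diag(Lp)
    return vol * (diag[:, None] + diag[None, :] - 2 * Lp)

if __name__ == "__main__":
    # path graph on 4 nodes: commute time grows with hop distance
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    print(np.round(commute_times(A), 2))
```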
Symmetry in 3D Geometry: Extraction and Applications
, 2012
"... The concept of symmetry has received significant attention in computer graphics and computer vision research in recent years. Numerous methods have been proposed to find and extract geometric symmetries and exploit such high-level structural information for a wide variety of geometry processing task ..."
Abstract
-
Cited by 29 (7 self)
The concept of symmetry has received significant attention in computer graphics and computer vision research in recent years. Numerous methods have been proposed to find and extract geometric symmetries and exploit such high-level structural information for a wide variety of geometry processing tasks. This report surveys and classifies recent developments in symmetry detection. We focus on elucidating the similarities and differences between existing methods to gain a better understanding of a fundamental problem in digital geometry processing and shape understanding in general. We discuss a variety of applications in computer graphics and geometry that benefit from symmetry information for more effective processing. An analysis of the strengths and limitations of existing algorithms highlights the plenitude of opportunities for future research both in terms of theory and applications.
An experimental investigation of graph kernels on a collaborative recommendation task
- Proceedings of the 6th International Conference on Data Mining (ICDM 2006)
, 2006
"... This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regul ..."
Abstract
-
Cited by 27 (7 self)
This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time kernel, the random-walk-with-restart similarity matrix, and finally, three graph kernels introduced in this paper: the regularized commute-time kernel, the Markov diffusion kernel, and the cross-entropy diffusion matrix. The kernel-on-a-graph approach is simple and intuitive. It is illustrated by applying the nine graph kernels to a collaborative-recommendation task and to a semisupervised classification task, both on several databases. The graph methods compute proximity measures between nodes that help study the structure of the graph. Our comparisons suggest that the regularized commute-time and the Markov diffusion kernels perform best, closely followed by the regularized Laplacian kernel.
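As a sketch of what two of the surveyed kernels look like computationally (my own minimal versions, not the paper's code): the regularized Laplacian kernel is (I + αL)⁻¹ and the commute-time kernel is the pseudoinverse L⁺ of the Laplacian; the small graph and the value of α below are assumptions.

```python
import numpy as np

def regularized_laplacian_kernel(A, alpha=1.0):
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(len(A)) + alpha * L)   # (I + alpha * L)^-1

def commute_time_kernel(A):
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.pinv(L)                           # L^+, the kernel behind commute times

if __name__ == "__main__":
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    K1 = regularized_laplacian_kernel(A, alpha=2.0)
    K2 = commute_time_kernel(A)
    # both are symmetric positive semidefinite, hence valid kernels on the nodes
    print(np.allclose(K1, K1.T), np.min(np.linalg.eigvalsh(K2)) > -1e-9)
```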
Quality assessment of dimensionality reduction: Rank-based criteria
- NEUROCOMPUTING
, 2009
"... ..."
Fundamental Limitations of Spectral Clustering
- in Advances in Neural Information Processing Systems 19, B. Schölkopf and
, 2007
"... Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One ..."
Abstract
-
Cited by 26 (4 self)
Spectral clustering methods are common graph-based approaches to the clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local-to-global approach. We show that, based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the first few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these findings, a second contribution of this paper is a novel diffusion-based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free, and it can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure we find the expected clusters at all scales.
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances
- in Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining
"... This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path d ..."
Abstract
-
Cited by 25 (11 self)
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path distance when θ is large and, on the other end, to the commute-time (or resistance) distance when θ is small (near zero). Intuitively, it corresponds to the expected cost incurred by a random walker in order to reach a destination node from a starting node while maintaining a constant entropy (related to θ) spread in the graph. The parameter θ therefore gradually biases the simple random walk on the graph towards the shortest-path policy. By adopting a statistical physics approach and computing a sum over all possible paths (a discrete path integral), it is shown that the RSP dissimilarity from every node to a particular node of interest can be computed efficiently by solving two linear systems of n equations, where n is the number of nodes. On the other hand, the dissimilarity between every pair of nodes is obtained by inverting an n × n matrix. The proposed measure can be used for various graph mining tasks such as computing betweenness centrality, finding dense communities, etc., as shown in the experimental section.
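For readers who want to experiment with the interpolation behaviour described above, here is a heavily hedged sketch that follows a matrix formulation common in the later RSP literature rather than the exact derivation of this paper; the choice of edge costs c_ij = 1/a_ij and the small test graph are assumptions.

```python
import numpy as np

def rsp_dissimilarity(A, theta=1.0):
    """Randomized shortest-path dissimilarities for a connected graph (dense sketch)."""
    n = len(A)
    d = A.sum(axis=1)
    P_ref = A / d[:, None]                       # natural random-walk transition matrix
    C = np.zeros_like(A)
    C[A > 0] = 1.0 / A[A > 0]                    # assumed edge costs: 1 / affinity
    W = P_ref * np.exp(-theta * C)               # walk biased towards low-cost edges
    Z = np.linalg.inv(np.eye(n) - W)             # fundamental matrix (in practice, two
                                                 # linear solves per target node suffice)
    S = (Z @ (C * W) @ Z) / Z                    # expected costs, elementwise division
    Cbar = S - np.ones((n, 1)) * np.diag(S)[None, :]
    return 0.5 * (Cbar + Cbar.T)                 # symmetrized dissimilarity matrix

if __name__ == "__main__":
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 0, 1],
                  [1, 0, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
    for theta in (10.0, 0.01):                   # large theta: shortest-path-like;
        print(theta, np.round(rsp_dissimilarity(A, theta)[0], 3))  # small theta: commute-time-like
```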