Results 1-10 of 196
Consistency of spectral clustering
, 2004
Cited by 572 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
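The abstract above contrasts two classes of spectral clustering that differ in which graph Laplacian supplies the eigenvectors. As a minimal sketch, the two matrices can be built as follows. The definitions used here (L = D - W and L_sym = I - D^{-1/2} W D^{-1/2}) are one common convention and are an assumption, not necessarily the exact operators analyzed in the paper; the function names and the toy path graph are hypothetical.

```python
import math

def unnormalized_laplacian(W):
    """Return L = D - W for a symmetric weight matrix W (list of lists)."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def normalized_laplacian(W):
    """Return L_sym = I - D^{-1/2} W D^{-1/2}; assumes every degree is positive."""
    n = len(W)
    degrees = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degrees]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * W[i][j] * inv_sqrt[j]
             for j in range(n)]
            for i in range(n)]

# Hypothetical toy graph: the path 0 - 1 - 2 with unit edge weights.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]

L = unnormalized_laplacian(W)      # every row sums to zero
L_sym = normalized_laplacian(W)    # diagonal entries are all 1
```

A clustering step would then take the leading eigenvectors of one of these matrices and group their rows, for example with k-means; that step is omitted here.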
On Clusterings: Good, Bad and Spectral
, 2003
Cited by 332 (11 self)
We motivate and develop a natural bicriteria measure for assessing the quality of a clustering which avoids the drawbacks of existing measures. A simple recursive heuristic is shown to have polylogarithmic worst-case guarantees under the new measure. The main result of the paper is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees; another finds a "good" clustering, if one exists.
Statistical properties of community structure in large social and information networks
Cited by 246 (14 self)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
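The conductance measure mentioned above can be illustrated with a short sketch. It assumes the standard definition for an unweighted, undirected graph: the number of edges leaving the set S divided by the smaller of the two volumes (sums of degrees) of S and its complement. The function name, the adjacency-dictionary representation, and the toy graph are hypothetical.

```python
def conductance(adj, S):
    """Conductance of node set S in an undirected graph given as {node: [neighbors]}."""
    S = set(S)
    # Edges with exactly one endpoint in S.
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    # Volume = sum of degrees on each side of the cut.
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom

# Hypothetical toy graph: two triangles joined by the single edge 2 - 3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

tight = conductance(adj, {0, 1, 2})   # one cut edge over volume 7
loose = conductance(adj, {0})         # both edges of node 0 are cut
```

Lower values indicate a more community-like set. A community profile plot in the sense of the abstract would record the minimum conductance over sets of each size, which requires a search or approximation procedure not shown here.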
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
, 2008
Cited by 208 (17 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community, according to the conductance measure, over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and online social networks, to technological and information networks and
Local Graph Partitioning using PageRank Vectors
 In Proc. of IEEE FOCS
, 2006
Spectral Partitioning of Random Graphs
, 2001
Cited by 165 (3 self)
Problems such as bisection, graph coloring, and clique are generally believed hard in the worst case. However, they can be solved if the input data is drawn randomly from a distribution over graphs containing acceptable solutions. In this paper we show that a simple spectral algorithm can solve all three problems above in the average case, as well as a more general problem of partitioning graphs based on edge density. In nearly all cases our approach meets or exceeds previous parameters, while introducing substantial generality. We apply spectral techniques, using foremost the observation that in all of these problems, the expected adjacency matrix is a low rank matrix wherein the structure of the solution is evident.
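The closing observation of this abstract, that the expected adjacency matrix is low rank and exposes the solution, can be checked on a small planted bisection. The sketch below is not the paper's algorithm; it only verifies that the ±1 indicator of the planted parts is an eigenvector of the expected matrix. The within-part probability p, the across-part probability q, and the function names are assumptions made for illustration, and the diagonal is filled with p for simplicity (self-loops are not excluded).

```python
def expected_adjacency(labels, p, q):
    """Expected adjacency matrix: p within a part, q across parts."""
    n = len(labels)
    return [[p if labels[i] == labels[j] else q for j in range(n)]
            for i in range(n)]

def matvec(M, x):
    """Plain matrix-vector product for a list-of-lists matrix."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

# Hypothetical planted bisection: two parts of three nodes each.
labels = [0, 0, 0, 1, 1, 1]
p, q = 0.8, 0.2

E = expected_adjacency(labels, p, q)

# Signed indicator of the planted partition.
s = [1.0 if c == 0 else -1.0 for c in labels]

# E has only two distinct rows, so its rank is at most 2, and
# E s = (n / 2) * (p - q) * s for two equal-sized parts.
Es = matvec(E, s)
eigenvalue = (len(labels) / 2) * (p - q)
```

On a sampled graph the adjacency matrix is a noisy version of E, so the corresponding eigenvector is only approximately the indicator; handling that noise is the substance of the analysis and is not reproduced here.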
Towards a Theoretical Foundation for Laplacian-Based Manifold Methods
, 2007
Cited by 156 (12 self)
In recent years manifold methods have attracted a considerable amount of attention in machine learning. However, most algorithms in that class may be termed “manifold-motivated” as they lack any explicit theoretical guarantees. In this paper we take a step towards closing the gap between theory and practice for a class of Laplacian-based manifold methods. These methods utilize the graph Laplacian associated to a data set for a variety of applications in semi-supervised learning, clustering, and data representation. We show that under certain conditions the graph Laplacian of a point cloud of data samples converges to the Laplace-Beltrami operator on the underlying manifold. Theorem 3.1 contains the first result showing convergence of a random graph Laplacian to the manifold Laplacian in the context of machine learning.
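To make the object of study concrete, the following sketch applies a Gaussian-weight graph Laplacian of a point cloud to a function at one sample point. It is an illustration under stated assumptions, not the paper's construction: the bandwidth parameter t, the function name, and the sample are hypothetical, the points are evenly spaced on the unit circle rather than drawn at random, and the scaling factor that depends on t and the intrinsic dimension is omitted.

```python
import math

def heat_kernel_laplacian(points, f_values, i, t):
    """Unscaled Gaussian-weight graph Laplacian of f, evaluated at points[i]."""
    n = len(points)
    total = 0.0
    for j in range(n):
        sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        weight = math.exp(-sq_dist / (4.0 * t))
        # Positive semidefinite sign convention: f(x_i) - f(x_j).
        total += weight * (f_values[i] - f_values[j])
    return total / n

# Hypothetical sample: 200 evenly spaced points on the unit circle.
n = 200
angles = [2.0 * math.pi * k / n for k in range(n)]
points = [(math.cos(a), math.sin(a)) for a in angles]

constant = [1.0] * n                    # constant functions are annihilated
cosine = [math.cos(a) for a in angles]  # attains its maximum at angle 0

at_constant = heat_kernel_laplacian(points, constant, 0, 0.01)
at_cosine_peak = heat_kernel_laplacian(points, cosine, 0, 0.01)
```

With this sign convention the operator is positive at a maximum of the function, matching the behaviour of the negative Laplace-Beltrami operator there.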
Graph mining: laws, generators, and algorithms
 ACM COMPUT SURV (CSUR)
, 2006
Cited by 132 (7 self)
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M:N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.
Some Applications of Laplace Eigenvalues of Graphs
 GRAPH SYMMETRY: ALGEBRAIC METHODS AND APPLICATIONS, VOLUME 497 OF NATO ASI SERIES C
, 1997
Cited by 129 (0 self)
In the last decade important relations between Laplace eigenvalues and eigenvectors of graphs and several other graph parameters were discovered. In these notes we present some of these results and discuss their consequences. Attention is given to the partition and the isoperimetric properties of graphs, the max-cut problem and its relation to semidefinite programming, rapid mixing of Markov chains, and to extensions of the results to infinite graphs.