A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning
, 2013
Cited by 58 (8 self)
We study the design of local algorithms for massive graphs. A local graph algorithm is one that finds a solution containing or near a given vertex without looking at the whole graph. We present a local clustering algorithm. Our algorithm finds a good cluster (a subset of vertices whose internal connections are significantly richer than its external connections) near a given vertex. The running time of our algorithm, when it finds a nonempty local cluster, is nearly linear in the size of the cluster it outputs. The running time also depends polylogarithmically on the size of the graph and polynomially on the conductance of the cluster it produces. Our clustering algorithm could be a useful primitive for handling massive graphs, such as social networks and web graphs. As an application of this clustering algorithm, we present a partitioning algorithm that finds an approximate sparsest cut with nearly optimal balance. Our algorithm takes time nearly linear in the number of edges of the graph. Using the partitioning algorithm of this paper, we have designed a nearly linear time algorithm for constructing spectral sparsifiers of graphs, which we in turn use in a nearly linear time algorithm for solving linear systems in symmetric, diagonally dominant matrices. The linear system solver also leads to a nearly linear time algorithm for approximating the second-smallest eigenvalue and corresponding eigenvector of the Laplacian matrix of a graph. These other results are presented in two companion papers.
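The abstract measures cluster quality by conductance. As a minimal sketch, assuming the standard definition (edges leaving the set divided by the smaller of the two volumes) and a hypothetical adjacency-list input format, conductance can be computed as follows. This only evaluates a given vertex set; it is not the local clustering algorithm itself.

```python
def conductance(adj, cluster):
    """Return the conductance of `cluster` in an undirected graph.

    `adj` maps each vertex to a list of its neighbors (assumed input format).
    """
    cluster = set(cluster)
    # Count edges with exactly one endpoint inside the cluster.
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    # Volume of a set is the sum of the degrees of its vertices.
    vol_in = sum(len(adj[u]) for u in cluster)
    vol_out = sum(len(nbrs) for nbrs in adj.values()) - vol_in
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller > 0 else float("inf")


# Toy graph: two triangles joined by the single edge (2, 3).
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
phi_triangle = conductance(two_triangles, {0, 1, 2})  # 1 cut edge, volume 7
phi_single = conductance(two_triangles, {0})          # 2 cut edges, volume 2
```

On this toy graph a whole triangle has much lower conductance than a single vertex, which matches the intuition that a good cluster has few external connections relative to its internal ones.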
Streaming graph partitioning for large distributed graphs
Cited by 48 (2 self)
Extracting knowledge by performing computations on graphs is becoming increasingly challenging as graphs grow in size. A standard approach distributes the graph over a cluster of nodes, but performing computations on a distributed graph is expensive if large amounts of data have to be moved. Without partitioning the graph, communication quickly becomes a limiting factor in scaling the system up. Existing graph partitioning heuristics incur high computation and communication costs on large graphs, sometimes as high as the future computation itself. Observing that the graph has to be loaded into the cluster, we ask if the partitioning can be done at the same time using a lightweight streaming algorithm. We propose natural, simple heuristics and compare their performance to hashing and METIS, a fast, offline heuristic. We show on a large collection of graph datasets that our heuristics are a significant improvement, with the best obtaining an average gain of 76%. The heuristics are scalable in the size of the graphs and the number of partitions. Using our streaming partitioning methods, we are able to speed up PageRank computations on Spark [32], a distributed computation system, by 18% to 39% for large social networks.
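The abstract does not spell out its heuristics, so the following is only an illustrative sketch of what a one-pass streaming partitioner can look like. It assumes a simple greedy rule (place each arriving vertex in the non-full part that already holds most of its neighbors). The function names, the `capacity` parameter, and the tie-breaking rule are assumptions made for this example and are not taken from the paper.

```python
def stream_partition(vertex_stream, k, capacity):
    """Assign each streamed vertex to one of k parts in a single pass.

    `vertex_stream` yields (vertex, neighbor_list) pairs (assumed format).
    """
    assignment = {}
    sizes = [0] * k
    for v, neighbors in vertex_stream:
        # Count how many already-placed neighbors sit in each part.
        counts = [0] * k
        for u in neighbors:
            if u in assignment:
                counts[assignment[u]] += 1
        open_parts = [p for p in range(k) if sizes[p] < capacity]
        if not open_parts:
            raise ValueError("total capacity is smaller than the stream")
        # Prefer most neighbors, then the emptier part, then the lower index.
        best = max(open_parts, key=lambda p: (counts[p], -sizes[p], -p))
        assignment[v] = best
        sizes[best] += 1
    return assignment


def edge_cut(adj, assignment):
    """Number of undirected edges whose endpoints land in different parts."""
    return sum(
        1 for u in adj for v in adj[u]
        if u < v and assignment[u] != assignment[v]
    )


# Toy graph: two triangles joined by the edge (2, 3), streamed in vertex order.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
parts = stream_partition(sorted(graph.items()), k=2, capacity=3)
cut_size = edge_cut(graph, parts)
```

Each vertex is placed as soon as it arrives and is never moved, which is what makes the approach compatible with partitioning while the graph is being loaded.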
Algorithms, Graph Theory, and Linear Equations in Laplacian Matrices
Cited by 31 (0 self)
The Laplacian matrices of graphs are fundamental. In addition to facilitating the application of linear algebra to graph theory, they arise in many practical problems. In this talk we survey recent progress on the design of provably fast algorithms for solving linear equations in the Laplacian matrices of graphs. These algorithms motivate and rely upon fascinating primitives in graph theory, including low-stretch spanning trees, graph sparsifiers, ultrasparsifiers, and local graph clustering. These are all connected by a definition of what it means for one graph to approximate another. While this definition is dictated by Numerical Linear Algebra, it proves useful and natural from a graph theoretic perspective.
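For readers unfamiliar with the object being discussed, here is a minimal sketch that builds the Laplacian of a small weighted graph and checks the standard identity x^T L x = sum over edges of w(u,v) * (x_u - x_v)^2. The dense list-of-lists representation and the helper names are assumptions chosen for readability, not anything prescribed by the survey.

```python
def laplacian(n, weighted_edges):
    """Dense Laplacian L = D - A of an undirected weighted graph on n vertices."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        L[u][u] += w  # weighted degree on the diagonal
        L[v][v] += w
        L[u][v] -= w  # negative edge weight off the diagonal
        L[v][u] -= w
    return L


def quadratic_form(L, x):
    """Compute x^T L x for a dense matrix L."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 1.0)]
L = laplacian(3, edges)
x = [1.0, 0.0, -1.0]
lhs = quadratic_form(L, x)
rhs = sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)
```

Here `lhs` and `rhs` agree, and every row of `L` sums to zero. The quadratic form is the quantity usually compared when saying that one graph approximates another in the spectral sense.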
Fast Approximation Algorithms for Cut-based Problems in Undirected Graphs
Cited by 20 (3 self)
We present a general method of designing fast approximation algorithms for cut-based minimization problems in undirected graphs. In particular, we develop a technique that, given any such problem that can be approximated quickly on trees, allows approximating it almost as quickly on general graphs while only losing a polylogarithmic factor in the approximation guarantee. To illustrate the applicability of our paradigm, we focus our attention on the undirected sparsest cut problem with general demands and the balanced separator problem. By a simple use of our framework, we obtain polylogarithmic approximation algorithms for these problems that run in time close to linear. The main tool behind our result is an efficient procedure that decomposes general graphs into simpler ones while approximately preserving the cut-flow structure. This decomposition is inspired by the cut-based graph decomposition of Räcke that was developed in the context of oblivious routing schemes, as well as by the construction of the ultrasparsifiers due to Spielman and Teng that was employed in preconditioning symmetric diagonally dominant matrices.
Large-scale community detection on YouTube for topic discovery and exploration
In Proc. of the Fifth International AAAI Conference on Weblogs and Social Media
Cited by 16 (3 self)
Detecting coherent and well-connected communities inside large-scale graphs is an interesting problem that can provide useful insight into the graph structure and individual communities. It can also serve as the basis for content exploration and discovery within the graph. Clustering is a popular technique for community detection; however, the two main categories of clustering algorithms, i.e., global and local algorithms, have either scalability or usability issues, e.g., global algorithms do not scale well, and local algorithms may cover only a portion of the graph. Such one-stage algorithms typically optimize one objective function and do not work well in settings where we need to optimize various coverage, coherence, and connectivity metrics. In this paper, we study large-scale community detection over a real-world graph composed of millions of YouTube videos. In particular, we present a multistage scalable clustering algorithm, combining a preprocessing stage, a local clustering stage, and a postprocessing stage to generate clusters of YouTube videos with coherent content. We formalize coverage, coherence, and connectivity metrics and evaluate the quality of the proposed multistage clustering algorithms for YouTube videos. We also use extracted entities to attach meaningful labels to our clusters. Our use of local algorithms for global clustering, and its implementation and practical evaluation on such a large scale, is the first of its kind.
SoK: The Evolution of Sybil Defense via Social Networks
Cited by 15 (2 self)
Sybil attacks, in which an adversary forges a potentially unbounded number of identities, are a danger to distributed systems and online social networks. The goal of sybil defense is to accurately identify sybil identities. This paper surveys the evolution of sybil defense protocols that leverage the structural properties of the social graph underlying a distributed system to identify sybil identities. We make two main contributions. First, we clarify the deep connection between sybil defense and the theory of random walks. This leads us to identify a community detection algorithm that, for the first time, offers provable guarantees in the context of sybil defense. Second, we advocate a new goal for sybil defense that addresses the more limited, but practically useful, goal of securely whitelisting a local region of the graph.
Local Graph Partitions for Approximation and Testing
Cited by 14 (1 self)
We introduce a new tool for approximation and testing algorithms called partitioning oracles. We develop methods for constructing them for any class of bounded-degree graphs with an excluded minor, and in general, for any hyperfinite class of bounded-degree graphs. These oracles utilize only local computation to consistently answer queries about a global partition that breaks the graph into small connected components by removing only a small fraction of the edges. We illustrate the power of this technique by using it to extend and simplify a number of previous approximation and testing results for sparse graphs, as well as to provide new results that were unachievable with existing techniques. For instance:
• We give constant-time approximation algorithms for the size of the minimum vertex cover, the minimum dominating set, and the maximum independent set for any class of graphs with an excluded minor.
• We show a simple proof that any minor-closed graph property is testable in constant time in the bounded-degree model.
• We prove that it is possible to approximate the distance to almost any hereditary property in any bounded-degree hereditary family of graphs. Hereditary properties of interest include bipartiteness, k-colorability, and perfectness.
Approximating the exponential, the Lanczos method, and an Õ(m)-time spectral algorithm for balanced separator
IN: PROCEEDINGS OF THE 44TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING (STOC
, 2012
Cited by 11 (4 self)
We give a novel spectral approximation algorithm for the balanced separator problem that, given a graph G, a constant balance b ∈ (0, 1/2], and a parameter γ, either finds an Ω(b)-balanced cut of conductance O(√γ) in G, or outputs a certificate that all b-balanced cuts in G have conductance at least γ, and runs in time Õ(m). This settles the question of designing asymptotically optimal spectral algorithms for balanced separator. Our algorithm relies on a variant of the heat kernel random walk and requires, as a subroutine, an algorithm to compute exp(−L)v where L is the Laplacian of a graph related to G and v is a vector. Algorithms for computing the matrix-exponential-vector product efficiently comprise our next set of results. Our main result here is a new algorithm which computes a good approximation to exp(−A)v for a class of symmetric positive semidefinite (PSD) matrices A and a given vector v, in time roughly Õ(m_A), where m_A is the number of nonzero entries of A. This uses, in a nontrivial way, the breakthrough result of Spielman and Teng on inverting symmetric and diagonally dominant matrices in Õ(m_A) time. Finally, we prove that e^{−x} can be uniformly approximated up to a small additive error, in a nonnegative interval [a, b] with a polynomial of
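To make the subroutine exp(−L)v concrete, the sketch below approximates it with a plain truncated Taylor series. This is a swapped-in textbook technique chosen for brevity; it is not the Lanczos-based or polynomial-approximation method the abstract describes, and it is only reasonable for small matrices of modest norm. The function names and the `terms` parameter are assumptions made for this example.

```python
import math


def mat_vec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def expm_neg_times_vector(L, v, terms=40):
    """Approximate exp(-L) v by the series sum_{k < terms} (-L)^k v / k!."""
    result = list(v)
    term = list(v)
    for k in range(1, terms):
        # term_k = (-L / k) * term_{k-1}, so no matrix power is ever formed.
        term = [-t / k for t in mat_vec(L, term)]
        result = [r + t for r, t in zip(result, term)]
    return result


# Laplacian of a single edge; its eigenvalues are 0 and 2.
L_edge = [[1.0, -1.0], [-1.0, 1.0]]
heat = expm_neg_times_vector(L_edge, [1.0, 0.0])
expected = [0.5 + 0.5 * math.exp(-2.0), 0.5 - 0.5 * math.exp(-2.0)]
```

The entries of `heat` sum to 1 because the all-ones vector lies in the null space of a Laplacian, so the heat kernel preserves total mass.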
A Sharp PageRank Algorithm with Applications to Edge Ranking and Graph Sparsification
, 2010
Cited by 7 (5 self)
We give an improved algorithm for computing personalized PageRank vectors with tight error bounds which can be as small as Ω(n^{−p}) for any fixed positive integer p. The improved PageRank algorithm is crucial for computing a quantitative ranking of edges in a given graph. We will use the edge ranking to examine two interrelated problems: graph sparsification and graph partitioning. We can combine the graph sparsification and the partitioning algorithms using PageRank vectors to derive an improved partitioning algorithm.
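As background for the quantity being computed, here is a minimal sketch of a personalized PageRank vector obtained by plain power iteration. It is the baseline textbook iteration, not the improved algorithm of the paper. The teleport probability `alpha`, the iteration count, and the adjacency-list input format are assumptions made for this example, and every vertex is assumed to have at least one neighbor.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Iterate p <- alpha * e_seed + (1 - alpha) * p W, where W is the walk matrix."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha  # teleport back to the seed vertex
        for u, mass in p.items():
            # Spread the remaining mass evenly over the neighbors of u.
            share = (1.0 - alpha) * mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


# Toy graph: two triangles joined by the edge (2, 3); the seed is vertex 0.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ppr = personalized_pagerank(graph, seed=0)
mass_near = ppr[0] + ppr[1] + ppr[2]
mass_far = ppr[3] + ppr[4] + ppr[5]
```

The vector stays a probability distribution at every step, and most of its mass remains in the triangle containing the seed, which is the property that PageRank-based partitioning methods exploit.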