Results 1  10
of
25
GraphX: Graph Processing in a Distributed Dataflow Framework
 USENIX ASSOCIATION 11TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI ’14)
, 2014
"... In pursuit of graph processing performance, the systems community has largely abandoned generalpurpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In thi ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
In pursuit of graph processing performance, the systems community has largely abandoned generalpurpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern generalpurpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, groupby). To achieve performance parity with specialized graph systems, GraphX recasts graphspecific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings lowcost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
FENNEL: Streaming Graph Partitioning for Massive Scale Graphs
"... Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the stream ..."
Abstract

Cited by 16 (1 self)
 Add to MetaCart
(Show Context)
Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors or in the cluster with the least number of nonneighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel onepass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of realworld and synthetic graphs. Surprisingly, despite the fact that our algorithm is a onepass streaming algorithm, we found its performance to be in many cases comparable to the defacto standard offline software METIS and in some cases even superiror. For instance, for the Twitter graph with more than 1.4 billion of edges, our method partitions the graph in about 40 minutes achieving a balanced partition that cuts as few as 6.8 % of edges, whereas it took more than 8 1 hours by METIS to 2 produce a balanced partition that cuts 11.98 % of edges. We also demonstrate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform with respect to the communication cost and runtime.
Restreaming graph partitioning: Simple versatile algorithms for advanced balancing
 In ACM KDD
, 2013
"... Partitioning large graphs is difficult, especially when performed in the limited models of computation afforded to modern large scale computing systems. In this work we introduce restreaming graph partitioning and develop algorithms that scale similarly to streaming partitioning algorithms yet empi ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
(Show Context)
Partitioning large graphs is difficult, especially when performed in the limited models of computation afforded to modern large scale computing systems. In this work we introduce restreaming graph partitioning and develop algorithms that scale similarly to streaming partitioning algorithms yet empirically perform as well as fully offline algorithms. In streaming partitioning, graphs are partitioned serially in a single pass. Restreaming partitioning is motivated by scenarios where approximately the same dataset is routinely streamed, making it possible to transform streaming partitioning algorithms into an iterative procedure. This combination of simplicity and powerful performance allows restreaming algorithms to be easily adapted to efficiently tackle more challenging partitioning objectives. In particular, we consider the problem of stratified graph partitioning, where each of many node attribute strata are balanced simultaneously. As such, stratified partitioning is well suited for the study of network effects on social networks, where it is desirable to isolate disjoint dense subgraphs with representative user demographics. To demonstrate, we partition a large social network such that each partition exhibits the same degree distribution in the original graph — a novel achievement for nonregular graphs. As part of our results, we also observe a fundamental difference in the ease with which social graphs are partitioned when compared to web graphs. Namely, the modular structure of web graphs appears to motivate full offline optimization, whereas the locally dense structure of social graphs precludes significant gains from global manipulations.
Unicorn: A System for Searching the Social Graph
"... Unicorn is an online, inmemory social graphaware indexing system designed to search trillions of edges between tens of billions of users and entities on thousands of commodity servers. Unicorn is based on standard concepts in information retrieval, but it includes features to promote results with ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
(Show Context)
Unicorn is an online, inmemory social graphaware indexing system designed to search trillions of edges between tens of billions of users and entities on thousands of commodity servers. Unicorn is based on standard concepts in information retrieval, but it includes features to promote results with good social proximity. It also supports queries that require multiple roundtrips to leaves in order to retrieve objects that are more than one edge away from source nodes. Unicorn is designed to answer billions of queries per day at latencies in the hundreds of milliseconds, and it serves as an infrastructural building block for Facebook’s Graph Search product. In this paper, we describe the data model and query language supported by Unicorn. We also describe its evolution as it became the primary backend for Facebook’s search offerings. 1.
Partitioning Complex Networks via Sizeconstrained Clustering
 in Proceedings of the 13th International Symposium on Experimental Algorithms (SEA’14), ser. LNCS
, 2014
"... Abstract. The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small enough to be part ..."
Abstract

Cited by 9 (7 self)
 Add to MetaCart
(Show Context)
Abstract. The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small enough to be partitioned by some other algorithm. A partition of the input graph is then constructed by successively transferring the solution to the next finer graph and applying a local search algorithm to improve the current solution. In this paper, we describe a novel approach to partition graphs effectively especially if the networks have a highly irregular structure. More precisely, our algorithm provides graph coarsening by iteratively contracting sizeconstrained clusterings that are computed using a label propagation algorithm. The same algorithm that provides the sizeconstrained clusterings can also be used during uncoarsening as a fast and simple local search algorithm. Depending on the algorithm’s configuration, we are able to compute partitions of very high quality outperforming all competitors, or partitions that are comparable to the best competitor in terms of quality, hMetis, while being nearly an order of magnitude faster on average. The fastest configuration partitions the largest graph available to us with 3.3 billion edges using a single machine in about ten minutes while cutting less than half of the edges than the fastest competitor, kMetis. 1
Graph cluster randomization: network exposure to multiple universes
, 2013
"... A/B testing is a standard approach for evaluating the effect of online experiments; the goal is to estimate the ‘average treatment effect ’ of a new feature or condition by exposing a sample of the overall population to it. A drawback with A/B testing is that it is poorly suited for experiments in ..."
Abstract

Cited by 8 (3 self)
 Add to MetaCart
(Show Context)
A/B testing is a standard approach for evaluating the effect of online experiments; the goal is to estimate the ‘average treatment effect ’ of a new feature or condition by exposing a sample of the overall population to it. A drawback with A/B testing is that it is poorly suited for experiments involving social interference, when the treatment of individuals spills over to neighboring individuals along an underlying social network. In this work, we propose a novel methodology using graph clustering to analyze average treatment effects under social interference. To begin, we characterize graphtheoretic conditions under which individuals can be considered to be ‘network exposed ’ to an experiment. We then show how graph cluster randomization admits an efficient exact algorithm to compute the probabilities for each vertex being network exposed under several of these exposure conditions. Using these probabilities as inverse weights, a HorvitzThompson estimator can then provide an effect estimate that is unbiased, provided that the exposure model has been properly specified. Given an estimator that is unbiased, we focus on minimizing the variance. First, we develop simple sufficient conditions for the variance of the estimator to be asymptotically small in n, the size of the graph. However, for general randomization schemes, this variance can be lower bounded by an exponential function of the degrees of a graph. In contrast, we show that if a graph satisfies a restrictedgrowth condition on the growth rate of neighborhoods, then there exists a natural clustering algorithm, based on vertex neighborhoods, for which the variance of the estimator can be upper bounded by a linear function of the degrees. Thus we show that proper cluster randomization can lead to exponentially lower estimator variance when experimentally measuring average treatment effects under interference. 1.
Balanced graph edge partition
 KDD
, 2014
"... Abstract Balanced edge partition has emerged as a new approach to partition an input graph data for the purpose of scaling out parallel computations, which is of interest for several modern data analytics computation platforms, including platforms for iterative computations, machine learning probl ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Abstract Balanced edge partition has emerged as a new approach to partition an input graph data for the purpose of scaling out parallel computations, which is of interest for several modern data analytics computation platforms, including platforms for iterative computations, machine learning problems, and graph databases. This new approach stands in a stark contrast to the traditional approach of balanced vertex partition, where for given number of partitions, the problem is to minimize the number of edges cut subject to balancing the vertex cardinality of partitions. In this paper, we first characterize the expected costs of vertex and edge partitions with and without aggregation of messages, for the commonly deployed policy of placing a vertex or an edge uniformly at random to one of the partitions. We then obtain the first approximation algorithms for the balanced edgepartition problem which for the case of no aggregation matches the best known approximation ratio for the balanced vertexpartition problem, and show that this remains to hold for the case with aggregation up to factor that is equal to the maximum indegree of a vertex. We report results of an extensive empirical evaluation on a set of realworld graphs, which quantifies the benefits of edgevs. vertexpartition, and demonstrates efficiency of natural greedy online assignments for the balanced edgepartition problem with and with no aggregation.
Adaptive partitioning for largescale dynamic graphs
 In proc. ICDCS (2014
"... Abstract—In the last years, largescale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notab ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Abstract—In the last years, largescale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, realworld graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid the introduction of an extensive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by adapting continuously the graph partitions as the graph changes. We present a novel highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a largescale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three realworld scenarios how adapting graph partitioning reduces execution time by over 50 % when compared to commonly used hashpartitioning. I.
xdgp: A dynamic graph processing system with adaptive partitioning. arXiv
, 2013
"... Many realworld systems, such as social networks, rely on mining efficiently large graphs, with hundreds of millions of vertices and edges. This volume of information requires partitioning the graph across multiple nodes in a distributed system. This has a deep effect on performance, as traversing ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
Many realworld systems, such as social networks, rely on mining efficiently large graphs, with hundreds of millions of vertices and edges. This volume of information requires partitioning the graph across multiple nodes in a distributed system. This has a deep effect on performance, as traversing edges cut between partitions incurs a significant performance penalty due to the cost of communication. Thus, several systems in the literature have attempted to improve computational performance by enhancing graph partitioning, but they do not support another characteristic of realworld graphs: graphs are inherently dynamic, their topology evolves continuously, and subsequently the optimum partitioning also changes over time. In this work, we present the first system that dynamically repartitions massive graphs to adapt to structural changes. The system optimises graph partitioning to prevent performance degradation without using data replication. The system adopts an iterative vertex migration algorithm that relies on local information only, making complex coordination unnecessary. We show how the improvement in graph partitioning reduces execution time by over 50%, while adapting the partitioning to a large number of changes to the graph in three realworld scenarios.
A Node Influence Based Label Propagation Algorithm for Community Detection
 in Networks. The Scientific World Journal, 2014, Article ID: 627581. http://dx.doi.org/10.1155/2014/627581
, 2014
"... Label propagation algorithm (LPA) is an extremely fast community detection method and is widely used in large scale networks. In spite of the advantages of LPA, the issue of its poor stability has not yet been well addressed. We propose a novel node influence based label propagation algorithm for c ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Label propagation algorithm (LPA) is an extremely fast community detection method and is widely used in large scale networks. In spite of the advantages of LPA, the issue of its poor stability has not yet been well addressed. We propose a novel node influence based label propagation algorithm for community detection (NIBLPA), which improves the performance of LPA by improving the node orders of label updating and the mechanism of label choosing when more than one label is contained by the maximum number of nodes. NIBLPA can get more stable results than LPA since it avoids the complete randomness of LPA. The experimental results on both synthetic and real networks demonstrate that NIBLPA maintains the efficiency of the traditional LPA algorithm, and, at the same time, it has a superior performance to some representative methods.