Results 1–10 of 66
Streaming graph partitioning for large distributed graphs
Abstract

Cited by 48 (2 self)
Extracting knowledge by performing computations on graphs is becoming increasingly challenging as graphs grow in size. A standard approach distributes the graph over a cluster of nodes, but performing computations on a distributed graph is expensive if large amounts of data have to be moved. Without partitioning the graph, communication quickly becomes a limiting factor in scaling the system up. Existing graph partitioning heuristics incur high computation and communication cost on large graphs, sometimes as high as the future computation itself. Observing that the graph has to be loaded into the cluster anyway, we ask if the partitioning can be done at the same time with a lightweight streaming algorithm. We propose natural, simple heuristics and compare their performance to hashing and METIS, a fast, offline heuristic. We show on a large collection of graph datasets that our heuristics are a significant improvement, with the best obtaining an average gain of 76%. The heuristics are scalable in the size of the graphs and the number of partitions. Using our streaming partitioning methods, we are able to speed up PageRank computations on Spark [32], a distributed computation system, by 18% to 39% for large social networks.
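To fix ideas, here is a minimal sketch of the kind of one-pass streaming heuristic the abstract describes: each arriving vertex is placed in the partition that already holds most of its neighbours, discounted by how full that partition is. The scoring rule and the hard capacity cap are assumptions for illustration, not the paper's exact heuristics.

```python
def stream_partition(adj, vertex_stream, k, capacity):
    """Assign each arriving vertex to one of k partitions on the fly."""
    assignment = {}            # vertex -> partition id
    sizes = [0] * k
    for v in vertex_stream:
        best, best_score = None, float("-inf")
        for p in range(k):
            if sizes[p] >= capacity:
                continue       # hard balance cap
            # neighbours of v already placed in partition p
            placed = sum(1 for u in adj.get(v, ()) if assignment.get(u) == p)
            # linear penalty discourages nearly full partitions
            score = placed * (1.0 - sizes[p] / capacity)
            if score > best_score:
                best, best_score = p, score
        assignment[v] = best
        sizes[best] += 1
    return assignment

# Toy graph: two triangles joined by one bridge edge; a good partitioner
# should cut only the bridge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
parts = stream_partition(adj, [0, 1, 2, 3, 4, 5], k=2, capacity=3)
cut = sum(1 for v in adj for u in adj[v] if parts[v] != parts[u]) // 2
print(parts, "cut =", cut)
```

On this stream order the greedy rule recovers the two triangles exactly and cuts a single edge; a hash-based assignment would cut several.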
Partitioning Graphs into Balanced Components
, 2009
Abstract

Cited by 18 (2 self)
We consider the k-balanced partitioning problem, where the goal is to partition the vertices of an input graph G into k equally sized components, while minimizing the total weight of the edges connecting different components. We allow k to be part of the input and denote the cardinality of the vertex set by n. This problem is a natural and important generalization of well-known graph partitioning problems, including minimum bisection and minimum balanced cut. We present a (bicriteria) approximation algorithm achieving an approximation of O(√(log n · log k)), which matches or improves over previous algorithms for all relevant values of k. Our algorithm uses a semidefinite relaxation which combines ℓ2^2 metrics with spreading metrics. Surprisingly, we show that the integrality gap of the semidefinite relaxation is Ω(log k) even for large values of k (e.g., k = n^Ω(1)), implying that the dependence on k of the approximation factor is necessary. This is in contrast to previous approximation algorithms for k-balanced partitioning, which are based on linear programming relaxations and whose approximation factors are independent of k.
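The objective the abstract optimises can be stated in a few lines of code: a candidate assignment is feasible if all k parts have equal size, and its cost is the total weight of crossing edges. The helper below is illustrative only; the names and the toy graph are ours.

```python
# Minimal illustration of the k-balanced partitioning objective:
# k equal-sized parts, minimise the total weight of edges whose
# endpoints land in different parts.
def balanced_cut_cost(weighted_edges, assignment, k):
    """Return (is_balanced, cut_weight) for a candidate k-partition."""
    part_sizes = {}
    for v, p in assignment.items():
        part_sizes[p] = part_sizes.get(p, 0) + 1
    n = len(assignment)
    balanced = (len(part_sizes) == k
                and all(s == n // k for s in part_sizes.values()))
    cut = sum(w for (u, v, w) in weighted_edges if assignment[u] != assignment[v])
    return balanced, cut

edges = [("a", "b", 2), ("b", "c", 1), ("c", "d", 5), ("a", "c", 1)]
balanced, cut = balanced_cut_cost(edges, {"a": 0, "b": 0, "c": 1, "d": 1}, k=2)
print(balanced, cut)  # parts {a,b} and {c,d}: edges b-c and a-c cross
```

The hardness the abstract discusses comes entirely from searching over assignments; evaluating one, as above, is trivial.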
Min-max graph partitioning and small-set expansion
, 2011
Abstract

Cited by 14 (2 self)
We study graph partitioning problems from a min-max perspective, in which an input graph on n vertices should be partitioned into k parts, and the objective is to minimize the maximum number of edges leaving a single part. The two main versions we consider are: (i) the k parts need to be of equal size, and (ii) the parts must separate a set of k given terminals. We consider a common generalization of these two problems, and design for it an O(√(log n · log k))-approximation algorithm. This improves over an O(log² n)-approximation for the second version due to Svitkina and Tardos [22], and a roughly O(k log n)-approximation for the first version that follows from other previous work. We also give an improved O(1)-approximation algorithm for graphs that exclude any fixed minor. Our algorithm uses a new procedure for solving the Small-Set Expansion problem. In this problem, we are given a graph G and the goal is to find a nonempty set S ⊆ V of size |S| ≤ ρn with minimum edge-expansion. We give an O(√(log n · log(1/ρ))) bicriteria approximation algorithm for the general case of Small-Set Expansion, and an O(1)-approximation algorithm for graphs that exclude any fixed minor.
Channel adaptive quantization for limited feedback MIMO beamforming systems
 IEEE TRANS. ON SIG. PROC
, 2005
Abstract

Cited by 11 (5 self)
Multiple-input multiple-output (MIMO) wireless systems can achieve significant diversity and array gain by using transmit beamforming and receive combining techniques. In the absence of full channel knowledge at the transmitter, the transmit beamforming vector can be quantized at the receiver and sent to the transmitter using a low-rate feedback channel. In the literature, quantization algorithms for the beamforming vector are designed and optimized for a particular channel distribution, commonly the uncorrelated Rayleigh distribution. When the channel is not uncorrelated Rayleigh, however, these quantization strategies result in a degradation of the receive signal-to-noise ratio. In this paper, switched codebook quantization is proposed, where the codebook is dynamically chosen based on the channel distribution. The codebook adaptation enables the quantization to exploit the spatial and temporal correlation inherent in the channel. The convergence properties of the codebook selection algorithm are studied assuming a block-stationary model for the channel. In the case of a non-stationary channel, it is shown using simulations that the selected codebook tracks the distribution of the channel, resulting in improvements in signal-to-noise ratio. Simulation results show that in the case of correlated channels, the SNR performance of the link can be significantly improved by adaptation, compared to non-adaptive quantization strategies designed for uncorrelated Rayleigh fading channels.
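The limited-feedback step the abstract builds on can be sketched concretely: the receiver scans a codebook of unit-norm beamforming vectors and feeds back only the index of the codeword maximising the receive SNR, which scales as |h^H w|². The 2-antenna codebook and channel below are made-up examples, not taken from the paper; the paper's contribution is switching among whole codebooks as the channel distribution drifts, which this sketch does not attempt.

```python
import math

def best_codeword(h, codebook):
    """Return (index, gain) of the codeword maximising |h^H w|^2."""
    def gain(w):
        inner = sum(hc.conjugate() * wc for hc, wc in zip(h, w))
        return abs(inner) ** 2
    gains = [gain(w) for w in codebook]
    idx = max(range(len(codebook)), key=gains.__getitem__)
    return idx, gains[idx]

s = 1 / math.sqrt(2)
codebook = [(1, 0), (0, 1), (s, s), (s, -s)]   # 4 codewords = 2 feedback bits
h = (0.9 + 0.1j, 1.0 - 0.2j)                   # example channel realisation
idx, gain = best_codeword(h, codebook)
print(idx, round(gain, 3))
```

With only log2(len(codebook)) feedback bits per channel realisation, the transmitter can reconstruct the chosen beamforming vector exactly.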
Balanced Partitions of Trees and Applications
Abstract

Cited by 9 (3 self)
We study the k-BALANCED PARTITIONING problem, in which the vertices of a graph are to be partitioned into k sets of size at most ⌈n/k⌉ while minimising the cut size, which is the number of edges connecting vertices in different sets. The problem is well studied for general graphs, for which it cannot be approximated within any factor in polynomial time. However, little is known about restricted graph classes. We show that for trees k-BALANCED PARTITIONING remains surprisingly hard. In particular, approximating the cut size is APX-hard even if the maximum degree of the tree is constant. If instead the diameter of the tree is bounded by a constant, we show that it is NP-hard to approximate the cut size within n^c, for any constant c < 1. In the face of these hardness results, we show that allowing near-balanced solutions, in which there are at most (1+ε)⌈n/k⌉ vertices in any of the k sets, admits a PTAS for trees. Remarkably, the computed cut size is no larger than that of an optimal balanced solution. In the final section of our paper, we harness results on embedding graph metrics into tree metrics to extend our PTAS for trees to general graphs. In addition to being conceptually simpler and easier to analyse, our scheme improves the best known factor on the cut size of near-balanced solutions from O(log^1.5(n)/ε²) [Andreev and Räcke, TCS 2006] to O(log n), for weighted graphs. This also settles a question posed by Andreev and Räcke of whether an algorithm with approximation guarantees on the cut size independent of ε exists.
Fast algorithms for maximal clique enumeration with limited memory
 In Proceedings of the ACM SIGKDD international
, 2012
Abstract

Cited by 8 (5 self)
Maximal clique enumeration (MCE) is a long-standing problem in graph theory and has numerous important applications. Though extensively studied, most existing algorithms become impractical when the input graph is too large and is disk-resident. We first propose an efficient partition-based algorithm for MCE that addresses the problem of processing large graphs with limited memory. We then further reduce the high cost of CPU computation of MCE by a careful nested partition based on a cost model. Finally, we parallelize our algorithm to further reduce the overall running time. We verified the efficiency of our algorithms by experiments on large real-world graphs.
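For context, the classic in-memory baseline for MCE is Bron-Kerbosch with pivoting, sketched below; the partitioned, out-of-core variants are the paper's contribution and are not reproduced here.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of a graph given as {vertex: set-of-neighbours}."""
    cliques = []

    def expand(R, P, X):
        # R: current clique, P: candidates, X: already-explored vertices
        if not P and not X:
            cliques.append(sorted(R))
            return
        # pivot on the vertex covering the most candidates, pruning the search
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            expand(R | {v}, P & adj[v], X & adj[v])
            P.remove(v)
            X.add(v)

    expand(set(), set(adj), set())
    return sorted(cliques)

# Two triangles sharing the edge 2-3: exactly two maximal cliques.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(maximal_cliques(adj))  # [[1, 2, 3], [2, 3, 4]]
```

The recursion's working sets (R, P, X) are exactly the state a disk-based variant must spill and reload, which is why partitioning the graph pays off once it exceeds memory.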
Towards High-Throughput Gibbs Sampling at Scale: A Study across Storage Managers
Abstract

Cited by 8 (4 self)
Factor graphs and Gibbs sampling are a popular combination for Bayesian statistical methods that are used to solve diverse problems including insurance risk models, pricing models, and information extraction. Given a fixed sampling method and a fixed amount of time, an implementation of a sampler that achieves a higher throughput of samples will achieve higher quality than a lower-throughput sampler. We study how (and whether) traditional data processing choices about materialization, page layout, and buffer-replacement policy need to be changed to achieve high-throughput Gibbs sampling for factor graphs that are larger than main memory. We find that both new theoretical and new algorithmic techniques are required to understand the tradeoff space for each choice. On both real and synthetic data, we demonstrate that traditional baseline approaches may achieve two orders of magnitude lower throughput than an optimal approach. For a handful of popular tasks across several storage backends, including HBase and traditional Unix files, we show that our simple prototype achieves competitive (and sometimes better) throughput compared to specialized state-of-the-art approaches on factor graphs that are larger than main memory.
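A minimal in-memory Gibbs sampler over a two-variable factor graph, only to fix ideas for the abstract's setting; the paper is about the storage-manager choices needed once the factor graph no longer fits in RAM. The factors and variable names below are made up for illustration.

```python
import random

def gibbs(factors, variables, steps, seed=0):
    """Resample each binary variable from its full conditional in turn."""
    rng = random.Random(seed)
    state = {v: 0 for v in variables}
    counts = {v: 0 for v in variables}
    for _ in range(steps):
        for v in variables:
            # unnormalised weight of each value of v given the rest
            weights = []
            for val in (0, 1):
                state[v] = val
                w = 1.0
                for factor in factors:
                    w *= factor(state)
                weights.append(w)
            p1 = weights[1] / (weights[0] + weights[1])
            state[v] = 1 if rng.random() < p1 else 0
            counts[v] += state[v]
    # empirical marginals P(v = 1)
    return {v: counts[v] / steps for v in variables}

# Two coupled binaries: one factor rewards agreement, one pulls x toward 1.
factors = [lambda s: 2.0 if s["x"] == s["y"] else 1.0,
           lambda s: 3.0 if s["x"] == 1 else 1.0]
marginals = gibbs(factors, ["x", "y"], steps=5000)
print(marginals)  # x sits well above 0.5; y is dragged up by the coupling
```

Every sweep touches every factor adjacent to every variable, which is why page layout and buffer replacement dominate throughput once those factors live on disk.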
An Experimental Comparison of Pregel-like Graph Processing Systems
Abstract

Cited by 5 (1 self)
The introduction of Google’s Pregel generated much interest in the field of large-scale graph data processing, inspiring the development of Pregel-like systems such as Apache Giraph, GPS, Mizan, and GraphLab, all of which have appeared in the past two years. To gain an understanding of how Pregel-like systems perform, we conduct a study to experimentally compare Giraph, GPS, Mizan, and GraphLab on equal ground by considering graph- and algorithm-agnostic optimizations and by using several metrics. The systems are compared with four different algorithms (PageRank, single-source shortest path, weakly connected components, and distributed minimum spanning tree) on up to 128 Amazon EC2 machines. We find that the system optimizations present in Giraph and GraphLab allow them to perform well. Our evaluation also shows Giraph 1.0.0’s considerable improvement since Giraph 0.1 and identifies areas of improvement for all systems.
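The vertex-centric ("think like a vertex") model the compared systems share can be sketched with PageRank, one of the four benchmark algorithms: each superstep, every vertex sends its rank share along its out-edges, then updates from the messages it received. This is a single-machine toy, not any of the systems' actual APIs.

```python
def pregel_pagerank(out_edges, supersteps=30, d=0.85):
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(supersteps):
        # message phase: each vertex sends rank/outdegree along its edges
        inbox = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            share = rank[v] / len(targets) if targets else 0.0
            for u in targets:
                inbox[u] += share
        # compute phase: each vertex updates from its received messages
        rank = {v: (1 - d) / n + d * inbox[v] for v in out_edges}
    return rank

graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
ranks = pregel_pagerank(graph)
print({v: round(r, 3) for v, r in ranks.items()})
```

In a real Pregel-like system the message and compute phases run in parallel per vertex with a barrier between supersteps; the comparison in the paper is largely about how well each system hides the cost of that message exchange.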
Recent advances in graph partitioning
 arXiv
Abstract

Cited by 5 (2 self)
We survey recent trends in practical algorithms for balanced graph partitioning, together with applications and future research directions.
Efficient Data Partitioning Model for Heterogeneous Graphs in the Cloud
 In ACM/IEEE SC
, 2013
Abstract

Cited by 4 (2 self)
As the size and variety of information networks continue to grow in many scientific and engineering domains, we witness a growing demand for efficient processing of large heterogeneous graphs using a cluster of compute nodes in the Cloud. One open issue is how to effectively partition a large graph to process complex graph operations efficiently. In this paper, we present VB-Partitioner, a distributed data partitioning model and algorithms for efficient processing of graph operations over large-scale graphs in the Cloud. Our VB-Partitioner has three salient features. First, it introduces vertex blocks (VBs) and extended vertex blocks (EVBs) as the building blocks for semantic partitioning of large graphs. Second, VB-Partitioner utilizes vertex block grouping algorithms to place those vertex blocks that have high correlation in graph structure into the same partition. Third, VB-Partitioner employs a VB-partition-guided query partitioning model to speed up the parallel processing of graph pattern queries by reducing the amount of inter-partition query processing. We conduct extensive experiments on several real-world graphs with millions of vertices and billions of edges. Our results show that VB-Partitioner significantly outperforms the popular random block-based data partitioner in terms of query latency and scalability over large-scale graphs.
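The vertex-block idea from the abstract can be made concrete in a few lines: every vertex anchors a block containing itself plus its 1-hop neighbours, and blocks with high structural overlap land in the same partition. The overlap-threshold grouping below is our stand-in for the paper's correlation-based grouping, not VB-Partitioner's actual algorithm.

```python
def vertex_blocks(adj):
    """1-hop vertex block for every anchor vertex."""
    return {v: {v} | set(nbrs) for v, nbrs in adj.items()}

def group_anchors(blocks, min_overlap):
    """Union-find grouping of anchors whose blocks share >= min_overlap vertices."""
    parent = {v: v for v in blocks}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path compression
            v = parent[v]
        return v

    anchors = list(blocks)
    for i, u in enumerate(anchors):
        for w in anchors[i + 1:]:
            if len(blocks[u] & blocks[w]) >= min_overlap:
                parent[find(u)] = find(w)   # merge the two groups
    groups = {}
    for v in anchors:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())

# Two triangles bridged by the single edge a-d.
adj = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"],
       "d": ["a", "e", "f"], "e": ["d", "f"], "f": ["d", "e"]}
groups = group_anchors(vertex_blocks(adj), min_overlap=3)
print(groups)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Co-locating each group on one node means a 1-hop graph pattern query anchored anywhere inside a triangle never crosses a partition boundary, which is the effect the paper's query partitioning model exploits.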