Results 1–10 of 25
Pregel: A system for large-scale graph processing
In SIGMOD, 2010
Abstract

Cited by 472 (0 self)
Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs—in some cases billions of vertices, trillions of edges—poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
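The superstep structure the abstract describes can be illustrated with a minimal single-machine sketch (function and variable names are ours, not Pregel's API): in each superstep a vertex reads messages sent in the previous superstep, updates its value, and sends messages along outgoing edges, until no vertex changes. Here every vertex propagates the maximum value it has seen, a standard toy example of the model.

```python
def pregel_max(edges, values):
    """Toy vertex-centric max propagation.
    edges: {vertex: [out-neighbors]}, values: {vertex: initial value}."""
    values = dict(values)
    # Superstep 0: every vertex broadcasts its value to its out-neighbors.
    inbox = {v: [] for v in values}
    for v in values:
        for u in edges.get(v, []):
            inbox[u].append(values[v])
    changed = True
    while changed:                       # run supersteps until quiescence
        changed = False
        next_inbox = {v: [] for v in values}
        for v, msgs in inbox.items():
            # A vertex only acts (and re-sends) when a message improves it.
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)
                changed = True
                for u in edges.get(v, []):
                    next_inbox[u].append(values[v])
        inbox = next_inbox               # messages become visible next superstep
    return values
```

On the chain 1–2–3 with initial values 5, 1, 3, every vertex converges to 5 in a few supersteps; the real system distributes vertices over workers and exchanges the inboxes over the network.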
Scalable SPARQL Querying of Large RDF Graphs
Abstract

Cited by 66 (1 self)
The generation of RDF data has accelerated to the point where many data sets need to be partitioned across multiple machines in order to achieve reasonable performance when querying the data. Although tremendous progress has been made in the Semantic Web community for achieving high performance data management on a single node, current solutions that allow the data to be partitioned across multiple machines are highly inefficient. In this paper, we introduce a scalable RDF data management system that is up to three orders of magnitude more efficient than popular multi-node RDF data management systems. In so doing, we introduce techniques for (1) leveraging state-of-the-art single-node RDF-store technology, (2) partitioning the data across nodes in a manner that helps accelerate query processing through locality optimizations, and (3) decomposing SPARQL queries into high-performance fragments that take advantage of how data is partitioned in a cluster.
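As a baseline illustration of point (2), triples can be partitioned so that all triples sharing a subject land on the same node, letting subject-centric query fragments run without cross-node communication. This sketch uses a plain deterministic hash and names of our own choosing; the paper's actual scheme is a locality-aware graph partitioning, not a simple hash.

```python
def node_of(subject, num_nodes):
    # Deterministic toy hash (Python's built-in hash() is salted per run,
    # so it would not give stable placement across processes).
    return sum(map(ord, subject)) % num_nodes

def partition_triples(triples, num_nodes):
    """Assign each (subject, predicate, object) triple to a node by subject,
    so every triple about the same subject is co-located."""
    parts = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        parts[node_of(s, num_nodes)].append((s, p, o))
    return parts
```

With this placement, a star-shaped SPARQL fragment over one subject touches exactly one node; queries spanning subjects still need distributed joins, which is what the locality-optimized partitioning in the paper is designed to reduce.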
Parallel breadth-first search on distributed memory systems
2011
Abstract

Cited by 34 (9 self)
Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed-memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse-matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours-based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
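The level-synchronous strategy the abstract mentions processes the frontier one BFS level at a time; all vertices in the current frontier can be expanded in parallel, with a barrier between levels. A sequential sketch of that structure (names ours):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: expand the whole frontier, then advance one
    level. adj: {vertex: [neighbors]}. Returns {vertex: BFS level}."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # In the distributed version, this loop is partitioned across
        # processes and next_frontier is merged at a synchronization point.
        for v in frontier:
            for u in adj.get(v, []):
                if u not in level:       # first visit fixes the level
                    level[u] = depth
                    next_frontier.append(u)
        frontier = next_frontier
    return level
```

The two-dimensional variant in the paper instead partitions the adjacency matrix over a processor grid, so frontier expansion becomes a sparse matrix–vector-style operation with communication limited to grid rows and columns.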
Bulk Synchronous Parallel ML: Modular Implementation and Performance Prediction
In International Conference on Computational Science, Part II, number 3515 in LNCS, 2005
Abstract

Cited by 12 (9 self)
Abstract. BSML is a library for parallel programming with the functional language Objective Caml. It is based on an extension of the λ-calculus by parallel operations on a parallel data structure named parallel vector. The execution time can be estimated, and deadlocks and indeterminism are avoided. Programs are written as usual functional programs (in Objective Caml) but using a small set of additional functions. The provided functions are used to access the parameters of the parallel machine and to create and operate on parallel vectors. It follows the execution and cost model of the Bulk Synchronous Parallel model. The paper presents the latest implementation of this library and experiments in performance prediction.
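The performance prediction rests on the standard BSP cost model: a program is a sequence of supersteps, and each superstep costs its maximum local work w, plus its maximum communication volume h scaled by the machine's gap parameter g, plus the barrier latency l. A sketch of that formula (parameter names follow the usual BSP convention, not BSML's API):

```python
def bsp_cost(supersteps, g, l):
    """Estimated BSP running time for a sequence of supersteps.
    supersteps: list of (w, h) pairs, where w is the maximum local
    computation in a superstep and h the maximum number of words any
    process sends or receives; g is the per-word communication gap and
    l the barrier synchronization latency."""
    return sum(w + h * g + l for w, h in supersteps)
```

Because g and l are measured once per machine, the same program's runtime can be predicted on different machines by plugging in their parameters, which is the basis of the experiments the abstract refers to.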
HipG: Parallel Processing of Large-Scale Graphs
Abstract

Cited by 11 (1 self)
Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined pieces of sequential work on graph nodes. To make the user code high-level, the framework provides a unified interface to executing methods on local and non-local graph nodes and an abstraction of exclusive execution. The graph computations are managed by logical objects called synchronizers, which we used, for example, to implement distributed divide-and-conquer decomposition into strongly connected components. The code written in HipG is independent of a particular graph representation, to the point that the graph can be created on the fly, i.e. by the algorithm that computes on this graph, which we used to implement a distributed model checker. HipG programs are in general short and elegant; they achieve good portability, memory utilization, and performance.
PRO: A Model for the Design and Analysis of Efficient and Scalable Parallel Algorithms
Abstract

Cited by 11 (1 self)
We present a new parallel computation model that enables the design of resource-optimal and scalable parallel algorithms and simplifies their analysis. The model rests on the following novel ideas: it incorporates optimality relative to a specific sequential algorithm as an integral part, and it measures the quality of a parallel algorithm in terms of granularity. Inspired by the BSP model, an algorithm in the PRO model is organized as a sequence of supersteps. The supersteps are not, however, required to be separated by synchronization barriers.
The STAPL pArray
In Proc. of the 2007 Workshop on Memory Performance, 2007
Abstract

Cited by 10 (8 self)
The Standard Template Adaptive Parallel Library (stapl) is a parallel programming framework that extends C++ and STL with support for parallelism. stapl provides parallel data structures (pContainers) and generic parallel algorithms (pAlgorithms), and a methodology for extending them to provide customized functionality. stapl pContainers are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. They provide views as a generic means to access data that can be passed as input to generic pAlgorithms. In this work, we present the stapl pArray, the parallel equivalent of the sequential STL valarray, a fixed-size data structure optimized for storing and accessing data based on one-dimensional indices. We describe the pArray design and show how it can support a variety of underlying data distribution policies currently available in stapl, such as blocked or block-cyclic. We provide experimental results showing that pAlgorithms using the pArray scale well to more than 2,000 processors. We also provide results using different data distributions that illustrate that the performance of pAlgorithms and pArray methods is usually sensitive to the underlying data distribution, and moreover, that there is no one data distribution that performs best for all pAlgorithms, processor counts, or machines.
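The two distribution policies named in the abstract differ only in how a one-dimensional index maps to an owning processor. A small sketch of the two standard mappings (function names ours, not STAPL's API):

```python
def owner_blocked(i, n, p):
    """Blocked distribution: n elements split into p contiguous chunks of
    ceil(n / p) elements each; element i belongs to the chunk containing it."""
    block = -(-n // p)            # ceiling division without math.ceil
    return i // block

def owner_block_cyclic(i, block, p):
    """Block-cyclic distribution: fixed-size blocks dealt to the p
    processors round-robin, like dealing cards."""
    return (i // block) % p
```

Blocked placement favors algorithms with contiguous access (one processor owns a whole range), while block-cyclic spreads hot regions across processors, which is one reason no single distribution wins for all pAlgorithms.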
A High-Level Framework for Distributed Processing of Large-Scale Graphs
Abstract

Cited by 9 (2 self)
Abstract. Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HIPG, a distributed framework that facilitates high-level programming of parallel graph algorithms by expressing them as a hierarchy of distributed computations executed independently and managed by the user. HIPG programs are in general short and elegant; they achieve good portability, memory utilization and performance.
Associative parallel containers in STAPL
In Int. Workshop on Languages and Compilers for Parallel Computing, in LNCS, 2008
Abstract

Cited by 8 (5 self)
Abstract. The Standard Template Adaptive Parallel Library (stapl) is a parallel programming framework that extends C++ and STL with support for parallelism. stapl provides a collection of parallel data structures (pContainers) and algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. stapl pContainers are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. They also provide appropriate interfaces that can be used by generic pAlgorithms. In this work, we present the design and implementation of the stapl associative pContainers: pMap, pSet, pMultiMap, pMultiSet, pHashMap, and pHashSet. These containers provide optimal insert, search, and delete operations for a distributed collection of elements based on keys. Their methods include counterparts of the methods provided by the STL associative containers, and also some asynchronous (non-blocking) variants that can provide improved performance in parallel. We evaluate the performance of the stapl associative pContainers on an IBM Power5 cluster, an IBM Power3 cluster, and on a Linux-based Opteron cluster, and show that the new pContainer asynchronous methods, generic pAlgorithms (e.g., pfind) and a sort application based on associative pContainers all provide good scalability on more than 10^3 processors.
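The idea behind the asynchronous variants can be sketched in miniature: an insert addressed to a remote shard is buffered and returns immediately, with the buffered operations applied later at a synchronization point, rather than blocking the caller for a remote round-trip. The class and method names below are ours, not STAPL's API.

```python
class AsyncMap:
    """Toy key-partitioned map with a fire-and-forget insert."""
    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.shards = [dict() for _ in range(num_nodes)]
        self.pending = []                     # buffered (not yet applied) inserts

    def owner(self, key):
        # Deterministic toy placement of keys onto shards.
        return sum(map(ord, str(key))) % self.num_nodes

    def insert_async(self, key, value):
        self.pending.append((key, value))     # returns without waiting

    def flush(self):
        # Applied at a synchronization point, e.g. before a query phase.
        for key, value in self.pending:
            self.shards[self.owner(key)][key] = value
        self.pending.clear()

    def find(self, key):
        return self.shards[self.owner(key)].get(key)
```

The caller trades immediate visibility for latency hiding: many inserts overlap instead of each paying a blocking round-trip, which mirrors why the non-blocking pContainer methods scale better in parallel.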
Single-Source Shortest Paths with the Parallel Boost Graph Library
Abstract

Cited by 6 (2 self)
The Parallel Boost Graph Library (Parallel BGL) is a library of graph algorithms and data structures for distributed-memory computation on large graphs. Developed with the Generic Programming paradigm, the Parallel BGL is highly customizable, supporting various graph data structures, arbitrary vertex and edge properties, and different communication media. In this paper, we describe the implementation of two parallel variants of Dijkstra’s single-source shortest paths algorithm in the Parallel BGL. We also provide an experimental evaluation of these implementations using synthetic and real-world benchmark graphs from the 9th DIMACS Implementation Challenge.
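For reference, the sequential algorithm whose parallel variants the paper implements is classic Dijkstra with a priority queue; the parallel versions in the Parallel BGL relax this strict priority order to expose concurrency. A compact sequential baseline (our code, not the library's):

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths with nonnegative edge weights.
    adj: {v: [(neighbor, weight), ...]}. Returns {v: distance}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                      # stale entry, vertex already settled
        for u, w in adj.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist
```

Settling strictly one minimum-distance vertex at a time is inherently sequential; the parallel variants settle batches of vertices whose tentative distances are provably or speculatively safe, trading some wasted re-relaxation for parallelism.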