Results 1–10 of 13
Pregel: A system for large-scale graph processing
In SIGMOD, 2010
"... Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs—in some cases billions of vertices, trillions of edges—poses challenges to their efficient processing. In this paper we present a computational model ..."
Abstract

Cited by 472 (0 self)
 Add to MetaCart
(Show Context)
Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs—in some cases billions of vertices, trillions of edges—poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
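The superstep model the abstract describes is easy to see in miniature. The sketch below is a single-machine illustration, not Pregel's actual C++ API: a synchronous loop delivers the previous round's messages, lets each active vertex compute and send, and stops when every vertex has voted to halt and no messages remain. The Vertex class, the sssp_compute program, and the unit edge weights are all illustrative choices, not taken from the paper.

```python
# Minimal single-machine sketch of a Pregel-style vertex-centric program.
# All names here are illustrative; Pregel's real API is a C++ Vertex class.

class Vertex:
    def __init__(self, vid, out_edges):
        self.id = vid
        self.out_edges = out_edges   # ids of destination vertices
        self.value = float("inf")    # per-vertex state (a distance, here)
        self.active = True

def superstep(vertices, inbox, compute):
    """One synchronous iteration: each active vertex sees the messages sent
    to it in the previous superstep and emits messages for the next one."""
    outbox = {}
    for v in vertices.values():
        msgs = inbox.get(v.id, [])
        if msgs:
            v.active = True          # an incoming message reactivates a vertex
        if v.active:
            compute(v, msgs, outbox)
    return outbox

def sssp_compute(v, msgs, outbox):
    """Example vertex program: single-source shortest paths, unit weights."""
    best = min(msgs, default=float("inf"))
    if best < v.value:
        v.value = best
        for dst in v.out_edges:
            outbox.setdefault(dst, []).append(v.value + 1)
    v.active = False                 # vote to halt; a message wakes it again

def run(vertices, source):
    """vertices: dict of id -> Vertex. Halts when no vertex is active and
    no messages are in flight, mirroring Pregel's termination condition."""
    inbox = {source: [0.0]}
    while any(inbox.values()) or any(v.active for v in vertices.values()):
        inbox = superstep(vertices, inbox, sssp_compute)
```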
A GPU Implementation of Inclusion-based Points-to Analysis
"... Graphics Processing Units (GPUs) have emerged as powerful accelerators for many regular algorithms that operate on dense arrays and matrices. In contrast, we know relatively little about using GPUs to accelerate highly irregular algorithms that operate on pointerbased data structures such as graphs ..."
Abstract

Cited by 21 (5 self)
 Add to MetaCart
(Show Context)
Graphics Processing Units (GPUs) have emerged as powerful accelerators for many regular algorithms that operate on dense arrays and matrices. In contrast, we know relatively little about using GPUs to accelerate highly irregular algorithms that operate on pointer-based data structures such as graphs. For the most part, research has focused on GPU implementations of graph analysis algorithms that do not modify the structure of the graph, such as algorithms for breadth-first search and strongly connected components. In this paper, we describe a high-performance GPU implementation of an important graph algorithm used in compilers such as gcc and LLVM: Andersen-style inclusion-based points-to analysis. This algorithm is challenging to parallelize effectively on GPUs because it makes extensive modifications to the structure of the underlying graph and performs relatively little computation. In spite of this, our program, when executed on a 14 Streaming Multiprocessor GPU, achieves an average speedup of 7x compared to a sequential CPU implementation and outperforms a parallel implementation of the same algorithm running on 16 CPU cores. Our implementation provides general insights into how to produce high-performance GPU implementations of graph algorithms, and it highlights key differences between optimizing parallel programs for multicore CPUs and for GPUs.
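For readers unfamiliar with the algorithm being accelerated, the sketch below is the sequential textbook formulation of Andersen-style inclusion-based points-to analysis as a worklist over a constraint graph; it is not the paper's GPU code, and the constraint encoding (addr_of, copies, loads, stores) is our own naming. Note how loads and stores insert new copy edges as points-to sets grow: that graph mutation is exactly what the abstract identifies as hard to parallelize.

```python
# Sequential worklist formulation of Andersen-style points-to analysis.
from collections import defaultdict

def andersen(addr_of, copies, loads, stores, variables):
    """addr_of: list of (p, a) for 'p = &a'; copies: (p, q) for 'p = q';
    loads: (p, q) for 'p = *q'; stores: (p, q) for '*p = q'."""
    pts = defaultdict(set)    # points-to set per variable
    succ = defaultdict(set)   # copy edges: x -> y means pts(x) must flow to y

    for p, a in addr_of:
        pts[p].add(a)
    for p, q in copies:
        succ[q].add(p)

    work = list(variables)
    while work:
        x = work.pop()
        # Loads/stores add new copy edges as pts(x) grows: this is the
        # structural mutation of the graph that makes the algorithm irregular.
        for p, q in loads:          # p = *q: edge a -> p for each a in pts(q)
            if q == x:
                for a in pts[x]:
                    if p not in succ[a]:
                        succ[a].add(p)
                        work.append(a)
        for p, q in stores:         # *p = q: edge q -> a for each a in pts(p)
            if p == x:
                for a in pts[x]:
                    if a not in succ[q]:
                        succ[q].add(a)
                        work.append(q)
        # Propagate points-to sets along copy edges.
        for y in succ[x]:
            if not pts[x] <= pts[y]:
                pts[y] |= pts[x]
                work.append(y)
    return pts
```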
Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms
In PPoPP, 2011
"... Outside of computational science, most problems are formulated in terms of irregular data structures such as graphs, trees and sets. Unfortunately, we understand relatively little about the structure of parallelism and locality in irregular algorithms. In this paper, we study several algorithms for ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
(Show Context)
Outside of computational science, most problems are formulated in terms of irregular data structures such as graphs, trees and sets. Unfortunately, we understand relatively little about the structure of parallelism and locality in irregular algorithms. In this paper, we study several algorithms for four such problems: discrete-event simulation, single-source shortest path, breadth-first search, and minimal spanning trees. We show that these algorithms can be classified into two categories that we call unordered and ordered, and demonstrate experimentally that there is a tradeoff between parallelism and work efficiency: unordered algorithms usually have more parallelism than their ordered counterparts for the same problem, but they may also perform more work. Nevertheless, our experimental results show that unordered algorithms typically lead to more scalable implementations, demonstrating that less work-efficient irregular algorithms may be better for parallel execution.
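The ordered/unordered distinction is concrete for single-source shortest paths, one of the four problems studied. Below is a minimal sketch, with the caveat that the paper's implementations are parallel and these are sequential stand-ins: a Dijkstra-style algorithm is the ordered variant (each node is settled once, so it is work-efficient but exposes little parallelism), while a Bellman-Ford-style worklist is the unordered variant (any processing order is correct, so items could run concurrently, at the cost of re-processing some nodes).

```python
# Ordered vs. unordered single-source shortest paths.
import heapq
from collections import deque

def sssp_ordered(graph, src):
    """graph: {u: [(v, w), ...]}. Priority order settles each node once."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                 # stale queue entry; skip
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def sssp_unordered(graph, src):
    """Worklist with no global order: correct under any schedule, which is
    what makes it parallelizable, but nodes may be relaxed repeatedly."""
    dist = {src: 0}
    work = deque([src])
    while work:
        u = work.popleft()           # order is arbitrary; FIFO is one choice
        for v, w in graph[u]:
            if dist[u] + w < dist.get(v, float("inf")):
                dist[v] = dist[u] + w
                work.append(v)
    return dist
```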
Distributed Memory Breadth-First Search Revisited: Enabling Bottom-Up Search
"... Abstract—Breadthfirst search (BFS) is a fundamental graph primitive frequently used as a building block for many complex graph algorithms. In the worst case, the complexity of BFS is linear in the number of edges and vertices, and the conventional topdown approach always takes as much time as the ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
(Show Context)
Breadth-first search (BFS) is a fundamental graph primitive frequently used as a building block for many complex graph algorithms. In the worst case, the complexity of BFS is linear in the number of edges and vertices, and the conventional top-down approach always takes as much time as the worst case. A recently discovered bottom-up approach manages to cut down the complexity all the way to the number of vertices in the best case, which is typically at least an order of magnitude less than the number of edges. The bottom-up approach is not always advantageous, so it is combined with the top-down approach to make the direction-optimizing algorithm, which adaptively switches from top-down to bottom-up as the frontier expands. We present a scalable distributed-memory parallelization of this challenging algorithm and show up to an order of magnitude speedups compared to an earlier purely top-down code. Our approach also uses a 2D decomposition of the graph that has previously been shown to be superior to a 1D decomposition. Using the default parameters of the Graph500 benchmark, our new algorithm achieves a performance rate of over 240 billion edges per second on 115 thousand cores of a Cray XE6, which makes it over 7× faster than a conventional top-down algorithm using the same set of optimizations and data distribution.
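The direction-optimizing idea is simple to state in code. The following is a sequential, shared-memory sketch of the switching logic only, not the paper's distributed 2D-decomposed implementation; the fixed switching threshold alpha is a placeholder for the tuned heuristics the actual algorithm uses.

```python
# Sketch of direction-optimizing BFS: top-down while the frontier is small,
# bottom-up once it grows large.

def bfs_direction_optimizing(adj, src, alpha=0.05):
    """adj: list of neighbor lists for vertices 0..n-1 (undirected graph
    assumed, so a vertex's neighbors double as its potential parents)."""
    n = len(adj)
    parent = [-1] * n
    parent[src] = src
    frontier = {src}
    while frontier:
        nxt = set()
        if len(frontier) < alpha * n:
            # Top-down step: expand every edge out of the frontier.
            for u in frontier:
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        nxt.add(v)
        else:
            # Bottom-up step: each unvisited vertex scans for any neighbor
            # in the frontier and stops at the first hit, skipping most of
            # its edges when the frontier is large.
            for v in range(n):
                if parent[v] == -1:
                    for u in adj[v]:
                        if u in frontier:
                            parent[v] = u
                            nxt.add(v)
                            break
        frontier = nxt
    return parent
```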
Crunching Large Graphs with Commodity Processors
2011
"... Crunching large graphs is the basis of many emerging applications, such as social network analysis and bioinformatics. Graph analytics algorithms exhibit little locality and therefore present significant performance challenges. Hardware multithreading systems (e.g., Cray XMT) show that with enough c ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
(Show Context)
Crunching large graphs is the basis of many emerging applications, such as social network analysis and bioinformatics. Graph analytics algorithms exhibit little locality and therefore present significant performance challenges. Hardware multithreading systems (e.g., Cray XMT) show that with enough concurrency, we can tolerate long latencies. Unfortunately, this solution is not available with commodity parts. Our goal is to develop a latency-tolerant system built out of commodity parts and mostly in software. The proposed system includes a runtime that supports a large number of lightweight contexts, full-bit synchronization and a memory manager that provides a high-latency but high-bandwidth global shared memory. This paper lays out the vision for our system and justifies its feasibility with a performance analysis of the runtime for latency tolerance.
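As an illustration of the latency-tolerance idea only (the abstract does not describe the runtime's implementation), the sketch below uses Python generators as stand-ins for lightweight contexts: when a context issues a long-latency remote read it suspends, and the scheduler runs another ready context instead of stalling, so with enough contexts the processor stays busy. The tick-based latency model and all names are our own assumptions.

```python
# Cooperative scheduling sketch: overlap many in-flight remote reads.
import collections
import heapq

def worker(requests):
    """A lightweight context: each remote read suspends at the yield until
    the scheduler delivers the value."""
    total = 0
    for addr in requests:
        value = yield addr                     # issue read; resume with data
        total += value
    return total

def run(contexts, memory, latency=100):
    """Instead of stalling `latency` ticks per read, switch to another ready
    context; with enough contexts the issue loop rarely idles."""
    ready = collections.deque()
    for ctx in contexts:
        try:
            ready.append((ctx, ctx.send(None)))  # prime; get first request
        except StopIteration:
            pass
    pending, clock, seq = [], 0, 0               # pending: (done_time, seq, ctx, addr)
    while ready or pending:
        if ready:
            ctx, addr = ready.popleft()
            heapq.heappush(pending, (clock + latency, seq, ctx, addr))
            seq += 1
            clock += 1                           # issuing a request costs a tick
        else:
            clock = pending[0][0]                # nothing ready: wait for a reply
        while pending and pending[0][0] <= clock:
            _, _, ctx, addr = heapq.heappop(pending)
            try:
                ready.append((ctx, ctx.send(memory[addr])))
            except StopIteration:
                pass                             # context finished its reads
    return clock
```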
Mitigating I/O latency in SSD-based Graph Traversal
2012
"... Mining large graphs has now become an important aspect of many applications. Recent interest in low cost graph traversal on single machines has lead to the construction of systems that use solid state drives (SSDs) to store the graph. An SSD can be accessed with far lower latency than magnetic media ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Mining large graphs has now become an important aspect of many applications. Recent interest in low-cost graph traversal on single machines has led to the construction of systems that use solid state drives (SSDs) to store the graph. An SSD can be accessed with far lower latency than magnetic media, while remaining cheaper than main memory. Unfortunately, SSDs are slower than main memory, and algorithms running on such systems are hampered by large I/O latencies when accessing the SSD. In this paper we present two novel techniques to reduce the impact of SSD I/O latency on semi-external memory graph traversal. We introduce a variant of the Compressed Sparse Row (CSR) format that we call Compressed Enumerated Encoded Sparse Offset Row (CEESOR). CEESOR is particularly efficient for graphs with hierarchical structure and can reduce the space required to represent connectivity information by amounts varying from 5% to as much as 76%. CEESOR allows a larger number of edges to be moved for each unit of I/O transfer from the SSD to main memory and more effective use of operating system caches. Our second contribution is a runtime prefetching technique that exploits the ability of solid state drives to service multiple random access requests in parallel. We present a novel Run Along SSD Prefetcher (RASP). RASP is capable of hiding the effect of I/O latency in single-threaded graph traversal in breadth-first and shortest-path order, to the extent that it improves iteration time for large graphs by amounts varying from 2.6X to 6X.
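The abstract does not spell out CEESOR's encoding, so the sketch below shows only the standard Compressed Sparse Row (CSR) baseline that it compresses further: an offsets array indexed by vertex plus a packed array of neighbor ids. Even at this baseline the leverage is visible: fewer bytes per edge means more edges moved per unit of SSD I/O transfer.

```python
# Standard CSR construction (the baseline CEESOR builds on, not CEESOR itself).

def build_csr(n, edges):
    """edges: a list of (src, dst) pairs over vertices 0..n-1 (iterated twice)."""
    offsets = [0] * (n + 1)
    for u, _ in edges:
        offsets[u + 1] += 1
    for i in range(n):
        offsets[i + 1] += offsets[i]      # prefix sum: start of each row
    neighbors = [0] * offsets[n]
    cursor = list(offsets[:-1])           # next free slot in each row
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
    return offsets, neighbors

def out_neighbors(offsets, neighbors, u):
    """The out-edges of u occupy one contiguous slice: a single sequential read."""
    return neighbors[offsets[u]:offsets[u + 1]]
```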
Nonparametric generalized belief propagation based on pseudo-junction tree for cooperative localization in wireless networks
Open Access
Declaration
2014
Abstract
I, Ilias Giechaskiel, of Magdalene College, being a candidate for the M.Phil. in Advanced Computer Science, hereby declare that this report and the work described in it are my own work, unaided except as may be specified below, and that the report does not contain material that has already been used to any substantial extent for a comparable purpose. Total word count: 14,311 (excluding Appendices A and B)