Results 21–30 of 526
Direction-Optimizing Breadth-First Search
Abstract

Cited by 34 (4 self)
Abstract—Breadth-First Search is an important kernel used by many graph-processing applications. In many of these emerging applications of BFS, such as analyzing social networks, the input graphs are low-diameter and scale-free. We propose a hybrid approach that is advantageous for low-diameter graphs, which combines a conventional top-down algorithm along with a novel bottom-up algorithm. The bottom-up algorithm can dramatically reduce the number of edges examined, which in turn accelerates the search as a whole. On a multi-socket server, our hybrid approach demonstrates speedups of 3.3–7.8 on a range of standard synthetic graphs and speedups of 2.4–4.6 on graphs from real social networks when compared to a strong baseline. We also typically double the performance of prior leading shared-memory (multicore and GPU) implementations.
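The hybrid idea can be illustrated with the simplified sketch below: each level runs either a top-down step (expand the frontier's out-edges) or a bottom-up step (each unvisited vertex looks for a parent in the frontier). The adjacency-list representation, the switching threshold `alpha`, and the undirected-graph assumption are ours; this is not the paper's tuned implementation.

```python
def hybrid_bfs(adj, source, alpha=0.25):
    """Level-synchronous BFS over an undirected graph given as an
    adjacency list, switching between a top-down and a bottom-up step
    based on frontier size (a crude stand-in for the real heuristic)."""
    n = len(adj)
    parent = {source: source}
    frontier = {source}
    while frontier:
        nxt = set()
        if len(frontier) < alpha * n:
            # Top-down: relax every out-edge of every frontier vertex.
            for u in frontier:
                for v in adj[u]:
                    if v not in parent:
                        parent[v] = u
                        nxt.add(v)
        else:
            # Bottom-up: each unvisited vertex scans its neighbors for a
            # frontier parent and stops at the first hit, which examines
            # far fewer edges when the frontier is large.
            for v in range(n):
                if v in parent:
                    continue
                for u in adj[v]:
                    if u in frontier:
                        parent[v] = u
                        nxt.add(v)
                        break
        frontier = nxt
    return parent
```

The break in the bottom-up loop is where the edge savings come from: a vertex stops examining edges as soon as it finds any parent.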
A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers)
 In SPAA ’10: Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, 2010
Abstract

Cited by 34 (4 self)
We have developed a multithreaded implementation of breadth-first search (BFS) of a sparse graph using the Cilk++ extensions to C++. Our PBFS program on a single processor runs as quickly as a standard C++ breadth-first search implementation. PBFS achieves high work-efficiency by using a novel implementation of a multiset data structure, called a “bag,” in place of the FIFO queue usually employed in serial breadth-first search algorithms. For a variety of benchmark input graphs whose diameters are significantly smaller than the number of vertices — a condition met by many real-world graphs — PBFS demonstrates good speedup with the number of processing cores. Since PBFS employs a nonconstant-time “reducer” — a “hyperobject” feature of Cilk++ — the work inherent in a PBFS execution depends nondeterministically on how the underlying work-stealing scheduler load-balances the computation. We provide a general method for analyzing nondeterministic programs that use reducers. PBFS also is nondeterministic in that it contains benign races which affect its performance but not its correctness. Fixing these races with mutual-exclusion locks slows down PBFS empirically, but it makes the algorithm amenable to analysis. In particular, we show that for a graph G = (V, E) with diameter D and bounded out-degree, this data-race-free version of the PBFS algorithm runs in time O((V + E)/P + D lg³(V/D)) on P processors, which means that it attains near-perfect linear speedup if P ≪ (V + E)/(D lg³(V/D)).
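The “bag” can be sketched with the pennant representation described in the paper: a pennant holds 2^k elements (a root plus a complete binary tree), two equal-size pennants merge in O(1), and a bag keeps at most one pennant per size, so insertion behaves like incrementing a binary counter. The serial toy below is our reconstruction and omits all of the Cilk++ reducer machinery.

```python
class Pennant:
    """A pennant of 2^k elements: a root plus a complete binary tree
    of 2^k - 1 nodes hanging off its left child."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def union(self, other):
        """Merge two pennants of equal size in O(1)."""
        other.right = self.left
        self.left = other
        return self

class Bag:
    """Backbone slot k holds a pennant of exactly 2^k elements, or None."""
    def __init__(self):
        self.backbone = []

    def insert(self, value):
        p, k = Pennant(value), 0
        # Carry equal-size pennants upward, like a binary increment.
        while k < len(self.backbone) and self.backbone[k] is not None:
            p = self.backbone[k].union(p)
            self.backbone[k] = None
            k += 1
        if k == len(self.backbone):
            self.backbone.append(p)
        else:
            self.backbone[k] = p

    def __iter__(self):
        def walk(node):
            if node is not None:
                yield node.value
                yield from walk(node.left)
                yield from walk(node.right)
        for p in self.backbone:
            yield from walk(p)
```

Because the occupied backbone slots mirror the binary representation of the element count, n inserts touch O(lg n) slots in the worst case but O(1) amortized.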
GMap: Visualizing Graphs and Clusters as Maps
, 2009
Abstract

Cited by 34 (22 self)
Information visualization is essential in making sense out of large data sets. Often, high-dimensional data are visualized as a collection of points in 2-dimensional space through dimensionality reduction techniques. However, these traditional methods often do not capture well the underlying structural information, clustering, and neighborhoods. In this paper, we describe GMap, a practical tool for visualizing relational data with geographic-like maps. We illustrate the effectiveness of this approach with examples from several domains. All the maps referenced in this paper can be found in www.research.att.com/˜yifanhu/GMap.
A fine-grain hypergraph model for 2D decomposition of sparse matrices
 In Proceedings of the 15th International Parallel and Distributed Processing Symposium (IPDPS), 2001, p. 118
Abstract

Cited by 33 (9 self)
We propose a new hypergraph model for the decomposition of irregular computational domains. This work focuses on the decomposition of sparse matrices for parallel matrix-vector multiplication. However, the proposed model can also be used to decompose computational domains of other parallel reduction problems. We propose a “fine-grain” hypergraph model for two-dimensional decomposition of sparse matrices. In the proposed fine-grain hypergraph model, vertices represent nonzeros and hyperedges represent sparsity patterns of rows and columns of the matrix. By partitioning the fine-grain hypergraph into equally weighted vertex parts (processors) so that hyperedges are split among as few processors as possible, the model correctly minimizes communication volume while maintaining computational-load balance. Experimental results on a wide range of realistic sparse matrices confirm the validity of the proposed model, achieving up to 50 percent better decompositions than the existing models in terms of total communication volume.
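Building the fine-grain hypergraph itself is mechanical and can be sketched as below. The coordinate-list input and the net naming are our assumptions; the actual partitioning of the resulting hypergraph would be done by a hypergraph partitioner, which is not shown.

```python
from collections import defaultdict

def fine_grain_hypergraph(nonzeros):
    """One vertex per nonzero a_ij, one net (hyperedge) per row and per
    column; each vertex belongs to exactly its row net and its column net.
    `nonzeros` is a list of (i, j) coordinates of the sparse matrix."""
    nets = defaultdict(list)
    for v, (i, j) in enumerate(nonzeros):
        nets[("row", i)].append(v)   # vertex v joins row i's net
        nets[("col", j)].append(v)   # and column j's net
    return dict(nets)
```

Splitting a row net across p processors then corresponds to p − 1 units of communication for that row's partial sums, which is why minimizing net cut minimizes communication volume.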
Multilevel preconditioners constructed from inverse-based ILUs
, 2004
Abstract

Cited by 32 (9 self)
This paper analyzes dropping strategies in a multilevel incomplete LU decomposition context and presents a few strategies for obtaining related ILUs with enhanced robustness. The analysis shows that the incomplete LU factorization resulting from dropping small entries in Gaussian elimination produces a good preconditioner when the inverses of these factors have norms that are not too large. As a consequence, a few strategies are developed whose goal is to achieve this feature. A number of “templates” for enabling implementations of these factorizations are presented. Numerical experiments show that the resulting ILUs offer a good compromise between robustness and efficiency.
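A toy version of threshold-based dropping looks like the following. It is dense, unpivoted, and uses a simple row-norm drop rule of our own choosing rather than the paper's inverse-based criterion.

```python
import numpy as np

def ilu_threshold(A, tol=0.0):
    """Toy dense incomplete LU: run Gaussian elimination, zeroing fill-in
    entries smaller than tol times the working row's 1-norm.
    tol = 0 recovers the exact LU factorization (no dropping)."""
    n = A.shape[0]
    LU = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            LU[i, k] /= LU[k, k]                      # multiplier (L part)
            LU[i, k + 1:] -= LU[i, k] * LU[k, k + 1:] # eliminate row i
            small = np.abs(LU[i, k + 1:]) < tol * np.abs(LU[i]).sum()
            LU[i, k + 1:][small] = 0.0                # drop small fill-in
    L = np.tril(LU, -1) + np.eye(n)
    U = np.triu(LU)
    return L, U
```

With tol > 0, small fill-in is discarded; the paper's point is that such dropping is safe precisely when the norms of the inverse factors stay modest.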
Engineering a Scalable High Quality Graph Partitioner
 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS)
, 2010
Abstract

Cited by 29 (17 self)
We describe an approach to parallel graph partitioning that scales to hundreds of processors and produces high solution quality. For example, for many instances from Walshaw’s benchmark collection we improve the best known partitioning. We use the well-known framework of multilevel graph partitioning. All components are implemented by scalable parallel algorithms. Quality improvements compared to previous systems are due to better prioritization of edges to be contracted, better approximation algorithms for identifying matchings, better local search heuristics, and perhaps most notably, a parallelization of the FM local search algorithm that works more locally than previous approaches.
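For reference, a serial and highly simplified pass of the FM local search that such systems parallelize might look like this. The 2-way partition, unit vertex weights, and crude balance handling are all our simplifications.

```python
def cut_size(adj, part):
    """Number of edges crossing the 2-way partition (each counted once)."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])

def fm_pass(adj, part):
    """One simplified Fiduccia-Mattheyses pass: repeatedly move the
    best-gain unlocked vertex off the larger side, lock it, and return
    the best partition observed during the sequence of moves."""
    part, locked = dict(part), set()
    best, best_cut = dict(part), cut_size(adj, part)
    for _ in range(len(adj)):
        sizes = [sum(1 for v in part if part[v] == s) for s in (0, 1)]
        src = 0 if sizes[0] >= sizes[1] else 1   # crude balance control
        cand = [v for v in adj if v not in locked and
                (part[v] == src or sizes[0] == sizes[1])]
        if not cand:
            break
        def gain(v):                 # cut edges removed minus created
            ext = sum(1 for u in adj[v] if part[u] != part[v])
            return 2 * ext - len(adj[v])
        v = max(cand, key=gain)
        part[v] ^= 1                 # move v to the other side
        locked.add(v)
        c = cut_size(adj, part)
        if c < best_cut:
            best_cut, best = c, dict(part)
    return best, best_cut
```

Allowing negative-gain moves and keeping only the best prefix of the move sequence is what lets FM climb out of local minima that pure greedy swapping cannot escape.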
Engineering Multilevel Graph Partitioning Algorithms
Abstract

Cited by 28 (14 self)
We present a multilevel graph partitioning algorithm using novel local improvement algorithms and global search strategies transferred from multigrid linear solvers. Local improvement algorithms are based on max-flow min-cut computations and more localized FM searches. By combining these techniques, we obtain an algorithm that is fast on the one hand and on the other hand is able to improve the best known partitioning results for many inputs. For example, in Walshaw’s well-known benchmark tables we achieve 317 improvements for the tables at 1%, 3%, and 5% imbalance. Moreover, in 118 out of the 295 remaining cases we have been able to reproduce the best cut in this benchmark.
COMPUTING THE ACTION OF THE MATRIX EXPONENTIAL, WITH AN APPLICATION TO EXPONENTIAL INTEGRATORS
, 2010
Abstract

Cited by 28 (9 self)
A new algorithm is developed for computing e^{tA}B, where A is an n × n matrix and B is n × n₀ with n₀ ≪ n. The algorithm works for any A, its computational cost is dominated by the formation of products of A with n × n₀ matrices, and the only input parameter is a backward error tolerance. The algorithm can return a single matrix e^{tA}B or a sequence e^{t_k A}B on an equally spaced grid of points t_k. It uses the scaling part of the scaling and squaring method together with a truncated Taylor series approximation to the exponential. It determines the amount of scaling and the Taylor degree using the recent analysis of Al-Mohy and Higham [SIAM J. Matrix Anal. Appl. 31 (2009), pp. 970–989], which provides sharp truncation error bounds expressed in terms of the quantities ‖A^k‖_1^{1/k} for a few values of k, where the norms are estimated using a matrix norm estimator. Shifting and balancing are used as preprocessing steps to reduce the cost of the algorithm. Numerical experiments show that the algorithm performs in a numerically stable fashion across a wide range of problems, and analysis of rounding errors and of the conditioning of the problem provides theoretical support. Experimental comparisons with two Krylov-based MATLAB codes show the new algorithm to be sometimes much superior in terms of computational cost and accuracy. An important application of the algorithm is to exponential integrators for ordinary differential equations. It is shown that the sums of the form ∑_{k=0}^{p} ϕ_k(A)u_k that arise in exponential integrators, where the ϕ_k are related to the exponential function, can be expressed in terms of a single exponential of a matrix of dimension n + p built by augmenting A with additional rows and columns, and the algorithm of this paper can therefore be employed.
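Stripped of the degree and scaling selection, which is the heart of the paper, the basic scheme (apply a truncated Taylor polynomial of e^{(t/s)A} to B, s times) can be sketched as follows. The fixed s and m here are arbitrary choices of ours, not the backward-error-driven values the algorithm computes.

```python
import numpy as np

def expm_action(A, B, t=1.0, s=8, m=12):
    """Approximate e^{tA} B by applying the degree-m Taylor polynomial
    of e^{(t/s)A} to B, repeated s times. Only matrix-vector-block
    products with A are formed; A itself is never exponentiated."""
    F = B.astype(float)
    for _ in range(s):
        term = F
        out = F.copy()
        for k in range(1, m + 1):
            term = (t / s) * (A @ term) / k   # next Taylor term of e^{(t/s)A} F
            out = out + term
        F = out
    return F
```

Because only products A @ X with tall-skinny X appear, the same sketch applies unchanged when A is sparse, which is the regime the paper targets.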
NEARLY OPTIMAL PRECONDITIONED METHODS FOR HERMITIAN EIGENPROBLEMS UNDER LIMITED MEMORY. PART II: SEEKING MANY EIGENVALUES
, 2006
Abstract

Cited by 27 (8 self)
In a recent companion paper, we proposed two methods, GD+k and JDQMR, as nearly optimal methods for finding one eigenpair of a real symmetric matrix. In this paper, we seek nearly optimal methods for a large number, nev, of eigenpairs that work with a search space whose size is O(1), independent of nev. The motivation is twofold: avoid the additional O(nev·N) storage and the O(nev²·N) iteration costs. First, we provide an analysis of the oblique projectors required in the Jacobi-Davidson method, and we identify ways to avoid them during the inner iterations, either completely or partially. Second, we develop a comprehensive set of performance models for GD+k, Jacobi-Davidson type methods, and ARPACK. Based both on theoretical arguments and on our models, we argue that any eigenmethod with O(1) basis size, preconditioned or not, will be superseded asymptotically by Lanczos-type methods that use O(nev) vectors in the basis. However, this may not happen until nev > O(1000). Third, we perform an extensive set of experiments with our methods and against other state-of-the-art software that validate our models, and confirm our GD+k and JDQMR methods as nearly optimal within the class of O(1) basis size methods.
On Approximate-Inverse Preconditioners
, 1995
Abstract

Cited by 27 (0 self)
We investigate the use of sparse approximate-inverse preconditioners for the iterative solution of unsymmetric linear systems of equations. Such methods are of particular interest because of the considerable scope for parallelization. We propose a number of enhancements which may improve their performance. When run in a sequential environment, these methods can perform unfavourably when compared with other techniques. However, they can be successful when other methods fail, and simulations indicate that they can be competitive when considered in a parallel environment. Current reports available by anonymous ftp from joyousgard.cc.rl.ac.uk (internet 130.246.9.91) in the directory "pub/reports". Computing and Information Systems Department, Atlas Centre, Rutherford Appleton Laboratory, Oxfordshire OX11 0QX, England. June 23, 1995. 1 Introduction: Suppose that A is a real n by n unsymmetric matrix, whose columns are a_j, 1 ≤ j ≤ n. We are principally concerned wit...
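One standard construction of such a preconditioner (we do not claim it matches the enhancements proposed here) minimizes ‖AM − I‖_F one column at a time over a prescribed sparsity pattern, since the Frobenius objective decouples into n independent small least-squares problems, which is exactly where the scope for parallelization comes from.

```python
import numpy as np

def spai_columns(A, patterns):
    """Toy sparse approximate inverse: for each column j, solve the small
    least-squares problem min ||A[:, S] m - e_j||_2 over the prescribed
    sparsity set S = patterns[j]. This minimizes ||AM - I||_F columnwise;
    the columns are independent and could be computed in parallel."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for j, S in enumerate(patterns):
        e = np.zeros(n)
        e[j] = 1.0
        sub = A[:, S]                              # tall-skinny subproblem
        m, *_ = np.linalg.lstsq(sub, e, rcond=None)
        M[S, j] = m                                # scatter into column j
    return M
```

A dense toy, of course: a practical code would keep A sparse and extract only the rows touched by each pattern S before solving.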