Partitioning sparse matrices for parallel preconditioned iterative methods
 SIAM Journal on Scientific Computing
, 2004
Cited by 14 (9 self)
Abstract. This paper addresses the parallelization of the preconditioned iterative methods that use explicit preconditioners such as approximate inverses. Parallelizing a full step of these methods requires the coefficient and preconditioner matrices to be well partitioned. We first show that different methods impose different partitioning requirements for the matrices. Then we develop hypergraph models to meet those requirements. In particular, we develop models that enable us to obtain partitionings on the coefficient and preconditioner matrices simultaneously. Experiments on a set of unsymmetric sparse matrices show that the proposed models yield effective partitioning results. A parallel implementation of the right preconditioned BiCGStab method on a PC cluster verifies that the theoretical gains obtained by the models hold in practice.
HYPERGRAPH-BASED UNSYMMETRIC NESTED DISSECTION ORDERING FOR SPARSE LU FACTORIZATION
Cited by 13 (2 self)
Abstract. In this paper we present HUND, a hypergraph-based unsymmetric nested dissection ordering algorithm for reducing the fill-in incurred during Gaussian elimination. HUND has several important properties. It takes a global perspective of the entire matrix, as opposed to local heuristics. It takes into account the asymmetry of the input matrix by using a hypergraph to represent its structure. It is suitable for performing Gaussian elimination in parallel, with partial pivoting. This is possible because the row permutations performed due to partial pivoting do not destroy the column separators identified by the nested dissection approach. Experimental results on 27 medium and large size highly unsymmetric matrices compare HUND to four other well-known reordering algorithms. The results show that HUND provides a robust reordering algorithm, in the sense that it is the best or close to the best (often within 10%) of all the other methods.
A Parallel Matrix Scaling Algorithm
Cited by 12 (3 self)
We recently proposed an iterative procedure which asymptotically scales the rows and columns of a given matrix to one in a given norm. In this work, we briefly mention some of the properties of that algorithm and discuss its efficient parallelization. We report on a parallel performance study of our implementation on a few computing environments.
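The abstract does not spell out the iteration, so the following is only a minimal serial sketch of one well-known scheme of this kind: at every step each row and each column is divided by the square root of its current infinity norm. The choice of the infinity norm, the function name `scale_to_unit_inf_norm`, and the fixed iteration count are assumptions made for illustration, not details taken from the paper. The sketch also assumes the matrix has no all-zero row or column.

```python
import math


def scale_to_unit_inf_norm(matrix, iterations=50):
    """Iteratively scale rows and columns toward unit infinity norm.

    Hypothetical helper: returns the scaled matrix together with the
    accumulated row and column scaling factors, so that
    scaled[i][j] == row_scale[i] * matrix[i][j] * col_scale[j].
    Assumes no row or column is entirely zero.
    """
    a = [list(map(float, row)) for row in matrix]
    m, n = len(a), len(a[0])
    row_scale = [1.0] * m
    col_scale = [1.0] * n
    for _ in range(iterations):
        # Square roots of the current row and column infinity norms.
        r = [math.sqrt(max(abs(x) for x in row)) for row in a]
        c = [math.sqrt(max(abs(a[i][j]) for i in range(m))) for j in range(n)]
        # Apply both scalings simultaneously and accumulate the factors.
        for i in range(m):
            row_scale[i] /= r[i]
            for j in range(n):
                a[i][j] /= r[i] * c[j]
        for j in range(n):
            col_scale[j] /= c[j]
    return a, row_scale, col_scale
```

In a parallel setting the row and column norms would be computed from locally owned nonzeros and then combined across processors; that communication step is not shown here.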
Parallel Multilevel Algorithms for Hypergraph Partitioning
, 2007
Cited by 10 (0 self)
In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe schemes for parallel coarsening, parallel greedy k-way refinement and parallel multiphase refinement. Using an asymptotic theoretical performance model, we derive the isoefficiency function for our algorithms and hence show that they are technically scalable when the maximum vertex and hyperedge degrees are small. We conduct experiments on hypergraphs from six different application domains to investigate the empirical scalability of our algorithms both in terms of runtime and partition quality. Our findings confirm that the quality of partition produced by our algorithms is stable as the number of processors is increased while being competitive with those produced by a state-of-the-art serial multilevel partitioning tool. We also validate our theoretical performance model through an isoefficiency study. Finally, we evaluate the impact of introducing parallel multiphase refinement into our parallel multilevel algorithm in terms of the trade-off between improved partition quality and higher runtime cost.
Hypergraph partitioning for faster parallel PageRank computation
 LECTURE NOTES IN COMPUTER SCIENCE 3670
, 2005
Cited by 8 (1 self)
The PageRank algorithm is used by search engines such as Google to order web pages. It uses an iterative numerical method to compute the maximal eigenvector of a transition matrix derived from the web’s hyperlink structure and a user-centred model of web-surfing behaviour. As the web has expanded and as demand for user-tailored web page ordering metrics has grown, scalable parallel computation of PageRank has become a focus of considerable research effort. In this paper, we seek a scalable problem decomposition for parallel PageRank computation, through the use of state-of-the-art hypergraph-based partitioning schemes. These have not been previously applied in this context. We consider both one- and two-dimensional hypergraph decomposition models. Exploiting the recent availability of the Parkway 2.1 parallel hypergraph partitioner, we present empirical results on a gigabit PC cluster for three publicly available web graphs. Our results show that hypergraph-based partitioning substantially reduces communication volume over conventional partitioning schemes (by up to three orders of magnitude), while still maintaining computational load balance. They also show a halving of the per-iteration runtime cost when compared to the most effective alternative approach used to date.
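For readers unfamiliar with the iterative method the abstract refers to, here is a minimal serial sketch of PageRank by power iteration. The damping factor of 0.85, the uniform teleportation vector, the uniform treatment of pages without out-links, and the name `pagerank` are all illustrative assumptions; the paper's actual contribution, the hypergraph-based decomposition of this computation across processors, is not shown.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on a graph given as adjacency lists.

    out_links[u] is the list of pages that page u links to.
    Pages with no out-links spread their rank uniformly over all pages.
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank mass held by pages without out-links.
        dangling = sum(rank[u] for u in range(n) if not out_links[u])
        # Teleportation plus the redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [base] * n
        # This loop is one sparse matrix-vector product with the
        # transition matrix; it dominates the per-iteration cost.
        for u, targets in enumerate(out_links):
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        delta = sum(abs(x - y) for x, y in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank
```

Because each iteration is essentially one sparse matrix-vector product, the way the transition matrix is partitioned determines how much data processors must exchange per iteration.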
Multithreaded Clustering for Multilevel Hypergraph Partitioning
 in IPDPS
, 2012
Cited by 7 (1 self)
Abstract. Requirements for efficient parallelization of many complex and irregular applications can be cast as a hypergraph partitioning problem. The current state-of-the-art software libraries that provide tool support for the hypergraph partitioning problem were designed and implemented before the game-changing advancements in multicore computing. Hence, analyzing the structure of those tools for designing multithreaded versions of the algorithms is a crucial task. The most successful partitioning tools are based on the multilevel approach. In this approach, a given hypergraph is coarsened to a much smaller one, a partition is obtained on the smallest hypergraph, and that partition is projected to the original hypergraph while refining it on the intermediate hypergraphs. The coarsening operation corresponds to clustering the vertices of a hypergraph and is the most time-consuming task in a multilevel partitioning tool. We present three efficient multithreaded clustering algorithms which are well suited for multilevel partitioners. We compare their performance with that of the ones currently used in today’s hypergraph partitioners. We show on a large number of real-life hypergraphs that our implementations, integrated into a commonly used partitioning library PaToH, achieve good speedups without reducing the clustering quality.
Keywords: multilevel hypergraph partitioning; coarsening; multithreaded clustering algorithms; multicore programming
A library for parallel sparse matrix-vector multiplies
, 2005
Cited by 7 (6 self)
We provide parallel matrix-vector multiply routines for 1D and 2D partitioned sparse square and rectangular matrices. We clearly give pseudocodes that perform necessary initializations for parallel execution. We show how to maximize overlapping between communication and computation through the proper usage of compressed sparse row and compressed sparse column formats of the sparse matrices. We give pseudocodes for multiplication routines which benefit from such overlaps.
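As background for the storage formats mentioned above, the following is a minimal serial sketch of the compressed sparse row (CSR) layout and the matrix-vector product over it. The helper names `csr_from_dense` and `csr_matvec` are hypothetical and do not come from the library; the parallel routines, the initialization steps, and the communication overlap described in the abstract are not reproduced here.

```python
def csr_from_dense(dense):
    """Build CSR arrays (row_ptr, col_idx, values) from a dense matrix."""
    row_ptr, col_idx, values = [0], [], []
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                col_idx.append(j)
                values.append(x)
        # row_ptr[i + 1] marks the end of row i's nonzeros.
        row_ptr.append(len(values))
    return row_ptr, col_idx, values


def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        total = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[col_idx[k]]
        y[i] = total
    return y
```

CSR walks the matrix row by row, so each output entry is finished in one pass; the compressed sparse column format instead walks column by column and scatters contributions into the output, which suits the case where input vector entries arrive one at a time.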
Parallel Greedy Graph Matching using an Edge Partitioning Approach
Cited by 6 (1 self)
We present a parallel version of the Karp–Sipser graph matching heuristic for the maximum cardinality problem. It is bulk-synchronous, separating computation and communication, and uses an edge-based partitioning of the graph, translated from a two-dimensional partitioning of the corresponding adjacency matrix. It is shown that the communication volume of Karp–Sipser graph matching is proportional to that of parallel sparse matrix–vector multiplication (SpMV), so that efficient partitioners developed for SpMV can be used. The algorithm is presented using a small basic set of 7 message types, which are discussed in detail. Experimental results show that for most matrices, edge-based partitioning is superior to vertex-based partitioning, in terms of both parallel speedup and matching quality. Good speedups are obtained on up to 64 processors.
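The serial heuristic that the paper parallelizes can be sketched as follows: while some vertex has exactly one neighbour, match it with that neighbour; otherwise match the endpoints of a randomly chosen edge; in both cases remove the matched vertices and continue. This is a minimal sketch under stated assumptions, not the paper's bulk-synchronous algorithm. The function name `karp_sipser`, the seeded random generator, and the dictionary-of-neighbours input format are illustrative choices, and vertices are assumed to be integers with a symmetric adjacency structure.

```python
import random


def karp_sipser(adj, seed=0):
    """Greedy matching in the style of the Karp-Sipser heuristic.

    adj maps each vertex to an iterable of its neighbours (symmetric).
    Returns a dict 'mate' with mate[u] == v and mate[v] == u per match.
    """
    rng = random.Random(seed)
    nbrs = {v: set(ns) for v, ns in adj.items()}
    mate = {}

    def remove(v):
        # Delete v and detach it from all remaining neighbours.
        for w in nbrs.pop(v):
            nbrs[w].discard(v)

    while True:
        # Isolated vertices can never be matched; drop them.
        for v in [v for v, ns in nbrs.items() if not ns]:
            del nbrs[v]
        if not nbrs:
            break
        degree_one = [v for v, ns in nbrs.items() if len(ns) == 1]
        if degree_one:
            # Matching a degree-1 vertex with its only neighbour
            # never rules out a maximum matching.
            u = degree_one[0]
            v = next(iter(nbrs[u]))
        else:
            # No degree-1 vertex left: fall back to a random edge.
            edges = sorted((a, b) for a in nbrs for b in nbrs[a] if a < b)
            u, v = rng.choice(edges)
        mate[u], mate[v] = v, u
        remove(u)
        remove(v)
    return mate
```

The edge list is rebuilt on every random step for clarity, which is quadratic in the worst case; a practical implementation would maintain degree buckets incrementally.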