Results 1  10
of
33
On twodimensional sparse matrix partitioning: Models, methods, and a recipe
 SIAM J. SCI. COMPUT
, 2010
"... We consider twodimensional partitioning of general sparse matrices for parallel sparse matrixvector multiply operation. We present three hypergraphpartitioningbased methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces finegrain ..."
Abstract

Cited by 35 (18 self)
 Add to MetaCart
(Show Context)
We consider twodimensional partitioning of general sparse matrices for parallel sparse matrixvector multiply operation. We present three hypergraphpartitioningbased methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces finegrain partitions. The other two produce coarser partitions, where one of them imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed twodimensional partitioning methods together with the hypergraphbased onedimensional partitioning methods, using an extensive set of public domain matrices. Furthermore, for the users of these partitioning methods, we present a partitioning recipe that chooses one of the partitioning methods according to some matrix characteristics.
Parallel sparse matrixvector and matrixtransposevector multiplication using compressed sparse blocks
 IN SPAA
, 2009
"... This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and A T x to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense nvector. Our algorithms use Θ(nnz) work (serial running ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
(Show Context)
This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and A T x to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense nvector. Our algorithms use Θ(nnz) work (serial running time) and Θ ( √ nlgn) span (criticalpath length), yielding a parallelism of Θ(nnz / √ nlgn), which is amply high for virtually any large matrix. The storage requirement for CSB is esssentially the same as that for the morestandard compressedsparserows (CSR) format, for which computing Ax in parallel is easy but A T x is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and A T x run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by offchip memory bandwidth.
Multilevel direct Kway hypergraph partitioning with multiple constraints and fixed vertices
, 2007
"... ..."
Revisiting hypergraph models for sparse matrix partitioning
 SIAM Review
, 2007
"... Abstract. We provide an exposition of hypergraph models for parallelizing sparse matrixvector multiplies. Our aim is to emphasize the expressive power of hypergraph models. First, we set forth an elementary hypergraph model for the parallel matrixvector multiply based on onedimensional (1D) matri ..."
Abstract

Cited by 20 (11 self)
 Add to MetaCart
(Show Context)
Abstract. We provide an exposition of hypergraph models for parallelizing sparse matrixvector multiplies. Our aim is to emphasize the expressive power of hypergraph models. First, we set forth an elementary hypergraph model for the parallel matrixvector multiply based on onedimensional (1D) matrix partitioning. In the elementary model, the vertices represent the data of a matrixvector multiply, and the nets encode dependencies among the data. We then apply a recently proposed hypergraph transformation operation to devise models for 1Dsparse matrix partitioning. The resulting 1Dpartitioning models are equivalent to the previously proposed computational hypergraph models and are not meant to be replacements for them. Nevertheless, the new models give us insights into the previous ones and help us explain a subtle requirement, known as the consistency condition, of hypergraph partitioning models. Later, we demonstrate the flexibility of the elementary model on a few 1Dpartitioning problems that are hard to solve using the previously proposed models. We also discuss extensions of the proposed elementary model to twodimensional matrix partitioning. Key words. parallel computing, sparse matrixvector multiply, hypergraph models
Cacheoblivious sparse matrixvector multiplication by using sparse matrix partitioning methods
 SIAM Journal on Scientific Computing
, 2009
"... Abstract. In this article, we introduce a cacheoblivious method for sparse matrix–vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraphbased sparse matrix partitioning scheme so that the resulting matrix induces cachefriendly b ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
(Show Context)
Abstract. In this article, we introduce a cacheoblivious method for sparse matrix–vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraphbased sparse matrix partitioning scheme so that the resulting matrix induces cachefriendly behavior during sparse matrix–vector multiplication. Matrices are assumed to be stored in rowmajor format, by means of the compressed row storage (CRS) or its variants incremental CRS and zigzag CRS. The zigzag CRS data structure is shown to fit well with the hypergraph metric used in partitioning sparse matrices for the purpose of parallel computation. The separated blockdiagonal (SBD) form is shown to be the appropriate matrix structure for cache enhancement. We have implemented a runtime cache simulation library enabling us to analyze cache behavior for arbitrary matrices and arbitrary cache properties during matrix–vector multiplication within a kway setassociative idealized cache model. The results of these simulations are then verified by actual experiments run on various cache architectures. In all these experiments, we use the Mondriaan sparse matrix partitioner in onedimensional mode. The savings in computation time achieved by our matrix reorderings reach up to 50 percent, in the case of a large link matrix.
HYPERGRAPH PARTITIONINGBASED FILLREDUCING ORDERING
, 2009
"... A typical first step of a direct solver for linear system Mx = b is reordering of symmetric matrix M to improve execution time and space requirements of the solution process. In this work, we propose a novel nesteddissectionbased ordering approach that utilizes hypergraph partitioning. Our approac ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
A typical first step of a direct solver for linear system Mx = b is reordering of symmetric matrix M to improve execution time and space requirements of the solution process. In this work, we propose a novel nesteddissectionbased ordering approach that utilizes hypergraph partitioning. Our approach is based on formulation of graph partitioning by vertex separator (GPVS) problem as a hypergraph partitioning problem. This new formulation is immune to deficiency of GPVS in a multilevel framework hence enables better orderings. In matrix terms, our method relies on the existence of a structural factorization of the input M matrix in the form of M = AAT (or M = AD2AT). We show that the partitioning of the rownet hypergraph representation of rectangular matrix A induces a GPVS of the standard graph representation of matrix M. In the absence of such factorization, we also propose simple, yet effective structural factorization techniques that are based on finding an edge clique cover of the standard graph representation of matrix M, and hence applicable to any arbitrary symmetric matrix M. Our experimental evaluation has shown that the proposed method achieves better ordering in comparison to stateoftheart graphbased ordering tools even for symmetric matrices where structural M = AAT factorization is not provided as an input. For matrices coming from linear programming problems, our method enables even faster and better orderings.
A Parallel Matrix Scaling Algorithm
"... We recently proposed an iterative procedure which asymptotically scales the rows and columns of a given matrix to one in a given norm. In this work, we briefly mention some of the properties of that algorithm and discuss its efficient parallelization. We report on a parallel performance study of ou ..."
Abstract

Cited by 12 (3 self)
 Add to MetaCart
(Show Context)
We recently proposed an iterative procedure which asymptotically scales the rows and columns of a given matrix to one in a given norm. In this work, we briefly mention some of the properties of that algorithm and discuss its efficient parallelization. We report on a parallel performance study of our implementation on a few computing environments.
Parallel Multilevel Algorithms for Hypergraph Partitioning
, 2007
"... In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe schemes for parallel coarsening, parallel greedy kway refinement and parallel multiphase refinement. Using an asymptotic theoretical performance model, we derive the isoeffi ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe schemes for parallel coarsening, parallel greedy kway refinement and parallel multiphase refinement. Using an asymptotic theoretical performance model, we derive the isoefficiency function for our algorithms and hence show that they are technically scalable when the maximum vertex and hyperedge degrees are small. We conduct experiments on hypergraphs from six different application domains to investigate the empirical scalability of our algorithms both in terms of runtime and partition quality. Our findings confirm that the quality of partition produced by our algorithms is stable as the number of processors is increased while being competitive with those produced by a stateoftheart serial multilevel partitioning tool. We also validate our theoretical performance model through an isoefficiency study. Finally, we evaluate the impact of introducing parallel multiphase refinement into our parallel multilevel algorithm in terms of the trade off between improved partition quality and higher runtime cost.
Combinatorial problems in solving linear systems
, 2009
"... Numerical linear algebra and combinatorial optimization are vast subjects; as is their interaction. In virtually all cases there should be a notion of sparsity for a combinatorial problem to arise. Sparse matrices therefore form the basis of the interaction of these two seemingly disparate subjects. ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
Numerical linear algebra and combinatorial optimization are vast subjects; as is their interaction. In virtually all cases there should be a notion of sparsity for a combinatorial problem to arise. Sparse matrices therefore form the basis of the interaction of these two seemingly disparate subjects. As the core of many of today’s numerical linear algebra computations consists of the solution of sparse linear system by direct or iterative methods, we survey some combinatorial problems, ideas, and algorithms relating to these computations. On the direct methods side, we discuss issues such as matrix ordering; bipartite matching and matrix scaling for better pivoting; task assignment and scheduling for parallel multifrontal solvers. On the iterative method side, we discuss preconditioning techniques including incomplete factorization preconditioners, support graph preconditioners, and algebraic multigrid. In a separate part, we discuss the block triangular form of sparse matrices.