Results 1-10 of 202
Parallel Numerical Linear Algebra
, 1993
Cited by 766 (23 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
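The abstract mentions matrix multiplication as its running example for efficient algorithm design. As a rough illustration only (not the survey's own code), the sketch below shows a cache-blocked serial multiply in plain Python. The function name `blocked_matmul` and the `block` parameter are hypothetical, and the tile size of 2 is chosen purely to keep the example small.

```python
def blocked_matmul(A, B, block=2):
    """Multiply A (n x m) by B (m x p) one tile at a time.

    Hypothetical helper for illustration: working on block x block tiles
    keeps the operands of the inner loops small, which is the idea behind
    memory-hierarchy-aware matrix multiplication.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):          # tile rows of C
        for kk in range(0, m, block):      # tile the shared dimension
            for jj in range(0, p, block):  # tile columns of C
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        a = A[i][k]
                        for j in range(jj, min(jj + block, p)):
                            C[i][j] += a * B[k][j]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
C = blocked_matmul(A, B)
```

In a parallel setting the tiles would typically be distributed across processors; that step is not shown here.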
A supernodal approach to sparse partial pivoting
 SIAM Journal on Matrix Analysis and Applications
, 1999
Cited by 263 (25 self)
We investigate several ways to improve the performance of sparse LU factorization with partial pivoting, as used to solve unsymmetric linear systems. To perform most of the numerical computation in dense matrix kernels, we introduce the notion of unsymmetric supernodes. To better exploit the memory hierarchy, we introduce unsymmetric supernode-panel updates and two-dimensional data partitioning. To speed up symbolic factorization, we use Gilbert and Peierls's depth-first search with Eisenstat and Liu's symmetric structural reductions. We have implemented a sparse LU code using all these ideas. We present experiments demonstrating that it is significantly faster than earlier partial pivoting codes. We also compare performance with Umfpack, which uses a multifrontal approach; our code is usually faster.
Sparse matrices in Matlab: Design and implementation
, 1991
Cited by 164 (22 self)
We have extended the matrix computation language and environment Matlab to include sparse matrix storage and operations. The only change to the outward appearance of the Matlab language is a pair of commands to create full or sparse matrices. Nearly all the operations of Matlab now apply equally to full or sparse matrices, without any explicit action by the user. The sparse data structure represents a matrix in space proportional to the number of nonzero entries, and most of the operations compute sparse results in time proportional to the number of arithmetic operations on nonzeros.
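To make the storage claim concrete, here is a minimal sketch in plain Python, assuming a compressed-column layout (three arrays: nonzero values, their row indices, and column start pointers). The helper names `dense_to_csc` and `csc_matvec` are hypothetical and this is not the Matlab implementation; it only illustrates space proportional to the number of nonzeros and a matrix-vector product that touches only nonzeros.

```python
def dense_to_csc(dense):
    """Convert a list-of-rows matrix into compressed-column arrays.

    Returns (values, row_idx, col_ptr, n_rows). Storage is proportional
    to the number of nonzeros plus the number of columns.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    values, row_idx, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_idx.append(i)
        col_ptr.append(len(values))  # end of column j
    return values, row_idx, col_ptr, n_rows


def csc_matvec(values, row_idx, col_ptr, n_rows, x):
    """Compute y = A @ x, doing one multiply-add per stored nonzero."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * x[j]
    return y


A = [[4.0, 0.0, 0.0],
     [0.0, 0.0, 2.0],
     [1.0, 0.0, 3.0]]
values, row_idx, col_ptr, n_rows = dense_to_csc(A)
y = csc_matvec(values, row_idx, col_ptr, n_rows, [1.0, 1.0, 1.0])
```

For this 3-by-3 example only four values are stored, and the empty middle column costs a single pointer entry.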
An Unsymmetric-Pattern Multifrontal Method for Sparse LU Factorization
 SIAM J. MATRIX ANAL. APPL
, 1994
Cited by 150 (27 self)
Sparse matrix factorization algorithms for general problems are typically characterized by irregular memory access patterns that limit their performance on parallel-vector supercomputers. For symmetric problems, methods such as the multifrontal method avoid indirect addressing in the innermost loops by using dense matrix kernels. However, no efficient LU factorization algorithm based primarily on dense matrix kernels exists for matrices whose pattern is very unsymmetric. We address this deficiency and present a new unsymmetric-pattern multifrontal method based on dense matrix kernels. As in the classical multifrontal method, advantage is taken of repetitive structure in the matrix by factorizing more than one pivot in each frontal matrix, thus enabling the use of Level 2 and Level 3 BLAS. The performance is compared with the classical multifrontal method and other unsymmetric solvers on a CRAY YMP.
SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems
 ACM Trans. Mathematical Software
, 2003
Cited by 144 (18 self)
We present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the software’s parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication patterns, which lets us exploit techniques used in parallel sparse Cholesky algorithms to better parallelize both LU decomposition and triangular solution on large-scale distributed machines.
Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate
 ACM Transactions on Mathematical Software
, 2008
Cited by 109 (8 self)
CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AA^T, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with both C and MATLAB interfaces. It appears in MATLAB 7.2 as x=A\b when A is sparse symmetric positive definite, as well as in several other sparse matrix functions.
A column pre-ordering strategy for the unsymmetric-pattern multifrontal method
 ACM Transactions on Mathematical Software
, 2004
Cited by 94 (5 self)
A new method for sparse LU factorization is presented that combines a column pre-ordering strategy with a right-looking unsymmetric-pattern multifrontal numerical factorization. The column ordering is selected to give a good a priori upper bound on fill-in and then refined during numerical factorization (while preserving the bound). Pivot rows are selected to maintain numerical stability and to preserve sparsity. The method analyzes the matrix and automatically selects one of three pre-ordering and pivoting strategies. The number of nonzeros in the LU factors computed by the method is typically less than or equal to those found by a wide range of unsymmetric sparse LU factorization methods, including left-looking methods and prior multifrontal methods.
Approximating Treewidth, Pathwidth, Frontsize, and Shortest Elimination Tree
, 1995
Cited by 79 (5 self)
Various parameters of graphs connected to sparse matrix factorization and other applications can be approximated using an algorithm of Leighton et al. that finds vertex separators of graphs. The approximate values of the parameters, which include minimum front size, treewidth, pathwidth, and minimum elimination tree height, are no more than O(log n) (minimum front size and treewidth) and O(log^2 n) (pathwidth and minimum elimination tree height) times the optimal values. In addition, we show that unless P = NP there are no absolute approximation algorithms for any of the parameters.
Tuning the performance of I/O intensive parallel applications
 In Fourth Workshop on Input/Output in Parallel and Distributed Systems (Philadelphia
, 1996
Cited by 78 (26 self)
Getting good I/O performance from parallel programs is a critical problem for many application domains. In this paper, we report our experience tuning the I/O performance of four application programs from the areas of satellite-data processing and linear algebra. After tuning, three of the four applications achieve application-level I/O rates of over 100 MB/s on 16 processors. The total volume of I/O required by the programs ranged from about 75 MB to over 200 GB. We report the lessons learned in achieving high I/O performance from these applications, including the need for code restructuring, local disks on every node and knowledge of future I/O requests. We also report our experience on achieving high performance on peer-to-peer configurations. Finally, we comment on the necessity of complex I/O interfaces like collective I/O and strided requests to achieve high performance.