Results 1–10 of 28
Numerical solution of saddle point problems
Acta Numerica, 2005
"... Large linear systems of saddle point type arise in a wide variety of applications throughout computational science and engineering. Due to their indefiniteness and often poor spectral properties, such linear systems represent a significant challenge for solver developers. In recent years there has b ..."
Abstract

Cited by 323 (25 self)
 Add to MetaCart
(Show Context)
Large linear systems of saddle point type arise in a wide variety of applications throughout computational science and engineering. Due to their indefiniteness and often poor spectral properties, such linear systems represent a significant challenge for solver developers. In recent years there has been a surge of interest in saddle point problems, and numerous techniques have been proposed for solving systems of this type. The aim of this paper is to present and discuss a large selection of solution methods for linear systems in saddle point form, with an emphasis on iterative methods for large and sparse problems.
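The 2x2 block structure of such systems can be sketched with SciPy. This is a hypothetical toy system, not one of the paper's applications; MINRES is used here simply as one representative Krylov method for symmetric indefinite matrices.

```python
import numpy as np
from scipy.sparse import bmat, eye, csr_matrix
from scipy.sparse.linalg import minres

# Toy saddle point system  [A  B; B^T  0] [u; p] = [f; g]
# with A symmetric positive definite and B of full column rank.
n, m = 50, 10
rng = np.random.default_rng(0)
A = eye(n) * 2.0                          # SPD (1,1) block
B = csr_matrix(rng.standard_normal((n, m)))
K = bmat([[A, B], [B.T, None]]).tocsr()   # symmetric indefinite saddle point matrix

rhs = np.ones(n + m)
x, info = minres(K, rhs)                  # MINRES handles symmetric indefinite systems
print("residual:", np.linalg.norm(K @ x - rhs))
```

The indefiniteness the abstract mentions is visible in the zero (2,2) block: the matrix has both positive and negative eigenvalues, which rules out plain CG.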
A column preordering strategy for the unsymmetric-pattern multifrontal method
ACM Transactions on Mathematical Software, 2004
"... A new method for sparse LU factorization is presented that combines a column preordering strategy with a rightlooking unsymmetricpattern multifrontal numerical factorization. The column ordering is selected to give a good a priori upper bound on fillin and then refined during numerical factoriza ..."
Abstract

Cited by 99 (5 self)
 Add to MetaCart
(Show Context)
A new method for sparse LU factorization is presented that combines a column preordering strategy with a right-looking unsymmetric-pattern multifrontal numerical factorization. The column ordering is selected to give a good a priori upper bound on fill-in and is then refined during numerical factorization (while preserving the bound). Pivot rows are selected to maintain numerical stability and to preserve sparsity. The method analyzes the matrix and automatically selects one of three preordering and pivoting strategies. The number of nonzeros in the LU factors computed by the method is typically less than or equal to that found by a wide range of unsymmetric sparse LU factorization methods, including left-looking methods and prior multifrontal methods.
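The role of a column preordering in limiting fill-in can be illustrated with SciPy's `splu`, which wraps SuperLU rather than the paper's method; `permc_spec="COLAMD"` requests a COLAMD-style column ordering. The matrix below is a hypothetical example.

```python
import numpy as np
from scipy.sparse import random as sprandom, eye
from scipy.sparse.linalg import splu

# Random sparse matrix made safely factorizable by a heavy diagonal.
rng = np.random.default_rng(1)
n = 200
A = (sprandom(n, n, density=0.02, random_state=rng) + eye(n) * 10).tocsc()

lu = splu(A, permc_spec="COLAMD")   # column preordering chosen to limit fill-in
b = np.ones(n)
x = lu.solve(b)
print("residual:", np.linalg.norm(A @ x - b))
print("nnz(L)+nnz(U):", lu.L.nnz + lu.U.nnz)
```

Comparing `permc_spec="NATURAL"` against `"COLAMD"` on the same matrix shows how the a priori ordering affects the nonzero counts of the computed factors.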
Analysis and comparison of two general sparse solvers for distributed memory computers
ACM Transactions on Mathematical Software, 2001
"... This paper provides a comprehensive study and comparison of two stateoftheart direct solvers for large sparse sets of linear equations on largescale distributedmemory computers. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. We describe the main algo ..."
Abstract

Cited by 23 (7 self)
 Add to MetaCart
This paper provides a comprehensive study and comparison of two state-of-the-art direct solvers for large sparse sets of linear equations on large-scale distributed-memory computers. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. We describe the main algorithmic features of the two solvers and compare their performance characteristics with respect to uniprocessor speed, interprocessor communication, and memory requirements. For both solvers, preorderings for numerical stability and sparsity play an important role in achieving high parallel efficiency. We analyse the results with various ordering algorithms. Our performance analysis is based on data obtained from runs on a 512-processor Cray T3E using a set of matrices from real applications. We also use regular 3D grid problems to study the scalability of the two solvers.
A multifrontal QR factorization approach to distributed inference applied to multirobot localization and mapping
AAAI National Conference on Artificial Intelligence, 2005
"... QR factorization is most often used as a black box algorithm, but is in fact an elegant computation on a factor graph. By computing a rooted clique tree on this graph, the computation can be parallelized across subtrees, which forms the basis of socalled multifrontal QR methods. By judiciously c ..."
Abstract

Cited by 20 (7 self)
 Add to MetaCart
QR factorization is most often used as a black box algorithm, but is in fact an elegant computation on a factor graph. By computing a rooted clique tree on this graph, the computation can be parallelized across subtrees, which forms the basis of so-called multifrontal QR methods. By judiciously choosing the order in which variables are eliminated in the clique tree computation, we show that one straightforwardly obtains a method for performing inference in distributed sensor networks. One obvious application is distributed localization and mapping with a team of robots. We phrase the problem as inference on a large-scale Gaussian Markov Random Field induced by the measurement factor graph, and show how multifrontal QR on this graph solves for the global map and all the robot poses in a distributed fashion. The method is illustrated using both small and large-scale simulations, and validated in practice through actual robot experiments.
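The effect of elimination order on fill can be sketched in a few lines, using Cholesky as a stand-in (the R factor of a QR of A equals the Cholesky factor of AᵀA). The "arrow" system and the two orders below are hypothetical examples, not the paper's robot problem: eliminating a densely coupled variable first fills the factor, while eliminating it last keeps the factor sparse.

```python
import numpy as np

# Arrow matrix: variable 0 couples to every other variable.
n = 6
M = 10.0 * np.eye(n)
M[0, 1:] = 1.0
M[1:, 0] = 1.0

def factor_nnz(M, order):
    P = M[np.ix_(order, order)]        # symmetric permutation = elimination order
    L = np.linalg.cholesky(P)
    return int(np.sum(np.abs(L) > 1e-12))

nnz_first = factor_nnz(M, [0, 1, 2, 3, 4, 5])   # dense variable eliminated first
nnz_last = factor_nnz(M, [1, 2, 3, 4, 5, 0])    # dense variable eliminated last
print(nnz_first, nnz_last)                       # the first order fills the factor
```

Choosing good elimination orders on the clique tree is exactly what keeps the multifrontal computation cheap.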
Multifrontal multithreaded rank-revealing sparse QR factorization
"... SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel’s Threading ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
(Show Context)
SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel's Threading Building Blocks library. The symbolic analysis and ordering phase pre-eliminates singletons by permuting the input matrix into the form [R11 R12; 0 A22], where R11 is upper triangular with diagonal entries above a given tolerance. Next, the fill-reducing ordering, column elimination tree, and frontal matrix structures are found without requiring the formation of the pattern of AᵀA. Rank detection is performed within each frontal matrix using Heath's method, which does not require column pivoting. The resulting sparse QR factorization obtains a substantial fraction of the theoretical peak performance of a multicore computer.
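The rank-detection idea can be sketched densely with NumPy: after a QR factorization without column pivoting, diagonal entries of R below a tolerance flag numerically dependent columns. This illustrates only the tolerance test, not SuiteSparseQR or Heath's full method (which handles such columns as the factorization proceeds); the matrix is a hypothetical example.

```python
import numpy as np

# 8x4 matrix with one exactly dependent column.
rng = np.random.default_rng(2)
A = rng.standard_normal((8, 4))
A[:, 3] = A[:, 0] + A[:, 1]            # column 3 is linearly dependent

_, R = np.linalg.qr(A)                 # QR without column pivoting
tol = 1e-10 * np.abs(np.diag(R)).max() # relative tolerance on |diag(R)|
rank = int(np.sum(np.abs(np.diag(R)) > tol))
print("estimated rank:", rank)
```

Because the dependence here is exact, the corresponding diagonal entry of R collapses to roundoff level and the count gives rank 3.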
Parallel Algorithms for the Singular Value Decomposition. In: Handbook on Parallel Computing and Statistics, Statistics: A Series of Textbooks and Monographs, vol. 184, 2006
A comparison of parallel solvers for diagonally dominant and general narrow-banded linear systems
Parallel and Distributed Computing Practices (PDCP) 2000, 1999
"... We continue the comparison of parallel algorithms for solving diagonally dominant and general narrowbanded linear systems of equations that we started in [2]. The solvers compared are the banded system solvers of ScaLAPACK [6] and those investigated by Arbenz and Hegland [1, 5]. We present the nume ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
(Show Context)
We continue the comparison of parallel algorithms for solving diagonally dominant and general narrowbanded linear systems of equations that we started in [2]. The solvers compared are the banded system solvers of ScaLAPACK [6] and those investigated by Arbenz and Hegland [1, 5]. We present the numerical experiments that we conducted on the IBM SP/2.
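A serial narrow-banded solve, the building block the compared parallel algorithms distribute, can be illustrated with SciPy's `solve_banded`, which takes the matrix in LAPACK band storage. The tridiagonal, diagonally dominant system below is a hypothetical example.

```python
import numpy as np
from scipy.linalg import solve_banded

# Tridiagonal system with bandwidths (1, 1) in LAPACK band storage:
# row 0 = superdiagonal, row 1 = main diagonal, row 2 = subdiagonal.
n = 6
ab = np.zeros((3, n))
ab[0, 1:] = -1.0         # superdiagonal
ab[1, :] = 4.0           # main diagonal (dominant, so no pivoting trouble)
ab[2, :-1] = -1.0        # subdiagonal

b = np.ones(n)
x = solve_banded((1, 1), ab, b)

# Verify against the equivalent dense system.
A = (np.diag(np.full(n, 4.0))
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))
print("residual:", np.linalg.norm(A @ x - b))
```

Diagonal dominance is precisely what lets the parallel variants discussed in the paper avoid pivoting across processor boundaries.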
A Blocked Implementation of Level 3 BLAS for RISC Processors, 1996
"... We describe a version of the Level 3 BLAS which is designed to be efficient on RISC processors. This is an extension of previous studies by the same authors (see Amestoy, Dayd'e, Duff & Mor`ere (1995), Dayd'e, Duff & Petitet (1994), and Dayd'e & Duff (1995)) where they ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
We describe a version of the Level 3 BLAS which is designed to be efficient on RISC processors. This is an extension of previous studies by the same authors (see Amestoy, Daydé, Duff & Morère (1995), Daydé, Duff & Petitet (1994), and Daydé & Duff (1995)), where they describe a similar approach for efficient serial and parallel implementations of the Level 3 BLAS on shared and virtual shared memory multiprocessors. All our codes are written in Fortran and use loop unrolling, blocking, and copying to improve the performance. A blocking technique is used to express the BLAS in terms of operations involving triangular blocks and calls to the matrix-matrix multiplication kernel (GEMM). No manufacturer-supplied or assembler code is used. This blocked implementation uses the same blocking ideas as in Daydé et al. (1994), except that the ordering of loops is designed for efficient reuse of data held in cache and not necessarily for parallelization. A parameter which controls the bloc...
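The blocking idea, expressing a large product as GEMM calls on cache-sized blocks, can be sketched in NumPy (whose `@` operator dispatches to a BLAS GEMM). The block size 64 is an illustrative choice, not the paper's tuned parameter, and the paper's codes are in Fortran.

```python
import numpy as np

def blocked_gemm(A, B, bs=64):
    """Compute A @ B by accumulating products of bs-by-bs blocks."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                # Each update is itself a small GEMM on contiguous blocks,
                # sized so the operands fit in cache.
                C[i:i+bs, j:j+bs] += A[i:i+bs, p:p+bs] @ B[p:p+bs, j:j+bs]
    return C

rng = np.random.default_rng(3)
A = rng.standard_normal((130, 70))
B = rng.standard_normal((70, 90))
print(np.allclose(blocked_gemm(A, B), A @ B))
```

The loop order over (i, j, p) determines which block stays resident in cache between updates, the reuse question the abstract raises.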
Developments and Trends in the Parallel Solution of Linear Systems, 1999
"... In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equat ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems, concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.