An Extended Set of Fortran Basic Linear Algebra Subprograms
ACM Transactions on Mathematical Software, 1986
Cited by 409 (72 self)
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations, which should provide for efficient and portable implementations of algorithms for high-performance computers.
ALGORITHM 656 -- An Extended Set of Basic Linear Algebra . . .
1988
Cited by 39 (9 self)
... Subprograms (Level 2 BLAS). Level 2 BLAS are targeted at matrix-vector operations with the aim of providing more efficient, but portable, implementations of algorithms on high-performance computers. The model implementation provides a portable set of FORTRAN 77 Level 2 BLAS for machines where specialized implementations do not exist or are not required. The test software aims to verify that specialized implementations meet the specification of Level 2 BLAS and that implementations are correctly installed.
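For orientation, the kind of matrix-vector operation that Level 2 BLAS covers can be sketched as the general update y := alpha*A*x + beta*y. The sketch below is not the FORTRAN 77 model implementation described in the abstract. The function name `gemv` and its list-based interface are assumptions chosen for illustration, and it ignores the transpose, stride, and leading-dimension arguments that a real BLAS routine takes.

```python
def gemv(alpha, a, x, beta, y):
    """Return alpha * A x + beta * y for a dense matrix A given as a list of rows.

    Hypothetical helper for illustration only; not a BLAS binding.
    """
    m = len(a)
    n = len(x)
    if len(y) != m or any(len(row) != n for row in a):
        raise ValueError("dimension mismatch")
    result = []
    for i in range(m):
        # Dot product of row i of A with x.
        dot = 0.0
        for j in range(n):
            dot += a[i][j] * x[j]
        result.append(alpha * dot + beta * y[i])
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
y = [10.0, 20.0]
print(gemv(2.0, a, x, 0.5, y))  # prints [11.0, 24.0]
```

Each output element needs one pass over a row of A, so the operation does O(mn) arithmetic on O(mn) data. This is the granularity that separates matrix-vector routines from the vector-vector operations of the original BLAS.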
Stability of Block Algorithms with Fast Level 3 BLAS
ACM Trans. Math. Soft., 1992
Cited by 33 (14 self)
Block algorithms are becoming increasingly popular in matrix computations. Since their basic unit of data is a submatrix rather than a scalar, they have a higher level of granularity than point algorithms, and this makes them well-suited to high-performance computers. The numerical stability of the block algorithms in the new linear algebra program library LAPACK is investigated here. It is shown that these algorithms have backward error analyses in which the backward error bounds are commensurate with the error bounds for the underlying level 3 BLAS (BLAS3). One implication is that the block algorithms are as stable as the corresponding point algorithms when conventional BLAS3 are used. A second implication is that the use of BLAS3 based on fast matrix multiplication techniques affects the stability only insofar as it increases the constant terms in the normwise backward error bounds. For linear equation solvers employing LU factorization it is shown that fixed precision iterative re...
A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures
SIAM J. Sci. Comput., 2002
Cited by 33 (2 self)
One approach to solving the nonsymmetric eigenvalue problem in parallel is to parallelize the QR algorithm. Not long ago, this was widely considered to be a hopeless task. Recent efforts have led to significant advances, although the methods proposed up to now have suffered from scalability problems. This paper discusses an approach to parallelizing the QR algorithm that greatly improves scalability. A theoretical analysis indicates that the algorithm is ultimately not scalable, but the nonscalability does not become evident until the matrix dimension is enormous. Experiments on the Intel Paragon system, the IBM SP2 supercomputer, the SGI Origin 2000, and the Intel ASCI Option Red supercomputer are reported.
New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems
1998
Cited by 29 (6 self)
We present a new recursive algorithm for the QR factorization of an m by n matrix A. The recursion leads to an automatic variable blocking that allows us to replace a level 2 part in a standard block algorithm by level 3 operations. However, there are some additional costs for performing the updates, which prohibit the efficient use of the recursion for large n. This obstacle is overcome by using a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by 78% to 21% as m = n increases from 100 to 1000. A successful parallel implementation on a PowerPC 604 based IBM SMP node based on dynamic load balancing is presented. For 2, 3, 4 processors and m = n = 2000 it shows speedups of 1.96, 2.99, and 3.92 compared to our uniprocessor algorithm.
1 Introduction
LAPACK algorithm DGEQRF requires more floating point operations than LAPACK algorithm DGEQR2, see [1]. Yet, DGEQRF outperforms DGEQR2 on an RS/6000 workstation by nearly a factor of 3 on large matrices. Dongarra, Kaufm...
Numerical Algorithms Group, Ltd. and
Subprograms (Level 2 BLAS). Level 2 BLAS are targeted at matrix-vector operations with the aim of providing more efficient, but portable, implementations of algorithms on high-performance computers. The model implementation provides a portable set of FORTRAN 77 Level 2 BLAS for machines where specialized implementations do not exist or are not required. The test software aims to verify that specialized implementations meet the specification of Level 2 BLAS and that implementations are correctly installed. Categories and Subject Descriptors: F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems - computations on matrices; G.1.0 [Numerical Analysis]:

