GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark (1995)  (15 citations)
Bo Kågström, Per Ling, Charles Van Loan



View or download:
enseeiht.fr/netlib/lapack/...lawn107.ps
irisa.fr/pub/mirrors/netli...lawn107.ps
phys.kookmin.ac.kr/lapack/...lawn107.ps

Abstract: The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. The development of optimal level 3 BLAS code is costly and time consuming, because it requires assembly level programming/thinking. However, it is possible to develop a portable and high-performance level 3 BLAS library mainly relying on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning,...
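
The idea in the abstract, building another level 3 operation mainly out of GEMM calls through partitioning, can be illustrated with a small sketch. This is not the authors' implementation: the function names `gemm_nt` and `syrk_lower_gemm_based`, the block size `nb`, and the choice of the symmetric rank-k update (C := alpha*A*A^T + beta*C, lower triangle only) are assumptions made for illustration, and the pure-Python `gemm_nt` merely stands in for a tuned GEMM routine.

```python
def gemm_nt(alpha, A, rows_i, rows_j, beta, C):
    """Hypothetical stand-in for a tuned GEMM call.

    Updates the rectangular block C[rows_i, rows_j] in place:
    C[i][j] = alpha * (row i of A) . (row j of A) + beta * C[i][j].
    """
    k = len(A[0])
    for i in rows_i:
        for j in rows_j:
            s = sum(A[i][p] * A[j][p] for p in range(k))
            C[i][j] = alpha * s + beta * C[i][j]


def syrk_lower_gemm_based(alpha, A, beta, C, nb=2):
    """Lower-triangular update C := alpha*A*A^T + beta*C, done block by block.

    Assumption for this sketch: block columns of width nb. Each small
    diagonal block is updated by a direct loop that touches only its lower
    triangle; everything below it is a rectangle handed to gemm_nt.
    """
    n = len(A)
    k = len(A[0]) if n else 0
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # Small diagonal block: triangular, so handled without GEMM.
        for i in range(j0, j1):
            for j in range(j0, i + 1):
                s = sum(A[i][p] * A[j][p] for p in range(k))
                C[i][j] = alpha * s + beta * C[i][j]
        # Rectangular block below the diagonal: a single GEMM-style call.
        if j1 < n:
            gemm_nt(alpha, A, range(j1, n), range(j0, j1), beta, C)
    return C


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C = [[0.0] * 3 for _ in range(3)]
syrk_lower_gemm_based(1.0, A, 0.0, C, nb=2)
```

In this sketch the strictly upper triangle of C is never referenced, and as the matrix grows relative to `nb`, a growing share of the arithmetic falls inside the `gemm_nt` calls. That is the sense in which such a routine would rely mainly on GEMM.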

Context of citations to this paper:

.... while maintaining the functionality of the BLAS and thereby guarantee high performance and portability of dense linear algebra codes [5, 6, 4]. The level 3 factorization algorithms typified by LAPACK make repeated calls to the level 3 BLAS with matrix operands equal to...

.... for FLAME is the idea proposed by Kagstrom, Ling and Van Loan to code level 3 BLAS in terms of optimized matrix-matrix multiplication [19, 17]. This work was based on a careful study of memory hierarchies and how best to construct an efficient implementation of the entire BLAS...

Cited by:
The Science of Programming High-Performance Linear Algebra.. - Paolo Bientinesi John (2002)
Design and Evaluation of a - Top Linux Super
A Web Computing Environment for the SLICOT Library - Elmroth, Johansson.. (2001)

Similar documents (at the sentence level):
8.6%:   GEMM-Based Level 3 BLAS: High-Performance Model.. - Kågström, Ling, Van Loan (1997)

Active bibliography (related documents):
0.5:   Local Basic Linear Algebra Subroutines (LBLAS) for.. - Johnsson, Ortiz (1992)
0.3:   Generic Programming for High Performance Numerical Linear.. - Siek, Lumsdaine, Lee (1998)
0.3:   The SU(2)-Lattice Gauge Theory Simulation Code on the.. - Gutbrod, Attig, Weber

Similar documents based on text:
0.6:   Parallel Triangular Sylvester-Type Matrix Equation.. - Jonsson, Kågström (2000)
0.4:   High Performance Cholesky Factorization via Blocking and.. - Gustavson, Jonsson (2000)
0.2:   Superscalar GEMM-based Level 3 BLAS - The On-going .. - Gustavson.. (1998)

Related documents from co-citation:
8:   A Set of Level 3 Basic Linear Algebra Subprograms - Dongarra, DuCroz et al. - 1990
7:   Basic Linear Algebra Subprograms for Fortran Usage - Lawson, Hanson et al. - 1979
7:   Automatically Tuned Linear Algebra Software - Whaley, Dongarra - 1997

BibTeX entry:

B. Kagstrom, P. Ling, and C. Van Loan. GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark. Accepted for publication in ACM Transactions on Mathematical Software, 1997. http://citeseer.ist.psu.edu/article/kagstrom95gemmbased.html

@techreport{ kagstrom95gemmbased,
    author = "B. Kagstrom and P. Ling and C. {Van Loan}",
    title = "{GEMM}-Based Level 3 {BLAS}: Installation, Tuning and Use of the Model Implementations and the Performance Evaluation Benchmark",
    number = "UT-CS-95-316",
    year = "1995",
    url = "citeseer.ist.psu.edu/article/kagstrom95gemmbased.html" }
Citations (may not include all citations):
532   LAPACK Users' Guide - Anderson, Bai et al. - 1992
387   A Set of Level 3 Basic Linear Algebra Subprograms - Dongarra, DuCroz et al. - 1990
345   Basic Linear Algebra Subprograms for Fortran Usage - Lawson, Hanson et al. - 1979
245   An Extended Set of Fortran Basic Linear Algebra Subprograms - Dongarra, DuCroz et al. - 1988
168   Gaussian elimination is not optimal - Strassen - 1969
41   Impact of Hierarchical Memory Systems on Linear Algebra Algo.. - Gallivan, Jalby et al. - 1988
28   Exploiting Fast Matrix Multiplication Within the Level 3 BLA.. - Higham - 1990
26   Algorithm 679: A Set of Level 3 Basic Linear Algebra Subprog.. - Dongarra, DuCroz et al. - 1990
23   Department of Computer Science - Kagstrom, Van Loan et al. - 1989
19   Improving performance of linear algebra algorithms for dense.. - Agarwal, Gustavson et al. - 1994
15   GEMM-Based Level 3 BLAS: High-Performance Model Implementati.. - Kagstrom, Ling et al. - 1995
12   Portable High Performance GEMM-Based Level 3 BLAS - Kagstrom, Ling et al. - 1993
6   The IBM RISC System 6000 and Linear Algebra Operations - Dongarra, Mayes et al. - 1991
6   High Performance GEMM-Based Level 3 BLAS: Sample Routines fo.. - Kagstrom, Ling et al. - 1991
4   Some Remarks on Fast Multiplication of Polynomials - Winograd - 1973
3   A Set of High Performance Level-3 BLAS Structured and Tuned .. - Ling - 1993
3   Paragon Basic Math Library Performance Report - Corporation - 1993
3   Guide and Reference - Engineering, Library - 1994
2   Optimization of Level 3 BLAS for SIEMENS VP Systems - Grasemann - 1989
2   Implementation of the Level 2 and 3 BLAS on the CRAY Y-MP an.. - Sheik, Phuong et al. - 1992
2   GEMM-Based Level 3 BLAS: Algorithms for the Model Implementa.. - Kagstrom, Ling et al. - 1994
1   High Performance Level 3 BLAS - Green - 1994
1   A Parallel Block Implementation of Level 3 BLAS for MIMD Vect.. - Dayde, Duff et al. - 1994
1   GEMMV: A Portable Level 3 BLAS Winograd Variant of Strassen'.. - Douglas, Heroux et al. - 1994
1   Design Issues and the Performance of Level 1 and Level 2 Ker.. - Dackland - 1995
1   Exploiting functional parallelism of POWER2 to design high-pe.. - Agarwal, Gustavson et al. - 1994





Documents on the same site (http://www.enseeiht.fr/netlib/lapack/lawns/index.html):
A Proposal for a Fortran 90 Interface for LAPACK - Dongarra, Croz, Hammarling.. (1995)
Block Reduction of Matrices to Condensed Forms for.. - Dongarra.. (1987)
Improved Error Bounds for Underdetermined System Solvers - Demmel, Higham (1991)
