
Superscalar GEMM-based Level 3 BLAS: The On-going Evolution of a Portable and High-Performance Library (1998)  (8 citations)
Fred Gustavson, André Henriksson, Isak Jonsson, Bo Kågström, Per Ling
PARA



View or download:
cs.umu.se/~isak/publ/15410207.ps

From: cs.umu.se/~isak/

Abstract: Recently, a first version of our GEMM-based level 3 BLAS for superscalar type processors was announced. A new feature is the inclusion of DGEMM itself. This DGEMM routine contains inline what we call a level 3 kernel routine, which is based on register blocking. Additionally, it features level 1 cache blocking and data copying of submatrix operands for the level 3 kernel. Our other BLAS's which possess triangular operands, e.g., DTRSM, DSYRK use a similar level 3 kernel routine to...
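
The three ingredients named in the abstract (register blocking, cache blocking, and copying of submatrix operands) can be illustrated with a small sketch. This is not the authors' DGEMM, which is a tuned Fortran routine; it is a plain-Python illustration of the loop structure only. The names `pack_block`, `kernel_2x2`, `gemm_blocked`, and the block size `bs` are hypothetical, and the 2x2 register block is an assumed size chosen for readability.

```python
def pack_block(M, r0, c0, rows, cols):
    """Data copying: copy a rows x cols submatrix of M into its own block."""
    return [M[r0 + i][c0:c0 + cols] for i in range(rows)]


def kernel_2x2(Ab, Bb, C, i0, j0):
    """Kernel on packed blocks: C[i0.., j0..] += Ab * Bb.

    The four accumulators c00..c11 stand in for registers; in a compiled
    language they would stay in registers across the whole p loop.
    """
    mb, kb, nb = len(Ab), len(Bb), len(Bb[0])
    m2, n2 = mb - mb % 2, nb - nb % 2  # largest even extents
    for i in range(0, m2, 2):
        for j in range(0, n2, 2):
            c00 = c01 = c10 = c11 = 0.0
            for p in range(kb):
                a0, a1 = Ab[i][p], Ab[i + 1][p]
                b0, b1 = Bb[p][j], Bb[p][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            C[i0 + i][j0 + j] += c00
            C[i0 + i][j0 + j + 1] += c01
            C[i0 + i + 1][j0 + j] += c10
            C[i0 + i + 1][j0 + j + 1] += c11
    # Edge cleanup: leftover last row and/or column when a dimension is odd.
    for i in range(mb):
        for j in range(nb):
            if i < m2 and j < n2:
                continue  # already handled by the 2x2 tiles
            s = 0.0
            for p in range(kb):
                s += Ab[i][p] * Bb[p][j]
            C[i0 + i][j0 + j] += s


def gemm_blocked(A, B, C, bs=4):
    """Cache blocking: C += A * B, computed one bs x bs block at a time."""
    m, k, n = len(A), len(B), len(B[0])
    for ii in range(0, m, bs):
        for pp in range(0, k, bs):
            Ab = pack_block(A, ii, pp, min(bs, m - ii), min(bs, k - pp))
            for jj in range(0, n, bs):
                Bb = pack_block(B, pp, jj, min(bs, k - pp), min(bs, n - jj))
                kernel_2x2(Ab, Bb, C, ii, jj)
    return C
```

In a real implementation the block size would be chosen so that the packed operands fit in the level 1 cache, and the packed layout would be arranged to match the kernel's access order; neither is modeled here.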

Context of citations to this paper:

.... of the level 3 BLAS stemmed from the observation that they can be cast in terms of optimized matrix-matrix multiplication [1, 47, 52]. The performance of the resulting libraries was comparable to that of the optimized, assembly coded, vendor supplied BLAS in many...

.... for FLAME is the idea proposed by Kågström, Ling and Van Loan to code level 3 BLAS in terms of optimized matrix-matrix multiplication [19, 17]. This work was based on a careful study of memory hierarchies and how best to construct an efficient implementation of the entire BLAS...
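
The idea quoted above, casting a level 3 BLAS operation in terms of matrix-matrix multiplication, can be sketched for a triangular solve of the DTRSM kind. The following is a minimal illustration under stated assumptions, not the library's code: it solves L X = B for a lower-triangular, non-unit L, and the names `gemm_sub`, `trsm_small`, `trsm_gemm_based`, and the block size `bs` are hypothetical. Only the small diagonal blocks are solved by substitution; all remaining work is a GEMM-style update.

```python
def gemm_sub(L, X, r0, r1, p0, p1):
    """GEMM-style update: X[r0:r1, :] -= L[r0:r1, p0:p1] * X[p0:p1, :]."""
    ncols = len(X[0])
    for i in range(r0, r1):
        for j in range(ncols):
            s = 0.0
            for p in range(p0, p1):
                s += L[i][p] * X[p][j]
            X[i][j] -= s


def trsm_small(L, X, p0, p1):
    """Unblocked forward substitution on the diagonal block L[p0:p1, p0:p1]."""
    ncols = len(X[0])
    for i in range(p0, p1):
        for j in range(ncols):
            s = X[i][j]
            for p in range(p0, i):
                s -= L[i][p] * X[p][j]
            X[i][j] = s / L[i][i]


def trsm_gemm_based(L, B, bs=2):
    """Solve L X = B for lower-triangular L; returns X, leaves B unchanged."""
    n = len(L)
    X = [row[:] for row in B]  # work on a copy of the right-hand side
    for p0 in range(0, n, bs):
        p1 = min(p0 + bs, n)
        trsm_small(L, X, p0, p1)           # small triangular solve
        if p1 < n:
            gemm_sub(L, X, p1, n, p0, p1)  # bulk of the flops: a GEMM update
    return X
```

As the block size shrinks relative to n, a growing share of the arithmetic lands in `gemm_sub`, which is why a fast GEMM tends to carry over to the other level 3 routines.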

Cited by:
Recursive Blocked Algorithms for Solving Triangular.. - Jonsson, Kågström (2001)
Recursive Blocked Algorithms for Solving Triangular.. - Jonsson, Kågström (2001)
On Reducing TLB Misses in Matrix Multiplication - Goto, Geijn (2002)

Active bibliography (related documents):
0.1:   Parallel and Fully Recursive Multifrontal Supernodal.. - Irony, Shklarski, Toledo (2002)
0.1:   Communication Lower Bounds for Distributed-Memory Matrix.. - Irony, Toledo
0.1:   Parallel and Fully Recursive Multifrontal Sparse Cholesky - Irony, Shklarski, Toledo (2002)

Similar documents based on text:
0.8:   High Performance Cholesky Factorization via Blocking and.. - Gustavson, Jonsson (2000)
0.7:   Parallel Triangular Sylvester-Type Matrix Equation.. - Jonsson, Kågström (2000)
0.5:   Recursive Blocked Data Formats and BLAS's for.. - Gustavson.. (1998)

Related documents from co-citation:
8:   Recursion leads to automatic variable blocking for dense linear algebra algorithms - Gustavson - 1997
6:   Recursive blocked data formats and BLAS's for dense linear algebra algorithms - Gustavson, Henriksson et al. - 1998
5:   LAPACK Users' Guide - Anderson, Bai et al. - 1995

BibTeX entry:

Fred Gustavson, André Henriksson, Isak Jonsson, Bo Kågström, and Per Ling. Superscalar GEMM-based Level 3 BLAS -- The On-going Evolution of a Portable and High-Performance Library. 1998. http://citeseer.ist.psu.edu/gustavson98superscalar.html

@inproceedings{gustavson98superscalar,
    author = "Fred G. Gustavson and Andr{\'e} Henriksson and Isak Jonsson and Bo K{\aa}gstr{\"o}m and Per Ling",
    title = "Superscalar {GEMM}-based Level 3 {BLAS} - The On-going Evolution of a Portable and High-Performance Library",
    booktitle = "{PARA}",
    pages = "207--215",
    year = "1998",
    url = "citeseer.ist.psu.edu/gustavson98superscalar.html" }
Citations (may not include all citations):
19   Improving performance of linear algebra algorithms for dense.. - Agarwal, Gustavson et al. - 1994
16   Exploiting functional parallelism of POWER2 to design high-p.. - Agarwal, Gustavson et al. - 1994





Documents on the same site (http://www.cs.umu.se/~isak/):
Recursive Blocked Data Formats and BLAS's for.. - Gustavson.. (1998)
High-Performance Matrix Multiplication on the IBM SP High Node - Henriksson, Jonsson (1998)
