Distributed General Matrix Multiply and Add for a 2D Mesh Processor Network (1995)
Bo Kågström, Mikael Rännar



View or download:
cs.umu.se/~mr/Rapp...kagstrom_rannar.ps

From:  netlib.org/utk/pa...sblasmeeting

Abstract: A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation GEMM, i.e., general matrix multiply and add, is presented. By "the same functionality" we mean the ability to perform GEMM operations on arbitrary subarrays of the matrices involved. The logical network is a 2D square mesh with torus connectivity. The matrices are distributed with a non-scattered blocked data distribution. The algorithm consists of two main parts, alignment and data...
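
To make the abstract's terms concrete, here is a minimal single-process sketch in Python. It is not the authors' algorithm: the function names `gemm_sub` and `block_owner`, the offset-tuple arguments, and the ceiling-based block size are all assumptions introduced for illustration, and the transpose options of the BLAS GEMM are left out. The first function applies the multiply-and-add update to subarrays chosen by offsets; the second maps a global matrix entry to the mesh coordinates of the processor that would own it under a contiguous (non-scattered) blocked layout on a p x p mesh.

```python
def gemm_sub(alpha, A, B, beta, C, a_off, b_off, c_off, m, n, k):
    """Update an m x n subarray of C in place:
    C_sub = alpha * A_sub * B_sub + beta * C_sub,
    where A_sub is m x k and B_sub is k x n.
    Offsets are (row, col) of each subarray's top-left entry (hypothetical API).
    """
    ai, aj = a_off
    bi, bj = b_off
    ci, cj = c_off
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for l in range(k):
                acc += A[ai + i][aj + l] * B[bi + l][bj + j]
            C[ci + i][cj + j] = alpha * acc + beta * C[ci + i][cj + j]
    return C


def block_owner(i, j, rows, cols, p):
    """Mesh coordinates (row, col) of the processor owning global entry (i, j)
    when a rows x cols matrix is split into contiguous blocks over a p x p mesh.
    Assumes block sizes of ceil(rows / p) by ceil(cols / p).
    """
    block_rows = -(-rows // p)  # ceiling division
    block_cols = -(-cols // p)
    return (i // block_rows, j // block_cols)
```

With this layout, subarrays of A, B and C that start at different offsets generally begin on different processors, which is one plausible reason a distributed version needs an alignment step before the local multiplications.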

Active bibliography (related documents):
0.2:   Communication Lower Bounds for Distributed-Memory Matrix.. - Irony, Toledo
0.2:   GEMM-Based Level 3 BLAS: High-Performance Model.. - Kågström, Ling, Van Loan (1997)
0.2:   Scalable BLAS 2 and 3 Matrix Multiplication for.. - Chrisochoides..

Similar documents based on text:
0.7:   Parallel Triangular Sylvester-Type Matrix Equation.. - Jonsson, Kågström (2000)
0.5:   Graduate Course in High-Performance Programming - Kågström (1998)
0.4:   Superscalar GEMM-based Level 3 BLAS - The On-going .. - Gustavson.. (1998)

BibTeX entry:

@misc{ gstr-distributed,
  author = "Bo Kågström and Mikael Rännar",
  title = "Distributed General Matrix Multiply and Add for a 2D Mesh Processor Network",
  url = "citeseer.ist.psu.edu/237592.html" }

Citations (may not include all citations):
394   Solving Problems on Concurrent Processors - Fox, Johnson et al. - 1988
387   A set of level 3 basic linear algebra subprograms - Dongarra, Du Croz et al. - 1990
155   Society for Industrial and Applied Mathematics - Anderson, Bai et al. - 1992
56   Parallel matrix and graph algorithms - Dekel, Nassimi et al. - 1981
40   A Users' Guide to PICL: A portable instrumented communicatio.. - Geist, Heath et al. - 1990
34   PUMMA: Parallel Universal Matrix Multiplication Algorithms o.. - Choi, Dongarra et al. - 1993
29   SUMMA: Scalable universal matrix multiplication algorithm - van de Geijn, Watts - 1995
24   Matrix multiplication on the Intel Touchstone Delta - Huss-Lederman, Jacobson et al. - 1994
12   Portable High Performance GEMM-Based Level 3 BLAS - Kågström, Ling et al. - 1993
11   Level 3 BLAS for distributed memory concurrent computers - Choi, Dongarra et al. - 1992
7   Oak Ridge National Laboratory - Guide, Manual et al. - 1993
6   High Performance GEMM-Based Level 3 BLAS: Sample Routines f.. - Kågström, Ling et al. - 1991
5   Efficient mapping and implementation of matrix algorithms on.. - Cherkassky, Smith - 1988
1   Portable and General GEMM Operation for a 2D Mesh Processor .. - Rännar, Distributed - 1995

Documents on the same site (http://www.netlib.org/utk/papers/sblas-meeting.html):
Using BLACS and MPI in ScaLAPACK - Whaley (1995)
GEMM-Based Level 3 BLAS: Installation, Tuning and Use of .. - Kågström, Ling, Van Loan (1995)
GEMM-Based Level 3 BLAS: High-Performance Model.. - Kågström, Ling, Van Loan (1997)
