
Tuning Strassen's Matrix Multiplication for Memory Efficiency (1998)  (28 citations)
Mithuna Thottethodi, Siddhartha Chatterjee, Alvin R. Lebeck



View or download:
duke.edu/~alvy/papers/sc98.ps
unc.edu/pub/users/sc/papers/sc98.ps

From:  swri.edu/sc98/TechPapers/...index
From:  unc.edu/Research/TUNE...TUNEpubs
Homepages:  A.Lebeck  


Abstract: Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expense of reduced locality of reference, which makes it challenging to implement the algorithm efficiently on a modern machine with a hierarchical memory system. We report on an implementation of this algorithm that uses several unconventional techniques to make the algorithm memory-friendly. First, the algorithm internally uses a non-standard array layout known as Morton order that is based on a...
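
Illustration (not from the paper): the abstract is cut off before it defines Morton order, so the sketch below assumes the common definition, also called Z-order, in which an element's storage offset is obtained by interleaving the bits of its row and column indices. The names morton_index and morton_layout are hypothetical, and nothing here reproduces the authors' implementation. It only shows why such a layout keeps each quadrant of a matrix contiguous in memory, which is the property a recursive, quadrant-based algorithm such as Strassen's can exploit.

```python
def morton_index(row, col, bits=16):
    """Interleave the low `bits` bits of row and col into one Z-order offset.

    Assumed convention: column bits land in even positions, row bits in odd
    positions. Other conventions swap the two.
    """
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)      # column bit b -> position 2b
        index |= ((row >> b) & 1) << (2 * b + 1)  # row bit b    -> position 2b + 1
    return index


def morton_layout(n):
    """Return an n x n table whose (row, col) entry is that element's offset."""
    return [[morton_index(r, c) for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    for table_row in morton_layout(4):
        print(table_row)
```

For a 4 x 4 matrix under this convention, the top-left 2 x 2 quadrant occupies offsets 0 to 3, the top-right 4 to 7, the bottom-left 8 to 11, and the bottom-right 12 to 15, so each quadrant is a single contiguous block.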

Cited by:
Understanding the Efficiency of GPU Algorithms for.. - Fatahalian.. (2004)
On Improving the Memory Access Patterns During the Execution .. - ElGindy, Ferizis
Compiler Support for Optimizing Tensor Contraction.. - Baumgartner, Cociorva, ..

Similar documents (at the sentence level):
68.1%:   Tuning Strassen's Matrix Multiplication for Memory.. - Thottethodi, Chatterjee.. (1998)
12.3%:   Recursive Array Layouts and Fast Matrix Multiplication - Chatterjee, Lebeck.. (1999)

Active bibliography (related documents):
0.9:   Architecture-efficient Strassen's Matrix.. - Pauca, Sun.. (1998)
0.5:   Efficient Parallel Solutions of Indexed Recurrences with.. - Ben-Asher, Haber (1997)
0.4:   A Tensor Product Formulation of Strassen's Matrix.. - Kumar, Huang, Sadayappan (1990)

Similar documents based on text:
0.4:   Nonlinear Array Layouts for Hierarchical Memory Systems - Chatterjee, Jain.. (1999)
0.4:   Recursive Array Layouts and Fast Parallel Matrix.. - Chatterjee, Lebeck.. (1999)
0.3:   Annotated Memory References: A Mechanism for Informed Cache .. - Alvin Lebeck David (1998)

Related documents from co-citation:
17:   FFTW: An adaptive software architecture for the FFT - Frigo, Johnson - 1998
16:   Automatically Tuned Linear Algebra Software - Whaley, Dongarra - 1997
14:   The Cache Performance and Optimizations of Blocked Algorithms - Lam, Rothberg et al. - 1991

BibTeX entry:

M. Thottethodi, S. Chatterjee, and A. R. Lebeck. Tuning Strassen's matrix multiplication for memory efficiency. In Proceedings of SC98 (CD-ROM), Orlando, FL, Nov. 1998. Available from http://www.supercomp.org/sc98. http://citeseer.ist.psu.edu/thottethodi98tuning.html

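@inproceedings{ thottethodi98tuning,
    author = "Mithuna S. Thottethodi and Siddhartha Chatterjee and Alvin R. Lebeck",
    title = "Tuning {Strassen}'s Matrix Multiplication for Memory Efficiency",
    booktitle = "Proceedings of SC98 (CD-ROM)",
    address = "Orlando, FL",
    month = nov,
    year = "1998",
    url = "citeseer.ist.psu.edu/thottethodi98tuning.html" }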

Citations (may not include all citations):
387   A set of level 3 basic linear algebra subprograms - Dongarra, Croz et al. - 1990
386   ATOM: a system for building customized program analysis tools - Srivastava, Eustace - 1994
376   The cache performance and optimizations of blocked algorithm.. - Lam, Rothberg et al. - 1991
234   Accuracy and Stability of Numerical Algorithms - Higham - 1996
211   LAPACK Users' Guide - Anderson, Bai et al. - 1995
175   Evaluating associativity in CPU caches - Hill, Smith - 1989
168   Gaussian elimination is not optimal - Strassen - 1969
124   Tile size selection using cache organization and data layout - Coleman, McKinley - 1995
109   Cache profiling and the SPEC benchmarks: A case study - Lebeck, Wood - 1994
88   Data-centric multi-level blocking - Kodukula, Ahmed et al. - 1997
83   Data transformations for eliminating conflict misses - Rivera, Tseng - 1998
77   Cache miss equations: An analytical representation of cache .. - Ghosh, Martonosi et al. - 1997
42   Auto-blocking matrix-multiplication or tracking BLAS3 perfor.. - Frens, Wise - 1997
33   Dynamic partitioning of non-uniform structured workloads wit.. - Pilkington, Baden - 1996
16   Extra high speed matrix multiplication on the Cray - Bailey - 1988
14   GEMMW: a portable level 3 BLAS Winograd variant of Strassen'.. - Douglas, Heroux et al. - 1994
10   Efficient procedures for using matrix algorithms - Fischer, Probert - 1974
6   A tensor product formulation of Strassen's matrix multiplica.. - Huang, Johnson et al. - 1990
5   Implementation of Strassen's algorithm for matrix multiplica.. - Huss-Lederman, Jacobson et al. - 1996
3   IBM engineering and scientific subroutine library guide and .. - IBM Corporation - 1992
2   Experiments with quadtree representation of matrices - Abdali, Wise - 1988
1   Architecture-efficient Strassen's matrix multiplication: A c.. - Pauca, Sun et al. - 1997
1   On memory requirements of Strassen's algorithms - Kreczmar - 1976





