
On Reducing TLB Misses in Matrix Multiplication (2002)
Kazushige Goto, Robert van de Geijn




Abstract: During the last decade, a number of projects have pursued the high-performance implementation of matrix multiplication. Typically, these projects organize the computation around an "inner kernel," C = AB + C, that keeps one of the operands in the L1 cache, while streaming parts of the other operands through that cache. Variants include approaches that extend this principle to multiple levels of cache or that apply the same principle to the L2 cache while essentially ignoring the L1...
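
The "inner kernel" organization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `inner_kernel` and `blocked_matmul` and the block sizes `mb` and `kb` are hypothetical, and pure Python does not model caches or the TLB. It only shows the loop structure in which one block of A is held fixed and reused while columns of B and C are streamed past it.

```python
def inner_kernel(A_blk, B_panel, C_panel):
    """Update C_panel += A_blk * B_panel in place.

    A_blk plays the role of the operand that stays resident (hypothetically,
    in the L1 cache); B_panel and C_panel are streamed one column at a time.
    """
    mb = len(A_blk)
    kb = len(A_blk[0])
    n = len(B_panel[0])
    for j in range(n):              # stream column j of B and C
        for i in range(mb):
            acc = C_panel[i][j]
            for p in range(kb):     # A_blk is reused for every column j
                acc += A_blk[i][p] * B_panel[p][j]
            C_panel[i][j] = acc


def blocked_matmul(A, B, C, mb=2, kb=2):
    """Compute C = A*B + C by calling inner_kernel on mb-by-kb blocks of A.

    mb and kb are illustrative block sizes, not tuned values.
    """
    m, k = len(A), len(B)
    for i0 in range(0, m, mb):
        for p0 in range(0, k, kb):
            # Copy the block of A, mimicking packing into contiguous storage.
            A_blk = [row[p0:p0 + kb] for row in A[i0:i0 + mb]]
            B_panel = B[p0:p0 + kb]     # rows of B matching the block's columns
            C_panel = C[i0:i0 + mb]     # same row objects as C, updated in place
            inner_kernel(A_blk, B_panel, C_panel)
    return C
```

In a real implementation the block sizes would be chosen from the cache and TLB parameters of the target machine; here they are arbitrary small values so the structure is easy to follow.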

BibTeX entry:

K. Goto and R. van de Geijn. On reducing TLB misses in matrix multiplication. Technical Report TR-2002-55, The University of Texas at Austin, Department of Computer Sciences, 2002. FLAME Working Note #9. http://citeseer.ist.psu.edu/goto02reducing.html

@misc{ goto02reducing,
  author = "K. Goto and R. van de Geijn",
  title = "On reducing TLB misses in matrix multiplication",
  text = "K. Goto and R. van de Geijn. On reducing TLB misses in matrix multiplication.
    Technical Report TR-2002-55, The University of Texas at Austin, Department
    of Computer Sciences, 2002. FLAME Working Note #9.",
  year = "2002",
  url = "citeseer.ist.psu.edu/goto02reducing.html" }
