Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply (2004)
Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, Katherine A. Yelick



View or download:
berkeley.edu/~rajeshn...csd041335.pdf

From:  berkeley.edu/~rajeshn/

Abstract: ... sparse matrix-vector multiply (SpMV), or y = y + Ax, which is an important and ubiquitous computational kernel. Prior work indicates that cache blocking of SpMV is extremely important for some matrix and machine combinations, with speedups as high as 3x. In this paper we present a new, more compact data structure for cache blocking for SpMV and look at the general question of when and why performance improves. Cache blocking appears to be most effective when simultaneously 1) the vector x does not fit in cache, 2) the...
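
To make the kernel in the abstract concrete, the following is a minimal sketch of y = y + Ax on a compressed sparse row (CSR) matrix, together with a naive cache-blocked variant that stores each rectangular block of A as its own small CSR matrix. It is not the compact data structure the paper proposes, which the abstract does not describe. The names to_csr, spmv_csr, split_cache_blocks, and spmv_cache_blocked, and the block sizes, are assumptions made only for this illustration.

```python
from typing import List, Tuple

# Illustrative CSR layout (an assumption, not the paper's format):
# (row_ptr, col_idx, values)
CSR = Tuple[List[int], List[int], List[float]]


def to_csr(dense: List[List[float]]) -> CSR:
    """Convert a dense row-major matrix to CSR, keeping only nonzeros."""
    row_ptr, col_idx, values = [0], [], []
    for row in dense:
        for j, a in enumerate(row):
            if a != 0.0:
                col_idx.append(j)
                values.append(a)
        row_ptr.append(len(values))
    return row_ptr, col_idx, values


def spmv_csr(csr: CSR, x: List[float], y: List[float],
             row_offset: int = 0, col_offset: int = 0) -> None:
    """In-place y = y + A x; the offsets place a block inside the full x and y."""
    row_ptr, col_idx, values = csr
    for i in range(len(row_ptr) - 1):
        acc = y[row_offset + i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_offset + col_idx[k]]
        y[row_offset + i] = acc


def split_cache_blocks(dense: List[List[float]], block_rows: int, block_cols: int):
    """Cut the matrix into block_rows x block_cols pieces, each its own CSR matrix."""
    blocks = []
    for r0 in range(0, len(dense), block_rows):
        for c0 in range(0, len(dense[0]), block_cols):
            sub = [row[c0:c0 + block_cols] for row in dense[r0:r0 + block_rows]]
            blocks.append((r0, c0, to_csr(sub)))
    return blocks


def spmv_cache_blocked(blocks, x: List[float], y: List[float]) -> None:
    """Apply one block at a time, so each pass touches only a slice of x and y."""
    for r0, c0, csr in blocks:
        spmv_csr(csr, x, y, row_offset=r0, col_offset=c0)


A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 0.0],
     [4.0, 0.0, 0.0, 5.0]]
x = [1.0, 2.0, 3.0, 4.0]

y_plain = [1.0, 1.0, 1.0]
spmv_csr(to_csr(A), x, y_plain)

y_blocked = [1.0, 1.0, 1.0]
spmv_cache_blocked(split_cache_blocks(A, block_rows=2, block_cols=2), x, y_blocked)
```

Both calls compute the same result. The blocked version only changes the order in which nonzeros are visited, so that the part of x read during one block stays small. Whether that pays off depends on the matrix and the machine, which is the question the paper studies.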

Cited by:
When Cache Blocking of Sparse Matrix Vector Multiply.. - Nishtala, Vuduc..

Similar documents (at the sentence level):
8.0%:   Performance Modeling and Analysis of Cache Blocking.. - Nishtala, Vuduc.. (2004)
5.5%:   Performance Optimizations and Bounds for Sparse.. - Vuduc, Demmel.. (2002)

Active bibliography (related documents):
0.9:   Automatic Performance Tuning and Analysis of Sparse .. - Vuduc, Kamil, Hsu, .. (2002)
0.9:   Memory Hierarchy Optimizations and Performance Bounds .. - Vuduc, Gyulassy.. (2003)
0.5:   Memory Hierarchy Optimizations and Performance Bounds.. - Vuduc, Gyulassy.. (2003)

Similar documents based on text:
0.4:   Automatic Performance Tuning of Sparse Matrix Kernels - Vuduc (2003)
0.3:   Modeling the Benefits of Mixed Data and Task Parallelism - Chakrabarti, Demmel, Yelick (1995)
0.2:   Sparse Gaussian Elimination on High Performance Computers - Li (1996)

BibTeX entry:

R. Nishtala, R. W. Vuduc, J. W. Demmel, and K. A. Yelick. Performance modeling and analysis of cache blocking in sparse matrix vector multiply. Technical report, University of California, Berkeley, EECS Dept., 2004. (to appear). http://citeseer.ist.psu.edu/article/nishtala04performance.html

@misc{ nishtala04performance,
  author = "R. Nishtala and R. Vuduc and J. Demmel and K. Yelick",
  title = "Performance modeling and analysis of cache blocking in sparse matrix vector
    multiply",
  text = "R. Nishtala, R. W. Vuduc, J. W. Demmel, and K. A. Yelick. Performance modeling
    and analysis of cache blocking in sparse matrix vector multiply. Technical
    report, University of California, Berkeley, EECS Dept., 2004. (to appear).",
  year = "2004",
  url = "citeseer.ist.psu.edu/article/nishtala04performance.html" }
Citations (may not include all citations):
474   A data locality optimizing algorithm - Wolf, Lam - 1991
162   Improving data locality with loop transformations - McKinley, Carr et al. - 1996
157   Automatically tuned linear algebra software - Whaley, Dongarra - 1998
84   Compiler blockability of numerical algorithms - Carr, Kennedy - 1992
58   Cache miss equations: a compiler framework for analyzing and.. - Ghosh, Martonosi et al. - 1999
33   Exact analysis of the cache behavior of nested loops - Chatterjee, Parker et al. - 2001
27   Characterizing the behavior of sparse algorithms on caches - Temam, Jalby - 1992
23   Optimizing the performance of sparse matrix-vector multiplic.. - Im - 2000
22   A scalable cross-platform infrastructure for application per.. - Browne, Dongarra et al. - 2000
15   Optimizing sparse matrix computations for register reuse in .. - Im, Yelick - 2001
13   A Relational Approach to the Automatic Generation of Sequent.. - Stodghill - 1997
13   CPU Performance Evaluation and Execution Time Prediction Usi.. - Saavedra-Barrera - 1992
12   Automatic nonzero structure analysis - Bik, Wijshoff - 1999
10   Performance optimizations and bounds for sparse matrix-vecto.. - Vuduc, Demmel et al. - 2002
10   Memory hierarchy performance prediction for sparse blocked a.. - Fraguela, Doallo et al. - 1999
9   Modeling and improving locality for irregular problems: spar.. - Heras, Perez et al. - 1999
8   STREAM: Measuring sustainable memory bandwidth in high perfo.. - McCalpin - 1995
7   Towards realistic bounds for implicit CFD codes - Gropp, Kaushik et al. - 1999
6   Modeling application performance by convolving machine signa.. - Snavely, Carrington et al. - 2001
6   Document for the Basic Linear Algebra Subprograms - Blackford, Corliss et al. - 2001
5   Fracture mechanics on the Intel Itanium architecture: A case.. - Heber, Dolgert et al. - 2001
4   Automatic performance tuning of sparse matrix kernels - Vuduc - 2003
4   On reducing TLB misses in matrix multiplication - Goto, Geijn - 2002
3   ...efficient code for sparse matrix computations - Pugh, Shpeisman - 1998

Documents on the same site (http://www.cs.berkeley.edu/~rajeshn/):
Firehose: An Algorithm for Distributed - On
When Cache Blocking of Sparse Matrix Vector Multiply.. - Nishtala, Vuduc..
UPC Implementation of the Sparse Triangular Solve and NAS FT - Bell, Nishtala (2004)
