Abstract: ... sparse matrix-vector multiply (SpMV), or y = y + A x, which is an important and
ubiquitous computational kernel. Prior work indicates that cache blocking of SpMV is extremely
important for some matrix and machine combinations, with speedups as high as 3x. In this paper
we present a new, more compact data structure for cache blocking for SpMV and look at the general
question of when and why performance improves. Cache blocking appears to be most effective when
simultaneously (1) the vector x does not fit in cache, (2) the...
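To make the kernel in the abstract concrete, here is a minimal sketch of y = y + A x for a matrix in compressed sparse row (CSR) format, followed by a simple column-blocked variant. The function names, the CSR layout, and the column-only blocking scheme are assumptions chosen for illustration; this is not the compact data structure the paper proposes, which the truncated abstract does not describe.

```python
# Illustrative sketch only. Names and blocking scheme are assumptions,
# not the data structure proposed in the paper.

def spmv_csr(row_ptr, col_idx, vals, x, y):
    """Compute y = y + A*x in place, with A stored in CSR format."""
    for i in range(len(row_ptr) - 1):
        acc = y[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def to_column_blocks(row_ptr, col_idx, vals, n_cols, block_cols):
    """Split a CSR matrix into column blocks, each stored as its own CSR matrix.

    Column indices inside a block are relative to the block's first column,
    so a block only reads a contiguous slice of x of length <= block_cols.
    """
    n_rows = len(row_ptr) - 1
    blocks = []
    for c0 in range(0, n_cols, block_cols):
        c1 = min(c0 + block_cols, n_cols)
        b_ptr, b_idx, b_val = [0], [], []
        for i in range(n_rows):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                if c0 <= col_idx[k] < c1:
                    b_idx.append(col_idx[k] - c0)
                    b_val.append(vals[k])
            b_ptr.append(len(b_idx))
        blocks.append((c0, c1, b_ptr, b_idx, b_val))
    return blocks


def spmv_cache_blocked(blocks, x, y):
    """Apply y = y + A*x one column block at a time."""
    for c0, c1, b_ptr, b_idx, b_val in blocks:
        # Only x[c0:c1] is read while this block is processed.
        spmv_csr(b_ptr, b_idx, b_val, x[c0:c1], y)
    return y
```

Both routines produce the same y. The blocked version only changes the order in which nonzeros are visited, so that each pass reuses a slice of x small enough (by assumption) to stay in cache. In Python the slice `x[c0:c1]` copies data, so this shows the access pattern rather than a performance gain.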
Cited by:
When Cache Blocking of Sparse Matrix Vector Multiply.. - Nishtala, Vuduc..
Similar documents (at the sentence level):
8.0%: Performance Modeling and Analysis of Cache Blocking.. - Nishtala, Vuduc.. (2004)
5.5%: Performance Optimizations and Bounds for Sparse.. - Vuduc, Demmel.. (2002)
Active bibliography (related documents):
0.9: Automatic Performance Tuning and Analysis of Sparse .. - Vuduc, Kamil, Hsu, .. (2002)
0.9: Memory Hierarchy Optimizations and Performance Bounds .. - Vuduc, Gyulassy.. (2003)
0.5: Memory Hierarchy Optimizations and Performance Bounds.. - Vuduc, Gyulassy.. (2003)
BibTeX entry:
R. Nishtala, R. W. Vuduc, J. W. Demmel, and K. A. Yelick. Performance modeling and analysis of cache blocking in sparse matrix vector multiply. Technical report, University of California, Berkeley, EECS Dept., 2004. (to appear). http://citeseer.ist.psu.edu/nishtala04performance.html
@misc{ nishtala04performance,
author = "R. Nishtala and R. Vuduc and J. Demmel and K. Yelick",
title = "Performance modeling and analysis of cache blocking in sparse matrix vector
multiply",
text = "R. Nishtala, R. W. Vuduc, J. W. Demmel, and K. A. Yelick. Performance modeling
and analysis of cache blocking in sparse matrix vector multiply. Technical
report, University of California, Berkeley, EECS Dept., 2004. (to appear).",
year = "2004",
url = "citeseer.ist.psu.edu/nishtala04performance.html" }
Citations (may not include all citations):
474  A data locality optimizing algorithm - Wolf, Lam - 1991
162  Improving data locality with loop transformations - McKinley, Carr et al. - 1996
157  Automatically tuned linear algebra software - Whaley, Dongarra - 1998
84  Compiler blockability of numerical algorithms - Carr, Kennedy - 1992
58  Cache miss equations: a compiler framework for analyzing and.. - Ghosh, Martonosi et al. - 1999
33  Exact analysis of the cache behavior of nested loops - Chatterjee, Parker et al. - 2001
27  Characterizing the behavior of sparse algorithms on caches - Temam, Jalby - 1992
23  Optimizing the performance of sparse matrix-vector multiplic.. - Im - 2000
22  A scalable cross-platform infrastructure for application per.. - Browne, Dongarra et al. - 2000
15  Optimizing sparse matrix computations for register reuse in .. - Im, Yelick - 2001
13  A Relational Approach to the Automatic Generation of Sequent.. - Stodghill - 1997
13  CPU Performance Evaluation and Execution Time Prediction Usi.. - Saavedra-Barrera - 1992
12  Automatic nonzero structure analysis - Bik, Wijshoff - 1999
10  Performance optimizations and bounds for sparse matrix-vecto.. - Vuduc, Demmel et al. - 2002
10  Memory hierarchy performance prediction for sparse blocked a.. - Fraguela, Doallo et al. - 1999
9  Modeling and improving locality for irregular problems: spar.. - Heras, Perez et al. - 1999
8  STREAM: Measuring sustainable memory bandwidth in high perfo.. - McCalpin - 1995
7  Towards realistic bounds for implicit CFD codes - Gropp, Kaushik et al. - 1999
6  Modeling application performance by convolving machine signa.. - Snavely, Carrington et al. - 2001
6  Document for the Basic Linear Algebra Subprograms - Blackford, Corliss et al. - 2001
5  Fracture mechanics on the Intel Itanium architecture: A case.. - Heber, Dolgert et al. - 2001
4  Automatic performance tuning of sparse matrix kernels - Vuduc - 2003
4  On reducing TLB misses in matrix multiplication - Goto, Geijn - 2002
3  ..efficient code for sparse matrix computations - Pugh, Shpeisman - 1998