Abstract: Sparse matrix-vector multiplication is an important kernel that often runs inefficiently on
superscalar RISC processors. This paper describes techniques that increase instruction-level
parallelism and improve performance. The techniques include reordering to reduce cache misses
(a technique originally due to Das et al.), blocking to reduce load instructions, and prefetching to prevent
multiple load-store units from stalling simultaneously. The techniques improve performance
from about 40 Mflops (on a...
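
To illustrate the blocking idea named in the abstract, here is a minimal sketch; it is not the paper's implementation. It assumes a compressed sparse row (CSR) layout for the baseline and a hypothetical 1x2 blocked layout for the variant. The function names `spmv_csr` and `spmv_blocked_1x2` and all array names are invented for this example. In the blocked variant, one column-index load covers two multiply-adds, at the cost of storing explicit zeros where a block is only partly filled.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # One index load (col_idx[k]) and one indirect load of x per nonzero.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def spmv_blocked_1x2(row_ptr, blk_col, blk_vals, x):
    """Compute y = A @ x with the nonzeros of each row grouped into 1x2 blocks.

    Assumed layout (hypothetical, for illustration only):
      row_ptr[i]..row_ptr[i+1] indexes the blocks of row i,
      blk_col[k] is the column of the first entry of block k,
      blk_vals[2*k] and blk_vals[2*k + 1] are the two values of block k,
      with an explicit zero padding any block that is only half full.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = blk_col[k]  # a single index load serves two multiply-adds
            acc += blk_vals[2 * k] * x[j] + blk_vals[2 * k + 1] * x[j + 1]
        y[i] = acc
    return y


# A small 3x4 example matrix:
#   [1 2 0 0]
#   [0 0 3 4]
#   [5 0 0 6]
x = [1.0, 2.0, 3.0, 4.0]

y_csr = spmv_csr(
    row_ptr=[0, 2, 4, 6],
    col_idx=[0, 1, 2, 3, 0, 3],
    values=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    x=x,
)

y_blocked = spmv_blocked_1x2(
    row_ptr=[0, 1, 2, 4],
    blk_col=[0, 2, 0, 2],
    blk_vals=[1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0, 6.0],  # last row is zero-padded
    x=x,
)
```

Both calls produce the same product vector. The blocked version reads four column indices instead of six, and the third row shows the trade-off: two padding zeros are stored and multiplied in exchange for fewer index loads.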
Cited by:
Compact Data Structures with Fast Queries - Blandford (2005)
Parallel Multigrid Algorithms for Unstructured 3D Large.. - Adams (1999)
Heuristics for the Automatic Construction of Coarse Grids in Multigrid Solvers For..
Similar documents (at the sentence level):
70.7%: Improving Memory-System Performance of Sparse Matrix-Vector.. - Sivan Toledo (1997)
Active bibliography (related documents):
0.2: Improving Performance of Sparse Matrix-Vector Multiplication - Pinar, Heath (1999)
0.1: Optimizing Sparse Matrix-Vector Product Computations.. - Mellor-Crummey, Garvin (2003)
0.1: Self-Avoiding Walks Over Adaptive Unstructured Grids - Heber, Biswas, Gao (1999)
Similar documents based on text:
0.2: A Power Efficient Embedded High Performance Computer for.. - Puschak et al.
0.2: The Design and Implementation of SOLAR, a Portable Library.. - Toledo, Gustavson (1999)
0.2: Performance Of The Shallow Water Equations On The Suprenum-1.. - McBryan
Related documents from co-citation:
8: way partitioning scheme for irregular graphs - Karypis, Kumar et al. - 1996
6: Characterizing the Behavior of Sparse Algorithms on Caches - Temam, Jalby - 1992
6: Optimizing sparse matrix-vector multiplication on SMPs - Im, Yelick - 1999
BibTeX entry:
Sivan Toledo. Improving memory-system performance of sparse matrix-vector multiplication. In Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing, March 1997. http://citeseer.ist.psu.edu/toledo97improving.html
@article{ toledo97improving,
author = "S. Toledo",
title = "Improving the memory-system performance of sparse-matrix vector multiplication",
journal = "IBM Journal of Research and Development",
volume = "41",
number = "6",
pages = "711--725",
year = "1997",
url = "citeseer.ist.psu.edu/toledo97improving.html" }
Citations (may not include all citations):
107: Software Methods for Improvement of Cache Performance on Sup.. - Porterfield - 1989
79: The effect of ordering on preconditioned conjugate gradient - Duff, Meurant - 1989
70: The design and implementation of a parallel unstructured Eul.. - Das, Mavriplis et al. - 1994
61: Argonne National Laboratory - Balay, Gropp et al. - 1996
27: Characterizing the behavior of sparse algorithms on caches - Temam, Jalby - 1992
19: Improving performance of linear algebra algorithms for dense.. - Agarwal, Gustavson et al. - 1994
11: A high performance algorithm using pre-processing for sparse.. - Agarwal, Gustavson et al. - 1992
10: Renumbering unstructured grids to improve the performance of.. - Burgess, Giles - 1995
Documents on the same site (http://www.math.tau.ac.il/~sivan/pubs.html):
Performance Prediction with Benchmaps - Toledo
The Design and Implementation of SOLAR, a Portable Library for .. - Sivan Toledo (1996)
PERFSIM: A Tool for Automatic Performance Analysis of.. - Sivan Toledo (1995)