Caching-efficient multithreaded fast multiplication of sparse matrices (1998) [2 citations — 0 self]
Abstract:
Several fast sequential algorithms have been proposed in the past to multiply sparse matrices. These algorithms do not explicitly address the impact of caching on performance. We show that a rather simple sequential cache-efficient algorithm provides significantly better performance than existing algorithms for sparse matrix multiplication. We then describe a multithreaded implementation of this simple algorithm and show that its performance scales well with the number of threads and CPUs. For 10% sparse, 500 × 500 matrices, the multithreaded version running on 4-CPU systems provides more than a 41.1-fold speed increase over the well-known BLAS routine, and 14.6-fold and 44.6-fold speed increases over two other recent techniques for fast sparse matrix multiplication, both of which are relatively difficult to parallelize efficiently.
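The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of one common way to obtain the properties it describes, not the authors' method. It assumes a compressed sparse row (CSR) layout and a row-wise product in which each stored row of B is scanned sequentially, and it splits the rows of C across threads. All names (`to_csr`, `multiply_rows`, `spmm_threaded`) are hypothetical. In CPython the global interpreter lock prevents real speedup from threads, so the thread pool here only illustrates that the rows of C can be computed independently.

```python
from concurrent.futures import ThreadPoolExecutor


def to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def multiply_rows(a, b, n_cols, rows):
    """Compute the requested rows of C = A * B, with A and B given in CSR form."""
    a_val, a_col, a_ptr = a
    b_val, b_col, b_ptr = b
    out = []
    for i in rows:
        acc = [0] * n_cols  # dense accumulator for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_col[p], a_val[p]
            # Row k of B is stored contiguously, so this loop reads memory in order.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                acc[b_col[q]] += a_ik * b_val[q]
        out.append((i, acc))
    return out


def spmm_threaded(a_dense, b_dense, n_threads=4):
    """Multiply two matrices, assigning rows of the result to threads round-robin."""
    a, b = to_csr(a_dense), to_csr(b_dense)
    n_rows, n_cols = len(a_dense), len(b_dense[0])
    # Each thread owns a disjoint set of output rows, so no locking is needed.
    chunks = [range(t, n_rows, n_threads) for t in range(n_threads)]
    c = [None] * n_rows
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for part in pool.map(lambda rows: multiply_rows(a, b, n_cols, rows), chunks):
            for i, row in part:
                c[i] = row
    return c
```

Because every output row is written by exactly one thread, the work partitions without synchronization, which is one plausible reason a row-oriented scheme would be easier to parallelize than the alternatives the abstract mentions.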
Citations
| 138 | Cache profiling and the SPEC benchmarks: A case study – Lebeck, Wood - 1994 |
| 99 | The design and implementation of a parallel unstructured Euler solver using software primitives – Das, Mavriplis, et al. - 1994 |
| 44 | Improving memory-system performance of sparse matrix-vector multiplication – Toledo - 1997 |
| 23 | A set of Level 3 – Dongarra, DuCroz, et al. - 1990 |
| 12 | Two fast algorithms for sparse matrices: Multiplication and permuted transposition – Gustavson - 1978 |
| 2 | Fast sparse matrix multiplication – Park, Draayer, et al. - 1992 |
| 1 | Algorithms for Large Symmetric – Lanczos - 1985 |
| 1 | Data Structures for Compact Sparse Matrices Representation – Felice, Agnifili, et al. - 1989 |

