Deliverable HwA5b: Multilevel Blocking and Prefetching for Linear Algebra Computations

Responsibles (UPC): Elena Garcia, Josep L. Larriba-Pey, Jose R. Herrero, Toni Juan, Juan J. Navarro, Tomas Lang
ftp://ftp.wi.leidenuniv.nl/pub/APPARC/DELIVERABLES/HwA5b.ps.gz

Abstract:

In a previous work [Nava94] it was shown that the performance of linear algebra computations, which access large amounts of data, depends on the behavior of the memory hierarchy. This research aims to use the multilevel orthogonal blocking approach in conjunction with other software techniques to further improve the performance of linear algebra computations. The work is divided into two parts. In Part I, blocking techniques are applied to the sparse matrix computations that appear in many linear algebra kernels of scientific applications. Combining several software techniques (loop unrolling, software pipelining) with blocking in the sparse-matrix-by-dense-matrix multiplication introduces a very large search space. In Part II, the performance of dense matrix-by-matrix multiplication executed on a superscalar high-performance workstation is improved by using binding and nonbinding prefetching to hide memory latency, together with the well-known technique of blocking.
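To make the Part I combination of techniques concrete, here is a minimal sketch of a sparse-matrix-by-dense-matrix product that applies one level of blocking plus loop unrolling. It assumes the sparse matrix is stored in compressed sparse row (CSR) format and that the dense matrices are row-major; the function name `spmm_csr_blocked`, the strip width `bs`, and the unroll factor of 4 are hypothetical choices for illustration, not the deliverable's actual code or its multilevel orthogonal blocking scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse matrix A (m rows) in CSR format (an assumption of this sketch). */
typedef struct {
    int m;                 /* number of rows */
    const int *row_ptr;    /* row i occupies nonzeros [row_ptr[i], row_ptr[i+1]) */
    const int *col_idx;    /* column index of each nonzero */
    const double *val;     /* value of each nonzero */
} csr_matrix;

/* C (m x n) += A (CSR, m x k) * B (k x n); B and C are row-major.
   Columns are processed in strips of width bs (hypothetical tuning
   parameter) so the active strip of B can stay cached while it is
   reused by every row of A. The innermost loop is unrolled by 4,
   with a scalar loop for the leftover columns. */
void spmm_csr_blocked(const csr_matrix *A, const double *B, double *C,
                      int n, int bs)
{
    assert(bs > 0);
    for (int jj = 0; jj < n; jj += bs) {
        int jend = (jj + bs < n) ? jj + bs : n;
        for (int i = 0; i < A->m; i++) {
            double *c_row = C + (size_t)i * n;
            for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; p++) {
                const double a = A->val[p];  /* nonzero kept in a register */
                const double *b_row = B + (size_t)A->col_idx[p] * n;
                int j = jj;
                /* Unrolled by 4: fewer loop-control instructions and
                   more independent operations per iteration. */
                for (; j + 3 < jend; j += 4) {
                    c_row[j]     += a * b_row[j];
                    c_row[j + 1] += a * b_row[j + 1];
                    c_row[j + 2] += a * b_row[j + 2];
                    c_row[j + 3] += a * b_row[j + 3];
                }
                for (; j < jend; j++)        /* remainder columns */
                    c_row[j] += a * b_row[j];
            }
        }
    }
}
```

The strip width and unroll factor are exactly the kind of parameters whose combinations make the search space large; software pipelining of the inner loop is not shown here.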
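For Part II, the sketch below shows a cache-blocked dense matrix multiplication with a nonbinding software prefetch. It assumes a GCC- or Clang-compatible compiler, since `__builtin_prefetch` is a compiler builtin: it is only a hint, does not fault, and does not change results. The function name `dgemm_blocked_prefetch`, the tile size `bs`, and the choice to prefetch the start of the next row segment of B are hypothetical illustrations, not the deliverable's tuned scheme; binding prefetching (loading values into registers well ahead of use) is not sketched.

```c
#include <assert.h>
#include <stddef.h>

/* C (n x n) += A * B, all row-major, tiled with square tiles of size bs
   (hypothetical tuning parameter). Inside a tile, the start of the B row
   segment needed by the next k iteration is prefetched. */
void dgemm_blocked_prefetch(int n, const double *A, const double *B,
                            double *C, int bs)
{
    assert(bs > 0);
    for (int ii = 0; ii < n; ii += bs)
        for (int kk = 0; kk < n; kk += bs)
            for (int jj = 0; jj < n; jj += bs) {
                int iend = (ii + bs < n) ? ii + bs : n;
                int kend = (kk + bs < n) ? kk + bs : n;
                int jend = (jj + bs < n) ? jj + bs : n;
                for (int i = ii; i < iend; i++) {
                    double *c_row = C + (size_t)i * n;
                    for (int k = kk; k < kend; k++) {
                        /* A[i][k] is loaded once and kept in a register
                           for the whole j loop. */
                        const double a = A[(size_t)i * n + k];
                        const double *b_row = B + (size_t)k * n;
                        /* Nonbinding prefetch: rw = 0 (read),
                           locality = 3 (keep in all cache levels).
                           Guarded so the address stays inside B. */
                        if (k + 1 < kend)
                            __builtin_prefetch(B + (size_t)(k + 1) * n + jj, 0, 3);
                        for (int j = jj; j < jend; j++)
                            c_row[j] += a * b_row[j];
                    }
                }
            }
}
```

Because the prefetch only moves data into the cache and never into a register, a wrong or useless prefetch can cost bandwidth but cannot change the computed result; the prefetch distance and tile size would need tuning for a given memory hierarchy.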

Citations

Lam, Rothberg, et al. (1991). The cache performance and optimizations of blocked algorithms.
Mowry, Lam, et al. (1992). Design and evaluation of a compiler algorithm for prefetching.
Mowry, Gupta (1991). Tolerating latency through software-controlled prefetching in shared-memory multiprocessors.
Aiken, Nicolau (1988). Optimal loop parallelization.
Farkas, Jouppi (1994). Complexity/performance tradeoffs with non-blocking loads.
Gallivan, Plemmons, et al. (1990). Parallel algorithms for dense linear algebra computations.
Rau, Schlansker, et al. (1992). Code generation schema for modulo scheduled loops.
Dongarra, Gustafson, et al. (1984). Implementing linear algebra algorithms for dense matrices on a vector pipeline machine.
Abu-Sufah (1979). Improving the performance of virtual memory computers.
Navarro, Juan, et al. (1994). MOB forms: a class of multilevel block algorithms for dense linear algebra operations.
Agarwal, Gustavson, et al. (1994). Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch. IBM.
Dongarra, Hinds (1979). Unrolling loops.
Wijshoff (1989). Implementing sparse BLAS primitives on concurrent/vector processors: a case study.
Lam (1988). Software pipelining: an effective technique for VLIW machines.
Pissanetzky (1984). Sparse matrix.
Erhel (1990). Sparse matrix multiplication on vector computers.
Temam, Jalby (1992). Characterizing the behaviour of sparse algorithms on caches.
Dutton et al. (1992). The design of the DEC 3000.
Navarro et al. (1994). Deliverable HwA4: Memory organization and management for linear algebra computations. Deliverable of the APPARC Basic Research Action.
Callahan, Coke, et al. (1990). Improving register allocation for subscripted variables.