by Elena Garcia, Josep L. Larriba-Pey, Jose R. Herrero, Toni Juan, Juan J. Navarro, Tomas Lang (UPC)
Download: ftp://ftp.wi.leidenuniv.nl/pub/APPARC/DELIVERABLES/HwA5b.ps.gz
Abstract:
Previous work [Nava94] showed that the performance of linear algebra computations, which access large amounts of data, depends on the behavior of the memory hierarchy. This research aims to use the multilevel orthogonal blocking approach in conjunction with other software techniques to further improve the performance of linear algebra computations. The work is divided into two parts. In Part I, blocking techniques are applied to improve sparse matrix computations that appear in many linear algebra kernels of scientific applications. Combining several software techniques (loop unrolling, software pipelining) with blocking for the sparse matrix by dense matrix multiplication introduces a very large search space. In Part II, the performance of dense matrix by matrix multiplication executed on a superscalar high-performance workstation is improved by using binding and nonbinding prefetching to hide memory latency, together with the well-known technique of blocking.
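To make the blocking idea in the abstract concrete, here is a minimal sketch of a single-level blocked (tiled) dense matrix multiplication. It is not the authors' multilevel orthogonal blocking scheme and it omits prefetching, loop unrolling, and software pipelining. The function name `blocked_matmul`, the `block` parameter, and the list-of-lists matrix layout are assumptions made for illustration, and the matrices are assumed to be non-empty and rectangular.

```python
def blocked_matmul(a, b, block=2):
    """Return C = A * B, iterating over block x block tiles.

    Hypothetical illustration: a is n x k, b is k x m, both lists of rows.
    Tiling limits the working set touched by the inner loops, which is
    the property that blocking exploits in a memory hierarchy.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; min() handles sizes that are not
    # multiples of the block size.
    for ii in range(0, n, block):
        for kk in range(0, k, block):
            for jj in range(0, m, block):
                # Inner loops compute the contribution of one tile pair.
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for p in range(kk, min(kk + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(jj, min(jj + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


def naive_matmul(a, b):
    """Reference triple loop without blocking, used only for comparison."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]
```

Both functions produce the same product; only the order in which elements are visited differs. In a compiled language the block size would typically be tuned to the cache or register level being targeted.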
Citations
487 | The cache performance and optimizations of blocked algorithms – Lam, Rothberg, et al. – 1991
455 | Design and evaluation of a compiler algorithm for prefetching – Mowry, Lam, et al. – 1992
264 | Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors – Mowry, Gupta – 1991
105 | Optimal loop parallelization – Aiken, Nicolau – 1988
83 | Complexity/performance tradeoffs with non-blocking loads – Farkas, Jouppi – 1994
77 | Parallel algorithms for dense linear algebra computations – Gallivan, Plemmons, et al. – 1990
65 | Code generation schema for modulo scheduled loops – Rau, Schlansker, et al. – 1992
49 | Implementing linear algebra algorithms for dense matrices on a vector pipeline machine – Dongarra, Gustafson, et al. – 1984
45 | Improving the Performance of Virtual Memory Computers – Abu-Sufah – 1979
27 | MOB forms: a class of multilevel block algorithms for dense linear algebra operations – Navarro, Juan, et al. – 1994
20 | Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch. IBM – Agarwal, Gustavson, et al. – 1994
17 | Unrolling loops – Dongarra, Hinds – 1979
13 | Implementing sparse BLAS primitives on concurrent/vector processors: A case study – Wijshoff – 1989
12 | Software Pipelining: An Effective Technique for VLIW Machines – Lam – 1988
12 | Sparse Matrix – Pissanetzky – 1984
10 | Sparse matrix multiplication on vector computers – Erhel – 1990
10 | Characterizing the Behaviour of Sparse Algorithms on Caches – Temam, Jalby – 1992
7 | The Design of the DEC 3000 – Dutton, et al. – 1992
3 | Deliverable HwA4: Memory Organization and Management for Linear Algebra Computations. Deliverable of the APPARC Basic Research Action – Navarro, et al. – 1994
2 | Improving register allocation for subscripted variables – Callahan, Coke, et al. – 1990