by Hwansoo Han, Chau-wen Tseng
http://www.cs.umd.edu/projects/cosmic/papers/pldi01-submit.ps
Abstract:
Irregular scientific codes experience poor cache performance due to their memory access patterns. We examine several data and computation locality transformations, including GPART, a new technique based on hierarchical clustering. GPART constructs quality partitions quickly by clustering multiple neighboring nodes in a few passes, with priority given to nodes of high degree. Overhead is kept low by considering only edges between partitions. We develop compiler analyses and transformations in SUIF to apply locality transformations automatically, and propose user annotations to locate the coordinate information needed by geometric partitioning algorithms. We experimentally evaluate locality optimizations for both static codes and adaptive codes, in which connection patterns change dynamically at intervals during program execution. We derive a simple cost model to guide locality optimizations when access patterns change. Experiments on several irregular scientific codes show that locality optimization techniques are effective, improving performance by 30-50% on average. In particular, GPART closely matches the performance of more sophisticated partitioning algorithms at one third of the overhead.
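The clustering idea summarized in the abstract can be illustrated with a minimal sketch. This is not the GPART algorithm as published: the function name `cluster_order`, the `passes` and `growth` parameters, the per-pass size limit, and the tie-breaking rules are all assumptions made for illustration. The sketch keeps only three properties stated above: clusters absorb several neighbors per pass, high-degree clusters are visited first, and each pass looks only at edges that cross cluster boundaries.

```python
from collections import defaultdict


def cluster_order(num_nodes, edges, passes=3, growth=4):
    """Return a node ordering in which clustered nodes are contiguous.

    Illustrative only: parameters and the size limit are assumptions,
    not values taken from the paper.
    """
    cluster_of = list(range(num_nodes))           # node -> cluster id
    members = {v: [v] for v in range(num_nodes)}  # cluster id -> its nodes
    limit = growth                                # max cluster size this pass

    for _ in range(passes):
        # Count only edges that cross cluster boundaries.
        weight = defaultdict(lambda: defaultdict(int))
        for u, v in edges:
            cu, cv = cluster_of[u], cluster_of[v]
            if cu != cv:
                weight[cu][cv] += 1
                weight[cv][cu] += 1
        if not weight:
            break  # no inter-cluster edges remain

        # Visit clusters with the most inter-cluster edges first.
        by_degree = sorted(weight, key=lambda c: (-sum(weight[c].values()), c))
        used = set()  # clusters that already took part in a merge this pass
        for c in by_degree:
            if c in used:
                continue
            used.add(c)
            # Absorb several neighbors at once, heaviest connection first.
            for n in sorted(weight[c], key=lambda n: (-weight[c][n], n)):
                if n in used or len(members[c]) + len(members[n]) > limit:
                    continue
                used.add(n)
                for v in members.pop(n):
                    cluster_of[v] = c
                    members[c].append(v)
        limit *= growth  # allow larger clusters in the next pass

    # Lay out clusters one after another; nodes of a cluster stay adjacent.
    return [v for c in sorted(members) for v in members[c]]


# Two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
order = cluster_order(6, edges, passes=1, growth=3)
```

In this small example, one pass with a size limit of three groups each triangle into its own cluster, so the returned ordering places the nodes of each triangle next to each other. Relabeling node data according to such an ordering is one way a data locality transformation could place connected nodes close together in memory.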