Many important applications, such as those using sparse data structures, have memory reference patterns that are unknown at compile time. Prior work has developed run-time reorderings of data and computation that enhance locality in such applications. This paper presents a compile-time framework that allows the explicit composition of run-time data- and iteration-reordering transformations. Our framework builds on the iteration-reordering framework of Kelly and Pugh to represent the effects of a given composition. To motivate our extension, we show that new compositions of run-time reordering transformations can result in better performance on three benchmarks. We show how to express a number of run-time data- and iteration-reordering transformations that focus on improving data locality. We also describe the space of possible run-time reordering transformations and how existing transformations fit within it. Because sparse tiling techniques are included in our framework, they become more generally applicable, both to a larger class of applications and in their composition with other reordering transformations. Finally, within the presented framework, data need be remapped only once at run time for a given composition, which is one example of the overhead reductions the framework can express.
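To make the idea of composing run-time reorderings concrete, here is a minimal sketch written for this summary, not taken from the paper. It assumes a hypothetical irregular edge loop (`y[left[i]] += x[right[i]]` and symmetrically) and represents every reordering as a plain permutation list. The example reorderings are a CPACK-style consecutive packing (data placed in order of first access, in the spirit of Ding and Kennedy) and a simple lexicographic sort of iterations. Both are illustrative choices, and all function names (`cpack`, `lex_order`, `compose_reorderings`, `remap`, `edge_loop`) are hypothetical. Each step rewrites only the index arrays. The data permutations are then composed, so the data arrays are moved a single time, mirroring the single-remap property claimed in the abstract.

```python
from typing import List, Tuple

# Hypothetical illustration: reorderings are permutation lists, and the
# example loop only reads x and accumulates into y, so iteration order
# does not affect the (integer) results.


def cpack(left: List[int], right: List[int], n: int) -> List[int]:
    """CPACK-style data reordering: sigma[old] = new, assigned in order of first access."""
    sigma = [-1] * n
    nxt = 0
    for a, b in zip(left, right):
        for j in (a, b):
            if sigma[j] == -1:
                sigma[j] = nxt
                nxt += 1
    for j in range(n):  # data never touched by the loop is placed last
        if sigma[j] == -1:
            sigma[j] = nxt
            nxt += 1
    return sigma


def lex_order(left: List[int], right: List[int]) -> List[int]:
    """Iteration reordering: old iteration ids in new execution order,
    sorted by the (smaller, larger) data location each iteration touches."""
    return sorted(range(len(left)),
                  key=lambda i: (min(left[i], right[i]), max(left[i], right[i])))


def compose_reorderings(left: List[int], right: List[int],
                        n: int) -> Tuple[List[int], List[int], List[int]]:
    """Data -> iteration -> data reordering applied to index arrays only.
    Returns the composed data permutation and the rewritten index arrays."""
    s1 = cpack(left, right, n)                 # 1st data reordering
    l1, r1 = [s1[j] for j in left], [s1[j] for j in right]
    order = lex_order(l1, r1)                  # iteration reordering
    l2, r2 = [l1[i] for i in order], [r1[i] for i in order]
    s2 = cpack(l2, r2, n)                      # 2nd data reordering
    l3, r3 = [s2[j] for j in l2], [s2[j] for j in r2]
    sigma = [s2[s1[j]] for j in range(n)]      # old location j ends at s2[s1[j]]
    return sigma, l3, r3


def remap(data: List[int], sigma: List[int]) -> List[int]:
    """Move every element exactly once: new[sigma[j]] = data[j]."""
    new = [0] * len(data)
    for j, v in enumerate(data):
        new[sigma[j]] = v
    return new


def edge_loop(x: List[int], left: List[int], right: List[int]) -> List[int]:
    """Hypothetical irregular loop reading x through index arrays, accumulating into y."""
    y = [0] * len(x)
    for a, b in zip(left, right):
        y[a] += x[b]
        y[b] += x[a]
    return y


if __name__ == "__main__":
    left, right = [3, 0, 4, 1], [5, 2, 3, 4]   # made-up edge list
    x = [10, 20, 30, 40, 50, 60]
    sigma, new_left, new_right = compose_reorderings(left, right, len(x))
    y_orig = edge_loop(x, left, right)
    y_new = edge_loop(remap(x, sigma), new_left, new_right)  # data moved once
    assert remap(y_orig, sigma) == y_new
    print("composed data permutation:", sigma)
```

Because each reordering step touches only the index arrays, the cost of moving the data is paid once for the whole composition. Without composition, the data would be moved once per data reordering. Whether this pays off in practice depends on array sizes and how often the loop is executed, which this sketch does not model.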

Plans and Situated Actions: the problem of human-machine communication. Suchman, 1987.

Compiler optimizations for improving data locality. Carr, McKinley, et al., 1994.

Reducing the bandwidth of sparse symmetric matrices. Cuthill, McKee, 1969.

Communication optimizations for irregular scientific computations on distributed memory architectures. Das, Uysal, et al., 1994.

A singular loop transformation framework based on non-singular matrices. Li, Pingali, 1994.

Combining Loop Transformations Considering Caches and Scheduling. Wolf, Maydan, et al., 1996.

Implementation of a parallel unstructured Euler solver on shared and distributed memory machines, The Journal of Supercomputing 8. Mavriplis, Das, et al., 1995.

DyC: An Expressive Annotation-Directed Dynamic Compiler for C. Grant, Mock, et al., 1997.

Improving memory hierarchy performance for irregular applications. Mellor-Crummey, Whalley, et al., 2001.

Improving cache performance in dynamic applications through data and computation reorganization at run time. Ding, Kennedy, 1999.

Constraint-based array dependence analysis. Pugh, Wonnacott, 1998.

Toward an Open Shared Workspace: Computer and Video Fusion Approach of Team Workstation. Ishii, Miyake, 1991.

VideoWhiteboard: video shadows to support remote collaboration. Tang, Minneman, 1991.

VideoDraw: a video interface for collaborative drawing. Tang, Minneman, 1991.

Load balancing and data locality in adaptive hierarchical N-body methods: Barnes-Hut, fast multipole, and radiosity. Singh, Holt, et al., 1995.

Improving locality using loop and data transformations in an integrated framework. Kandemir, Choudhary, et al., 1998.

Cache optimization for structured and unstructured grid multigrid, Elect. Douglas, Hu, et al.

Localizing non-affine array references. Mitchell, Carter, et al., 1999.

Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. Ahmed, Mateev, et al., 2001.

A General Framework for Iteration-Reordering Loop Transformations. Sarkar, Thekkath, 1992.

Run-Time Methods for Parallelizing Partially Parallel Loops. Rauchwerger, Amato, et al., 1995.

Memory hierarchy management for iterative graph structures. Al-Furaih, Ranka, 1998.

A comparison of locality transformations for irregular codes. Han, Tseng, 2000.

Finding legal reordering transformations using mappings. Kelly, Pugh, 1994.

The DigitalDesk calculator: Tactile manipulation on a desk top display. Wellner, 1991.

Inter-array data regrouping. Ding, Kennedy, 1999.

Optimizing sparse matrix computations for register reuse in SPARSITY. Im, Yelick, 2001.

Iteration Space Slicing For Locality. Pugh, Rosser, 1999.

A classifying invariant of knots, the knot quandle. Joyce, 1982.

A unified framework for systematic loop transformations. Lu, 1991.

Listing, drawing, and gesturing in design: A study of the use of shared workspaces by design teams. Tang, 1989.

A unifying framework for iteration reordering transformations. Kelly, Pugh, 1995.

Racks and links in codimension two. Fenn, Rourke, 1992.

Virtual knot theory. Kauffman, 1999.

Hybrid analysis: static & dynamic memory reference analysis. Rus, Rauchwerger, et al., 2002.

A Unified Framework for Schedule and Storage Optimization, M.Eng. thesis. Thies, 2002.

Rescheduling for locality in sparse matrix computations. Strout, Carter, et al., 2001.

Combining performance aspects of irregular Gauss–Seidel via sparse tiling. Strout, Carter, et al., 2002.

A modal model of memory. Mitchell, Carter, et al.

Free differential calculus. I. Derivation in the free group ring. Fox, 1953.

Distributive groupoids in knot theory. Matveev, 1984.

Kinds of seeing and their functions in designing. Schon, Wiggins, 1992.

Drawing and CAD. Tovey, 1989.

Adaptive Thresholding for the DigitalDesk, EuroPARC. Wellner, 1993.

Automorphic sets and singularities. Brieskorn, 1988.

Quandle cohomology and state-sum invariants of knotted curves and surfaces. Carter, Jelsovsky, et al., 2003.

Abstract link diagrams and virtual knots. Kamada, Kamada, 1999.

Twisted Alexander polynomial for finitely presentable groups, Topology 33. Wada, 1994.

Automatic parallelization of irregular applications. Gutiérrez, Asenjo, et al., 2000.

Parallel reductions: An application of adaptive algorithm selection. Yu, Dang, et al., 2002.