by Chi-keung Luk, Todd C. Mowry
In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems
http://www.cs.umd.edu/class/spring1999/cmsc732/papers/mowry-asplos96.ps
Abstract:
Software-controlled data prefetching offers the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. While prefetching has enjoyed considerable success in array-based numeric codes, its potential in pointer-based applications has remained largely unexplored. This paper investigates compiler-based prefetching for pointer-based applications, in particular those containing recursive data structures. We identify the fundamental problem in prefetching pointer-based data structures and propose a guideline for devising successful prefetching schemes. Based on this guideline, we design three prefetching schemes, we automate the most widely applicable scheme (greedy prefetching) in an optimizing research compiler, and we evaluate the performance of all three schemes on a modern superscalar processor similar to the MIPS R10000. Our results demonstrate that compiler-inserted prefetching can significantly improve the execution speed of pointer-based codes, by as much as 45% for the applications we study. In addition, the more sophisticated algorithms (which we currently perform by hand, but which might be implemented in future compilers) can improve performance by as much as twofold. Compared with the only other compiler-based pointer prefetching scheme in the literature, our algorithms offer substantially better performance by avoiding unnecessary overhead and hiding more latency.
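As a rough illustration of the greedy scheme named in the abstract, the sketch below prefetches every child pointer of a tree node as soon as the node is visited, so that the child loads can overlap with work on the current node. It is a hand-written approximation, not the compiler's actual output: the `node` type, the `tree_sum` function, and the `PREFETCH` macro are hypothetical names chosen for this example, and `__builtin_prefetch` is a GCC/Clang builtin assumed to be available.

```c
#include <stddef.h>

/* Hypothetical binary-tree node; the field names are illustrative only. */
typedef struct node {
    int value;
    struct node *left;
    struct node *right;
} node;

/* Non-binding prefetch hint. __builtin_prefetch is a GCC/Clang builtin;
   on other compilers this macro expands to a no-op (an assumption made
   for portability of the sketch). */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch((p))
#else
#define PREFETCH(p) ((void)0)
#endif

/* Sum all values in the tree. On entering a node, greedily issue a
   prefetch for every child it points to before doing any other work. */
long tree_sum(const node *n)
{
    if (n == NULL)
        return 0;
    PREFETCH(n->left);   /* a prefetch hint does not fault on NULL */
    PREFETCH(n->right);
    return n->value + tree_sum(n->left) + tree_sum(n->right);
}
```

The prefetches are only hints, so the result of `tree_sum` is the same with or without them. Any speedup depends on the cache behavior of the target machine and on how much work separates the prefetch from the later visit to each child.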
Citations
676 | A data locality optimizing algorithm – Wolf, Lam - 1991
455 | Design and evaluation of a compiler algorithm for prefetching – Mowry, Lam, et al. - 1992
329 | Context-sensitive interprocedural points-to analysis in the presence of function pointers – Emami, Ghiya, et al. - 1994
264 | Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors – Mowry, Gupta - 1991
254 | APRIL: a processor architecture for multiprocessing – Agarwal, Lim, et al. - 1990
240 | Software prefetching – Callahan, Kennedy, et al. - 1991
199 | An effective on-chip preloading scheme to reduce data access penalty – Baer, Chen - 1991
188 | Compiler optimizations for improving data locality – Carr, McKinley, et al. - 1994
152 | Is it a Tree, a DAG, or a Cyclic Graph? A shape analysis for heap-directed pointers in C – Ghiya, Hendren - 1996
137 | Tracing with Pixie – Smith - 1991
133 | Supporting Dynamic Data Structures on Distributed Memory Machines – Rogers, Carlisle, et al. - 1995
114 | Interprocedural modification side effect analysis with pointer aliasing – Landi, Ryder, et al. - 1993
71 | Sharlit - a tool for building optimizers – Tjiang, Hennessy - 1992
68 | Interleaving: A multithreading technique targeting multiprocessors and workstations – Laudon, Gupta, et al. - 1994
67 | Data access microarchitectures for superscalar processors with compiler-assisted data prefetching – Chen, Mahlke, et al. - 1991
62 | A storeless model of aliasing and its abstractions using finite representations of right-regular equivalence relations – Deutsch - 1992
61 | A general data dependence test for dynamic, pointer-based data structures – Hummel, Hendren, et al. - 1994
59 | SPAID: Software prefetching in pointer- and call-intensive environments – Lipasti, Schmidt, et al. - 1995
37 | MASA: a multithreaded processor architecture for parallel symbolic computing – Fujita - 1988
17 | Speeding up Irregular Applications in Shared-Memory Multiprocessors: Memory Binding and Group Prefetching – Zhang, Torrellas - 1995
4 | Fast fits – Stephenson - 1983