This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment. We have implemented dynamic feedback in the context of a parallelizing compiler for object-based programs. The generated code uses dynamic feedback to automatically choose the best synchronization optimization policy. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that the compiler is unable to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy. We have also performed a theoretical analysis which provides, under certain assumptions, a guaranteed optimality bound for dynamic feedback relative to a hypothetical (and unrealizable) optimal algorithm that uses the best policy at every point during the execution.
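The alternation between sampling and production phases described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function `dynamic_feedback`, the parameters `sample_size` and `production_size`, and the two policies are hypothetical names, and each version simply returns a simulated overhead value that stands in for whatever measurement the generated code actually performs.

```python
from typing import Callable, List, Sequence


def dynamic_feedback(
    versions: Sequence[Callable[[int], float]],
    work_items: Sequence[int],
    sample_size: int,
    production_size: int,
) -> List[int]:
    """Alternate sampling and production phases over work_items.

    Each version (hypothetical) processes one item and returns its
    overhead. Returns the index of the version chosen for each
    production phase that actually ran.
    """
    chosen: List[int] = []
    i = 0
    while i < len(work_items):
        # Sampling phase: run every version on a short slice of work
        # and record its average overhead in the current environment.
        overheads: List[float] = []
        for version in versions:
            batch = work_items[i:i + sample_size]
            i += len(batch)
            if batch:
                overheads.append(sum(version(x) for x in batch) / len(batch))
            else:
                # No work left to sample this version on.
                overheads.append(float("inf"))
        best = min(range(len(versions)), key=lambda k: overheads[k])

        # Production phase: use the version with the least sampled overhead.
        batch = work_items[i:i + production_size]
        i += len(batch)
        for x in batch:
            versions[best](x)
        if batch:
            chosen.append(best)
    return chosen


# Hypothetical policies: the environment shifts at item 50, after which
# policy_a becomes more expensive than policy_b.
def policy_a(item: int) -> float:
    return 1.0 if item < 50 else 5.0


def policy_b(item: int) -> float:
    return 3.0


chosen = dynamic_feedback([policy_a, policy_b], list(range(100)), 2, 10)
```

In this run the sketch selects `policy_a` for the early production phases and switches to `policy_b` at the first resampling after the shift. The production phase that straddles item 50 still runs the older choice, which illustrates why the length of the production phase trades sampling cost against responsiveness.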