by Huiyang Zhou, Thomas M. Conte
in Proceedings of the 6th Annual Workshop on the Interaction between Compilers and Computer Architectures (INTERACT-6), held in conjunction with the 8th International Symposium on High Performance Computer Architecture (HPCA-8)
http://www.tinker.ncsu.edu/symposia/interact02.ps
Abstract:
In global scheduling for ILP processors, region-enlarging optimizations, especially tail duplication, are commonly used. The code size increase caused by such optimizations, however, raises serious concerns about I-cache and TLB performance. In this paper, we propose a quantitative, compile-time measure of code size efficiency that applies to any code-size-related optimization. Then, based on the efficiency of tail duplication, we propose solutions to two related problems: (1) how to achieve the best performance for a given code size increase, and (2) how to obtain the optimal code size efficiency for any program. Our study shows that code size increase has a significant but varying impact on IPC: the first 2% of code size increase yields an 18.5% increase in static IPC, but raising the allowed code size increase from 20% to 30% yields less than 1%. We then use this property to define the optimal code size efficiency and to derive a simple yet robust threshold scheme for finding it. Experimental results on the SPECint95 benchmarks show that this threshold scheme finds the optimal efficiency accurately. At the optimal efficiency, code size increases by 2% on average, improved I-cache performance is observed, and a speedup of 17% over the natural treegion results is achieved.
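The abstract does not give the formula for code size efficiency or the details of the threshold scheme, so the following is only a minimal sketch under stated assumptions. It assumes efficiency is the ratio of percentage static IPC gain to percentage code size increase, and that tail-duplication candidates can be ranked by that ratio. The names `Candidate`, `efficiency`, `select_by_budget`, and `select_by_threshold` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A hypothetical tail-duplication site with compile-time estimates."""
    name: str
    ipc_gain_pct: float   # estimated static IPC gain, in percent
    size_cost_pct: float  # estimated code size increase, in percent


def efficiency(ipc_gain_pct: float, size_cost_pct: float) -> float:
    """Assumed metric: IPC gain per unit of code size increase."""
    if size_cost_pct <= 0:
        raise ValueError("size_cost_pct must be positive")
    return ipc_gain_pct / size_cost_pct


def _ranked(candidates: List[Candidate]) -> List[Candidate]:
    # Most efficient candidates first.
    return sorted(
        candidates,
        key=lambda c: efficiency(c.ipc_gain_pct, c.size_cost_pct),
        reverse=True,
    )


def select_by_budget(candidates: List[Candidate], budget_pct: float) -> List[Candidate]:
    """Problem (1): greedy choice under a fixed code size budget (a heuristic)."""
    chosen, used = [], 0.0
    for c in _ranked(candidates):
        if used + c.size_cost_pct <= budget_pct:
            chosen.append(c)
            used += c.size_cost_pct
    return chosen


def select_by_threshold(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Problem (2): keep duplicating while efficiency stays at or above a threshold."""
    return [
        c for c in _ranked(candidates)
        if efficiency(c.ipc_gain_pct, c.size_cost_pct) >= threshold
    ]


# Illustrative numbers only; the first and last echo the figures in the abstract.
sites = [
    Candidate("early", 18.5, 2.0),   # first 2% of growth, 18.5% IPC gain
    Candidate("middle", 3.0, 1.5),   # made-up intermediate site
    Candidate("late", 1.0, 10.0),    # 20% -> 30% growth, about 1% IPC gain
]
```

With these made-up inputs, the assumed ratio for the first 2% of growth is 18.5 / 2 = 9.25, while the growth from 20% to 30% scores at most 0.1, which is the kind of diminishing return a threshold cut-off would exploit.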
Citations
194  IMPACT: An architectural framework for multiple-instruction-issue processors – Chang, Mahlke, et al. - 1991
43  HPL-PD architecture specification: Version 1.1 – Kathail, Schlansker, et al. - 2000
31  Treegion scheduling for wide-issue processors – Havanki, Banerjia, et al. - 1998
23  Code Optimization Technique for Embedded Processors: Methods, Algorithms – Leupers - 2000
18  EPIC: An Architecture for Instruction-Level Parallel Processors – Schlansker, Rau - 2000
18  Code Duplication: An Assist for Global Instruction Scheduling – Bernstein, Cohen, et al. - 1991
15  Elcor's Machine Description System: Version 3.0 – Aditya, Kathail, et al. - 1998
12  Balance scheduling: Weighting branch tradeoffs in superblocks – Eichenberger, Meleis - 1999
10  Dynamic Branch Prediction for a VLIW Processor – Hoogerbrugge - 1997
9  Tree Traversal Scheduling: A Global Scheduling Technique for VLIW/EPIC – Zhou, Jennings, et al. - 2001
9  The Superblock: An effective way for VLIW and superblock compilation – Hwu, Mahlke, et al. - 1993
7  Instruction fetch mechanisms for VLIW architectures with compressed encodings – Conte, Banerjia, et al. - 1996
5  The Effect of Code Expanding Optimizations on Instruction Cache Design – Chen, Chang, et al. - 1993
5  Avoiding Conditional Branches via Code Replication – Mueller, Whalley - 1995
5  Performance Bounds for Rapid Computer System Evaluation, in Fast Simulation of Computer Architectures – Mangione-Smith - 1995
5  Iterative Modulo Scheduling – Rau - 1995
4  A Treegion-based Unified Approach to Speculation and Predication in Global Instruction Scheduling – Jennings, Zhou, et al. - 2001
2  Effective compiler support for predicated execution using the Hyperblock – Mahlke, Lin, et al. - 1992
2  Code Size Efficiency in Global Scheduling – Zhou, Conte - 2002