by Huiyang Zhou, Thomas M. Conte
Processors, in The 6th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT-6), held in
http://www.tinker.ncsu.edu/techreports/code_size.pdf
Abstract:
In embedded computing, code size is very important for system cost and performance. In global scheduling for VLIW/EPIC-style embedded processors, region-enlarging optimizations, especially tail duplication, are commonly used to exploit instruction-level parallelism (ILP) and boost performance. The code size increase caused by such optimizations, however, raises serious concerns about its effect on I-cache, branch, and TLB performance. In this paper, we focus on the code size efficiency of code-size-related optimizations in global scheduling. First, we propose using the ratio of the change in static IPC (instructions per cycle) to the change in code size as a compile-time quantitative measure of code size efficiency for any code-size-related optimization. Then, based on the code size efficiency of tail duplication, we propose solutions to two related problems: (1) how to achieve the best performance for a given code size increase, and (2) how to obtain the optimal code size efficiency for any program. Our study shows that the code size increase resulting from tail duplication has a significant but varying impact on IPC: for example, the first 2% of code size increase yields an 18.5% increase in static IPC, while static IPC changes by less than 1% as the code size increase grows from 20% to 30%. We then use this feature to define the optimal code size efficiency and to derive a simple, yet robust
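The efficiency measure proposed above can be illustrated with a short Python sketch. It computes the ratio of static IPC change to code size change and applies it to the two figures quoted in the abstract: 18.5% more static IPC over the first 2% of growth gives 9.25, while under 1% of IPC change across the 20% to 30% range gives at most 0.1. It also adds a simple threshold-based stopping rule for picking a code size budget. The function names, the curve format, and the stopping rule are hypothetical illustrations. They are not the paper's heuristic, whose description is cut off above.

```python
"""Sketch of the code size efficiency metric described in the abstract.

Efficiency = (change in static IPC) / (change in code size), both in percent.
The stopping rule in size_budget() is a hypothetical illustration only,
not the heuristic derived in the paper.
"""
from typing import List, Tuple

Curve = List[Tuple[float, float]]  # (code size increase %, static IPC increase %)


def code_size_efficiency(ipc_change_pct: float, size_change_pct: float) -> float:
    """Return the static IPC change per unit of code size change."""
    if size_change_pct == 0:
        raise ValueError("code size change must be nonzero")
    return ipc_change_pct / size_change_pct


def marginal_efficiencies(curve: Curve) -> List[float]:
    """Efficiency of each step along a cumulative curve sorted by code size."""
    return [
        code_size_efficiency(ipc1 - ipc0, size1 - size0)
        for (size0, ipc0), (size1, ipc1) in zip(curve, curve[1:])
    ]


def size_budget(curve: Curve, min_efficiency: float) -> float:
    """Hypothetical rule: keep growing code size while every step's
    marginal efficiency stays at or above min_efficiency."""
    budget = curve[0][0]
    for (size1, _), eff in zip(curve[1:], marginal_efficiencies(curve)):
        if eff < min_efficiency:
            break  # diminishing returns: stop duplicating here
        budget = size1
    return budget


if __name__ == "__main__":
    # Figures quoted in the abstract.
    print(code_size_efficiency(18.5, 2.0))   # first 2% of growth -> 9.25
    print(code_size_efficiency(1.0, 10.0))   # upper bound for 20%-30% -> 0.1
```

For instance, on the curve [(0.0, 0.0), (2.0, 18.5), (30.0, 19.0)], size_budget(curve, 1.0) returns 2.0, because the second step's marginal efficiency (0.5 / 28, about 0.018) falls below the threshold. The point (30.0, 19.0) is invented for illustration and is not a measurement from the paper.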