by Ann Gordon-Ross, Frank Vahid
http://www.cs.ucr.edu/~vahid/pubs/glsvlsi05_reorder.pdf
Abstract:
The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache’s high impact on system performance and power, and because of the cache’s predictable temporal and spatial locality. Optimization techniques can be designed based on this predictability. We explore, for the first time, the interplay of two popular instruction cache optimization techniques: the long-known technique of code reordering and the relatively new technique of cache configuration. We address the question of whether these two optimizations complement each other or whether one dominates the other. Through experiments using embedded system benchmarks, we show that cache configuration dominates a particular category of code reordering techniques with respect to optimizing performance and energy, obviating the need for reordering. We also examine the modern scenario of synthesized custom caches, and show that combining cache configuration with code reordering reduces cache size by 13% on average, and by up to 89% in some benchmarks, beyond what cache configuration alone achieves.
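The interplay described in the abstract can be illustrated with a toy instruction-cache model. This is a minimal sketch, not the authors' simulator or benchmark setup: the procedure names (A, B, C), their sizes, the call pattern, and the 32-byte line size are all hypothetical; the cache is modeled as set-associative with LRU replacement; and "reordering" here simply means placing procedures contiguously in a chosen order so that cold code no longer sits between two hot procedures. The sketch counts instruction-fetch misses for each layout under a few cache configurations.

```python
from collections import OrderedDict


def simulate(addresses, size_bytes, line_bytes, assoc):
    """Count misses of a set-associative LRU cache over an address trace."""
    num_sets = size_bytes // (line_bytes * assoc)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_sets
        tag = block // num_sets
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)  # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) >= assoc:
                ways.popitem(last=False)  # evict the least recently used way
            ways[tag] = None
    return misses


def layout(order, sizes):
    """Place procedures contiguously in the given order; return base addresses."""
    bases, addr = {}, 0
    for name in order:
        bases[name] = addr
        addr += sizes[name]
    return bases


def trace(call_seq, sizes, bases, instr_bytes=4):
    """Expand a procedure call sequence into sequential instruction fetch addresses."""
    for name in call_seq:
        for off in range(0, sizes[name], instr_bytes):
            yield bases[name] + off


if __name__ == "__main__":
    # Hypothetical program: A and C are hot and alternate; B is never-called cold code.
    sizes = {"A": 256, "B": 768, "C": 256}
    calls = ["A", "C"] * 10
    for order in (["A", "B", "C"], ["A", "C", "B"]):  # original vs. reordered layout
        addrs = list(trace(calls, sizes, layout(order, sizes)))
        for size, assoc in ((1024, 1), (1024, 2), (512, 1)):  # hypothetical configurations
            print("".join(order), size, assoc, simulate(addrs, size, 32, assoc))
```

In this toy trace, the hot procedures A and C collide in a 1 KB direct-mapped cache under the original order. Either moving C next to A or switching to 2-way associativity removes the conflict, and the reordered layout also runs conflict-free in a 512-byte direct-mapped cache. Real programs and the paper's benchmarks are far less regular, so the sketch only shows the mechanism, not the magnitude of the reported results.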