by Nathan Clark, Manjunath Kudlur, Hyunchul Park, Scott Mahlke, Krisztián Flautner
In Proceedings of the International Symposium on Microarchitecture
http://cccp.eecs.umich.edu/papers/ntclark-micro04.pdf
Abstract:
Application-specific instruction set extensions are an effective way of improving the performance of processors. Critical computation subgraphs can be accelerated by collapsing them into new instructions that are executed on specialized function units. Collapsing the subgraphs simultaneously reduces the length of computation as well as the number of intermediate results stored in the register file. The main problem with this approach is that a new processor must be generated for each application domain. While new instructions can be designed automatically, there is a substantial amount of engineering cost incurred to verify and to implement the final custom processor. In this work, we propose a strategy for transparent customization of the core computation capabilities of the processor without changing its instruction set. A configurable array of function units is added to the baseline processor that enables the acceleration of a wide range of dataflow subgraphs. To exploit the array, the microarchitecture identifies subgraphs at run-time and replaces them with new microcode instructions that configure and utilize the array. We compare the effectiveness of replacing subgraphs in the fill unit of a trace cache versus using a translation table during decode, and evaluate the tradeoffs between static and dynamic identification of subgraphs for instruction set customization.