by Eric Rotenberg, Quinn Jacobson, Jim Smith
http://www.ece.wisc.edu/~jes/papers/sfc.ps
Abstract:
Exploiting control independence has been put forward as a significant source of ILP in future-generation processors. Potentially, control independence can be exploited either with machines using multiple flows of control (multiple fetch units), similar to conventional multiprocessors, or through enhancements to conventional uniprocessors that support a single flow of control. In this paper, methods using a single flow of control are studied. First, a series of ideal machine models is used to study gross performance differences between perfect limit-study-like models and more restricted ones. As might be expected, there is a very large drop in performance, but there remains a typical performance improvement of between 50 and 100 percent over a baseline superscalar processor that does not exploit control independence. Then, important implementation issues are discussed and some design alternatives are given. This discussion focuses on key areas where complex design issues must be tackled if control independence is to be successfully exploited. This is followed by a more detailed set of simulations, where the key implementation features are realistically modeled. These simulations show typical performance improvements of 20 to 30 percent over the baseline superscalar processor.
Citations
432 | Multi-scalar processors | Sohi, Breach, et al. | 1995
351 | Evaluating Future Microprocessors: the SimpleScalar Tool Set | Burger, Austin, et al. | 1996
221 | The predictability of data values | Sazeides, Smith | 1997
205 | Limits of control flow on parallelism | Lam, Wilson | 1992
188 | An efficient method of computing static single assignment form | Cytron, Ferrante, et al. | 1989
177 | The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization | Steffan, Mowry | 1998
156 | Trace processors | Rotenberg, Jacobson, et al. | 1997
155 | Dynamic Instruction Reuse | Sodani, Sohi | 1997
141 | Speculative execution based on value prediction | Gabbay, Mendelson | 1996
134 | ARB: A Hardware Mechanism for Dynamic Reordering of Memory References | Franklin, Sohi | 1996
120 | The Multiscalar Architecture | Franklin | 1993
115 | Global instruction scheduling for superscalar machines | Bernstein, Rodeh | 1991
108 | The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation | Tsai, Yew | 1996
79 | Improving Superscalar Instruction Dispatch and Issue by Exploiting Dynamic Code Sequences | Vajapeyam, Mitra | 1997
78 | Single-Program Speculative Multithreading (SPSM) Architecture: Compiler-Assisted Fine-Grained Multithreading | Dubey, O’Brien, et al. | 1995
53 | Value Locality and Speculative Execution | Lipasti | 1997
48 | Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor | Oplinger, Heine, et al. | 1997
40 | Superspeculative microarchitecture for beyond AD 2000 | Lipasti, Shen | 1997
35 | Trace Processors: Moving to Fourth-Generation Microarchitectures | Smith, Vajapeyam | 1997
31 | One billion transistors, one uniprocessor, one chip | Patt, Patel, et al. | 1997
7 | Multiscalar execution along a single flow of control (ICPP’97) | Sundararaman, Franklin | 1997