While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations that fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS: stalls caused by forwarding scalar values between threads, values that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly aggressive instruction scheduling techniques that shorten the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2% to 28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
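To make the notion of a critical forwarding path concrete, the following is a minimal sketch of a toy timing model, not the dataflow algorithms evaluated in the paper. It assumes that every speculative thread starts at time 0 on its own processor, stalls at a `wait` point until the previous thread has signaled the forwarded scalar, and signals its own value at a later `signal` point. The function name, its parameters, and the cycle counts are all hypothetical and chosen only for illustration.

```python
def tls_finish_time(n_threads, length, wait_at, signal_at, latency=1):
    """Finish time of the last thread in a toy TLS forwarding model.

    Assumptions (illustrative only, not taken from the paper):
      - each thread executes `length` cycles of work,
      - it needs the forwarded scalar at cycle offset `wait_at`,
      - it produces the scalar for its successor at offset `signal_at`,
      - forwarding a value between threads costs `latency` cycles,
      - all threads start at time 0 on separate processors.
    """
    if not 0 <= wait_at <= signal_at <= length:
        raise ValueError("expected 0 <= wait_at <= signal_at <= length")

    prev_signal = None  # time at which the previous thread signaled
    finish = 0
    for _ in range(n_threads):
        # Time at which this thread gets past its wait point.
        t_wait = wait_at
        if prev_signal is not None:
            t_wait = max(t_wait, prev_signal + latency)
        # The work between wait and signal lies on the critical forwarding path.
        prev_signal = t_wait + (signal_at - wait_at)
        finish = t_wait + (length - wait_at)
    return finish


# Signal late in the thread: successive threads are almost serialized.
unscheduled = tls_finish_time(4, length=100, wait_at=10, signal_at=90, latency=10)

# Signal scheduled close to the wait: threads overlap much more.
scheduled = tls_finish_time(4, length=100, wait_at=10, signal_at=20, latency=10)

print(unscheduled, scheduled)
```

In this model the distance between `wait_at` and `signal_at`, plus the forwarding latency, is the per-thread cost that cannot be overlapped. Scheduling the instructions that produce the forwarded value earlier shrinks that distance, which is the effect the instruction scheduling techniques described above aim for.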