Exploring Speculative Parallelism in SPEC2006
Citations
1844 | The SimpleScalar Tool Set, Version 2.0
- Burger, Austin
Citation Context ...hboring threads [21]. Simulation Infrastructure: We simulate a 4-core CMP using a trace-driven, cycle-accurate simulator, where each core is an out-of-order superscalar processor based on SimpleScalar [22]. The trace-generation portion of this infrastructure is based on the PIN instrumentation tool [23], and the architectural simulation portion TABLE III COVERAGE OF LOOPS PARALLELIZED. Benchmark Coverag...
991 | Pin: building customized program analysis tools with dynamic instrumentation
- Luk
- 2005
Citation Context ...cle-accurate simulator, where each core is an out-of-order superscalar processor based on SimpleScalar [22]. The trace-generation portion of this infrastructure is based on the PIN instrumentation tool [23], and the architectural simulation portion TABLE III COVERAGE OF LOOPS PARALLELIZED. Benchmark Coverage (%) No. of loops I I + II I + II + III 1 I + II I + II + III milc 13 79 79 5 22 22 lbm 0 100 100...
487 | The landscape of parallel computing research: A view from Berkeley.
- Asanovic, Bodik, et al.
- 2006
Citation Context ...rove performance of general-purpose applications. Potential for Thread-level parallelism (TLP) has been well understood in older benchmark suites like SPEC 2000 [18]. However, with the current trend [4] towards more parallel applications, it is not clear if the newer general-purpose benchmarks in SPEC 2006 [19] exhibit more potential for Thread-Level Parallelism (TLP). Automatic compiler parallelizat...
103 | The Super-threaded Processor Architecture
- Tsai, Amlo, et al.
- 1999
Citation Context ...ional compilers; but with the help of TLS, the compiler can parallelize this loop speculatively and relying on the underlying hardware to detect and enforce inter-thread data dependences at run-time. [5], [6], [7], [8], [9], [10] Though TLS has been extensively studied in the past, it is not clear how much TLS could benefit more recent benchmarks such as SPEC 2006 [11], which represent a different cl...
90 | Compiler optimization of scalar value communication between speculative threads.
- Zhai, Colohan, et al.
- 2002
Citation Context ...ilers; but with the help of TLS, the compiler can parallelize this loop speculatively and relying on the underlying hardware to detect and enforce inter-thread data dependences at run-time. [5], [6], [7], [8], [9], [10] Though TLS has been extensively studied in the past, it is not clear how much TLS could benefit more recent benchmarks such as SPEC 2006 [11], which represent a different class of app...
71 | The STAMPede approach to thread-level speculation.
- Steffan, Colohan, et al.
- 2005
Citation Context ... compilers; but with the help of TLS, the compiler can parallelize this loop speculatively and relying on the underlying hardware to detect and enforce inter-thread data dependences at run-time. [5], [6], [7], [8], [9], [10] Though TLS has been extensively studied in the past, it is not clear how much TLS could benefit more recent benchmarks such as SPEC 2006 [11], which represent a different class o...
65 | POSH: a TLS compiler that exploits program structure.
- Liu, Tuck, et al.
- 2006
Citation Context ... the help of TLS, the compiler can parallelize this loop speculatively and relying on the underlying hardware to detect and enforce inter-thread data dependences at run-time. [5], [6], [7], [8], [9], [10] Though TLS has been extensively studied in the past, it is not clear how much TLS could benefit more recent benchmarks such as SPEC 2006 [11], which represent a different class of applications. Some ...
46 | Using thread-level speculation to simplify manual parallelization
- Prabhu, Olukotun
- 2003
Citation Context ...rtain loops that the compiler will not normally automatically parallelize. Such manual analysis combined with simple loop transformations has been shown to create more parallelism in previous studies [16, 5, 12]. In this paper our aim is to present the TLS performance as an automatic parallelizing compiler without manual intervention. In [10] Kejariwal et al. used their framework of analysis to study SPEC 200...
45 | A cost-driven compilation framework for speculative parallelization of sequential programs.
- Du, Lim, et al.
- 2004
Citation Context ... with the help of TLS, the compiler can parallelize this loop speculatively and relying on the underlying hardware to detect and enforce inter-thread data dependences at run-time. [5], [6], [7], [8], [9], [10] Though TLS has been extensively studied in the past, it is not clear how much TLS could benefit more recent benchmarks such as SPEC 2006 [11], which represent a different class of applications....
37 | Limits on Speculative Module-Level Parallelism
- Warg, Stenström
- 2001
Citation Context ...nted a study on the limits of TLS performance on some SPECint95 benchmarks. The impact of compiler optimizations and the TLS overhead were not taken into account in that study. Similarly, Warg et al. [15] presented a limit study for module-level parallelism in object-oriented programs. In contrast, in this study, our aim is to illustrate the realizable performance of TLS using a state-of-the-art TLS c...
28 | Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads.
- Zhai, Colohan, et al.
- 2004
Citation Context ...odel used can be found in [6]. Frequently occurring memory-based dependences and register-based scalar dependences are synchronized by inserting special instructions as shown in Figure 2(b) similar to [8], [7]. Compilation Infrastructure: Our compiler infrastructure is built on Open64 3.0 Compiler [19], an industrial-strength open-source compiler targeting Intel’s Itanium Processor Family (IPF). To cre...
20 | Tasking with Out-Of-Order Spawn
- Renau, Tuck, et al.
- 2005
Citation Context ...evious work focused on exploiting parallelism using TLS at a single loop nest level, relatively little has been done to exploit parallelism at multiple nested loop levels simultaneously. Renau et al. [17] proposed hardware-based techniques to determine how to allocate cores to threads that are extracted from different nesting levels; while our paper proposes compiler techniques that statically determi...
19 | Loop Selection for Thread-Level Speculation
- Wang, Dai, et al.
- 2005
Citation Context ...se, the compiler first estimates the parallel performance of each loop, then choose to parallelize a set of loops that maximize the overall program performance based on such estimation. Previous work [20] builds the performance estimation based on detailed data dependence profiling information, as shown in Figure 9. Thus, the achievable performance of speculative parallel threads is tied with the accu...
13 | On the performance potential of different types of speculative thread-level parallelism.
- Kejariwal, Tian, et al.
- 2006
Citation Context ...ograms. In contrast, in this study, our aim is to illustrate the realizable performance of TLS using a state-of-the-art TLS compiler, while taking into account various TLS overheads. Kejariwal et al. [16] separated the speedup achievable through traditional thread-level parallelism from that of TLS using the SPEC2000 benchmarks assuming an oracle TLS mechanism. They [12] later extended their study to ...
11 | Utilizing multidimensional loop parallelism on large scale parallel processor systems
- Polychronopoulos, Kuck, et al.
- 1989
Citation Context ...s of loops is beyond the scope of our paper. Compiler-based scheduling schemes for nested loops have been studied in the past to support nested DOALL and DOACROSS loops. We extend the OPTAL algorithm [25] that was originally designed for core allocation for nested DOALL and DOACROSS loops to allocate cores for TLS loops at compile time. A. Speculative OPTAL algorithm The OPTAL algorithm uses a dynamic...
10 | EV8: The post-ultimate Alpha (keynote address)
- Emer
- 2001
Citation Context ...ks using 8 cores when we extend TLS on multiple loop levels as opposed to restricting to a single loop level. I. INTRODUCTION With the advent of multi-threaded (e.g. simultaneous multithreading (SMT) [1], hyper-threading [2]) and/or multi-core (e.g. chip multiprocessors (CMP) [3], [4]) architectures, now the challenge is to utilize these architectures to improve performance of general-purpose applica...
9 | Intel’s Dual-Core Processor for Desktop PCs. http://www.intel.com/personal/desktopcomputer/dual core
- CORPORATION
- 2005
Citation Context ...tricting to a single loop level. I. INTRODUCTION With the advent of multi-threaded (e.g. simultaneous multithreading (SMT) [1], hyper-threading [2]) and/or multi-core (e.g. chip multiprocessors (CMP) [3], [4]) architectures, now the challenge is to utilize these architectures to improve performance of general-purpose applications. Automatic compiler parallelization techniques have been developed and ...
9 | Tight analysis of the performance potential of thread speculation using SPEC CPU2006
- Kejariwal, Tian, et al.
- 2007
Citation Context ... been extensively studied in the past, it is not clear how much TLS could benefit more recent benchmarks such as SPEC 2006 [11], which represent a different class of applications. Some recent studies [12] on SPEC 2006 benchmarks have shown very limited potential for TLS (less than 1%) under very conservative assumptions. In this paper, we re-examine some of these issues and give a more realistic asses...
6 | Compiler Techniques for Thread-Level Speculation
- Wang
- 2007
Citation Context ...tial for TLS than those studies. Kejariwal et al. [12] did not take into account the effect of compiler optimizations that could improve the performance of TLS, while previous studies [10], [7], [8], [13] have shown that compiler-based loop selection and optimizations, such as code scheduling, can significantly improve the efficiency of TLS. Furthermore, Kejariwal et al. [12] only considered innermost...
5 | Leading the Industry
- CORPORATION
- 2005
Citation Context ... TLS only on a single loop level. 1 Introduction With the advent of multi-threaded (e.g. simultaneous multi-threading (SMT) [7], hyper-threading [9]) and/or multi-core (e.g. chip multiprocessors (CMP) [8, 3]) architectures, the challenge now is to utilize these architectures to improve performance of general-purpose applications. Potential for Thread-level parallelism (TLP) has been well understood in ...
3 | Exploiting Speculative Thread-Level Parallelism in Data Compression Applications
- Wang, Zhai, et al.
- 2006
Citation Context ...in probabilistic information through data dependence profile. In this section, we conduct detailed analysis on inter-thread memory-based dependence using profiling information.by such synchronization [18]. Fig. 6. The coverage of loops with inter-thread memory-based data dependences less than a certain probability. We classify benchmarks based on the combined coverage of loops with different number of...
3 | Compiler Optimizations for Parallelizing General-Purpose Applications under Thread-Level Speculation
- Zhai, Wang, et al.
- 2008
Citation Context ...warding path introduced by the synchronization [7], [13]; (iii) computation and usage of reduction and reduction-like variables are transformed to avoid speculation failure and reduce synchronization [21]; and (iv) consecutive loop iterations are merged to balance the workload between neighboring threads [21]. Simulation Infrastructure: We simulate a 4-core CMP using a trace-driven, cycle-accurate simu...