70 citations found.
J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In Proceedings of the International Conference on Parallel Architecture and Compiler Techniques, pages 35--46, October 1996.

This paper is cited in the following contexts:

Master/Slave Speculative Parallelization - Sohi (2002)

....programs is facilitated by speculating in the presence of ambiguous dependences and providing hardware support for the detection of and recovery from actions that violate the ordering dictated by a sequential execution. Many proposals for speculatively parallelizing programs have been published [1, 6, 8, 12, 20, 23]. Speculation allows the execution to ignore potential dependences that do not occur in practice, but it does little to alleviate true dependences. True inter task data dependences can sequentialize task execution, negating much of the performance potential of speculative parallelism. In fact, ....

J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. PACT, Oct. 1996.


A Study of Control Independence with a Single Flow of Control - Rotenberg, Jacobson, Smith

....that separates the impact of control independence and determines its contribution to performance in the multiscalar paradigm. It has also been proposed that more conventional multiprocessors can be modified to pursue multiple flows of control, and in the process, exploit control independence [14, 15,11, 12]. This approach is similar to the multiscalar approach, except it has a logical program counter per thread. The threads are compiler generated either as traditional parallel threads or as speculative threads. Trace processors [16,7] are in some sense a variant of multiscalar processors where ....

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. Proc. International Conference on Parallel Architecture and Compilation Techniques, 1996.


Indolent Closure Creation - Strumpen

....This manuscript is an extended version of a talk entitled Indolent Closure Creation, which I gave at the Yale Multithreaded Programming Workshop in New Haven, June 8-9, 1998. Implementing multithreading in an efficient manner has led to a variety of proposed multithreaded architectures [1, 6, 16, 19]. Efficient software implementations have been proposed for commodity architectures that introduce a compromise between stack and heap allocation techniques. For example, Lazy Threads [8] are based on stacklets, and the Illinois Concert system [9] employs a hybrid stack heap execution mechanism. ....

J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In International Conference on Parallel Architectures and Compilation Techniques, October 1996.


A General Compiler Framework for Speculative Multithreading - Bhowmik, Franklin (2002)   (1 citation)

....used SUIF and MACHSUIF compiler platforms to develop our compiler. Using our compiler framework, we have been able to compile a wide range of non-numeric applications, including programs from the SPEC 2000 and Olden benchmark suites. Our work differs from earlier works on SpMT compilation [13] [18] [19] primarily in 4 ways: i) Most of the earlier work [13] [18] primarily targets loop level parallelism only, whereas our compiler targets other kinds of parallelism also. ii) Our SpMT model is more general than the one used in earlier compiler work, and supports spawning of threads from ....

....Using our compiler framework, we have been able to compile a wide range of non-numeric applications, including programs from the SPEC 2000 and Olden benchmark suites. Our work differs from earlier works on SpMT compilation [13] [18] [19] primarily in 4 ways: i) Most of the earlier work [13] [18] primarily targets loop level parallelism only, whereas our compiler targets other kinds of parallelism also. ii) Our SpMT model is more general than the one used in earlier compiler work, and supports spawning of threads from anywhere in a thread; in [19] a thread can be spawned only from the ....

[Article contains additional citation context not shown here]

J-Y. Tsai and P-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. Proc. Int'l Conf. on Parallel Architectures and Compilation Techniques (PACT), 1996.


Limits on Speculative Module-level Parallelism in Imperative.. - Warg, Stenström (2001)   (5 citations)

....was done within the Multiscalar project [15]. One of the novel features of this architecture is the address resolution buffer [4] that validates and signals violations to data dependences between threads. Another noticeable speculative architecture proposal is the superthreaded architecture [17]. Several distributed approaches to implement support for thread level speculation have also been presented in the framework of chip multiprocessors [5, 6, 16]. This study is based on the feasible inclusion of such a mechanism in chip multiprocessors. Another important prerequisite for this study ....

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proceedings of the


Dynamically Adapting to System Load and Program Behavior in.. - Kazi, Lilja

....model is applicable to a wide variety of loop constructs that cannot be parallelized using traditional parallelization techniques. The speculative multithreading parallelization model is based on the fine grained thread pipelining model proposed for the superthreaded processor architecture [16, 17, 18]. The superthreaded architecture exploits task level parallelism using multiple threads of control. Each thread runs on a separate thread processing unit, each with its own program counter and instruction execution data path. The execution of a program starts from its entry thread which can then ....

J.Y. Tsai and P.C. Yew, The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), Oct. 1996, pp. 35-46.


Branch Prediction in Multi-Threaded Processors - Gummaraju, Franklin   (5 citations)

....indicates that these techniques, especially a hybrid of extrapolation and correlation, can substantially lower the branch misprediction ratios. 1 Introduction There has been a growing interest in the use of multithreading to speed up the execution of a single program [1] [2] [6] [9] [11] [12]. The compiler or the hardware extracts threads from a sequential program, and the hardware executes multiple threads in parallel, most likely with the help of multiple processing elements (PEs). Whereas a single threaded processor can only extract parallelism from a group of adjacent instructions ....

....Intra Thread Control Flow and Thread Execution Style The exact nature of threads and their execution style have a strong bearing on branch history and branch prediction. For instance, if threads are initiated speculatively (as in the multiscalar processor [1] [11] and the superthreaded processor [12]) then some of the active threads may get squashed because of incorrect thread level control speculation. When using a shared branch predictor, if the updates are done at branch prediction time, then thread level misprediction requires setting back some of the speculative updates selectively. When ....

J-Y. Tsai and P-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation," Proc. Int'l Conf. on Parallel Architectures and Compilation Techniques (PACT '96), 1996.


The Increment Predictor for Speculative Multithreaded.. - Marcuello, Tubella.. (1999)

....flow as control quasi-independent points. This type of execution model was first proposed in the Expandable Split Window paradigm [3] and the Multiscalar microarchitecture [11]. Other more recent microarchitecture proposals that are also based on this generic execution model are: Superthreaded [12], Trace Processors [9], Speculative Multithreaded [7] [8], and the Dynamic Multithreaded [1]. The performance of speculative multithreaded architectures is very dependent on the approach to dealing with inter thread dependences. A simple approach is to force a serialization between the producer ....

....works proposing multithreaded architectures which provide support for speculative threads have recently appeared. Pioneer work on this topic was the Expandable Split Window paradigm [3] and the follow up work on Multiscalar processors [11]. Other proposals are the Superthreaded architecture [12], the Trace Processor [9], the Clustered Speculative Multithreaded architecture [7] [8], and the Dynamic Multithreaded processor [1], among others. The Multiscalar and the Superthreaded architectures do not have any mechanism for data value speculation and data dependences are always enforced by ....

J.Y. Tsai and P-C. Yew, The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation, in Proc. of the Int. Conf. on Parallel Architectures and Compilation Techniques, 1996.


Increasing Effective IPC by Exploiting Distant.. - Martel, Ortega.. (1999)   (2 citations)

....the multiscalar [18] and trace [16] processors. In order to further increase the number of instructions from which to exploit parallelism, multithreaded architectures [27, 23] have been proposed. Threads coming from the same application are usually found in parallel loops detected by the compiler [22]. Hardware mechanisms have been proposed to detect dependence violations when loops, whose data dependence patterns cannot be decided at compile time, are executed speculatively as parallel loops [19]. Other proposals try to dynamically detect these loops and extract their semantic information at ....

J.Y. Tsai and P.C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. International Conference on Parallel Architectures and Compilation Techniques, October 96.


A Comprehensive Dynamic Processor Allocation Scheme for.. - Kazi, Lilja (2000)

....that supports native threads. The dynamic processor allocation scheme presented in this paper is general enough to be applied to programs parallelized with other standard techniques, however. JavaSpMT is based on the fine grained thread pipelining model proposed for the superthreaded architecture [11, 12, 13]. The superthreaded architecture exploits task level parallelism using multiple threads of control. Each thread runs on a separate thread processing unit, each with its own program counter and ....

....through explicit thread management and communication instructions. The execution of a thread is partitioned into four different stages: continuation, target store address generation, computation, and write back. JavaSpMT extends the basic thread pipelining model of the superthreaded processor [11, 12, 13] to speculatively parallelize coarse grained Java applications on a shared memory multiprocessor system. The speculative execution allows loops with indeterminate termination conditions (e.g. do while loops) or complex branching structures (e.g. nested if then else) to be parallelized. Unlike the ....

J.Y. Tsai and P.C. Yew, The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), Oct. 1996, pp. 35-46.


Simultaneous Static Threading for VLIW/EPIC Processors - Özer

....processing unit is assigned a task, which is a contiguous region of the dynamic instruction sequence. Tasks are created statically by partitioning the control flow of the program. Loads are performed speculatively. In SST, threads share the functional units and Icache. Superthreaded Architecture [10] has a set of thread units that contains functional units, a communication unit and a memory buffer. The thread unit has a dynamic scheduling core. Each unit can fetch and execute instructions simultaneously. Load speculation is not allowed in the Superthreaded architecture. The compiler generates ....

J.-Y. Tsai and P.-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation," in Proc. Int'l Conf. Parallel Architecture and Compilation Techniques, (Boston, MA), Oct. 1996.


Exploring Microprocessor Architectures for Gigascale Integration - Codrescu (1999)

....candidates being considered and how their performance, area, and operation frequency are estimated. Section 5 presents the results and Section 6 offers conclusions. 2. Related Work There is considerable published research evaluating architectures in future technologies [16] [17] [19] [20] [21] [22] [23]. This work often focuses on the performance of a specific architecture, without fully exploring the interaction between varying architectural configurations and the capabilities and limitations of a technology. It provides an in depth understanding of an architectural approach as a data ....

J.-Y. Tsai and P.-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in PACT, October 1996


Architecture of the Atlas Chip-Multiprocessor.. - Codrescu, Wills, Meindl (1999)   (9 citations)

....the program into threads and schedule inter thread register communication. Hardware is responsible for thread control predictions, speculative buffering, memory disambiguation, synchronizing register communication, and misspeculation recovery. The architectures presented in [9] [10] [16] [25] [34] [35] [40] also perform speculative multithreading. None of these architectures speculate on the values that flow between threads, and all require source code recompilation. The most relevant research to this work is the trace processor [29], DMT processor [3], SM processor [18], which attempt to ....

J.-Y. Tsai and P.-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in PACT, October 1996


On Dynamic Speculative Thread Partitioning and the.. - Codrescu, Wills (1999)   (11 citations)

....difficult to handle statically. Hardware speculation mechanisms have been proposed to relax some of these restrictions. The Multiscalar [15] architecture introduced thread level control speculation. Numerous other designs, such as Hydra [13], Stampede [16], Iacoma [8], SPSM [6], and Superthreaded [17], also advocate speculative multithreading. The goal of these works is to broaden the class of applications that can be automatically parallelized at the thread level. All of these architectures perform thread partitioning in software and require source code recompilation. As demonstrated by the ....

J.-Y. Tsai and P.-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in PACT, October 1996


Profiling for Input Predictable Threads - Codrescu et al. (1998)

....work introduced and popularized this idea. The Multiscalar processor favors a hardware centric approach and synchronizes register flow between tasks. The XIMD [27], M Machine [6], Simultaneous Multithreading [23], SPSM [5], Hydra [17], Stampede [21], Raw [25], Impact [10], and Superthreading [22] architectures all propose a single chip concurrent multithreaded architecture. These architectures either require independent threads, or speculate on control dependencies and/or the existence of data dependencies between threads. None of these architectures speculate on the values that flow ....

J.-Y. Tsai and P.-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in PACT, October 1996


Architecture of the Atlas Chip-Multiprocessor: Dynamically.. - Lucian Codrescu And (1999)   (9 citations)

....into threads and schedule inter thread register communication. Hardware is responsible for thread control predictions, speculative buffering, memory disambiguation, synchronizing register communication, and misspeculation recovery. Hydra [18], Stampede [23], Iacoma [10], SPSM [7], Superthreaded [24], and Multithreaded Decoupled [6] also perform speculative multithreading. None of these architectures speculate on the values that flow between threads, and all require source code recompilation. The most relevant research to this work is the trace processor [20], DMT processor [2], and SM ....

J.-Y. Tsai and P.-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in PACT, October 1996


Process Prefetching for a Simultaneous Multithreaded.. - Goncalves, Sagula.. (1999)

....queues and program counters for each thread. Also, Wallace, Calder, and Tullsen [WAL 98] designed an SMT architecture with several register renaming tables to support the execution of both threads and paths simultaneously. There are two approaches for multithreaded execution: concurrent [TSA 96] and simultaneous [TUL 95]. The former allows interleaving of execution of different applications like the one made by operating systems, so just one process runs in the pipeline at a time. That technique hides high latency operations but doesn't maximize the utilization of functional units, ....

Tsai, J.-Y. & Yew, P.-C.: The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. Proceedings of the Conference on Parallel Architectures and Compilation Techniques - PACT96, October, 1996.


A Low-Overhead Software Approach to Thread-Level Data Dependence .. - Rundberg (2000)

....that it is much smaller. We have used the same loops as they do on a system with similar timing parameters as them and noticed better speedups than they do [16]. Kazi and Lilja [7] present a quite different speculation system which is inspired by the superthreaded pipelined execution model [15]. The execution of a thread is divided up into a number of phases: target store address generation (TSAG), computation, and write back (WB). Unique to their approach is that consecutive speculative threads execute these stages in a pipelined fashion. For instance the TSAG stage must be completed ....

J. Tsai and P. Yew. "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation," in Proc. of 1996 Parallel Architectures and Compilation Techniques (PACT), Oct. 1996.


A Comprehensive Dynamic Processor Allocation Scheme for.. - Kazi, Lilja (2000)

....of concurrent threads. Thus, the transformed parallel programs are portable to any shared memory multiprocessor system with a Java Virtual Machine implementation that supports native threads. JavaSpMT is based on the fine grained thread pipelining model proposed for the superthreaded architecture [10, 11, 12]. The superthreaded architecture exploits task level parallelism using multiple threads of control. Each thread runs on a separate thread processing unit, each with its own program counter and instruction execution data path. The execution of a program starts from its entry thread which can then ....

....through explicit thread management and communication instructions. The execution of a thread is partitioned into four different stages: continuation, target store address generation, computation, and write back. JavaSpMT extends the basic thread pipelining model of the superthreaded processor [10, 11, 12] to speculatively parallelize coarse grained Java applications on a shared memory multiprocessor system. The speculative execution allows loops with indeterminate termination conditions (e.g. do while loops) or complex branching structures (e.g. nested if then else) to be parallelized. Unlike the ....

J.Y. Tsai and P.C. Yew, The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation, PACT, Oct. '96, pp. 35-46.


Coarse-Grained Thread Pipelining - A Speculative Parallel.. - Kazi, Lilja

....thread pipelining, for exploiting speculative coarse grained parallelism from general purpose application programs in shared memory multiprocessor systems. This parallelization model, which is based on the fine grained thread pipelining model proposed for the superthreaded architecture [11, 12], allows concurrent execution of loop iterations in a pipelined fashion with run time data dependence checking and control speculation. The speculative execution combined with the run time dependence checking allows the parallelization of a variety of program constructs that cannot be ....

....multiprocessor systems. This model allows concurrent execution of loop iterations in a pipelined fashion with run time data dependence checking and control speculation. Our coarse grained model is based on the fine grained thread pipelining model proposed for the superthreaded architecture [11, 12]. The superthreaded architecture uses a thread pipelining execution model in which threads are dynamically initiated and executed. We extend the fine grained superthreaded model by implementing it in a set of software library routines to parallelize coarse grained applications on off the shelf ....

[Article contains additional citation context not shown here]

J.Y. Tsai and P.C. Yew, The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), pp. 35-46, Oct. 1996.


Coarse-Grained Speculative Execution in Shared-Memory.. - Kazi (1998)   (12 citations)

....called coarse grained thread pipelining, for exploiting coarse grained parallelism from general purpose application programs in shared memory multiprocessor systems. This parallelization model, which is based on the fine grained thread pipelining model proposed for the superthreaded architecture [7], allows concurrent execution of loop iterations in a pipelined fashion with run time data dependence checking and control speculation. The speculative execution combined with the run time dependence analysis allows the parallelization of a variety of program constructs that cannot be parallelized ....

....shared memory multiprocessor systems. This model allows concurrent execution of loop iterations in a pipelined fashion with run time data dependence checking and control speculation. Our coarse grained model is based on the fine grained thread pipelining proposed for the superthreaded architecture [7,8,9]. The superthreaded architecture uses a thread pipelining execution model in which threads are dynamically initiated and executed. We extend the fine grained superthreaded model by implementing it in a set of software library routines to parallelize coarse grained applications on multiprocessor ....

[Article contains additional citation context not shown here]

J.Y. Tsai and P.C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation. In Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), pp. 35-46, Oct. 1996.


Control Independence in Trace Processors - Rotenberg (1999)   (6 citations)

....information. Nonetheless, the study is useful for understanding control independence. Multiscalar processors (Franklin, 1993; Sohi et al., 1995), Dynamic Multithreading (Akkary & Driscoll, 1998), and other multithreaded architectures (Oplinger et al., 1997; Steffan & Mowry, 1998; Dubey et al., 1995; Tsai & Yew, 1996) exploit control independence by pursuing multiple flows of control. Either the compiler or hardware partitions the program into tasks/threads, or subgraphs of the CFG, which may contain arbitrary control flow. Branch mispredictions within a task/thread may not cause subsequent tasks to squash if ....

Tsai, J.-Y., & Yew, P.-C. (1996). The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proceedings of PACT-96.


Trace Processors: Exploiting Hierarchy And Speculation - Rotenberg (1999)   (3 citations)

....proper coordination of PE data caches. Related work along these lines includes the Stanford Hydra Project [31,32,65,66,67,68] and the CMU STAMPede Project [102] discussed in the next section. 2.2.2 Speculative multithreaded processors I now briefly summarize speculative multithreaded processors [2,16,66,102,106] that have the following fundamental aspects in common with the multiscalar paradigm. This also provides an opportunity to summarize the potential architectural advantages of the multiscalar and speculative multithreading paradigm. 1. Threads (tasks) are extracted from a single, sequential ....

....upon detecting misspeculation, entire threads are re-fetched from buffers and analyzed to isolate the affected instructions, and these instructions are selectively re-executed. Projects that embody these architectural principles include the SPSM architecture [16], the Superthreaded architecture [106], Dynamic Multithreading [2], UPC Speculative Multithreading [54,55,56,107], and several single chip multiprocessor projects: the Stanford Hydra Project [31,32,65,66,67,68] and CMU STAMPede Project [102]. 2.2.3 Trace processors Traces are essentially unwound, dynamic versions of static ....

J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation. International Conference on Parallel Architecture and Compilation Techniques, 1996.


Integrating Scalar Analyses And Optimizations In A Parallelizing.. - Zheng (2000)   (4 citations)

....improve the efficiency and effectiveness of scalar analyses and optimizations. This algorithm is implemented in the Agassiz Compiler [ZTZ99], which is an integrated parallelizing and optimizing compiler that can compile both FORTRAN77 and C programs for concurrent multithreaded architectures [TY96]. 1.1 The Purpose of Scalar Analyses and Optimizations We first highlight the purpose of performing scalar analyses and optimizations in the Agassiz Compiler. Figure 1 shows the compiler infrastructure. It consists of a parallelizing front end and an optimizing back end. The front end compiler ....

....reference and modification (Ref/Mod) information are also exported to the back end compiler via a platform independent format called High Level Information (HLI) [CTS 98]. Thus, the front end compiler generates two outputs: an annotated parallel code for concurrent multithreaded architectures [TY96] and a HLI file. Figure 1: An overview of the Agassiz Compiler. The back end compiler, which is currently a modified version of GCC, reads in the annotated parallel codes and creates a RTL representation for the program. It also imports the HLI information into the RTL and uses the ....

[Article contains additional citation context not shown here]

J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In Proceedings of the Int'l Conf. on Parallel Architectures and Compilation Techniques, October 1996.


A Chip-Multiprocessor Architecture with Speculative.. - Krishnan, Torrellas (1999)   (22 citations)

....serial by the compiler. The situation becomes even worse for non-numerical applications, which often access data through pointers. To address this problem, speculation may be used. First, speculative threads need to be identified in the application. They may be identified either at compile time [5, 11, 14, 26, 27, 28] or completely at run time with hardware support [17, 23]. Then, the different threads are executed in parallel speculatively. Added software or hardware support enables detection of and recovery from dependence violations. In this mode, the threads that execute on the on chip processing units do ....

....of the source program. A direct consequence of this, however, is that a large amount of hardware remains unutilized when running a fully parallel application or a multiprogrammed workload. In the second approach, the CMP is generic enough and has only minimal support for speculative execution [11, 27, 28]. Current proposals for these systems restrict the communication between processors to occur only through memory. Such limited hardware may be sufficient when programs are compiled using a compiler that is aware of the speculation hardware [27]. However, the need to recompile the source is a ....

[Article contains additional citation context not shown here]

J. Tsai and P. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In PACT '96, pages 35-46, October 1996.


A Quantitative Assessment of Thread-Level Speculation.. - Marcuello, González (1999)   (6 citations)

....well as appropriate mechanisms to forward or to predict values produced by one thread and consumed by a different one. They differ in who splits the program into threads and how. In several proposals such as the Multiscalar [5] [22], the SPSM architecture [4], and the Superthreaded architecture [25], the compiler is responsible for splitting the program into threads whereas some others only rely on hardware techniques. Examples of the latter group are the Dynamic Multithreaded Processor [1] and the Clustered Speculative Multithreaded Processor [13] [14]. These previous works have shown that ....

....the program into threads based on several heuristics that try to minimize the data dependences among active threads or to have a better load balance, among other compiler criteria [28]. Other architectures such as the Multithreaded Decoupled [3], the SPSM [4], and the Superthreaded architectures [25] also rely on the compiler to split the program into threads, but in these cases, threads are assumed to be loop iterations instead of the more complex analysis of the Multiscalar compiler. On the other hand, some other architectures try to exploit thread level parallelism speculating on threads ....

J.Y. Tsai and P-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in Proc. of the Int. Conf. on Parallel Architectures and Compilation Techniques, pp. 35-46, 1996.


Value Prediction for Speculative Multithreaded.. - Marcuello, Tubella.. (1999)   (17 citations)

....as a superscalar processor does. This type of execution model was first proposed in the Expandable Split Window paradigm [5] and the Multiscalar microarchitecture [20]. Other more recent microarchitecture proposals that are also based on this generic execution model are: SPSM [4], Superthreaded [23], Trace Processors [16] [24], Speculative Multithreaded [13] [14], Dynamic Multithreaded [1], and extensions to multiprocessor architectures [8] [9] [10] [11] [15] [22]. A critical issue of such architectures is the approach to identify the most effective points where speculative threads can be spawned. ....

....architectures which provide support for speculative threads have recently appeared. Pioneer work on this topic was the Expandable Split Window paradigm [5] and the follow up work on Multiscalar processors [20]. Other proposals are the SPSM architecture [4], the Superthreaded architecture [23], the Multithreaded Decoupled architecture [3], the Trace Processor [16] [24], the Clustered Speculative Multithreaded architecture [13] [14], and the Dynamic Multithreaded processor [1]. Figure 10: Speed up versus single thread execution for the INCR predictor and a perfect predictor. ....

J.Y. Tsai and P-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in Proc. of the Int. Conf. on Parallel Architectures and Compilation Techniques, pp. 35-46, 1996.


Cooperating Threads Architecture: Improving both.. - Sundaramoorthy.. (2000)   (Correct)

....Researchers have demonstrated a tremendous amount of redundancy, repetition, and predictability in general purpose programs [5,7,8,13,24,26]. This prior research forms a basis for creating the shorter program in the cooperating threads architecture. Speculative multithreading architectures [1,6,18,27,28,29] speed up a single program by dividing it into speculatively parallel threads. The speculation model uses one architectural context and future threads are spawned within temporary, private contexts, each inherited from the preceding thread's context. Future thread contexts are merged into the ....

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. PACT-96, 1996.


A Study of Control Independence in Superscalar Processors - Rotenberg, Jacobson, Smith (1999)   (10 citations)  (Correct)

....important aspects of programs themselves were not modeled; in particular, a significant subset of data dependences were ignored due to the trace driven nature of the study. Several microarchitecture implementations have since been proposed that incorporate control independence in some form [11, 13, 14, 15, 16, 17, 18, 1]. In these studies, however, either the impact of control independence is not isolated, or insight into the reported performance gains is limited and obscured by artifacts of the particular design. In this paper we have three primary objectives and contributions. The first objective is to ....

....unconstrained limit study by Uht and Sindagi [2] uses a similar simulation approach, but in addition to studying minimal control dependences, a form of selective eager execution called disjoint eager execution is also studied. Multiscalar processors [11,13] and other multithreaded architectures [16, 17, 14, 15] exploit control independence by pursuing multiple flows of control. In the case of multiscalar, the compiler partitions the program into tasks, or subgraphs of the CFG. Arbitrary control flow may exist within a task, and the compiler need not guarantee that tasks be control and data independent. ....

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. Intl. Conf. on Parallel Architecture and Compilation Techniques, 1996.


The Need for Fast Communication in Hardware-Based.. - Krishnan, Torrellas (1999)   (6 citations)  (Correct)

....superscalar processor on the chip, many researchers have proposed decentralized architectures wherein multiple simpler processing units are configured on a single chip. Indeed, the chip multiprocessor (CMP) architecture has drawn great attention, with architects proposing various related designs [5, 10, 12, 16, 20, 22, 23, 24]. Though the CMP is an ideal platform to run multiple sequential applications or a fully parallel application, if it is to be fully accepted, it must also be able to give good performance when running a single sequential application or one that cannot be parallelized by the compiler effectively. ....

....CMPs handle these applications by resorting to a speculative mode of execution. In this mode, the threads that execute on the on chip processors do not need to be fully independent; they may have data dependences with each other. Such speculative threads may be identified either at compile time [4, 10, 12, 22, 23, 24] or completely at run time with hardware support [16, 19]. In these speculative CMPs, additional hardware support is needed to enforce inter thread dependences and ensure that sequential semantics are not violated. As a result, threads may be squashed and restarted when a dependence violation is ....

[Article contains additional citation context not shown here]

J. Tsai and P. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In PACT '96, pages 35-46, October 1996.


The SImulator for Multithreaded Computer Architecture.. - Jian Huang Department   (4 citations)  (Correct)

.... Science and Engineering University of Minnesota Minneapolis, MN 55455 Email:huangj cs.umn.edu Overview The SImulator for Multi threaded Computer Architecture (SIMCA) is built on top of the SimpleScalar tool set [1] in an effort to evaluate the performance of the superthreaded architecture [4, 5], and to explore the different design alternatives. Our compiler can compile superthreaded source codes written in C or FORTRAN into superthreaded binary, and this binary runs on the SIMCA. All processes are automated. The performance of SIMCA with no compiler optimization on an SGI Challenge ....

....thousand instructions per second when only one thread unit is active. The main contribution of this simulator is that it resolves many questions on the details of the hardware design, and it serves as a guide for the actual hardware implementation. 1 Introduction The superthreaded architecture [4, 5] uses the thread pipelining model to execute multiple threads concurrently for better performance. Data dependence is resolved in runtime while control dependences are speculated. In order to evaluate this architecture thoroughly, we need a detailed simulator. We started the development of SIMCA ....

J. Tsai, P.-C. Yew. "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation ". PACT'96, pp. 35-46.


Hardware for Speculative Run-Time Parallelization in.. - Zhang, Rauchwerger.. (1997)   (25 citations)  (Correct)

....loop and the loop is re executed serially. This form of speculative execution is different from the commonly proposed for advanced uniprocessors. Indeed, much work has been done in uniprocessors trying to issue loads ahead of stores even though the addresses of the locations accessed are unknown [7, 10, 15, 16]. This type of speculation is orthogonal to our work. Our framework exploits parallelism across processors in a multiprocessor. It can use these traditional techniques within each thread. We are effectively adding a second dimension of speculation. There are several advantages to our framework. A ....

J.Y. Tsai and P.C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques (PACT '96), 1996.


Control Independence in Trace Processors - Rotenberg, Smith (1999)   (6 citations)  (Correct)

....superscalar organization, and there is a reliance on the compiler to provide complete control dependence information. Nonetheless, the study is useful for understanding control independence. Multiscalar processors [16,2], Dynamic Multithreading [11], and other multithreaded architectures [17, 18, 19, 20] exploit control independence by pursuing multiple flows of control. Either the compiler or hardware partitions the program into tasks/threads, or subgraphs of the CFG, which may contain arbitrary control flow. Branch mispredictions within a task/thread may not cause subsequent tasks to squash if ....

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. PACT-96, 1996.


Multithreaded Architectures for Media Processing - Balakrishnan, Nandy   (Correct)

....via separate busses to the instruction and data caches. Other system interfaces for streaming data and display buffers that are part of any multimedia system have not been included in the figure. A processor in the SYMPHONY framework (SP) has hardware to support simultaneous multithreading [26, 25, 9]. A broad organization of an SP is shown in Fig. 1. An SP has a few extra hardware modules not seen in multithreaded architectures proposed earlier. These are the communication registers, the single assignment memory (SAM) module and the communication controller (CC). In addition to these modules, ....

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proc. of the Intl. Conf. on Parallel Architectures and Compilation Techniques, pages 49--58, 1996.


A Comparison of Different Multithreading Architectures - Hordijk, Corporaal (1997)   (Correct)

....(i.e. whether the dependencies are obeyed). The table holds all the data values of all stores, therefore the buffer can also resolve write after read and write after write memory dependencies. Superthreaded Architecture: The Superthreaded architecture classification can also be found in table 2 [7]. The processing units are connected in a unidirectional ring, just like in the Multiscalar architecture, and threads have a single successor. In contrast with the Multiscalar architecture, which executes threads highly speculatively, the Superthreaded architecture analyzes dependencies before they ....

J. Tsai and P. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques, pages 35--46, 1996.


Clustered Speculative Multithreaded Processors - Marcuello, González (1999)   (26 citations)  (Correct)

....4. Related work. This work is inspired by previous proposals for dynamically scheduled processors with support for multiple speculative threads, such as the Expandable Split Window paradigm [7], Multiscalar processors [33], the SPSM architecture [5], the Superthreaded architecture [36], the Multithreaded Decoupled architecture [4], Trace processors [29, 41], the Dependence Speculative Multithreaded Architecture [23], and Dynamic Multithreaded processors [2]. The Clustered Speculative Multithreaded architecture differs from previous proposals in the way that ....

J.Y. Tsai and P-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in Proc. of the Int. Conf. on Parallel Architectures and Compilation Techniques, pp. 35-46, 1996.


A Dynamically Adaptive Parallelization Model Based on Speculative.. - Kazi (2000)   Self-citation (Yew)   (Correct)

.... is proposed to exploit coarse grained loop level parallelism from general purpose application programs on shared-memory multiprocessor systems [23, 24]. The speculative multithreading parallelization model, which is based on the execution model of the superthreaded processor architecture [44, 45, 46], uses a thread pipelined execution model with run time data dependence checking and control speculation to parallelize loops with potential data dependences that cannot be analyzed at compile time as well as loops with traditionally sequential constructs. The run time dependence test in this ....

....of the array under test. As soon as parallelization fails for an access, the execution is aborted and restarted sequentially from the very beginning with all shared data restored to their original values. 2.1.3 Multithreaded Processors. A number of multithreaded processor architecture models [14, 42, 44] use control and/or data speculation to concurrently execute multiple threads. Unlike the hardware based speculative run time schemes discussed in the previous section that support speculative execution on existing multiprocessor architecture, these processor architecture models use specialized ....

[Article contains additional citation context not shown here]

J.Y. Tsai and P.C. Yew, The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), Oct. 1996, pp. 35-46.


Designing the Agassiz Compiler for Concurrent.. - Zheng, Tsai.. (1999)   (4 citations)  Self-citation (Tsai Yew)   (Correct)

....Processing University of Minnesota Cupertino, CA 95014 Fudan University MPLS, MN 55108 Shanghai, P.R. China Abstract In this paper, we present the overall design of the Agassiz compiler [1]. The Agassiz compiler is an integrated compiler targeting the concurrent multithreaded architectures [12][13]. These architectures can exploit both loop level and instruction level parallelism for general purpose applications (such as those in SPEC benchmarks). They also support various kinds of control and data speculation, runtime data dependence checking, and fast synchronization and communication ....

....of years. Exploiting parallelism at various granularity levels on a single chip for higher performance will soon become a reality, and even a necessity, for both uniprocessors and multiprocessors. As a matter of fact, with the recent introduction of concurrent multithreaded architectures [12][13], the line dividing traditional uniprocessors and multiprocessors for single chip architectures has gradually disappeared. A lot of parallel processing technologies, including parallelizing compiler technology, can be leveraged and extended to the more general purpose applications typified by the ....

[Article contains additional citation context not shown here]

J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In Proceedings of the Int'l Conf. on Parallel Architectures and Compilation Techniques, October 1996.


Min-Cut Program Decomposition for Thread-Level Speculation - Johnson, Eigenmann.. (2004)   (Correct)

No context found.

J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. In Proceedings of the International Conference on Parallel Architecture and Compiler Techniques, pages 35--46, October 1996.


Thread-Spawning Schemes for Speculative Multithreading - Pedro Marcuello And (2001)   (4 citations)  (Correct)

No context found.

J.Y. Tsai and P-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in Proc. of the Int. Conf. on Parallel Architectures and Compilation Techniques, pp. 35-46, 1996.


Improving Speculative Thread-Level Parallelism Through Module .. - Warg, Stenstrom (2003)   (Correct)

No context found.

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proceedings of the 1996.


Control and Data Dependence in Multithreaded Processors - Pedro Marcuello And (1998)   (Correct)

No context found.

J-Y. Tsai and P-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, pp. 35-46, 1996.


Memory Dependence Prediction - Andreas Ioannis Moshovos   (Correct)

No context found.

J.Y. Tsai and P.-C. Yew. The superthreaded architecture: thread pipelining with run-time data dependence checking and control speculation. In Proc. PACT'96, October 1996.


Exploiting Speculative Thread-Level Parallelism on a - Smt Processor Authors (1999)   (Correct)

No context found.

J-Y. Tsai and P-C. Yew, "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation", in Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, pp. 35-46, 1996.


Implications of Register and Memory Temporal Locality.. - Morano, Khalafi.. (2002)   (Correct)

No context found.

Tsai J-Y., Yew P-C. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 35--46, 1996.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)   (Correct)

No context found.

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. In 1996.


Appears in the proceedings of the 35th International.. - Craig Zilles Department (2002)   (Correct)

No context found.

J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation. PACT, Oct. 1996.


Low-Cost Thread-Level Data Dependence Speculation on.. - Rundberg, Stenström (2000)   (5 citations)  (Correct)

No context found.

J. Tsai and P. Yew. "The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation," in Proc. of


Slipstream Processors: Improving both Performance and Fault .. - Sundaramoorthy, al. (2000)   (26 citations)  (Correct)

No context found.

J.-Y. Tsai and P.-C. Yew. The Superthreaded Architecture: Thread Pipelining with Run-time Data Dependence Checking and Control Speculation. Parallel Architectures and Compiler Techniques, 1996.


A Study of Control Independence in Superscalar Processors - Eric Rotenberg Quinn (1999)   (10 citations)  (Correct)

No context found.

J.-Y. Tsai and P.-C. Yew. The superthreaded architecture: Thread pipelining with run-time data dependence checking and control speculation. PACT, 1996.


CiteSeer.IST - Copyright Penn State and NEC