22 citations found.
TSAI, J. Y., HUANG, J., AMLO, C., LILJA, D., AND YEW, P. C. 1999. The superthreaded processor architecture. IEEE Trans. on Computers 48, 9 (Sept.), 881--902.

This paper is cited in the following contexts:
Appears in the Proceedings of the 30th Annual International - Symposium On Computer   (Correct)

....to a level that makes them incompatible with production runs. If we could design effective debugging hardware that remained on while in production runs, human time to debug programs would be reduced drastically. Fortunately, well known architectural support for Thread Level Speculation (TLS) [5, 10, 11, 19, 26, 27] can be reused for software debugging. Previous work has pointed out that TLS's ability to squash or commit the side effects of code sections as a unit can be used to enhance software reliability [17]. In this paper, we claim that TLS has the potential to be a core technology for software ....

....execution order do not meet this requirement, we need special hardware support. In the rest of the paper, we present a design that fulfills these requirements. 3. Reusing TLS Mechanisms for Debugging TLS is a technique that has been proposed for speculatively parallelizing sequential programs [5, 10, 11, 19, 26, 27]. However, we claim that its architectural mechanisms can be easily adapted to provide support for our debugging requirements. In this section, we first give an overview of TLS and then describe how it can be adapted to provide the desired debugging supports. 3.1. Brief Overview of TLS With TLS, ....

[Article contains additional citation context not shown here]

J. Y. Tsai et al. The Superthreaded Processor Architecture. IEEE Trans. on Computers, 48(9):881--902, 1999.


Tradeoffs in Buffering Memory State for.. - Garzarán..   (Correct)

....under grants EIA 0081307, EIA 0072102, and EIA 0103741; by DARPA under grant F30602 01 C 0078; by the Ministry of Education of Spain under grant TIC 2001 0995 C02 02; and by gifts from IBM and Intel. 21, 23, 24, 26] to software based (e.g. [7, 11, 17, 18]) and targeting small machines (e.g. [1, 8, 10, 12, 14, 15, 20, 23, 24]) or large ones (e.g. [4, 6, 11, 16, 17, 18, 21, 26]). Each scheme for thread level speculation has to solve two major problems: detection of violations and, if a violation occurs, state repair. Most schemes detect violations in a similar way: data that is speculatively accessed (e.g. read) is ....

....safe state in order. This is challenging in multiprocessors, given their distributed caches and buffers. A variety of approaches to buffer and manage speculative memory state have been proposed. In some proposals, tasks buffer unsafe state dynamically in caches [4, 6, 10, 14, 21], write buffers [12, 24], or special buffers [8, 16] to avoid corrupting main memory. In other proposals, tasks generate a log of updates that allow them to backtrack execution in case of a violation [7, 9, 25, 26]. Often, there are large differences in the way caches, buffers, and logs are used in different schemes. ....

[Article contains additional citation context not shown here]

J. Y. Tsai, J. Huang, C. Amlo, D. Lilja, and P. C. Yew. The Superthreaded Processor Architecture. IEEE Trans. on Computers, 48(9):881--902, September 1999.


Execution Performance of the Scheduled Dataflow Architecture - Kavi   (Correct)

.... to multithreading, there is a consensus that multithreading in general, achieves higher instruction issue rates on processors that contain multiple functional units (e.g. Superscalars and VLIW) or multiple processing elements (i.e. Chip Multiprocessors) [Butler 91] [Krishnan 99] [Lam 92] [Tsai 99] [Wall 91]. It is necessary to find an appropriate multithreaded model and implementation to achieve the best possible performance. We believe that the use of non blocking dataflow based threads are appropriate for improving the performance of Superscalar architectures. Dataflow ideas are often ....

J. Y. Tsai, J. Huang, C. Amlo, D. Lilja, and P. C. Yew. "The Superthreaded processor architecture", IEEE Trans. on Computers, Sept. 1999, pp. 881-902.


Execution and Cache Performance of a Decoupled Non-Blocking .. - Kavi, Giorgi, Arul (2000)   (Correct)

.... there is a consensus that multithreading, in general, achieves higher instruction issue rates in processors that contain multiple functional units (e.g. superscalars and VLIW) or multiple processing elements (i.e. Chip Multiprocessors) [Butler 91] [Kavi 98a] [Krishnan 99] [Lam 92] [Tsai 99] [Wall 91]. It is necessary to find an appropriate multithreaded model and implementation to achieve the best possible performance. We believe that the use of non blocking dataflow based threads are appropriate for improving the performance of superscalar architectures. Dataflow ideas are ....

J. Y. Tsai, J. Huang, C. Amlo, D. Lilja, and P. C. Yew. "The Superthreaded processor architecture", IEEE Trans. on Computers, Sept. 1999, pp. 881-902.


Eliminating Squashes Through Learning Cross-Thread.. - Cintra, Torrellas (2002)   (5 citations)  (Correct)

....data dependences, such as those that contain pointer accesses, references to arrays with non linear subscripts, very irregular control flow, or accesses across complicated procedure calling patterns. To extract parallelism in such codes, speculative thread level parallelization has been proposed [1, 4, 6, 7, 8, 10, 12, 18, 19, 20, 22, 23, 25, 26, 32]. In this approach, potentially dependent threads are speculatively executed in parallel, hoping not to violate dependences. If a cross thread dependence is violated at run time, a corrective action is triggered to repair the state. Such an action often involves squashing one or several threads. ....

....schemes for speculative parallelization differ in many ways. For example, some schemes rely on support code inserted by the compiler to check for dependence violations and to perform corrective actions [7, 19, 20]. Other schemes rely on special hardware to perform some or all of these operations [1, 4, 6, 8, 10, 12, 18, 22, 23, 25, 26, 32]. This work was supported in part by the National Science Foundation under grants CCR 9970488, EIA 0081307, and EIA 0072102; by DARPA under grant F30602 01 C 0078; and by gifts from IBM and Intel. Work conducted in part while the author was with the Department of Computer Science at the ....

[Article contains additional citation context not shown here]

J.-Y. Tsai, J. Huang, C. Amlo, D. Lilja, and P.-C. Yew. "The Superthreaded Processor Architecture." IEEE Trans. on Computers, Special Issue on Multithreaded Architectures, Vol. 48, No. 9, pages 881-902, September 1999.


Compiler Generated Multithreading to Alleviate Memory Latency - Beyls, D'Hollander (2000)   (Correct)

....plenty of parallel instructions ready to execute can be found in the computation thread. To completely overlap the data fetch time, the computation thread must be able to continue computation on main memory access. In a simultaneous multithreaded processor [Kwak et al. 1999, Farcy et al. 1996, Tsai et al. 1999, Loikkanen et al. 1996, Tullsen et al. 1995] this problem does not arise since the cache missing instructions reside in the data fetch thread and the independent instructions reside in the computing thread. In multithreaded processors, instructions can leave the reservation station as soon as their ....

Jenn-Yuan Tsai, Jian Huang, Christoffer Amlo, David J. Lilja, and Pen-Chung Yew. The superthreaded processor architecture. IEEE Transactions on Computers, 48(9):881-902, Sept. 1999.


Multiplex: Unifying Conventional and Speculative.. - Kim, Ooi, Park..   (Correct)

....To preserve program execution correctness, implicitly threaded hardware identifies and satisfies all dependences among implicit threads. Examples of proposed architectures using implicit threading are Multiscalar [29] and Trace Processor [28], Hydra [18], Stampede [30], Superthreaded processor [33], Speculative NUMA [10], and MAJC [32]. To maintain program correctness, implicitly threaded architectures rely on the hardware to track dependence among threads and verify correct speculation. Upon a misspeculation, the hardware rolls back the system to a state conforming to sequential ....

....change the trade off between implicit and explicit threads. 6 Related Work There are many projects exploring architectural proposals for implicit threading such as Wisconsin Multiscalar [29, 15] and Trace Processor [28], Stanford Hydra [18], CMU Stampede [30], Minnesota Superthreaded processor [33], Illinois Speculative NUMA [10], and SUN Microsystems MAJC [32]. While Multiplex proposes techniques to unify implicit and explicit threading within a single application, these FIGURE 8: Reducing explicit thread dispatch overhead in class 1 applications. The figure illustrates the effect of the ....

[Article contains additional citation context not shown here]

J.-Y. Tsai, J. Huang, C. Amlo, D. Lilja, and P.-C. Yew. The superthreaded processor architecture. IEEE Transactions on Computers, 48(9), Sept. 1999.


Using Incorrect Speculation to Prefetch Data in a.. - Chen, Sendag, Lilja (2002)   Self-citation (Lilja)   (Correct)

No context found.

J-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P-C. Yew, "The Superthreaded Processor Architecture". IEEE Transactions on Computers, Special Issue on Multithreaded Architectures and Systems, September, 1999.


Using Incorrect Speculation to Prefetch Data in a.. - Chen, Sendag, Lilja (2002)   Self-citation (Lilja)   (Correct)

No context found.

J-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P-C. Yew, "The Superthreaded Processor Architecture," IEEE Transactions on Computers, Special Issue on Multithreaded Architectures and Systems, pp. 881-902, September, 1999.


The Effect of Executing Mispredicted Load.. - Multithreaded..   Self-citation (Lilja)   (Correct)

No context found.

Jenn-Yuan Tsai, Jian Huang, Christoffer Amlo, David J. Lilja, and Pen-Chung Yew, "The Superthreaded Processor Architecture". IEEE Transactions on Computers, Special Issue on Multithreaded Architectures and Systems, September, 1999.


Evaluating Novel Memory System Alternatives for Speculative .. - KleinOsowski, Lilja   Self-citation (Lilja)   (Correct)

....Threads of execution within a program are typically loop iterations or multiple paths of a control structure. These threads most often have cross iteration data dependences that are difficult, if not impossible, for the compiler to detect at compile time. Therefore, multithreaded architectures [6, 8, 14, 18, 21] require hardware to support data dependence checking and speculative execution. In many multithreaded architectures, the compiler identifies possible data dependences and then special hardware determines, at runtime, whether these data dependences are true dependences or simply false alarms. True ....

....buffer. In the remainder of this paper, Section 2 describes various multithreaded architectures and how they handle data dependence checking and correcting. Section 2 also describes the motivation for developing a novel, hybrid cache structure. Section 3 describes the Superthreaded Architecture [18] model in detail. Section 4 describes our experimental model and the different configurations we evaluate in this work. Section 5 states the results of our simulations. In Section 6 we present ideas for continuing this work and in Section 7 we conclude with our final recommendation of the best ....

[Article contains additional citation context not shown here]

J. Tsai, J. Huang, C. Amlo, D. Lilja, and P. Yew, "The Superthreaded Processor Architecture," IEEE Transactions on Computers, Volume 48, Number 9, September 1999, p. 881-902.


A Comprehensive Dynamic Processor Allocation Scheme for.. - Kazi, Lilja (2000)   Self-citation (Lilja)   (Correct)

....that supports native threads. The dynamic processor allocation scheme presented in this paper is general enough to be applied to programs parallelized with other standard techniques, however. JavaSpMT is based on the fine grained thread pipelining model proposed for the superthreaded architecture [11, 12, 13]. The superthreaded architecture exploits task level parallelism using multiple threads of control. Each thread runs on a separate thread processing unit, each with its own program counter and 3 Fork and Forward target store addr Fork i 1th thread Forward continuation variable ....

....through explicit thread management and communication instructions. The execution of a thread is partitioned into four different stages continuation, target store address generation, computation, and write back. JavaSpMT extends the basic thread pipelining model of the superthreaded processor [11, 12, 13] to speculatively parallelize coarse grained Java applications on a shared memory multiprocessor system. The speculative execution allows loops with indeterminate termination conditions (e.g. do while loops) or complex branching structures (e.g. nested if then else) to be parallelized. Unlike the ....

J. Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P. C. Yew, The Superthreaded Processor Architecture, In IEEE Transactions on Computers, Special Issue on Multithreaded Architectures and Systems, September 1999, pp. 881-902.


When All Else Fails, Guess: The Use of Speculative Multithreading.. - Lilja (2000)   Self-citation (Lilja)   (Correct)

....values into their private local memory buffers. Figure 3 shows how the example loop from Figure 1 would be rewritten for this speculative multithreading execution model [12]. IV. The Superthreaded Processor A. Architecture We have developed the superthreaded processor architecture [12], [13], [14] to support the execution of the specu Continuation Stage L1: i1 = i; storets( i,i1 1) fork L1; Target Store Address Generation Stage allocatets( minclk) waittsagdone; releasetsagdone; Computation Stage if (functunits[i1] class = ILLEGALCLASS ) abortfuture; i = ....

....techniques take into consideration the hardware support for speculation and runtime data dependence checking in the superthreaded architecture. C. Performance Evaluation We have developed a simulator for the superthreaded processor [19] that we have used to characterize its performance potential [14]. This simulator is based on the SimpleScalar simulator [20] that is commonly used among the computer architecture research community. Our simulation methodology builds a framework around the SimpleScalar simulator to include the components that are unique to the superthread architecture, such as ....

[Article contains additional citation context not shown here]

Jenn-Yuan Tsai, Jian Huang, Christoffer Amlo, David J. Lilja, and Pen-Chung Yew, "The superthreaded processor architecture," in IEEE Transactions on Computers, Special Issue on Multithreaded Architectures and Systems, September 1999, pp. 881--902.


The SImulator for Multithreaded Computer Architecture.. - Jian Huang Department   (4 citations)  Self-citation (Huang)   (Correct)

.... Science and Engineering University of Minnesota Minneapolis, MN 55455 Email:huangj cs.umn.edu Overview The SImulator for Multi threaded Computer Architecture (SIMCA) is built on top of the SimpleScalar tool set [1] in an effort to evaluate the performance of the superthreaded architecture [4, 5], and to explore the different design alternatives. Our compiler can compile superthreaded source codes written in C or FORTRAN into superthreaded binary, and this binary runs on the SIMCA. All processes are automated. The performance of SIMCA with no compiler optimization on an SGI Challenge ....

....thousand instructions per second when only one thread unit is active. The main contribution of this simulator is that it resolves many questions on the details of the hardware design, and it serves as a guide for the actual hardware implementation. 1 Introduction The superthreaded architecture [4, 5] uses the thread pipelining model to execute multiple threads concurrently for better performance. Data dependence is resolved in runtime while control dependences are speculated. In order to evaluate this architecture thoroughly, we need a detailed simulator. We started the development of SIMCA ....

[Article contains additional citation context not shown here]

J. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P.-C. Yew. "The Superthreaded Processor Architecture". To appear in the IEEE Transactions on Computers, Special Issue on Multithreaded Architectures, September, 1999.


Extending Value Reuse to Basic Blocks with Compiler Support - Huang, Lilja (2000)   (2 citations)  Self-citation (Huang Lilja)   (Correct)

....and multithreading are two techniques that have been introduced to extend the limits of instruction level parallelism. Some recently proposed processors that incorporate these techniques include the multiscalar architecture [2], the trace processor (TP) [13], the superthreading architecture [17, 18], the multiprocessor on a chip (MOAC) [10], and the superspeculative processor (SSP) [7]. The multiscalar and trace processor architectures advocate a wide issue multi threaded approach, while the MOAC incorporates multiple separate processors on a single chip. The superthreaded processor is a ....

J.-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja and P.-C. Yew. "The Superthreaded Processor Architecture". In IEEE Transactions on Computers, vol. 48, no. 9, Sep., 1999, Pages ??.


Tradeoffs in Buffering Speculative Memory State for .. - Garzaran.. (2005)   (Correct)

No context found.

TSAI, J. Y., HUANG, J., AMLO, C., LILJA, D., AND YEW, P. C. 1999. The superthreaded processor architecture. IEEE Trans. on Computers 48, 9 (Sept.), 881--902.


iWatcher: Efficient Architectural Support for Software.. - Zhou, Qin, Liu, Zhou.. (2004)   (1 citation)  (Correct)

No context found.

J.-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P.-C. Yew. The superthreaded processor architecture. IEEE Transactions on Computers, September 1999.


iWatcher: Efficient Architectural Support for Software.. - Zhou, Qin, Liu, Zhou.. (2004)   (1 citation)  (Correct)

No context found.

J.-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P.-C. Yew. The superthreaded processor architecture. IEEE Transactions on Computers, September 1999.


iWatcher: Efficient Architectural Support for Software Debugging - Pin Zhou Feng (2004)   (1 citation)  (Correct)

No context found.

J.-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P.-C. Yew. The superthreaded processor architecture. IEEE Transactions on Computers, September 1999.


Return-Address Prediction in Speculative Multithreaded.. - Zahran, Franklin   (Correct)

No context found.

J-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P-C. Yew. The superthreaded processor architecture. IEEE Transactions on Computers, 48(9):881--902, 1999.


A Binary Translation System for Multithreaded Processors .. - Ootsu, Yokota, Ono, Baba   (Correct)

No context found.

J. Y. Tsai, J. Huang, et al., "The Superthreaded Processor Architecture," IEEE Transactions on Computers, Special Issue on Multithreaded Architectures, Vol. 48, No. 9, 1999.


iWatcher: Efficient Architectural Support for Software.. - Zhou, Qin, Liu, Zhou.. (2004)   (1 citation)  (Correct)

No context found.

J.-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P.-C. Yew. The superthreaded processor architecture. IEEE Transactions on Computers, September 1999.
