21 citations found.
R. Govindarajan, S. Nemawarkar, and P. LeNir. Design and performance evaluation of a multithreaded architecture. In First IEEE Symposium on High-Performance Computer Architecture, pages 298--307, January 1995.

This paper is cited in the following contexts:
Execution and Cache Performance of the Scheduled Dataflow.. - Kavi, Arul, Giorgi (2000)

....87] [Tokoro 83]. Our current architecture specifically addresses the third limitation. Some researchers have proposed designs in which the dataflow scheduling is applied only at thread level (i.e. macro dataflow) while each thread is comprised of conventional control flow instructions [Govindarajan 95], [Hum 95], [Sakai 93]. In such hybrid dataflow/control flow systems, the instructions within a thread do not retain functional properties, and hence, introduce Write After Write (WAW) and Write After Read (WAR) dependencies. This in turn requires complex hardware to perform dynamic instruction ....

R. Govindarajan, S.S. Nemawarkar and P. LeNir. "Design and performance evaluation of a multithreaded architecture", Proceeding of the first High Performance Computer Architecture (HPCA-1), Jan. 1995, pp 298-307.


Unknown

....model. In this paper we propose a new architecture that addresses the third limitation. The literature has addressed several designs in which the dataflow scheduling was applied only at thread level (i.e. macro dataflow) where each thread was comprised of conventional control flow instructions [Govindarajan 95], [Hum 95], [Sakai 93]. In such systems, the instructions within a thread do not retain functional properties, and hence, introduce write after write (WAW) and write after read (WAR) dependencies. Consequently, deviation from dataflow properties at instruction level requires complex hardware. In ....

R. Govindarajan, S.S. Nemawarkar and P. LeNir. "Design and performance evaluation of a multithreaded architecture", Proceeding of the first High Performance Computer Architecture (HPCA-1), Jan. 1995, pp 298-307.


Scheduled Dataflow Architecture: A Synchronous Paradigm for.. - Kavi, Kim, Hurson

....of dataflow instructions, and schedule instructions for synchronous execution. There have been several hybrid architectures proposed where the dataflow scheduling was applied only at thread level (i.e. macro dataflow) with conventional control flow instructions comprising threads (e.g. [Govindarajan 95], [Hum 92], [Sakai 93]). In such systems, the execution of instructions within a thread do not retain the functional properties of dataflow, and introduce side effects, WAW and WAR dependencies. In our system, the instructions within a thread still retain dataflow properties. 1.1 Overview of ....

....conventional techniques such as result forwarding (where the results of an instruction are directly supplied to a dependent instruction) cannot be incorporated into the ETS pipeline. 1.3 Hybrid Architectures Hybrid dataflow/control flow organizations have been proposed by several researchers ([Govindarajan 95], [Hum 95], [Sakai 93]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load/store instructions. In one such proposed system, EARTH [Hum 95], two processors are used for execution of macro dataflow threads. One processor, ....

[Article contains additional citation context not shown here]

R. Govindarajan, S.S. Nemawarkar and P. LeNir. "Design and performance evaluation of a multithreaded architecture," Proc. of the HPCA-1, Jan. 1995, pp. 298-307.


Execution and Cache Performance of a Decoupled Non-Blocking .. - Kavi, Giorgi, Arul (2000)

....87] [Tokoro 83]. Here we present a new architecture that addresses the third limitation. Some researchers have proposed designs in which the dataflow scheduling is applied only at thread level (i.e. macro dataflow) while each thread is comprised of conventional control flow instructions [Govindarajan 95], [Hum 95], [Sakai 93]. In such systems, the instructions within a thread do not retain functional properties, and hence, introduce Write After Write (WAW) and Write After Read (WAR) dependencies. This in turn requires complex hardware to perform dynamic instruction scheduling. In our system, the ....

R. Govindarajan, S.S. Nemawarkar and P. LeNir. "Design and performance evaluation of a multithreaded architecture", Proceeding of the first High Performance Computer Architecture (HPCA-1), Jan. 1995, pp 298-307.


Scheduled Dataflow Architecture: A Synchronous Execution.. - Kavi, Kim, Hurson (1999)

....by token driven systems) and schedule instructions for synchronous execution. There have been several hybrid architectures proposed where the dataflow scheduling was applied only at thread level (i.e. macro dataflow) with conventional control flow instructions comprising threads (e.g. [6], [7], [8]). In such systems, the execution of instructions within a thread do not retain the functional properties of dataflow, and introduce side effects, WAW (or output) and WAR (or anti) dependencies. Not preserving dataflow properties at instruction level requires complex hardware for the ....

....conventional techniques such as result forwarding (where the results of an instruction are directly supplied to a dependent instruction) cannot be incorporated into the ETS pipeline. 1.3 Hybrid Architectures Hybrid dataflow/control flow organizations have been proposed by several researchers ([6], [10], [8]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load/store instructions. In one such proposed system, EARTH [10], two processors are used for the execution of macro dataflow threads. One processor, Execution Unit ....

[Article contains additional citation context not shown here]

R. Govindarajan, S.S. Nemawarkar & P. LeNir, Design and performance evaluation of a multithreaded architecture, Proc. of the HPCA-1, 1995, 298-307.


Analysis of Pipeline Stall Effects in Block Multithreaded.. - Zuberek (2000)

.... Two popular approaches to instruction level parallelism are known as superscalar and VLIW (very long instruction word) architectures in which several instructions can be issued in a single processor cycle [15] and instruction level multithreading [1, 3, 4] and in particular, block multithreading [2, 6]. Block multithreading tolerates long latency memory accesses and synchronization delays by switching to another thread rather than waiting for the completion of a long latency operation which, in a distributed memory system, can require a hundred or even more processor cycles. A combination of ....

Govindarajan, R., Nemawarkar, S.S., LeNir, P., "Design and performance evaluation of a multithreaded architecture"; Proc. First IEEE Symp. on High-Performance Computer Architecture, Raleigh, NC, pp.298-307, 1995.


Performance Comparison Of Fine-Grain And Block Multithreaded.. - Zuberek (2000)

....require a hundred or more processor cycles) the processor switches to another thread if such a thread is ready for execution. If different threads are associated with different sets of processor registers, switching from one thread to another can be done in one or just a few processor cycles [2, 4, 6]. A distributed memory system with 16 processors connected by a 2 dimensional torus like network is used as a running example in this paper; an outline of such a system is shown in Fig.1. Fig.1. Outline of a 16 processor system. It is usually assumed that the messages sent from one node to ....

Govindarajan, R., Nemawarkar, S.S., LeNir, P., 1995. "Design and performance evaluation of a multithreaded architecture"; Proc. First IEEE Symp. on High-Performance Computer Architecture, Raleigh, NC, pp.298-307.


Performance Modeling of Multithreaded Distributed Memory.. - Zuberek (1999)

....Switching to another thread can be performed very efficiently because the threads are executing in the same address space. If different threads are associated with different sets of processor registers, switching from one thread to another can be done in one or just a few processor cycles [6, 7]. A distributed memory system with 16 processors connected by a 2 dimensional torus like network is used as a running example in this paper; an outline of such a system is shown in Fig.1.1. It is usually assumed that the messages sent from one node to another are routed along the shortest paths. ....

Govindarajan, R., Nemawarkar, S.S., LeNir, P., "Design and performance evaluation of a multithreaded architecture"; Proc. First IEEE Symp. on High-Performance Computer Architecture, Raleigh, NC, pp.298-307, 1995.


A Decoupled Scheduled Dataflow Multithreaded Architecture - Kavi, Kim, Arul, Hurson (2000)

....execution (like PL PS) to alleviate memory latencies to further exploit multithreading. There have been several hybrid architectures proposed where the dataflow scheduling was applied only at thread level (i.e. macro dataflow) with conventional control flow instructions comprising threads (e.g. [5], [7], [14]). In such systems, the instructions within a thread do not retain functional properties, and introduce side effects, WAW and WAR dependencies. Lacking dataflow properties at instruction level requires complex hardware for the detection of data dependencies and dynamic scheduling of ....

R. Govindarajan, S.S. Nemawarkar and P. LeNir, "Design and performance evaluation of a multithreaded architecture," Proc. of the HPCA-1, Jan. 1995, pp. 298--307.


Advanced Vector Architectures - Espasa (1997)

....attention in recent years [ALKK90, Aga92, TEE 95, TEL96, HKN 92, EJK 96] and has been found to be generally useful. The latency properties of multithreading have been asserted by several researchers [LB96, BR92]. Research has produced many alternative multithreaded designs [GHG 91, GNL95, GB96], most focusing on extending high performance RISC cores with extra instructions or synchronization primitives to exploit thread level parallelism. Designs combine several degrees of hardware and software cooperation to detect, schedule and execute threads from applications [LC95, TE94, ....

R. Govindarajan, S. S. Nemawarkar, and Philip LeNir. Design and performance evaluation of a multithreaded architecture. In Proceedings of the First International Symposium on High-Performance Computer Architecture, pages 298--307, Raleigh, North Carolina, January 22--25, 1995. IEEE Computer Society TCCA.


Scheduled Dataflow Architecture: A Synchronous Execution.. - Kavi, Kim, Hurson (1999)

....of dataflow instructions, and schedule instructions for synchronous execution. There have been several hybrid architectures proposed where the dataflow scheduling was applied only at thread level (i.e. macro dataflow) with conventional control flow instructions comprising threads (e.g. [6], [9], [15]). In such systems, the execution of instructions within a thread do not retain the functional properties of dataflow, and introduce side effects, WAW and WAR dependencies. Not preserving dataflow properties at instruction level requires complex hardware for the detection of data ....

....conventional techniques such as result forwarding (where the results of an instruction are directly supplied to a dependent instruction) cannot be incorporated into the ETS pipeline. 1.3 Hybrid Architectures Hybrid dataflow/control flow organizations have been proposed by several researchers ([6], [8], [15]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load/store instructions. In one such proposed system, EARTH [8], two processors are used for the execution of macro dataflow threads. One processor, Execution Unit ....

[Article contains additional citation context not shown here]

R. Govindarajan, S.S. Nemawarkar and P. LeNir. "Design and performance evaluation of a multithreaded architecture," Proc. of the HPCA-1, Jan. 1995, pp. 298--307.


Improving Single-Process Performance with Multithreaded.. - Farcy, Temam (1996)   (3 citations)

....architectures were closer to data flow machines like [24] than to superscalar processors. The goal was to both hide latency and increase instruction throughput by implementing enhanced context switching techniques (like in [2] also novel thread interleaving techniques have been proposed [18, 9]. Because cache hierarchies, as used in current superscalar processors 1 , are relatively successful at hiding latency, larger gains can be expected from the high instruction throughput 1 First level and second level cache on chip in DEC 21164 [6] second level on the processor module in Intel ....

R. Govindarajan, S.S. Nemawarkar, and Philip Lenir. Design and Performance Evaluation of a Multithreaded Architecture. In International Symposium on Computer Architecture, pages 298--307, 1995.


Converting Thread-Level Parallelism to.. - Lo, Eggers, Emer, .. (1997)   (43 citations)

....application. Their simulations do not include caches or TLBs. Prasadh and Wu [23] as well as Keckler and Dally [15] have proposed architectures in which VLIW operations from multiple threads are dynamically interleaved onto a processor. The architectures described by Govindarajan, et al. [11], Gunther [13] and Beckmann and Polychronopoulos [2] partition issue bandwidth among threads, and only one instruction can be issued from each thread per cycle. These architectures lack flexible resource sharing, which contributes to resource waste when only a single thread is running. Studies by ....

R. Govindarajan, S. Nemawarkar, and P. LeNir. Design and performance evaluation of a multithreaded architecture. In First IEEE Symposium on High-Performance Computer Architecture, pages 298--307, January 1995.


Timed Petri Net Models of Multithreaded Multiprocessor.. - Govindarajan, Suciu, et al. (1997)   Self-citation (Govindarajan)

....instead of waiting the processor can switch to another thread and continue doing useful work. With multithreading, the processor utilization is largely independent of the latency in completing remote accesses. Several multithreaded architectures have recently been proposed in the literature [1, 6, 8, 10, 11, 15]. Analyzing the performance of such architectures is rather involved as it depends on a number of parameters related to the architecture memory latency time, context switching time, switch delay in interconnection network and a number of application parameters number of parallel ....

....switching time, switch delay in interconnection network and a number of application parameters number of parallel threads, runlengths of threads, remote memory access pattern and so on. The performance of multithreaded architectures have been evaluated using discrete event simulation [11, 15, 10], analytical models using either queuing networks or Petri Nets [2, 5, 17, 18, 14, 20] or using trace driven simulation [24]. Petri nets have been proposed as a simple and convenient formalism for modeling systems that exhibit parallel and concurrent activities [19, 16]. In order to take the ....

Govindarajan, R., Nemawarkar, S.S., LeNir, P., "Design and performance evaluation of a multithreaded architecture"; Proc. First IEEE Symp. on High-Performance Computer Architecture, Raleigh, NC, pp.298-307, 1995.


Performance Bounds for Distributed Memory Multithreaded.. - Zuberek, Govindarajan (1998)   Self-citation (Govindarajan)

....architectures propose to extract instruction level parallelism by grouping instructions from multiple instruction streams or threads. We generically refer to these architectures as simultaneous multithreading. Several simultaneous multithreaded architectures have been reported in the literature [12, 10, 8, 17]. Designing a multithreaded processor is an intricate process since each design decision impacts upon others. For example, a single instruction thread size may be desirable (e.g. HEP [16]) because instruction dependencies in pipelined execution can be eliminated (consecutive instruction are ....

....Since careful allocation of the CPU resource is vital for efficient execution of many applications, a larger thread size has to be tolerated so that suitable scheduling decision can be made. Several multithreaded architectures have been proposed which differ in the implementation of multithreading [1, 4, 5, 6, 8, 10, 12]. They differ in two basic aspects, in the number of instructions executed before switching to another thread (one, several, as many as possible) and the cause of context switching (every load, remote load). It is assumed in this paper that context switching can be performed very efficiently (in ....

Govindarajan, R., Nemawarkar, S.S., LeNir, P., "Design and performance evaluation of a multithreaded architecture"; Proc. First IEEE Symp. on High-Performance Computer Architecture, Raleigh, NC, pp.298-307, 1995.


Timed Petri Net Models of Multithreaded Multiprocessor.. - Govindarajan, Suciu.. (1997)   Self-citation (Govindarajan)

....instead of waiting, the processor can switch to another thread and continue doing useful work. With multithreading, the processor utilization is largely independent of the latency in completing remote accesses. Several multithreaded architectures have recently been proposed in the literature [Ag90, Al90, Cu91, GN95, Hi92, KD92]. Analyzing the performance of such architectures is rather involved as it depends on a number of parameters related to the architecture memory latency time, context switch time, switching delay in interconnection network and a number of application parameters number of parallel ....

....switch time, switching delay in interconnection network and a number of application parameters number of parallel threads, runlength of threads, remote memory access pattern and so on. The performance of multithreaded architectures have been evaluated using discrete event simulation [Hi92, KD92, GN95], analytical models using either queuing networks or Petri Nets [Ag91, AB91, NG93, NG95, Jo92, SB90] or using trace driven simulation [WG89]. Petri nets have been proposed as a simple and convenient formalism for modeling systems that exhibit parallel and concurrent activities [Re85, Mu89] ....

Govindarajan, R., Nemawarkar, S.S., LeNir, P., "Design and performance evaluation of a multithreaded architecture"; Proc. First IEEE Symp. on High-Performance Computer Architecture, Raleigh, NC, pp.298-307, 1995.


Performance Balancing in Multithreaded Multiprocessor Systems - Zuberek, et al. (1998)   Self-citation (Govindarajan)

....the interconnection network and a number of application parameters (average) number of parallel threads, average number of instructions per thread, memory (local remote) access pattern, etc. The performance of multithreaded architectures has been evaluated using discrete event simulation [8, 10, 6], analytical models using either queueing models or Petri nets [2, 11] or using trace driven simulation [12]. The approach used in this paper is based on general rules of operational analysis [5]. These rules are used to derive simple conditions for a multithreaded multiprocessors system to ....

Govindarajan, R., Nemawarkar, S.S., LeNir, P., "Design and performance evaluation of a multithreaded architecture"; Proc. First IEEE Symp. on High-Performance Computer Architecture, Raleigh, NC, pp.298-307, 1995.


Classification and Performance Evaluation of Simultaneous.. - Krishna, Govindarajan (1997)   Self-citation (Govindarajan)

....threads to issue instructions to multiple functional units in each cycle. The objective of SM is to substantially increase the instruction level parallelism that can be exploited by the processor. Several simultaneous multithreaded (SM) architectures have been reported in the literature [4, 5, 6, 7, 10]. In this paper, we propose a classification of SM architectures. The classification is based on (i) the maximum number of threads from which instructions can be issued in each cycle, (ii) the maximum number of instructions that can be issued from each thread in a single cycle, and (iii) the ....

....(iii) the number of operations in each instruction. The proposed classification helps us to understand the present trend of technology and to explore new avenues for greater performance in multithreaded architecture. Based on this classification, we propose a modification to the LCM architecture [5] which better utilizes the hardware features of modern superscalar architectures. We evaluate the performance of SM architectures using discrete event simulation. Lastly, we report the effect of different network routing techniques on the performance of SM architectures. The rest of the paper is ....

[Article contains additional citation context not shown here]

R. Govindarajan, S. S. Nemawarkar, and P. LeNir. Design and performance evaluation of a multithreaded architecture. In Proc. of the 1st Intl. Symp. on High-Performance Computer Architecture, pages 298--307, Jan. 1995.


Timed Colored Petri Net Models of Distributed Memory.. - Zuberek.. (1998)   (1 citation)  Self-citation (Govindarajan)

....and processors can be hidden by other activities. The same hardware mechanisms can be used to synchronize interprocess communication and to alleviate operating system overheads. Several multithreaded architectures have recently been proposed which differ in the implementation of multithreading [1, 3, 5, 7, 9]. Switching from one thread to another can be performed under different circumstances [4]: • Switching on every instruction: one instruction is picked from each of runnable threads and is inserted into the processor's pipeline; if there are many threads, then each stage of the pipeline is ....

....switching time, switch delay in the interconnection network and a number of application parameters number of parallel threads, runlengths of threads, remote memory access pattern and so on. The performance of multithreaded architectures have been evaluated using discrete event simulation [9, 12, 7], analytical models using either queuing networks or Petri nets [2, 17] or using trace driven simulation [19]. Event driven simulation of the timed colored net models [21] was used to obtain the results presented in this paper. This paper presents Petri net models of several multithreaded ....

Govindarajan, R., Nemawarkar, S.S., LeNir, P., "Design and performance evaluation of a multithreaded architecture"; Proc. First IEEE Symp. on High-Performance Computer Architecture, Raleigh, NC, pp.298--307, 1995.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)

No context found.

R. Govindarajan, S. Nemawarkar, and P. LeNir. Design and performance evaluation of a multithreaded architecture. In First IEEE Symposium on High-Performance Computer Architecture, pages 298--307, January 1995.


Execution Performance of the Scheduled Dataflow Architecture - Kavi

No context found.

R. Govindarajan, S.S. Nemawarkar and P. LeNir. "Design and performance evaluation of a multithreaded architecture", Proceeding of the first High Performance Computer Architecture (HPCA-1), Jan. 1995, pp 298-307.
