127 citations found.
R. Nikhil, G. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In Intl Symposium on Computer Architecture, pages 156--169, May 1992.

This paper is cited in the following contexts:

Fine Grained Multithreading with Process Calculi - Lopes, Silva, Vasconcelos (2001)

....In the dataflow model, computations are triggered by the availability of all input values to an instruction (the firing rule). This makes the model totally asynchronous and the instructions self-scheduling. Dataflow architectures range from pure dataflow [3, 10, 24], hybrid dataflow/control-flow [8, 22, 23], and lately multithreaded RISC [1, 2, 26] designs. Multithreading aims to provide high processor utilization in the presence of large memory or interprocessor communication latency. These high-latency operations are overlapped with computation by rapidly switching to the execution of other ....

R. Nikhil, G. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In 19th International Symposium on Computer Architecture, pages 156--167, 1992.


Mini-threads: Increasing TLP on Small-Scale SMT Processors - Joshua Redstone Susan (2003)   (3 citations)

....partitioning to reduce context switch overhead. Multiple threads executing within a single stack frame and possibly a single register set have been investigated in the context of dataflow architectures on large parallel machines, such as *T, pRISC, and TAM [20, 21, 10]. In these architectures, threads consist of only a few instructions, and are used to hide latencies of, for example, memory operations. The compiler manages register and stack frame usage between threads. This management is typically conservative because of the large number of threads and the ....

NIKHIL, R. S., PAPADOPOULOS, G. M., AND ARVIND. *T: A multithreaded massively parallel architecture. In Proceedings of the International Symposium on Computer Architecture (May 1992).


The M-Machine Multicomputer - Fillo, Keckler, Dally, Carter.. (1995)   (22 citations)

....of the MAP provide fast single-thread execution as well as latency tolerance for better local memory bandwidth utilization. Furthermore, none of the multithreaded machines have multiple clusters for exploiting wide instruction-level parallelism. Various machines optimized for dataflow languages [24, 16, 28] provide hardware support for fine-grained synchronization between threads (usually via memory synchronization bits), but they do not exploit instruction-level parallelism, nor do they provide low-cost register-based synchronization between threads. The XIMD architecture [33] uses multiple ALUs to ....

NIKHIL, R. S., PAPADOPOULOS, G. M., AND ARVIND. *T: A multithreaded massively parallel architecture. Computation Structures Group Memo 325-1, Laboratory for Computer Science, Massachusetts Institute of Technology, Nov. 1991.


An Analysis of Software Interface Issues for SMT Processors - Redstone (2002)   (1 citation)

....and suggest compiler-based register file partitioning to reduce context switch overhead. Multiple threads executing within a single stack frame and possibly a single register set have been investigated in the context of dataflow architectures on large parallel machines, such as *T, pRISC, and TAM [53, 54, 24]. In these architectures, threads consist of only a few instructions, and are used to hide latencies of, for example, memory operations. The compiler manages register and stack frame usage between threads. This management is typically conservative because of the large number of threads and the ....

NIKHIL, R. S., PAPADOPOULOS, G. M., AND ARVIND. *T: A multithreaded massively parallel architecture. In Proceedings of the International Symposium on Computer Architecture (May 1992).


Implementation of the StarT-Voyager Bus Interface Units - Conley (1997)

....other designs, including StarT Voyager, utilize a secondary protocol processor (along with a smaller amount of custom logic) to provide both forms of inter-processor communication. 1.2 The StarT Voyager Project: StarT Voyager is a continuation of the parallel computing research begun in StarT [14] and continued in StarT NG [3]. StarT and StarT NG were designed with the dataflow programming model in general and the Id language [13] in particular; the Voyager project .... [Figure 1-1: A StarT Voyager site]

....18 of the NESBufferOp, as detailed in Table 2.36. The exact encoding of the various operations, as well as the actions required for each one, are described in the following sections. Two common functions exist: any NESBuffer command can explicitly increment the DMARxDataQ CPtr, as indicated .... [Table 2.36: NESBuffer Operations. Field [14,16]: 00 = DMA Receive Command; 01 = NES Mastered Operation; 10 = DMARx Mastered Operation; 11 = Miscellaneous Commands.] [Table, columns Field / Content / Description: [0] X DMAIncrement; [1:3] X Unused; [4:13] X aSRAM Address; [14:16] 100 Fixed Field; [17] 0 Do not acknowledge, 1 ....]

Rishiyur S. Nikhil, Gregory M. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In Proceedings of the 19th International Symposium on Computer Architecture, pages 224--235, May 1992.


Integration of Message Passing and Shared Memory.. - Heinlein.. (1994)   (40 citations)

....handles the complete transfer, thus taking cycles away from the main computation. Several systems have proposed delegating message protocol handling to a second processor on the node to alleviate overheads and allow for overlap of computation and communication. The Intel Paragon [tt] and *T [2, 16] designs advocate the use of a second processor on the same node. The key issues in these designs are the level of integration with the network, the protocol handling performance of the second processor, and the additional cost and complexity of providing this processor. Paragon uses a ....

Rishiyur Nikhil, Gregory M. Papadopoulos, and Arvind. *T: A multithreaded massively parallel architecture. In Proceedings of the 19th International Symposium on Computer Architecture, pages 156--167, May 1992.


Dynamic Characteristics of Multithreaded Execution in.. - Hirofumi Sakane.. (1995)   (1 citation)

....data allocated to another processor. This remote memory latency is often regarded as the main bottleneck towards high performance. Various techniques have been developed to reduce/tolerate/hide the latency, including data partitioning, runtime load balancing, multithreading, coherent cache, etc. [1, 2, 3, 4, 6, 7]. Of these techniques, multithreading is known to be a latency tolerance approach. The idea behind multithreading is to overlap computation and communication such that the effect of communication is minimal, if not negligible. Studies have indicated that multithreading is effective for a ....

....approach. The idea behind multithreading is to overlap computation and communication such that the effect of communication is minimal, if not negligible. Studies have indicated that multithreading is effective for a dataflow-based architecture suitable for fine-grain parallel computations [4, 7, 8, 11]. An analytical study on multithreading indicated that increasing the number of threads exposed a saturated region in which little improvement is expected [1]. However, this result is based on trace-level simulation, ignoring physical conditions such as actual network configuration. It is not clear ....

Nikhil, R.S., Papadopoulos, G.M., Arvind. *T: A Multithreaded Massively Parallel Architecture, Proc. 19th Annual Int. Symp. on Computer Architecture, (1992), pp. 156-167.


Functional Encapsulation and Type Reconstruction in a.. - Gupta (1995)   (2 citations)

....we have implemented all the three schemes for the same source language, compiler, and the target architecture. Our source language is Id, which is a polymorphic, strongly typed, implicitly parallel programming language [Nik91]. We are compiling Id for the *T multiprocessor architecture [NPA92, PBGB93] and executing it on an emulator for that machine. We have chosen a very simple mark and sweep garbage collection algorithm so that the cost of object identification can be clearly identified during the mark phase. The wall clock performance of the garbage collection algorithm is ....

....rather than generating compositions of a fixed set of datatype marking functions as shown above. This would clearly reduce the overhead of using higher order marking functions. 8.5 *T Implementation: *T is a parallel, distributed memory machine with a high performance interconnection network [NPA92, PBGB93]. The *T architecture extends a basic RISC instruction set with low overhead, user mode communication and synchronization primitives. The details of the architecture may be found elsewhere [Bec92]. In this section, we briefly summarize some of the .... [Footnote 5: Readers familiar with Haskell's type ....]

Rishiyur S. Nikhil, Gregory M. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In Proceedings of the 19th International Symposium on Computer Architecture, Queensland, Australia. ACM Press, May 1992.


Fine Grained Multithreading with Process Calculi - Lopes, Silva (2000)

....In the dataflow model, computations are triggered by the availability of all input values to an instruction (the firing rule). This makes the model totally asynchronous and the instructions self-scheduling. Dataflow architectures range from pure dataflow [3, 10, 24], hybrid dataflow/control-flow [8, 22, 23], and lately multithreaded RISC [1, 2, 26] designs. Multithreading aims to provide high processor utilization in the presence of large memory or interprocessor communication latency. These high-latency operations are overlapped with computation by rapidly switching to the execution of other ....

R. Nikhil, G. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In 19th International Symposium on Computer Architecture, pages 156--167, 1992.


Explicit Dynamic Scheduling: A Practical Micro-Dataflow.. - Beckmann.. (1993)

....resemble conventional RISC machines [9, 18, 25]. Content-addressable matching stores have been replaced by explicit token stores and synchronization bits in memory, and parallelism is exploited via lightweight threads that are created and managed in software. This threaded execution model [11, 24, 23] is designed to deal effectively with the occasional long-latency operation, as EDS is. However, since the granularity of threads is larger than individual instructions, the hybrid dataflow model is not appropriate for exploiting the finest-grained parallelism, as EDS and static instruction ....

Rishiyur S. Nikhil, G. M. Papadopoulos, and Arvind. *T: A multithreaded massively parallel architecture. In Proceedings of the 19th International Symposium on Computer Architecture, pages 156--167, May 1992.


FUGU: Implementing Translation and Protection in a .. - Mackenzie.. (1994)   (9 citations)

....in the best case situation, although it provides concurrency in most other situations. Typhoon provides network protection by context switching the network in the manner of the CM-5 [17]. Context switching the network fails to support a client-server model of interprocess communication. The *T [18] processor uses a memory coprocessor model as well. *T does not include DMA facilities or coherent caches and thus does not address the interaction of these features with messaging. A recent *T paper [20] has independently proposed protection mechanisms similar to FUGU's for short messages and ....

R. S. Nikhil, G. M. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In Proceedings of the 19th International Symposium on Computer Architecture, pages 156--167. ACM, 1992.


Arctic Routing Chip - Boughton (1994)   (23 citations)

....which provides error detection, limited error handling, and in-circuit testability. 1 Introduction: Arctic is a four-input, four-output packet router implemented on a Motorola H4CP gate array chip. Arctic has all the features necessary for use in a commercial multiboard multiprocessor such as *T [7, 8, 1]. It has high bandwidth (200 MBytes/sec/port), two priority levels, packet sizes up to 96 bytes, and extensive error detection; it has limited error handling, keeps statistics, can directly drive long PC traces, and provides significant testing support. While Arctic has special features to support ....

R. S. Nikhil, G. M. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In Proceedings of the 19th International Symposium on Computer Architecture, May 1992.


Infrastructure for Research towards Ubiquitous.. - Grosz, Kung.. (1994)

....and fast buffer mapping which would support higher performance than the equivalent software structures. This type of caching is already proposed for some network protocol software [19]. Finally, we are considering the impact of a separate hardware context for the operating system. See [23][87][108] for information on multiple context processors in the area of multiprocessing. We want to see if this small amount of dedicated hardware could be an efficient way to reduce the system call overheads. Though this work is appropriate for and targeted toward today's network subsystems, this ....

R.S. Nikhil, G.M. Papadopoulos, and Arvind. *T: A multithreaded massively parallel architecture. In Proceedings of 19th Annual International Symposium on Computer Architecture, pages 156--167, May 1992.


Analysis of Communications and Overhead Reduction in.. - Roh, Najjar (1995)

....synchronizations of the dataflow model. There currently exists a wide array of multithreading processor architecture models reflecting a large set of design parameters. These design parameters include: • A blocking [Smi81, ACC 90, DFK 92, WG89] or nonblocking thread execution model [NPA92, SYH 89, SYH 91, HG93, HTG94, Vas94]. Note that in a nonblocking execution the synchronization points are statically determined by the compilers, while in a blocking execution the hardware must support dynamic synchronization. • The use of .... [Footnote: This work is supported by NSF Grant MIP 9113268.]

....execution the hardware must support dynamic synchronization. • The use of hardware contexts with separate register sets [Smi81, ACC 90]. • The degree of multithreading allowed. • The use of dataflow-style matching for synchronization [RNSB94] or direct matching [PC90, NPA92, SYH 89]. • Hardware support for synchronization [NPA92, SYH 89] or synchronizations in the software [CSS 91]. Any of these multithreading processor architectures can serve as a building block for massively parallel processors. Conceptually, each thread runs on its own virtual ....

[Article contains additional citation context not shown here]

R. S. Nikhil, G. M. Papadopoulos, and Arvind. *T: A multithreaded massively parallel architecture. In Proc. 19th Ann. Int. Symp. on Computer Architecture, pages 156--167, 1992.


Effective Caching for Multithreaded Processors - May, Irwin, Muller, Page (2000)   (1 citation)

....the algorithms outlined in Section 2.2 to automatically include information for dealing with the partitioned cache in the resultant machine code. Using these tools, we can measure the effectiveness of partitioned caches running benchmark programs. Our processor is modelled on the HEP [8] and *T [13] multithreaded architectures. Commercially successful implementations of these ideas are currently being realised in machines such as the Tera MTA 8. Our architecture uses an execution unit, or processor, to switch between a number of process contexts which hold the state of each process being ....

R. S. Nikhil, G. M. Papadopoulos, and Arvind. *T: A multithreaded massively parallel architecture. In 19th International Symposium on Computer Architecture, pages 156--167, May 1992.


A New Parallelism Management Scheme for Multiprocessor.. - Verians, Legat.. (1999)   (2 citations)

....in parallel. If each processing is described by a large number of tasks, a sequential decoding cannot provide a comprehensive view on the global parallelism structure. A simultaneous decoding of different task graph parts is necessary. High level commands, such as classical fork join commands [5], are added to the task program. The key difference with other systems is that these commands are only used at a high level, where the command management overhead is negligible compared with the execution time of the multiple task sets. 3 Task Management 3.1 Parallelism Extraction Algorithm The ....

Nikhil, R.S., Papadopoulos, G.M., Arvind: *T: A Multithreaded Massively Parallel Architecture. Int. Symp. on Computer Architecture, 1992, 156--167


Mitsubishi Electric Research Laboratories - Cambridge Research Center

No context found.

R. Nikhil, G. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In Intl Symposium on Computer Architecture, pages 156--169, May 1992.


Analyzing NIC Overheads in Network-Intensive Workloads - Binkert, Hsu, Saidi.. (2005)

No context found.

R. S. Nikhil, G. M. Papadopoulos, and Arvind. *T: A multithreaded massively parallel architecture. In Proc. 19th Ann. Int'l Symp. on Computer Architecture, pages 156--167, May 1992.


The Performance Potential of an Integrated Network.. - Binkert, Dreslinski.. (2004)

No context found.

R. S. Nikhil, G. M. Papadopoulos, and Arvind. *T: A multithreaded massively parallel architecture. In Proc. 19th Ann. Int'l Symp. on Computer Architecture, pages 156--167, May 1992.


Low Latency Workstation Cluster Communications Using.. - Swanson, Stoller (1996)   (12 citations)

No context found.

Nikhil, R. S., Papadopoulos, G., and Arvind. *T: A Multithreaded Massively Parallel Architecture. In Proceedings of the 19th Annual International Symposium on Computer Architecture (May 1992), pp. 156--167.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)

No context found.

R. Nikhil, G. Papadopoulos, and Arvind. *T: a multithreaded massively parallel architecture. In 19th Annual International Symposium on Computer Architecture, pages 156--167, May 1992.


Hardware and Software Mechanisms for Multithreading in.. - Bradford (2001)

No context found.

R.S. Nikhil, G.M. Papadopoulos, and Arvind. *T: A multithreaded massively parallel architecture. In Proceedings of the Nineteenth International Symposium on Computer Architecture, 1992.


Exploiting Fine-Grain Thread Level Parallelism on.. - Keckler, Dally.. (1998)   (14 citations)

No context found.

NIKHIL, R. S., PAPADOPOULOS, G. M., AND ARVIND. *T: A multithreaded massively parallel architecture. In Proceedings of the 19th International Symposium on Computer Architecture (May 1992), pp. 156-167.


Caches with Compositional Performance - Muller, Page, Irwin, May (2002)   (1 citation)

No context found.

R. Nikhil, G. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture. In 19th International Symposium on Computer Architecture, pp 156--167, May 1992.

CiteSeer.IST - Copyright Penn State and NEC