31 citations found.
B. J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of International Conference on Parallel Processing, pages 6--8, 1978.

This paper is cited in the following contexts:
Fine Grained Multithreading with Process Calculi - Lopes, Silva, Vasconcelos (2001)   (Correct)

....triggered by the availability of all input values to an instruction (the firing rule) This makes the model totally asynchronous and the instructions self scheduling. Dataflow architectures range from pure dataflow [3, 10, 24] hybrid dataflow control flow [8, 22, 23] and lately multithreaded RISC [1, 2, 26] designs. Multithreading aims to provide high processor utilization in the presence of large memory or interprocessor communication latency. These high latency operations are overlapped with computation by rapidly switching to the execution of other threads. Next generation microprocessor design ....

B. Smith. A Pipelined, Shared Resource MIMD Computer. In International Conference on Parallel Programming - ICPP'78, pages 6--8, 1978.


Performance Tradeoffs In Multithreaded Processors - Agarwal (1991)   (38 citations)  (Correct)

....are called finely multithreaded processors, and the others are called coarsely multithreaded processors or block multithreaded processors. Several processor designs have used multithreading to mask communication and synchronization latencies, or to utilize deep pipelines effectively, e.g. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. By multithreading a processor such that an instruction from a different thread can be initiated every cycle (or every few cycles) pipeline bubbles due to pipeline dependencies or processor stalls due to memory latency can be prevented. Processors in message passing multicomputers often maintain ....

....state that must be saved and restored on a process switch, but also increases the difficulty of cleanly halting the pipeline. For example, if a remote memory operation occurs, successive instructions in the pipeline must be stalled (however, special support in the form of register full empty bits [2] can reduce the need to stall the pipeline) A switch to a new thread must save the program counters and the processor status word, increment the context pointer, and restart the pipeline. Program counters and status words can be saved in register frames, or they can be implemented in separate ....

[Article contains additional citation context not shown here]

B.J. Smith. A Pipelined, Shared Resource MIMD Computer. In Proceedings of the 1978.


Daisy, DSI and LiMP - Issues in Architecture for Suspending.. - Johnson   (Correct)

....rsvp transactions dominate in graph processing programs. Hence, an LPE executing a single process would make headway of about 1 P; at best. Thus, take P to be the maximal degree of multiplexing an LPE can do in its local process space. This is similar to process pipelining in HEP architecture [Sm78], and is also discussed by Halstead as a direction for Multilisp [Ha85] The true degree of multiplexing cannot reach P because this would flood the routing network, increase blockage, and degrade performance. It is essential to leave enough holes to ameliorate blockage. With multitasking, an ....

B. J. Smith. A pipelined, shared resource MIMD computer. in Proc. International Conf. on Parallel Processing, 1978.


Fine Grained Multithreading with Process Calculi - Lopes, Silva (2000)   (Correct)

....triggered by the availability of all input values to an instruction (the firing rule) This makes the model totally asynchronous and the instructions self scheduling. Dataflow architectures range from pure dataflow [3, 10, 24] hybrid dataflow control flow [8, 22, 23] and lately multithreaded RISC [1, 2, 26] designs. Multithreading aims to provide high processor utilization in the presence of large memory or interprocessor communication latency. These high latency operations are overlapped with computation by rapidly switching to the execution of other threads. Next generation microprocessor design ....

B. Smith. A Pipelined, Shared Resource MIMD Computer. In International Conference on Parallel Programming - ICPP'78, pages 6--8, 1978.


Multithreading: A Revisionist View of Dataflow Architectures - Papadopoulos, Traub (1991)   (44 citations)  (Correct)

....that the two models are simply the extrema of an architectural continuum. The von Neumann architecture can be extended into a multithreaded model by replicating the program counter and register set, and by providing primitives to synchronize among the several threads of control. The Denelcor HEP [14] interleaved up to 64 threads per processing element in order to hide the latency of remote memory references. More recently, researchers have suggested rapid context switching among fewer threads to mitigate the cost of occasional cache misses [15, 1] While useful for dealing with unpredictable ....

B. J. Smith. A Pipelined, Shared Resource MIMD Computer. In Proceedings of the 1978 International Conference on Parallel Processing, pages 6--8, 1978.


Synchronization and Pipeline Design for a Multithreaded Massively.. - Sakai (1992)   (2 citations)  (Correct)

....include a von Neumann architecture (typically, a RISC architecture) which efficiently executes sequential threads with a set of registers and an advanced control pipeline. The concept of multithreading is not exclusive to the extension of dataflow architectures. For instance, the Denelcor HEP [1] and the Tera Computing System [6] are multithreaded computers in the sense that they execute and control multiple threads in a single pipeline. Dally's J machine [16] does not interleave multiple threads, but it can switch between threads very quickly; thus, we can say that it actually supports ....

....mechanism simplifies the synchronization itself and eases the understanding of what is happening in the synchronization stages. To design and to debug are both much easier than the other data driven methods. 3 Pipeline Design 3.1 Basic Methodologies Conventional dataflow machines and the HEP [1] have a pipeline where multiple threads share its slots at the same time. The pipeline of this type is called a circular pipeline [3]. The circular pipeline has several advantages as follows. It exploits the parallel activities maximally in a single pipeline. There is no overhead of a ....

Smith, B.J.: A Pipelined, Shared Resource MIMD Computer, Proc. of 1978 ICPP, pp.6-8 (1978).


Virtual Memory Mapped Network Interface for the.. - Blumrich, Li.. (1994)   (238 citations)  (Correct)

....is that shared data is stored in only one place, so all but one of the sharers must access it remotely. By contrast, our automatic update scheme lets a producer and consumer share data without requiring a time-consuming remote access. Several parallel architectures use multiple threads [22, 29, 2, 1] to overlap communication with computation. These approaches require applications or compilers to create multiple threads on each node, and require the node CPU to switch thread contexts very fast. The idea of automatic update data delivery in our network interface is derived from the Pipelined ....

Burton J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of International Conference on Parallel Processing, pages 6--8, 1978.


Design and Performance of Multithreaded Architectures - Thekkath (1995)   (Correct)

....then, the CDC 6600's I/O processor [Tho64, Tho70] and the Xerox Alto [TML 82] computer have used multiple register sets to switch between different I/O jobs. Multithreading, as we know it, i.e. as a technique to tolerate long memory latencies, had its origin in Burton Smith's HEP machine [Smi78, Smi81]. The HEP simultaneously supported 128 processes in hardware. Since it did not implement hazard detection logic in the execution pipeline, it issued an instruction from a different register set every cycle, ensuring that no two instructions in the pipeline were from the same process. Since ....

Burton J. Smith. A pipelined, shared resource MIMD computer. In G. Jack Lipovski, editor, International Conference on Parallel Processing, pages 6--8. IEEE, August 1978.


Robust, High-Speed Network Design for Large-Scale Multiprocessing - DeHon (1993)   (1 citation)  (Correct)

....reception. In dataflow programs, latency determines the delay between the computation of a data value and the time when the value can actually be used. Data parallel operations are limited by the rate at which processors can obtain access to the data on which they need to operate. Multithreaded ([Smi78], [Jor83], [ALKK90], [SBCvE90], [CSS 91], [NPA92]) and dataflow ([ACM88], [AI87], [PC90]) architectures have been developed to mitigate communication latency by hiding its effects. These techniques all rely on an abundance of parallelism to provide useful processing to perform while waiting on slow ....

B. J. Smith. A Pipelined, Shared-Resource MIMD Computer. In Proceedings of the 1978 International Conference on Parallel Processing, pages 6--8, 1978.


Local Memory Reference Behavior of Fine-Grain.. - Motomura, Papadopoulos (1993)   (Correct)

....2 A Fine Grain Multithreaded Execution A number of approaches to multithreading for latency tolerance have been proposed and several have been implemented. Many schemes provide a substantial amount of hardware support for representing and scheduling the multiple contexts. The Denelcor HEP [9] provided cycle by cycle interleaving of threads which are each given their own hardware context of local registers. A thread which is blocking on a network transaction is not scheduled by a processor, and thus does not directly induce idle cycles, until the dependent transaction completes. Other ....

B. J. Smith. A Pipelined, Shared Resource MIMD Computer. In Proceedings of 1978 International Conference on Parallel Processing, 1978, Pages 6-8.


Virtual Memory Mapped Network Interface for the.. - Blumrich, Alpert.. (1993)   (238 citations)  (Correct)

....approach to support shared memory. Examples include Memnet [9], Merlin [19] and its successor SESAME [30], the Plus system [4], and Galactica Net [15]. These systems do not provide a mechanism for high bandwidth, low-overhead block data transfer. Several parallel architectures use multiple threads [20, 27, 2, 1] to overlap communication with computation. These approaches require applications or compilers to create multiple threads on each node, and require the node CPU to switch thread contexts very fast. The idea of automatic update data delivery in our network interface is derived from the Pipelined ....

Burton J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of International Conference on Parallel Processing, pages 6--8, 1978.


Closing the Window of Vulnerability in Multiphase Memory.. - Kubiatowicz (1993)   (20 citations)  (Correct)

....remote transaction machine, and network interface. In addition, the cache tags file consists of two banks of static RAM and comprises roughly one third of the chip area. The cache management block is responsible for handling cache fills and invalidations, as well as full empty bit synchronization [14, 26]; it is responsible for processing all control (non data) messages for the processor side of the LimitLESS cachecoherence protocol [9] The memory management block handles the memory side of the LimitLESS protocol, as well as memory requests from the local processor and DMA requests from the ....

B.J. Smith. A Pipelined, Shared Resource MIMD Computer. In Proceedings of the 1978 International Conference on Parallel Processing, pages 6--8, 1978.


*T: A Multithreaded Massively Parallel Architecture - Nikhil, Papadopoulos, Arvind (1992)   (27 citations)  (Correct)

....network interface for message handling is well integrated into the processor pipeline. Thus, TTDA, Monsoon, P RISC and *T are properly viewed as steps in the evolution of the node architecture of dataflow MPA's. Important architectural influences in our thinking have been the seminal Denelcor HEP [30] and Iannucci's Dataflow von Neumann Hybrid architecture [20] which incorporated some dataflow ideas into von Neumann processors for MPA's (similar ideas were also expressed by Buehrer and Ekanadham [9] ETL's EM 4 [28] Sandia's Epsilon [15] Dennis and Gao's Argument Fetching Dataflow machine ....

....extremely difficult for the programmer compiler to schedule threads precisely. Therefore, it is essential to have a large continuation namespace. The size of the hardware supported continuation namespace varies greatly in existing designs: from 1 in DASH [21] 4 in Alewife [1] 64 in the HEP [30], and 1024 in the Tera [3] to the local memory address space in Monsoon [25] Hybrid Dataflow von Neumann [20] MDP [12] and *T. Of course, if the hardware supported namespace is small, one can always virtualize it by multiplexing in software, but this has an associated overhead. 3.3 Fine Grain ....

[Article contains additional citation context not shown here]

B. J. Smith. A Pipelined, Shared Resource MIMD Computer. In Proc. Intl. Conf. on Parallel Processing, pages 6--8, 1978.


Compiler Optimizations For Parallel Loops With Fine-Grained.. - Chen (1994)   (5 citations)  (Correct)

....(1; 0) can be automatically preserved by the forward execution of the outer loop in each processor. Other vectors in the BDVS have to be enforced only at the boundaries of the adjacent strips with explicit synchronization using synchronization primitives such as post wait [Par88] FULL EMPTY bit [Smi78] Cedar synchronization [EPY89] or by a process based synchronization scheme described in [SY89] that is, many explicit synchronizations can be eliminated if the source and the sink of a dependence are in the same strip and are executed by the same processor. For the example given in Figure ....

B. J. Smith. A pipelined, shared resource mimd computer. In Int'l. Conf. on Parallel Processing, pages 6--8, Aug. 1978.


R-CODE A Very Capable Virtual Computer - Walton (1995)   (Correct)

....the fact that they can be executed in data flow order: each primitive operation (e.g. add) can execute whenever its inputs are ready. To maximize the parallelism this permits, special hardware is needed to detect when the inputs to an operation are ready (see, for example, the work of Burton Smith[Smi78, ACC 90] and Arvind[Nik91, Pap90] But R CODE must run efficiently without special hardware. Therefore R CODE uses a compromise. It is possible to compile a functional language efficiently for a sequential computer as long as all the variables are register like, in that they are local to a ....

....the fact that they can be executed in dataflow order: each primitive operation (e.g. add) can execute whenever its inputs are ready. To maximize the parallelism this permits, special hardware is needed to detect when the inputs to an operation are ready (see, for example, the work of Burton Smith[Smi78, ACC 90] and Arvind[Nik91, Pap90] But R CODE must run efficiently on existing sequential computers. Therefore R CODE uses a compromise. It is possible to compile a functional language efficiently for a sequential computer as long as all the variables are register like, in that they are ....

[Article contains additional citation context not shown here]

Burton J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of the International Conference on Parallel Processing, pages 6--8. IEEE, August 1978.


Latency Tolerance through Multithreading in Large-Scale .. - Kurihara, Chaiken.. (1991)   (12 citations)  (Correct)

....useful computation. This strategy attempts to hide the latency of interprocessor communication by allowing multiple outstanding transactions per processor. While previous architectures have implemented multithreading with cycle by cycle interleaving of instructions from different processes [11, 12, 16, 21] (termed fine multithreading) we use the same name for systems that interleave blocks of instructions from different processes as well [3, 23] termed coarse or block multithreading) Block multithreaded processors do not force a context switch every cycle and can achieve high single thread ....

B.J. Smith. A Pipelined, Shared Resource MIMD Computer. In Proceedings of the 1978 International Conference on Parallel Processing, pages 6--8, 1978.


The Impact of Synchronization and Granularity on Parallel Systems - Chen (1990)   (28 citations)  (Correct)

....Barriers are quite effective in enforcing data dependences between different loops if they are used properly. However, they are not for enforcing data dependences between iterations within the same loop. With efficient low-level data synchronization supports such as that used in Cedar [25], HEP [21], and Horizon [15], data dependences can be enforced explicitly, which allows a loop with cross iteration data dependences (a so-called Doacross loop [6]) to execute its iterations in parallel with some degree of overlap. It also allows iterations from different loops to be executed in parallel ....

B. J. Smith. A pipelined, shared resource mimd computer. 1978 Int. Conf. on Parallel Processing, 6--8, Aug. 1978.


On Effective Execution of Non-Uniform DOACROSS Loops - Chen, Yew (1996)   (4 citations)  (Correct)

....(1; 0) can be automatically preserved by the forward execution of the outer loop in each processor. Other vectors in the BDVS have to be enforced only at the boundaries of the adjacent strips with explicit synchronization using synchronization primitives such as post wait [20] FULL EMPTY bit [24], Cedar synchronization [12] or by a process based synchronization scheme as described in [25] That is, many explicit synchronizations can be eliminated if the source and the sink of a dependence are in the same strip and are executed by the same processor. For the example given in Figure 7, ....

B. J. Smith. A pipelined, shared resource mimd computer. In Int'l. Conf. on Parallel Processing, pages 6--8, Aug. 1978.


Performance Evaluation of the Sylvan Multiprocessor.. - Burkowski, Clarke..   (Correct)

....Processors The purpose of the context handlers is to assist the Taskmaster in context switching the processors as quickly as possible. A better solution would be to provide hardware support for fast context switching in the processor itself, such as that prototyped in the Denelcor HEP[29]. Some current RISC processors, such as the AMD 29000[16] and the Sun SPARC[12] provide support for multiple contexts in the form of multiple register banks, and TID tagging of cache and TLB entries. The APRIL project at MIT[1] is investigating the use of multiple register banks in just this way ....

Burton J. Smith. A Pipelined, Shared Resource MIMD Computer. In Proceedings of the International Conference on Parallel Processing, 1978.


APRIL: A Processor Architecture for Multiprocessing - Agarwal (1990)   (186 citations)  (Correct)

....due to synchronization latency. Spin lock accesses have a low overhead of memory requests, but busy waiting on a synchronization event wastes processor cycles. Synchronization mechanisms that avoid busy waiting through process blocking incur a high overhead. Full empty bit synchronization [22] in a rapid context switching processor allows efficient fine grain synchronization. This scheme associates synchronization information with objects at the granularity of a data word, allowing a low overhead expression of maximum concurrency. Because the processor can rapidly switch to other ....

....work from other threads. This reduces the negative effects of synchronization on processor utilization. This paper describes the architecture of APRIL, a processor designed for large scale multiprocessing. APRIL builds on previous research on processors for parallel architectures such as HEP [22], MASA [8], PRISC [19], [14], [15], and [18]. Most of these processors support fine grain interleaving of instruction streams from multiple threads, but suffer from poor single-thread performance. In the HEP, for example, instructions from a single thread can only be executed once every 8 cycles. ....

[Article contains additional citation context not shown here]

B.J. Smith. A Pipelined, Shared Resource MIMD Computer. In Proceedings of the 1978 International Conference on Parallel Processing, pages 6--8, 1978.


The Cilk System for Parallel Multithreaded Computing - Joerg (1996)   (33 citations)  (Correct)

....communication. Executing efficiently under these conditions requires a platform with cheap thread creation and scheduling, as well as a high bandwidth, low overhead communication infrastructure. There are several machines which have been designed with these characteristics in mind, such as HEP [Smi78] Tera [AAC 92] and dataflow machines such as Monsoon [PC90] and the EM 4 [SKY91] Most existing machines do not have these characteristics, however. As analysis techniques improve, compilers are becoming better able to exploit locality in these programs and to increase the thread lengths. ....

....good machine utilizations at reasonable cost. These heuristics usually work well for a broad class of applications, making it possible to implement the scheduling and placement task as a fairly generic service that resides at the core of the runtime system. Several research machines, such as HEP [Smi78] the Monsoon dataflow system [PC90] and the forthcoming Tera machine [AAC 92] have been designed expressly to support multithreaded computations. These machines provide highly integrated, low overhead, message interfaces as well as hardware support for scheduling and synchronization. ....

Burton J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of the 1978 International Conference on Parallel Processing, pages 6--8, 1978.


Accounting for Memory Bank Contention and Delay in.. - Blelloch (1995)   (20 citations)  (Correct)

....for the memory from its use. Vectorization has been used for over 20 years to hide memory latency and has the advantage that it is simple to implement. On the other hand it restricts the kinds of program that can be used. Multithreading was suggested and implemented for hiding latency on the HEP [45] and was later used in the design of the Tera and Sparcle [1]. Multithreading is more complicated to implement than vectorization but permits the use of a wider class of programs. Prefetching and non blocking caches are becoming common on commodity processors, although the number of outstanding ....

B. J. Smith. A pipelined, shared resource MIMD computer. In Proceedings International Conference on Parallel Processing, 1978.


Hardware And Software For Functional And Fine Grain Parallelism - Beckmann (1993)   (16 citations)  (Correct)

....the latency of instructions is not fixed) the next instruction address is passed through the functional unit pipeline in parallel with the data. Instructions from independent tasks may thus be presented to the functional unit on successive clock cycles, as in other multithreaded architectures [97, 104]. [Figure 7.3: Basic ATG Scheduling Unit] Each instruction word contains two fields: a functional unit instruction, which is like an ordinary ....

.... approach [52]. As a means of avoiding processor idle time due to these delays, a number of multithreaded architectures have been proposed that attempt to hide such long-latency operations by maintaining multiple threads of execution on a single processor and rapidly context switching between them [97, 104, 7, 52, 76, 80, 28, 79, 113, 47, 2]. This section proposes a method for achieving the same results on a conventional processor. Memory and synchronization operations are examples of operations incurring long periods of latency. In pipelined machines, any number of instructions, including floating point operations, may incur long ....

Burton Smith. A pipelined, shared resource MIMD computer. In Proceedings 1978 International Conference on Parallel Processing, pages 6--8, 1978.


A Simulator for a Multithreaded Processor - Adda, Niar, Bleuel, Lopez   Self-citation (Pipelined)   (Correct)

....communication) Memory latency hiding in a multithreaded processor is realized by switching to another thread when the executing thread is unable to progress, for example when a reference to a non-local memory element is performed. The HEP computer was the first multithreaded multiprocessor computer [5]. A processor can execute up to 128 threads. The architecture supports fine grain parallelism by issuing one instruction from one thread per cycle. Synchronization between threads is maintained by using tagged bits on each memory location and registers. TERA [3] is a scientific multi users ....

B. J. Smith, A Pipelined, Shared Resource MIMD Computer, Inter. Conf. on Parall. Processing, P6-8 1978.


C-Miner: Mining Block Correlations in Storage Systems - Li, Chen, Srinivasan, Zhou (2004)   (1 citation)  (Correct)

No context found.

B. J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of International Conference on Parallel Processing, pages 6--8, 1978.


Dynamic Tracking of Page Miss Ratio Curve for Memory.. - Zhou, Pandey.. (2004)   (Correct)

No context found.

B. J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of International Conference on Parallel Processing, pages 6--8, 1978.


Latency Tolerant Architectures - Bennett (1998)   (2 citations)  (Correct)

No context found.

B.J. Smith. A pipelined, shared resource MIMD computer. In 1978 International Conference on Parallel Processing, pages 6--8, August 1978.


Memory Management for Networked Servers - Zhou (2000)   (Correct)

No context found.

Burton J. Smith. A pipelined, shared resource MIMD computer. In Proceedings of International Conference on Parallel Processing, pages 6--8, 1978.


On Universal Classes of Extremely Random Constant Time Hash.. - Siegel   (Correct)

No context found.

B. Smith. A pipelined, shared resource MIMD computer, Proceedings 1978 International Conference on Parallel Processing, 1978, pp. 6--8.


Asynchrony in parallel computing: From dataflow to.. - Silc, Robic, Ungerer (1997)   (2 citations)  (Correct)

No context found.

B.J. Smith, A pipelined, shared resource MIMD computer, in Proc. 1978 ICPP, Aug. 1978, pp. 6--8.