William J. Dally, J.A. Stuart Fiske, John S. Keen, Richard A. Lethin, Michael D. Noakes, Peter R. Nuth, Roy E. Davison, and Gregory A. Fyler. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, April 1992.
....is analogous to SNAP. However, SNAP adds architectural enhancements specifically designed for network nodes, such as bit-field manipulation instructions. SNAP also lacks floating-point and multiply/divide capabilities, and uses DRAM for its on-chip memory. SNAP bears some similarity to the J-Machine [2] and Mosaic [13], which are message-passing multiprocessors but which lack the capabilities of the timer coprocessor. Wireless sensor networks such as those based on Berkeley's Motes [14] utilize low-end microcontrollers from Atmel, while higher-performance sensor nodes utilize ....
W. Dally et al. The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms. IEEE Micro, pages 23-39, April 1992.
....[Figure 1: The 2-D generalized hypercube GH(2;4)] connectivity, it offers a viable alternative to shared-memory and other distributed-memory architectures. Many past and current massively parallel computers are based on meshes or k-ary n-cubes (e.g., Cray T3E, Intel Paragon, Tera, and the design in [2]). Unlike meshes or k-ary n-cubes, generalized hypercubes (GH(n;k), where n is the number of dimensions and k is the number of nodes in each dimension) have k fully interconnected nodes in each dimension. As a result, they have a very low diameter and a very high bisection width. However, the ....
W.J. Dally et al., The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms, IEEE Micro, Vol. 12, Apr. 1992, pp. 23-39.
....[7], various forms of parcel computation are being pursued by other PIM projects as well, including the DIVA project, the University of Notre Dame's PIM Lite project [8], and the work at the University of Delaware on percolation. Prior art includes, but is not restricted to, the MIT J-Machine project [12]. The MIND architecture incorporates fine-grain multithreading context-switching mechanisms as an intrinsic component of every node on a MIND chip. While hardware multithreading may be employed as a latency-hiding strategy, as in the Cray MTA, MIND also uses multithreading as a mechanism for ....
....serious consequences for temporal locality. This is an area requiring further work, but the current method is adequate for PIM-only systems such as Gilgamesh, which is the focus of this paper. 3.6 Multithreaded Execution: Multithreading in PIM was included in a simple form in the J-Machine [12] and was considered more broadly by the HTMT project [7]. The IBM Blue Gene BG/C [2] and University of Notre Dame PIM Lite [8] projects are including multithreading in different forms in those architectures. MIND continues this trend and expands the roles of multithreading in PIM design. The MIND ....
W.J. Dally et al. The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms. IEEE Micro, April 1992, pp. 23-39.
....and is particularly well suited for programs that exhibit dynamic communication behavior or fine-grain sharing. However, certain types of communication, such as the transfer of coarse-grain data, can sometimes be achieved more efficiently using an explicit message-passing model. Active messages [5, 23], which provide the extra ability to invoke simple computation at destination nodes alongside the data transfer, further extend the scope of communication behaviors that may be optimized through the use of explicit messages (e.g., specialized synchronization such as barrier trees). The ....
....management. More recently, active messages have been proposed as a generalization of message passing to incorporate the capability of invoking computation at remote nodes along with the traditional data-transfer functionality [5, 23]. [Figure 1: (a) FLASH Node Architecture] Uses for active messages also range in complexity from performing simple computation at destination nodes (e.g., a fetch-and-op) to more complex computation approaching the functionality of remote procedure calls. Our primary goal for FLASH has been to provide a common set of mechanisms that ....
W. Dally, J. Fiske, J. Keen, R. Lethin, M. Noakes, P. Nuth, R. Davison, and G. Fyler. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, 12(2):23-39, 1992.
....HTMT architecture or execution model. For example, the Tera machine [8] has a similar underlying architecture and supports multithreading at each processing node, but it does not put memory and processor on the same chip, which completely changes its data and process organization. The J-Machine [6] [7] supports active messages and multithreading in a distributed environment and provides support for complex programming and execution models, but does not consider multilevel memory hierarchies in its architecture. The Beowulf [11] computer system distinguishes among different types of processing ....
W. J. Dally, J.A. Stuart Fiske, J. S. Keen, et al., "The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms," IEEE Micro, 1992.
....give additional levels of complexity. Many systems have features similar to those of the HTMT architecture or execution model. The Tera machine [8] has a similar underlying architecture and supports multithreading, but does not put the memory and processor on the same chip. The J-Machine [19] [9] has active messages (vs. parcels in HTMT) and multithreading in a distributed environment, but does not consider the memory hierarchy. The Beowulf [20] supports multithreading, groups different types of processing nodes into clusters to ease load balancing, and has two networks (interconnection ....
W. J. Dally, J. A. S. Fiske, J. S. Keen, et al. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, 1992.
....in Parcel parlance, command) is the same. Parcels, however, are likely to provide increased flexibility in that invoking a command does not require that the command be pre-resident on a given node, as is the case with a handler. This is very similar to work done on the MIT J-Machine [10], which attempted to create an inexpensive massively parallel computer by supporting primitive mechanisms for efficient communication, synchronization, and naming of fine-grain threads. 2.8 Active Pages: Active Pages [29] (developed at UC Davis) attempt to move beyond the von Neumann bottleneck by ....
W.J. Dally, J. Fiske, J. Keen, R. Lethin, M. Noakes, P. Nuth, R. Davison, and G. Fyler. The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms. IEEE Micro, April 1992.
....has processors and each processor is connected to its neighbors. Every processor has two neighbors in each dimension. Similar practical architectures include two-dimensional rectangular meshes such as Intel's Paragon architecture [10] and three-dimensional meshes such as the MIT/Intel J-Machine [6]. We next consider codeword selection for the identification of vertices in dimensional ary cubes. Every vertex in this case can be assigned a coordinate vector of length , where Two vertices and are neighbors if Let be the parity vector corresponding to such that ( if is even (odd) For ....
W. J. Dally et al., "The message-driven processor: A multicomputer processing node with efficient mechanisms," IEEE Micro, vol. 12, pp. 23-39, Apr. 1992.
....communication overhead by integrating the communication with computation. 5.6.3 Implementations: The simplicity of active messages and their closeness to hardware functionality translate into fast execution. Several machines were designed to implement them directly in hardware, such as the J-Machine [91], nCUBE 2 [78], Monsoon [92], and SP-2 [77], or as the network interface, in SUNMOS [21], or in both (CM-5 [22]). Many different universities and organizations work on the implementation of active-message interfaces to support applications such as client-server programs, file systems, operating ....
Dally et al., "The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms," IEEE Micro, 4/92, pp. 23-39.
....capture the essence of a parallel programming model at the software/hardware boundary and to make communication lightweight. SCORE's use of persistent, streaming data flow is significant in overcoming many of the overheads that still made TAM expensive to implement. Mosaic [21] and the J- and M-Machines [9] [12] were early multicomputers pioneering the tight integration of communication into the processor ISA. Nonetheless, their use of dynamic messages still required a few tens of cycles to send each message [18]. Streaming in SCORE enables pipelined communication, reducing the time required to send ....
William J. Dally et al. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, pages 23-39, April 1992.
....to distributed arrays of PIM chips is unique and contributes a new dimension and opportunity to parallel system design and operation. However, many of the individual ideas have precedent in previous work performed in different contexts. Some of those are briefly mentioned here. The J-Machine project [17], conducted at MIT with important collaboration at Caltech, considered a system comprising an array of systems-on-a-chip and employed an object-based model for governing their global behavior. While the technology was inadequate at the time and the project premature, the use of a global ....
W.J. Dally et al. The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms. IEEE Micro, April 1992, pp. 23-39.
....in Parcel parlance, command) is the same. Parcels, however, are likely to provide increased flexibility in that invoking a command does not require that the command be pre-resident on a given node, as is the case with a handler. This is very similar to work done on the MIT J-Machine [10], which attempted to create an inexpensive massively parallel computer by supporting primitive mechanisms for efficient communication, synchronization, and naming of fine-grain threads. 2.8 Active Pages: Active Pages [29] (developed at UC Davis) attempt to move beyond the von Neumann bottleneck by ....
W.J. Dally, J. Fiske, J. Keen, R. Lethin, M. Noakes, P. Nuth, R. Davison, and G. Fyler. The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms. IEEE Micro, April 1992.
....It is hoped that when the first thread is rescheduled, its communication operations have concluded. Multithreading can be done in software or hardware. Software multithreading is very expensive. Some hardware-multithreading research architectures for message-passing systems, such as the J-Machine [35] and the M-Machine [52], have been reported. In precommunication, communication operations are pulled up from the place where communication naturally occurs in the program so that they are partially or entirely completed before the data is needed. This can be done in software by inserting a ....
W. J. Dally, J. A. S. Fiske, J. S. Keen, R. A. Lethin, M. D. Noakes, P. R. Nuth, R. E. Davison, and G. A. Fyler, "The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms", IEEE Micro, April 1992, pp. 23-39.
....0.75 125M 66 ns 3.9 [YFJ 87]; 1988 SPARC 1 × 32 (100%) 12.1mm × 12.7mm 0.75 273M 60 ns 2.0 [QC88, TFT 85]; 1990 PA-RISC 1 × 32 (100%) 14mm × 14mm 0.5 784M 11 ns 3.7 [TLB 90]; 1990 SPARC 1 × 64 (75%) 14.9mm × 15.1mm 0.4 1.4G 25 ns 2.4 [MMN 90] IEEE FPU (25%); 1992 SuperSparc 2 × 32 (82%) 16mm × 16mm 0.4 1.6G 25 ns 2.0 [ANAB 92] IEEE FPU (18%); 1992 Alpha 1 × 64 (81%) 16.8mm × 13.9mm 0.38 1.7G 5 ns 7.7 [DWA 92] IEEE FPU (19%); 1994 PA-RISC 2 × 64 (88%) 14mm × 15mm 0.28 2.8G 7 ns 7.4 [RDB 94] IEEE FPU (12%); 1994 MIPS 1 × 32 (100%) 7.9mm × 8.8mm 0.2 1.7G 2 ns 9.1 [SYN 94]; 1995 PowerPC 2 × 64 (87%) ....
William J. Dally et al. The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms. IEEE Micro, pages 23-39, April 1992.
....In all of these systems, the network interface is implemented using a separate chip, and the computing node is based on a standard processor architecture such as the SPARC, the DEC Alpha, or the Intel Pentium. Some examples from academic research projects are the MIT Message-Driven Processor (MDP) [22], the CMU iWarp [23], and the Caltech Mosaic [24]. In all of these projects, the network interface is tightly coupled with the processor; i.e., the network interface and the processor are placed on the same silicon. The motivation for the tight coupling is to reduce the overhead associated with ....
....by special message registers (operands) and/or instructions (operators). Most tightly coupled interface designs use special-purpose message instructions (e.g., a send command) in which general-purpose processor registers are the operands. Some examples include the Message-Driven Processor (MDP) [22], the Caltech Mosaic [24], the Henry-Joerg network interface [29], and the StarT (*T) [30] network interface. An exception is iWarp from CMU [23], whose systolic communication model is based on operands rather than operators. A send command is constructed by using a message register as the ....
William J. Dally, J. A. S. Fiske, John S. Keen, Richard A. Lethin, Michael D. Noakes, Peter R. Nuth, Roy E. Davison and Gregory A. Fyler. The Message-Driven Processor: a multicomputer processing node with efficient mechanisms. IEEE Micro, April 1992, pp. 23-39.
....0.75 125M 66 ns 3.9 [YFJ 87]; 1988 SPARC 1 × 32 (100%) 12.1mm × 12.7mm 0.75 273M 60 ns 2.0 [QC88, TFT 85]; 1990 PA-RISC 1 × 32 (100%) 14mm × 14mm 0.5 784M 11 ns 3.7 [TLB 90]; 1990 SPARC 1 × 64 (75%) 14.9mm × 15.1mm 0.4 1.4G 25 ns 2.4 [MMN 90] IEEE FPU (25%); 1992 SuperSparc 2 × 32 (82%) 16mm × 16mm 0.4 1.6G 25 ns 2.0 [ANAB 92] IEEE FPU (18%); 1992 Alpha 1 × 64 (81%) 16.8mm × 13.9mm 0.38 1.7G 5 ns 9.5 [DWA 92] IEEE FPU (19%); 1994 PA-RISC 2 × 64 (88%) 14mm × 15mm 0.28 2.8G 7 ns 7.4 [RDB 94] IEEE FPU (12%); 1994 MIPS 1 × 32 (100%) 7.9mm × 8.8mm 0.2 1.7G 2 ns 9.1 [SYN 94]; 1995 PowerPC 2 × 64 (87%) ....
William J. Dally et al. The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms. IEEE Micro, pages 23-39, April 1992.
....network latency and the small amount of dedicated buffer space required at each node, wormhole routing has become the most promising switching technology and has been adopted in the Symult 2010 [10], the nCUBE 2 [1] and nCUBE 3, the Intel Touchstone DELTA [11] and Intel Paragon, the MIT J-Machine [14], and the Caltech MOSAIC [15]. For a survey of wormhole routing in direct networks, see [16]. Multicast routing for multicomputers has been studied previously in [9, 17], in which various graph models and multicast routing algorithms were proposed, but the deadlock problem was not ....
W. J. Dally, J. A. S. Fiske, J. S. Keen, R. A. Lethin, M. D. Noakes, P. R. Nuth, R. E. Davison, and G. A. Fyler, "The message-driven processor: A multicomputer processing node with efficient mechanisms," IEEE Micro, pp. 23-39, Apr. 1992.
William J. Dally, J. A. Stuart Fiske, John S. Keen, Richard A. Lethin, Michael D. Noakes, Peter R. Nuth, Roy E. Davison, and Gregory A. Fyler. The Message-Driven Processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, 12(2):23-39, April 1992.
William J. Dally, J.A. Stuart Fiske, John S. Keen, Richard A. Lethin, Michael D. Noakes, Peter R. Nuth, Roy E. Davison, and Gregory A. Fyler. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, April 1992.
William J. Dally, J. A. Stuart Fiske, John S. Keen, Richard A. Lethin, Michael D. Noakes, Peter R. Nuth, Roy E. Davison, Gregory A. Fyler, "The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms", IEEE Micro, April 1992, pp. 23-39.
William J. Dally et al. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, 12(2):23-39, April 1992.
Dally, W., et al., "The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms," IEEE Micro, April 1992, pp. 23-39.
W. J. Dally, J. A. S. Fiske, J. S. Keen, R. A. Lethin, M. D. Noakes, P. R. Nuth, R. E. Davison, and G. A. Fyler, "The Message-Driven Processor: A multicomputer processing node with efficient mechanisms," IEEE Micro, pp. 23-39, April 1992.
William J. Dally et al. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, pages 23-39, April 1992.
W. J. Dally, J. Fiske, J. Keen, R. Lethin, M. Noakes, P. Nuth, R. Davison, and G. Fyler. The message-driven processor: A multicomputer processing node with efficient mechanisms. IEEE Micro, 12(2):23-39, April 1992.
CiteSeer.IST - Copyright Penn State and NEC