18 citations found.
W. J. Dally. Fine-grain Message-Passing Concurrent Computers. In Proceedings of the Third Conference on Hypercube Computers, pages 2--12. Association for Computing Machinery, ACM Press, January 1988.


This paper is cited in the following contexts:
Planar-Adaptive Routing (PAR): Low-Cost Adaptive Networks For.. - Jae Kim Eng

....split networks into several layers by using virtual channels. Within the virtual networks, cyclic dependences of network resources can be removed by restricting the routing algorithm appropriately. 3.2.1 Dally's 2D mesh adaptive router A prototypical model of the approach was given by Dally in [16]. Dally divided two-dimensional mesh networks into two separate virtual networks by providing two virtual channels for each y-dimensional link. Eastbound messages and westbound messages are routed on separate virtual networks. These networks are completely decoupled. Figure 3.1 shows the ....

W. J. Dally. Fine-grain Message-Passing Concurrent Computers. In Proceedings of the Third Conference on Hypercube Computers, pages 2--12. Association for Computing Machinery, ACM Press, January 1988.
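
The double-y scheme described in the excerpt above can be sketched as a small routing function. This is only an illustration under stated assumptions, not code from Dally's paper or from any citing paper: the names `choose_subnet` and `double_y_outputs`, the coordinate convention (x grows eastward, y grows northward), the virtual channel numbering, and the choice to place messages with no x offset on the east subnetwork are all hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int]   # (x, y) mesh coordinates; assumed convention
Hop = Tuple[str, int]    # (direction, virtual channel index)


def choose_subnet(src: Node, dst: Node) -> str:
    """Pick the virtual network once, at injection time.

    Assumption: messages with no x offset are placed on the east subnetwork.
    """
    return "east" if dst[0] >= src[0] else "west"


def double_y_outputs(cur: Node, dst: Node, subnet: str) -> List[Hop]:
    """Return every productive (minimal) output channel at the current node.

    Each x link has one channel (index 0). Each y link has two virtual
    channels: index 0 belongs to the east subnetwork, index 1 to the west
    subnetwork (assumed numbering). A message never leaves its subnetwork,
    so the two virtual networks stay decoupled.
    """
    (cx, cy), (dx, dy) = cur, dst
    y_vc = 0 if subnet == "east" else 1
    hops: List[Hop] = []
    if subnet == "east" and dx > cx:
        hops.append(("E", 0))
    elif subnet == "west" and dx < cx:
        hops.append(("W", 0))
    if dy > cy:
        hops.append(("N", y_vc))
    elif dy < cy:
        hops.append(("S", y_vc))
    return hops
```

A router would then apply some selection policy (for example, the first free channel) to the returned list. An empty list means the message has arrived.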


On the Theory of Interconnection Networks for Parallel Computers - Upfal (1994)   (1 citation)

....routing methods, and understanding the power and limitations of each method is a new challenge for the theory community. To capture the essence of these models, such a study should focus on simple algorithms that can be implemented with minimum overhead. 4.1 Wormhole Routing In wormhole routing [4, 10, 11, 12, 34], a message is transmitted as a contiguous stream of bits, physically occupying a sequence of nodes/edges in the network. There are no queues in the intermediate nodes, and a node can only hold a (small) fraction of a message (a flit). The routing strategy is simple, keeping the overhead in the ....

W.J. Dally. Fine grain message passing concurrent computers. In Third Conference on Hypercube Concurrent Computers and Applications, pages 2--12. ACM Press, 1988.


A Comprehensive Study of Communication in Distributed-Memory.. - Schwiebert (1995)

....route the message. An acyclic channel dependency graph has also been used as a basis for developing adaptive routing algorithms defined by relations of the form R : C × N × N → C^p, where a set of output channels, rather than a single channel, is defined on which to route the message [5, 9, 13, 22, 38, 39, 40, 48, 67, 101]. Glass and Ni [40] and Boura and Das [5] have proposed techniques for generating deadlock-free algorithms. Both of these methodologies require an acyclic channel dependency graph. Glass and Ni proposed the turn model as a method of analyzing routing algorithms based on the permitted and ....

....in all the positive directions. Another example is west first for 2D meshes, which routes a message West 41 Table 1: Overview of Adaptive Routing Algorithms for Meshes Author(s) Fully VCs for Comments Adaptive 2D Mesh Chien Kim [9] Yes 6 Partially Adaptive for Higher Dimensions Dally [13] Yes 6 2D Mesh Only Dally Aoki [16] Yes k 2D Mesh with k × k nodes Dally Aoki [16] Yes 8 Dynamic routing algorithm Glass Ni [39] Yes 6 2D Mesh Only Glass Ni [40] No 4 Roughly Half the Adaptiveness of Fully Adaptive Jesshope, Miller, Yes 8 Number of Virtual Yantchev [48] Channels ....

[Article contains additional citation context not shown here]

W. J. Dally. Fine-Grain Message-Passing Concurrent Computers. In Proceedings of the Third Conference on Hypercube Concurrent Computers, volume 1, pages 2--12, 1988.


Adaptive Routing in Mesh-Connected Networks - Glass, Ni (1992)   (24 citations)

....channel. An advantage of adding virtual or physical channels, however, is that it is possible to support routing algorithms with a high degree of adaptiveness. A minimal, fully adaptive algorithm can route a packet to its destination node along any of the shortest paths in the network. Dally [13] and Linder and Harden [12] describe such an algorithm for 2D meshes. Minimal refers to the routing of packets only along shortest paths. Algorithms that can route packets along longer paths are called nonminimal. Algorithms that cannot route packets along every shortest path are called ....

W. J. Dally, "Fine-grain message passing concurrent computers," in Proc. of the Third Conference on Hypercube Concurrent Computers, vol. 1, (Pasadena, CA.), pp. 2--12, Jan. 1988.


A Necessary and Sufficient Condition for Deadlock-Free.. - Loren Schwiebert (1996)   (5 citations)

....route the message. An acyclic channel dependency graph has also been used as a basis for developing adaptive routing algorithms defined by relations of the form R : C × N × N → C^p, where a set of output channels, rather than a single channel, is defined on which to route the message [2, 4, 6, 10, 16, 17, 18, 21, 24, 30]. Since a set of output channels is provided, a selection function is then used to select which of these output channels a message uses. Glass and Ni [18] and Boura and Das [2] have proposed methodologies for generating deadlock-free algorithms, but both proof techniques require an acyclic channel ....

W. J. Dally. Fine-Grain Message-Passing Concurrent Computers. In Proceedings of the Third Conference on Hypercube Concurrent Computers, volume 1, pages 2--12, 1988.


Maximally Fully Adaptive Routing in 2D Meshes - Glass, Ni (1992)   (21 citations)

....Miller, and Yantchev propose using just two of the virtual networks. Their 2-plane routing algorithm routes packets SW as far as necessary and then NE as far as necessary. The drawback to this second algorithm is that it is only partially adaptive. It routes packets SE and NW nonadaptively. Dally [12] and Linder and Harden [13] propose a different minimal and fully adaptive algorithm for 2D meshes. It requires two virtual channels per physical channel along only one of the two dimensions. We assume that this is the y dimension. Figure 3(a) shows a typical router in such a double-y network. The ....

W. J. Dally, "Fine-grain message passing concurrent computers," in Proc. of the Third Conference on Hypercube Concurrent Computers, vol. 1, (Pasadena, CA.), pp. 2--12, Jan. 1988.


A Necessary and Sufficient Condition for Deadlock-Free.. - Schwiebert, Jayasimha (1994)   (5 citations)

....route the message. An acyclic channel dependency graph has also been used as a basis for developing adaptive routing algorithms defined by relations of the form R : C × N × N → C^p, where a set of output channels, rather than a single channel, is defined on which to route the message [2, 4, 5, 9, 16, 17, 18, 20, 23, 28]. Glass and Ni [18] and Boura and Das [2] have proposed methodologies for generating deadlock-free algorithms, but both proof techniques require an acyclic channel dependency graph. Glass and Ni propose a method of analyzing routing algorithms based on the permitted and prohibited dependencies from ....

W. J. Dally. Fine-Grain Message-Passing Concurrent Computers. In Proceedings of the Third Conference on Hypercube Concurrent Computers, volume 1, pages 2--12, 1988.


A Realizable Efficient Parallel Architecture - Monien, Lüling, Langhammer (1992)   (5 citations)

....state wormhole routing problem on the two dimensional mesh are presented. It was shown that a nearly optimal injection rate can be reached and that messages are delivered with high probability in nearly optimal time. Other results for the steady state model were mostly gained by simulations [5, 6, 12]. In the following we present first results for the maximal network capacity of the Fat Mesh of Clos network if uniformly randomized routing is used. To do this, we have to study the bottlenecks of the network. Basically, there are two types of bottlenecks. One is induced by the bisection width of ....

W. Dally, Fine grain Message Passing Concurrent Computers, 3rd Conf. on Hypercube Concurrent Computers and Applications, ACM Press, 1988, pp. 2-12


The Message Flow Model for Routing in Wormhole-Routed Networks - Lin, McKinley, Ni (1995)   (16 citations)

....recursively use Theorem 3 to show that the remaining channels are deadlock immune. 4 Adaptive Unicast Routing for 2D Meshes In this section, we consider an enhanced version of the double-Y-channel algorithm for adaptive routing in 2D mesh networks. The original algorithm, which was discussed in [7] and [4], requires that an additional pair of channels be added to the Y dimension, as shown in Figure 3(a). The resulting network can be partitioned into two subnetworks, namely, the east subnetwork and the west subnetwork. Each pair of nodes neighboring in the Y dimension is connected by two ....

W. J. Dally, "Fine-grain message passing concurrent computers," in Proc. of the Third Conference on Hypercube Concurrent Computers, vol. 1, pp. 2--12, Jan. 1988.


The Turn Model for Adaptive Routing - Glass, Ni (1992)   (133 citations)

....channels already sharing the physical channel. An advantage of adding virtual or physical channels, however, is that they can support routing algorithms with a high degree of adaptiveness. A minimal, fully adaptive algorithm can route packets along any of the shortest paths in the topology. Dally [17] and Linder and Harden [16] describe such an algorithm for 2D meshes. A partially adaptive algorithm cannot route packets along every shortest path. This paper presents a model for designing wormhole routing algorithms that are deadlock free, livelock free, minimal or nonminimal, and maximally ....

W. J. Dally, "Fine-grain message passing concurrent computers," in Proc. of the Third Conference on Hypercube Concurrent Computers, vol. 1, (Pasadena, CA.), pp. 2--12, Jan. 1988.


A Theory of Wormhole Routing in Parallel Computers - Felperin, Raghavan, Upfal (1993)   (34 citations)

.... nodes, and breaking long messages into small packets and then reconstructing the original messages when the pieces reach their destination (possibly out of order). Therefore, a trend in multicomputer architecture is to use substantially simpler routing mechanisms, in particular wormhole routing [3, 6, 7, 8, 14, 11, 19]. A message is transmitted as a contiguous stream of bits, physically occupying a sequence of nodes/edges in the network. There are no queues in the intermediate nodes, and a node can only hold a (small) fraction of a message. The routing strategy is oblivious, keeping the overhead in the ....

....include the Intel Delta machine, Intel iPSC/2, Intel Sigma, Symult S14, MIT J-Machine, MIT April, and others. Previous research on wormhole routing focuses on simulations that study the extent to which a network can be loaded (as a fraction of the available bandwidth) before it gets clogged [5, 7, 18]. Other research has addressed the issue of deadlock, since in wormhole routing it is possible for a set of messages to mutually block each other's path [9]. On a two-dimensional mesh, for instance, it is known that if every message first travels along its row to the column of its destination, ....

[Article contains additional citation context not shown here]

W.J. Dally. Fine grain message passing concurrent computers. In Third Conference on Hypercube Concurrent Computers and Applications, pages 2--12. ACM Press, 1988.


Storage-Efficient, Deadlock-Free Packet Routing Algorithms.. - Cypher, Gravano (1994)   (1 citation)

....19] Algorithms in the first class often require less storage than those in the second class. However, the central queues can become sequential bottlenecks, so algorithms in the second class may offer better performance. In addition, it should be noted that deadlock-free wormhole routing algorithms [3, 5, 6, 7, 8, 11, 2, 13, 21, 26] can be used to obtain deadlock-free store-and-forward and virtual cut-through algorithms in the second class. No previously published technique yields a minimal store-and-forward or virtual cut-through routing algorithm that requires only a constant number of queues per node in tori of arbitrary ....

W. J. Dally. Fine-grain message passing concurrent computers. In Proc. 3rd Conference on Hypercube Concurrent Computers and Applications, pages 2--12, 1988.


Optimal Fully Adaptive Minimal Wormhole Routing for Meshes - Loren Schwiebert (1995)

....is a 2-ary 8-cube and a 16 × 16 torus is a 16-ary 2-cube. A mesh is a torus without the wrap-around channels. Routing algorithms for only mesh topologies are reviewed here, because the focus of this paper is on such topologies. Many adaptive routing algorithms for meshes have been proposed [2, 3, 5, 9, 10, 11, 12]. Table I summarizes the main features of each algorithm. In Table I, VCs is used as an abbreviation for number of bidirectional virtual channels per router. Table I: Overview of Adaptive Routing Algorithms for Meshes Author(s) Fully VCs for Comments Adaptive 2D Mesh Chien Kim [2] Yes 6 ....

....of each algorithm. In Table I, VCs is used as an abbreviation for number of bidirectional virtual channels per router. Table I: Overview of Adaptive Routing Algorithms for Meshes Author(s) Fully VCs for Comments Adaptive 2D Mesh Chien Kim [2] Yes 6 Partially Adaptive for Higher Dimensions Dally [3] Yes 6 2D Mesh Only Dally Aoki [5] Yes k 2D Mesh with k × k nodes Dally Aoki [5] Yes 8 Dynamic routing algorithm Glass Ni [9] Yes 6 2D Mesh Only Glass Ni [10] No 4 Roughly Half the Adaptiveness of Fully Adaptive Jesshope, Miller Yes 8 Number of Virtual Yantchev [11] Channels is ....

[Article contains additional citation context not shown here]

W. J. Dally. Fine-Grain Message-Passing Concurrent Computers. In Proceedings of the Third Conference on Hypercube Concurrent Computers, volume 1, pages 2--12, 1988.


Supporting Machine Independent Parallel.. - Fenton, Ramkumar, .. (1991)   (7 citations)

....Environment [2] also supports machine independent programming by providing communication mechanisms. The recently proposed Concurrent Aggregates (CA) language [6] bears some similarities with the branch office chares in our language, although CA is aimed at a fine grained machine (The J machine) [7] being built at MIT. Implicitly parallel higher level languages such as Functional or Logic languages constitute another approach to machine independence. We believe that such languages should be built on top of a system such as the Chare kernel to simplify the task of building them, and to render ....

Dally W.J. Fine-Grain Message-Passing Concurrent Computers. In The Third Conference on Hypercube Concurrent Computers and Applications, Pasadena, California, January 1988.


Optimal Fully Adaptive Wormhole Routing for Meshes - Loren Schwiebert (1993)   (4 citations)

....be further differentiated by the number of shortest paths allowed. Adaptive routing algorithms that do not allow all Table 1: Overview of Adaptive Routing Algorithms for Meshes Author(s) Fully VCs for Comments Adaptive 2D Mesh Chien Kim [2] Yes 6 Partially Adaptive for Higher Dimensions Dally [3] Yes 6 2D Mesh Only Dally Aoki [5] Yes n 2D Mesh with n × n nodes Glass Ni [8] Yes 6 2D Mesh Only Glass Ni [9] No 4 Roughly Half the Adaptiveness of Fully Adaptive Jesshope, Miller Yes 8 Number of Virtual Yantchev [10] Channels is Exponential Linder Harden [11] Yes 6 in Dimension of ....

....a mesh does not have any wrap around channels. Routing algorithms for only mesh topologies are reviewed here, because wormhole routing has been used primarily on low dimension meshes and the focus of this paper is on mesh topologies. Many adaptive routing algorithms for meshes have been proposed [2, 3, 5, 8, 9, 10, 11]. Table 1 summarizes the main features of each algorithm. In the Table, VCs is used for number of bidirectional virtual channels per router. Designing deadlock free routing algorithms for wormhole routing was simplified by a proof that an acyclic channel dependency graph guarantees deadlock ....

[Article contains additional citation context not shown here]

W. J. Dally. Fine-Grain Message-Passing Concurrent Computers. In Proceedings of the Third Conference on Hypercube Concurrent Computers, volume 1, pages 2--12, 1988.
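
Several excerpts on this page rely on the acyclicity of a channel dependency graph. As a rough sketch of what checking that property involves, the snippet below tests a dependency graph for cycles with Kahn's topological-sort algorithm. The function name `is_acyclic`, the dictionary representation (channel mapped to the channels it may wait on), and the toy channel names `c0` to `c3` are assumptions made for illustration and do not come from any of the cited papers.

```python
from collections import deque
from typing import Dict, Hashable, List


def is_acyclic(deps: Dict[Hashable, List[Hashable]]) -> bool:
    """Return True when the dependency graph contains no cycle.

    deps maps a channel to the channels a message holding it may request
    next. Kahn's algorithm repeatedly removes nodes with no incoming
    edges; a cycle exists exactly when some nodes can never be removed.
    """
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)

    indegree = {n: 0 for n in nodes}
    for targets in deps.values():
        for t in targets:
            indegree[t] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        n = ready.popleft()
        removed += 1
        for t in deps.get(n, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return removed == len(nodes)


# Toy example: four channels forming a ring of dependencies (cyclic),
# and the same ring with one dependency removed (acyclic).
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
broken_ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
```

Here `is_acyclic(ring)` reports a cycle, while `is_acyclic(broken_ring)` does not. Building the dependency graph for a real routing function is the harder part and is not attempted here.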


A Foundation for Designing Deadlock-free Routing.. - Jayasimha, Manivannan, .. (1996)   (1 citation)

.... channel dependency graph has also been used as a basis for developing adaptive routing algorithms defined by relations of the form R : C × N × N → P(C) (P(C) is the power set of C), where a set of output channels, rather than a single channel, is defined on which to route the message [BD93, CK92, Dal88, DG92, GN92a, GN92b, GN92c, JMY89, LH91, YT92]. Since a set of output channels is provided, a selection function is then used to select which of these output channels a message uses. Duato showed that requiring an acyclic channel dependency graph is too restrictive for routing algorithms defined by relations of the form R : N × N → P ....

W. J. Dally. Fine-Grain Message-Passing Concurrent Computers. In Proceedings of the Third Conference on Hypercube Concurrent Computers, volume 1, pages 2--12, 1988.


Multithreaded Architectures: Principles, Projects and Issues - Dennis, Gao (1994)   (4 citations)

....of lazy language semantics; and memory tags are provided in support of run time type management and automatic memory management. These differences are related to the higher level view of programming language support taken by the MASA designers. The J Machine. The Message Driven Processor (MDP) [35, 34] has been designed as a processor node for the MIT J machine, a massively parallel, VLSI multiprocessor architecture intended to support a fine grain, message passing style of parallel computation. The principal mechanisms for achieving high performance are the direct activation of message ....

....finer grain. For example, delivering a message and dispatching a task on the J machine is three orders of magnitude faster than on the Intel iPSC. The grid network used for the J machine uses the ideas of wormhole routing and virtual channels developed in connection with the Caltech Cosmic Cube [81, 111, 36, 33, 34]. A 512 node J machine is operational and preliminary results of its evaluation have been presented in [99]. The Alewife Project. Alewife is a large scale multiprocessor project led by Anant Agarwal at MIT [2] and is intended for symbolic computing using Mul-T, an extended version of the Scheme ....

William J. Dally, "Fine-grain message-passing concurrent computers," in Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, vol. I, Pasadena, California, pp. 2--12, ACM, January 1988.


A Case for Intelligent RAM: IRAM - Patterson, Anderson, Cardwell.. (1997)   (48 citations)

No context found.

Dally, W.J. "Fine-grain message-passing concurrent computers." In: Third Conference on Hypercube Concurrent Computers and Applications (Pasadena, CA, USA, 19-20 Jan. 1988). Edited by: Fox, G.


CiteSeer.IST - Copyright Penn State and NEC