Y.-C. Tseng, T.-H. Lin, S.K.S. Gupta, and D.K. Panda. Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. IEEE Transactions on Parallel and Distributed Systems, 8(4):380--396, April 1997.
.... in molecular dynamics simulations [17]. Other examples include sparse matrix-vector multiplication [26] and parallelized solutions of sparse triangular systems [1]. Several papers consider the message exchange problem for specific interconnection topologies such as meshes, tori, and hypercubes [3, 4, 10, 22, 27, 28, 30, 29, 32]. Most of these results concern messages of the same size. Some papers also consider the sizes of the messages. Three algorithms for message exchange in meshes and hypercubes are proposed in [3]: one for short messages, another for long messages, and a hybrid algorithm. In [27, 28] the authors ....
Y.-C. Tseng, T.-H. Lin, S.K.S. Gupta, and D.K. Panda. Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. IEEE Transactions on Parallel and Distributed Systems, 8(4):380--396, April 1997. 27
....the fast Fourier transform (FFT) and programming models such as the BSP [8]. The total exchange is a collective communication pattern where every node has to send a distinct message to every other node. The efficient implementation of the total exchange has been extensively studied in the past few years [9, 10, 11, 12]. Wormhole switching has been adopted by many new-generation parallel computers, such as the Intel Touchstone Delta, Intel Paragon, MIT J-Machine, Stanford FLASH, and the Cray T3D and T3E. In such networks, a packet is partitioned into a sequence of elementary units called flits, which are sent in a ....
....[10] aims at reducing the link contention of the pairwise exchange algorithm on the bidimensional cubes. In this algorithm each node communicates only with the nodes in its row and column. Each exchange along a row is followed by a complete exchange along a column. The diagonal-propagation approach [12] includes a set of total exchange algorithms for the bi- and three-dimensional cubes whose arity is a multiple of four. This approach develops a communication schedule consisting of several congestion-free phases in which the processors are grouped along the diagonals. On a bidimensional ....
Y.-C. Tseng, T.-H. Lin, S.K.S. Gupta, and D.K. Panda. Bandwidth-Optimal Complete Exchange on Wormhole Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach. IEEE Transactions on Parallel and Distributed Systems,
.... communications have been analyzed and studied for the important regular network topologies such as the mesh, torus, and hypercube (Kumar et al. 1994; Johnsson and Ho, 1989). Optimal schemes have also been presented for cut-through routed ring, mesh, and torus networks (Lam et al. 1997; Tseng et al. 1997; Suh and Yalamanchili, 1998). In recent years, the trend in building parallel systems has been to use commercial off-the-shelf (COTS) products and technologies. These networks of workstations often have an irregular topology because they can be expanded and scaled up flexibly as the requirements and ....
Tseng, Y.-C., Lin, T.-H., Gupta, S. K. S. and Panda, D. K. (1997), "Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach," IEEE Transactions on Parallel and Distributed Systems, Vol. 8, No. 4, pp. 380-396.
....is better than all of these algorithms for short messages and is also better than the algorithms in [4, 27, 29, 30] for long messages. The performance of our algorithm is approximately the same as that of the algorithms in [24, 28] for long messages. There are also several algorithms for toroidal meshes [11, 13, 31, 32, 33]. By using virtual channels, our algorithm offers better performance than any of these algorithms in terms of number of rounds, which is usually the dominant cost when messages are short. For long messages, the algorithms for 3-dimensional tori in [13] and d-dimensional tori, d 2 in [31] offer ....
Y.-C. Tseng, T.-H. Lin, S.K. Gupta, and D.K. Panda. Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. Technical Report OSU-CISRC-3/96-TR14, Dept. of Computer and Information Science, Ohio State University, 1996.
....by different PUs, and these values are needed by all PUs for the correction phase, requiring a gossiping of the data. Previous Work. A substantial amount of research has been performed on finding efficient algorithms for collective communication operations on wormhole-routed systems (see, e.g., [1, 4, 12, 3, 17]). However, most papers either deal with very small packets or with very large packets. Both these extreme cases require algorithms optimizing only one parameter. If the packets are small, then the number of start-ups should be minimized. Peters and Syska [12] considered the broadcasting problem ....
....involved. For the efficiency of the two-dimensional algorithm, it is essential that data is concentrated in PUs that lie on diagonals. For higher-dimensional meshes we give an interesting generalization of the notion of a diagonal, which may be of independent interest. We remark that Tseng et al. [17] also used diagonals in their complete exchange algorithm. However, the generalization of a diagonal given there for three-dimensional tori is rather straightforward. Hyperspaces are used that, when projected, give back a diagonal in two-dimensional space. We generalize the diagonal in a different ....
Tseng, Y-C., T-H. Lin, S.K.S. Gupta, D.K. Panda, `Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach,' IEEE Transactions on Parallel and Distributed Systems, 8, pp. 380--396, 1997.
....schedule for a ring and then extended it to a torus network by using a notion of cross product of communication schedules. Later, in collaboration with Dhableshwar Panda (a faculty member at Ohio State University), we developed indirect complete exchange algorithms for wormhole-routed 2D and 3D networks [51, 52]. Besides communication protocols for torus networks, I have also done some work on barrier synchronization algorithms for k-ary n-cubes [21] and processor allocation for torus-connected multiprocessor systems [22, 23] ....
Y.-C. Tseng, T.-H. Lin, S. K. S. Gupta, and D. K. Panda. Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. IEEE Transactions on Parallel and Distributed Systems, 8(4):380--396, Apr. 1997.
....exchange include matrix algorithms, fast Fourier transformation (FFT), graph algorithms, and data redistribution in HPF [13]. It can also be used to evaluate the quality of an interconnection network. Previous work for complete exchange can be found in [3, 9, 23, 27, 28, 29] for meshes, and [6, 10, 25, 26, 31, 32] for tori. Here the torus network is considered, whose architecture has been adopted by commercial machines such as Cray T3D/T3E. The switching model under consideration is wormhole routing, which has been widely used in existing machines such as Caltech MOSAIC, Cray T3D/T3E, IBM SP2, Intel ....
....Cray T3D/T3E. The switching model under consideration is wormhole routing, which has been widely used in existing machines such as Caltech MOSAIC, Cray T3D/T3E, IBM SP2, Intel Touchstone Delta, Intel Paragon, MIT J-machine, and nCUBE3. Works related to the problem considered in this paper include [1, 6, 10, 14, 25, 26, 31, 32]. The results in [1, 6, 14] are based on a torus mesh using packet switching (or store-and-forward). Such schemes are inappropriate for wormhole-routed networks as the distance-insensitive property is hardly exploited. Communication in a wormhole-routed network typically incurs two kinds of costs: ....
[Article contains additional citation context not shown here]
Y.-C. Tseng, T.-H. Lin, S. K. S. Gupta, and D. K. Panda. Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. IEEE Transactions on Parallel and Distributed Systems, 8(4):380-396, Apr. 1997.
....of complete exchange include matrix algorithms, fast Fourier transformation (FFT), graph algorithms, and data distribution in HPF. It can also be used to evaluate the quality of an interconnection network. Previous work for complete exchange can be found in [1, 3, 6, 8, 9, 10] for meshes, and [2, 4, 7, 11, 12] for tori. Here the torus network is considered, whose architecture has been adopted by commercial machines such as Cray T3D/T3E. The switching model under consideration is wormhole routing, which has been widely used in existing machines. Works related to this problem include [2, 4, 7, 11, 12] ....
....and [2, 4, 7, 11, 12] for tori. Here the torus network is considered, whose architecture has been adopted by commercial machines such as Cray T3D/T3E. The switching model under consideration is wormhole routing, which has been widely used in existing machines. Works related to this problem include [2, 4, 7, 11, 12]. The result in [2] is based on a torus using packet switching. Such schemes are inappropriate for wormhole-routed networks as the distance-insensitive property is hardly exploited. Communication in a wormhole-routed network typically incurs two kinds of costs: startup time and transmission time. ....
[Article contains additional citation context not shown here]
Y.-C. Tseng, T.-H. Lin, S. K. S. Gupta, and D. K. Panda. Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. IEEE Trans. Parallel and Distributed Systems, 8(4):380--396, Apr. 1997.
....exchange include matrix algorithms, fast Fourier transformation (FFT), graph algorithms, and data redistribution in HPF [12]. It can also be used to evaluate the quality of an interconnection network. Previous work for complete exchange can be found in [3, 9, 22, 26, 27, 28] for meshes, and [6, 10, 24, 25, 30, 31] for tori. Here the torus network is considered, whose architecture has been adopted by commercial machines such as Cray T3D/T3E. The switching model under consideration is wormhole routing, which has been widely used in existing machines such as Caltech MOSAIC, Cray T3D/T3E, IBM SP2, Intel ....
....Cray T3D/T3E. The switching model under consideration is wormhole routing, which has been widely used in existing machines such as Caltech MOSAIC, Cray T3D/T3E, IBM SP2, Intel Touchstone Delta, Intel Paragon, MIT J-machine, and nCUBE3. Works related to the problem considered in this paper include [1, 6, 10, 13, 24, 30, 31, 25]. The results in [1, 6, 13] are based on a torus mesh using packet switching (or store-and-forward). Such schemes are inappropriate for wormhole-routed networks as the distance-insensitive property is hardly exploited. Communication in a wormhole-routed network typically incurs two kinds of costs: ....
[Article contains additional citation context not shown here]
Y.-C. Tseng, T.-H. Lin, S. K. S. Gupta, and D. K. Panda. Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. IEEE Transactions on Parallel and Distributed Systems, 8(4):380--396, Apr. 1997.
....distributed table lookup, fast Fourier transformation, and cache coherence. The one-to-all broadcast, together with other operators such as all-to-all broadcast, personalized broadcast, and data reduction, are termed collective communication and have received intensive attention recently [1, 2, 14, 16, 26, 27, 28]. We consider a communication network using wormhole-routing switching technology [6, 18], which is characterized by low communication latency and is quite insensitive to routing distance in the absence of link contention. Such technology has been adopted by many new-generation parallel ....
Y.-C. Tseng, T.-H. Lin, S. K. S. Gupta, and D. K. Panda. Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. IEEE Trans. on Paral. and Distrib. Sys. to appear.
....of contention-free phases. Further, in order to fully utilize the available communication bandwidth, the number of communication phases is minimized by scheduling as many communications in a phase as possible. Such is also the case for algorithms designed for wormhole-routed torus networks [TG96, TLGP97]. However, programs based on such algorithms would not be able to get maximum benefit from the underlying torus network if the jobs are allocated on a submesh rather than a subtorus. Our purpose in the present paper is to investigate job scheduling in 2D torus-connected networks under different ....
Y.-C. Tseng, T.-H. Lin, S. K. S. Gupta, and D. K. Panda. Bandwidth-optimal complete exchange on wormhole-routed torus networks: a diagonal-propagation approach. IEEE Trans. on Parallel and Distributed Systems, 1997. To appear.
No context found.
Y.-C. Tseng, T.-H. Lin, S. Gupta, and D.K. Panda, "Bandwidth-Optimal Complete Exchange on Wormhole Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach," IEEE Trans. Parallel and Distributed Systems, vol. 8, no. 4, pp. 380-396, 1997.