M. Barnett, R. Littlefield, D. Payne, and R. van de Geijn. Efficient communication primitives on mesh architectures with hardware routing. In Sixth SIAM Conference on Parallel Processing for Scientific Computing, March 1993.

This paper is cited in the following contexts:
Optimizing Data-Parallel Stencil Computations In A.. - Chappelow, Hatcher.. (1995)   (2 citations)  (Correct)

....[Figure 8: Optimization results for a communication kernel on the Intel Delta and the Meiko CS-2. The kernel simulates the communication for a nine-point stencil.] ....the capability of the hardware congestion control [2], [5]. Our staged message delivery along real or logical two-dimensional grid connections is reminiscent of the Subway software router that delivers unstructured communications on SIMD processor grids [3]. 7 CONCLUSIONS We have designed and implemented in the UNH C system a portable optimizer ....

M. Barnett, R. Littlefield, D. Payne, and R. van de Geijn. Efficient communication primitives on mesh architectures with hardware routing. In Sixth SIAM Conference on Parallel Processing for Scientific Computing, March 1993.


Parallel Bandreduction and Tridiagonalization - Christian Bischof (1993)   (7 citations)  (Correct)

....that the matrix size divides evenly by the block size. We also note that the Chameleon tools currently provide only unoptimized fan-in/fan-out broadcast and global sum primitives (see, for example, [22, 3]), which are substantially slower than primitives that are optimized for the Intel Delta (e.g., [2, 16]). These issues will be addressed in future versions of our code. 4 Preliminary Performance Results In this section, we present preliminary performance results that we have obtained with a double-precision version of our code running on 64 processors of the Intel Delta. The purpose of these ....

M. Barnett, R. Littlefield, D. G. Payne, and R. van de Geijn, Efficient communication primitives on mesh architectures with hardware routing, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. Sincovec, ed., Philadelphia, 1993, SIAM.


Parallel Bandreduction and Tridiagonalization - Bischof, Marques, Sun (1993)   (7 citations)  (Correct)

....that the matrix size divides evenly by the block size. We also note that the Chameleon tools currently provide only unoptimized fan-in/fan-out broadcast and global sum primitives (see, for example, [22, 3]), which are substantially slower than primitives that are optimized for the Intel Delta (e.g., [2, 16]). These issues will be addressed in future versions of our code. 4 Preliminary Performance Results In this section, we present preliminary performance results that we have obtained with a double-precision version of our code running on 64 processors of the Intel Delta. The purpose of these ....

M. Barnett, R. Littlefield, D. G. Payne, and R. van de Geijn, Efficient communication primitives on mesh architectures with hardware routing, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, R. Sincovec, ed., Philadelphia, 1993, SIAM.


Efficient Communication Operations On Passive Optical Star.. - Desprez, Ferreira, al. (1994)   (2 citations)  (Correct)

....for most of the algorithms by reducing the communication costs. Note that if the messages are not atomic, i.e., $L \neq 1$, the preceding results can be easily adapted and compared directly with the results of the classical studies (usually with $m = 1$) on non-optical wormhole protocol networks [4, 5, 6, 14]. Our ongoing work is the implementation of these communication routines on an Intel Paragon machine to validate the TCP approach. We also want to have timings to compare the TCP algorithms with the classical communication routines implemented on a non-optical wormhole interconnection network ....

M. Barnett, R. Littlefield, D.G. Payne, and R. Van De Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing. In R.F. Sincovec, D.E. Keyes, M.R. Leuze, L.R. Petzold, and D.A. Reed, editors, Sixth SIAM Conference on Parallel Processing for Scientific Computing, pages 943--948. SIAM, 1993.


Global Combine on Mesh Architectures with Wormhole Routing - Barnett Littlefield Payne (1993)   (26 citations)  Self-citation (Barnett Littlefield Payne Geijn)   (Correct)

....a hypercube, is often the fastest method for meshes containing $p = 2^d$ nodes, if care is taken to order the communications to minimize network contention. The state of the art for performing the global combine on hypercubes is described in [5]. Earlier work on 2-D mesh combining is reported in [2]. The global combine operation can be stated as follows: Given $p$ processing nodes, each of which owns a vector of data, $x_i$, of length $n$, a global combine forms $y = \bigoplus_{i=0}^{p-1} x_i$, where $\oplus$ is a commutative and associative operator defined on the elements of the vectors. In this ....

M. Barnett, R. Littlefield, D.G. Payne, and R. van de Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing, Sixth SIAM Conf. on Par. Proc. for Sci. Comp., Norfolk, Virginia, March 22-24, 1993.


Global Combine Algorithms for 2-D Meshes With Wormhole Routing - Barnett Littlefield   Self-citation (Barnett Littlefield Payne Geijn)   (Correct)

....show that a different algorithm, optimized for a hypercube, is often the fastest method for meshes containing $p = 2^d$ nodes, if care is taken to order the communications to minimize network contention. We have previously summarized some of the early stages of this work in conference proceedings [3, 4]. 2 System Model and Notation Our target system is assumed to be a 2-D ($r$ rows by $c$ columns) mesh comprising $p = rc$ processing nodes, each having communication links to only its horizontal and vertical neighbors. The nodes are numbered 0 to $p - 1$ in row-major order. We assume that the ....

M. Barnett, R. Littlefield, D.G. Payne, and R. van de Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing, Sixth SIAM Conf. on Par. Proc. for Sci. Comp., Norfolk, Virginia, March 22-24, 1993.


Building a High-Performance Collective Communication.. - Barnett, Gupta, Payne..   (23 citations)  Self-citation (Barnett Payne)   (Correct)

No context found.

M. Barnett, R. Littlefield, D.G. Payne and R. van de Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing. Sixth SIAM Conference on Parallel Processing for Scientific Computing, Norfolk, VA, Mar. 22-- 24, 1993.


Broadcasting on Meshes with Worm-Hole Routing - Barnett, Payne (1996)   (20 citations)  Self-citation (Barnett Payne Geijn)   (Correct)

....root node of the broadcast. In [15] a broadcast is presented that has somewhat of a flavor of our scatter-collect. In essence, the author followed our suggestion that the broadcast can be implemented as a modified global summation and used some of the techniques for such algorithms developed in [1, 3, 4, 17, 18]. The resulting algorithms are not asymptotically optimal, but do avoid network conflicts. They are limited to meshes that contain a power-of-two number of nodes, with extensions for general meshes that double the cost of the algorithms. 8 Other applications of the techniques Global combine ....

M. Barnett, R. Littlefield, D.G. Payne, and R. van de Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing. Sixth SIAM Conf. on Par. Proc. for Sci. Comp., Norfolk, Virginia, March 22-24, 1993.


Broadcasting on Meshes with Worm-Hole Routing - Michael Barnett (1996)   (20 citations)  Self-citation (Barnett Payne Geijn)   (Correct)

....in a worm-hole mesh. In [11] a broadcast is presented that has somewhat of a flavor of our scatter-collect. In essence, the author followed our suggestion that the broadcast can be implemented as a modified global summation and used some of the techniques for such algorithms developed in [1, 2, 3, 12]. The resulting algorithms are not asymptotically optimal, but do avoid network conflicts. They are limited to meshes that contain a power-of-two number of nodes, with extensions for general meshes that double the cost of the algorithms. 1 For the more robust standard kernel, all times must be ....

M. Barnett, R. Littlefield, D.G. Payne, and R. van de Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing. Sixth SIAM Conf. on Par. Proc. for Sci. Comp., Norfolk, Virginia, March 22-24, 1993.


Matrix-Vector Multiplication and Conjugate Gradient Algorithms .. - Lewis, Payne (1994)   Self-citation (Payne Geijn)   (Correct)

....[Figure: Bucket collect on four processors.] ....duplicated on all processors. On a linear array, for long vectors, we use a bucket algorithm [3], for which the time required is $(p-1)\alpha + \frac{p-1}{p} n\beta$. (1) We illustrate this method in Fig. 2 for a four-processor linear array. Global Summation Given that each processor, $P_i$, owns a vector, $x_i$, of length $n$, we wish to form $y = \sum_{i=0}^{p-1} x_i$. For a linear array, the ....

M. Barnett, R. Littlefield, D.G. Payne, and R. van de Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing, Sixth SIAM Conf. on Par. Proc. for Sci. Comp., Norfolk, Virginia, March 22-24, 1993.


Interprocessor Collective Communication Library (InterCom) - Barnett, Gupta, Payne, al. (1994)   (57 citations)  Self-citation (Barnett Payne Geijn)   (Correct)

....techniques are appropriate. For a general purpose library, it is crucial that an implementation performs well for all vector lengths. In our previous papers on collective communication, we studied individual communication operations and their implementation, including possible hybrid approaches [1, 2, 3, 5]. It is through this progression of studies that we have discovered that all the aforementioned collective communication operations can be built from similar primitives. It is this observation that has led us to propose a unified approach to hybrid design. Finally, there is a strong applications ....

M. Barnett, R. Littlefield, D.G. Payne, and R. van de Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing, Sixth SIAM Conf. on Par. Proc. for Sci. Comp., Norfolk, Virginia, March 22-24, 1993.


Broadcasting on Meshes with Worm-Hole Routing - Michael Barnett (1996)   (20 citations)  Self-citation (Barnett Payne Geijn)   (Correct)

....in a worm-hole mesh. In [11] a broadcast is presented that has somewhat of a flavor of our scatter-collect. In essence, the author followed our suggestion that the broadcast can be implemented as a modified global summation and used some of the techniques for such algorithms developed in [1, 2, 3, 12]. The resulting algorithms are not asymptotically optimal, but do avoid network conflicts. They are limited to meshes that contain a power-of-two number of nodes, with extensions for general meshes that double the cost of the algorithms. 8 Other applications of the techniques Global combine ....

M. Barnett, R. Littlefield, D.G. Payne, and R. van de Geijn. Efficient Communication Primitives on Mesh Architectures with Hardware Routing. Sixth SIAM Conf. on Par. Proc. for Sci. Comp., Norfolk, Virginia, March 22-24, 1993.


CiteSeer.IST - Copyright Penn State and NEC