Transformations to parallel codes for communication-computation overlap
In Supercomputing 2005, 2005
Cited by 19 (3 self)
This paper presents program transformations directed toward improving communication-computation overlap in parallel programs that use MPI’s collective operations. Our transformations target a wide variety of applications focusing on scientific codes with computation loops that exhibit limited dependence among iterations. We include guidance for developers for transforming an application code in order to exploit the communication-computation overlap available in the underlying cluster, as well as a discussion of the performance improvements achieved by our transformations. We present results from a detailed study of the effect of the problem and message size, level of communication-computation overlap, and amount of communication aggregation on runtime performance in a cluster environment based on an RDMA-enabled network. The targets of our study are two scientific codes written by domain scientists, but the applicability of our work extends far beyond the scope of these two applications.
Asynchronous Communications in MPI - the BIP/Myrinet Approach
Cited by 3 (1 self)
In this paper, we present our experiments on asynchronous communications using the BIP and MPI interfaces on a cluster of PCs connected with a Myrinet network. We describe the implementation details of those communications and point out the problems and solutions.
1 Introduction and motivations
Implementations of the Message Passing Interface (MPI) [8] are now available on every kind of platform, from SMPs to clusters of PCs. This ensures very good portability. However, the use of distributed memory machines or networks of workstations adds an overhead due to communications. To hide this overhead, non-blocking communications can be used to overlap computations and communications. However, it cannot be assumed that the communication layer provides real overlap and asynchronous execution of the communication. Several papers have presented some ways to hide communication latency [1] or to use asynchronous communications to improve the implementation of parallel ...
Parallel 3D air flow simulation on workstation clusters
1998
Thesee is a 3D panel method code that calculates the characteristics of a wing in an inviscid, incompressible, irrotational, and steady airflow, in order to design new paragliders and sails. In this paper, we present the parallelization of Thesee for low-cost workstation/PC clusters. Thesee has been parallelized using the ScaLAPACK library routines in a systematic manner that led to a low development cost. The code, written in C, is thus very portable since it uses only high-level libraries. This design was very efficient in terms of manpower and gave good performance results. The performance of the code was measured on 3 clusters of computers connected by different LANs: an Ethernet LAN of SUN SPARCstations, an ATM LAN of SUN SPARCstations, and a Myrinet LAN of PCs. The last one was the least expensive and gave the best timing results and super-linear speedup.

