Hong Tang, Kai Shen, and Tao Yang. Program transformation and runtime support for threaded MPI execution on shared-memory machines. ACM Transactions on Programming Languages and Systems, 22(4):673--700, 2000.
....the user to perform architecture-dependent communication optimization across communication patterns. The success of the MPI standard can be attributed to the wide availability of two MPI implementations: MPICH [11] and LAM MPI [27]. Many researchers have been trying to optimize the MPI library [16, 20, 25, 29]. In [16] optimizations are proposed for collective communications over Wide Area Networks. In [25] a compiler based optimization approach is developed to reduce the software overheads in the library, which focuses on point to point communications. In [20] MPI point to point communication ....
....is developed to reduce the software overheads in the library, which focuses on point to point communications. In [20] MPI point to point communication routines are optimized using a more efficient primitive (Fast Message). Optimizations for a thread based MPI implementation are proposed in [29]. Our research is different from the existing work in that we develop an MPI library that allows static communication information to be exploited. 3. BACKGROUND CC MPI optimizes one to all, one to many, many to many, and all to all communication routines for Ethernet switched clusters by ....
H. Tang, K. Shen, and T. Yang. Program Transformation and Runtime Support for Threaded MPI Execution on Shared-Memory Machines. ACM Transactions on Programming Languages and Systems, 22(4):673--700, July 2000.
....communicate with its peers. As a result, global variables declared in an MPI program are private to each MPI node. It is natural to map an MPI node to a process. However, communication between processes has to go through operating system kernels, which could be very costly. Our previous studies [16, 18] show that process based implementations can suffer large performance loss on multiprogrammed shared memory machines (SMMs). Mapping each MPI node to a thread opens the possibility of fast synchronization through address space sharing. This approach requires a compiler to transform an MPI program ....
....loss on multiprogrammed shared memory machines (SMMs). Mapping each MPI node to a thread opens the possibility of fast synchronization through address space sharing. This approach requires a compiler to transform an MPI program into a thread-safe form. As demonstrated in our previous TMPI work [16, 18], the above approach can deliver significant performance gain for a large class of MPI C programs on multiprogrammed SMMs. Extending a threaded MPI implementation for a single SMM to support an SMP cluster is not straightforward. In an SMP cluster environment, processes (threads) within the same ....
[Article contains additional citation context not shown here]
H. Tang, K. Shen, and T. Yang. Program Transformation and Runtime Support for Threaded MPI Execution on Shared Memory Machines. ACM Transactions on Programming Languages and Systems, 2000.
No context found.
Hong Tang, Kai Shen, and Tao Yang. Program transformation and runtime support for threaded MPI execution on shared-memory machines. ACM Transactions on Programming Languages and Systems, 22(4):673--700, 2000.
No context found.
H. Tang, K. Shen, and T. Yang. Program Transformation and Runtime Support for Threaded MPI Execution on Shared-Memory Machines. ACM Transactions on Programming Languages and Systems, 22(4):673--700, July 2000.