Shen K, Tang H, Yang T. Adaptive two-level thread management for fast MPI execution on shared memory machines. Proceedings of the IEEE/ACM Supercomputing Conference (SC), Seattle, WA, 1999. IEEE Computer Society Press: Los Alamitos, CA, 1999.
....communicate with its peers. As a result, global variables declared in an MPI program are private to each MPI node. It is natural to map an MPI node to a process. However, communication between processes has to go through operating system kernels, which could be very costly. Our previous studies [16, 18] show that process based implementations can suffer large performance loss on multiprogrammed shared memory machines (SMMs). Mapping each MPI node to a thread opens the possibility of fast synchronization through address space sharing. This approach requires a compiler to transform an MPI program ....
....loss on multiprogrammed shared memory machines (SMMs). Mapping each MPI node to a thread opens the possibility of fast synchronization through address space sharing. This approach requires a compiler to transform an MPI program into a thread safe form. As demonstrated in our previous TMPI work [16, 18], the above approach can deliver significant performance gain for a large class of MPI C programs on multiprogrammed SMMs. Extending a threaded MPI implementation for a single SMM to support an SMP cluster is not straightforward. In an SMP cluster environment, processes (threads) within the same ....
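The "thread safe form" mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes only what the excerpt states: each MPI node expects its global variables to be private, so once nodes share an address space those globals must become per-thread. The names `_node_state`, `node_main`, and `run_nodes` are hypothetical, and Python's `threading.local` stands in for whatever mechanism the cited compiler actually uses.

```python
import threading

# Hypothetical per-node storage: each thread (standing in for an MPI node)
# sees its own copy of what used to be a process-wide global variable.
_node_state = threading.local()


def node_main(rank, results):
    """Body of one MPI node, run as a thread instead of a process."""
    _node_state.counter = 0            # formerly a plain global
    for _ in range(rank + 1):
        _node_state.counter += 1       # updates never leak to other threads
    results[rank] = _node_state.counter


def run_nodes(n):
    """Start n node threads and collect each node's private counter."""
    results = [None] * n
    threads = [
        threading.Thread(target=node_main, args=(rank, results))
        for rank in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_nodes(4)` gives each rank a counter that reflects only its own increments, which is the property a process-per-node mapping provides for free.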
K. Shen, H. Tang, and T. Yang. Adaptive two-level thread management for fast MPI execution on shared memory machines. In Proceedings of ACM/IEEE SuperComputing '99, New York, November 1999. ACM/IEEE. Available from www.cs.ucsb.edu/research/tmpi.
....potential advantage is that when we use a user level thread to execute an MPI node, we can dynamically control the number of active kernel threads to match the number of available physical processors in order to minimize kernel level context switch cost. Recently [Shen et al. 1999] we have studied this idea and we find that minimizing unnecessary use of kernel level threads in a multiprogrammed environment can lead to an additional 88% performance improvement. TMPI is a proof of concept system intended for demonstrating the effectiveness of our techniques. Our current ....
SHEN, K., TANG, H., AND YANG, T. 1999. Adaptive two-level thread management for fast MPI execution on shared memory machines. In Proceedings of ACM/IEEE SuperComputing '99. ACM/IEEE, New York. Will be available from www.cs.ucsb.edu/research/tmpi.
....potential advantage is that managing MPI nodes in terms of threads can allow us to dynamically switch between kernel level and user level threads based on the number of available physical processors, since a context switch between kernel level threads is more expensive than one between user level threads. Recently [32] we have verified this advantage and found that avoiding the use of unnecessary kernel threads in a multiprogrammed environment can lead to an additional 88% performance improvement. TMPI is a proof of concept system to demonstrate the effectiveness of our techniques, and we plan to add more MPI functions to ....
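The two-level idea in the excerpt above can be sketched as follows. This is an illustration under stated assumptions, not the scheduler of the cited paper: generators play the role of user level threads, Python `threading.Thread` objects play the role of kernel threads, and the names `mpi_node`, `run_two_level`, and `n_processors` are hypothetical. The only point shown is that the number of kernel level workers is capped at the number of available processors while the number of MPI nodes can be larger.

```python
import os
import queue
import threading


def mpi_node(rank, steps):
    """Hypothetical user level thread: a generator that yields at switch points."""
    total = 0
    for i in range(steps):
        total += rank * i
        yield                      # voluntary user level context switch
    return total


def run_two_level(n_nodes, steps, n_processors=None):
    """Multiplex n_nodes user level threads over at most n_processors workers."""
    # Assumption: os.cpu_count() approximates the available physical processors.
    n_workers = max(1, min(n_nodes, n_processors or os.cpu_count() or 1))

    ready = queue.Queue()          # run queue of (rank, generator) pairs
    for rank in range(n_nodes):
        ready.put((rank, mpi_node(rank, steps)))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                rank, node = ready.get_nowait()
            except queue.Empty:
                return             # nothing runnable: this worker retires
            try:
                next(node)         # run the node until its next yield
                ready.put((rank, node))
            except StopIteration as done:
                with lock:
                    results[rank] = done.value

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

For example, `run_two_level(8, 100, n_processors=2)` runs eight nodes on two workers; switching between nodes costs a generator resume rather than a kernel level context switch.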
K. Shen, H. Tang, and T. Yang. Adaptive Two-level Thread Management for Fast MPI Execution on Shared Memory Machines. In Proc. of ACM/IEEE SuperComputing'99 (SC'99), November 1999. Will be available from www.cs.ucsb.edu/research/tmpi.