29 citations found.
W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University, 1996.

This paper is cited in the following contexts:
Adaptivity and Predictable Performance - Sandhya Senapathi Dhabaleswar

....executing in parallel. All communication is performed within the definition of a communicator. A communicator is a group of processes that are communicating with each other, in which each process has a unique id between 0 and N-1, N being the number of processes in the communicator. MPICH [15] is a freely available, portable implementation of MPI. The mechanism for achieving portability is a specification called the Abstract Device Interface (ADI). All MPI functions are defined in terms of a set of basic MPI communication primitives, which are implemented in terms of the ADI layer. ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Tech. Report, Argonne National Laboratory and Mississippi State University.


Communication Modeling of Heterogeneous Networks.. - Banikazemi.. (1999)   (5 citations)

....the paper with future research directions. 2 Modeling Point to point Communication For the characterization of collective communication operations on heterogeneous networks, the knowledge of send and receive costs on the various nodes is imperative. In most implementations of MPI, such as MPICH [8], collective communication is implemented using a series of point to point messages. Hence, from the characterization of point to point communication on the various type of nodes in a heterogeneous network, the cost of any collective communication operation can be estimated. This section explains ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Efficient Multicast on Irregular Switch-based Networks with.. - Kesavan, Panda

....to node contention specifically as node contention. We then compare the four proposed algorithms using extensive detailed simulation experiments. 5 In addition to comparing these algorithms with each other, we compare them against a naive random ordering (RO) algorithm which is used in MPICH [14], an implementation of MPI. We first use single multicast experiments to isolate the effect of each of the following parameters on the algorithms: system size, switch size, message length, input buffer size, degree of connectivity, destination set size, and communication start up time. Finally, we ....

....of the set $D \cup \{n_s\}$ into a list, L 0 , and executes a binomial tree based multicast on it. Current generation communication layers use such an algorithm for implementing multicast. For example, the popular MPICH implementation of the MPI standard uses this algorithm for supporting multicast [14, 26, 39]. This algorithm is very simple to implement and it takes $\lceil \log_2(|D| + 1) \rceil$ communication startups (steps) to complete. Since the destinations and the source are ordered randomly, nothing can be said about the contention among messages of the multicast. Therefore, it is likely that this algorithm ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.
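
The binomial tree based multicast mentioned in the excerpt above can be sketched as follows. This is only an illustrative Python simulation of the general technique, not MPICH code: the function name `binomial_multicast_schedule` and the node labels are hypothetical, and the list is taken in the order given (the excerpt notes that the real ordering is random). For a source plus |D| destinations it produces ceil(log2(|D| + 1)) steps.

```python
import math


def binomial_multicast_schedule(nodes):
    """Return the send schedule of a binomial-tree multicast.

    nodes[0] is the source; nodes[1:] are the destinations in list order.
    The result is a list of steps, each a list of (sender, receiver) pairs.
    """
    n = len(nodes)
    steps = []
    have = 1  # nodes[0:have] already hold the message
    while have < n:
        # Every current holder i sends to the node `have` positions ahead,
        # so the number of holders (at most) doubles in each step.
        step = [(nodes[i], nodes[i + have])
                for i in range(have) if i + have < n]
        steps.append(step)
        have = min(2 * have, n)
    return steps


if __name__ == "__main__":
    # Hypothetical example: source "s" and four destinations.
    schedule = binomial_multicast_schedule(["s", "a", "b", "c", "d"])
    for number, step in enumerate(schedule, start=1):
        print(number, step)
    # The step count matches ceil(log2(|D| + 1)) for |D| = 4.
    print(len(schedule) == math.ceil(math.log2(5)))
```

Because senders are chosen purely by list position, two messages in the same step may share network links, which is the contention issue the excerpt raises.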


All-to-All Broadcast on Switch-Based Clusters of Workstations - Matt Jacunski Sadayappan (1998)   (4 citations)

....a flit from the network) 16.0 nanoseconds. 4 All to all broadcast on irregular networks of arbitrary size For irregular networks of arbitrary size and complexity, the potential for link contention increases. The rank ordering ring (RO R) algorithm, the default implementation supplied by MPICH [6], becomes less attractive as the potential for link contention increases. Two algorithms which perform well for arbitrary irregular networks are discussed below. 4.1 Switch Ordered Ring (SO R) algorithm For an irregular network, an efficient ordering of nodes for the ring algorithm is found by ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.
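
For readers unfamiliar with ring-based all-to-all broadcast, the idea behind a rank-ordered ring can be sketched as below. This is a generic Python simulation under stated assumptions, not the MPICH implementation or the paper's RO-R code: the name `ring_all_to_all_broadcast` is hypothetical, and each rank is assumed to forward to rank + 1 (mod N) the block it received in the previous step, so that N - 1 steps suffice.

```python
def ring_all_to_all_broadcast(blocks):
    """Simulate an all-to-all broadcast over a ring ordered by rank.

    blocks[r] is the data initially owned by rank r. Returns, for every
    rank, the list of all blocks ordered by owning rank.
    """
    n = len(blocks)
    # gathered[r] maps an owner rank to the block that rank r holds so far.
    gathered = [{r: blocks[r]} for r in range(n)]
    # outgoing[r] is the owner of the block rank r forwards in the next step.
    outgoing = list(range(n))
    for _ in range(n - 1):
        # Rank r receives from its left neighbour (r - 1) mod n.
        incoming = [outgoing[(r - 1) % n] for r in range(n)]
        for r in range(n):
            gathered[r][incoming[r]] = blocks[incoming[r]]
        # In the next step every rank forwards what it just received.
        outgoing = incoming
    return [[held[owner] for owner in range(n)] for held in gathered]


if __name__ == "__main__":
    # Hypothetical example with three ranks.
    print(ring_all_to_all_broadcast(["a", "b", "c"]))
```

Since neighbours are determined by rank alone, ranks that are adjacent in the ring may be far apart in an irregular network, which is why the excerpt considers switch-aware orderings.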


Profile-Based Load Balancing for Heterogeneous Clusters - Banikazemi, Prabhu..

....and the standard algorithm. We used these algorithms to perform load balancing of the M^n application on a heterogenous cluster of workstations. M^n is an essential part of the algorithm for computing all pairs shortest paths [7]. We wrote the program as an SPMD application using the MPICH [12] implementation of the MPI [14] standard. The results were obtained for a matrix (M) size of 400x400 and n=9. Our testbed comprised of 12 nodes connected through switched Fast Ethernet. Four of the nodes were Pentium Pro 200MHz PCs with 128MB memory and 16KB/256KB L1/L2 cache. These machines are ....

W. Gropp, E. Lusk, N. Doss and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Argonne National Laboratory and Mississippi State University.


Efficient Collective Communication on Heterogeneous Networks .. - Banikazemi, Panda (1998)   (15 citations)

....the effect of communication capabilities of workstations on the latency of MPI point to point communication. We measured round trip latency between four different pairs of workstations in a heterogeneous environment. The workstations were connected via Ethernet and used MPICH communication library [11] to communicate. Table 1 shows these results. Since the results are symmetric, values are shown for only the upper triangle entries. These values indicate how processor speed affects the time taken to transmit a message from one workstation to another. The fastest workstation we used was an HP ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Low-Latency Message Passing on Workstation Clusters using.. - Moorthy, al. (1998)

....the data partition and then a buffer is allocated. 4. Implementing MPI over SCRAMNet MPI is a popular standard for writing portable parallel programs. This section describes our implementation of MPI, which is built on top of the BillBoard API. Our implementation of MPI is a derivative of MPICH [6], a publicly available implementation of MPI from Argonne National Labs. Most research implementations of MPI as well as many commercial MPI implementations are derivatives of MPICH. The architecture of MPICH is briefly described below. For details, readers are referred to [6, 11]. MPICH has a ....

....a derivative of MPICH [6], a publicly available implementation of MPI from Argonne National Labs. Most research implementations of MPI as well as many commercial MPI implementations are derivatives of MPICH. The architecture of MPICH is briefly described below. For details, readers are referred to [6, 11]. MPICH has a 4 layered architecture. The top two layers contain the bindings for the MPI functions. The lower of these two layers, the point to point binding layer sits on top of the Abstract Device Interface (ADI) layer. Though complex functions such as collective communication operations and ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Efficient Collective Communication on Heterogeneous.. - Banikazemi, Moorthy.. (1998)   (15 citations)

....the effect of communication capabilities of workstations on the latency of MPI point to point communication. We measured round trip latency between four different pairs of workstations in a heterogeneous environment. The workstations were connected via Ethernet and used MPICH communication library [12] to communicate. Table 1 shows these results. Since the results are symmetric, values are shown for only the upper triangle entries. These values indicate how processor speed affects the time taken to transmit a message from one workstation to another. The fastest workstation we used was an HP 735 ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Communication Modeling of Heterogeneous Networks.. - Banikazemi.. (1999)   (5 citations)

....conclusions and future research directions. 2 Modeling Point to point Communication For the characterization of collective communication operations on heterogenous networks, the knowledge of send and receive costs on the various nodes is imperative. In most implementations of MPI, such as MPICH [7], collective communication is implemented using a series of point to point messages. Hence, from the characterization of point to point communication on the various type of nodes in a heterogenous network, the send and receive costs of any collective communication operation can be estimated. This ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Efficient Collective Communication on Heterogeneous Networks of .. - Banikazemi (1998)   (15 citations)

....the effect of communication capabilities of workstations on the latency of MPI point to point communication. We measured round trip latency between four different pairs of workstations in a heterogeneous environment. The workstations were connected via Ethernet and used MPICH communication library [4] to communicate. Table 1 shows these results. Since the results are symmetric, values are shown for only the upper triangle entries. These values indicate how processor speed affects the time taken to transmit a message from one workstation to another. The fastest workstation we used was an HP 735 ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


All-to-All Broadcast on Switch-Based Clusters of.. - Jacunski, Sadayappan, Panda (1998)   (4 citations)

....a flit from the network) 16.0 nanoseconds. 4 All to all broadcast on irregular networks of arbitrary size For irregular networks of arbitrary size and complexity, the potential for link contention increases. The rank ordering ring (RO R) algorithm, the default implementation supplied by MPICH [6], becomes less attractive as the potential for link contention increases. Two algorithms which perform well for arbitrary irregular networks are discussed below. 4.1 Switch Ordered Ring (SO R) algorithm For an irregular network, an efficient ordering of nodes for the ring algorithm is found by ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Low Latency Message-Passing for Reflective Memory.. - Jacunski, Moorthy.. (1999)

....Protocol (BBP) In this section we present the design and implementation of the BBP. The general design goal of the BBP was to provide low latency send, receive, and multicast primitives for reflective memory networks. Throughout the design process, the higher level requirements of MPICH [7] implementation and use of the library as a platform for TreadMarks [9] such as pairwise in order delivery, were considered. Although reflective memory hardware may provide a variety of features, the only functionality assumed by the BBP is the ability to map a portion of a ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Efficient Collective Communication on Heterogeneous.. - Banikazemi, Moorthy.. (1998)   (15 citations)

....the effect of communication capabilities of workstations on the latency of MPI point to point communication. We measured round trip latency between four different pairs of workstations in a heterogeneous environment. The workstations were connected via Ethernet and used MPICH communication library [9] to communicate. Table 1 shows these results. Since the results are symmetric, values are shown for only the upper triangle entries. These values indicate how processor speed affects the time taken to transmit a message from one workstation to another. The fastest workstation we used was an HP 735 ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Low Latency Message-Passing for Reflective Memory Networks - Matt Jacunski (1999)

....Protocol (BBP) In this section we present the design and implementation of the BBP. The general design goal of the BBP was to provide low latency send, receive, and multicast primitives for reflective memory networks. Throughout the design process, the higher level requirements of MPICH [7] implementation and use of the library as a platform for TreadMarks [9] such as pairwise in order delivery, were considered. Although reflective memory hardware may provide a variety of features, the only functionality assumed by the BBP is the ability to map a portion of a process virtual ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Low Latency Message Passing on Workstation Clusters.. - Moorthy, Jacunski.. (1998)

....SCRAMNet MPI is a popular standard for writing portable parallel programs. It is well suited as a programming paradigm for distributed memory clusters. This section describes our implementation of MPI which is built on top of the BillBoard API. Our implementation of MPI is a derivative of MPICH [9], a publicly available implementation of MPI from Argonne National Labs. Most commercial MPI implementations as well as many research implementations of MPI are derivatives of MPICH. 5.1 MPICH Architecture MPICH has a layered architecture. Figure 2 shows the various layers in the MPICH stack with ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


All-to-All Broadcast on Switch-Based Clusters of.. - Jacunski, Sadayappan, Panda (1998)   (4 citations)

....these parameters are given for reference. 5 All to all broadcast on irregular networks of arbitrary size For irregular networks of arbitrary size and complexity, the potential for link contention increases. The rank ordering ring (RO R) algorithm, the default implementation supplied by MPICH [7], becomes less attractive as the potential for link contention increases. Two algorithms which perform well for arbitrary irregular networks are discussed below. 5.1 Switch Ordered Ring (SO R) algorithm For an irregular network, an efficient ordering of nodes for the ring algorithm is found by ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Impact of On-Demand Connection Management in MPI over VIA - Jiesheng Wu Jiuxing   Self-citation (Mpi)

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Jiuxing Liu, Jiesheng Wu, Sushmitha P. Kini, Darius.. - Ranjit Noronha Pete   Self-citation (Mpi)

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


High Performance Implementation of MPI Derived Datatype.. - Wu, Wyckoff, Panda (2004)   (1 citation)  Self-citation (Mpi)

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard.


High Performance Implementation of MPI Derived Datatype.. - Wu, Wyckoff, Panda (2004)   (1 citation)  Self-citation (Mpi)

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


MPI over InfiniBand: Early Experiences - Liu, Wu, Kini, Buntinas, Yu.. (2003)   (1 citation)  Self-citation (Mpi)

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Impact of On-Demand Connection Management in MPI over VIA - Wu, Liu, Wyckoff, Panda   Self-citation (Mpi)

....can initiate communication setup. The symmetry of the model also makes the implementation task easier. 3.3. Progress rule The progress rules of MPI [15] are both a promise to users and a set of constraints on implementors, though different interpretations seem to be possible. For example, MPICH [11] and MPI Pro [8] follow a loose interpretation and a strict interpretation, respectively. In incorporating the on demand connection mechanism into MPI for the VI Architecture, it is important to maintain this progress rule. The on demand connection mechanism can be incorporated into the above two ....

....We implemented the on demand connection mechanism for MVICH on top of both GigaNet cLAN VIA and Berkeley VIA on Myrinet. MVICH is a freely available port of MPICH on several VIA implementations. All modification for incorporating the on demand connection mechanism occurs in the ADI layer [11]. Unlike the original MVICH implementation with static connection management, there are no VI creation and VI connection setup in the MPI low level initialization routine, MPID_Init(). Instead, a VI is created and a peer to peer connection request is issued during the processing of the first ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


MPI-LAPI: An Efficient Implementation of MPI for.. - Banikazemi.. (2001)   (2 citations)  Self-citation (Mpi)

.... GAM (generic active message) implementation on the SP [4] and (b) the effort at University of Illinois in porting MPICH on top of the FM (fast messages) communication interface on a workstation cluster connected with the Myrinet network [11]. In both cases the public domain version of MPI (MPICH [8]) has been the starting point of these implementations. In the MPI implementation on top of AM, short messages are copied into a retransmission buffer after they are injected into the network. Lost messages are retransmitted from the retransmission buffers. The retransmission buffers are freed ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Implementing Efficient MPI on LAPI for IBM RS/6000 SP.. - Mohammad Banikazemi Rama (1999)   Self-citation (Mpi)

.... (generic active message) implementation on SP systems [2] and (b) the effort at University of Illinois in porting MPICH on top of the FM (fast messages) communication interface on a workstation cluster connected with the Myrinet network [5]. In both cases the public domain version of MPI (MPICH [3]) has been the starting point of these implementations. In the MPI implementation on top of AM, short messages are copied into a retransmission buffer after they are injected into the network. Lost messages are retransmitted from the retransmission buffers. The retransmission buffers are freed ....

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


High Performance RDMA Based All-to-all Broadcast - For Infiniband Clusters

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University, 1996.


Supporting Efficient Noncontiguous Access in PVFS over.. - Jiesheng Wu Pete (2003)

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Supporting Efficient Noncontiguous Access in PVFS over.. - Wu, Wyckoff, Panda (2003)

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


Supporting Efficient Noncontiguous Access in PVFS over.. - Wu, Wyckoff, Panda (2003)

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Technical report, Argonne National Laboratory and Mississippi State University.


A QoS Framework for Clusters to Support Applications with - Resource Adaptivity And

No context found.

W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI, Message Passing Interface Standard. Tech. Report, Argonne National Laboratory and Mississippi State University.

CiteSeer.IST - Copyright Penn State and NEC