Results 1 - 10
of
123
A high-performance, portable implementation of the MPI message passing interface standard
- Parallel Computing
, 1996
"... MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we d ..."
Abstract
-
Cited by 651 (37 self)
- Add to MetaCart
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum. 1
The LAM/MPI checkpoint/restart framework: System-initiated checkpointing
- in Proceedings, LACSI Symposium, Sante Fe
, 2003
"... As high-performance clusters continue to grow in size and popularity, issues of fault tolerance and reliability are becoming limiting factors on application scalability. To address these issues, we present the design and implementation of a system for providing coordinated checkpointing and rollback ..."
Abstract
-
Cited by 67 (7 self)
- Add to MetaCart
As high-performance clusters continue to grow in size and popularity, issues of fault tolerance and reliability are becoming limiting factors on application scalability. To address these issues, we present the design and implementation of a system for providing coordinated checkpointing and rollback recovery for MPI-based parallel applications. Our approach integrates the Berkeley Lab BLCR kernellevel process checkpoint system with the LAM implementation of MPI through a defined checkpoint/restart interface. Checkpointing is transparent to the application, allowing the system to be used for cluster maintenance and scheduling reasons as well as for fault tolerance. Experimental results show negligible communication performance impact due to the incorporation of the checkpoint support capabilities into LAM/MPI. 1
A Component Architecture for LAM/MPI
- In Proceedings, 10th European PVM/MPI Users’ Group Meeting, number 2840 in Lecture Notes in Computer Science
, 2003
"... Abstract. To better manage the ever increasing complexity of LAM/MPI, we have created a lightweight component architecture for it that is specifically designed for high-performance message passing. This paper describes the basic design of the component architecture, as well as some of the particular ..."
Abstract
-
Cited by 63 (11 self)
- Add to MetaCart
Abstract. To better manage the ever increasing complexity of LAM/MPI, we have created a lightweight component architecture for it that is specifically designed for high-performance message passing. This paper describes the basic design of the component architecture, as well as some of the particular component instances that constitute the latest release of LAM/MPI. Performance comparisons against the previous, monolithic, version of LAM/MPI show no performance impact due to the new architecture—in fact, the newest version is slightly faster. The modular and extensible nature of this implementation is intended to make it significantly easier to add new functionality and to conduct new research using LAM/MPI as a development platform. 1
Adaptive MPI
- In Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 03
, 2003
"... Processor virtualization is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. Charm++ is an early language/system that supports processor virtualization. ..."
Abstract
-
Cited by 51 (11 self)
- Add to MetaCart
Processor virtualization is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. Charm++ is an early language/system that supports processor virtualization.
Towards Portable Message Passing in Java: Binding MPI
- In Recent Advances in PVM and MPI, number 1332 in Lecture Notes in Computer Science
, 1997
"... . In this paper we present a way of successfully tackling the difficulties of binding MPI to Java with a view to ensuring portability. We have created a tool for automatically binding existing native C libraries to Java, and have applied the Java--to--C Interface generating tool (JCI) to bind MPI to ..."
Abstract
-
Cited by 34 (9 self)
- Add to MetaCart
. In this paper we present a way of successfully tackling the difficulties of binding MPI to Java with a view to ensuring portability. We have created a tool for automatically binding existing native C libraries to Java, and have applied the Java--to--C Interface generating tool (JCI) to bind MPI to Java. The approach of automatic binding by JCI ensures both portability across different platforms and full compatibility with the MPI specification. To evaluate the resulting combination we have run a Java version of the NAS parallel IS benchmark on a distributed--memory IBM SP2 machine. 1 Introduction It is generally accepted that computers based on the emerging hybrid shared/distributed-memory parallel architectures will become the fastest and most cost-effective supercomputers over the next decade. This, however, makes the search for the most appropriate programming model even more important than it has been so far. Users need a flexible yet comprehensive interface which covers both th...
Design and Evaluation of Nemesis, a Scalable, Low-Latency, Message-Passing Communication Subsystem
- Proceedings of the International Symposium on Cluster Computing and the Grid
, 2006
"... This paper presents a new low-level communication subsystem called Nemesis. Nemesis has been designed and implemented to be scalable and efficient both in the intranode communication context using shared-memory and in the internode communication case using high-performance networks and is natively m ..."
Abstract
-
Cited by 29 (3 self)
- Add to MetaCart
This paper presents a new low-level communication subsystem called Nemesis. Nemesis has been designed and implemented to be scalable and efficient both in the intranode communication context using shared-memory and in the internode communication case using high-performance networks and is natively multimethod-enabled. Nemesis has been integrated in MPICH2 as a CH3 channel and delivers better performance than other dedicated communication channels in MPICH2. Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks. 1
Efficient Management of Parallelism in Object-Oriented Numerical Software Libraries
- Modern Software Tools in Scientific Computing
, 1997
"... Parallel numerical software based on the message-passing model is enormously complicated. This paper introduces a set of techniques to manage the complexity, while maintaining high efficiency and ease of use. The PETSc 2.0 package uses object-oriented programming to conceal the details of the messag ..."
Abstract
-
Cited by 29 (0 self)
- Add to MetaCart
Parallel numerical software based on the message-passing model is enormously complicated. This paper introduces a set of techniques to manage the complexity, while maintaining high efficiency and ease of use. The PETSc 2.0 package uses object-oriented programming to conceal the details of the message passing, without concealing the parallelism, in a high-quality set of numerical software libraries. In fact, the programming model used by PETSc is also the most appropriate for NUMA shared-memory machines, since they require the same careful attention to memory hierarchies as do distributed-memory machines. Thus, the concepts discussed are appropriate for all scalable computing systems. The PETSc libraries provide many of the data structures and numerical kernels required for the scalable solution of PDEs, offering performance portability. 1 Introduction Currently the only general-purpose, efficient, scalable approach to programming distributed-memory parallel systems is the message-pass...
MPI-2: Extending the Message-Passing Interface
, 1996
"... This paper describes current activities of the MPI-2 Forum. The MPI-2 Forum is a group of parallel computer vendors, library writers, and application specialists working together to define a set of extensions to MPI (Message Passing Interface). MPI was defined by the same process and now has many im ..."
Abstract
-
Cited by 28 (15 self)
- Add to MetaCart
This paper describes current activities of the MPI-2 Forum. The MPI-2 Forum is a group of parallel computer vendors, library writers, and application specialists working together to define a set of extensions to MPI (Message Passing Interface). MPI was defined by the same process and now has many implementations, both vendor-proprietary and publicly available, for a wide variety of parallel computing environments. In this paper we present the salient aspects of the evolving MPI-2 document as it now stands. We discuss proposed extensions and enhancements to MPI in the areas of dynamic process management, one-sided operations, collective operations, new language binding, real-time computing, external interfaces, and miscellaneous topics. 1 Introduction During 1993 and 1994, a group of parallel computer vendors, library writers, and application scientists met regularly to define a standard interface for message-passing libraries. The result of this effort was MPI (Message-Passing Interfa...
Encapsulating Multiple Communication-Cost Metrics in Partitioning Sparse Rectangular Matrices for Parallel Matrix-Vector Multiplies
"... This paper addresses the problem of one-dimensional partitioning of structurally unsymmetricsquare and rectangular sparse matrices for parallel matrix-vector and matrix-transposevector multiplies. The objective is to minimize the communication cost while maintaining the balance on computational load ..."
Abstract
-
Cited by 26 (18 self)
- Add to MetaCart
This paper addresses the problem of one-dimensional partitioning of structurally unsymmetricsquare and rectangular sparse matrices for parallel matrix-vector and matrix-transposevector multiplies. The objective is to minimize the communication cost while maintaining the balance on computational loads of processors. Most of the existing partitioning models consider only the total message volume hoping that minimizing this communication-cost metric is likely to reduce other metrics. However, the total message latency (start-up time) may be more important than the total message volume. Furthermore, the maximum message volume and latency handled by a single processor are also important metrics. We propose a two-phase approach that encapsulates all these four communication-cost metrics. The objective in the first phase is to minimize the total message volume while maintainingthe computational-load balance. The objective in the second phase is to encapsulate the remaining three communication-cost metrics. We propose communicationhypergraph and partitioning models for the second phase. We then present several methods for partitioning communication hypergraphs. Experiments on a wide range of test matrices show that the proposed approach yields very effective partitioning results. A parallel implementation on a PC cluster verifies that the theoretical improvements shown by partitioning results hold in practice.

