Jonathan Hardwick. Porting a vector library: A comparison of MPI, Paris, CMMD and PVM. Technical Report CMU-CS-94-200, School of Computer Science, Carnegie Mellon University, November 1994.
....and a subtle performance issue in multiparty communications. 1 Introduction PVM [10] and MPI [8] are both specifications for message-passing libraries that can be used for writing portable parallel programs. It is natural to compare them, and many useful comparisons have been carried out [17,16,11,20,15]. We consider it worthwhile to do so again for two reasons. The most obvious is that some convergence has recently taken place in the functionality offered by the two systems (e.g. dynamic processes in MPI, static groups and message contexts in PVM) and the very different approaches taken in ....
J. C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In IEEE, editor, Proceedings of the 1994.
....for Windows [27] contain chapters on using both MPI and PVM. Since there are freely available versions of each, users have a choice, and beginning users in particular can be confused by their superficial similarities. Several comparisons of PVM and MPI have been carried out since the mid-1990s [18, 17, 12, 23, 16]. We consider it worthwhile to do so again for two reasons. We treat the Oak Ridge version of PVM as represented by [5, 11] as the PVM specification. MPI is represented by the MPI-2 specification. The most obvious is that some convergence has recently taken place in the functionality offered by ....
J. C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In IEEE, editor, Proceedings of the 1994.
....could generate efficient code. As the algorithms are discussed, I will point out the limitations of some of their implementations by comparing the performance obtained to that of a library offering similar functionality: the CVL library for MPI, implemented by Jonathan Hardwick at CMU [5]. The compilers used were xlC on the SP-2, and the GNU compiler (gcc, version 2.6.0) on the Paragon. Full optimization was enabled for these tests. Sophisticated template usage is currently beyond the capabilities of many C++ compilers, but the IBM compiler had no problem with any of the ....
J. C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. Technical Report CMU-CS-94-200, Carnegie Mellon University, 1994.
.... operate on sequences of atomic values (scalars are implemented as sequences of length 1). A Vcode interpreter has been implemented for running Vcode on the Cray C90 and J90, the Connection Machine CM-5, or any serial machine with a C compiler [8]. We also have an MPI [20] version of VCODE [25], which will run on machines that support MPI, such as the IBM SP-2, the Intel Paragon, or clusters of workstations. The sequence functions in this interpreter have been highly optimized [7, 17] and, for large sequences, the interpretive overhead becomes relatively small, yielding high ....
Jonathan C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In Scalable Parallel Libraries Conference, pages 68--77, Starkville, Mississippi, October 1994. A longer version appears as CMU-CS-94-200, School of Computer Science, Carnegie Mellon University.
....communication and computation; • Choice of send semantics with respect to blocking and buffering; this makes it easy to get a quick and dirty version running, which can then be refined to increase performance. Complete performance results are given in (Hardwick 1994); MPI CVL comes close to the performance of a machine-specific CVL implementation on the CM-5, the only platform on which the developers can currently perform a comparison. Furthermore, the developer noted that better support in MPI for fine-grained communication (e.g. put/get, active messages, or ....
Hardwick, J. C. (1994, November). Porting a vector library: a comparison of MPI, Paris, CMMD and PVM.
No context found.
Jonathan Hardwick. Porting a vector library: A comparison of MPI, Paris, CMMD and PVM. Technical Report CMU-CS-94-200, School of Computer Science, Carnegie Mellon University, November 1994.
.... assumes an implicitly load-balanced vector PRAM model [27]. This can be efficiently implemented on parallel machines with very high memory and communication bandwidth, but achieves relatively poor performance on current RISC-based multiprocessor architectures, due to the high cost of communication [28]. The final Delaunay triangulation algorithm was therefore reimplemented for production purposes using the Machiavelli [22] toolkit, which has been specifically designed for the efficient implementation of parallel divide-and-conquer algorithms on machines with limited communication bandwidth. ....
J. C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In Proceedings of the
....row format by a dense vector. It uses a nested data parallel algorithm. We give the source code and test data for the benchmarks in Appendix A. All three benchmarks have asymptotic running times that are linear in the size of the problem. Timings for the benchmarks have previously been reported in [7, 12]. 5.1 Methodology To minimize performance effects due to machine architecture, we used two different machines for benchmarking: a Sun SPARCstation 5 85 running Solaris 2.5, and a DX4 120 PC running Windows 95. Compilation was done using Nesl 3.1, gcc v2.7.0, and JDK 1.1.1, with full optimization. ....
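The benchmark source itself is in the citing paper's Appendix A and is not reproduced here. The following is a minimal Python sketch, under that limitation, of what "a sparse matrix stored in compressed row format multiplied by a dense vector" means. It shows both the flat CSR arrays and the nested view, where each row is a sequence of (column, value) pairs of varying length, which is the shape a nested data-parallel algorithm operates on. The function names are hypothetical.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """y = A x for a sparse matrix A stored in compressed sparse row (CSR) form.

    The nonzeros of row i are values[row_ptr[i]:row_ptr[i + 1]], and
    col_idx holds the column index of each nonzero.
    """
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def nested_matvec(rows, x):
    """The same product with each row given as a sequence of (column, value) pairs.

    This is the nested view: an outer map over the rows and an inner sum over
    each row's nonzeros, where different rows may have different lengths.
    """
    return [sum(v * x[j] for j, v in row) for row in rows]
```

For the 3x3 matrix with rows [1 0 2], [0 3 0], [4 0 5] and x = [1, 1, 1], the CSR form is row_ptr = [0, 2, 3, 5], col_idx = [0, 2, 1, 0, 2], values = [1, 2, 3, 4, 5]. Both functions then return [3, 3, 9]. The running time is linear in the number of nonzeros plus rows, matching the linear asymptotic behavior the text mentions.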
Jonathan C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In Proceedings of the 1994 Scalable Parallel Libraries Conference, pages 68--77, October 1994.
....in a language such as C, it was easy to implement the operations in CVL in terms of the native data-parallel primitives provided by each machine. However, it is difficult to write an efficient implementation of CVL (and hence NESL) for the current generation of distributed-memory multiprocessors [17]. There are at least three reasons for this. First, to enforce the guarantees of complexity that apply to each NESL primitive, CVL must perform a load-balancing step whenever a vector changes length. In the quicksort example, the machine representations of the three intermediate sequences S1, S2 ....
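CVL's actual redistribution code is not shown in this context. As a hypothetical, single-process illustration, the Python sketch below models what a load-balancing step has to compute when a block-distributed vector changes length, for example after quicksort splits a sequence into its less-than, equal, and greater-than parts. It finds the target block sizes, regroups the elements, and counts how many elements would have to cross processor boundaries. Each processor's piece is modeled as a Python list. The names block_sizes, rebalance, and moved_elements are assumptions, not CVL functions.

```python
def block_sizes(n, p):
    """Sizes of a block distribution of n elements over p processors.

    The first n % p processors get one extra element, so sizes differ by at most one.
    """
    base, extra = divmod(n, p)
    return [base + 1 if i < extra else base for i in range(p)]


def rebalance(parts):
    """Regroup per-processor pieces so every processor holds an equal-sized block.

    Global element order is preserved; only the ownership boundaries move.
    """
    flat = [v for part in parts for v in part]
    out, start = [], 0
    for size in block_sizes(len(flat), len(parts)):
        out.append(flat[start:start + size])
        start += size
    return out


def moved_elements(parts):
    """Number of elements whose owning processor changes during rebalance()."""
    old_owner = [i for i, part in enumerate(parts) for _ in part]
    new_owner = [j for j, part in enumerate(rebalance(parts)) for _ in part]
    return sum(o != n for o, n in zip(old_owner, new_owner))
```

For example, rebalance([[1, 2, 3], [], [4]]) gives [[1, 2], [3], [4]], and moved_elements reports that one element (the 3) changes owner. On a distributed-memory machine every such element is a message. That is why doing this after every length change is costly when communication is expensive.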
....returns all (x, y) coordinates in the sequence points that are to one side of the line formed by the points p1 and p2. For further details, see [7]. All experiments were performed on an Intel Paragon running OSF R1.2, using icc -O3 and MPICH 1.0.11. The NESL system used an MPI-based version of CVL [17]. For quicksort, sequences of pseudo-random 64-bit floating-point numbers were used as input. For quickhull, uniform distributions of points inside the unit square were projected onto (x, x^2 + y^2), resulting in convex hulls of approximately √n points for an input sequence of size n. For ....
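The citing paper's NESL code is not reproduced in this context. As an assumption-laden illustration, this Python sketch implements the step the text describes, keeping the points on one side of the line through p1 and p2, with a cross-product sign test. It then wraps that step in a sequential quickhull to show where the divide-and-conquer recursion comes from. All function names are hypothetical, and points are assumed to be (x, y) tuples.

```python
def cross(p1, p2, q):
    """Twice the signed area of triangle (p1, p2, q); positive if q is left of p1->p2."""
    return (p2[0] - p1[0]) * (q[1] - p1[1]) - (p2[1] - p1[1]) * (q[0] - p1[0])


def points_left_of(points, p1, p2):
    """Keep the points strictly to the left of the directed line p1 -> p2."""
    return [q for q in points if cross(p1, p2, q) > 0]


def hull_side(points, p1, p2):
    """Hull vertices strictly left of p1 -> p2, in order from p1 toward p2."""
    pts = points_left_of(points, p1, p2)
    if not pts:
        return []
    far = max(pts, key=lambda q: cross(p1, p2, q))  # farthest point from the line
    # The two recursive calls are independent, which is the nested parallelism.
    return hull_side(pts, p1, far) + [far] + hull_side(pts, far, p2)


def quickhull(points):
    """Convex hull vertices of a list of (x, y) tuples; collinear edge points are dropped."""
    pts = list(set(points))
    if len(pts) < 3:
        return pts
    a, b = min(pts), max(pts)  # lexicographic extremes are always hull vertices
    return [a] + hull_side(pts, a, b) + [b] + hull_side(pts, b, a)
```

For instance, points_left_of([(0.5, 1), (0.5, -1), (2, 0)], (0, 0), (1, 0)) keeps only (0.5, 1). For the four corners of the unit square plus its center, quickhull returns just the four corners, in clockwise order starting at (0, 0).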
J. C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In Proceedings of Scalable Parallel Libraries Conference, pages 68--77, Starkville, Mississippi, Oct. 1994.
....library that implements an abstract vector machine [6]. An example of a Cvl function is add_wuz, which adds the corresponding elements of two integer vectors together and returns the results in a third vector. Cvl is the only part of the system that must be rewritten for a new architecture [11]. 3 Implementing Vcode in Java To use Java as an intermediate language in an existing compiler, the current intermediate language (assuming that one exists) can either be totally replaced by Java or be translated into Java by an additional stage of the compilation process. The first ....
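Cvl's real add_wuz is a C routine whose exact argument list is not given in this context. The Python sketch below only mirrors the behavior described above: elementwise addition of two integer vectors, with the result written into a caller-supplied third vector. The name add_wuz_sketch and its (dst, a, b) parameter order are assumptions.

```python
from array import array


def add_wuz_sketch(dst, a, b):
    """Hypothetical stand-in for Cvl's add_wuz: dst[i] = a[i] + b[i] for every i.

    Follows the described convention of writing into an existing third vector
    rather than allocating a new one.
    """
    if not (len(dst) == len(a) == len(b)):
        raise ValueError("vectors must have equal length")
    for i in range(len(a)):
        dst[i] = a[i] + b[i]


a = array("i", [1, 2, 3])
b = array("i", [10, 20, 30])
dst = array("i", [0, 0, 0])
add_wuz_sketch(dst, a, b)
```

Here dst ends up holding 11, 22, 33. Because every element is computed independently, a machine-specific port is free to split the loop across processors or vector lanes. That kind of per-architecture freedom is what makes Cvl the layer that gets rewritten for each new machine.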
....multiplication. This function multiplies a sparse matrix stored in compressed row format by a dense vector, using a nested data parallel algorithm. We give the source code and test data for the benchmarks in Appendix A. Timings for supercomputer platforms have previously been reported [7, 11]. All three benchmarks have asymptotic running times that are linear in the size of the problem. 5.1 Methodology To try to expose any performance effects that could be due to machine architecture rather than to the code being tested, we used two different machines for benchmarking: a Sun ....
Jonathan C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In Proceedings of the 1994 Scalable Parallel Libraries Conference, pages 68--77, October 1994.
.... implementation layer assumes a vector PRAM model [6]. This can be efficiently implemented on vector processors with high memory bandwidth, but it is harder to do so on current RISC-based NUMA multiprocessor architectures, due to the higher relative costs of communication and poor data locality [21]. Machiavelli [24] is a new parallel toolkit for divide-and-conquer algorithms that is intended to alleviate some of these problems. It is designed to be usable both as an implementation layer for languages such as Nesl, and as a programmer's toolkit for the direct implementation of efficient ....
Jonathan C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In Proceedings of the 1994 Scalable Parallel Libraries Conference, pages 68--77, October 1994.
....read back into Nesl only the type is returned. You can use file variables in expressions just like any other variables. Here is an example:
<Nesl> a = index(10000)
a : [int]
<Nesl> sum(a)
it = 49995000 : int
<Nesl> function foo(n) = take(a,n)
foo = fn : int -> [int]
<Nesl> foo(10)
it = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] : [int]
File variables are stored in the temp dir specified by the configuration (/tmp by default). This means that if you switch to a configuration with a different temp dir, the Vcode interpreter won't be able to find the variable and will give a runtime error message. ....
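The value printed for sum(a) can be checked with the arithmetic-series formula. This assumes index(10000) yields 0, 1, ..., 9999, which is consistent with foo(10) returning the first ten elements starting at 0:

```latex
\sum_{i=0}^{9999} i \;=\; \frac{9999 \cdot 10000}{2} \;=\; 49\,995\,000
```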
....Monitoring execution: set arg_check {on,off}; This turns argument checking on or off. Argument checking is on by default and includes bounds checking, divide-by-zero checking, and range checking. Runtime errors detected by argument checking print a message of the form:
<Nesl> let a = [2,3,4] in a[5];
Compiling. Writing. Loading. Running.
RUNTIME ERROR: Sequence reference (a[i]) out of bounds.
Exiting. Reading.
Argument checking takes time, so it can be turned off to generate faster code. set trace funname n; This sets the tracing level for any non-primitive function. Tracing is used for ....
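To make the trade-off behind argument checking concrete, here is a hypothetical Python sketch of a sequence reference with a switchable bounds check. It does not reproduce the NESL runtime. The function seq_ref and its arg_check parameter are illustrative assumptions that only echo the "set arg_check" setting described above.

```python
def seq_ref(seq, i, arg_check=True):
    """Sequence reference seq[i] with optional bounds checking.

    With arg_check on, every access pays for a range comparison, and an
    out-of-range index raises an error naming the failed check. With it off,
    the comparison is skipped and the raw access proceeds unguarded.
    """
    if arg_check and not (0 <= i < len(seq)):
        raise IndexError(
            f"Sequence reference (a[i]) out of bounds: i={i}, length={len(seq)}"
        )
    return seq[i]
```

With the check on, seq_ref([2, 3, 4], 5) raises the error, mirroring the NESL example. With the check off, seq_ref([2, 3, 4], -1, arg_check=False) silently returns 4 because Python lists wrap negative indices. In an unchecked C implementation, a bad index would not be caught at all. This is why turning checking off is faster but only safe for code already known to be correct.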
Jonathan Hardwick. Porting a vector library: A comparison of MPI, Paris, CMMD and PVM. Technical Report CMU-CS-94-200, School of Computer Science, Carnegie Mellon University, November 1994.
.... implementation layer assumes a vector PRAM model [6]. This can be efficiently implemented on vector processors with high memory bandwidth, but it is harder to do so on current RISC-based NUMA multiprocessor architectures, due to the higher relative costs of communication and poor data locality [21]. Machiavelli [24] is a new parallel toolkit for divide-and-conquer algorithms that is intended to alleviate some of these problems. It is designed to be usable both as an implementation layer for languages such as Nesl, and as a programmer's toolkit for the direct implementation of efficient ....
Jonathan C. Hardwick. Porting a vector library: a comparison of MPI, Paris, CMMD and PVM. In Proceedings of the 1994 Scalable Parallel Libraries Conference, pages 68--77, October 1994.
CiteSeer.IST - Copyright Penn State and NEC