The Multicomputer Toolbox: Scalable Parallel Libraries for Large-Scale Concurrent Applications (1991)

by Anthony Skjellum, Chuck H. Baldwin

Results 1 - 10 of 20

Scalability Issues Affecting the Design of a Dense Linear Algebra Library

by Jack J. Dongarra, Robert A. van de Geijn, David W. Walker - Journal of Parallel and Distributed Computing, 1994
"... This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distributed memory concurrent computers. These routines form part of the ScaLAPACK mathematical software library that extends the widely-used LAPACK library to run efficiently on scalable concurrent computers ..."
Abstract - Cited by 29 (15 self) - Add to MetaCart
This paper discusses the scalability of Cholesky, LU, and QR factorization routines on MIMD distributed-memory concurrent computers. These routines form part of the ScaLAPACK mathematical software library that extends the widely used LAPACK library to run efficiently on scalable concurrent computers. To ensure good scalability and performance, the ScaLAPACK routines are based on block-partitioned algorithms that reduce the frequency of data movement between different levels of the memory hierarchy, and particularly between processors. The block-cyclic data distribution, which is used in all three factorization algorithms, is described. An outline of the sequential and parallel block-partitioned algorithms is given. Approximate models of the algorithms' performance are presented to indicate which factors in the design of the algorithms have an impact upon scalability. These models are compared with timing results on a 128-node Intel iPSC/860 hypercube. It is shown that the routines are highl...
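
To make the block-cyclic distribution concrete, here is a minimal sketch in C of the index map it induces; the function name and the 1-D view are our assumptions (the 2-D block-cyclic distribution applies this map independently in each grid dimension):

#include <stdio.h>

/* Map a global index g to (owning process, local index) under a 1-D
 * block-cyclic distribution with block size nb over P processes.
 * block_cyclic_map is a hypothetical helper, not a ScaLAPACK routine. */
static void block_cyclic_map(int g, int nb, int P, int *proc, int *loc)
{
    int block = g / nb;       /* which global block g falls in        */
    *proc = block % P;        /* blocks are dealt out cyclically      */
    *loc  = (block / P) * nb  /* full local blocks preceding this one */
          + g % nb;           /* offset within the block              */
}

int main(void)
{
    int proc, loc;
    for (int g = 0; g < 12; ++g) {
        block_cyclic_map(g, 2, 3, &proc, &loc);  /* nb = 2, P = 3 */
        printf("global %2d -> process %d, local %2d\n", g, proc, loc);
    }
    return 0;
}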

The Multicomputer Toolbox Approach to Concurrent BLAS

by Robert D. Falgout, Anthony Skjellum, Steven G. Smith, Charles H. Still - Proc. Scalable High Performance Computing Conf. (SHPCC), 1993
"... Concurrent Basic Linear Algebra Subprograms (CBLAS) are a sensible approach to extending the successful Basic Linear Algebra Subprograms (BLAS) to multicomputers. We describe many of the issues involved in general-purpose CBLAS. Algorithms for dense matrix-vector and matrix-matrix multiplication on ..."
Abstract - Cited by 29 (8 self) - Add to MetaCart
Concurrent Basic Linear Algebra Subprograms (CBLAS) are a sensible approach to extending the successful Basic Linear Algebra Subprograms (BLAS) to multicomputers. We describe many of the issues involved in general-purpose CBLAS. Algorithms for dense matrix-vector and matrix-matrix multiplication on general P × Q logical process grids are presented, and experiments are described that demonstrate their performance characteristics. This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy. Work performed under the auspices of the U.S. Department of Energy by the Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48. Submitted to Concurrency: Practice & Experience. † Address correspondence to: Mississippi State University, Engineering Research Center, PO Box 6176, Mississippi State, MS 39762. 601-325-8435. tony@cs.msstate.edu.
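
As a rough illustration of the P × Q grid algorithms discussed here, the following C sketch computes y = A·x with partial products summed across each grid row. It is written against MPI for concreteness (the paper itself used Zipcode), and the function name, rank ordering, and blocking scheme are our assumptions, not the Toolbox CBLAS interface:

#include <mpi.h>
#include <stdlib.h>

/* y = A*x on a P x Q logical process grid.  Process (p,q) owns an
 * m_loc x n_loc block of A and the matching n_loc piece of x; the
 * m_loc piece of y is formed by summing partial products over q.
 * Assumes ranks in grid_comm are laid out row-major. */
void grid_matvec(const double *A_loc, const double *x_loc, double *y_loc,
                 int m_loc, int n_loc, MPI_Comm grid_comm, int Q)
{
    int rank;
    MPI_Comm_rank(grid_comm, &rank);
    int p = rank / Q;                      /* this process's grid row */

    /* local partial product: y_part = A_loc * x_loc */
    double *y_part = calloc(m_loc, sizeof(double));
    for (int i = 0; i < m_loc; ++i)
        for (int j = 0; j < n_loc; ++j)
            y_part[i] += A_loc[i * n_loc + j] * x_loc[j];

    /* sum the partial products across the grid row (fixed p, all q) */
    MPI_Comm row_comm;
    MPI_Comm_split(grid_comm, p, rank, &row_comm);
    MPI_Allreduce(y_part, y_loc, m_loc, MPI_DOUBLE, MPI_SUM, row_comm);

    MPI_Comm_free(&row_comm);
    free(y_part);
}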

The Design and Evolution of Zipcode

by Anthony Skjellum, Steven G. Smith, Nathan E. Doss, Alvin P. Leung, Manfred Morari - Parallel Computing, 1994
"... Zipcode is a message-passing and process-management system that was designed for multicomputers and homogeneous networks of computers in order to support libraries and large-scale multicomputer software. The system has evolved significantly over the last five years, based on our experiences and iden ..."
Abstract - Cited by 22 (9 self) - Add to MetaCart
Zipcode is a message-passing and process-management system that was designed for multicomputers and homogeneous networks of computers in order to support libraries and large-scale multicomputer software. The system has evolved significantly over the last five years, based on our experiences and identified needs. Features originally unique to Zipcode were its simultaneous support of static process groups, communication contexts, and virtual topologies, which together form the "mailer" data structure. Point-to-point and collective operations reference the underlying group, and use contexts to avoid mixing up messages. Recently, we have added "gather-send" and "receive-scatter" semantics, based on persistent Zipcode "invoices," both as a means to simplify message passing and as a means to reveal more potential runtime optimizations. Key features of Zipcode appear in the forthcoming MPI standard. Keywords: Static Process Groups, Contexts, Virtual Topologies, Point-to-Point Communica...
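
Since the abstract notes that Zipcode's key features reappear in MPI, the correspondence can be sketched with real MPI calls; the mapping to the "mailer" is our gloss, not code from the paper:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Zipcode's mailer = process group + context (+ optional topology).
     * In MPI, duplicating a communicator gives a library its own private
     * context, so its messages cannot mix with application traffic. */
    MPI_Comm lib_comm;
    MPI_Comm_dup(MPI_COMM_WORLD, &lib_comm);

    /* A 2-D virtual topology (logical process grid), as in grid mailers. */
    int nprocs, dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Comm_size(lib_comm, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);
    MPI_Comm grid_comm;
    MPI_Cart_create(lib_comm, 2, dims, periods, 0, &grid_comm);

    MPI_Comm_free(&grid_comm);
    MPI_Comm_free(&lib_comm);
    MPI_Finalize();
    return 0;
}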

Citation Context

...er systems, are those that support parallel libraries. Zipcode is currently the effective basis for a collection of parallel libraries, called the "Multicomputer Toolbox," which we describe elsewhere [13, 26, 27, 30]. Its development also commenced in 1988, and its requirements largely drove the evolution of Zipcode. Because key features in Zipcode that are needed to support libraries will also appear in the new ...

Broadway: A Software Architecture for Scientific Computing

by Samuel Z. Guyer, Calvin Lin - The Architecture of Scientific Software, 2000
"... Scientific programs rely heavily on software libraries. This paper describes the limitations of this reliance and shows how it degrades software quality. We offer a solution that uses a compiler to automatically optimize library implementations and the application programs that use them. Using exa ..."
Abstract - Cited by 20 (2 self) - Add to MetaCart
Scientific programs rely heavily on software libraries. This paper describes the limitations of this reliance and shows how it degrades software quality. We offer a solution that uses a compiler to automatically optimize library implementations and the application programs that use them. Using examples and experiments with the PLAPACK parallel linear algebra library and the MPI message passing interface, we present our solution, which includes a simple declarative annotation language that describes certain aspects of a library's implementation. We also show how our approach can yield simpler scientific programs that are easier to understand, modify and maintain.

A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies

by Jin Li, Anthony Skjellum, Robert D. Falgout, 1995
"... In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on two-dimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algori ..."
Abstract - Cited by 14 (1 self) - Add to MetaCart
In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on two-dimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algorithms into three categories according to the communication primitives used, and thus offer a taxonomy for this family of related algorithms. All of these algorithms follow the data-distribution-independent approach and thus do not require a specific data distribution for correctness. The algorithmic compatibility result shown here ensures the correctness of the matrix multiplication. We define and extend the data distribution functions and introduce permutation compatibility and algorithmic compatibility. We also discuss a permutation-compatible data distribution (modified virtual 2D data distribution). We conclude that no single algorithm always achieves the best performance...
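
One representative member of this family is a broadcast-broadcast variant; here is a hedged C sketch for the special case of a square P × P grid with one nb × nb block per process. The names, the fixed block distribution, and the square-grid restriction are our simplifications, not the paper's data-distribution-independent formulation:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* C = alpha*A*B + beta*C: in step t, grid column t broadcasts its A block
 * along each grid row and grid row t broadcasts its B block down each
 * grid column; every process then accumulates one rank-nb contribution.
 * row_comm/col_comm are assumed built so ranks equal grid coordinates. */
void bcast_bcast_matmul(const double *A_loc, const double *B_loc,
                        double *C_loc, int nb, double alpha, double beta,
                        int P, MPI_Comm row_comm, MPI_Comm col_comm)
{
    int my_q, my_p;
    MPI_Comm_rank(row_comm, &my_q);   /* position along the grid row    */
    MPI_Comm_rank(col_comm, &my_p);   /* position along the grid column */

    double *Abuf = malloc(nb * nb * sizeof(double));
    double *Bbuf = malloc(nb * nb * sizeof(double));

    for (int i = 0; i < nb * nb; ++i)            /* pre-scale C by beta */
        C_loc[i] *= beta;

    for (int t = 0; t < P; ++t) {
        if (my_q == t) memcpy(Abuf, A_loc, nb * nb * sizeof(double));
        MPI_Bcast(Abuf, nb * nb, MPI_DOUBLE, t, row_comm);

        if (my_p == t) memcpy(Bbuf, B_loc, nb * nb * sizeof(double));
        MPI_Bcast(Bbuf, nb * nb, MPI_DOUBLE, t, col_comm);

        for (int i = 0; i < nb; ++i)        /* C_loc += alpha*Abuf*Bbuf */
            for (int k = 0; k < nb; ++k)
                for (int j = 0; j < nb; ++j)
                    C_loc[i * nb + j] +=
                        alpha * Abuf[i * nb + k] * Bbuf[k * nb + j];
    }
    free(Abuf);
    free(Bbuf);
}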

Citation Context

...eeks to avoid such redistributions without further justification. 1.2 Design Methodology The design of parallel dense matrix multiplication algorithms is based on the Multicomputer Toolbox approach [21]. Two key ideas underlying scalable programming are logical process grids and data distribution independence [26, 27]. A logical process grid, denoted here by G_{P×Q}, is a collection of processe...

The Data-Distribution-Independent Approach to Scalable Parallel Libraries

by Purushotham V. Bangalore, 1995
"... ..."
Abstract - Cited by 13 (1 self) - Add to MetaCart
Abstract not found
(Show Context)

Citation Context

...al for cost savings (of time and memory) by avoiding explicit redistribution (at both the entry and exit interfaces of a library) because of these available optimizations. The Multicomputer Toolbox (Skjellum and Baldwin 1991; Skjellum et al. 1994) is a collection of scalable parallel libraries that provide data-distribution-independent programming for a large number of numerical algorithms, while providing needed portabi...

The Multicomputer Toolbox: First-Generation Scalable Libraries

by Anthony Skjellum, Alvin P. Leung, Steven G. Smith, Robert D. Falgout, Charles H. Still, Chuck H. Baldwin - In Proceedings of HICSS-27. IEEE Computer, 1994
"... 1 \First-generation " scalable parallel libraries have been achieved, and are maturing, within the Mul-ticomputer Toolbox. The Toolbox includes sparse, dense, iterative linear algebra, a sti ODE/DAE solver, and an open software technology for additional numerical algorithms, plus an inter-arch ..."
Abstract - Cited by 11 (8 self) - Add to MetaCart
"First-generation" scalable parallel libraries have been achieved, and are maturing, within the Multicomputer Toolbox. The Toolbox includes sparse, dense, and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms, plus an inter-architecture Makefile mechanism for building applications. We have devised C-based strategies for useful classes of distributed data structures, including distributed matrices and vectors. The underlying Zipcode message-passing system has enabled process-grid abstractions of multicomputers, communication contexts, and process groups, all characteristics needed for building scalable libraries, and ...
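
The flavor of those C-based distributed data structures can be suggested with a small descriptor; the type and field names here are hypothetical, not the Toolbox's actual definitions:

/* A distributed-matrix descriptor: the local block plus enough
 * distribution metadata that library code can operate under any
 * data distribution supplied by the caller. */
typedef struct {
    int     M, N;          /* global matrix dimensions                  */
    int     m_loc, n_loc;  /* dimensions of the locally owned block     */
    double *data;          /* local block, stored row-major             */
    int     P, Q;          /* logical process grid shape                */
    int     p, q;          /* this process's grid coordinates           */
    /* distribution functions, global index -> (process, local index);
     * carrying these at run time is what makes algorithms
     * data-distribution independent */
    void  (*row_map)(int g, int P, int *proc, int *loc);
    void  (*col_map)(int g, int Q, int *proc, int *loc);
} DistMatrix;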

Citation Context

...ded, and that this question is strongly application dependent. 1 Introduction First-generation scalable libraries have been developed within the Multicomputer Toolbox schema, also described elsewhere [9, 10, 8]. In this system, we have devised distributed data structures for vectors and matrices, defined relative to virtual process topologies (logical grids), as well as an advanced message-passing notation a...

The Multicomputer Toolbox: Current and Future Directions

by Anthony Skjellum - Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer, 1993
"... The Multicomputer Toolbox is a set of "firstgeneration " scalable parallel libraries. The Toolbox includes sparse, dense, direct and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. The Toolbox has an object-oriented des ..."
Abstract - Cited by 6 (1 self) - Add to MetaCart
The Multicomputer Toolbox is a set of "first-generation" scalable parallel libraries. The Toolbox includes sparse, dense, direct and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. The Toolbox has an object-oriented design; C-based strategies for classes of distributed data structures (including distributed matrices and vectors) as well as uniform calling interfaces are defined. At a high level in the Toolbox, data-distribution-independence (DDI) support is provided. DDI is needed to build scalable libraries, so that applications do not have to redistribute data before calling libraries. Data-distribution-independent mapping functions implement this capability. Data-distribution-independent algorithms are sometimes more efficient than fixed-data-distribution counterparts, because redistribution of data can be avoided. Underlying the system is a "performance and portability layer," which includes interfaces to sequent...
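
To illustrate what a data-distribution-independent mapping function looks like, here is a hedged C sketch of two interchangeable 1-D distributions behind one signature; the names and signature are assumptions, and the Toolbox's actual interfaces may differ:

/* Both functions map global index g in [0, G) to (owning process,
 * local index) over P processes; a DDI algorithm takes the map as a
 * parameter instead of hard-coding one distribution. */
typedef void (*ddi_map_fn)(int g, int G, int P, int *proc, int *loc);

/* linear (block) distribution: the first G%P processes get one extra */
static void linear_map(int g, int G, int P, int *proc, int *loc)
{
    int base = G / P, rem = G % P;
    if (g < rem * (base + 1)) {
        *proc = g / (base + 1);
        *loc  = g % (base + 1);
    } else {
        int g2 = g - rem * (base + 1);
        *proc = rem + g2 / base;
        *loc  = g2 % base;
    }
}

/* scatter (cyclic) distribution: indices are dealt out round-robin */
static void scatter_map(int g, int G, int P, int *proc, int *loc)
{
    (void)G;                 /* not needed for the cyclic map */
    *proc = g % P;
    *loc  = g / P;
}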

Citation Context

...logies, all needed for building efficient scalable libraries, and large-scale application software. 1 Introduction The Multicomputer Toolbox is a set of "first-generation" scalable parallel libraries [12, 13, 14]. The Toolbox includes sparse, dense, direct and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. The Toolbox has an object-orient...

Explicit Parallel Programming in C++ based on the Message-Passing Interface (MPI)

by Anthony Skjellum, Ziyang Lu, Purushotham V. Bangalore, Nathan Doss , 1996
"... Introduction Explicit parallel programming using the Message Passing Interface (MPI), a de facto standard created by the MPI Forum [15], is quickly becoming the strategy of choice for performance-portable parallel application programming on multicomputers and networks of workstations, so it is inev ..."
Abstract - Cited by 5 (1 self) - Add to MetaCart
Explicit parallel programming using the Message Passing Interface (MPI), a de facto standard created by the MPI Forum [15], is quickly becoming the strategy of choice for performance-portable parallel application programming on multicomputers and networks of workstations, so it is inevitably of interest to C++ programmers who use such systems. MPI programming is currently undertaken in C and/or Fortran-77, via the language bindings defined by the MPI Forum [15]. While the committee deferred the job of defining a C++ binding for MPI to MPI-2 [16], it is already possible to develop parallel programs in C++ using MPI, with the added help of one of several support libraries [2, 6, 13]. These systems all strive to enable immediate C++ programming based on MPI. The first such enabling system, MPI++, is the focus of this chapter. MPI++ was an early effort on our part to let us leverage MPI while programming in C++. Here this system is, to a large extent, our vehicle to i...

Dense and Iterative Concurrent Linear Algebra in the Multicomputer Toolbox

by Purushotham Bangalore, Anthony Skjellum, Chuck Baldwin, Steven G. Smith - in Proceedings of the Scalable Parallel Libraries Conference (SPLC '93), 1993
"... The Multicomputer Toolbox includes sparse, dense, and iterative scalable linear algebra libraries. Dense direct, and iterative linear algebra libraries are covered in this paper, as well as the distributed data structures used to implement these algorithms; concurrent BLAS are covered elsewhere. We ..."
Abstract - Cited by 5 (0 self) - Add to MetaCart
The Multicomputer Toolbox includes sparse, dense, and iterative scalable linear algebra libraries. Dense direct and iterative linear algebra libraries are covered in this paper, as well as the distributed data structures used to implement these algorithms; concurrent BLAS are covered elsewhere. We discuss uniform calling interfaces and functionality for linear algebra libraries. We include a detailed explanation of how the level-3 dense LU factorization works, including features that support data distribution independence with a blocked algorithm. We illustrate the data motion for this algorithm, and for a representative iterative algorithm, PCGS. We conclude that data-distribution-independent libraries are feasible and highly desirable. Much work remains to be done in performance tuning of these algorithms, though good portability and application relevance have already been achieved.
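
To make the blocked structure of the level-3 factorization concrete, here is a hedged sequential C skeleton of a right-looking blocked LU; the Toolbox version distributes the panel and update steps across the process grid, and pivoting is omitted here for brevity:

/* Blocked right-looking LU without pivoting: A (n x n, row-major) is
 * overwritten with unit-lower L and upper U; nb is the block size. */
void blocked_lu(double *A, int n, int nb)
{
    for (int k = 0; k < n; k += nb) {
        int b = (k + nb <= n) ? nb : n - k;

        /* 1. unblocked LU of the panel A[k:n, k:k+b] */
        for (int j = k; j < k + b; ++j)
            for (int i = j + 1; i < n; ++i) {
                A[i * n + j] /= A[j * n + j];
                for (int l = j + 1; l < k + b; ++l)
                    A[i * n + l] -= A[i * n + j] * A[j * n + l];
            }

        /* 2. triangular solve for the U block row A[k:k+b, k+b:n] */
        for (int j = k + b; j < n; ++j)
            for (int i = k + 1; i < k + b; ++i)
                for (int l = k; l < i; ++l)
                    A[i * n + j] -= A[i * n + l] * A[l * n + j];

        /* 3. rank-b trailing update, the GEMM step that gives the
         *    algorithm its level-3 character */
        for (int i = k + b; i < n; ++i)
            for (int j = k + b; j < n; ++j)
                for (int l = k; l < k + b; ++l)
                    A[i * n + j] -= A[i * n + l] * A[l * n + j];
    }
}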

Citation Context

...The following issues remain to be studied and further demonstrated in the future: • it is possible to obtain good performance and functionality from data-distribution-independent algorithms (see [8]); • sometimes it is necessary to "rethink" traditional ideas in order to design a scalable data-distribution-independent algorithm; • performance is sometimes not the only guiding principle in de...
