Huss-Lederman, S., Jacobson E., and Tsao, A., "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Proceedings of the Scalable Parallel Libraries Conference, Starksville, MS, Oct. 1993.
....again be done with implicit permutations. Our implementation can be easily adjusted to use other parallel matrix multiplication implementations for the lowest-level multiplication. A number of implementations based on the broadcast-multiply-roll method [7, 8] have been developed. For details see [3, 11, 12]. Acknowledgements This research was performed in part using the Intel Paragon System operated by the California Institute of Technology on behalf of the Concurrent Supercomputing Consortium. Access to this facility was provided by Intel Supercomputer Systems Division and the California Institute ....
Huss-Lederman, S., Jacobson E., and Tsao, A., "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Proceedings of the Scalable Parallel Libraries Conference, Starksville, MS, Oct. 1993.
....yields better performance than the 2D ScaLAPACK PDGEMM algorithm [18]. The literature describing matrix multiplication algorithms is very extensive. Some descriptions are given by Demmel, Heath, and van der Vorst [7], by Choi, Dongarra, and Walker [6], by Huss-Lederman, Jacobson, and Tsao [15], by Agarwal, Gustavson, and Zubair [2], and by van de Geijn and Watts [23]. Aggarwal, Chandra, and Snir [3] show that a 3D type algorithm is optimal for an LPRAM. Johnsson and Ho [20] and Ho, Johnsson, and Edelman [14] discuss 3D and other algorithms for boolean cubes and hypercubes. Gupta and ....
S. Huss-Lederman, E. M. Jacobson, and A. Tsao. Comparison of scalable parallel matrix multiplication libraries. In Proceedings of the Scalable Parallel Libraries Conference. IEEE Computer Society Press, 1994.
....Cannon's algorithm [7], the broadcast-multiply-roll algorithm [16, 15], and the Parallel Universal Matrix Multiplication Algorithm (PUMMA) [11]. This last algorithm is a generalization of broadcast-multiply-roll to non-square meshes of processors. An alternative generalization of this algorithm [19, 20] and an interesting algorithm based on a three-dimensional data distribution have also been developed [2]. The approach now considered the most practical, sometimes known as broadcast-broadcast, was first proposed by Agarwal et al. [1], who showed that a sequence of parallel rank-k (panel-panel) ....
Huss-Lederman, S., E. Jacobson, A. Tsao, "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Proceedings of the Scalable Parallel Libraries Conference, Starksville, MS, Oct. 1993.
....sponsored by ARPA. Additional support came from the Intel Research Council. In a recent paper [41] we describe a highly efficient implementation of matrix-matrix multiplication, the Scalable Universal Matrix Multiplication Algorithm (SUMMA), that has many benefits over alternative implementations [14, 29, 30]. These benefits include better performance, simpler and more flexible implementation, and a lower workspace requirement. In this paper, we show how the simple techniques developed in that paper can be extended to all the level 3 BLAS. 2 Notation and Assumptions 2.1 Model of computation We ....
Huss-Lederman, S., E. Jacobson, A. Tsao, "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Proceedings of the Scalable Parallel Libraries Conference, Starksville, MS, Oct. 1993.
.... [6, 15], Broadcast-Multiply-Roll [13, 14], and the Transpose algorithm [21]. Two recent efforts extend the work by Fox et al. to general meshes of nodes: the paper by Choi et al. [9] uses a two-dimensional block-wrapped (block-cyclic) data decomposition, while the papers by Huss-Lederman et al. [18, 19] use a virtual 2-D torus wrap data layout. Both these efforts report very good performance attained on the Intel Touchstone Delta, achieving a sizeable percentage of peak performance. The method presented in our paper has the benefit of being more general, simpler and more efficient. We explain ....
Huss-Lederman, S., E. Jacobson, A. Tsao, "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Proceedings of the Scalable Parallel Libraries Conference, Starksville, MS, Oct. 1993.
.... [5, 14], Broadcast-Multiply-Roll [12, 13], and the Transpose algorithm [19]. Two recent efforts extend the work by Fox et al. to general meshes of nodes: the paper by Choi et al. [8] uses a two-dimensional block-wrapped (block-cyclic) data decomposition, while the papers by Huss-Lederman et al. [17, 18] use a virtual 2-D torus wrap data layout. Both these efforts report very good performance attained on the Intel Touchstone Delta, achieving a sizeable percentage of peak performance. The method presented in our paper has the benefit of being more general, simpler and more efficient. We explain our ....
Huss-Lederman, S., E. Jacobson, A. Tsao, "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Proceedings of the Scalable Parallel Libraries Conference, Starksville, MS, Oct. 1993.
....are available, for Sun and Cray machines. Contact: Steve Lederman, Supercomputing Research Center, USA Email: lederman@super.org FTP: Implementations and technical reports are available for anonymous ftp at ftp.super.org in pub/prism Comments: Tested and available at Daresbury [3] References: [19, 10, 15, 11, 12, 115, 79, 18, 80, 16, 77, 17, 76, 78] 3 GRID BASED TOOLS 16 3 Grid based tools 3.1 AMR Name: AMR , Adaptive Mesh Refinement Class Library Description: A C class library for building self-adaptive mesh refinement applications. Parallelisation and array handling are inherited from P (see entry for P ) Systems: Built on P ....
....are available, for Sun and Cray machines. Contact: Steve Lederman, Supercomputing Research Center, USA Email: lederman@super.org FTP: Implementations and technical reports are available for anonymous ftp at ftp.super.org in pub/prism Comments: Tested and available at Daresbury [3] References: [19, 10, 15, 11, 12, 115, 79, 18, 80, 16, 77, 17, 76, 78] 7.2 BLACS Name: BLACS: Basic Linear Algebra Communication Subroutines Description: Package of communication skeletons for use in parallel linear algebra codes on message-passing machines. Designed for efficient communication operations on 2D arrays and sub-arrays on a rectangular mesh of ....
S Huss-Lederman, E Jacobson, and A Tsao. Comparison of scalable parallel matrix multiplication libraries. In Proceedings of the Scalable Parallel Libraries Conference, October 1993. Available by anonymous ftp from ftp.super.org in file pub/prism/wn13.ps.
....from the adjacent processes. When the grid has a shape of P Q, the columns of sub-matrix A to be broadcast may lie in more than two processes; this form of generalized broadcasting is expensive. Therefore, the row version of BiMMeR's BMR algorithm will apply more aptly for a grid shape of P Q (Huss-Lederman, Jacobson, and Tsao 1993). Conversely, the column version of BiMMeR's BMR algorithm will apply for a grid shape of P Q (Huss-Lederman, Jacobson, and Tsao 1993). At present, however, BiMMeR's BMR algorithm only deals with square matrices. Falgout et al. (1992; 1993) introduced two algorithms (i.e. the MM 3 and MM 4 ....
....processes; this form of generalized broadcasting is expensive. Therefore, the row version of BiMMeR's BMR algorithm will apply more aptly for a grid shape of P Q (Huss-Lederman, Jacobson, and Tsao 1993). Conversely, the column version of BiMMeR's BMR algorithm will apply for a grid shape of P Q (Huss-Lederman, Jacobson, and Tsao 1993). At present, however, BiMMeR's BMR algorithm only deals with square matrices. Falgout et al. (1992; 1993) introduced two algorithms (i.e. the MM 3 and MM 4 algorithms). Both the MM 3 and MM 4 algorithms have no restrictions on the shapes of matrices and process grids. The MM 3 algorithm is a ....
[Article contains additional citation context not shown here]
Huss-Lederman, S., E. M. Jacobson, and A. Tsao. 1993. Comparison of scalable parallel matrix multiplication libraries. In Proceedings of the scalable parallel libraries conference, edited by A. Skjellum and D. Reese, 142--9. Los Alamitos, CA: IEEE Computer Society Press.