  Algorithmic redistribution methods for block cyclic decompositions (1996) [16 citations — 2 self]

by Antoine P. Petitet, Jack J. Dongarra
IEEE Trans. on PDS
http://sesame.hensa.ac.uk/lapack/lawns/lawn133.ps

Abstract:

In a serial computational environment, transportable efficiency is the essential motivation for developing blocking strategies and block-partitioned algorithms. An algorithmic blocking factor adjusts the granularity of the subtasks to maximize the efficiency of the hardware resources. In a distributed-memory environment, load balance is the essential motivation for distributing array entries over a collection of processes according to the block cyclic decomposition scheme. A distribution blocking factor is used to partition an array into blocks that are then mapped onto the processes. Optimal values of the algorithmic and distribution blocking factors often differ for a given algorithm and target architecture. Despite this fact, most of the parallel algorithms proposed in the literature assume the values of these blocking factors to be identical. This assumption limits the flexibility and ease of use of such algorithms. When these blocking factors differ, methods are necessary to redistribute some data into the appropriate algorithmic form. This paper presents and discusses such algorithmic redistribution methods for the block cyclic decomposition scheme. Algorithmic redistribution methods attempt to logically reorganize the computations and communications within an algorithmic context. In order to derive such methods, some properties of the block cyclic data distribution are first exhibited. Various algorithmic redistribution methods are
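To make the block cyclic decomposition mentioned in the abstract concrete, the following is a minimal one-dimensional sketch in Python. It is not taken from the paper: the function names (`global_to_local`, `local_to_global`, `owners`) are hypothetical, indices are assumed to be 0-based, and the first block is assumed to live on process 0. It uses the commonly stated mapping in which consecutive blocks of `nb` entries are dealt out to the processes in round-robin order.

```python
def global_to_local(g, nb, nprocs):
    """Map a 0-based global index to (owner process, local index).

    Assumes a 1-D block cyclic layout with distribution blocking
    factor nb, nprocs processes, and the first block on process 0.
    """
    block = g // nb                    # global block containing g
    owner = block % nprocs             # blocks are dealt round-robin
    local = (block // nprocs) * nb + g % nb
    return owner, local


def local_to_global(owner, local, nb, nprocs):
    """Inverse of global_to_local under the same assumptions."""
    local_block = local // nb
    return (local_block * nprocs + owner) * nb + local % nb


def owners(n, nb, nprocs):
    """Owner process of each of the n global entries."""
    return [global_to_local(g, nb, nprocs)[0] for g in range(n)]


if __name__ == "__main__":
    # Same array and process count, two different blocking factors.
    print(owners(8, 2, 2))
    print(owners(8, 4, 2))
```

Comparing the two owner lists shows which entries change process when the blocking factor changes, which is the situation the abstract describes when the algorithmic and distribution blocking factors differ.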
