Block Size Selection of Parallel LU and QR on PVP-based and RISC-based Supercomputers

unknown authors
http://www.rdcps.ac.cn/~zyq/papers/blocksize-model-sr2201-dw3k-2004.ps.gz

Abstract:

Optimal block size selection for blocked sequential algorithms is a complicated problem on a uniprocessor, because the optimum changes with the application's loop characteristics, the machine architecture, and the memory hierarchy. The problem remains difficult in parallel computing, where it becomes an interaction among local block size selection, load balancing, application memory requirements, and effective utilization of the communication system. On machines that differ in how local computation speed varies with block size, in how communication capacity varies with message length, and in physical memory size, and for applications with different communication and computation characteristics, these factors play different roles in determining the final optimal block size. The task of optimal or near-optimal block size …
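The trade-offs listed in the abstract (local speed versus block size, communication cost versus message length, load balance, and memory) can be made concrete with a toy analytic model. The sketch below is not the model from the paper. It assumes a right-looking blocked LU with a 1-D column block-cyclic distribution, a Hockney-style local speed curve r(nb) = r_peak * nb / (nb + nb_half), a binomial-tree panel broadcast costing alpha + bytes/beta per hop, and no overlap of communication with computation. All names (`Machine`, `predicted_time`, `fits_in_memory`, `choose_block_size`) and all parameter values are hypothetical.

```python
"""Toy cost model for choosing the block size nb of a right-looking
blocked LU factorization with a 1-D column block-cyclic layout.

All parameters and formulas are illustrative assumptions, not the
model used in the paper.
"""
import math
from dataclasses import dataclass


@dataclass
class Machine:
    procs: int          # number of processes P (hypothetical)
    peak_flops: float   # per-process peak rate in flop/s (hypothetical)
    nb_half: float      # block size at which local speed reaches half of peak
    latency: float      # per-message latency alpha, in seconds
    bandwidth: float    # link bandwidth beta, in bytes/s
    mem_bytes: float    # usable memory per process, in bytes


def local_rate(m: Machine, nb: int) -> float:
    """Assumed local flop rate: grows with nb and saturates at peak."""
    return m.peak_flops * nb / (nb + m.nb_half)


def predicted_time(m: Machine, n: int, nb: int) -> float:
    """Estimated run time of an n x n blocked LU with block size nb."""
    nblocks = math.ceil(n / nb)
    rate = local_rate(m, nb)
    hops = math.ceil(math.log2(m.procs)) if m.procs > 1 else 0
    total = 0.0
    for k in range(nblocks):
        rows = n - k * nb                  # rows in the current panel
        width = min(nb, n - k * nb)        # the last panel may be narrower
        # Panel factorization on the owning process (~rows * width^2 flops).
        total += rows * width * width / rate
        # Binomial-tree broadcast of the panel (double precision).
        total += hops * (m.latency + rows * width * 8 / m.bandwidth)
        # Trailing update: the busiest process bounds the step time.
        remaining = nblocks - k - 1
        if remaining > 0:
            # Consecutive blocks dealt round-robin: max share is ceil(r / P).
            busiest_cols = math.ceil(remaining / m.procs) * nb
            trail_rows = n - (k + 1) * nb
            total += 2.0 * trail_rows * width * busiest_cols / rate
    return total


def fits_in_memory(m: Machine, n: int, nb: int) -> bool:
    """Local share of the matrix plus one panel buffer must fit."""
    return n * n * 8 / m.procs + n * nb * 8 <= m.mem_bytes


def choose_block_size(m: Machine, n: int, candidates: list[int]) -> int:
    """Return the feasible candidate with the smallest predicted time."""
    feasible = [nb for nb in candidates if fits_in_memory(m, n, nb)]
    if not feasible:
        raise ValueError("no candidate block size fits in memory")
    return min(feasible, key=lambda nb: predicted_time(m, n, nb))


if __name__ == "__main__":
    # Hypothetical machine parameters, chosen only to exercise the model.
    m = Machine(procs=16, peak_flops=1.0e9, nb_half=32.0,
                latency=20e-6, bandwidth=100e6, mem_bytes=256e6)
    for nb in (16, 32, 64, 128, 256):
        print(nb, round(predicted_time(m, 4000, nb), 2))
    print("chosen nb:", choose_block_size(m, 4000, [16, 32, 64, 128, 256]))
```

In this model, larger blocks raise local speed but lengthen each broadcast, enlarge the panel workspace, and coarsen the block-cyclic distribution so the busiest process holds a larger share of the trailing matrix. The minimum of the predicted time marks where these effects balance for the assumed parameters; a real selection procedure would calibrate the rate, latency, and bandwidth terms on the target machine.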