A Super-Programming Technique for Large Sparse Matrix Multiplication on PC Clusters (2004)
| Venue: | IEICE Trans. Info. Systems E87-D |
| Citations: | 4 - 3 self |
BibTeX
@INPROCEEDINGS{Jin04asuper-programming,
author = {Dejiang Jin and Sotirios G. Ziavras},
title = {A Super-Programming Technique for Large Sparse Matrix Multiplication on PC Clusters},
booktitle = {IEICE Trans. Info. Systems E87-D},
year = {2004},
pages = {1774--1781}
}
Abstract
The multiplication of large sparse matrices is a basic operation in many scientific and engineering applications. High-performance library routines exist for this operation, and they are often optimized for the target architecture. The PC cluster computing paradigm has recently emerged as a viable alternative for high-performance, low-cost computing. In this paper, we apply our super-programming approach [24] to study the load balance and runtime management overhead of implementing parallel large matrix multiplication on PC clusters. In a parallel environment, it is essential to partition the entire operation into tasks and assign them to individual processing elements. Most existing approaches partition the given sub-matrices based on some kind of workload estimation. For dense matrices on some architectures, these estimates may be accurate. For sparse matrices on PCs, however, the workloads of block operations may not necessarily depend on the size of the data, so they may not be well estimated in advance. Any approach other than run-time dynamic partitioning may degrade performance. Moreover, in a heterogeneous environment, static partitioning is NP-complete. For embedded problems, it also introduces management overhead. We adopt our super-programming approach, which partitions the entire task into medium-grain tasks that are implemented using super-instructions; the workload of super-instructions is easy to estimate. These tasks are dynamically assigned to member computer nodes. A node may execute more than one super-instruction. Our results prove the viability of our approach.
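The dynamic assignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that one block-by-block product plays the role of a super-instruction, that threads stand in for cluster nodes, and that a shared queue stands in for the runtime that hands out tasks. The names `block_multiply` and `multiply_dynamic`, and the dictionary-based sparse layout, are hypothetical.

```python
import queue
import threading
from collections import defaultdict


def block_multiply(a_blk, b_blk):
    """Multiply two sparse blocks stored as {(row, col): value} dicts.

    Hypothetical stand-in for one medium-grain "super-instruction".
    """
    # Index B's nonzeros by row so each nonzero of A only meets matching rows.
    b_by_row = defaultdict(list)
    for (k, j), v in b_blk.items():
        b_by_row[k].append((j, v))
    out = defaultdict(float)
    for (i, k), a in a_blk.items():
        for j, b in b_by_row.get(k, ()):
            out[(i, j)] += a * b
    return out


def multiply_dynamic(A, B, num_workers=3):
    """Compute C = A * B for block-sparse A, B: {(block_row, block_col): block}.

    Workers (threads here, assumed to model cluster nodes) pull the next
    block product from a shared queue whenever they become idle, so no
    workload estimate is needed in advance.
    """
    tasks = queue.Queue()
    for (bi, bk) in A:
        for (bk2, bj) in B:
            if bk == bk2:  # only matching inner block indices contribute
                tasks.put((bi, bk, bj))

    C = defaultdict(lambda: defaultdict(float))
    lock = threading.Lock()

    def worker():
        while True:
            try:
                bi, bk, bj = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            partial = block_multiply(A[(bi, bk)], B[(bk, bj)])
            with lock:  # accumulate partial results into the shared C
                for pos, v in partial.items():
                    C[(bi, bj)][pos] += v

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {key: dict(blk) for key, blk in C.items()}
```

Because each worker takes a new task only after finishing its previous one, a block that happens to be dense simply occupies its worker longer while the others keep draining the queue. That is the load-balancing effect the abstract attributes to run-time dynamic assignment; a static split would have needed to predict that cost beforehand.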







