  Multiprocessor Runtime Support for Fine-Grained Irregular DAGs (1995) [20 citations — 1 self]

by Frederic T. Chong, Shamik D. Sharma, Eric A. Brewer, Joel Saltz
In Rajiv K. Kalia and Priya Vashishta, editors, Toward Teraflop Computing and New Grand Challenge Applications
ftp://ftp.cs.umd.edu/pub/hpsl/papers/papers-pdf/irreg-dags.pdf

Abstract:

We examine multiprocessor runtime support for fine-grained, irregular directed acyclic graphs (DAGs) such as those that arise from sparse-matrix triangular solves. We conduct our experiments on the CM-5, whose lower latencies and active-message support allow us to achieve unprecedented speedups for a general multiprocessor. Whereas previous implementations have maximum speedups of less than 4 on even simple banded matrices, we are able to obtain scalable performance on extremely small and irregular problems. On a matrix with only 5300 rows, we are able to achieve scalable performance with a speedup of 34 for 128 processors, resulting in an absolute performance of over 33 million double-precision floating point operations per second. We achieve these speedups with non-matrix-specific methods which are applicable to any DAG. We compare a range of run-time preprocessed and dynamic approaches on matrices from the Harwell-Boeing benchmark set. Although precomputed data distributions and execution schedules produce the best performance, we find that it is challenging to keep their cost low enough to make them worthwhile on small, fine-grained problems. Additionally, we find that a policy of frequent network polling can reduce communication overhead by a factor of three over the standard CM-5 policies. We present a detailed study of runtime overheads and demonstrate that send and receive processor overhead still dominate these applications on the CM-5. We conclude that these applications would highly benefit from architectural support for low-overhead communication.
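To illustrate how a sparse triangular solve gives rise to an irregular DAG, here is a minimal serial sketch. It is not the authors' CM-5 runtime: the row-dictionary matrix layout and the function names `dag_levels` and `solve_by_levels` are assumptions made for this example. Each row of a lower-triangular matrix is a DAG node that depends on the rows named by its off-diagonal nonzeros. Grouping nodes by dependency depth (often called level scheduling) is one simple precomputed schedule, since rows in the same level are mutually independent.

```python
def dag_levels(rows):
    """Return the dependency depth of each row in the solve DAG.

    rows[i] maps column index j -> nonzero value L[i][j], with j <= i
    (hypothetical sparse layout chosen for this sketch).
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [j for j in row if j < i]  # rows that must finish before row i
        level[i] = 1 + max(level[j] for j in deps) if deps else 0
    return level


def solve_by_levels(rows, b):
    """Solve L x = b by forward substitution, one DAG level at a time."""
    level = dag_levels(rows)
    waves = {}
    for i, lv in enumerate(level):
        waves.setdefault(lv, []).append(i)

    x = [0.0] * len(rows)
    for lv in sorted(waves):
        # Rows within one level do not depend on each other, so a parallel
        # runtime could assign them to different processors.
        for i in waves[lv]:
            s = sum(v * x[j] for j, v in rows[i].items() if j < i)
            x[i] = (b[i] - s) / rows[i][i]
    return x, level


# Small 4x4 example: rows 0 and 1 are independent, row 2 needs row 0,
# and row 3 needs rows 1 and 2.
rows = [
    {0: 2.0},
    {1: 1.0},
    {0: 1.0, 2: 4.0},
    {1: 2.0, 2: 1.0, 3: 2.0},
]
b = [2.0, 3.0, 9.0, 12.0]
x, level = solve_by_levels(rows, b)
print(x, level)
```

Each node here performs only a handful of floating point operations, which is the sense in which such DAGs are fine-grained: on a distributed-memory machine, the cost of communicating one `x[j]` value can easily exceed the arithmetic it enables.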
