Download:
by Kevin B. Theobald, Gagan Agrawal, Rishi Kumar, Gerd Heber, Guang R. Gao, Paul Stodghill, Keshav Pingali
In Proceedings of Supercomputing’2000
http://www.sc2000.org/techpapr/papers/pap.pap293.pdf
Abstract:
We report on our work in developing a fine-grained multithreaded solution for the communication-intensive Conjugate Gradient (CG) problem. In our recent work, we developed a simple yet efficient program for sparse matrix-vector multiply (MVM) on a multithreaded system. This paper presents an effective mechanism for the reduction-broadcast phase, which is integrated with the sparse MVM, resulting in a scalable implementation of the complete CG application. Three major observations from our experiments on the EARTH multithreaded testbed are: (1) The scalability of our CG implementation is impressive, e.g., absolute speedup is 90 on 120 processors for the NAS CG class B input. (2) Our dataflow-style reduction-broadcast network based on fine-grain multithreading is twice as fast as a serial reduction scheme on the same system. (3) Slowing down the network by a factor of 2 caused no notable degradation of overall CG performance.
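The contrast the abstract draws between a serial reduction and a dataflow-style reduction network can be illustrated with a minimal sketch. This is not the authors' Threaded-C implementation: the function names, the 120 made-up partial values (standing in for per-processor partial dot products), and the pairwise-tree shape are all assumptions chosen for illustration. The sketch only counts dependent addition steps on the critical path; it does not model the EARTH network and does not predict the factor of two measured in the paper.

```python
def serial_reduce(partials):
    """Sum partial results one after another.

    Every addition depends on the previous one, so the critical path
    has len(partials) - 1 steps.
    """
    total = partials[0]
    steps = 0
    for value in partials[1:]:
        total += value
        steps += 1
    return total, steps


def tree_reduce(partials):
    """Sum partial results pairwise, level by level (assumed tree shape).

    Additions within one level are independent of each other, so only
    the number of levels counts toward the critical path.
    """
    level = list(partials)
    steps = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
        steps += 1
    return level[0], steps


def parallel_efficiency(speedup, processors):
    """Speedup divided by processor count."""
    return speedup / processors


# Hypothetical data: one partial dot product per processor.
partials = [float(i) for i in range(1, 121)]
serial_total, serial_steps = serial_reduce(partials)
tree_total, tree_steps = tree_reduce(partials)

print(serial_total, serial_steps)    # same sum, 119 dependent additions
print(tree_total, tree_steps)        # same sum, 7 levels
print(parallel_efficiency(90, 120))  # speedup figure quoted in the abstract
```

Both functions return the same sum; only the length of the dependency chain differs. The last line applies the usual efficiency ratio to the speedup of 90 on 120 processors reported in the abstract, which gives 0.75.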
Citations
175 | Compiling Global Name-Space Parallel Loops for Distributed Execution – Koelbel, Mehrotra - 1991
114 | Run-time scheduling and execution of loops on message passing machines – Saltz, Crowley, et al. - 1991
99 | The design and implementation of a parallel unstructured Euler solver using software primitives – Das, Mavriplis, et al. - 1994
76 | A Cellular Computer to Implement the Kalman Filter Algorithm – Cannon - 1969
76 | Numerical Linear Algebra for High-Performance Computers – Dongarra, Duff, et al. - 1998
66 | Distributed Memory Compiler Design for Sparse Problems – Saltz, Wu, et al. - 1991
50 | Execution time support for adaptive scientific algorithms on distributed memory machines. Concurrency: Practice and Experience – Berryman, Saltz, et al. - 1991
46 | Polling Watchdog: Combining polling and interrupts for efficient message handling – Maquelin, Gao, et al. - 1996
45 | A study of the EARTH-MANNA multithreaded system – Hum, Maquelin, et al. - 1996
43 | Parallelizing molecular dynamics programs for distributed memory machines – Hwang, Das, et al. - 1995
43 | Runtime and language support for compiling adaptive irregular programs. Software-Practice and Experience – Hwang, Moon, et al. - 1995
39 | Building multithreaded architectures with off-the-shelf microprocessors – Hum, Theobald, et al.
38 | Runtime Compilation Techniques for Data Partitioning and Communication Schedule Reuse – Ponnusamy, Saltz, et al. - 1993
32 | Handling irregular problems with Fortran D - a preliminary report – Hanxleden - 1993
31 | A design study of the EARTH multiprocessor – Hum, Maquelin, et al. - 1995
28 | Latency hiding in message-passing architectures – Bruening, Giloi, et al.
21 | EARTH: An Efficient Architecture for Running Threads – Theobald - 1999
19 | Interprocedural compilation of irregular applications for distributed memory machines – Agrawal, Saltz - 1995
17 | An efficient hybrid dataflow architecture model – Gao - 1993
12 | Modeling the weather with a data flow supercomputer – Dennis, Gao, et al. - 1984
10 | Overview of the Threaded-C language – Theobald, Amaral, et al. - 1998
7 | Developing a communication intensive application on the EARTH multithreaded architecture – Theobald, Kumar, et al. - 2000
6 | A Framework for Sparse Matrix Code Synthesis from High-level Specifications – Ahmed, Mateev, et al. - 2000
6 | Handling irregular problems with Fortran D - a preliminary report – Hanxleden - 1993
2 | Definition of the MultiThreaded Architecture (MTA) model – Hum, Theobald, et al. - 1993