Landing CG on EARTH: A case study of fine-grained multithreading on an evolutionary path (2000) [2 citations, both self-citations]

by Kevin B. Theobald, Gagan Agrawal, Rishi Kumar, Gerd Heber, Guang R. Gao, Paul Stodghill, Keshav Pingali
In Proceedings of Supercomputing’2000
http://www.sc2000.org/techpapr/papers/pap.pap293.pdf

Abstract:

We report on our work in developing a fine-grained multithreaded solution for the communication-intensive Conjugate Gradient (CG) problem. In our recent work, we developed a simple yet efficient program for sparse matrix-vector multiply (MVM) on a multithreaded system. This paper presents an effective mechanism for the reduction-broadcast phase, which is integrated with the sparse MVM, resulting in a scalable implementation of the complete CG application. Three major observations from our experiments on the EARTH multithreaded testbed are: (1) Our CG implementation scales well; for example, the absolute speedup is 90 on 120 processors for the NAS CG class B input. (2) Our dataflow-style reduction-broadcast network, based on fine-grain multithreading, is twice as fast as a serial reduction scheme on the same system. (3) When the network was slowed down by a factor of 2, no notable degradation of overall CG performance was observed.
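The abstract names two communication phases in each CG iteration: the sparse MVM and the reduction-broadcast that turns per-processor partial dot products into a global value every processor needs. The Python sketch below only illustrates where those phases sit; it is not the authors' EARTH/Threaded-C code. It assumes a single process that simulates P processors, a matrix stored in compressed sparse row (CSR) form, and a generic binary-tree reduction followed by a broadcast. The names `spmv`, `tree_allreduce`, `distributed_dot`, `conjugate_gradient`, and `laplacian_1d` are hypothetical, and the paper's dataflow-style network and its serial baseline may be organized differently.

```python
# Hypothetical sketch, not the authors' EARTH/Threaded-C code: one Python
# process stands in for P processors so the communication phases of CG
# (sparse MVM and reduction-broadcast) are visible.
from typing import List, Tuple

Vector = List[float]
CSR = Tuple[List[float], List[int], List[int]]  # (values, col_idx, row_ptr)


def spmv(a: CSR, x: Vector) -> Vector:
    """Sparse matrix-vector multiply y = A x with A in CSR form."""
    values, col_idx, row_ptr = a
    return [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def tree_allreduce(partials: List[float]) -> Tuple[List[float], int]:
    """Binary-tree reduction of one partial per simulated processor, then a
    broadcast back down the tree. Returns every processor's copy of the total
    and the number of sequential combining steps (2 * ceil(log2 P))."""
    vals = list(partials)
    p, stride, steps = len(vals), 1, 0
    while stride < p:  # reduction: processor i absorbs processor i + stride
        for i in range(0, p, 2 * stride):
            if i + stride < p:
                vals[i] += vals[i + stride]
        stride, steps = stride * 2, steps + 1
    while stride > 1:  # broadcast: copy the root's total back down the tree
        stride //= 2
        for i in range(0, p, 2 * stride):
            if i + stride < p:
                vals[i + stride] = vals[i]
        steps += 1
    return vals, steps


def distributed_dot(x: Vector, y: Vector, p: int) -> float:
    """Dot product with x and y block-partitioned over p simulated processors."""
    n = len(x)
    bounds = [n * r // p for r in range(p + 1)]
    partials = [sum(x[i] * y[i] for i in range(bounds[r], bounds[r + 1]))
                for r in range(p)]
    totals, _ = tree_allreduce(partials)
    return totals[0]


def conjugate_gradient(a: CSR, b: Vector, p: int = 4,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Unpreconditioned CG for symmetric positive-definite A, starting at x = 0.
    Each iteration performs one SpMV and two reduction-broadcasts."""
    x = [0.0] * len(b)
    r = list(b)
    d = list(r)
    rr = distributed_dot(r, r, p)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        q = spmv(a, d)                          # sparse MVM
        alpha = rr / distributed_dot(d, q, p)   # reduction-broadcast 1
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = distributed_dot(r, r, p)       # reduction-broadcast 2
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x


def laplacian_1d(n: int) -> CSR:
    """Tridiagonal [-1, 2, -1] test matrix (symmetric positive definite) in CSR."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr = [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


if __name__ == "__main__":
    n = 8
    a = laplacian_1d(n)
    b = spmv(a, [1.0] * n)  # right-hand side whose exact solution is all ones
    print([round(v, 6) for v in conjugate_gradient(a, b, p=4)])
    _, steps = tree_allreduce([1.0] * 120)
    print("tree steps for P = 120:", steps, "vs. serial:", 120 - 1)
```

Under this simplified step-counting model (one combining step per tree level, with latency, bandwidth, and thread scheduling ignored), P = 120 needs 2 × ceil(log2 120) = 14 sequential steps for the tree versus 119 for a reduction that funnels every partial through one processor. The measured 2× advantage reported above reflects real costs that this model does not capture. For scale, the reported absolute speedup of 90 on 120 processors corresponds to a parallel efficiency of 90/120 = 0.75.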

Citations (references of this paper; the leading number is each reference's citation count in CiteSeer)

175 Compiling Global Name-Space Parallel Loops for Distributed Execution – Koelbel, Mehrotra - 1991
114 Run-time scheduling and execution of loops on message passing machines – Saltz, Crowley, et al. - 1991
99 The design and implementation of a parallel unstructured Euler solver using software primitives – Das, Mavriplis, et al. - 1994
76 A Cellular Computer to Implement the Kalman Filter Algorithm – Cannon - 1969
76 Numerical Linear Algebra for High-Performance Computers – Dongarra, Duff, et al. - 1998
66 Distributed Memory Compiler Design for Sparse Problems – Saltz, Wu, et al. - 1991
50 Execution time support for adaptive scientific algorithms on distributed memory machines (Concurrency: Practice and Experience) – Berryman, Saltz, et al. - 1991
46 Polling Watchdog: Combining polling and interrupts for efficient message handling – Maquelin, Gao, et al. - 1996
45 A study of the EARTH-MANNA multithreaded system – Hum, Maquelin, et al. - 1996
43 Parallelizing molecular dynamics programs for distributed memory machines – Hwang, Das, et al. - 1995
43 Runtime and language support for compiling adaptive irregular programs (Software: Practice and Experience) – Hwang, Moon, et al. - 1995
39 Building multithreaded architectures with off-the-shelf microprocessors – Hum, Theobald, et al.
38 Runtime Compilation Techniques for Data Partitioning and Communication Schedule Reuse – Ponnusamy, Saltz, et al. - 1993
32 Handling irregular problems with Fortran D - a preliminary report – Hanxleden - 1993
31 A design study of the EARTH multiprocessor – Hum, Maquelin, et al. - 1995
28 Latency hiding in message-passing architectures – Bruening, Giloi, et al.
21 EARTH: An Efficient Architecture for Running Threads – Theobald - 1999
19 Interprocedural compilation of irregular applications for distributed memory machines – Agrawal, Saltz - 1995
17 An efficient hybrid dataflow architecture model – Gao - 1993
12 Modeling the weather with a data flow supercomputer – Dennis, Gao, et al. - 1984
10 Overview of the Threaded-C language – Theobald, Amaral, et al. - 1998
7 Developing a communication intensive application on the EARTH multithreaded architecture – Theobald, Kumar, et al. - 2000
6 A Framework for Sparse Matrix Code Synthesis from High-level Specifications – Ahmed, Mateev, et al. - 2000
2 Definition of the MultiThreaded Architecture (MTA) model – Hum, Theobald, et al. - 1993