  A streaming multi-threaded model (2001) [12 citations — 0 self]

by Eylon Caspi, André DeHon, John Wawrzynek
In Proceedings of the Third Workshop on Media and Stream Processors
http://brass.cs.berkeley.edu/documents/msp3.pdf

Abstract:

We present SCORE (Stream Computations Organized for Reconfigurable Execution), a multi-threaded model that relies on streams to expose thread parallelism and to enable efficient scheduling, low-overhead communication, and scalability. We present work to-date on SCORE for scalable reconfigurable logic, as well as implementation ideas for SCORE for processor architectures. We demonstrate that streams can be exposed as a clean architectural feature that supports forward compatibility to larger, more parallel hardware.

1. OVERVIEW

For the past several decades, the predominant architectural abstraction for programmable computation systems has been the instruction set architecture (ISA). An ISA defines an instruction set and semantics for executing it. A key benefit of the ISA model is that those semantics decouple software from hardware development. A piece of software, written and compiled once, is guaranteed to run on any ISA-compatible device. This guarantee allows hardware to evolve over time, growing larger and faster with each process generation. The existing software base is preserved, and its performance automatically improves with each hardware generation.

The ISA abstraction has been instrumental in protecting our investment in software and allowing it to ride Moore’s law to better performance. Two shining examples are the IBM 360 and Intel x86 architectures, which have survived commercially for decades. The latter, in its 23 years of existence, has seen clock speeds increase nearly 400x and transistor counts grow nearly 10,000x.

Increasingly, however, ISA uniprocessors are running out of headroom for performance improvement, due primarily to the increasing costs of extracting and exploiting instruction level parallelism (ILP). Today’s state-of-the

Citations

459 Semantics of a Simple Language for Parallel Programming – Kahn - 1974
318 The Stanford FLASH Multiprocessor – Kuskin, Ofelt, et al. - 1994
164 Dataflow Process Networks – Lee, Parks - 1995
163 The MIT Alewife Machine: Architecture and Performance – Agarwal, Bianchini, et al. - 1995
139 Software Synthesis from Dataflow Graphs – Bhattacharyya, Murthy, et al. - 1996
121 Scheduling Dynamic Dataflow Graphs with Bounded Memory Using the Token Flow Model – Buck - 1993
109 The M-machine multicomputer – Fillo, Keckler, et al. - 1995
98 Data flow supercomputers – Dennis - 1980
56 Two fundamental issues in multiprocessing – Arvind, Iannucci - 1987
49 Ultra-low-power domain-specific multimedia processors – Abnous, Rabaey - 1996
40 Cheops: A reconfigurable data-flow system for video processing – Bove, Watlington - 1995
28 Mosaic C: an experimental fine-grain multicomputer – Seitz - 1992
26 Active Messages: a Mechanism for Integrated Communication and Computation – von Eicken, et al. - 1992
16 The message-driven processor: A multicomputer processing node with efficient mechanisms – Dally, et al. - 1992
8 TAM -- a compiler controlled threaded abstract machine – Culler, Goldstein, et al. - 1993
6 HSRA: High-speed, hierarchical synchronous reconfigurable array – Tsu, Macy, et al. - 1999
5 Stream computations organized for reconfigurable execution (SCORE): Extended Abstract – Caspi, Chu, et al.
4 Efficient, protected message interface in the MIT M-Machine – Lee, Dally, et al. - 1998
3 Overview and status of the Stanford DASH multiprocessor – Lenoski, Laudon, et al. - 1991
3 Analysis of Quasi-Static Scheduling Techniques in a Virtualized Reconfigurable Machine – Markovskiy - 2002
2 MagicEight: An architecture for media processing and an implementation. Thesis proposal – Watlington - 1999