by Eylon Caspi, André DeHon, John Wawrzynek
In Proceedings of the Third Workshop on Media and Stream Processors
Download: http://brass.cs.berkeley.edu/documents/msp3.pdf
Abstract:
We present SCORE (Stream Computations Organized for Reconfigurable Execution), a multi-threaded model that relies on streams to expose thread parallelism and to enable efficient scheduling, low-overhead communication, and scalability. We present work to date on SCORE for scalable reconfigurable logic, as well as implementation ideas for SCORE on processor architectures. We demonstrate that streams can be exposed as a clean architectural feature that supports forward compatibility with larger, more parallel hardware.

1. OVERVIEW

For the past several decades, the predominant architectural abstraction for programmable computation systems has been the instruction set architecture (ISA). An ISA defines an instruction set and the semantics for executing it. A key benefit of the ISA model is that those semantics decouple software development from hardware development. A piece of software, written and compiled once, is guaranteed to run on any ISA-compatible device. This guarantee allows hardware to evolve over time, growing larger and faster with each process generation. The existing software base is preserved, and its performance automatically improves with each hardware generation. The ISA abstraction has been instrumental in protecting our investment in software and allowing it to ride Moore’s law to better performance. Two shining examples are the IBM 360 and Intel x86 architectures, which have survived commercially for decades. The latter, in its 23 years of existence, has seen clock speeds increase nearly 400x and transistor counts grow nearly 10,000x.

Increasingly, however, ISA uniprocessors are running out of headroom for performance improvement, due primarily to the increasing costs of extracting and exploiting instruction-level parallelism (ILP). Today’s state-of-the …
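The abstract describes computation as threads that communicate only through streams. A minimal sketch of that style of computation follows, assuming Python threads as operators and `queue.Queue` instances as bounded FIFO streams, with blocking reads in the manner of Kahn process networks. The names `Stream`, `EOS`, `source`, `scale`, `add`, `sink`, and `run_graph` are hypothetical illustrations, not SCORE's actual interface, language, or runtime.

```python
import queue
import threading

# Hypothetical end-of-stream token; not part of any SCORE interface.
EOS = object()


class Stream:
    """Bounded FIFO linking one producer thread to one consumer thread (illustrative)."""

    def __init__(self, capacity=4):
        # capacity must be >= 1; queue.Queue treats maxsize <= 0 as unbounded.
        self._q = queue.Queue(maxsize=capacity)

    def write(self, token):
        self._q.put(token)  # blocks while the FIFO is full (back-pressure)

    def read(self):
        return self._q.get()  # blocks while the FIFO is empty


def _drain(stream):
    # Consume and discard tokens up to EOS so the upstream producer can finish.
    while stream.read() is not EOS:
        pass


def source(values, out):
    for v in values:
        out.write(v)
    out.write(EOS)


def scale(factor, inp, out):
    while (tok := inp.read()) is not EOS:
        out.write(tok * factor)
    out.write(EOS)


def add(a, b, out):
    # Blocking reads in a fixed order (a, then b) keep the output deterministic.
    while True:
        x = a.read()
        y = b.read()
        if x is EOS or y is EOS:
            break
        out.write(x + y)
    # If one input ended early, drain the other so its producer never blocks forever.
    if x is not EOS:
        _drain(a)
    if y is not EOS:
        _drain(b)
    out.write(EOS)


def sink(inp, results):
    while (tok := inp.read()) is not EOS:
        results.append(tok)


def run_graph(xs, ys, factor=2, capacity=4):
    """Compute factor*x + y elementwise, running each operator as its own thread."""
    a, b, scaled, summed = (Stream(capacity) for _ in range(4))
    results = []
    threads = [
        threading.Thread(target=source, args=(xs, a)),
        threading.Thread(target=source, args=(ys, b)),
        threading.Thread(target=scale, args=(factor, a, scaled)),
        threading.Thread(target=add, args=(scaled, b, summed)),
        threading.Thread(target=sink, args=(summed, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_graph([1, 2, 3], [10, 20, 30]))  # [12, 24, 36]
```

In this sketch each operator blocks only on its own streams, so the result does not depend on thread scheduling or on the buffer `capacity`; changing `capacity` only changes how far producers can run ahead of consumers. This loosely mirrors the abstract's claim that a stream graph can be mapped onto hardware with more or fewer resources without changing its meaning, although SCORE's actual scheduling and virtualization are not modeled here.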