by Jeffrey Kuskin, David Ofelt, Mark Heinrich, John Heinlein, Richard Simoni, Kourosh Gharachorloo, John Chapin, David Nakahira, Joel Baxter, Mark Horowitz, Anoop Gupta, Mendel Rosenblum, John Hennessy
http://www.csl.cornell.edu/~heinrich/papers/FLASH.ps
Abstract:
The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both the hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine's global memory, a port to the interconnection network, an I/O interface, and a custom node controller called MAGIC. The MAGIC chip handles all communication both within the node and among nodes, using hardwired data paths for efficient data movement and a programmable processor optimized for executing protocol operations. The use of the protocol processor makes FLASH very flexible (it can support a variety of different communication mechanisms) and simplifies the design and implementation. This paper presents the overall architecture of FLASH and MAGIC, and discusses the base cache-coherence and message-passing protocols. Latency and occupancy numbers, which are derived from our system-level simulator and our Verilog code, are given for several of the common protocol operations. The paper also describes our software strategy, implementation strategy, and current status. The two architectural techniques for communicating data among processors in a scalable multiprocessor are