Clusters of symmetric multiprocessors (SMPs), connected by commodity system-area networks (SANs) and interfaces, are fast being adopted as platforms for parallel computing. Page-grained shared virtual memory (SVM) is a popular way to support a coherent shared address space programming model on these clusters. Previous research has identified several key bottlenecks in the communication, protocol, and application layers of a software SVM system that are less significant in more mainstream, hardware-coherent multiprocessors. A key question for the communication layer is how much and what kind of hardware support is particularly valuable in improving the performance of such systems. This paper examines a popular form of hardware support, namely automatic hardware propagation of writes to remote memories, discussing new design issues and evaluating performance in the context of emerging clusters. Since much of the performance difference is due to differences in contention effects in various parts of the system, performance is examined through very detailed simulation, utilizing the deep visibility into the simulated system to analyze the causes of observed effects.