by Yuanyuan Zhou, Liviu Iftode, Kai Li
In Proceedings of the Operating Systems Design and Implementation Symposium
http://jazz.snu.ac.kr/~joonwon/dsm/paper/245_PerformanceEvaluationOfTwoHomeBasedLazyReleaseConsistencyProtocolsForSharedVirtualMemorySystems_OSDI96.ps
Abstract:
This paper investigates the performance of shared virtual memory protocols on large-scale multicomputers. Using experiments on a 64-node Paragon, we show that the traditional Lazy Release Consistency (LRC) protocol does not scale well, because of the large number of messages it requires, the large amount of memory it consumes for protocol overhead data, and because of the difficulty of garbage collecting that data. To achieve more scalable performance, we introduce and evaluate two new protocols. The first, Home-based LRC (HLRC), is based on the Automatic Update Release Consistency (AURC) protocol. Like AURC, HLRC maintains a home for each page to which all updates are propagated and from which all copies are derived. Unlike AURC, HLRC requires no specialized hardware support. We find that the use of homes provides substantial improvements in performance and scalability over LRC. Our second protocol, called Overlapped Home-based LRC (OHLRC), takes advantage of the communication processor found on each node of the Paragon to offload some of the protocol overhead of HLRC from the critical path followed by the compute processor. We find that OHLRC provides modest improvements over HLRC. We also apply overlapping to the base LRC protocol, with similar results. Our experiments were done using five of the Splash-2 benchmarks. We report overall execution times, as well as detailed breakdowns of elapsed time, message traffic, and memory use for each of the protocols.
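The home-based idea in the abstract (every update to a page is propagated to its home, and every copy is derived from the home) can be illustrated with a toy, single-process sketch. This is not the authors' implementation: the class names, the byte-level diff against a twin snapshot, and the explicit `release`/`invalidate` calls are all assumptions made for illustration, and no real messaging or page-fault handling is modeled.

```python
# Toy model of a home-based protocol for ONE shared page.
# All names here (Home, Node, release, invalidate) are hypothetical.
PAGE_SIZE = 8


class Home:
    """Holds the master copy of the page."""

    def __init__(self):
        self.page = bytearray(PAGE_SIZE)

    def apply_diff(self, diff):
        # diff is a list of (offset, byte_value) pairs from a writer.
        for offset, value in diff:
            self.page[offset] = value

    def fetch(self):
        # A faulting node receives a full copy of the master page.
        return bytearray(self.page)


class Node:
    def __init__(self, home):
        self.home = home
        self.copy = None  # local cached copy, None means "invalid"
        self.twin = None  # snapshot taken before the first local write

    def _ensure_copy(self):
        if self.copy is None:  # stands in for a page fault
            self.copy = self.home.fetch()

    def read(self, offset):
        self._ensure_copy()
        return self.copy[offset]

    def write(self, offset, value):
        self._ensure_copy()
        if self.twin is None:
            self.twin = bytearray(self.copy)
        self.copy[offset] = value

    def release(self):
        # Send only the bytes that changed since the twin was taken.
        if self.twin is None:
            return []
        diff = [(i, new) for i, (old, new)
                in enumerate(zip(self.twin, self.copy)) if old != new]
        self.home.apply_diff(diff)
        self.twin = None
        return diff

    def invalidate(self):
        # Assumed to be called at an acquire, after any pending release.
        self.copy = None
        self.twin = None


home = Home()
a, b = Node(home), Node(home)
a.write(0, 1)      # two writers touch disjoint bytes of the same page
b.write(1, 2)
a.release()
b.release()
a.invalidate()     # a's next read re-fetches the merged page from the home
merged = (a.read(0), a.read(1))
```

In this sketch the home ends up holding both writers' updates, and any node that re-fetches the page sees the merged result without collecting diffs from each writer individually.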
Citations
847 | Memory coherence in shared virtual memory systems – Li, Hudak - 1989
784 | Myrinet: A Gigabit-per-second Local Area Network – Boden, Cohen, et al. - 1995
705 | SPLASH: Stanford Parallel Applications for Shared Memory – Singh, Weber, et al. - 1992
637 | Memory consistency and event ordering in scalable shared-memory multiprocessors – Gharachorloo, Lenoski, et al. - 1990
530 | Implementation and performance of Munin – Carter, Bennett, et al. - 1991
477 | TreadMarks: Distributed shared memory on standard workstations and operating systems – Keleher, Dwarkadas, et al. - 1994
360 | The Midway distributed shared memory system – Bershad, Zekauskas, et al. - 1993
323 | Tempest and Typhoon: User-Level Shared Memory – Reinhardt, Larus, et al. - 1994
318 | The Stanford FLASH Multiprocessor – Kuskin, Ofelt, et al. - 1994
269 | Virtual memory mapped network interface for the SHRIMP multicomputer – Blumrich, Li, et al. - 1994
199 | High-performance all-software distributed shared memory – Johnson - 1995
133 | Scope consistency: A bridge between release consistency and entry consistency – Iftode, Singh, et al. - 1996
99 | The Relative Importance of Concurrent Writers and Weak Consistency Models – Keleher - 1996
98 | The DASH Prototype: Logic Overhead and Performance – Lenoski, Laudon, et al. - 1993
93 | Improving Release-Consistent Shared Virtual Memory Using Automatic Update – Iftode, Dubnicki, et al. - 1996
78 | Meiko CS-2 interconnect Elan-Elite design – Homewood, McLaren - 1993
75 | Software Versus Hardware Shared-Memory Implementation: a Case Study – Cox, Dwarkadas, et al. - 1994
71 | Hardware and Software Support for Efficient Exception Handling – Thekkath, Levy - 1994
65 | Lazy consistency for software distributed shared memory – Keleher, Cox, et al. - 1992
54 | Understanding Application Performance on Shared Virtual Memory Systems – Iftode, Singh, et al.
45 | Software support for virtual memory-mapped communication – Dubnicki, Felten, et al. - 1996
35 | Hiding Communication Latency and Coherence Overhead in Software DSMs – Bianchini, Kontothanassis, et al. - 1996
33 | Performance Evaluation of a Cluster-Based Multiprocessor Built from ATM Switches and Bus-Based Multiprocessor Servers – Karlsson, Stenstrom - 1996
29 | A comparison of entry consistency and lazy release consistency implementations – Adve, Cox, et al. - 1996
29 | A Distributed Implementation of the Shared Data-Object Model – Bal, Kaashoek, et al. - 1989
29 | Overview of network memory channel for PCI – Gillett, Collins, et al. - 1996
27 | Early Experience with Message-Passing on the SHRIMP Multicomputer – Felten, Alpert, et al. - 1996
27 | The Paragon implementation of the NX message passing interface – Pierce, Regnier - 1994
17 | Routing Chip Set for Intel Paragon Parallel Supercomputer – Traylor, Dunning - 1992
5 | Hardware and Software Support for Efficient Exception Handling – Thekkath, Levy - 1994
1 | The Paragon Implementation of the NX Message Passing Interface – Pierce, Regnier - 1994