by Yuanyuan Zhou, Liviu Iftode, Jaswinder Pal Singh, Kai Li, Brian R. Toonen, Ioannis Schoinas, Mark D. Hill, David A. Wood
In Proceedings of the 6th ACM Symposium on Principles and Practice of Parallel Programming
http://www.cs.princeton.edu/~yzhou/paper/ppopp97.ps
Abstract:
During the past few years, two main approaches have been taken to improve the performance of software shared memory implementations: relaxing consistency models and providing fine-grained access control. Their performance tradeoffs, however, are not well understood. This paper studies these tradeoffs on a platform that provides access control in hardware but runs coherence protocols in software. We compare the performance of three protocols across four coherence granularities, using 12 applications on a 16-node cluster of workstations. Our results show that no single combination of protocol and granularity performs best for all the applications. The combination of a sequentially consistent (SC) protocol and fine granularity works well with 7 of the 12 applications. The combination of a multiple-writer, home-based lazy release consistency (HLRC) protocol and page granularity works well with 8 of the 12 applications. For applications that suffer performance losses in moving to coarser granularity under sequential consistency, the performance can usually be regained quite effectively using relaxed protocols, particularly HLRC. We also find that the HLRC protocol performs substantially better than a single-writer lazy release consistent (SW-LRC) protocol at coarse granularity for many irregular applications. For our applications and platform, when we use the original versions of the applications ported directly from hardware-coherent shared memory, we find that the SC protocol with 256-byte granularity performs best on average. However, when the best versions of the applications are compared, the balance shifts in favor of HLRC at page granularity.