by R. W. Ford, A. P. Nisbet
http://www.cs.man.ac.uk/cnc/staff/andy/IPPS97.ps
Abstract:
This paper presents a new scheme to replace coarse-grain barriers with fine-grain synchronisation in virtual shared memory (VSM) systems. Traditionally, shared memory programming models separate data access from synchronisation. In our scheme, synchronisation between writes and their subsequent reads, and between reads and their following writes, is achieved through the coherence tags associated with each coherence unit. All potential latency hiding for the data is utilised, as an asynchronous update is sent as soon as new data is available and an asynchronous exclusive message is sent as soon as a copy of the data is no longer required. The associated coherence traffic is also removed. Performance improvements are presented for two codes, representing the core communication found in Shallow (a well-known numerical weather prediction benchmark) and CG (from the NAS Parallel Benchmarks). These are run on a 30-processor EDS prototype with sequentially consistent, invalidation-based VSM. Although the scheme is implemented in software, there is much potential to support these optimisations in hardware. Currently these optimisations are performed by the programmer, but there is also much scope for automating this process within a compiler.
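The idea of synchronising through the coherence tag rather than through a barrier can be illustrated with a toy, single-machine sketch. It is not the authors' implementation: the class `CoherenceUnit`, the methods `write_and_update` and `read_and_release`, the helper `run`, and the tag values `"exclusive"` and `"valid"` are all hypothetical names chosen for illustration, and Python threads stand in for processors. The writer proceeds only while it holds the unit exclusively, and the reader proceeds only once a valid copy has arrived, so neither side needs a separate barrier.

```python
import threading


class CoherenceUnit:
    """Toy model of one coherence unit whose tag doubles as a sync flag."""

    def __init__(self):
        self._cond = threading.Condition()
        # Assumed initial state: the writer owns the unit, no valid reader copy.
        self._tag = "exclusive"
        self._value = None

    def write_and_update(self, value):
        """Producer side: wait for exclusive ownership, write, push the update."""
        with self._cond:
            self._cond.wait_for(lambda: self._tag == "exclusive")
            self._value = value
            # Stands in for the asynchronous update message sent to the reader.
            self._tag = "valid"
            self._cond.notify_all()

    def read_and_release(self):
        """Consumer side: wait for a valid copy, read it, hand ownership back."""
        with self._cond:
            self._cond.wait_for(lambda: self._tag == "valid")
            value = self._value
            # Stands in for the asynchronous exclusive message sent to the writer.
            self._tag = "exclusive"
            self._cond.notify_all()
            return value


def run(n_iters=5):
    """Run one producer and one consumer over a single unit, with no barrier."""
    unit = CoherenceUnit()
    seen = []

    def producer():
        for i in range(n_iters):
            unit.write_and_update(i * i)

    def consumer():
        for _ in range(n_iters):
            seen.append(unit.read_and_release())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Calling `run(5)` returns the values in the order they were written, because each write waits for the previous read to release the unit and each read waits for the matching write. A real VSM system would track tags per page or per cache line and exchange messages between nodes; the condition variable here only models the ordering those tags enforce.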