by Sarah A. M. Talbot, Paul H. J. Kelly
http://www.doc.ic.ac.uk/~samt/improving_tech.letter.ps.gz
Abstract:
In simple cache coherency protocols, serialisation can occur when many simultaneous accesses are made to data held in a single node, and when many accesses involve a common "home" node controller. Various designs ameliorate this with a hierarchical or clustered structure. In this paper we investigate the idea of routing requests via an intermediate "proxy" node, where combining is used to reduce contention. We present a hashing-based proxy placement scheme, and evaluate a "reactive" approach which invokes proxying only when contention occurs. Simulation results using various benchmarks show that the hotspot contention which occurs in pathological examples can be dramatically reduced, while performance on well-behaved applications is essentially unaffected.
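The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the machine size, the cluster layout, the hash function, and the names `home_node`, `proxy_node`, `requests_reaching_home`, and `route` are all assumptions made for illustration. The sketch assumes that nodes are grouped into fixed-size clusters, that a proxy for a block is chosen within the requester's cluster by hashing the block address, and that a proxy combines all reads it receives for a block into a single request to the home node.

```python
NUM_NODES = 16     # assumed machine size
CLUSTER_SIZE = 4   # assumed number of nodes sharing one proxy per block


def home_node(block: int) -> int:
    """Assumed home placement: blocks interleaved across nodes."""
    return block % NUM_NODES


def proxy_node(requester: int, block: int) -> int:
    """Hypothetical hashing-based placement: one proxy per (cluster, block)."""
    cluster_base = (requester // CLUSTER_SIZE) * CLUSTER_SIZE
    hashed = (block * 2654435761) % 2**32      # simple multiplicative hash
    return cluster_base + (hashed >> 16) % CLUSTER_SIZE


def requests_reaching_home(requesters, block: int, use_proxy: bool) -> int:
    """Count read requests for `block` that arrive at its home node."""
    requesters = list(requesters)
    if not use_proxy:
        return len(requesters)                 # every request hits the home
    # Combining: each proxy forwards one request per block,
    # however many requesters asked it for that block.
    return len({proxy_node(r, block) for r in requesters})


def route(requester: int, block: int, home_bounced: bool) -> int:
    """Reactive policy: go direct unless the home signalled contention."""
    return proxy_node(requester, block) if home_bounced else home_node(block)
```

Under these assumptions, a hotspot read of one block by all 16 nodes sends 16 requests to the home node without proxies but only 4 (one per cluster) with them. With the reactive `route`, a well-behaved application that never sees contention never takes the extra hop.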