by Richard N. Zucker, Jean-loup Baer
In Proceedings of the 19th Annual International Symposium on Computer Architecture
http://web.cps.msu.edu/~wrightr7/cps822/zucker.ps
Abstract:
Recent advances in technology are such that the speed of processors is increasing faster than memory latency is decreasing. Therefore the relative cost of a cache miss is becoming more important. However, the full cost of a cache miss need not be paid every time in a multiprocessor. The frequency with which the processor must stall on a cache miss can be reduced by using a relaxed model of memory consistency. In this paper, we present the results of instruction-level simulation studies on the relative performance benefits of using different models of memory consistency. Our vehicle of study is a shared-memory multiprocessor with processors and associated write-back caches connected to global memory modules via an Omega network. The benefits of the relaxed models, and their increasing hardware complexity, are assessed with varying cache size, line size, and number of processors. We find that substantial benefits can be accrued by using relaxed models but the magnitudes of the benefits depend on the architecture being modeled, the benchmarks, and how the code is scheduled. We did not find any major difference in levels of improvement among the various relaxed models.