A Correctness Condition for High-Performance Multiprocessors (Extended Abstract)
, 1992
Cited by 65 (7 self)
Hybrid consistency, a new consistency condition for shared memory multiprocessors, attempts to capture the guarantees provided by contemporary high-performance architectures. It combines the expressiveness of strong consistency conditions (e.g., sequential consistency, linearizability) and the efficiency of weak consistency conditions (e.g., Pipelined RAM, causal memory). Memory access operations are classified as either strong or weak. A global ordering of strong operations at different processes is guaranteed, but there is very little guarantee on the ordering of weak operations at different processes, except for what is implied by their interleaving with the strong operations. A formal and precise definition of this condition is given. An efficient implementation of hybrid consistency on distributed memory machines is presented. In this implementation, weak operations are executed instantaneously, while the response time for strong operations is linear in the network delay. (It is proven that this is within a constant factor of the optimal time bounds.) To motivate hybrid consistency it is shown that weakly consistent memories do not support non-cooperative (in particular, non-centralized) algorithms for mutual exclusion.
Cache Coherence for Shared Memory Multiprocessors Based on Virtual Memory Support
, 1992
Cited by 24 (3 self)
This paper presents a software cache coherence scheme that uses virtual memory (VM) support to maintain cache coherency for shared memory multiprocessors and requires no special hardware to do so. Traditional VM translation hardware in each processor is used to detect memory access attempts that would violate cache coherence and system software is used to enforce coherence. The implementation of this class of coherence schemes is extremely economical: it requires neither special multiprocessor hardware nor compiler support, and easily incorporates different consistency models. We evaluated two consistency models for the VM-based approach: sequential consistency and lazy release consistency. The VM-based schemes are compared with a bus based snoopy caching architecture, and our trace-driven simulation results show that the VM-based cache coherence schemes are practical for small-scale, shared memory multiprocessors. Keywords: shared memory, multiprocessors, cache coherence, memory manag...
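As a rough illustration of how page-level access permissions can drive a software coherence protocol, the following Python sketch simulates a single-writer, multiple-reader policy. It is not the paper's implementation: the class `VMCoherence`, the `Perm` states, and the fault counter are hypothetical names introduced here, and real systems would rely on the processor's VM translation hardware and a fault handler rather than explicit method calls.

```python
from enum import Enum


class Perm(Enum):
    """Hypothetical per-processor page permission (an assumption of this sketch)."""
    NONE = 0
    READ = 1
    WRITE = 2


class VMCoherence:
    """Toy model: an access without sufficient permission counts as a page fault,
    and the software handler restores a single-writer/multiple-reader invariant."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.perms = {}    # page -> list of Perm, one entry per processor
        self.faults = 0    # number of simulated protection faults

    def _page(self, page):
        return self.perms.setdefault(page, [Perm.NONE] * self.num_procs)

    def read(self, proc, page):
        perms = self._page(page)
        if perms[proc] is Perm.NONE:
            self.faults += 1
            # Any current writer is downgraded to read-only before sharing.
            for p in range(self.num_procs):
                if perms[p] is Perm.WRITE:
                    perms[p] = Perm.READ
            perms[proc] = Perm.READ

    def write(self, proc, page):
        perms = self._page(page)
        if perms[proc] is not Perm.WRITE:
            self.faults += 1
            # All other copies are invalidated before the write proceeds.
            for p in range(self.num_procs):
                if p != proc:
                    perms[p] = Perm.NONE
            perms[proc] = Perm.WRITE
```

Repeated writes by the same processor to a page it already owns incur no further faults in this model, which is the property that makes page-granularity detection cheap when sharing is infrequent.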
Automatic Software Cache Coherence through Vectorization
- In Proceedings of the 1992 International Conference on Supercomputing
, 1992
Cited by 17 (1 self)
Access latency in large-scale shared-memory multiprocessors is a concern since most (if not all) memory is one or more hops away through an interconnection network. Providing processors with one or more levels of cache is an accepted way to reduce the average access latency; however, in a multiprocessor, cached values must be kept coherent for the multiprocessor to support the abstraction of a shared global memory. There is no generally accepted hardware solution to provide cache coherence for large-scale shared-memory multiprocessors. Software coherence strategies offer scalability with current hardware. In this paper we examine a compiler-based software strategy for maintaining cache coherence that relies on dependence analysis and a vectorization algorithm to insert cache control directives. Experiments on the BBN TC2000 for a pair of numerical problems show that the run-time cost of coherence using our strategy is less than that for previously proposed compiler-based software methods and suggest that it should compare favorably with proposed hardware schemes.
Design and Analysis of a Scalable Cache Coherence Scheme based on Clocks and Timestamps
, 1992
Cited by 15 (0 self)
In this paper, we restrict ourselves to a study of caching of shared variables. The presence of multiple private caches introduces the well-known cache coherence problem [7]. Hardware-based protocols to solve the cache coherence problem are well understood in a shared-bus environment (e.g., [17, 22, 32, 37]). However, these solutions cannot be extended to the dance-hall multiprocessors since they make use of the instantaneous broadcast and "snoopy" mechanisms provided by the shared bus. Software-assisted [10, 25, 27, 33, 38, 40] and directory-based [1, 4, 7, 36, 41] schemes are usually advocated in such an environment. In this paper, we propose a software-assisted cache coherence scheme which overcomes some of the inefficiencies of previous approaches by using a combination of a compile-time marking of references and a hardware-based local incoherence detection scheme. We also give a performance evaluation of our proposed scheme. In Section 2, we give the notation used throughout the paper. Section 3 reviews previous software-assisted approaches to enforcing cache coherence. In Section 4, a complete description of our approach is given. A correctness proof of our proposed scheme is given elsewhere [29] and is omitted here. Section 5 gives a quantitative comparison of our scheme with previous approaches. Section 6 provides some concluding remarks. 2 Definitions Programs written for shared-memory multiprocessors may use explicit parallel constructs or may be conventional sequential programs transformed into equivalent parallel ones by a restructuring compiler or a preprocessor like Parafrase [24, 39], PFC [3] or PTRAN [2]. The parallelism is constrained by data dependences: flow-dependence, anti-dependence, and
CICO: A Practical Shared-Memory Programming Performance Model
- Workshop on Portability and Performance for Parallel Processing
, 1993
Cited by 14 (1 self)
A programming performance model provides a programmer with feedback on the cost of program operations and is a necessary basis to write efficient programs. Many shared-memory performance models do not accurately capture the cost of interprocessor communication caused by non-local memory references, particularly in computers with caches. This paper describes a simple and practical programming performance model--called check-in, check-out (CICO)--for cache-coherent, shared-memory parallel computers. CICO consists of two components. The first is a collection of annotations that a programmer adds to a program to elucidate the communication arising from shared-memory references. The second is a model that calculates the communication cost of these annotations. An annotation's cost models the cost of the memory references that it summarizes and serves as a metric to compare alternative implementations. Several examples demonstrate that CICO accurately predicts cache misses and identifies changes that improve program performance.
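To make the idea of annotation-based cost accounting concrete, here is a minimal Python sketch in which each check-out of a block not currently held is charged a fixed communication cost. The names `CicoCostModel`, `check_out`, `check_in`, and `loop_cost`, as well as the unit cost, are assumptions made for illustration; the abstract does not specify the actual annotation set or cost function.

```python
class CicoCostModel:
    """Toy cost model: a check-out of a block the processor does not hold is
    charged `checkout_cost`; a check-in releases the block at no charge.
    Both the API and the charging rule are assumptions of this sketch."""

    def __init__(self, checkout_cost=1):
        self.checkout_cost = checkout_cost
        self.held = set()   # (processor, block) pairs currently checked out
        self.cost = 0       # accumulated communication cost

    def check_out(self, proc, block):
        if (proc, block) not in self.held:
            self.held.add((proc, block))
            self.cost += self.checkout_cost

    def check_in(self, proc, block):
        self.held.discard((proc, block))


def loop_cost(iterations, check_in_each_iteration):
    """Compare two annotation placements for a loop that touches one block."""
    model = CicoCostModel()
    for _ in range(iterations):
        model.check_out(0, "acc")
        if check_in_each_iteration:
            model.check_in(0, "acc")
    return model.cost
```

Under these assumptions, `loop_cost(4, True)` charges four check-outs while `loop_cost(4, False)` charges one, which is the kind of comparison between alternative implementations that the abstract describes.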
Concurrency Control in Asynchronous Computations
, 1993
Cited by 14 (2 self)
When independently executing processes share data, some form of concurrency control is needed to enforce the atomicity and sequencing constraints imposed by the program. We believe that concurrency control is hard largely because existing architectural support is inadequate. We define a new class of interconnection networks called isotach networks and explore isotach-based concurrency control by describing techniques that use the isotach network to achieve causal message delivery, atomicity, sequential consistency, and cache coherence. We show processes can pipeline their accesses to shared data in an isotach system without sacrificing sequential consistency. We define the isochron, a multicast with strong ordering properties implemented on an isotach network, and describe techniques by which processes can use isochrons to execute atomic actions without obtaining locks or other exclusive access rights. We describe compatible techniques for enforcing data dependences and show that the i...
A Compiler-Directed Cache Coherence Scheme with Improved Intertask Locality
, 1994
Cited by 9 (3 self)
In this paper, we introduce a compiler-directed coherence scheme which can exploit most of the temporal and spatial locality across task boundaries. It requires only an extended tag field per cache word, one modified memory access instruction, and a counter called the epoch counter in each processor. By using the epoch counter as a system-wide version number, the scheme simplifies the cache hardware of previous version control [5] or timestamp-based schemes [12], but still exploits most of the temporal and spatial locality across task boundaries. We present a compiler algorithm to generate the appropriate memory access instructions for the proposed scheme. The algorithm is based on a data flow analysis technique. It identifies potential stale references by examining memory reference patterns in a source program. 1 Introduction Reducing memory latency is critical to the performance of large-scale parallel systems. Due to the temporal and spatial locality of memory reference patter...
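The interplay of a per-word tag, an epoch counter, and a modified read instruction can be sketched as follows. This is a simplified model under stated assumptions, not the scheme from the paper: the hit rule (`tag >= epoch - offset`), the compiler-supplied `offset` parameter, the write-through shared `memory`, and all class and method names are hypothetical.

```python
class EpochCache:
    """Toy per-processor cache whose words carry the epoch in which they were
    last fetched. The hit rule used by time_read is an assumption of this sketch."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value (assumed up to date)
        self.epoch = 0         # local copy of the epoch counter
        self.lines = {}        # address -> (value, tag)
        self.misses = 0

    def next_epoch(self):
        """Advance the epoch counter at a task boundary."""
        self.epoch += 1

    def _fill(self, addr):
        self.misses += 1
        value = self.memory[addr]
        self.lines[addr] = (value, self.epoch)
        return value

    def read(self, addr):
        """Ordinary read: any cached copy is used, even a potentially stale one."""
        if addr in self.lines:
            return self.lines[addr][0]
        return self._fill(addr)

    def time_read(self, addr, offset):
        """Modified read for references the compiler marks as potentially stale:
        hit only if the word was fetched within the last `offset` epochs."""
        entry = self.lines.get(addr)
        if entry is not None and entry[1] >= self.epoch - offset:
            return entry[0]
        return self._fill(addr)
```

In this model a cached word survives a task boundary whenever the compiler can supply a large enough `offset`, which is one way the locality across task boundaries mentioned in the abstract could be preserved.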
Hardware And Compiler Support For Cache Coherence In Large-Scale Shared-Memory Multiprocessors
, 1996
Cited by 8 (4 self)
compiler can detect potentially stale references and what kind of performance can be obtained using a real compiler. Also, most of the compiler-directed coherence schemes proposed to date have not addressed the real cost of the required hardware support. For example, many of the schemes require expensive hardware support and assume a cache organization with single-word cache lines and a word-addressable architecture. Also, the issues of synchronization, such as lock variables and critical sections, have been addressed rarely. This dissertation addresses these hardware and compiler implementation issues and investigates the feasibility and performance of the compiler-directed cache coherence approach. We propose a new compiler-directed scheme that can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors. The scheme can be adapted to various cache organizations, including multi-word cache lines and byte-addressable architectures. Several system related is
Using Virtual Synchrony to Develop Efficient Fault Tolerant Distributed Shared Memories
, 1995
Cited by 7 (1 self)
This paper shows how to define consistency conditions for distributed shared memories in virtually synchronous environments. Such definitions allow the development of fault tolerant implementations of distributed shared memories, in which during normal execution, operations can be performed very efficiently, and only those operations which take place during a configuration change must be delayed. Three well-known consistency conditions, namely, linearizability, sequential consistency, and causal memory, are redefined for virtually synchronous environments. It is then shown how to provide efficient fault tolerant implementations for these definitions. This work was supported by ARPA/ONR grant N00014-92-J-1866. 1 Introduction Distributed shared memory is an important communication paradigm for making massively parallel computers usable, and for the successful treatment of networks of workstations as a parallel machine. Distributed shared memory is a convenient model to work with, it is a nat...
An Evaluation of a Compiler Optimization for Improving the Performance of a Coherence Directory
- Proceedings of the International Conference on Supercomputing
, 1994
Cited by 3 (1 self)
Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache coherence in large-scale shared-memory multiprocessors, but both of these approaches have significant limitations. We examine the potential performance improvement of a new software-hardware controlled cache coherence mechanism [18]. This approach augments the run-time information available to a directory-based coherence mechanism with compile-time analysis that statically identifies write references that cannot cause coherence problems and writes that should be written through to memory. These references are marked as not needing to send invalidation messages, thereby reducing the network traffic produced by the directory while maintaining cache consistency. For those memory references that are ambiguous, due to conditional branches, or due to the need for complex data flow analysis, for instance, the compiler conservatively marks the references and relies on the hardware directory to ensu...
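The traffic saving from compiler-marked writes can be illustrated with a small Python model of a directory that counts invalidation messages. This is a sketch under assumptions, not the mechanism evaluated in the paper: the `Directory` class, the `needs_invalidation` flag, and the one-message-per-sharer accounting are all hypothetical.

```python
class Directory:
    """Toy coherence directory that counts invalidation messages.
    The compiler marking is modeled by the `needs_invalidation` flag."""

    def __init__(self):
        self.sharers = {}        # block -> set of processor ids holding a copy
        self.invalidations = 0   # invalidation messages sent so far

    def read(self, proc, block):
        self.sharers.setdefault(block, set()).add(proc)

    def write(self, proc, block, needs_invalidation=True):
        others = self.sharers.setdefault(block, set()) - {proc}
        if needs_invalidation:
            # Conservative path: one message per other sharer.
            self.invalidations += len(others)
            self.sharers[block] = {proc}
        else:
            # The compiler has shown this write cannot cause a coherence
            # problem, so no messages are sent and the sharers are untouched.
            self.sharers[block].add(proc)
```

Writes that the compiler cannot analyze would keep the default `needs_invalidation=True`, matching the conservative fallback to the hardware directory described in the abstract.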

