49 citations found.
J. Archibald and J. Baer. An Economical Solution to the Cache Coherence Problem. pages 355--362, June 1984.

This paper is cited in the following contexts:

First 50 documents

Using Compiler Assistance to Reduce the Network Traffic - Requirements Of..

....23, 25, 30]. With extra cost, multiprocessors may employ dedicated hardware for cache coherence maintenance by allowing processors to communicate with each other about the data reference status, and to invalidate or update cached copies. Snoopy buses [12, 15, 29, 35, 38] and memory directories [2, 4, 5, 14, 37] are two prominent hardware coherence mechanisms. With run time interprocessor dataflow information, the coherence hardware never over-invalidates the cached data like the software schemes, and therefore generally outperforms the software schemes. Moreover, since the hardware schemes do not ....

....The last subsection discusses related work. 2.1 Required Hardware Support In addition to the five instructions i write, n write, t write, c read, and m read, a directory is required to monitor which processors have valid copies of each block. There are several variations of directories [2, 4, 5, 14], any of which can be used with this compiler optimization. This study uses a directory structure similar to Censier and Feautrier's [4] in which each memory module is associated with its own directory. Each directory entry consists of a P-bit vector and a dirty bit, where P is the number of ....

J. K. Archibald and J. Baer. An economical solution to the cache coherence problem. International Symposium on Computer Architecture, pages 355--362, 1984.


SUDS: Automatic Parallelization for Raw Processors - Frank (2003)

.... it is relatively straightforward to add a caching system on top of SUDS, and in fact a software based L1 cache was implemented on top of an earlier version of SUDS [128]. The basic idea behind adding caching on top of SUDS would be to implement a standard directory based cache coherence scheme [21, 10, 2]. The key to a directory based cache coherence scheme is that the directory is guaranteed to see all the traffic to a particular memory location, and in the same global order that is observed in all other parts of the system. Thus, the directory controller can simply forward the list of requests ....

....out the similarity between cache coherence schemes and coherence control in transaction processing. The Liquid system used a bus based protocol similar to a snooping cache coherence protocol [47]. SUDS uses a scalable protocol that is more similar to a directory based cache coherence protocol [21, 10, 2] with only a single pointer per entry, sometimes referred to as a Dir1B protocol. The ParaTran system for parallelizing mostly functional code [116] was another early proposal that relied on speculation. ParaTran was implemented in software on a shared memory multiprocessor. The protocols were ....

James Archibald and Jean-Loup Baer. An economical solution to the cache coherence problem. In 11th International Symposium on Computer Architecture, pages 355--362, Ann Arbor, MI, June 1984.


Algorithms for Scalable Synchronization on Shared-Memory.. - Mellor-Crummey, Scott (1990)   (25 citations)

....P, but its critical path is longer by a factor of about 1.5. Our tree based barrier with wakeup flag should be the fastest algorithm on large scale multiprocessors that use broadcast to maintain cache coherence (either in snoopy cache protocols [14] or in directory based protocols with broadcast [7]). It requires only O(P) updates to shared variables in order to tally arrivals, compared to O(P log P) for the dissemination barrier. Its updates are simple writes, which are cheaper than the read modify write operations of a centralized counter-based barrier. Its space needs are lower than ....

J. Archibald and J.-L. Baer. An economical solution to the cache coherence problem. In Proc. of International Symposium on Computer Architecture, pages 355-362, 1984.


Efficient Integration of Compiler-directed Cache Coherence And.. - Lim, Yew (2000)   (1 citation)

....manner, the processors will not access potentially stale data in their caches. From the compiler optimization perspective, the performance of the CBP scheme will give an indication of the impact of applying stale reference analysis alone. Finally, the HWD scheme uses a full map hardware directory [1] with a standard three-state (invalid, read shared, write exclusive) invalidation based coherence protocol. The directories are distributed across the nodes and are organized as pointer caches [14] to reduce storage. We augment the HWD scheme with software controlled data prefetching so as to ....

J. Archibald and J.-L. Baer. An economical solution to the cache coherence problem. In Proceedings of the 11th International Symposium on Computer Architecture, pages 355--362, June 1984.


WaveScalar - Swanson, Michelson, Oskin (2003)

....Microarchitecture: Chief among the unexplored microarchitectural issues are data caching and deadlock avoidance. Our current architecture distributes the data cache into several small caches throughout the architecture. Our plan is to apply an existing directory based cache coherence protocol [67, 68] to these on chip cache. We are building a model of this protocol and cache hierarchy into our simulation framework, to explore its effect on performance. Additionally, the current WaveCache architecture uses a dynamically routed switched network, similar to [5]. Without the ability to drop ....

J. Archibald and J. L. Baer, "An economical solution to the cache coherence problem," in The 11th Annual International Symposium on Computer Architecture, pp. 355--362, 1984.


Algorithms for Scalable Synchronization on Shared-Memory.. - Mellor-Crummey, Scott (1991)   (25 citations)

....one application, if network load proves to be a problem. Our tree based barrier with wakeup flag should be the fastest algorithm on large scale multiprocessors that use broadcast to maintain cache coherence (either in snoopy cache protocols [15] or in directory based protocols with broadcast [7]). It requires only O(P) updates to shared variables in order to tally arrivals, compared to O(P log P) for the dissemination barrier. Its updates are simple writes, which are cheaper than the read modify write operations of a centralized counter based barrier. Note, however, that the ....

J. Archibald and J.-L. Baer. An economical solution to the cache coherence problem. In Proceedings of the International Symposium on Computer Architecture, pages 355-362, 1984.


Bus And Cache Memory Organizations For Multiprocessors - Winsor (1989)   (2 citations)

....of processors is that imposed by the total bus and memory bandwidth. They are called snooping cache schemes [KEWPS85] since each cache must monitor addresses on the system bus, checking each reference for a possible cache hit. They have also been referred to as two bit directory schemes [AB84], since each line in the cache usually has two bits associated with it to specify one of four states for the data in the line. [Goodm83] describes the use of a cache memory to reduce bus traffic and presents a description of the write once cache policy, a simple snooping cache scheme. The ....

....tie up cache cycles that might be used by the processor on that cache, and the probability of the line being used may be low. [RS84] is concerned primarily with formal correctness proofs of these schemes and does not consider the performance implications of practical implementations of them. [AB84] discusses various solutions to the cache consistency problem, including broadcast, global directory, and snooping approaches. Emphasis is on a snooping approach in which the states are called Absent, Present1, Present*, and PresentM. This scheme is generally similar to that of [PP84] except that ....

JAMES ARCHIBALD AND JEAN-LOUP BAER. "An Economical Solution to the Cache Coherence Problem". The 11th Annual International Symposium on Computer Architecture Conference Proceedings, Ann Arbor, Michigan, IEEE Computer Society Press, June 5--7, 1984, pages 355--362.


A Software Framework for Supporting General Purpose.. - Frank, Lee, Amarasinghe (2001)   (3 citations)

....out the similarity between cache coherence schemes and coherence control in transaction processing. The Liquid system used a bus based protocol similar to a snooping cache coherence protocol [21]. SUDS uses a scalable protocol that is more similar to a directory based cache coherence protocol [9, 2, 1] with only a single pointer per entry, sometimes referred to as a Dir1B protocol. The ParaTran system for parallelizing mostly functional code [49] was another early proposal that relied on speculation. ParaTran was implemented in software on a shared memory multiprocessor. The protocols were ....

J. Archibald and J.-L. Baer. An Economical Solution to the Cache Coherence Problem. In 11th International Symposium on Computer Architecture, pages 355--362, Ann Arbor, MI, June 1984.


A Communication Architecture for Multiprocessor Networks - Nowatzyk (1989)   (4 citations)

....is prone to become a bottleneck. Therefore all bus based shared memory multiprocessors employ caches in each processor to reduce the main memory bandwidth demand. Since the bus traffic is visible by all processors, cache controllers can exploit this broadcast property to maintain coherence [69, 70, 5, 108]. Synchronization operations benefit from the simplicity of a single bus. The test-and-test-and-set instruction uses the snooping cache coherency mechanism to avoid the memory traffic associated with spin locks [122]. However, in situations with many waiting processors, performance degrades as ....

James Archibald, Jean-Loup Baer. An Economical Solution to the Cache Coherence Problem. Proceedings of the 11th International Symposium on Computer Architecture, SIGARCH Newsletter; IEEE 12(3):355-362, June 1984.


Toward The Design Of Large-Scale, Shared-Memory Multiprocessors - Scott (1992)   (3 citations)

....lack the property that all communications are broadcast, and thus traditional snooping protocols are not feasible. Directory based protocols rely instead on coherence information stored in special purpose directories associated with main memory. While broadcast invalidations can still be used [Arch84], most proposed designs maintain records of cached data, and selectively invalidate cached lines when they are modified. The former approach doesn't scale because of bandwidth constraints, and the latter poses difficulties because of the possibly very large amount of storage necessary to keep ....

Archibald, J. and J. -L. Baer, An Economical Solution to the Cache Coherence Problem, Proc. 11th Annual International Symposium on Computer Architecture, June 1984, 355-362.


Efficient Integration of Compiler-directed Cache Coherence and.. - Lim, Yew (2000)   (1 citation)

....potentially stale references. Then, it uses bypass cache fetch operations to bypass the cache and directly access up to date data in the main memory. In this manner, the processors will not access potentially stale data in their caches. Finally, the HWD scheme uses a full map hardware directory [1] with a standard three state (invalid, read shared, write exclusive) invalidation based coherence protocol. The directories are distributed across the nodes and are organized as pointer caches to reduce storage. We augment the HWD scheme with software controlled data prefetching so as to provide a ....

J. Archibald and J.-L. Baer. An economical solution to the cache coherence problem. In Proc. of the 11th Intl. Symp. on Computer Architecture, pages 355--362, June 1984.


A Distributed Directory Based Cache Coherence Scheme - Gupta (1994)

....in non bus based systems since directory based schemes are significantly more complex than snoopy schemes. Furthermore, in a bus based system, the performance of directory based schemes is not as good as that of the snoopy schemes. Many people have proposed a variety of directory based schemes [1] [2] [5] [11] [23] [29] [31]. The schemes differ in both the protocol and the hardware used. The first directory scheme was proposed by C. K. Tang in 1976 [31]. Soon after that, Censier and Feautrier [5] presented the Full Vector Scheme in 1978 which has become the most widely used directory scheme. ....

....performance. In this chapter, we will study one of the first coherency protocols called the Full Vector Directory scheme [5] since it is one of the most basic and highest performing protocols. Over the years, many new directory based protocols have been proposed. The Two bit Directory Scheme [2], Limited Pointer Directory Scheme [1], Sectored Directory Scheme [23], Sparse Directory Scheme [11], Stenstrom's Scheme [29], and Scalable Cache Coherent Interface [13] are a few examples of some of the newer schemes. However, except for Stenstrom's Scheme and the Scalable Cache Coherence ....

J. Archibald and J. L. Baer. An economical solution to the cache coherence problem. In International Symposium on Computer Architecture, pages 124--131, 1984.


Compiling Techniques for Improving Decoupled Virtual Shared Memory.. - Zhu

....or software mechanism. Several snoopy protocols [3, 61, 46, 36, 72, 65, 75] for systems with a broadcast medium such as a bus interconnection network have been proposed. For more scalable multiprocessors with general interconnection network between processors, directory based protocols [2, 4, 14, 44, 73, 86] and compiler assisted software protocols [20, 51, 57, 79] have been suggested. Recently, dynamically tagged directory protocols [15, 41, 55, 56] have evolved from previous directory based schemes. Snoopy Protocols Snoopy protocols are also called bus based protocols. All processors in the ....

....the entry for the block, invalidation or update messages are sent to the processors indicated by the bits of the directory entry. A full map directory is not scalable because its directory size is proportional to the number of processors in the system. On the other hand, the broadcast directory [4] maintains only two bits for each block in memory. These bits are used to indicate that no cache has a copy of the block, that one or more caches have a shared read only copy of the block, or that a single processor has a modified copy of the block. When a processor requests exclusive access, ....

J. Archibald and J. Baer. An economical solution to the cache coherence problem. In Proceedings of the 11th International Symposium on Computer Architecture, pages 355--362, June 1984.


A Decentralized Hierarchical Cache-Consistency Scheme For.. - Farkas (1991)

....bus systems. To date, however, most of the work has focused on the development and implementation of snoopy protocols for single bus systems. The single bus protocols proposed differ mainly in the cache consistency policy used. Write through with invalidating caches is employed by the VAX 11 780 [Archibald 1984], the Sequent Balance 8000 [Thakkar 1988] and the Encore Multimax [Dubois 1988]. In an attempt to limit the bus traffic arising from unnecessary invalidations, the SPUR multiprocessor [Hill 1986] and the Symmetry Multiprocessor [Lovett 1988] use write once with invalidating caches. Finally, the ....

Archibald, J. and Baer, J.-L. (1984). An Economical Solution to the Cache Coherence Problem. Proc. of the 11th Annual International Symposium on Computer Architecture, pages 355--362.


Affinity Scheduling of Unbalanced Workloads - Saskatoon (1993)

....If shared data is cacheable, however, this sharing can result in several copies of a shared block of data in one or more caches at the same time. To maintain a coherent view of the memory, these copies must be consistent. This is the cache coherence problem. There exist both hardware [8] and software [13] solutions to the cache coherence problem. The two main categories of hardware based solutions to the cache coherence problem are the snoopy cache protocols and the directory protocols. Snoopy cache protocols [25] are mainly suited for bus based multiprocessors. A snoopy cache ....

J. Archibald, Jean-Loup Baer, "An Economical Solution to the Cache Coherence Problem", Proceedings of the 12th International Symposium on Computer Architecture, June 1985, pp. 355-362.


Highly Concurrent Cache Coherence Protocols - Williams, Reynolds, Jr. (1990)

....a significant bottleneck from Tang's protocol, the protocol still scales poorly since the size of the bit vector increases linearly with the number of PE's. The focus of much of the subsequent work on directory protocols has been on improving the scalability of the directory representation [Aga88, ArB84, CKA91, GWM90, Jam90, LiY90, OKN90, SiH91, Ste89, ThD91]. Although reducing the space complexity of the directory representation is an important problem, our focus is different: on improving the scalability of cache coherence protocols by increasing their concurrency. For simplicity, we assume the bit vector representation proposed by Censier and ....

J. Archibald and J. L. Baer, An Economical Solution to the Cache Coherence Problem, Proc. 11th International Symp. Computer Architecture, 1984, 355-362.


Binding Time in Distributed Shared Memories - Kong (1999)

....above three functions. The broadcasting schemes do not maintain information about data arrangement in the memory locations (see Figure 4.1(a)). Instead, they require every system node to snoop on a broadcasting medium, such as a bus, to check the network transactions transmitted on the medium [AB84, Goo83, SS86]. An advantage of the broadcasting schemes is the short length of the critical path of a network request, i.e. a network request can be completed in two network traversals. However, contention in the broadcast medium can become very severe when the system size increases because each ....

James Archibald and Jean-Loup Baer. An economical solution to the cache coherence problem. In Proceedings of the 11th Annual International Symposium on Computer Architecture, pages 355--362, Ann Arbor, Michigan, June 5--7, 1984. IEEE Computer Society and ACM SIGARCH.


Shared Regions: A strategy for efficient cache management in.. - Sandhu (1995)   (2 citations)

....conclusions architecture. Early small scale shared memory multiprocessors exploited the inherent broadcast nature of the shared buses that were used in their design, using caches that monitored the bus and thereby detected when data needed to be updated or invalidated [6] [34] [58]. Unfortunately, the use of broadcast is not feasible in larger systems. On these architectures, scalable strategies for coherence enforcement, not relying on broadcast, are necessary. To date, a wide variety of such strategies have been proposed that make various trade offs in cost, ....

J. Archibald and J.L. Baer. An economical solution to the cache coherence problem. In 11th International Symposium on Computer Architecture, 1984.


A Cache Technique for Synchronization Variables in Highly.. - Berke (1988)   (2 citations)

....representatives of each shared variable accurately reflect their values in main memory. This is commonly referred to as the multicache consistency or coherency problem. There have been several attempts to address the consistency problem in the general case. The hardware based methods [CeFe78] [ArBa84] [ASHH88] use directories with global state information for each sharable item that appears in any cache. The usual convention is that data can be in one of two states: shared read only or exclusive read write. Before any PE can write a shared cacheable datum, it must initiate an ownership ....

....our methods for implementing this technique. 3. The consistency problem In order for polling the cache to be feasible, cache lines corresponding to shared data must be kept consistent with their globally accessible values in memory. The classical solution (as it is referred to in [CeFe78] [ArBa84]) accomplishes this by following every modification of a shared variable with an update or invalidation of all its copies in any local cache. A bus based architecture with snooping caches allows this to be done in a decentralized manner by having each cache monitor the bus [Good83] [RuSe84] ....

James Archibald and Jean-Loup Baer, "An Economical Solution to the Cache Coherence Problem," Eleventh Annual Symposium on Computer Architecture, 1984, pp. 355-362.


Memory Management in Symunix II: A Design for.. - Edler, Lipkis, Schonberg (1988)   (4 citations)

....cache coherence in hardware without serious performance penalties. Currently available cache coherence technology, based on inter cache communication over a bus, does not scale up to highly parallel configurations which are interconnected via multi stage networks. Proposed directory schemes [CF78, AB85] may provide a partial solution for large systems, but the expense is sufficiently large that automatic coherence (without software control) appears undesirable. Currently, both the Ultracomputer and the RP3 allow the operating system to suppress cacheing for certain areas of memory; the ....

Archibald, J., Baer, J., "An Economical Solution to the Cache Coherence Problem", Proc. 12th International Symposium on Computer Architecture, 1985.


Emulation of a Virtual Shared Memory Architecture - Raina (1993)   (3 citations)

....mechanisms, or protocols have been developed and implemented. The area of cache coherence has received a great deal of attention in recent years. A number of exhaustive papers have reviewed cache coherence mechanisms including Smith [173], Hill [93], Stenström [179], Chaiken [40], and Archibald [16]. In its most general sense, the problem of cache consistency can be stated as the requirement that a read of a datum always returns the most recently modified copy of that datum. Solutions to the problem of cache coherence are numerous. Some early simple solutions rely on software enforced, ....

....a vector of bits with one bit per cache. The vector is generally referred to as a presence flag vector. Each bit indicates the presence or absence of the block in the corresponding cache. This method, though not very space efficient, is very time efficient. The Two bit scheme by Archibald and Baer [16] uses two bits as the state of each cache line. A cache line can be in one of the four global states: not present in any cache; present in one cache in read only mode; present in none or more caches in read only mode; and present in one cache and modified. The scheme resorts ....

J. Archibald and J. Baer. An Economical Solution to the Cache Coherence Problem. In Proceedings of the 11th Annual International Symposium on Computer Architecture, pages 355--362, June 1984.


Compiler Support for the Efficient Use of Cache Coherence.. - Trung Nguyen

....private caches, however, cache coherence must be enforced using either a hardware or software mechanism [21]. Several schemes [12, 16, 30, 34, 36] have been proposed for bus based systems. For more scalable systems that use general interconnection networks, directory based hardware schemes [2, 3, 5, 15, 35, 38] and compiler assisted software schemes [7, 18, 17, 25, 37] have been suggested. Recently, several authors have proposed dynamically tagged directories [6, 14, 22, 23, 24, 30] in which pointers to processors with a copy of a memory block are allocated only when the block is actually cached. These ....

....memory references. Other factors, such as aliases, procedure calls, and unknown symbolic terms, can also reduce the precision of the dependence analysis. Additionally, if parallel tasks are scheduled dynamically, some temporal locality cannot be detected by the compiler. Directory based schemes [2, 3, 5, 15, 35, 38], on the other hand, can perfectly disambiguate memory references at run time so that they invalidate only cache blocks that are actually stale. Unfortunately, directory based schemes require a large amount of memory to store the cache block sharing information. Several dynamically tagged ....

J. Archibald and J. Baer. An economical solution to the cache coherence problem. In Proc. 11th Annual International Symposium on Computer Architecture, pages 355--362, June 1984.


SUDS: Primitive Mechanisms for Memory Dependence.. - Frank, Moritz.. (1999)   (3 citations)

....out the similarity between cache coherence schemes and coherence control in transaction processing. The Liquid system used a bus based protocol similar to a snooping cache coherence protocol [12]. SUDS uses a scalable protocol that is more similar to a directory based cache coherence protocol [6, 2, 1] with only a single pointer per entry, sometimes referred to as a Dir1B protocol. The ParaTran system for parallelizing mostly functional code [33] was another early proposal that relied on speculation. ParaTran was implemented in software on a shared memory multiprocessor. The protocols were ....

James Archibald and Jean-Loup Baer. An Economical Solution to the Cache Coherence Problem. In 11th International Symposium on Computer Architecture, pages 355--362, Ann Arbor, MI, June 1984.


Emulation of a Virtual Shared Memory Architecture - Raina (1993)   (3 citations)

No context found.

J. Archibald and J. Baer. An Economical Solution to the Cache Coherence Problem. pages 355--362, June 1984.


A Superassociative Tagged Cache Coherence Directory - Lilja, Ambalavanan (1994)   (1 citation)

No context found.

J. Archibald and J.-L. Baer, "An Economical Solution to the Cache Coherence Problem," Intl. Symp. Computer Architecture, pp. 355-362, 1984.

CiteSeer.IST - Copyright Penn State and NEC