22 citations found.
James Archibald. A cache coherence approach for large multiprocessor systems. In International Conference on Supercomputing, pages 337--345, November 1988.

This paper is cited in the following contexts:
Bus And Cache Memory Organizations For Multiprocessors - Winsor (1989)   (2 citations)  (Correct)

....the best performance, while for the other two, the average write run length was 6.0 and a write invalidate protocol provided the best performance. An adaptive protocol that attempts to incorporate some of the best features of each of the two classes of cache consistency schemes is proposed in [Archi88]. This protocol, called EDWP (Efficient Distributed Write Protocol) is essentially a write broadcast protocol with the following modification: if some processor issues three writes to a shared line with no intervening references by any other processors, then all the other cached copies of that ....

JAMES K. ARCHIBALD. "A Cache Coherence Approach For Large Multiprocessor Systems".
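
The adaptive rule quoted in the Winsor context above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SharedLine` class, its fields, and the returned action strings are hypothetical names, and the sketch models just the write-run rule (three writes by one processor with no intervening reference by another), not Archibald's actual hardware protocol.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

WRITE_RUN_LIMIT = 3  # "three writes", taken from the quoted context


@dataclass
class SharedLine:
    """One shared cache line: who holds a copy, and the current write run.

    All names here are hypothetical; they are not from the cited paper.
    """
    holders: Set[int] = field(default_factory=set)
    last_writer: Optional[int] = None
    run_length: int = 0

    def read(self, cpu: int) -> None:
        # A reference by a different processor interrupts the write run.
        if cpu != self.last_writer:
            self.last_writer = None
            self.run_length = 0
        self.holders.add(cpu)

    def write(self, cpu: int) -> str:
        self.holders.add(cpu)
        if cpu == self.last_writer:
            self.run_length += 1
        else:
            self.last_writer = cpu
            self.run_length = 1
        if self.run_length >= WRITE_RUN_LIMIT and len(self.holders) > 1:
            # Third uninterrupted write: drop every other cached copy.
            self.holders = {cpu}
            return "invalidate"
        # Otherwise behave like a write-broadcast protocol.
        return "update" if len(self.holders) > 1 else "local"
```

With three caches holding the line, two writes by processor 0 are broadcast as updates and the third invalidates the other copies; a read by another processor in between restarts the count.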


Architectural Support for Compiler-Generated Data-Parallel Programs - Klaiber (1994)   (1 citation)  (Correct)

....the type of memory consistency model that the system provides. Cache coherent architectures therefore provide latency hiding either through separate mechanisms such as data prefetching or by modifying the cache coherence mechanism. For example, the cache coherence protocol could be made adaptive [Archibald 88, Stenstrom et al. 93, Bennett et al. 90, Carter et al. 91] or it could implement a weaker memory consistency model [Hutto Ahamad 90, Gharachorloo et al. 90]. In this section, we evaluate our three communication architectures with respect to the amount of communication latency they incur. We ....

J. K. Archibald. A cache coherence approach for large multiprocessor systems. In 1988 International Conference on Supercomputing, pages 337--345, 1988.


Maximizing Memory Bandwidth for Streamed Computations - McKee (1995)   (7 citations)  (Correct)

....broadcasts the effects of a write operation immediately, these snooping coherence mechanisms usually implement a strongly ordered consistency model. The shared bus can become a severe bottleneck. Proposed solutions increase the number of buses and use more elaborate interconnection strategies [Arc88,Goo88,Wil87], but any snooping scheme is ultimately limited by contention for the shared interconnect. This limits the use of this class of coherence schemes to small scale multiprocessor systems. Since the multiprocessor SMC systems we consider here contain only a modest number of processors, it may be ....

J.K. Archibald, "A Cache Coherence Approach for Large Multiprocessor Systems", Proceedings of the ACM/IEEE International Conference on Supercomputing, pages 337-345, 1988.


Two Adaptive Hybrid Cache Coherency Protocols - Anderson, Karlin (1996)   (6 citations)  (Correct)

....had performance which was intermediate between the two. Archibald introduced a protocol which improves over hybrid protocols in which the counter is associated with the reader (as opposed to the writer). In his protocol, a block is not automatically invalidated when its count reaches 0 [3]. Instead, a cache simply doesn't raise the shared line if the relevant block's count is 0. If some cache whose block count is not 0 indicates that it is keeping the block by raising the shared line, then no cache invalidates its block. The block is invalidated in the sharing caches only when all ....

J. Archibald. A cache coherence approach for large multiprocessor systems. In International Conference on Supercomputing, pages 337--345, 1988.
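
The all-or-none behaviour described in the Anderson and Karlin context above can be illustrated with a small Python sketch. It is a simplified model, not the published protocol: the `CacheBlock` class, the `broadcast_update` helper, and the initial count of 3 are assumptions made for the example, as is the choice to reset the count on a local access. The quoted context is truncated, so the invalidation condition used here (no cache raises the shared line) is inferred from the preceding sentence.

```python
THRESHOLD = 3  # hypothetical initial count for each cached block


class CacheBlock:
    """One cache's copy of a shared block (illustrative names only)."""

    def __init__(self) -> None:
        self.count = THRESHOLD
        self.valid = True

    def local_access(self) -> None:
        # Assumption: a local reference shows the copy is still in use.
        self.count = THRESHOLD

    def snoop_update(self) -> bool:
        """Observe a remote update; return True to raise the shared line."""
        if self.count > 0:
            self.count -= 1
        # A count of 0 does not invalidate by itself; the cache just
        # stops raising the shared line.
        return self.count > 0


def broadcast_update(sharers) -> str:
    """Apply one update to every valid sharer, all-or-none."""
    votes = [block.snoop_update() for block in sharers if block.valid]
    if any(votes):
        return "updated"  # some cache keeps the block, so nobody drops it
    for block in sharers:
        block.valid = False
    return "invalidated"
```

In this model a copy whose count has reached 0 stays valid for as long as any other sharer still raises the shared line, which is the difference from a scheme where each cache invalidates on its own.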


A Cost-Comparison Approach for Adaptive Distributed Shared Memory - Kim, Vaidya (1996)   (2 citations)  (Correct)

....selectively updating a set of processors, or requesting a stream of data ahead of its intended use (prefetch). The basic difference between our approach and [25] is that our scheme does not need to know whether a particular synchronization controls access to a given shared memory page or not. [2] dynamically chooses to update or invalidate copies of a shared data object. If there are three writes by a single processor without intervening references by any other processor, all other cached copies are invalidated in [2]. Competitive update scheme [1, 9, 10, 13] invalidates a page if the number of ....

....synchronization controls access to a given shared memory page or not. [2] dynamically chooses to update or invalidate copies of a shared data object. If there are three writes by a single processor without intervening references by any other processor, all other cached copies are invalidated in [2]. Competitive update scheme [1, 9, 10, 13] invalidates a page if the number of remote updates to the page (between local accesses) exceeds a threshold or a limit parameter. Quarks [15] uses a variation of the competitive update scheme. Protocols presented in [8, 9, 20, 28] dynamically identify ....

J. Archibald, "A cache coherence approach for large multiprocessor systems," in International Conference on Supercomputing, pp. 337-- 345, July 1988.
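
For contrast with the write-run rule attributed to [2], the competitive update idea mentioned in the Kim and Vaidya context above can be sketched as follows. This is a minimal Python illustration under assumptions: `PageCopy`, `limit`, and the method names are hypothetical, and the strict "greater than" comparison simply follows the word "exceeds" in the quoted text; the cited papers may define the boundary differently.

```python
class PageCopy:
    """One node's copy of a shared page (hypothetical model)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit        # threshold on unused remote updates
        self.remote_updates = 0   # updates received since last local access
        self.valid = True

    def local_access(self) -> None:
        # A local access shows the copy is in use, so restart the count.
        self.remote_updates = 0

    def remote_update(self) -> None:
        if not self.valid:
            return
        self.remote_updates += 1
        if self.remote_updates > self.limit:
            # Too many updates with no intervening local access.
            self.valid = False
```

Each copy decides on its own here, so one idle node can lose its copy while an active node keeps receiving updates.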


Distributed Shared Memory: Recoverable and Non-recoverable - Limited Update   (Correct)

....DSM based on an approach that invalidates a copy of a page at some node A, if it is updated by other nodes too many times before node A accesses it. This protocol is called the limited update protocol, and is based on a similar protocol for cache coherence in hardware shared memory systems [1, 7]. The proposed reliable DSM can tolerate a single node failure without significant recovery overhead. Also, the proposed scheme incorporates the release consistency model. For future reference, note that we use the terms node and processor interchangeably. A node may execute one or more processes, ....

....protocol. Limited Update Protocol The limited update protocol presented here is intended for a software implementation of DSM that uses release consistency (summarized above). This protocol is similar to two protocols previously proposed for maintaining sequential consistency in hardware caches [1, 7]. The advantage of this protocol is that it facilitates a simple implementation of a recoverable DSM. The basic idea of the limited protocol is to update those copies of a page that are expected to be used in the near future, while selectively invalidating other copies. Now we summarize the ....

[Article contains additional citation context not shown here]

J. Archibald, "A cache coherence approach for large multiprocessor systems," in International Conference on Supercomputing, pp. 337--345, July 1988.


Improving Performance of Bus-Based Multiprocessors - Anderson (1995)   (1 citation)  (Correct)

....will continue to be broadcast, since the block is being shared. Unfortunately, this requires some hardware mechanism to determine when writes originate from different processors. Archibald introduced an improved protocol in which a block is not automatically invalidated when its count reaches 0 [Arc88]. Instead, a cache simply doesn't raise the shared line if the relevant block's count is 0. If some cache whose block count is not 0 indicates that it is keeping the block by raising the shared line, then no cache invalidates its block. The block is invalidated in the sharing caches only when all ....

James Archibald. A cache coherence approach for large multiprocessor systems. In International Conference on Supercomputing, pages 337--345, 1988.


Synchronization, Coherence, and Consistency for High Performance .. - Dwarkadas (1992)   (Correct)

....in software. Software based coherence gives the programmer flexibility but increases the off cluster access time. The locking protocol is based in memory. Archibald presents a distributed write, adaptive snooping protocol that dynamically determines whether a block is being actively shared [7]. He extends the protocol to cluster based hierarchical multiprocessor organizations, introducing cluster ownership for writing. Copies in other clusters are invalidated on a write in any cluster. Since cluster controllers do not contain data, all traffic must go to the processor level to obtain ....

....case it is still O(N^2). The protocols do not always provide optimal performance for other shared and non shared data, however, since write through increases the amount of traffic on the bus irrespective of whether it is really needed. A class of protocols based on write broadcast ([69] and [7]) provide dynamic type classification of the datum cached (i.e. read only, local, or shared). Rudolph and Segall [69] use a mixed broadcast write first protocol, which dynamically determines whether a block is written by more than one processor by updating all other cached copies on the first ....

[Article contains additional citation context not shown here]

J. Archibald. A Cache Coherence Approach for Large Multiprocessor Systems. In Proceedings of the 1988 International Conference on Supercomputing, pages 337--345. ACM, July 1988.


The Prospects for On-Line Hybrid Coherency Protocols on.. - Veenstra, Fowler (1994)   (7 citations)  (Correct)

....Related work on hybrid protocols Hybrid protocols in which both WI and WU are used have been proposed several times. Karlin et al. [19] explored generalized hybrid snooping protocols, including competitive protocols, those that are guaranteed to be within a constant factor of optimal. Archibald [2] proposed a hybrid protocol for scalable multiprocessors with single level caches that uses an all-or-none rule to drop a block from all caches receiving the update after three unused updates. Veenstra and Fowler [33] compared several kinds of optimal off line protocols in which decisions are ....

....cache as long as any one node would have retained a copy under the AXP protocol. This modification does not require any extra bus lines, since there is already a bus line that is used to signal that some processor has accepted the update and still has the block present. Archibald's protocol [2] uses a similar all-or-none criterion. AXPa This is the AXPa protocol augmented with the migrate dirty block on read miss rule. 3.4 Parallel execution phase The application programs spend a significant fraction of the total simulated time initializing data structures. For example, the MP3D ....

J. Archibald. A cache coherence approach for large multiprocessor systems. In International Conference on Supercomputing, pages 337--345, November 1988.


Effectiveness of Hardware-Based and.. - Dahlgren.. (1995)   (Correct)

....cache coherence, according to Dahlgren [6] keeps the update traffic low because of three fundamental mechanisms: hybrid update invalidate, write caches, and read snarfing. First, in contrast to a pure write-update protocol, the hybrid update invalidate scheme we assume, according to Archibald [2], invalidates copies after a certain number of updates by the same processor with no intervening access from others. To support this scheme, the bus is extended with a sharing line. Moreover, each block in the SLC has a counter initially ....

Archibald, J.K. "A Cache Coherence Approach For Large Multiprocessor Systems," in Proc. of the 1988 Int. Conference on Supercomputing, pp.337-345, July 1988.


Adaptive Software Cache Management for Distributed.. - Bennett, Carter.. (1990)   (75 citations)  (Correct)

....not needlessly broadcast the new value during subsequent writes. Archibald described a cache coherence protocol that attempts to adapt to the current reference pattern and dynamically choose to update or invalidate the other copies of a shared data object depending on how they are being used [2]. His protocol is designed for hardware implementation, and therefore is fairly simple and not as aggressive in its attempts to adapt to the expected access behavior as what we propose. Nevertheless, his simulation study indicates that even a simple adaptive protocol can enhance performance. Other ....

James Archibald. A cache coherence approach for large multiprocessor systems. In International Conference on Supercomputing, pages 337--345, November 1988.


Towards an Adaptive Distributed Shared Memory - Kim, Vaidya (1995)   (3 citations)  (Correct)

....mechanism to invalidate those copies of a page which have not been accessed by a node for a long time. [1, 9, 14, 10] present competitive update mechanisms to invalidate a copy of a page at a node, if the copy is updated by other nodes too many times without an intervening local access. [2] presents a similar scheme. The advantage of this approach, as compared to [7] is as follows: the decision mechanism used in this approach (to determine when to invalidate a page) is dependent only on the application's access pattern, instead of real time as in Munin [7]. Quarks [16] also ....

....node 1 becomes 3 (column 23). Therefore, the page copy at node 1 is also invalidated. At the end, only node 0 has a copy of the page, with update counter 0 (column 23). 2 [7] to implement its time out mechanism. 2.1 Generalization of the Competitive Update Protocol Unlike other similar schemes [2, 1, 9, 14, 10], implemented in hardware caches, the software implementation can be more flexible and complex (without affecting the performance adversely). The basic competitive protocol can be generalized in four ways, as summarized below. The adaptive protocol presented later uses these generalizations. ....

[Article contains additional citation context not shown here]

J. Archibald, "A cache coherence approach for large multiprocessor systems," in International Conference on Supercomputing, pp. 337--345, July 1988.


Implementation and Evaluation of Update-Based Cache.. - Grahn, Stenström, Dubois (1995)   (14 citations)  (Correct)

....to the other caches. When another cache re-reads the block all caches with an invalid copy of the block catch the copy of the block as it propagates on the bus (read broadcasting) and reset their counters to the threshold. Clearly, read broadcast is not feasible in a directory based environment. In [4], Archibald proposed an adaptive write-invalidate/write-update snoopy cache protocol. His adaptive protocol starts in update mode, just like ours, and when a single processor has issued three consecutive writes to the same block without any intervening access by another processor, all other copies ....

J.K. Archibald, A cache coherence approach for large multiprocessor systems, Proc. Internat. Conf. on Supercomputing, St. Malo, France (Jul. 1988) 337-345.


The Potential of Compile-Time Analysis to Adapt the Cache.. - Mounes-Toussi, Lilja (1995)   (1 citation)  (Correct)

....can be considered the best coherence enforcement scheme. To combine the best aspects of both approaches, hybrid schemes attempt to adapt to the current memory referencing pattern by choosing to update or invalidate the other copies of shared data depending on how they are currently being used [4]. In general, adaptive schemes such as EDWP [4], Competitive [17] and the scheme by Wilson et al. [31] use a variety of dynamic heuristics for determining when to switch between updating and invalidating. These schemes typically rely on the broadcasting capability of a single shared bus, making it ....

....scheme. To combine the best aspects of both approaches, hybrid schemes attempt to adapt to the current memory referencing pattern by choosing to update or invalidate the other copies of shared data depending on how they are currently being used [4]. In general, adaptive schemes such as EDWP [4], Competitive [17] and the scheme by Wilson et al. [31] use a variety of dynamic heuristics for determining when to switch between updating and invalidating. These schemes typically rely on the broadcasting capability of a single shared bus, making it difficult to extend them to large scale ....

[Article contains additional citation context not shown here]

J. K. Archibald. A cache coherence approach for large multiprocessor system. 2nd International Conference on Supercomputing, pages 337--345, 1988.


Adaptive Software Cache Management for Distributed Shared Memory   (Correct)

No context found.

James Archibald. A cache coherence approach for large multiprocessor systems. In International Conference on Supercomputing, pages 337--345, November 1988.


Minerva: An Adaptive Subblock Coherence Protocol for Improved .. - Rothman, Smith   (Correct)

No context found.

J. K. Archibald. A cache coherence approach for large multiprocessor systems. Proc. 2nd International Conference on Supercomputing, pages 337-345, July 1988.


Assessment of Cache Coherence Protocols in Shared-memory.. - Grbic (2003)   (Correct)

No context found.

James K. Archibald. A Cache Coherence Approach for Large Multiprocessor Systems. In Proceedings of the 2nd International Conference on Supercomputing, pages 337--345, St. Malo, France, July 1988.


Interactive Locality Optimization on NUMA Architectures - Mu, Tao, Schulz, McKee (2003)   (Correct)

No context found.

J. Archibald. A Cache Coherence Approach for Large Multiprocessor Systems. In Proceedings of the International Conference on Supercomputing, pages 337-345, November 1988.


A Communication Architecture for Multiprocessor Networks - Nowatzyk (1989)   (4 citations)  (Correct)

No context found.

James Archibald. A Cache Coherence Approach for Large Multiprocessor Systems. In 1988 International Conference on Supercomputing (ICS); ACM Press, pages 337-345. St. Malo, France, July 1988.


Willow: A Scalable Shared-Memory Multiprocessor - Bennett, Dwarkadas.. (1992)   (7 citations)  (Correct)

No context found.

J. Archibald. A Cache Coherence Approach for Large Multiprocessor Systems. In Proceedings of the 1988 International Conference on Supercomputing, pages 337--345, July 1988.


Models for Performance Prediction of Cache Coherence.. - Srbljic, Vranesic..   (Correct)

No context found.

J. Archibald, "A Cache Coherence Approach for Large Multiprocessor Systems", In International Conference on Supercomputing, St. Malo, France, pages 337-345, July 1988.


Cache Coherence in Large-Scale Shared Memory Multiprocessors.. - Lilja (1993)   (34 citations)  (Correct)

No context found.

James K. Archibald, "A Cache Coherence Approach for Large Multiprocessor Systems," ACM International Conference on Supercomputing, pp. 337-345, 1988.

CiteSeer.IST - Copyright Penn State and NEC