S. S. Mukherjee and M. D. Hill. Using prediction to accelerate coherence protocols. In Proceedings of the 25th Int. Symp. on Computer Architecture, pages 179--190, June 1998.
....in DSM systems for performance improvement, while the other is using speculative threads to improve computation. The latter direction proposes the use of new architectural designs to allow hardware and software collaboration to support a monitor and recover programming paradigm [15]. Cosmos [14] is a general pattern predictor. It accurately predicts the future coherence operations and performs them in a speculative manner to improve performance. By using such a pattern predictor, a predictor based DSM might eliminate the overhead of maintaining coherence. However, the assumption is that ....
S. S. Mukherjee and M. D. Hill. Using prediction to accelerate coherence protocols. In Proceedings of the 25th annual international symposium on Computer architecture, pages 179--190. IEEE Press, 1998.
....and it has proved useful in improving microprocessor performance. Prediction in the context of shared memory was first studied by Mukherjee and Hill, who showed that it is possible to use address based 2 level predictors at the directories and caches to track and predict coherence messages [19]. Subsequently, Lai and Falsafi modified these predictors to reduce their size and showed how they can be used to accelerate reading of data [16]. Finally, Kaxiras and Young [15] used prediction to reduce access latency in distributed shared memory systems by attempting to move data from their ....
S. S. Mukherjee and M. D. Hill. "Using Prediction to Accelerate Coherence Protocols". Proc. of the 25th Int'l Symposium on Computer Architecture (ISCA'98), pp. 179--190, July 1998.
....the system adaptive to different memory access patterns. Tyson et al. [14] proposed instruction based prediction to bypass selectively the cache for loads that are responsible for most cache misses. Cache predictions in multi processor system are yet to be thoroughly analyzed. Mukherjee and Hill [1] proposed branch predictor like predictors to predict incoming cache coherent messages to accelerate coherence protocols. Lai and Falsafi [2] gave an improved version of Mukherjee's predictor by only predicting requesting messages and combining several patterns into one so as to improve the ....
....thus reducing the delay on a cache miss. 6 Pattern Prediction [Figure 3: Cache Access Miss Predictor, with columns "Current Pattern" and "Next Miss" holding (pid, set) entries] Our predictor is similar to the branch predictor in [13] and the message predictor in [1] and [2]. A single history table, as shown in Figure 3, is used to maintain a set of cache miss patterns that happened before. The number of entries of the history table is called the history width (w_h). It determines how many different patterns can be kept for future reference. Each entry has ....
Shubhendu S. Mukherjee and Mark D. Hill, "Using Prediction to Accelerate Coherence Protocols", in Proceedings of the 25th Annual International Symposium on Computer Architecture, pp. 179-190, July 1998.
.... by analyzing the program and then inserting prefetch instructions before the actual data request in the program [95]. In hardware controlled prefetching, dedicated hardware is used to predict the future accesses of sharing patterns and coherence activities by looking at their observed behavior [96, 77, 73, 133, 34, 107]. Thus, there is no need to add instructions to the program. These techniques assume that memory accesses and coherence activities in the near future will follow past patterns. Then, the hardware prefetches the data based on its prediction. In sender initiated systems (that is, in message passing ....
....the case for point to point communications, the approach mentioned above cannot be used. Prediction techniques have been proposed in the past to predict the future accesses of sharing patterns and coherence activities in distributed shared memory (DSM) by looking at their observed behavior [96, 77, 73, 133, 34, 107]. These techniques assume that memory accesses and coherence activities in the near future will follow past patterns. Sakr and his colleagues have used time series and neural networks for the prediction of the next memory sharing requests [107]. Dahlgren and his colleagues devised hardware regular ....
[Article contains additional citation context not shown here]
S. S. Mukherjee and M. D. Hill, "Using Prediction to Accelerate Coherence Protocols ", Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998.
....reducing directory height (the number of directory entries) by combining several directory entries into a single entry [21] or by organizing the directory as a cache [9], [19]. Recently, some proposals have appeared focusing on reducing the penalty introduced by remote accesses. In [13], [14] and [17], coherence messages are predicted. Bilir et al. [2] try to predict which nodes must receive each coherence transaction. If the prediction hits, the protocol approximates the snooping behavior (although the directory must be accessed in order to verify the prediction). 3. Multilayer Clustering ....
S.S. Mukherjee and M.D. Hill. Using Prediction to Accelerate Coherence Protocols. Proc. of the 25th Int'l Symposium on Computer Architecture, July 1998.
.... have been proposed to address some of these specific patterns [Per93], [Kax98]. More general schemes have also been proposed, but they remain costly in hardware, they require on chip modification, or large extension of directory structure (memory overhead) [MH98]. In the scientific computing world, it is well known that data accesses are highly structured. This property is induced by the specific nature of scientific algorithms. Success of vector architectures, and a large amount of research in latency optimization technique rely on this fact. Stream ....
.... classified according first how anticipation actions are inserted (either statically by the compiler or dynamically at run time by a specialized hardware) and second on the type of information they are correlated to (instructions or address). Address oriented methods for prediction are expensive [MH98] and often outperformed by instructions driven schemes [Kax98]. However, instruction oriented methods need to be tightly integrated with the CPU, which could be rather difficult. Furthermore, for scientific code, the high degree of regularity clearly advocated for the choice of address based method ....
[Article contains additional citation context not shown here]
Shubhendu S. Mukherjee and Mark D. Hill. Using Prediction to Accelerate Coherence Protocols. In Proceedings of the 25th International Symposium on Computer Architecture, July 1998.
....operation. In the next section, we propose an implementation of the LL SC semantics that provides good atomic read modify write performance even under high contention. We then extend this mechanism to optimize for specific lock behaviors. This work is inspired by the work of Mukherjee and Hill [28] who proposed to use prediction to speed up coherence protocols, by the work of Kaxiras and Goodman [19] who discussed instruction and address based prediction to improve the performance of multiprocessor systems, and Lai and Falsafi [22] who first designed and analyzed a full parallel system ....
Shubhendu S. Mukherjee and Mark D. Hill. Using Prediction to Accelerate Coherence Protocols. In Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 179--190, June 1998.
....classes. When delayed update is used, writes to a line are propagated to other processors caching the line when a synchronization point is crossed [1]. The work reported was preliminary and so did not include dynamic methods for classifying data or evaluating performance. Mukherjee and Hill [21] propose a coherence protocol message predictor analogous to a two level PAp (local history) branch predictor [25]. The first level tables are indexed by effective address; first level table entries hold the last message (or last few messages) arriving for the address and its sender. These are ....
Mukherjee, S.S., and Hill, M.D. Using prediction to accelerate coherence protocols. Proceedings of the 25th Annual International Symposium on Computer Architecture. June 1998, pp. 179--190.
....the block in advance so that subsequent accesses by other processors find the block available at the directory node. Accurate speculative invalidation can virtually eliminate all invalidation messages and can significantly reduce communication time. In conjunction with sharing [8] or coherence [12] predictors predicting a subsequent sharer of a memory block a speculative invalidation can forward a memory block to its subsequent consumer early, potentially eliminating the remote memory access latency in DSM. Lebeck and Wood [10] proposed the first speculative invalidation technique, ....
S. S. Mukherjee and M. D. Hill. Using prediction to accelerate coherence protocols. In Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998.
....of different message predicting techniques for the receiving side of message passing systems. 3 This paper concentrates on message predictions at the destinations in message passing systems using MPI in isolation. This is analogous to branch prediction, and coherence activity prediction [27] in isolation. Our tools are not ready for measuring the effectiveness of our predictors on the application run time yet. Our preliminary evaluation measures the accuracy of the predictors in terms of hit ratio. The results are quite promising and suggest that prediction has the potential to ....
....their poor performance for short messages which is extremely important for parallel computing. Prediction techniques have been proposed in the past to predict the future accesses of sharing patterns and coherence activities in distributed shared memory (DSM) by looking at their observed behavior [27, 23, 20, 39, 11, 31]. These techniques assume that memory accesses and coherence activities in the near future will follow past patterns. Sakr and his colleagues have used time series and neural networks for the prediction of the next memory sharing requests [31]. Dahlgren and his colleagues devised hardware regular ....
[Article contains additional citation context not shown here]
S. S. Mukherjee and M. D. Hill, "Using Prediction to Accelerate Coherence Protocols", Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998.
....caches, where the directory controller identifies cache lines to be self invalidated based on prior behavior. There is no direct tapes analog because the point of self invalidating caches is to eliminate coherence messages. Tapes affect only data movement, not coherence. Mukherjee [25] described the use of branch predictors in anticipating coherence operations. Similar techniques could be used to direct tape operations. However, they are likely to be less useful because SDSM systems have larger, and therefore fewer, operations. Statistical techniques usually work best with ....
S. S. Mukherjee and M. D. Hill, "Using Prediction to Accelerate Coherence Protocols," in 25th Annual International Symposium on Computer Architecture (ISCA), June 1998.
....each address based prediction scheme is tailored for a specific sharing pattern. Each may require its own states in the coherence protocol and its own storage (usually on a per block basis) for history information. Although Mukherjee and Hill showed how to generalize address based prediction [75], the issue of excessive storage for history information remains. However, there are important issues involved with instruction based prediction in shared memory: 1. Implementation issues: Instruction based prediction calls for a tight integration of the processor core and the coherence mechanisms ....
.... consumers to be the new set, (ii) INTERSECTION PREDICTION that predicts the intersection of the last two sets of consumers to be the new set, and (iii) two level adaptive prediction inspired by the analogous scheme in branch prediction [105] and by Mukherjee and Hill's work on protocol prediction [75]. The three schemes work as follows: 1. LAST PREDICTION: The predictor is both updated and probed on a store miss or a store write fault. The predictor is updated when the producer node invalidates a sharing list. The update collects the identities of the invalidated nodes on a temporary bit map ....
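The first two schemes in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excerpt is truncated, so the sketch assumes that LAST PREDICTION reuses the most recently invalidated consumer set, and the class name `ConsumerSetPredictor`, its methods, and the helper `nodes` are hypothetical names that do not come from the cited work. Consumer sets are kept as bitmaps, following the "temporary bit map" mentioned in the text.

```python
class ConsumerSetPredictor:
    """Hypothetical sketch of per-block consumer-set prediction with bitmaps."""

    def __init__(self):
        # block address -> (older consumer bitmap, newest consumer bitmap)
        self.sets = {}

    def update(self, block, invalidated_nodes):
        # Collect the identities of the invalidated nodes into a bitmap.
        bitmap = 0
        for node in invalidated_nodes:
            bitmap |= 1 << node
        _, newest = self.sets.get(block, (0, 0))
        self.sets[block] = (newest, bitmap)

    def predict_last(self, block):
        # Assumed LAST PREDICTION: the last consumer set is the new set.
        return self.sets.get(block, (0, 0))[1]

    def predict_intersection(self, block):
        # INTERSECTION PREDICTION: intersect the last two consumer sets.
        older, newest = self.sets.get(block, (0, 0))
        return older & newest


def nodes(bitmap):
    """Decode a bitmap back into a sorted list of node ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


predictor = ConsumerSetPredictor()
predictor.update(0x80, [1, 2, 3])
predictor.update(0x80, [2, 3, 5])
print(nodes(predictor.predict_last(0x80)))          # [2, 3, 5]
print(nodes(predictor.predict_intersection(0x80)))  # [2, 3]
```

In this sketch the intersection scheme predicts an empty set until two updates have been seen for a block; a real design could choose a different warm-up policy.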
[Article contains additional citation context not shown here]
Shubhendu S. Mukherjee and Mark D. Hill "Using Prediction to Accelerate Coherence Protocols." In Proceedings of the 25th International Symposium on Computer Architecture (ISCA), July 1998.
....Hardware Mechanisms 148 patterns by changing the shared memory protocol used on a given block of data over the course of program execution, potentially providing better performance than could be achieved by compile time analysis of the program to determine the best shared memory protocol. [27] showed that conventional prediction techniques are effective at predicting the behavior of shared memory protocols to determine what the next coherence message received at a node would be, albeit at a high overhead. The predictor used in an adaptive shared memory system would have to operate at a ....
Shubhendu S. Mukherjee and Mark D. Hill. Using prediction to accelerate coherence protocols. In Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 179-190, June 1998.
....at the same time, or allow the coherence protocol to adapt to some identifiable access patterns [3, 11]. The main difference in these systems is regarding what and how access patterns are detected. Some heuristic mechanisms have been proposed to predict and trigger appropriate protocol behavior [8]. The implementation of an adaptive cache coherence protocol involves two issues: what adaptivity can be embodied in the protocol, and how and when such adaptivity can be invoked. This paper addresses the first issue and attacks the adaptivity problem from a new perspective. It proposes a cache ....
S. S. Mukherjee and M. D. Hill. Using Prediction to Accelerate Coherence Protocols. In International Symposium on Computer Architecture, 1998.
....use complex finite state machines which are difficult to design and require large amounts of computational resources to verify [6]. Moreover, capturing sharing patterns in protocol states often limits the protocol to learning one sharing pattern per memory block at a time. In a recent paper [17], Mukherjee and Hill advocate using a general pattern based predictor derived from Yeh and Patt's two level adaptive PAp branch predictor [23] to learn and predict the coherence activity for a memory block in a DSM. By accurately predicting and performing the necessary coherence operations ....
....operations early, obviating the need to modify the base coherence protocol. This paper proposes novel pattern based predictors, Memory Sharing Predictors (MSPs) that dramatically improve prediction accuracy and implementation cost over previous proposals. Unlike a general message predictor [17], an MSP only predicts memory request messages i.e. the primary messages that invoke a sequence of protocol actions. In common DSM sharing patterns, multiple coherence messages in a read sharing phase often arrive in an arbitrary order due to system contention or load imbalance in the ....
[Article contains additional citation context not shown here]
Shubhendu S. Mukherjee and Mark D. Hill. Using prediction to accelerate coherence protocols. In Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998.
No context found.
Shubhendu S. Mukherjee and Mark D. Hill. Using Prediction to Accelerate Coherence Protocols. In Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998.
....predict sharers [2]. Bilir et al. [7] studied multicast snooping with a 4K entry StickySpatial(1) destination set predictor. Many papers have examined or exploited other forms of coherence prediction (e.g. dynamic self invalidation [20, 21]). Coherence predictors have been indexed with addresses [27], program counters [16], message history [19], and other state [17]. Researchers have also developed protocols that optimize for specific sharing behaviors [6], read modify write sequences [28, 29], and migratory sharing [8, 33]. Other hybrid protocols adapt between write-invalidate and ....
S. S. Mukherjee and M. D. Hill. Using Prediction to Accelerate Coherence Protocols. In Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 179--190, June 1998.
....given a cache block address (index for MHRs) and the history of messages (i.e. sender processor, message type tuples) Cosmos can predict with high accuracy the sender processor, message type tuple of the next message 1. Guri Sohi suggested this approach. This improvement does not appear in [90].

Depth of MHR   appbt      barnes     dsmc       moldyn     unstructured
               Base  P    Base  P    Base  P    Base  P    Base  P
1              84    13   62    5    84    8    86    1    74    1
2              85    2    69    8    86    7    86    1    88    3
3              85    3    69    7    93    1    85    3    89    2
4              85    2    68    10   93    1    84    5    92    0

Table 6.8: Using processor numbers to improve Cosmos accuracy. Depth ....
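The lookup described in this excerpt (block address selects a message history register, and the recorded history of sender and message type tuples selects a prediction) can be sketched as follows. This is a minimal illustration, not the Cosmos implementation: the second-level pattern table is an assumption based on the two-level branch predictor analogy drawn in the other excerpts, and the class name `TwoLevelMessagePredictor`, its methods, and the message type strings are hypothetical.

```python
from collections import defaultdict, deque


class TwoLevelMessagePredictor:
    """Hypothetical sketch of a two-level, address-indexed message predictor."""

    def __init__(self, depth=1):
        self.depth = depth  # number of past messages kept per block
        # Level 1: per-block history register of (sender, message type) tuples.
        self.history = defaultdict(lambda: deque(maxlen=depth))
        # Level 2 (assumed): per-block table mapping a history pattern to the
        # message that followed it the last time the pattern was seen.
        self.patterns = defaultdict(dict)

    def predict(self, block):
        """Return the predicted next (sender, type) tuple, or None if unknown."""
        hist = self.history[block]
        if len(hist) < self.depth:
            return None
        return self.patterns[block].get(tuple(hist))

    def observe(self, block, message):
        """Record an arriving message and train the pattern table."""
        hist = self.history[block]
        if len(hist) == self.depth:
            self.patterns[block][tuple(hist)] = message
        hist.append(message)


predictor = TwoLevelMessagePredictor(depth=1)
for msg in [(1, "get_ro"), (2, "inval_ack"), (1, "get_ro")]:
    predictor.observe(0x40, msg)
print(predictor.predict(0x40))  # (2, 'inval_ack')
```

A larger `depth` lets the sketch distinguish longer message sequences at the cost of more table entries, which mirrors the depth parameter varied in the table above.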
Shubhendu S. Mukherjee and Mark D. Hill. Using Prediction to Accelerate Coherence Protocols. In Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998.
No context found.
S. S. Mukherjee and M. D. Hill. Using prediction to accelerate coherence protocols. In Proceedings of the 25th Int. Symp. on Computer Architecture, pages 179--190, June 1998.
No context found.
S. S. Mukherjee and M. D. Hill. Using prediction to accelerate coherence protocols. In Proc. of the 25th Annual Int'l Symp. on Computer Architecture (ISCA'98), pages 179 -- 190, June 1998.
No context found.
S. S. Mukherjee and M. D. Hill. Using prediction to accelerate coherence protocols. In Proc. of the 25th Annual Int'l Symp. on Computer Architecture (ISCA'98), pages 179 -- 190, June 1998.
No context found.
Shubhendu S. Mukherjee and Mark D. Hill. Using Prediction to Accelerate Coherence Protocols. In Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 179--190, Barcelona, Spain, June 1998.
No context found.
Shubhendu S. Mukherjee and Mark D. Hill. Using prediction to accelerate coherence protocols. In Proceedings of the 25th annual international symposium on Computer architecture, pages 179--190. IEEE Press, 1998.
No context found.
S. S. Mukherjee and M. D. Hill. "Using Prediction to Accelerate Coherence Protocols". Proc. of the 25th Int'l Symposium on Computer Architecture, pp. 179--190, July 1998.
No context found.
S. S. Mukherjee and M. D. Hill. Using Prediction to Accelerate Coherence Protocols. In Proc. of ISCA-25, May 1998.