D. J. Lilja and P.-C. Yew, "Improving Memory Utilization in Cache Coherence Directories," IEEE Trans. Parallel and Distributed Systems, 4(10):1130-1146, Oct. 1993.
....cache associated with each cache-line-sized block in memory. The full-map strategy suffers from the memory overhead of storing the bit vectors, and variations of this basic directory strategy try to reduce this overhead by limiting the amount of state information that is kept [15, 29, 48, 54, 55]. For the general class of hardware coherence strategies, cost, complexity, and inflexibility are considered some of the primary drawbacks. However, the reliance of this class of strategies on fixed cache-line-sized blocks for cache management also introduces a fundamental limitation in terms ....
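To make the memory overhead of the full-map strategy concrete, here is a small Python calculation. It assumes one presence bit per cache for every memory block and no extra state bits by default; `full_map_overhead` is a hypothetical helper, and the 32-byte block size and cache counts in the loop are illustrative choices, not figures from the cited papers.

```python
def full_map_overhead(num_caches: int, block_bytes: int, state_bits: int = 0) -> float:
    """Directory bits per block divided by data bits per block.

    A full-map directory keeps one presence bit per cache (plus any
    state bits) for every memory block, whether or not it is cached.
    """
    directory_bits = num_caches + state_bits
    data_bits = block_bytes * 8
    return directory_bits / data_bits


if __name__ == "__main__":
    # Illustrative sizes only: 32-byte blocks, growing machine size.
    for caches in (16, 64, 256, 1024):
        ratio = full_map_overhead(caches, block_bytes=32)
        print(f"{caches:5d} caches: directory = {ratio:.2f}x data memory")
```

Because the bit vector grows linearly with the number of caches while the block size stays fixed, with 32-byte blocks the directory is as large as the data memory itself once there are 256 caches; this is the overhead the variations mentioned above try to reduce.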
D. Lilja and P.-C. Yew. Improving memory utilization in cache coherence directories. IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 10, pages 1130-1146, October 1993.
....the way that an application interacts with shared memory. [14] and [34] propose and evaluate this method for improving the performance of software-extended shared memory. The studies show that, given appropriate annotations, a large class of applications can perform well on $\mathrm{Dir}_1\mathrm{H}_1\mathrm{S}_{\mathrm{B,LACK}}$. [24] demonstrates a compiler annotation scheme for optimizing the performance of protocols that dynamically allocate directory pointers. Dynamic detection: [12] and [27] propose a hardware mechanism that dynamically adapts to migratory data. Protocol extension software could perform similar ....
David J. Lilja and Pen-Chung Yew. Improving Memory Utilization in Cache Coherence Directories. IEEE Transactions on Parallel and Distributed Systems, 4(10):1130-1146, October 1993.
....selects an appropriate memory for each object. Similar optimization techniques work for software-extended implementations of distributed shared memory. Lilja and Yew demonstrate a compiler annotation scheme for optimizing the performance of protocols that dynamically allocate directory pointers [58]. Hill, Wood, and others propose and evaluate a programmer-directed method for improving the performance of software-extended shared memory [34, 81]. The studies show that, given appropriate annotations, a large class of applications can perform well on $\mathrm{Dir}_1\mathrm{H}_1\mathrm{S}_{\mathrm{B,LACK}}$. Cachier [21] takes this ....
David J. Lilja and Pen-Chung Yew. Improving Memory Utilization in Cache Coherence Directories. IEEE Transactions on Parallel and Distributed Systems, 4(10):1130-1146, October 1993.
....with cached copies of a block are associated with each block in the memory. Since the data caches are significantly smaller than the main memory, however, most of the memory blocks will not be cached at any given time. As a result, most of these pointer bits will be unused. The tagged directories [5, 8, 10, 12], in contrast, dynamically allocate pointers from a special-purpose pointer cache to individual memory blocks when the block is moved from the memory to a data cache. Each entry in the pointer cache requires two fields: 1) an address tag to identify the memory block to which the pointer is ....
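The allocation policy described in this snippet can be sketched in Python. `TaggedDirectory` and its methods are hypothetical names; each entry pairs an address tag with a sharer bit vector, following the entry format described in the other citation contexts on this page, while the fixed entry count and the oldest-first reclamation when the pointer cache is full are assumptions not taken from the cited paper.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class TaggedDirectory:
    """Hypothetical sketch of a dynamically tagged directory.

    Pointers come from a small pointer cache and are allocated to a memory
    block only when that block is first cached. Each entry pairs an
    address tag (the dict key) with a bit vector of caches holding a copy.
    """

    def __init__(self, num_entries: int, num_caches: int) -> None:
        self.num_entries = num_entries
        self.num_caches = num_caches
        # tag -> sharer bit vector, kept in allocation order
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def _caches(self, bits: int) -> List[int]:
        return [c for c in range(self.num_caches) if (bits >> c) & 1]

    def add_sharer(self, block: int, cache: int) -> Optional[Tuple[int, List[int]]]:
        """Record that `cache` now holds `block`.

        If a new entry is needed and the pointer cache is full, the oldest
        entry is reclaimed (an assumed policy) and returned as
        (victim_block, caches_to_invalidate); otherwise returns None.
        """
        victim = None
        if block not in self.entries and len(self.entries) >= self.num_entries:
            tag, bits = self.entries.popitem(last=False)
            victim = (tag, self._caches(bits))
        self.entries[block] = self.entries.get(block, 0) | (1 << cache)
        return victim

    def remove_sharer(self, block: int, cache: int) -> None:
        """Drop `cache` from `block`'s sharers; free the entry when none remain."""
        bits = self.entries.get(block, 0) & ~(1 << cache)
        if bits:
            self.entries[block] = bits
        else:
            self.entries.pop(block, None)

    def sharers(self, block: int) -> List[int]:
        """Caches holding `block`; an uncached block has no entry at all."""
        return self._caches(self.entries.get(block, 0))
```

The point being illustrated is that storage scales with the number of pointer-cache entries rather than with the number of memory blocks. How a full pointer cache is handled is not described in the snippet above, so treat the oldest-first eviction here as a placeholder.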
D. J. Lilja and P.-C. Yew, "Improving Memory Utilization in Cache Coherence Directories," IEEE Trans. Parallel and Distributed Systems, 4(10):1130-1146, Oct. 1993.
....for bus-based systems. For more scalable systems that use general interconnection networks, directory-based hardware schemes [2, 3, 5, 15, 35, 38] and compiler-assisted software schemes [7, 18, 17, 25, 37] have been suggested. Recently, several authors have proposed dynamically tagged directories [6, 14, 22, 23, 24, 30] in which pointers to processors with a copy of a memory block are allocated only when the block is actually cached. These directories maintain a cache of pointers in each memory module. Typically, each pointer consists of an address tag to identify the block plus a bit vector to point to the ....
....the other hand, can perfectly disambiguate memory references at run time so that they invalidate only cache blocks that are actually stale. Unfortunately, directory-based schemes require a large amount of memory to store the cache block sharing information. Several dynamically tagged directories [6, 14, 22, 23, 24, 30] have been proposed to reduce the memory requirements. In dynamically tagged directories, a pointer from a cache of pointers in the directory is allocated to a particular memory block only when the block is moved to the data cache. Typically, each pointer consists of an address tag and a vector of ....
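The memory saving claimed for dynamically tagged directories can be estimated with a back-of-the-envelope calculation. This Python sketch assumes each pointer-cache entry holds a full block-address tag plus a bit vector with one bit per cache, and ignores state bits and replacement metadata; the module size, cache count, and entry count are illustrative assumptions, and both helper functions are hypothetical.

```python
import math


def full_map_bits(memory_blocks: int, num_caches: int) -> int:
    """Static full-map directory: one bit per cache for every memory block."""
    return memory_blocks * num_caches


def tagged_directory_bits(num_entries: int, memory_blocks: int, num_caches: int) -> int:
    """Dynamically tagged directory: each pointer is an address tag plus a bit vector.

    Assumes a fully associative pointer cache, so the tag must be able to
    name any block in the module (log2 of the block count).
    """
    tag_bits = math.ceil(math.log2(memory_blocks))
    return num_entries * (tag_bits + num_caches)


if __name__ == "__main__":
    # Illustrative sizes (assumptions): 2**21 blocks per module, 64 caches,
    # and a pointer cache with 2**14 entries.
    blocks, caches, entries = 2**21, 64, 2**14
    full = full_map_bits(blocks, caches)
    tagged = tagged_directory_bits(entries, blocks, caches)
    print(f"full-map: {full} bits, tagged: {tagged} bits, ratio {full / tagged:.1f}")
```

With these assumed sizes the full map needs about 96 times as many bits as the tagged directory. The comparison favors the tagged scheme only while the pointer cache has enough entries for the blocks that are actually cached at once, which is the premise these snippets rely on.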
D. J. Lilja and P. Yew. Improving memory utilization in cache coherence directories. IEEE Transactions on Parallel and Distributed Systems, 4(10):1130-1146, October 1993.
....the shared memory to keep track of which processors have cached copies of which blocks [2, 21, 24]. These traditional static directories typically maintain this information for each block in the shared memory, which can require an inordinate amount of hardware [12]. Dynamically tagged directories [5, 10, 13, 18] have been proposed to reduce the hardware requirements of a coherence directory. These tagged directories use special-purpose caches of pointers that are dynamically allocated to memory blocks to point to processors with a copy of a specific memory block only when that block is actually cached ....
....with cached copies of a block are associated with each block in the memory. Since the data caches are significantly smaller than the main memory, however, most of the memory blocks will not be cached at any given time. As a result, most of these pointer bits will be unused. The tagged directories [5, 10, 13, 18] take advantage of the fact that pointers from the directory to the processors are necessary only when a memory block is actually cached in one or more processors. The tagged directories dynamically allocate pointers to individual memory blocks when the block is moved from the memory to a data ....
David J. Lilja and Pen-Chung Yew, "Improving Memory Utilization in Cache Coherence Directories," IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 10, pp. 1130-1146, October 1993.
....that force invalidations, as shown in the last column of Table 2. This increasing number of invalidations can significantly increase the average memory delay, however, since the directory must wait for each processor with a cached copy of the block being invalidated to acknowledge the invalidation [17]. The adaptive prefetching scheme tends to produce the smallest number of invalidations, and thus will reduce the average memory delay when compared to the nonadaptive schemes. To summarize these tables, it should be noted that increasing the number of single word blocks fetched on a miss tends to ....
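The cost of waiting for acknowledgments described in this snippet can be sketched in Python. `PendingWrite` and `invalidation_delay` are hypothetical names, and the timing model (invalidations issued one after another at a fixed interval, each acknowledged after a fixed round trip) is an assumption for illustration, not the model used in the cited study.

```python
from typing import Iterable, List, Set


class PendingWrite:
    """Hypothetical directory bookkeeping for one write to a shared block.

    The write cannot complete until every other cache holding a copy has
    acknowledged its invalidation.
    """

    def __init__(self, sharers: Iterable[int], writer: int) -> None:
        self.waiting: Set[int] = set(sharers) - {writer}

    def invalidations_to_send(self) -> List[int]:
        return sorted(self.waiting)

    def ack(self, cache: int) -> bool:
        """Record one acknowledgment; True once the write may proceed."""
        self.waiting.discard(cache)
        return not self.waiting


def invalidation_delay(num_invalidations: int, round_trip: int, issue_interval: int = 1) -> int:
    """Cycles until the last acknowledgment arrives under a simple model.

    Assumes the directory issues one invalidation every `issue_interval`
    cycles and each acknowledgment returns `round_trip` cycles after its
    invalidation was sent.
    """
    if num_invalidations == 0:
        return 0
    return (num_invalidations - 1) * issue_interval + round_trip
```

Under this model the delay grows linearly with the number of copies invalidated, which is consistent with the observation that a scheme forcing fewer invalidations tends to lower the average memory delay.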
David J. Lilja and Pen-Chung Yew, "Improving Memory Utilization in Cache Coherence Directories," IEEE Transactions on Parallel and Distributed Systems, (to appear).
CiteSeer.IST - Copyright Penn State and NEC