A. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, Vol. 11, pp. 7--21, 1978.
....that with no additional fetches, this policy works about as well as the best fixed policy, chosen a posteriori. We define a refetch as a fetch of a previously seen object that is favored by the current policy but was discarded from the real cache at some time prior. Refetching is like prefetching [23, 8, 13], except that the refetched objects have recently been discarded. With refetching, the master policy can outperform the best fixed policy. In particular, when all required objects are refetched continuously, it has a 15-23% lower miss rate than the best fixed policy, and almost the same ....
....there need not be a single governing policy. Viewing the experts' predictions as rankings makes them easy to combine. How rankings can be computed, combined, used to facilitate replacements, and used to suggest refetches will be the topic of the following subsections. Refetches are like prefetches [23, 8, 13], except that the objects fetched have necessarily resided in the real cache at some time prior. More precisely, we define a refetch as a fetch of a previously seen object that was kept in the virtual cache(s) of higher-weight expert policies, but was discarded from the real cache. Based on these ....
A. J. Smith. Sequential program prefetching in memory hierarchies. IEEE Computer, 11(12):7--21, December 1978.
....without prefetching. Below we summarize some representative compiler-assisted and non-compiler-assisted techniques for prefetching. 2.1.1 Static Look-Ahead Schemes. One-block look-ahead (OBL), probably the simplest prefetch policy, has long been used to improve the performance of sequential workloads [54, 55]. There are several variations of OBL; all optimistically assume that the next block to be accessed is next to the current block, or located at a fixed distance S (stride-based prefetching). For applications with small loop execution times or high data request rates, Jouppi extended OBL to ....
SMITH, A. J. Sequential Program Prefetching in Memory Hierarchies. In IEEE Computer (Dec. 1978), vol. 11.
.... we experiment with the W2R algorithm [JN98]. We select the W2R scheme for comparison mainly because it is an integrated caching and prefetching algorithm like MICP, and performance results have shown that W2R outperforms caching and/or prefetching policies such as LRU, 2Q [JS94], and LRU-OBL [Smi78, Smi85]. However, since W2R has been designed for conventional page-based database systems, we need to adapt it to the characteristics of a mobile hybrid data delivery environment in order to make it competitive with MICP. Our goal was to redesign W2R in such a way that its original design ....
A. J. Smith. Sequential Program Prefetching in Memory Hierarchies. IEEE Computer, 3(3): pages 7-21, Dec. 1978.
....cache before its other parts are accessed, the CPM for this refill policy can be approximated as: CPM = latency (Eqn 4.33). As memory systems become more complex, it becomes increasingly difficult to accurately estimate values for CPM. For example, a memory system can prefetch instructions or data [Farrens89, Hill87, Smith78, Smith92, Pierce95], caches can be designed to pipeline accesses [Jouppi90, Olukotun92, Palcharla94] and process multiple outstanding misses in parallel (a nonblocking or lockup-free cache), and dirty write-backs can be queued and processed as a background operation [Smith82, Kessler91]. Each of these ....
Smith, A. J. Sequential program prefetching in memory hierarchies. IEEE Computer 11 (12): 7-21, 1978.
....degrade performance also becomes smaller. One technique to address the speed gap between processors and memory that has been advocated by many researchers is hardware data prefetching. Many hardware techniques have been proposed that target a specific access pattern: sequential prefetching [1, 2], stride prefetchers [3, 4, 5, 6], and prefetchers targeting pointer chasing, namely Mehrotra's technique [7] and dependence-based prefetching (DBP) [8]. There are also hardware techniques targeted at multiple reference patterns, e.g., Markov prefetching [9]. One limitation of previous work is ....
A. J. Smith. Sequential Program Prefetching in Memory Hierarchies. IEEE Computer, 11(12):7--21, December 1978.
....some preliminary results. We consider two simple prefetch methods. Whenever a miss occurs, the sequential prefetch method prefetches the next sequential line if it is not already in the cache. Upon the first access to a prefetched line, it triggers the prefetch of the following sequential line [22]. The filtered sequential prefetch method starts prefetching the next sequential line on a miss only when a previous sequential access pattern has been identified [18, 16]. In general, the filtered sequential scheme has a higher prefetch accuracy but results in a smaller cache hit ratio ....
....For the column-associative cache, we determine the secondary location by flipping the highest-order index bit. As described in [2], we include a rehash bit with each entry of the tag array to guide the search and replacement. We consider two simple data prefetching techniques, sequential prefetch [22] and filtered sequential prefetch [18, 16]. For the filter, we use an 8-entry history table to identify sequential access patterns. Based on these techniques, we evaluate the hit ratio improvement as well as the extra memory traffic generated by data prefetching. For the direct-mapped cache, we ....
A. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, Vol. 11(12), Dec. 1978, pp. 7--21.
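The sequential prefetch method quoted in the context above (prefetch the next line on a miss, then again on the first access to a prefetched line) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the simulator used by the citing paper or by Smith: the class name `TaggedSequentialCache`, the fully associative LRU organization, and the use of integer line numbers as addresses are all hypothetical choices made for brevity. The capacity is assumed to be at least 2 so that a prefetch never evicts the line just demanded.

```python
from collections import OrderedDict


class TaggedSequentialCache:
    """Toy fully associative LRU cache with tagged sequential prefetch.

    Hypothetical model: addresses are integer line numbers, and each cached
    line carries a tag bit that is True while the line has been prefetched
    but not yet referenced.
    """

    def __init__(self, capacity):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity
        self.lines = OrderedDict()  # line number -> tag bit, oldest first
        self.misses = 0
        self.prefetches = 0

    def _insert(self, line, prefetched):
        # Evict the least recently used line when the cache is full.
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[line] = prefetched

    def _prefetch(self, line):
        # Prefetch only if the line is not already in the cache.
        if line not in self.lines:
            self._insert(line, True)
            self.prefetches += 1

    def access(self, line):
        """Reference a line; return True on a hit, False on a miss."""
        if line in self.lines:
            first_use = self.lines[line]
            self.lines[line] = False        # clear the tag bit
            self.lines.move_to_end(line)    # mark as most recently used
            if first_use:
                self._prefetch(line + 1)    # first use of a prefetched line
            return True
        self.misses += 1
        self._insert(line, False)
        self._prefetch(line + 1)            # prefetch on miss
        return False
```

On a purely sequential reference stream, only the first reference misses in this model; every later line has already been brought in by the prefetch triggered one reference earlier.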
....is found to remove all load and store faults, after some initial misses for some applications. By contrast, for irregular algorithms it is less efficient owing to the fact that previous access patterns cannot guide which pages to prefetch. This motivates us to also adopt sequential prefetching [10,23] of consecutive pages that are not present in the local memory. While sequential prefetching alone is only effective when spatial locality is high, the combination of history and sequential prefetching improves performance for all applications. Our simulations assume ATM technology that is ....
....For example, consider multiplication of two matrices that each consist of several pages. Clearly, one could in this case benefit from prefetching consecutive pages on an access fault. This simple technique has been proposed in the context of hardware-based cache prefetch algorithms by Smith [23] and is known as sequential prefetching. We will simulate a variation of this simple scheme that, on the first access to a page, checks if any of the next N pages are invalid, where N is the degree of prefetching. If so, read prefetches are sent for those pages. 2.4 Combining History and Sequential ....
Smith, A. J., "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, Vol. 11, No. 12, December 1978, pp. 7-21.
....future, either by using the previous reference pattern, by prefetching all children of a node, or by regularizing data structures [Luk96]. Techniques have also been developed for prefetching instructions. The simplest is to prefetch the cache line directly after the line currently being executed [Smith78, Hsu98], since this is the next line needed unless a jump or branch intervenes. To handle such branches, information can be kept on all previous successors to this block [Kim93] or the most recent successor ("target prefetch") [Hsu98], and these lines prefetched. Alternatively, a lookahead PC can use ....
A. J. Smith, "Sequential Program Prefetching in Memory Hierarchies", Computer, pp. 7-21, December 1978.
....paper we used OM [24], which implements a modified Pettis and Hansen algorithm to do feedback-directed code layout. This algorithm is discussed further in Section 5.1. Our results show that using OM with CGP improves the performance by 45% over an O5-optimized binary. Next-N-line prefetching (NL) [21] is another prefetching technique that is often used. In this technique, when a line is being fetched by the CPU, the next N sequential lines are prefetched, unless they are already in cache. This scheme works well in programs that execute long sequences of straight-line code. CGP uses NL ....
A.J. Smith. Sequential Program Prefetching in Memory Hierarchies. IEEE Computer, 11(2):7--21, 1978.
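The next-N-line rule quoted in the context above (prefetch the N sequential lines following the one being fetched, unless they are already in cache) reduces to a small candidate-selection step. The sketch below is an assumption-laden illustration rather than the citing paper's hardware: the function name `next_n_line_candidates`, the integer line numbers, and the use of a Python set to stand in for the cache tag lookup are all hypothetical.

```python
def next_n_line_candidates(current_line, n, cached_lines):
    """Return the lines a next-N-line prefetcher would request.

    current_line: integer number of the line being fetched by the CPU.
    n:            prefetch degree (how many following lines to consider).
    cached_lines: set of line numbers already present in the cache.
    """
    following = range(current_line + 1, current_line + n + 1)
    # Skip any line that is already cached; only the rest are prefetched.
    return [line for line in following if line not in cached_lines]
```

For example, with line 10 being fetched, N = 4, and lines 11 and 13 already cached, the candidates are lines 12 and 14. A larger N requests lines further ahead, at the cost of more potentially useless prefetches.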
....80% of the prefetched lines arrived in cache before they were referenced, even on a 4-wide issue machine with a 15-cycle L2 access penalty. Prior hardware-based instruction prefetching techniques can be broadly classified into sequential and non-sequential techniques. The sequential techniques [7, 14] prefetch one or more physically contiguous (sequential) cache lines from the memory and are easy to implement. Although they achieve good miss coverage, we show that the prefetches are not issued early enough to cover the access latency of the L2 cache. Moreover, these techniques do not attempt ....
.... Furthermore, despite the fact that the target address of a branch instruction is the beginning of a basic block which may span multiple cache lines, most of the prior techniques prefetch only the cache line containing the target address; these techniques rely on next-sequential prefetching [7, 14] to prefetch the remaining lines of the basic block. However, BHGP maintains both the address and the length of prefetch candidate blocks, so that entire blocks can be prefetched in a timely fashion. In our evaluations, BHGP on average eliminates 66% of the I-cache misses for some important ....
A. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, No. 11, Dec. 1978, pp. 7-21.
....main memory into the cache. In effect, such block memory transfers prefetch the words surrounding the current reference in hope of taking advantage of the spatial locality of memory references. Hardware prefetching of separate cache blocks was later implemented in the IBM 370/168 and Amdahl 470V [50]. Software techniques are more recent. Smith first alluded to this idea in his survey of cache memories [49], but at that time doubted its usefulness. Later, Porterfield [40] proposed the idea of a cache load instruction, with several RISC implementations following shortly thereafter. Prefetching ....
Smith, A.J., "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, Vol. 11, No. 12, December 1978, p. 7-21.
....Prefetching. Work on hardware sequential prefetching for serial systems dates back to the '70s, when the number of instructions that could execute in a miss delay was much smaller than it is today. Sequential prefetching schemes were investigated by Bennett et al. [1,2], Gindele [11], and Smith [24], and are summarized and evaluated by Smith in [25]. These schemes vary in how prefetching is initiated, for example, on all accesses, all misses, or on all misses and first hits to prefetched lines. Always prefetching and tagged prefetch reduced half or more misses, while prefetch on miss was less ....
A.J. Smith, "Sequential program prefetching in memory hierarchies," IEEE Computer, vol. 11, no. 12, pp.7-21, Dec. 1978
....previously, we limit our scope to widely shared data. Even with these restrictions, our experiments based on an accurate execution-driven simulator with ILP processors indicate that substantial improvements in execution time (up to 37%) are possible. Our method is germane to both data prefetching [5, 20, 22, 18, 27, 14, 8, 2] and producer-initiated communication (also known as data forwarding) [26, 16, 1]. Hence, we think that a short comparison is required. Memory controller forwarding differs from producer-initiated communication in that it does not require any specialized compiler intervention and/or modifications ....
A. J. Smith. Sequential program prefetching in memory hierarchies. IEEE Computer, 11(12):7-21, Dec. 1978.
....a miss. By contrast, hardware-controlled prefetching schemes utilize the regularity of data accesses in applications, and need no software support to decide what and when to prefetch. Two promising non-binding hardware-based prefetching strategies in shared-memory multiprocessors are sequential [6, 20] and stride prefetching [5, 9, 13, 19]. Sequential prefetching tries to exploit spatial locality across block boundaries by prefetching consecutive blocks in anticipation of future misses. By contrast, stride prefetching detects and prefetches blocks associated with strides only but does not ....
A.J. Smith, "Sequential Program Prefetching in Memory Hierarchies," in IEEE Comput., Vol. 11, No. 12, pp.7-21, Dec. 1978.
....main memory into the cache. In effect, such block memory transfers prefetch the words surrounding the current reference in hope of taking advantage of the spatial locality of memory references. Hardware prefetching of separate cache blocks was later implemented in the IBM 370/168 and Amdahl 470V [33]. Software techniques are more recent. Smith first alluded to this idea in his survey of cache memories [34], but at that time doubted its usefulness. Later, Porterfield [29] proposed the idea of a cache load instruction, with several RISC implementations following shortly thereafter. Prefetching ....
Smith, A.J., "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, Vol. 11, No. 12, December 1978, p. 7-21.
....prefetching has unfortunately not gained widespread acceptance in industry; few contemporary computer designs provide hardware prefetching of the sort described in the works discussed below. Prior research on hardware prefetching: Some of the first work on hardware prefetching was by Smith [87, 88]. Smith concentrated on a simple sequential prefetching scheme called one-block lookahead, in which the hardware fetches block (i+1) on an access to (or cache miss on) block i. He used a cache simulator that analyzed traces generated from a set of programs running on an IBM mainframe; his ....
A. J. Smith. Sequential program prefetching in memory hierarchies. Computer, 11(12):7--21, December 1978.
....Each request can be serviced more rapidly, however, and requests from different processors may get interleaved. This latter feature is particularly important under tight synchronization constraints. Although sequential prefetching has been studied extensively in the context of uniprocessors (e.g., [Smith, 1978]), the same is not true for multiprocessors. [Dahlgren et al., 1993] compares the performance of sequential prefetching with an adaptive sequential prefetching technique for scalable multiprocessors. They also studied the performance of large cache blocks that fetch the same amount of data as the ....
A. J. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, 11(12):7--21, December 1978.
....On the other hand, cold-start misses and misses due to context switches are becoming the major source of cache misses [7]. There have been a number of attempts to reduce the cold-start and context-switch misses. One such attempt was to use prefetch techniques such as sequential prefetching [10] or implicit prefetching using large cache block sizes. However, both of these techniques fall short of delivering the optimal performance because the prefetched words may never be requested by the processor when the prefetch prediction is wrong. In fact, prefetching can actually degrade the ....
....speed up. In the third case, the prefetch failed and the mechanism degenerates into demand fetching. In all three cases considered, all the previous prefetch requests in progress are aborted, since the need for the prefetched block is already satisfied. 3.1 Design Decisions. Smith in [10] identified the following three important issues in the design of a prefetch scheme [footnote 1]. [Footnote 1:] In fact, Smith identified two more issues (move-out and move-in overheads) in the design of a prefetch scheme. They are, respectively, the costs for writing back a dirty cache block if a prefetched block ....
A. J. Smith. Sequential program prefetching in memory hierarchies. IEEE Computer, pages 7--21, Dec. 1978.
....of long-latency memory operations. Prefetching has been researched heavily, and for a long time. Hardware, software, and hybrid hardware/software schemes have all been extensively explored for prefetching instructions and data references, both in the context of uniprocessors and multiprocessors [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]. 2.2 Instruction prefetching and branch prediction. The instruction prefetching and branch prediction problems are closely related. Efficient solutions for both are critical, because they affect the degree to which speculative execution is effective in current machines. In their research, ....
A. J. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, vol. 11, pp. 7--21, December 1978.
....Reduction. Caches and write buffers use locality of reference and the size/speed characteristic of small memory structures to provide low-latency, high-bandwidth access on the frequent cache-hit case, and long-latency, low-bandwidth access on the infrequent cache-miss case. Hardware prefetching [5], stream buffers [6], and software-directed prefetching [7] effectively trade increased bandwidth for reduced latency. Latency Tolerance. Non-blocking and lockup-free caches [8] lessen the impact of data dependencies by allowing execution to proceed while a cache miss is serviced. Dynamic ....
.... [Figure 2: Bus Interface Unit Block Diagram. Timing diagram not reproduced; signals shown: Transmit clock H, Data 31:00, Arbitration Bus 1:0.] Arbitration Bus Encodings: 00: Bus is idle. 01: MMU is transmitting to CPU. 10: CPU is transmitting to MMU. 11: Collision state. Either the CPU and MMU simultaneously initiated a transaction, resulting in a true collision, or the MMU or CPU rFIFO filled and the receiving chip ....
A. Smith, "Sequential program prefetching in memory hierarchies," IEEE Computer, vol. 11, no. 12, pp. 7--21, December 1978.
....approach eliminates the unpredictability caused by task preemption, it has a disadvantage of limiting the total caching capacity that is available to a task. 2.2. Instruction prefetching. Prefetching techniques improve system performance by prefetching memory blocks before they are actually needed [22]. In the literature, both instruction prefetching and data prefetching [3][4][15] have been studied, although we restrict ourselves to instruction prefetching in this paper. In the past, instruction prefetching has been limited to sequential prefetching. Smith, in [21], studies the following ....
A. J. Smith. Sequential program prefetching in memory hierarchies. IEEE Computer, pages 7--21, Dec. 1978.
....Instruction Prefetching. Several researchers have considered instruction prefetching in the past. We will begin by discussing and then quantitatively evaluating four of the most promising techniques that have been proposed to date, all of which are purely hardware-based: next-N-line prefetching [10, 11], target-line prefetching [12], wrong-path prefetching [8], and Markov prefetching [3]. Before we begin our discussion, we briefly introduce some prefetching terminology. The coverage factor is the fraction of original cache misses that are prefetched. A prefetch is unnecessary if the line is ....
.... the elapsed time between initiating and consuming the result of a prefetch) should be large enough to fully hide the miss latency, but not so large that the line is likely to be displaced by other accesses before it can be used (i.e., a useless prefetch). The idea behind next-N-line prefetching [10, 11] is to prefetch the N sequential lines following the one currently being fetched by the CPU. A larger value of N tends to increase the prefetching distance, but also increases the likelihood of polluting the cache with useless prefetches. The optimal value of N depends on the line size, the cache ....
A. Smith. Sequential program prefetching in memory hierarchies. IEEE Computer, 11(2):7--21, 1978.
....section also describes extensions of the proposed technique. Finally, we give our concluding remarks in Section 4. 2 Related work. Caches are small buffer memories used for speeding up memory access. They maintain parts of main memory that are expected to be accessed by the CPU in the near future [17]. The use of caches has been an efficient means of bridging the speed gap between high-speed processors and relatively slow main memory. However, designers of hard real-time systems are wary of using caches in their systems due to the unavailability of a technique for accurately analyzing their ....
A. J. Smith. Sequential program prefetching in memory hierarchies. IEEE Computer, pages 7--21, Dec. 1978.
....on Instruction Prefetching. There has been a long history of research on instruction prefetching. We will begin by discussing and then quantitatively evaluating four of the most promising techniques that have been proposed to date, all of which are purely hardware-based: next-N-line prefetching [10, 11], target-line prefetching [12], wrong-path prefetching [8], and Markov prefetching [3]. Before we begin our discussion, we briefly introduce some prefetching terminology. The coverage factor is the fraction of original cache misses that are prefetched. A prefetch is unnecessary if the line is ....
....instruction is used. The prefetching distance should be large enough to fully hide the cache miss latency, but not so large that the line is likely to be displaced by other accesses before it can be used (i.e., a useless prefetch). As its name implies, the idea behind next-N-line prefetching [10, 11] is to prefetch the N sequential lines following the one currently being fetched by the CPU. A larger value of N tends to increase the prefetching distance, but also increases the likelihood of polluting the cache with useless prefetches. The optimal value of N depends on the line size, the cache ....
A. Smith. Sequential program prefetching in memory hierarchies. IEEE Computer, 11(2):7--21, 1978.
No context found.
A. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, Vol. 11(12), Dec. 1978, pp. 7-21.
....static representation. This can cause capacity misses and exacerbate the compulsory miss problem. It also reduces the robustness of trace caches to varying workloads and environments. Instruction prefetching is a common remedy for capacity and compulsory misses in conventional instruction caches [12][14][15]. When applying the concept of prefetching to trace caches, the dynamic aspect of traces presents a number of obstacles. First, trace caches are not part of a true memory hierarchy, as there is no base level that contains all possible traces. Therefore the term prefetching is not ....
....to service the multiple constructor units. The benefit of this extra hardware can be substantial, and our performance results in section 5 use these performance enhancements. 3.1 Preconstruction Buffers. When prefetching into conventional instruction caches, it is common to use prefetch buffers [12][14][15]. The prefetch buffers and cache are accessed in parallel. If the cache misses, but the line is in the prefetch buffer, then it is copied into the cache. Using prefetch buffers in this way avoids polluting the instruction cache whenever prefetched instructions are not actually used. ....
A. J. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer 11 (12), pp. 7-21, Dec 1978.
....pre-loading some buffer area (perhaps the cache) with data in order to reduce future misses. If the anticipation strategy is successful, data will be available in fast storage when needed by the program, thus reducing the miss rate. Prefetching has been studied in many settings in the past. Smith [17] discussed sequential prefetching in the general case of a multi-tiered hierarchy of memories. Based on traces derived from IBM 370 architecture machines, he showed that prefetching was most effective with small page sizes (e.g., 32-64 bytes) and suggested the conclusion that the CPU's cache was ....
A. J. Smith, "Sequential program prefetching in memory hierarchies," IEEE Computer, pp. 7-21 (December 1978).
No context found.
A. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, Vol. 11, pp. 7--21, 1978.
No context found.
A.J. Smith. "Sequential Program Prefetching in Memory Hierarchies". IEEE Computer (11) 12, Dec. 1978: 7-21.
No context found.
A.J. Smith, "Sequential program prefetching in memory hierarchies," IEEE Computer, vol. 11, no. 12, pp.7-21, Dec. 1978
No context found.
A. J. Smith, "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer 11 (12), pp. 7-21, Dec 1978.
No context found.
Smith, A., Sequential Program Prefetching in Memory Hierarchies, IEEE Computer, 11(12), December 1978.
No context found.
Smith, A. (1978a). Sequential Program Prefetching in Memory Hierarchies.
No context found.
Smith, A.J., "Sequential Program Prefetching in Memory Hierarchies," IEEE Computer, December 1978, pp. 7--21.