Run-time spatial locality detection and optimization (1997)
Citations: 51 (1 self-citation)

References cited by this paper (the leading number on each entry is that work's own citation count)
926 | Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
- Jouppi
- 1998
Citation Context ...etch size to vary based on the detected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-structure...
681 | Cache Memories
- Smith
Citation Context ...e fetch size to vary based on the detected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-struct...
495 | Design and evaluation of a compiler algorithm for prefetching
- Mowry, Lam, et al.
- 1992
Citation Context ...etected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-structured loops, which are access pattern...
253 | An effective on-chip preloading scheme to reduce data access penalty
- Baer, Chen
- 1991
Citation Context ...h size to vary based on the detected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-structured l...
216 | IMPACT: an architectural framework for multiple-instruction-issue processors
- Chang, Mahlke, et al.
- 1991
Citation Context ...erence inputs, and 099.go, 147.vortex, 130.li, 134.perl, and 124.m88ksim from the SPEC95 benchmark suite using the training inputs. The last two benchmarks consist of modules from the IMPACT compiler [24] that we felt were representative of many real-world integer applications. Pcode, the front end of IMPACT, is run performing dependence analysis with the internal representation of the combine.c file ...
202 | Compiler-based prefetching for recursive data structures
- Luk, Mowry
- 1996
Citation Context ... many of these methods focus on prefetching regular array accesses within well-structured loops, which are access patterns primarily found in numeric codes. Other methods geared towards integer codes [15][16] focus on compiler-inserted prefetching of pointer targets, and could be used in conjunction with our techniques. The dual data cache [17] attempts to intelligently exploit both spatial and tempor...
152 | Stride Directed Prefetching in Scalar Processors
- Fu, Patel, et al.
- 1992
Citation Context ...to vary based on the detected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-structured loops, wh...
152 | A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality
- Gonzalez, Aliagas, et al.
- 1997
Citation Context ...numeric codes. Other methods geared towards integer codes [15][16] focus on compiler-inserted prefetching of pointer targets, and could be used in conjunction with our techniques. The dual data cache [17] attempts to intelligently exploit both spatial and temporal locality; however, the temporal and spatial data must be placed in separate structures, and therefore the relative amounts of each type of d...
135 | Software Methods for Improvement of Cache Performance
- Porterfield
- 1989
Citation Context ...he detected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-structured loops, which are access pat...
124 | A Modified Approach to Data Cache Management
- Tyson, Farrens, et al.
- 1995
Citation Context ...zed in this work. We showed that cache bypassing decisions could be effectively made at run-time, based on the previous usage of the memory address being accessed. Other bypassing schemes include [20][21][17][22]. In particular, our scheme dynamically kept track of the access frequencies of memory regions called macroblocks. The macroblocks are statically defined blocks of memory with uniform size, ...
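As a rough illustration of the macroblock tracking described in this context, the Python sketch below keeps a saturating access counter per statically defined, uniform-size macroblock and uses those counters to make a bypass decision on a miss. The 1 KB macroblock size, the 4-bit counter width, and the rule "bypass when the missing data's macroblock has been accessed less often than the victim's" are assumptions for illustration, not parameters taken from the paper; the class name borrows the paper's term, but its structure is hypothetical.

```python
from collections import defaultdict

MACROBLOCK_BYTES = 1024  # assumed macroblock size (must be a power of two)
COUNTER_MAX = 15         # assumed 4-bit saturating counter


class MemoryAddressTable:
    """Hypothetical per-macroblock access-frequency table (illustrative only)."""

    def __init__(self, macroblock_bytes=MACROBLOCK_BYTES, counter_max=COUNTER_MAX):
        if macroblock_bytes <= 0 or macroblock_bytes & (macroblock_bytes - 1):
            raise ValueError("macroblock size must be a power of two")
        self.shift = macroblock_bytes.bit_length() - 1  # log2 of the macroblock size
        self.counter_max = counter_max
        self.counters = defaultdict(int)  # macroblock index -> access count

    def macroblock(self, addr):
        # Statically defined, uniform-size regions: drop the low-order offset bits.
        return addr >> self.shift

    def record_access(self, addr):
        mb = self.macroblock(addr)
        self.counters[mb] = min(self.counters[mb] + 1, self.counter_max)

    def should_bypass(self, miss_addr, victim_addr):
        # Assumed policy: do not displace a block whose macroblock is used
        # more frequently than the macroblock of the data that just missed.
        return self.counters[self.macroblock(miss_addr)] < self.counters[self.macroblock(victim_addr)]
```

For example, after ten accesses to address 0x1000 and one to 0x8000, `should_bypass(0x8000, 0x1000)` returns `True`, so the rarely used data would not evict the frequently used block.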
113 | Run-time adaptive cache hierarchy management via reference analysis
- Johnson, Hwu
- 1997
Citation Context ...cesses to multiple adjacent cache blocks, facilitating detection of spatial locality across those blocks while they are cached. The resulting information is later recorded in the Memory Address Table [3] for long-term tracking of larger regions called macroblocks. We show that these extensions to the cache microarchitecture significantly improve the performance of integer applications, achieving up t...
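The two-level idea in this context (observe use of adjacent blocks while they are cached, then record a longer-term summary per macroblock) can be sketched as follows. Everything concrete here is an assumption: 32-byte small fetches, 128-byte large fetches, 1 KB macroblocks, a 2-bit saturating "spatial" counter, and the rule that a region counts as showing spatial locality when more than one of its lines was touched before eviction. The class and method names are hypothetical.

```python
SMALL_FETCH = 32     # assumed small fetch size in bytes (one cache line)
LARGE_FETCH = 128    # assumed large fetch size in bytes (four adjacent lines)
MACROBLOCK = 1024    # assumed macroblock size in bytes
SPATIAL_MAX = 3      # assumed 2-bit saturating spatial counter
THRESHOLD = 2        # choose the large fetch when counter >= THRESHOLD


class SpatialLocalityDetector:
    """Hypothetical sketch: detect use of adjacent lines, summarize per macroblock."""

    def __init__(self):
        self.touched = {}  # large-region index -> bitmask of lines accessed while cached
        self.spatial = {}  # macroblock index -> saturating spatial counter

    def access(self, addr):
        region = addr // LARGE_FETCH
        line = (addr % LARGE_FETCH) // SMALL_FETCH
        self.touched[region] = self.touched.get(region, 0) | (1 << line)

    def evict(self, addr):
        # When the region leaves the cache, record whether more than one of its
        # adjacent lines was used, and fold that into the macroblock's counter.
        region = addr // LARGE_FETCH
        mask = self.touched.pop(region, 0)
        mb = addr // MACROBLOCK
        count = self.spatial.get(mb, 0)
        if bin(mask).count("1") > 1:
            count = min(count + 1, SPATIAL_MAX)
        else:
            count = max(count - 1, 0)
        self.spatial[mb] = count

    def fetch_size(self, miss_addr):
        # Long-term macroblock history decides how much to fetch on a miss.
        mb = miss_addr // MACROBLOCK
        return LARGE_FETCH if self.spatial.get(mb, 0) >= THRESHOLD else SMALL_FETCH
```

The saturating counter gives some hysteresis: a single eviction without spatial reuse lowers the counter by one step rather than immediately reverting the macroblock to small fetches from the maximum.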
107 | Data Prefetching in Multiprocessor Vector Cache Memories
- Fu, Patel
- 1991
Citation Context ... lines, equal to the smaller fetch size, and optionally fill in multiple, consecutive blocks when the larger fetch size is chosen. This approach is similar to that used in some prefetching strategies [23]. As a result, the cache can be fully utilized, even when the smaller sizes are fetched. It also eliminates conflict misses resulting from accesses to different subblocks. However, this approach makes...
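With lines equal to the smaller fetch size, a larger fetch simply fills several consecutive lines. The sketch below computes which line addresses would be filled on a miss; it assumes power-of-two sizes, 32-byte lines by default, and a large fetch aligned to its own size. The function name and the alignment rule are illustrative assumptions.

```python
def lines_to_fill(miss_addr, fetch_size, line_size=32):
    """Addresses of the consecutive small lines filled on a miss.

    Assumes power-of-two sizes and a fetch aligned to its own size.
    """
    for size in (fetch_size, line_size):
        if size <= 0 or size & (size - 1):
            raise ValueError("sizes must be powers of two")
    if fetch_size < line_size:
        raise ValueError("fetch size must be at least one line")
    base = miss_addr & ~(fetch_size - 1)  # align down to the fetch boundary
    return [base + offset for offset in range(0, fetch_size, line_size)]
```

For a miss at 0x1234, a 32-byte fetch fills only the line at 0x1220, while a 128-byte fetch fills 0x1200, 0x1220, 0x1240, and 0x1260. Because each filled line is tagged independently, a small fetch does not leave empty subblocks occupying cache space.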
90 | Efficient simulation of caches under optimal replacement with applications to miss characterization
- Sugumar, Abraham
- 1993
Citation Context ...tilized in this work. We showed that cache bypassing decisions could be effectively made at run-time, based on the previous usage of the memory address being accessed. Other bypassing schemes include [20][21][17][22]. In particular, our scheme dynamically kept track of the access frequencies of memory regions called macroblocks. The macroblocks are statically defined blocks of memory with uniform si...
84 | Reducing conflicts in direct-mapped caches with temporality-based design
- Rivers, Davidson
- 1996
Citation Context ...his work. We showed that cache bypassing decisions could be effectively made at run-time, based on the previous usage of the memory address being accessed. Other bypassing schemes include [20][21][17][22]. In particular, our scheme dynamically kept track of the access frequencies of memory regions called macroblocks. The macroblocks are statically defined blocks of memory with uniform size, larger t...
70 | A quantitative analysis of loop nest locality
- McKinley, Temam
- 1996
Citation Context ...-byte block boundaries). We chose this block size because past studies have found that 16 or 32-byte block sizes maximize data cache performance [4]. These measurement techniques differ from those in [19], which explicitly measure the reuse distance (in time). Our goal is to measure both the reused and unused portions of the cache blocks, for different cache organizations. Figure 1(a) shows the spati...
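The measurement goal in this context (how much of each cache block is actually used before it leaves the cache) can be approximated with a small trace-driven simulation. The sketch below assumes a direct-mapped cache, 32-byte blocks, and 8-byte word granularity; these parameters, the function name, and the choice to count at eviction time are illustrative assumptions, not the paper's exact methodology.

```python
from collections import Counter


def used_words_histogram(trace, cache_bytes=1024, block_bytes=32, word_bytes=8):
    """Histogram of how many words of each block were touched before eviction.

    Simulates a direct-mapped cache over a sequence of byte addresses. Blocks
    still resident at the end of the trace are counted as if evicted.
    """
    num_sets = cache_bytes // block_bytes
    tags = [None] * num_sets  # block number currently held in each set
    used = [0] * num_sets     # bitmask of words touched in the resident block
    hist = Counter()
    for addr in trace:
        block = addr // block_bytes
        s = block % num_sets
        if tags[s] != block:            # miss: retire the old block, if any
            if tags[s] is not None:
                hist[bin(used[s]).count("1")] += 1
            tags[s], used[s] = block, 0
        used[s] |= 1 << ((addr % block_bytes) // word_bytes)
    for s in range(num_sets):           # flush blocks left in the cache
        if tags[s] is not None:
            hist[bin(used[s]).count("1")] += 1
    return hist
```

A sequential walk such as `range(0, 64, 8)` yields `Counter({4: 2})` (both blocks fully used), while a 32-byte stride yields one used word per block; changing `cache_bytes` shows how conflict evictions shrink the used portion for a given organization.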
69 | Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
- Chen, Mahlke, et al.
- 1991
Citation Context ...ted spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-structured loops, which are access patterns pr...
60 | Line (Block) Size Choice for CPU Cache Memories
- Smith
- 1987
Citation Context ... Section 6 performs a cost analysis of the added hardware; and Section 7 concludes with future directions. 2 Related Work Several studies have examined the performance effects of cache block sizes [4][5]. One of the studies allowed multiple consecutive blocks to be fetched with one request [4], and found that for data caches the optimal statically-determined fetch size was generally twice the block s...
59 | SPAID: Software prefetching in pointer- and call-intensive environments
- Lipasti, Schmidt, et al.
- 1995
Citation Context ...y of these methods focus on prefetching regular array accesses within well-structured loops, which are access patterns primarily found in numeric codes. Other methods geared towards integer codes [15][16] focus on compiler-inserted prefetching of pointer targets, and could be used in conjunction with our techniques. The dual data cache [17] attempts to intelligently exploit both spatial and temporal l...
51 | The Performance Impact of Block Sizes and Fetch Strategies
- Przybylski
- 1990
41 | Fixed and adaptive sequential prefetching in shared memory multiprocessors
- Dahlgren, Dubois, et al.
- 1993
Citation Context ...r, we allow the fetch size to vary based on the detected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses with...
29 | The split temporal/spatial cache: initial performance analysis
- Milutinovic, Markovic, et al.
- 1996
Citation Context ... the spatial locality detection method was tuned to numeric codes with constant stride vectors. In integer codes, the spatial locality patterns may not be as regular. The split temporal/spatial cache [18] is similar in structure to the dual data cache; however, the run-time locality detection mechanism is quite different from that of both the dual data cache and this paper. 3 Spatial Locality Caches se...
26 | Predicting and precluding problems with memory latency
- Boland, Dollas
- 1994
Citation Context ...spatial locality is absent, and the prefetching effect of large fetch sizes when spatial locality exists. 1 Introduction This paper introduces an approach to solving the growing memory latency problem [1] by intelligently exploiting spatial locality. Spatial locality refers to the tendency for neighboring memory locations to be referenced close together in time. Traditionally there have been two main ...
13 | Predicting and Precluding Problems with Memory Latency
- Boland, Dollas
- 1994
Citation Context ...patial locality is absent, and the prefetching effect of large fetch sizes when spatial locality exists. 1 Introduction This paper introduces an approach to solving the growing memory latency problem [2] by intelligently exploiting spatial locality. Spatial locality refers to the tendency for neighboring memory locations to be referenced close together in time. Traditionally there have been two main ...
12 | How to simulate 100 billion references cheaply
- Fu, Patel
- 1991
Citation Context ...le 2. The base machine configuration is described in Table 3. Since simulating the entire applications at this level of detail would be impractical, uniform sampling is used to reduce simulation time [25]; however, emulation is still performed ... [Technical Report IMPACT-97-02] Function latencies: Int ALU 1, FP ALU 2, memory load 2, FP multiply 2, memory store 1, FP divide (single prec.) 8, branch...
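This context mentions uniform sampling to cut simulation time. One common form alternates short detailed-simulation windows with longer stretches of faster emulation at a fixed period; the sketch below only computes such a schedule over an instruction count. The period and sample lengths are hypothetical, and because the snippet is cut off before saying exactly what emulation does between samples, nothing here models that.

```python
def sampling_schedule(total, period, sample):
    """Split [0, total) instructions into uniformly spaced detailed windows.

    Each period starts with `sample` instructions of detailed simulation;
    the remainder of the period is marked for (faster) emulation.
    """
    if not 0 < sample <= period:
        raise ValueError("need 0 < sample <= period")
    segments = []
    for start in range(0, total, period):
        detail_end = min(start + sample, total)
        segments.append(("detailed", start, detail_end))
        end = min(start + period, total)
        if detail_end < end:
            segments.append(("emulated", detail_end, end))
    return segments
```

With `sampling_schedule(250, 100, 10)`, only 30 of the 250 instructions fall in detailed windows, which is where the time savings come from; the accuracy of such a scheme depends on the windows being representative of the whole run.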
9 | Classifying the performance potential of a data prefetch mechanism for pointer-intensive and numeric programs
- Mehrotra, Harrison
- 1995
Citation Context ...ize to vary based on the detected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [6]. Hardware [7][8][9][10][11] and software [12][13][14] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-structured loops...
9 | Software methods for improvement of cache performance on supercomputer applications
- Porterfield
- 1989
Citation Context ... the detected spatial locality. Another method allows the number of blocks fetched on a miss to vary across program execution, but not across different data [5]. Hardware [6][7][8][9][10] and software [11][12][13] prefetching methods for uniprocessor machines have been proposed. However, many of these methods focus on prefetching regular array accesses within well-structured loops, which are access pat...
5 | Reducing Conflicts in Direct-Mapped Caches with a Temporality-Based Design
- Rivers, Davidson
- 1996
Citation Context ...this work. We showed that cache bypassing decisions could be effectively made at run-time, based on the previous usage of the memory address being accessed. Other bypassing schemes include [19][20][16][21]. In particular, our scheme dynamically kept track of the access frequencies of memory regions called macroblocks. The macroblocks are statically defined blocks of memory with uniform size, larger th...
5 | A Modified Approach to Data Cache Management
- Tyson, Farrens, et al.
- 1995
Citation Context ...ized in this work. We showed that cache bypassing decisions could be effectively made at run-time, based on the previous usage of the memory address being accessed. Other bypassing schemes include [20][21][17][22]. In particular, our scheme dynamically kept track of the access frequencies of memory regions called macroblocks. The macroblocks are statically defined blocks of memory with uniform size, l...