96 citations found.
T.-F. Chen and J.-L. Baer. Effective Hardware-Based Data Prefetching for High Performance Processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.

This paper is cited in the following contexts:

Data Locality Optimizations for Multigrid Methods on Structured.. - Weiß

....One of the simplest hardware-based prefetching schemes is sequential prefetching [Smi82]. Whenever a cache line l is accessed, the cache line l + 1 and maybe some subsequent cache lines are prefetched. More sophisticated prefetch schemes have been invented by researchers [Jou90, JT93, CB95], but most microprocessors still implement only stride-one stream detection or even no prefetching. In general, prefetching will only be successful when the data stream is predicted correctly (in hardware or by a compiler) and if there is enough space left in the cache to keep the prefetched data ....

T.-F. Chen and J.-L. Baer. Effective Hardware-Based Data Prefetching for High-Performance Processors. IEEE Transactions on Computers, 44(5):609--623, 1995.
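
The sequential scheme described in the excerpt above can be illustrated with a minimal sketch. The function name `sequential_prefetch` and the `degree` parameter (how many subsequent lines to fetch) are hypothetical names chosen for illustration; they do not come from any of the cited papers.

```python
def sequential_prefetch(line, degree=1):
    """Return the cache-line numbers to prefetch after an access to `line`.

    Hypothetical helper: with degree=1 only line + 1 is prefetched;
    a larger degree also covers some subsequent lines.
    """
    return [line + k for k in range(1, degree + 1)]


if __name__ == "__main__":
    print(sequential_prefetch(10))      # next line only
    print(sequential_prefetch(10, 3))   # next three lines
```

A real cache would also check whether the candidate lines are already present before issuing requests; that check is omitted here.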


Efficient Integration of Compiler-directed Cache Coherence And.. - Lim, Yew (2000)   (1 citation)

....CCDP scheme. Although our Cray T3D implementation provided substantial performance improvements for the system [20, 21] the performance of the scheme can be further improved by optimizing the prefetch hardware support. Various researchers have developed sophisticated hardware prefetching schemes [4, 10, 11, 15, 18]. These schemes make use of hardware features to dynamically predict the data references to prefetch at run time. However, these prefetch hardware designs are not suitable for the CCDP scheme because they cannot distinguish between potentially stale and nonstale references and take proper actions ....

T.-F. Chen and J.-L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


An Automated Method for Software Controlled Cache Prefetching - Zucker, Lee, Flynn (1998)   (7 citations)

....have proposed a similar structure called the reference prediction table. Their scheme additionally includes state bits, so that state information can be maintained concerning the character of each memory operation. This is then used to limit unnecessary prefetching. Further analysis of this scheme [7] investigates the timing issues of prefetching by use of a cycle-by-cycle processor simulation. Sklenar [8] presents a third variation on the same theme of the use of an external table to predict future memory references. A number of techniques also exist to do software prefetching. Porterfield ....

Tien-Fu Chen and Jean-Loup Baer, "Effective hardware-based data prefetching for high-performance processors," IEEE Transactions on Computers, vol. 44, pp. 609-623, May 1995.


Improving the Performance of Software Distributed Shared.. - Kistler, Alvisi

....be required by the critical path and then perform this processing ahead of the point where its outcome is required. Speculation has been applied in the past to a variety of contexts in computer hardware and software, including branch prediction [21], value prediction [19], and data prefetching [5, 16]. The focus of our work is to study how speculation can be used to improve the performance of a software DSM system. Since the primary source of latency in these systems is the time required to obtain a copy of remote data for access by the local processor, we attempt to identify which remote ....

T.-F. Chen and J.-L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


A Combined Hardware/Software Solution for Stream.. - Struik, van der.. (1998)   (1 citation)

....is introduced to keep track of the process of issuing the prefetch requests. Again, there is one table entry per stream. Whereas the pure hardware solution fills an entry in the prediction table each time a new load instruction is executed, the hw/sw stream prefetch technique uses an instruction [6] to fill the prediction table. Such a stream prefetch instruction is inserted before the loop that processes the corresponding stream, as in the following code fragment: [fragment garbled in extraction: streamprefetch ( S, process( s[i] ] Either the programmer inserts a stream prefetch instruction ....

....like e.g. the run ahead. There are a number of ways to identify a stream. How this is done also determines how to synchronize the issuing of prefetch requests. One possibility is to identify a stream by the instruction address of the memory operation (often a load operation) that operates on it [6]. In that case, prefetching is controlled by synchronization on the instruction address. Each time the memory operation's instruction address hits in the prefetch table it is determined whether or not to issue a prefetch. The instruction address of a memory operation is one of the parameters of ....

Tien-Fu Chen and Jean-Loup Baer, "Effective hardware-based data prefetching for high-performance processors", IEEE Transactions on Computers, Vol. 44, No. 5, May 1995.


Hardware versus Hybrid Data Prefetching in.. - Pimentel.. (2000)   (2 citations)

....Subsequently, a request is issued to prefetch data from the location which is anticipated to be accessed next, being the current data address plus the stride. So, this method synchronises the issuing of the prefetches using the PC. Chen and Baer proposed several optimisations to this scheme [2]. In order to reduce the number of erroneous .... [Figure 1. Software prefetching (a) and hybrid hardware/software prefetching (b)] The parameters of the ....

....buffer which ignores new prefetch requests when it is full. The data cache model includes a 4-way set-associative SPT for hardware prefetching and an 8-way fully associative PIT for hybrid prefetching. Both tables apply LRU replacement. The SPT uses state bits, similar to those from Chen and Baer [2], to guarantee that prefetch requests are issued only if a stream exhibits a constant stride. [Figure: normal and early prefetches of cache blocks] ....

[Article contains additional citation context not shown here]

T.-F. Chen and J.-L. Baer. Effective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.
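
The PC-indexed stride scheme described in the excerpts above (prefetch the current data address plus the stride, but only once a stream shows a constant stride) can be sketched as follows. This is a deliberately simplified illustration, not the state machine from Chen and Baer's paper: the class name `StridePrefetcher`, the single `steady` flag standing in for the state bits, and the unbounded dictionary used as the table are all assumptions made for brevity.

```python
class StridePrefetcher:
    """Simplified PC-indexed stride predictor (illustrative only)."""

    def __init__(self):
        # Hypothetical table layout: pc -> (last_addr, stride, steady)
        self.table = {}

    def access(self, pc, addr):
        """Record a memory access; return an address to prefetch or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this instruction is seen: no stride known yet.
            self.table[pc] = (addr, 0, False)
            return None
        last_addr, stride, _ = entry
        new_stride = addr - last_addr
        # Treat the stream as steady only if the same non-zero stride repeats.
        steady = new_stride == stride and new_stride != 0
        self.table[pc] = (addr, new_stride, steady)
        return addr + new_stride if steady else None


if __name__ == "__main__":
    p = StridePrefetcher()
    for a in (100, 104, 108, 112, 200):
        print(a, p.access(0x40, a))
```

In this sketch a prefetch is issued only after the same stride has been observed twice in a row, and an irregular access suppresses prefetching until the stride repeats again. A hardware table would additionally be of fixed size with a replacement policy, which is left out here.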


A Framework for Data Prefetching using Off-line Training of.. - Kim, Palem, Wong (2002)

....prefetching of recursive data structures proposed by Luk and Mowry [20]. Hardware approaches include Jouppi's stream buffers [12], Fu and Patel's prefetching for superscalar and vector processors [8, 9], and Chen and Baer's lookahead mechanism [6], also known as the Reference Prediction Table (RPT) [7]. Mehrota [21] proposed a hardware data prefetching scheme that attempts to recognize and use recurrent relations that exist in the address computation of linked-list traversals. Extending the idea of correlation prefetchers [5], Joseph and Grunwald [11] implemented a simple Markov model to dynamically ....

....architecture simulation infrastructure [3] to evaluate the performance of our proposed system and of each of the three off-line learning algorithms outlined above. We compared the performance of our system against that of using larger caches, and the RPT hardware prefetch scheme of Chen and Baer [7]. For evaluation, we used 130.li of SPEC 95; 181.mcf, 183.equake, 164.gzip, 188.ammp, all from the SPEC 2000 suite [2]; and bisort, mst, treeadd, tsp, health from the well-known Olden Pointer Benchmark suite. Our baseline setup is an IA64-like EPIC machine [15] with four integer, two floating-point and ....

T.-F. Chen and J.-L. Baer. Effective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


Implementations of Context-Based Value Predictors - Sazeides, Smith (1997)   (32 citations)

....and in a subsequent work to predict all values produced by instructions [2]. Significant performance benefits were obtained with last-value prediction driving the speculative execution of instructions. Stride predictors traditionally have been used for address prediction and data prefetching [11, 12]. They have also been proposed for speculative execution of load and store instructions [13, 14, 15]. Stride prediction was proposed for value prediction in [3], and its accuracy and performance potential were compared against last-value prediction. A two-level value predictor was proposed in ....

T. F. Chen and J. L. Baer, "Effective hardware-based data prefetching for high performance processors," IEEE Transactions on Computers, vol. 44, pp. 609--623, May 1995.


Pointer Cache Assisted - Collins, Sair, Calder, Tullsen (2002)

....the memory subsystem. Data prefetching is one technique that reduces the observed latency of memory accesses by bringing data into the cache or dedicated prefetch buffers before it is accessed by the CPU. One can classify data prefetchers into three general categories. Hardware data prefetchers [3, 9, 10, 23] observe the data stream and use past access patterns and/or miss patterns to predict future misses. Software prefetchers [15] insert prefetch directives into the code with enough lead time to allow the cache to acquire the data before the actual access is executed. Recently, the expected ....

T.F. Chen and J.L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


Optimizations Enabled by a Decoupled Front-End Architecture - Reinman, Calder, Austin (2001)   (4 citations)

....design, when taking into consideration the interconnect scaling bottleneck. There are a number of other optimizations which could be enabled by the FTQ design. In addition to instruction cache prefetching, the FTQ could be used to direct data cache prefetch in a manner similar to that proposed in [8]. The FTQ could also be used to index into other PC-based predictors (such as value or address predictors) further ahead in the pipeline. This would allow these predictors to be large without compromising processor cycle times. Additionally, the use of a multi-level branch predictor, like the ....

T-F. Chen and J-L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


Techniques Utilizing Memory Reference Characteristics for Improved .. - Wong   Self-citation (Baer)

....in buffers distinct from the cache and dedicated to particular streams [Jouppi 90]. These stream buffers were continuously filled as their data was consumed, thus helping to maintain the initial prefetching lead time. These unit-stride stream buffers, and the related reference prediction tables [Chen 95], were extended to recognize non-unit strides [Palacharla 94, Dahlgren 95]. Some recent prefetching strategies have targeted specific data structures such as applications using LDS. These strategies can be separated into two classes. The first class prefetches each link sequentially by using ....

....of correlation and other prefetchers [Joseph 97, Sherwood 00]. While it is important to use the appropriate prefetcher for the targeted stream, it is equally important to achieve timeliness without sacrificing accuracy. Strategies to throttle the issuing of prefetches include control speculation [Chen 95, Pinter 96] and dynamically increasing or decreasing the amount being prefetched [Dahlgren 95, Farkas 97]. 4.3 Methodology. This section presents aspects of the system model and benchmarks not already presented in Chapter 2. Before demonstrating the ability of existing strategies to effectively ....

T-F. Chen and J-L. Baer. Effective Hardware-Based Data Prefetching for High Performance Processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


The Impact of Timeliness for Hardware-based Prefetching from.. - Wong, Baer (2002)   Self-citation (Baer)

....were stored in buffers distinct from the cache and dedicated to particular streams. These stream buffers were continuously filled as their data was consumed, thus helping to maintain the initial prefetching lead time. These unit-stride stream buffers, and the related reference prediction tables [5], were extended to recognize non-unit strides [12, 7]. Recent prefetching strategies have targeted applications using LDS. These strategies can be separated into two classes. The first class prefetches each link sequentially by using either a cache assist that detects the execution of recurrent ....

....as the combination of correlation and other prefetchers [8, 19]. While it is important to use the appropriate prefetcher for the targeted stream, it is equally important to achieve timeliness while retaining accuracy. Strategies to throttle the issuing of prefetches include control speculation [5, 13] and dynamically increasing or decreasing the amount being prefetched [6, 7]. [Table of simulation parameters. Memory hierarchy: L1 inst perfect; L1 data 16 KB, 4-way, 32 B lines, 1-cycle latency; L2 unified, 256 KB, 8-way, 64 B lines, write-back, 10-cycle latency. Micro-architecture: inst queue size 32; fetch size 8; branch ....]

T-F. Chen and J-L. Baer. Effective Hardware-Based Data Prefetching for High Performance Processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


Managing Wire Delay in Large Chip-Multiprocessor Caches - Beckmann, Wood (2004)

No context found.

T.-F. Chen and J.-L. Baer. Effective Hardware-Based Data Prefetching for High Performance Processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


Addressing Mode Driven Low-Power Data Caches for Embedded .. - Peri, Fernando, Kolagotla (2004)

No context found.

Chen, T. F., Baer, J. L., `Effective Hardware-Based Data Prefetching for High-Performance Processors', IEEE Transactions on Computers, May 1995.


Memory Access Pattern Analysis and Stream Cache Design - For Multimedia Applications

No context found.

T.F. Chen and J. L. Baer, "Effective hardware-based data prefetching for high-performance processors," IEEE Trans. on Computers, Vol. 44, No. 5, May 1995.


Latency Tolerant Architectures - Bennett (1998)   (2 citations)

No context found.

T-F. Chen and J-L. Baer. Effective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers, 44:609--23, May 1995.


Memory System Support for Image Processing - Lixin Zhang John (1999)   (4 citations)

No context found.

T.-F. Chen and J.-L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


A Quantitative Framework for Automated . . . - Roth, al. (2002)

No context found.

T. Chen and J. Baer. "Effective Hardware Based Data Prefetching for High Performance Processors." IEEE Transactions on Computers, 44:609--623, May 1995.


Address Prediction and Recovery Mechanisms - Llena (2002)

No context found.

T.F. Chen, J.L. Baer. (1995). Effective Hardware-Based Data Prefetching for High-Performance Processors. In IEEE Transactions on Computers 44 (5), pp. 609-623.


Automated Design of Finite State Machine Predictors - Sherwood, Calder (2001)

No context found.

T-F. Chen and J-L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


Hardware Optimizations Enabled by a Decoupled Fetch Architecture - Reinman (2001)

No context found.

T-F. Chen and J-L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


Compiler Support for Dynamic Speculative Pre-Execution - Ro, Gaudiot

No context found.

T.-F. Chen and J.-L. Baer. Effective Hardware-Based Data Prefetching for High-Performance Processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


The Performance Potential of Data Dependence.. - Sazeides, Vassiliadis, .. (1996)   (43 citations)

No context found.

T. F. Chen and J. L. Baer. Effective hardware-based data prefetching for high performance processors. IEEE Transactions on Computers, 44(5):609--623, May 1995.


Exploiting the Prefetching Effect Provided by Executing.. - Lilja, Kunkel (2002)   (1 citation)

No context found.

T.F. Chen and J.L. Baer, "Effective Hardware-Based Data Prefetching for High Performance Processors," IEEE Transactions on Computers, Vol. 44, No. 5, May 1995, pp. 609-623.

CiteSeer.IST - Copyright Penn State and NEC