A. K. Porterfield, "Software methods for improvement of cache performance on supercomputer applications," Ph.D. dissertation, Department of Computer Science, Rice University, Technical Report Rice COMP TR88-93, May 1989. 156
....instruction is often handled as a hint for the processor to load a certain data item, but the fulfillment of the prefetch is not guaranteed by the CPU. Prefetch instructions can be inserted into the code manually by the programmer or automatically by a compiler [Por89, KL91, CKP91, Mow94]. In both cases prefetching involves overhead. The prefetch instructions themselves have to be executed, i.e., pipeline slots will be filled with prefetch instructions instead of other instructions ready to be executed. Furthermore, the memory address of the prefetched data ....
A.K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Department of Computer Science, Rice University, Houston, Texas, USA, May 1989.
....investigate the timing issues of prefetching by use of a cycle by cycle processor simulation. Sklenar [8] presents a third variation on the same theme of the use of an external table to predict future memory references. A number of techniques also exist to do software prefetching. Porterfield [9] proposed a technique for prefetching certain types of array data. Mowry, et al. [10] proposed an early practical software prefetch scheme based on information obtained at compile time. While software prefetching clearly has a cost advantage, it does introduce additional overhead to the ....
A.K. Porterfield, "Software methods for improvement of cache performance on supercomputer applications," Technical Report COMP TR 89-93, Rice University, May 1989.
....prospects for code optimization. Many loop restructuring optimizations influence cache performance. Examples are loop interchange, fusion, distribution, iteration space blocking, and skewing (cf. [15, 9, 4]), which can dramatically improve the performance of loops and therefore programs (cf. [8, 11]). For example, Figure 1 shows the application of an iteration space blocking optimization, which can improve performance by preserving the locality of reference independently of problem size. For large problem sizes, blocking modifies the loops in such a way that arrays are iterated over in ....
....it is difficult to make conclusions about how to modify the source code to improve performance. Static techniques use program analysis to estimate the number of cache misses generated by a program fragment. Most of the recent work has focused, as we do, on the loop nests of a program. Porterfield [11] estimates the number of cache lines referenced by a loop, but considers only caches with unit line size. Moreover, his method is applicable only to loops with constant data dependency vectors. Ferrante et al. [7] determine an upper bound for the number of distinct cache lines referenced in a ....
[Article contains additional citation context not shown here]
A. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, Houston, Texas, May 1989.
....and novel cache architectures that can detect program access patterns at run time and can fine tune some cache policies so that the overall cache utilization and locality are maximized. Among the techniques proposed are victim caches [18], column associative caches [1], hardware prefetching [3, 32, 22], cache bypassing using memory address table (MAT) [16, 17], dual split caches [26, 13], skewed associative caches [37, 4], multi-port caches [27, 28], and careful page placement techniques [38]. 1.2 Software Techniques In the software area, there is a considerable work on compiler directed data ....
A. K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications, PhD thesis, Department of Computer Science, Rice University, May 1989.
....investigate the timing issues of prefetching by use of a cycle by cycle processor simulation. Sklenar [Skl92] presents a third variation on the same theme of the use of an external table to predict future memory references. A number of techniques also exist to do software prefetching. Porterfield [Por89] proposed a technique for prefetching certain types of array data. Mowry, et al. [MLG92] is generally recognized as having the most practical software prefetch scheme. While software prefetching clearly has a cost advantage, it does introduce additional overhead to the application. Extra cycles ....
A.K. Porterfield. Software methods for improvement of cache performance on supercomputer applications. Technical Report COMP TR 89-93, Rice University, May 1989.
....been proposed. Coherent caches [3, 4, 18, 30] allow shared read write data to be cached and significantly reduce the memory latency seen by the processors. Relaxed memory consistency models [1, 5, 8] hide latency by allowing buffering and pipelining of memory references. Prefetching techniques [11, 16, 21, 23] hide the latency by bringing data close to the processor before it is actually needed. Multiple contexts [3, 12, 13, 26, 29] allow a processor to hide latency by switching from one context to another when a high latency operation is encountered. Our primary objective in this paper is to ....
A. K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, May 1989.
....of their time stalled for memory accesses. 1.2 Memory Hierarchy Optimizations Various hardware and software approaches to improve the memory performance have been proposed recently [15]. A promising technique to mitigate the impact of long cache miss penalties is software-controlled prefetching [5, 13, 16, 22, 23]. Software-controlled prefetching requires support from both hardware and software. The processor must provide a special prefetch instruction. The software uses this instruction to inform the hardware of its intent to use a particular data item; if the data is not currently in the cache, the ....
....into effective scientific engines. 1.3 An Overview This paper proposes a compiler algorithm to insert prefetch instructions in scientific code. In particular, we focus on those numerical algorithms that operate on dense matrices. Various algorithms have previously been proposed for this problem [13, 16, 23]. In this work, we improve upon previous algorithms and evaluate our algorithm in the context of a full optimizing compiler. We also study the interaction of prefetching with other data locality optimizations such as cache blocking. There are a few important concepts useful for developing ....
[Article contains additional citation context not shown here]
A. K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Department of Computer Science, Rice University, May 1989.
....CPU speed and memory speed widens, systems are being constructed with deeper hierarchies. Achieving high performance on such systems requires tailoring the reference behavior of applications to better match the characteristics of a machine's memory hierarchy. Techniques such as loop blocking [1, 2, 3, 4, 5, 6] and data prefetching [4, 7, 8] have significantly improved memory hierarchy utilization for regular applications. A limitation of these techniques is that they aren't as effective for irregular applications. Improving performance for irregular applications is extremely important since large scale ....
....systems are being constructed with deeper hierarchies. Achieving high performance on such systems requires tailoring the reference behavior of applications to better match the characteristics of a machine's memory hierarchy. Techniques such as loop blocking [1, 2, 3, 4, 5, 6] and data prefetching [4, 7, 8] have significantly improved memory hierarchy utilization for regular applications. A limitation of these techniques is that they aren't as effective for irregular applications. Improving performance for irregular applications is extremely important since large scale scientific and engineering ....
[Article contains additional citation context not shown here]
A. K. Porterfield, Software Methods for Improvement of Cache Performance on Supercomputer Applications, PhD Dissertation, Rice University, Houston, TX (May 1989).
....per iteration, C, is likely to decrease with more aggressive processor architectures. Both hardware trends increase the prefetch distance. Additionally, software that more aggressively uses locality transformations such as tiling sees shorter inner loops with each inner loop initiated more times [1, 25, 32]. These hardware and software trends increase the impact of prologue late prefetches, short steady states, and hard-to-prefetch references, all of which can be addressed by read miss clustering. On the other hand, we expect the impact of prefetching instruction overhead to be less important as ....
A. K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, Apr. 1989.
....the major changes to the original algorithm [19]. Two of our modifications to support I/O prefetching are related to spatial locality, i.e., when strided accesses fall within the same page, in which case page faults only occur on iterations that cross page boundaries. First, we use strip mining [24] rather than loop unrolling to isolate these faulting iterations, since replicating a loop body 1000 times or more is clearly infeasible. Notice in Figure 2(b) that loop i has been strip mined twice (into loops i0 and i1) to account for the spatial locality of b[i] and c[i][j]. The i loop has ....
A. K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Department of Computer Science, Rice University, May 1989.
....[8] investigate the timing issues of prefetching by use of a cycle by cycle processor simulation. Sklenar [9] presents a third variation on the same theme of the use of an external table to predict future memory references. A number of techniques also exist to do software prefetching. Porterfield [10] proposed a technique for prefetching certain types of array data. Mowry et al. [11] proposed an early practical software prefetch scheme based on information obtained at compile time. While software prefetching clearly has a cost ....
A. K. Porterfield, "Software methods for improvement of cache performance on supercomputer applications," Rice Univ., Houston, TX, Tech. Rep. COMP TR 89-93, May 1989.
....speedups over the original matrix codes. Later Abu Sufah et al. [1] focused on automating page locality improving techniques within a compilation framework, and discussed a transformation technique called vertical distribution, which is very similar to tiling. In his dissertation, Porterfield [46] uses loop transformation techniques such as skewing and tiling. His main objective is to model fully associative caches with a least recently used (LRU) policy. Like Gannon et al. [20], he focuses on estimating the cache miss rates for a given loop nest. These approaches, however, do not propose ....
A. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. Ph.D. Thesis, Rice University, Houston, May 1989.
....dependences result or when the iteration spaces of candidate loops are not identical. In contrast, the shift and peel transformation overcomes loop carried dependences to enable fusion and parallel execution, and also permits loops with differing iteration spaces to be fused. Porterfield [25] suggests a peel and jam transformation in which iterations are peeled from the beginning or end of one loop nest to allow fusion with another loop nest. However, no systematic method is described for fusion of multiple loop nests, nor is the parallelization of the fused loop nest considered. In ....
.... The basic idea is to make backward dependences loop independent in the fused loop by shifting the iteration space containing the sink of the dependences with respect to the iteration space containing the source of the dependence, which is similar to the alignment techniques described in [8, 25]. The amount by which to shift is determined by the dependence distance. Other dependences between the loops may be affected, but do not prevent fusion. After this shifting, the loops can be legally fused. We illustrate this procedure in Figure 6, using the iteration spaces shown earlier in Figure ....
A.K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Dept. of Computer Science, Rice University, April 1989.
....investigate the timing issues of prefetching by use of a cycle by cycle processor simulation. Sklenar [12] presents a third variation on the same theme of the use of an external table to predict future memory references. A number of techniques also exist to do software prefetching. Porterfield [11] proposed a technique for prefetching certain types of array data. Mowry, et al. [8] is generally recognized as having the most practical software prefetch scheme. While software prefetching clearly has a cost advantage, it does introduce additional overhead to the application. Extra cycles must be ....
A. Porterfield. Software methods for improvement of cache performance on supercomputer applications. Technical Report COMP TR 89-93, Rice University, May 1989.
....locality of reference makes caching less effective than it might be for other parts of the program. In addition to traditional caching, other proposed solutions to the memory bandwidth problem range from software prefetching [Cal91,Kla91,Mow92] and iteration space tiling [Car89,Gal87,Gan87,Lam91,Por89,Wol89], to prefetching or non-blocking caches [Bae91,Che92,Soh91], unusual memory systems [Bud71,Gao93,Rau91,Val92,Yan92], and address transformations [Har87,Har89]. The following chapters discuss the merits and limitations of each of these in the context of streaming, but all these solutions overlook ....
....accessed in the computation's natural order, even when loop unrolling is applied. Note that the effectiveness of naive ordering decreases rapidly as vector stride increases. 2.3.1.2 Block Prefetching Blocking or tiling changes a computation so that sub-blocks of data are repeatedly manipulated [And92,Gal87,Gan87,Lam91,Por89,Wol89]. This technique reduces average access latency by reusing data at faster levels of the memory hierarchy, and may be applied to registers, cache, TLB, and even virtual memory. For example, multiplication of matrices can be blocked to reuse cached data. Figure 2.5 illustrates the data access ....
[Article contains additional citation context not shown here]
A.K. Porterfield, Software Methods for Improvement of Cache Performance on Supercomputer Applications, Ph.D. Thesis, Rice University, May 1989.
....to enhance cache bandwidth. Cache memory can be accessed very rapidly, but has limited storage capacity. Hence, it is necessary to perform program transformations to make efficient utilization of the memory hierarchy. A number of program transformations can be performed to improve data locality [8]. Certain transformations such as loop fusion may result in better temporal locality, and others such as blocking [5] may improve spatial locality. Other methods have been considered to improve cache performance, such as data prefetching [1], partitioning the iteration set into groups that reuse ....
A. K. Porterfield. Software methods for improvement of cache performance on supercomputer applications. PhD thesis, Rice University, May 1989.
No context found.
A. K. Porterfield, "Software methods for improvement of cache performance on supercomputer applications," Ph.D. dissertation, Department of Computer Science, Rice University, Technical Report Rice COMP TR88-93, May 1989. 156
No context found.
A. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, May 1989.
No context found.
A. Porterfield. Software Methods for Improvement of Cache Performance. PhD thesis, Dept. of Computer Science, Rice University, May 1989.
No context found.
A. K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, May 1989.
No context found.
A. Porterfield. Software Methods for Improvement of Cache Performance. PhD thesis, Dept. of Computer Science, Rice University, May 1989.
No context found.
A. K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, Apr. 1989.
No context found.
A. K. Porterfield. Software Methods for improvement of cache performance on supercomputer applications. PhD thesis, Department of Computer Science, Rice University, May 1989.
No context found.
Porterfield, A.K., "Software Methods for Improvement of Cache Performance on Supercomputer Applications", Ph.D. Thesis, Rice University, May, 1989.
No context found.
A. K. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, Houston, TX, May 1989. Available as technical report CRPC-TR89009.