9 citations found.
W. Abu-Sufah, D.J. Kuck, and D.H. Lawrie. Automatic program transformations for virtual memory computers. In Proceedings of the 1979 National Computer Conference, pages 969--974, June 1979.


This paper is cited in the following contexts:
An Analysis of a Combined Hardware-software Mechanism for .. - Stefanos Damianakis Kai (1994)   (2 citations)

....of these approaches is that compilers cannot issue these non-blocking load instructions speculatively above conditional branches because they may cause unnecessary faults. Blocking is a software technique that reduces the working set size of a program to use the memory hierarchy effectively [1, 7, 12, 24]. Blocking can be applied to any level of the memory hierarchy, including virtual memory, caches, and registers. In particular, blocking can be very effective for scientific programs, but automatic blocking transformations are quite limited and manual transformation is difficult. Much work has been ....
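The blocking idea in the excerpt above can be illustrated with a minimal C sketch. It is not taken from any of the cited papers: the function names, the row-major layout, and the tile edge TILE are assumptions chosen for illustration. The blocked routine visits the same iterations as the naive one, but in tile-sized groups so that each group touches a smaller working set.

```c
#include <stddef.h>

/* Hypothetical tile edge (an assumption): pick it so that the tiles of
 * A, B, and C in use fit in the targeted level of the memory hierarchy. */
#define TILE 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference version: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Blocked version: the three outer loops walk over tiles, the three
 * inner loops do the work inside one tile. min_size() handles the
 * partial tiles that appear when n is not a multiple of TILE. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < min_size(ii + TILE, n); i++)
                    for (size_t k = kk; k < min_size(kk + TILE, n); k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < min_size(jj + TILE, n); j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

Both routines accumulate into C, so the caller is expected to zero C first. The best value of TILE depends on the machine and on which level (registers, cache, or virtual memory pages) is being targeted, which is part of why the excerpt calls manual transformation difficult.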

....loads can be hidden. [Plot: Average CPR (0.0 to 4.0) versus Problem Size (10 to 100000); legend: Baseline Bypassing Unified Spec cache (8K 4K).]

    #define MAX 100000
    double PX[15][MAX+1];
    double DM28, DM27, DM26, DM25, DM24, DM23, DM22, C0;
    for (i=1; i<=N; i++)
        PX[1][i] = DM28*PX[13][i] + DM27*PX[12][i] + DM26*PX[11][i] + DM25*PX[10][i] + DM24*PX[9][i] + DM23*PX[8][i] + DM22*PX[7][i] + C0*(PX[5][i] + PX[6][i]) + PX[3][i];

Figure 9: Kernel K9 ....

W. Abu-Sufah, D.J. Kuck, and D.H. Lawrie. Automatic program transformations for virtual memory computers. In Proceedings of the 1979 National Computer Conference, pages 969--974, June 1979.


Tiling for Parallel Execution - Optimizing Node Cache.. - Kaplow, Szymanski (1996)   (1 citation)

....of the tiled loop nest and the cache design of the target processor. 1.1 Review of Tiling for Improving Memory Reference Locality. Loop tiling is an optimization technique that has come full circle in its application. Originally explored as a technique to improve the virtual memory performance [1] of uniprocessors, the technique has also been applied to explore fine-grained parallelism exposed by loop skewing and wavefront transformations [11,12] for parallel machines. Designers of modern multiprocessor machines have focused on architectures with the high-speed interconnection of moderate ....

....Tree The first phase creates a standard parse tree and symbol table that are used to determine the initial reference identifiers for the event list, as well as the loop variable limits and base and dimension information for each array. Figure 2 shows an example of the input program.

    A.range[1] = A.range[2] B.range[1] B.range[2] 2048
    A.base = 10
    B.base = 200000
    for k = 1, 1024
        for i = 2, 256; j = 2, 256
            A[k,i] B[k 1,i] B[i 1,j 1] B[i 1,j] B[i,j 1]
        end
        for r = 1, 256; s = 1, 256
            B[r,s] A[r,s]
        end
    end

Fig. 2: Sample Simulation Source File The ....

[Article contains additional citation context not shown here]

W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. Automatic program transformations for virtual memory computers. In Proceedings of the 1979 National Computer Conference, pages 969--974, June 1979.


Software Support for Speculative Loads - Rogers (1992)   (49 citations)

....of these approaches is that compilers cannot issue these non-blocking load instructions speculatively above conditional branches because they may cause unnecessary faults. Blocking is a software technique that reduces the working set size of a program to use the memory hierarchy effectively [1, 6, 10, 20]. Blocking can be applied to any level of the memory hierarchy, including virtual memory, caches, and registers. In particular, blocking can be very effective for scientific programs, but automatic blocking transformation is quite limited and manual transformation is difficult. Much work has been ....

....have an average latency of 15.12 cycles and more than 78% are completed within 15 cycles. For N = 100, the average cost per reference is 1.15.

    #define MAX 100000
    double PX[15][MAX+1];
    double DM28, DM27, DM26, DM25, DM24, DM23, DM22, C0;
    for (i=1; i<=n; i++)
        PX[1][i] = DM28*PX[13][i] + DM27*PX[12][i] + DM26*PX[11][i] + DM25*PX[10][i] + DM24*PX[9][i] + DM23*PX[8][i] + DM22*PX[7][i] + C0*(PX[5][i] + PX[6][i]) + PX[3][i];

Figure 8: Kernel K9

The impact of the write buffer on the latency of a speculative load is substantial. Replacing the stores with one-cycle loads reduces the latency of a ....

W. Abu-Sufah, D.J. Kuck, and D.H. Lawrie. Automatic program transformations for virtual memory computers. In Proceedings of the 1979 National Computer Conference, pages 969--974, June 1979.


Thread Scheduling Cache Locality - Philbin, Edler, Anshus, Douglas, Li (1996)   (23 citations)

....with data sets larger than the cache size that do not exhibit a high degree of memory reference locality. One attractive way to ameliorate the processor-memory performance gap is to improve the data locality of applications. Tiling (also called blocking), a well-known software technique [1, 12, 18, 29], achieves this goal by restructuring a program to re-use certain blocks of data that fit in the cache. Tiling can reduce cache misses and can be applied to any level of the memory hierarchy, including virtual memory, caches, and registers. It can be done either automatically, by a compiler, or ....
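As a second illustration of restructuring a program to re-use blocks of data that fit in the cache, here is a hedged C sketch of a tiled matrix transpose. The function name, the row-major layout, and the block edge BLOCK are assumptions made for this example and do not come from the cited paper.

```c
#include <stddef.h>

/* Hypothetical block edge (an assumption): chosen so that one block of
 * src and one block of dst fit in the cache at the same time. */
#define BLOCK 16

/* Transpose a rows x cols row-major matrix src into the cols x rows
 * row-major matrix dst, one BLOCK x BLOCK tile at a time. Inside a
 * tile, both the rows read from src and the rows written to dst stay
 * within a small region, so cache lines are re-used before eviction. */
void transpose_tiled(size_t rows, size_t cols, const double *src, double *dst)
{
    for (size_t ib = 0; ib < rows; ib += BLOCK) {
        size_t i_end = ib + BLOCK < rows ? ib + BLOCK : rows;
        for (size_t jb = 0; jb < cols; jb += BLOCK) {
            size_t j_end = jb + BLOCK < cols ? jb + BLOCK : cols;
            for (size_t i = ib; i < i_end; i++)
                for (size_t j = jb; j < j_end; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

An untiled transpose reads one matrix with stride 1 and writes the other with a large stride, so for matrices larger than the cache nearly every write can miss. Tiling bounds the footprint of the strided side to BLOCK rows at a time.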

....[Figure 4: Execution times versus block sizes; panel labels: N body, SOR, PDE (2049).] tiling, multi-threaded architectures, and implementations of lightweight thread packages. Early research efforts have proposed ways of rearranging data structures and altering algorithms to reduce page faulting in virtual memory [16, 1]. Tiling has become a well-known software technique for using the memory hierarchy effectively [18, 29]. Tiling can be applied to any level of the memory hierarchy, including virtual memory, caches, and registers. In particular, tiling can be very effective for scientific programs, but automatic ....

W. Abu-Sufah, D.J. Kuck, and D.H. Lawrie. Automatic Program Transformations for Virtual Memory Computers. In Proceedings of the 1979 National Computer Conference, pages 969--974, June 1979.


Informing Memory Operations: Providing Memory Performance.. - Horowitz (1996)   (26 citations)

....a number of promising software techniques have been proposed to avoid or tolerate memory latency. These software techniques have resorted to a variety of different approaches for gathering information and reasoning about memory performance. Compiler-based techniques, such as cache blocking [AKL79,GJMS87,WL91] and prefetching [MLG92, Por89], use static program analysis to predict which references are likely to suffer misses. Memory performance tools have relied on sampling or simulation-based approaches to gather memory statistics [CMM 88,DBKF90,GH93,LW94,MGA95]. Operating systems have used ....

W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. Automatic Program Transformations for Virtual Memory Computers. Proc. 1979 National Computer Conf. pp 969-974, June 1979.


COP - Cache Optimization Tools for Scientific Computing - Szymanski (1997)

....the ranges of the added loops, which are dependent on the body of the tiled loop nest and the cache design of the target processor. Loop tiling is an optimization technique that has come full circle in its application. Originally explored as a technique to improve the virtual memory performance [1] of uniprocessors, the technique has also been applied to explore fine-grained parallelism exposed by loop skewing and wavefront transformations [19, 21] for parallel machines. Compilation methods for improving cache performance through modification of a program's memory reference pattern have ....

W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. Automatic program transformations for virtual memory computers. In Proceedings of the 1979 National Computer Conference, pages 969--974, June 1979.


Rule-Based Program Restructuring For High Performance Parallel.. - Tenny (1992)

....also interchanges the loop bounds on the innermost loop. This can change a loop's stride through memory, often an important consideration for optimizing memory access. Together with strip mining, loop permutation can be used to optimize cache usage by increasing the locality of data reference [36, 1]. Interchanging loops may increase the size of the vector, and hence reduce the startup overhead. For example, do i=1,100 do j=1,5 a(i,j) a(i,j) b(i) c(i,j) enddo enddo can be vectorized, but the resulting code executes 100 small vectors. Interchanging the loops and vectorizing yields code the ....
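The loop interchange described in the excerpt above can be sketched in C. The operators in the quoted Fortran loop body were lost in extraction, so the statement used here, a(i,j) = a(i,j) + b(i) * c(i,j), is an assumption, as are the function names. Arrays are stored flat with column-major indexing to mirror Fortran, so element (i,j) lives at index i + j*NI.

```c
#include <stddef.h>

enum { NI = 100, NJ = 5 };  /* loop bounds taken from the excerpt */

/* Original order: i outer, j inner. The inner loop has only NJ = 5
 * iterations and jumps NI elements between accesses, so vectorizing
 * it gives 100 short vectors. */
void update_i_outer(double *a, const double *b, const double *c)
{
    for (size_t i = 0; i < NI; i++)
        for (size_t j = 0; j < NJ; j++)
            a[i + j * NI] += b[i] * c[i + j * NI];
}

/* Interchanged order: j outer, i inner. The inner loop now has
 * NI = 100 iterations with stride 1, giving 5 long vectors. The
 * interchange is legal here because every iteration reads and
 * writes only its own element of a. */
void update_j_outer(double *a, const double *b, const double *c)
{
    for (size_t j = 0; j < NJ; j++)
        for (size_t i = 0; i < NI; i++)
            a[i + j * NI] += b[i] * c[i + j * NI];
}
```

Both functions compute the same result; only the traversal order, and therefore the stride and the inner trip count, changes.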

....on memory caches [36]. By blocking a loop into sections small enough to fit into the cache, the locality of reference of a loop can be strengthened, thus increasing the number of cache hits and improving performance. This strategy has also been used to improve performance of virtual memory systems [1]. Rule 5.1 (rule strip mine (vectorize stmt st ) statement stmt st depth d ) loop member id id stmt st ) loop id id depth d ) vector size best size ) strip mine id id ) make strip mine id id size size ) 5.2.4 M Consistency The representation of program properties as ....
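Strip mining, the transformation that Rule 5.1 in the excerpt above triggers, splits one loop into an outer loop over fixed-size strips and an inner loop over the elements of one strip. The following C sketch is an illustration only: the strip length STRIP, the function name, and the loop body are assumptions and are not derived from the rule text.

```c
#include <stddef.h>

/* Hypothetical strip length (an assumption): in the excerpt's terms it
 * would be the "best size" for the target's vector unit or cache. */
#define STRIP 64

/* y[i] += alpha * x[i] for i in [0, n), strip-mined. The outer loop
 * steps from strip to strip; the inner loop covers one strip and is
 * the part a vectorizer would turn into a single vector operation.
 * The last strip is shorter when n is not a multiple of STRIP. */
void axpy_strip_mined(size_t n, double alpha, const double *x, double *y)
{
    for (size_t is = 0; is < n; is += STRIP) {
        size_t end = is + STRIP < n ? is + STRIP : n;
        for (size_t i = is; i < end; i++)
            y[i] += alpha * x[i];
    }
}
```

On its own, strip mining does not change the order in which elements are visited. It becomes a locality optimization once the strip loop is interchanged with another loop, which is the combination the excerpts refer to as blocking or tiling.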

Abu-Sufah, W., Kuck, D., and Lawrie, D. Automatic Program Transformations for Virtual Memory Computers. In Proc. National Computer Conference (June 1979), pp. 969--974.


Informing Loads: Enabling Software To Observe And.. - Horowitz.. (1995)   (2 citations)

....4.0 Improving Memory Performance Automatically. To improve an application's performance using informing loads requires us to merge the methods described in the previous section with software techniques for improving memory performance. These techniques include compiler optimizations like blocking [ASKL79, GJMS87, MC69, WL91, GL89] and software-controlled prefetching [Mow94, Por89, CMCH91], and operating system optimizations like page coloring [KH92, BLRC94] and page migration [CDV 94, LE91, CF89, BFS89]. Without informing loads, the success of these automatic techniques depends heavily on ....

W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. Automatic program transformations for virtual memory computers. Proc. of the 1979 National Computer Conference, pages 969--974, June 1979.


Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)

No context found.

Abu-Sufah, W., Kuck, D.J., and Lawrie, D.H., "Automatic Program Transformations for Virtual Memory Computers", Proc. 1979 National Computer Conference, June, 1979.

