31 citations found.
W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations. IEEE Trans. on Computers, C-30(5):341--356, May 1981.

This paper is cited in the following contexts:

Comparing and Combining Read Miss Clustering and Software.. - Pai, Adve (2001)   (1 citation)

....per iteration, C, is likely to decrease with more aggressive processor architectures. Both hardware trends increase the prefetch distance. Additionally, software that more aggressively uses locality transformations such as tiling sees shorter inner loops with each inner loop initiated more times [1, 25, 32]. These hardware and software trends increase the impact of prologue late prefetches, short steady states, and hard-to-prefetch references, all of which can be addressed by read miss clustering. On the other hand, we expect the impact of prefetching instruction overhead to be less important as ....

W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations. IEEE Trans. on Computers, C-30(5):341--356, May 1981.


Exploiting Instruction-Level Parallelism for Memory System.. - Pai (2000)

....per iteration, C, is likely to decrease with more aggressive processor architectures. Both hardware trends increase the prefetch distance. Additionally, software that more aggressively uses locality transformations such as tiling sees shorter inner loops with each inner loop initiated more times [AKL81, Por89, WL91]. These hardware and software trends increase the impact of prologue late prefetches, short steady states, and hard-to-prefetch references, all of which can be addressed by read miss clustering. 5.2.2 Addressing a Limitation of Clustered Prefetching Unroll and jam produces a ....

Walid Abu-Sufah, David J. Kuck, and Duncan H. Lawrie. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


A Matrix-Based Approach to Global Locality Optimization - Kandemir, Choudhary.. (1999)   (16 citations)

....is based on iteration space transformations. McKellar and Coffman [42] performed one of the first studies on program transformations for locality. They showed that by using submatrix operations it is possible to obtain impressive speedups over the original matrix codes. Later Abu-Sufah et al. [1] focused on automating page locality improving techniques within a compilation framework, and discussed a transformation technique called vertical distribution, which is very similar to tiling. In his dissertation, Porterfield [46] uses loop transformation techniques such as skewing and tiling. ....

A. Abu-Sufah, D. Kuck, and D. Lawrie. On the performance enhancement of paging systems through program analysis and transformations. IEEE Trans. on Computers, C-30(5):341--356, 1981.


Optimizing And Parallelizing Loops In Object-Oriented Database.. - Lieuwen (1992)   (1 citation)

....flavor. Thus the general idea is similar although the analysis used is different. Loop fission has been used to optimize FORTRAN programs. Loop fission breaks a single loop into several smaller loops to improve the locality of data reference. This can improve paging performance dramatically [ABU81]. Our transformations serve a similar function breaking a large loop into several small ones to enable database style optimization. 2.2. RELATED PARALLELIZATION WORK Our parallelization work uses both transformations and pointer based join techniques. We discuss the related work for each in ....

Walid Abu-Sufah, David J. Kuck, and Duncan H. Lawrie. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations. IEEE Trans. on Computers C-30,5 (May 1981), 341-355.
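
The loop fission described in the Lieuwen context above can be sketched roughly as follows. This is only an illustration of the shape of the transformation: the function names, array names, and statements are hypothetical and do not come from the cited papers, and Python lists stand in for FORTRAN arrays, so no paging benefit is measured here.

```python
def single_loop(a, b):
    """Original form: each iteration touches both arrays (assumes len(a) == len(b))."""
    out_a = [0] * len(a)
    out_b = [0] * len(b)
    for i in range(len(a)):
        out_a[i] = a[i] * 2   # statement S1 works on array a
        out_b[i] = b[i] + 1   # statement S2 works on array b
    return out_a, out_b


def fissioned_loops(a, b):
    """After fission: two smaller loops, each sweeping a single array.

    The split is legal here because S1 and S2 do not depend on each other.
    """
    out_a = [0] * len(a)
    for i in range(len(a)):
        out_a[i] = a[i] * 2   # only array a is referenced in this loop
    out_b = [0] * len(b)
    for i in range(len(b)):
        out_b[i] = b[i] + 1   # only array b is referenced in this loop
    return out_a, out_b
```

Both functions compute the same result; the fissioned form simply keeps the set of arrays referenced inside any one loop smaller, which is the property the quoted context associates with better paging behavior.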


Compilation Techniques for Out-of-Core Parallel Computations - Kandemir Choudhary.. (1998)   (3 citations)

.... policies [2, 11] and (2) techniques which consider reshaping the data reference patterns in order to exploit the given hardware facilities and system software [23, 32]. The latter group then paved the way for automatic program restructuring techniques like loop distribution and page indexing [1]. In general, these techniques apply to already written programs and consist of rearranging the code and data to make program's access pattern more local. Another system software based technique is built upon the file systems and run time libraries. Several approaches considered extending the ....

....in the sense that once the I/O time is reduced by our optimization, the remaining I/O time can be hidden by prefetching. There has been a few papers on out-of-core compilation. Some of them consider optimizing the performance of virtual memory (VM). The most notable work is from Abu-Sufah et al. [1], which deals with optimizations to enhance the locality properties of programs in a VM environment. Among the program transformations used are loop fusion, loop distribution and tiling (page indexing). It should be emphasized that in principle, our file layout determination scheme can be applied ....

W. Abu-Sufah, D. Kuck, and D. Lawrie. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


Improving the Performance of Out-of-Core Computations - Kandemir Ramanujam Choudhary (1997)   (7 citations)

....our optimization, the remaining I/O time can be hidden by prefetching. There has been a few papers on out-of-core compilation. The approaches can be divided into two groups: The first group considers optimizing the performance of virtual memory (VM). The most notable work is from Abu-Sufah et al. [1], which deals with optimizations to enhance the locality properties of programs in a VM environment. Among the program transformations used are loop fusion, loop distribution and tiling (page indexing). In principle, our disk layout determination scheme can be applied for optimizing the ....

W. Abu-Sufah. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations, IEEE Transactions on Computers, C-30(5):341--355, 1981.


Improving Effective Bandwidth through Compiler Enhancement of.. - Ding, Kennedy   (10 citations)

.... [Figure 3: Effect of reuse driven execution. Axis: reuse distance (log scale, base 2); benchmark: NAS SP, 28x28x28; series: program order, reuse based fusion, reuse driven execution.] [Figure 4: Examples of loop fusion: (a) fusion by statement embedding, loop alignment and loop splitting; (b) example of loops that cannot be fused.] can be considered by adding loop ....

[Article contains additional citation context not shown here]

W. Abu-Sufah, D. Kuck, and D. Lawrie. On the performance enhancement of paging systems through program analysis and transformations. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


Value Locality And Speculative Execution - Lipasti (1997)   (31 citations)

....matches the capabilities of the cache hardware. Such improvements have primarily been limited to scientific code with predictable control flow and regular memory access patterns, due to the ease with which rudimentary loop transformations can dramatically improve temporal and spatial locality [40,41]. Explicit prefetching in advance of memory references with poor or no locality has also been examined extensively in this context, both with [21,22] and without additional hardware support [23,24]. Dynamic hardware techniques for controlling cache memory allocation that significantly reduce ....

W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. "On the performance enhancement of paging systems through program analysis and transformations." IEEE Transactions on Computers, C-30(5):341--356, May 1981.


SPAID: Software Prefetching in Pointer- and.. - Lipasti, Schmidt.. (1995)   (38 citations)

....in loops [MPS94]. Improvements to the data cache performance of programs have primarily been limited to scientific code that operates on loops. Many of these investigations focus on analysis of data utilization to guide program transformations, particularly on loops, to improve data locality [ASKL81, GJG88, CK89, FST91, LRW91, WL91, KM93, CMT94]. With the advent of prefetch instructions, other researchers have sought ways for compilers to intelligently insert such instructions to improve data cache performance. Most such techniques do not require additional hardware support [Por89, CKP91, FP91, ....

Walid Abu-Sufah, David J. Kuck, and Duncan H. Lawrie. On the performance enhancement of paging systems through program analysis and transformations. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


Optimizing Loops in Database Programming Languages - Lieuwen, DeWitt   (4 citations)

....flavor. Thus the general idea is similar although the analysis used is different. Loop fission has been used to optimize FORTRAN programs. Loop fission breaks a single loop into several smaller loops to improve the locality of data reference. This can improve paging performance dramatically [ABU81]. Our transformations serve a similar function breaking a large loop into several small ones to enable database style optimization. 3 Introduction to Self commutativity Before examining the different transformation strategies that we have developed, we first examine when a simple group by loop ....

Walid Abu-Sufah, David J. Kuck, and Duncan H. Lawrie. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations. IEEE Trans. on Computers C-30,5 (May 1981), 341-355.


Improving Cache Performance in Dynamic Applications through.. - Ding, Kennedy (1999)   (49 citations)

....memory management has focused on increasing temporal and spatial reuse in regular applications. Cache and register blocking techniques group computations on data tiles to enhance temporal reuse [5, 22]. Various loop reordering schemes seek to arrange stride one data access to maximize spatial reuse [1, 12, 17]. Data transformations can often be used to effect spatial reuse when computation transformation is insufficient or illegal [8]. None of these strategies, however, works well with dynamic and irregular computations because the unpredictable nature of data reuse prevents effective static analysis. ....

....processor to remap memory. However, compiler analysis similar to ours is necessary to effectively control such hardware features. The goal of improving data reuse has been pursued for regular applications by loop and data transformations such as cache blocking [5, 22], memory order loop permutation [1, 12, 17], and data reshaping [8, 3, 15]. However, static loop and data transformations developed for regular applications cannot optimize dynamic computations where the data access pattern remains unknown until run time and changes during the computation. Various static data placement schemes have been ....

W. Abu-Sufah, D. Kuck, and D. Lawrie. On the performance enhancement of paging systems through program analysis and transformations. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


Data Access Reorganizations in Compiling Out-of-core.. - Bordawekar.. (1994)   (3 citations)

....earlier, using the loop bounds and index variables, the compiler can determine which array requires more I/O accesses and accordingly allocate the available memory. 5 Related Work Abu-Sufah first investigated strategies for improving performance of Fortran programs in virtual memory environment [ASKL81]. Compiler transformations such as tiling, strip mining, loop interchange, loop skewing are proposed by Wolfe [Wol89b]. Transformations like Unroll and jam and Scalar replacement are proposed by Carr [Car93]. Callahan studies the problem of register allocation [CCK90]. The notion reference window ....

W. Abu-Sufah, David Kuck, and D. Lawrie. On the performance enhancement of paging systems through program analysis and program transformation. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


A Unified Compiler Algorithm for Optimizing Locality.. - Kandemir Choudhary (1997)

....To achieve the best performance in out-of-core computations, both data (layout) and control (loop) transformations are necessary; and parallelism and locality should be handled in a unified way. We note that our approach is based on explicit file I/O and is different from those presented in [1, 17, 19, 27]. 3 Algorithm for Optimizing Locality in Files In this section we present an algorithm based on explicit I/O to reduce the time spent in I/O. Our algorithm automatically transforms a given loop nest to exploit spatial locality in files, assigns appropriate file layouts for out-of-core arrays, ....

....message passing systems. It should be noted that our algorithms are general in the sense that they can be incorporated to any out-of-core compilation framework for parallel and sequential machines. Previous work considers optimizing the performance of virtual memory (VM). Abu-Sufah et al. [1], dealt with optimizations to enhance the locality properties of programs in a VM environment. In principle, our file layout determination scheme can be applied for optimizing the performance of the VM as well (by changing tile sizes to take the page size into account). But, we believe that the ....

W. Abu-Sufah. On the performance enhancement of paging systems through program analysis and transformations. IEEE Transactions on Computers, C-30(5), pages 341--355, May 1981.


Quantifying Behavioral Differences Between C and C++ Programs - Calder, Grunwald, Zorn (1995)   (47 citations)

....will be more effective for C programs. The negligible difference in data cache performance shown in §4.9.2 implies that specific C optimizations for data cache locality may not be necessary. By comparison, optimizations for instruction caches [40, 41, 50] and possibly virtual memory systems [51, 52, 53, 54] will be more important for C++ programs than for C programs. Our data also indicates that link time optimizations, such as those proposed by Wall [55] and others will become more important. Object oriented languages, such as C++, allow programmers to extend the class hierarchy without affecting ....

W. A. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. On the performance enhancement of paging systems through program analysis and transformation. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


Data Access Reorganizations in Compiling Out-of-Core Data.. - Kandemir Bordawekar (1994)   (4 citations)

....compiling perfectly nested loops for distributed memory message passing machines is addressed. In [2] a solution to the problem of determining loop and data partitions automatically for programs with multiple loops and arrays is presented. Our work also bears similarity to that of Abu-Sufah et al. [1] which deals with optimizations to enhance the locality properties of programs in a virtual memory environment. Among the program transformations used by them are loop fusion, loop distribution and tiling (page indexing). We, instead, do not assume the existence of virtual memory; but sensing the ....

W. Abu-Sufah. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations, IEEE Transactions on Computers, C30 (5), pages 341-355, May 1981.


An Efficient Solution to the Cache Thrashing Problem - Jin, Li, Chen (1996)

....some basic concepts and assumptions. After that, the solutions to the thrashing problem are given, and finally the experimental results conducted on a SGI multiprocessor system are shown. 2 Related Work Extensive research has been reported in the literature regarding efficient memory hierarchy [2, 6, 7, 10, 11, 14, 15, 28, 33, 34]. Abu-Sufah, Kuck and Lawrie used loop blocking to improve the paging performance by improving locality of references [2]. Wolfe proposed iteration space tiling as a way to improve data reuse in cache or local memories [34]. Gallivan, Jalby and Gannon defined a reference window for a dependence as ....

....multiprocessor system are shown. 2 Related Work Extensive research has been reported in the literature regarding efficient memory hierarchy [2, 6, 7, 10, 11, 14, 15, 28, 33, 34]. Abu-Sufah, Kuck and Lawrie used loop blocking to improve the paging performance by improving locality of references [2]. Wolfe proposed iteration space tiling as a way to improve data reuse in cache or local memories [34]. Gallivan, Jalby and Gannon defined a reference window for a dependence as the variables referenced by both the source and the sink of the dependence [14, 15]. After executing the source of the ....

W. Abu-Sufah, D. Kuck, and D. Lawrie. On the performance enhancement of paging systems through program analysis and transformations. IEEE Transactions on Computers, C-30(5), May 1981.


Directions in Parallel Programming: HPF, Shared.. - Bodin, Priol..

....the copy is amortized by exploiting the temporal locality. However if there was no page thrashing and no false sharing on array A in the original loop, there is no gain in using this transformation. When applying this kind of optimization, the size of temporaries must be limited. These techniques [63, 4, 41, 3, 23, 64, 55, 16, 44] are well known but usually targeted for hardware cache or local memory. Most of these techniques should be revisited to take into account the characteristics of shared virtual memory and in particular the false sharing phenomena. Barrier Removal When programming with a shared memory model ....

Kuck D. Abu-Sufah W. and Lawrie D. On the performance enhancement of paging system through program analysis and transformations. IEEE Transactions on Computers, May 1981.


Quantifying Behavioral Differences Between C and C++ Programs - Calder (1994)   (47 citations)

....[18] will be more effective for C programs. The negligible difference in data cache performance shown in §4.7.2 implies that specific C optimizations for data cache locality are not necessary. By comparison, optimizations for instruction caches [35, 37] and possibly virtual memory systems [1, 3, 16, 20, 21] will be more important for C++ programs than for C programs. One of the most notable observations from the programs we instrumented is that C++ programs have deeper call stacks, with more variation in the call depth stack depth, than C programs. Procedure activation and calling conventions are at ....

W. A. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. On the performance enhancement of paging systems through program analysis and transformation. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


Scalar vs. Parallel Optimizations - Wolfe (1990)

.... of size M x N (assuming normal row major array storage), the stride of the references to the arrays in the inner loop will be N (the distance in memory words between successive accesses to the same array); if N is very large (larger than the page size), this could cause virtual memory thrashing [AKL81]. However, most compilers proceed without considering this effect. For a vector computer, the choice is no longer so obvious. The inner loop here cannot be vectorized, due to the dependence self cycle S 3# d ( # )# S 3# . Also, vector computers are much more sensitive to the stride of memory ....

W. A. Abu-Sufah, D. J. Kuck and D. H. Lawrie, On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations, IEEE Trans. on Computers C-30, 5 (May 1981), 341-356.
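
The stride effect in the Wolfe context above can be made concrete with a small counting sketch. It assumes a row-major array whose element (i, j) lives at word offset i * cols + j and a page that holds page_words consecutive words; the function name, parameters, and the toy sizes in the usage note are hypothetical, and the count is a simple proxy for paging pressure, not a model of any real virtual memory system.

```python
def page_switches(rows, cols, page_words, inner_is_row_index):
    """Count how often consecutive accesses land on a different page.

    inner_is_row_index=True  -> inner loop runs over i, so the stride is `cols` words.
    inner_is_row_index=False -> inner loop runs over j, so the stride is 1 word.
    """
    if inner_is_row_index:
        order = ((i, j) for j in range(cols) for i in range(rows))
    else:
        order = ((i, j) for i in range(rows) for j in range(cols))

    switches = 0
    last_page = None
    for i, j in order:
        page = (i * cols + j) // page_words   # page holding element (i, j)
        if last_page is not None and page != last_page:
            switches += 1
        last_page = page
    return switches
```

For a 4 x 8 array with 8-word pages (one row per page), the stride-1 walk changes page 3 times, while the stride-N walk changes page on every one of its 31 steps.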


The Precomputed Branch Architecture - Calder, Grunwald (1999)

....are not split across branch spaces; only procedure calls will span branch spaces. Therefore, all intra procedural branches will be compiled as pre computed branches. There are myriad ways to partition programs, and a number of alternatives have been examined in the effort to reduce page faults [1, 2, 15, 13], and instruction cache conflicts [16, 25, 30]. The goals of our study are different than these other studies; we are more interested in reducing the number of indirect jumps than reducing cache conflicts and paging. None the less, the best performing algorithm we examined (MaxCut) for code ....

W. A. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. On the performance enhancement of paging systems through program analysis and transformation. IEEE Transactions on Computers, C-30(5):341--356, May 1981.


Automatic Blocking of Nested Loops - Schreiber, Dongarra (1990)   (53 citations)

....and QR decompositions and methods for partial differential equations. This approach to automatic blocking, through loop strip mining and interchange, was first advocated by Wolfe [18]. It is derived from earlier work of Abu-Sufah, Kuck, and Lawrie on optimization in a paged virtual memory system [1]. Wolfe introduced the term tiling. A tile is the collection of work to be done, i.e. the set of values of the point loop indices, for a fixed set of values of the block or outer loop indices. We like this terminology since it allows us to distinguish what we are doing which is to decompose ....

W. A. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. On the performance enhancement of paging systems through program analysis and transformations. IEEE Transactions on Computers, C-30:341--356, 1981.
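
The "strip mining and interchange" route to blocking named in the Schreiber and Dongarra context can be sketched on a matrix transpose. The example is an assumption made for illustration (the cited work is not about transposes specifically), and the function names and the tile parameter are hypothetical. The outer pair of loops plays the role of the block loops and the inner pair the point loops, so each (ii, jj) pair identifies one tile.

```python
def transpose(a):
    """Plain doubly nested loop over an n x m matrix given as a list of rows."""
    n, m = len(a), len(a[0])
    t = [[0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            t[j][i] = a[i][j]
    return t


def transpose_tiled(a, tile=2):
    """Same computation after strip mining i and j and moving the block loops outward."""
    n, m = len(a), len(a[0])
    t = [[0] * n for _ in range(m)]
    for ii in range(0, n, tile):                       # block loop over row strips
        for jj in range(0, m, tile):                   # block loop over column strips
            for i in range(ii, min(ii + tile, n)):     # point loop within the tile
                for j in range(jj, min(jj + tile, m)): # point loop within the tile
                    t[j][i] = a[i][j]
    return t
```

Both versions visit the same iterations and give the same result; only the visiting order changes, so that each tile's data is finished before the next tile is started.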


Compile-Time Techniques for Data Distribution in.. - Ramanujam, Sadayappan (1991)   (76 citations)

....are executed by the owner processor. 3. There is a fixed distribution of array elements. (Data reorganization costs are architecture-specific.) 2.1 Related work The research on problems related to memory optimizations goes back to studies of the organization of data for paged memory systems [1]. Balasundaram and others [3] are working on interactive parallelization tools for multicomputers that provide the user with feedback on the interplay between data decomposition and task partitioning on the performance of programs. Gallivan et al. [7] discuss problems associated with automatically ....

W. Abu-Sufah, D. Kuck and D. Lawrie, "On the Performance Enhancement of Paging Systems through Program Analysis and Transformations," IEEE Trans. Computers, Vol. C-30, No. 5, pages 341--356, May 1981.


A Unified Tiling Approach for Out-Of-Core Computations - Kandemir, Bordawekar.. (1996)   (1 citation)

....flow dependences. 1 Introduction and Related Work Since, today, almost every processor has some kind of memory hierarchy organized into layers with different costs, compiler optimizations to reduce memory access costs are important. Tiling, one such optimization, was first used by Abu-Sufah et al. [Abu81] in order to optimize loop nests in a paging memory system. The later applications were generally on cache memories and registers [Wol89, WL91]. In [RS92] a number of loop iterations were aggregated into tiles that execute atomically without any synchronization in a distributed-memory ....

W. Abu-Sufah. On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations, IEEE Transactions on Computers, C-30(5), pages 341-355, May 1981.


CoD3 Optimizing Locality Of Programs Apparc Deliverable - Bodin, Jalby, Seznec..

No context found.

Kuck D. Abu-Sufah W. and Lawrie D. On the performance enhancement of paging system through program analysis and transformations. IEEE Transactions on Computers, May 1981.


Algorithms for Data Locality Optimization - Bodin, Eisenbeis, Jalby.. (1994)   (2 citations)

No context found.

Kuck D. Abu-Sufah W. and Lawrie D. On the performance enhancement of paging system through program analysis and transformations. IEEE Transactions on Computers, May 1981.

CiteSeer.IST - Copyright Penn State and NEC