72 citations found.
Allen, F.E. and Cocke, J.A., "A catalogue of optimizing transformations," pp. 1-30 in Design and Optimization of Compilers, ed. R. Rustin, Prentice-Hall, Englewood Cliffs, NJ (1972).

This paper is cited in the following contexts:

Optimizing Aggregate Array Computations in Loops - Liu, Stoller, Li, Rothamel

....lacking. Optimizations similar to incrementalization have been studied for various language features, e.g. [8, 16, 34, 52, 51, 53, 57, 58, 59, 71, 79], but no systematic technique handles aggregate computations on arrays. At the same time, many optimizations have been studied for arrays, e.g. [1, 2, 3, 4, 6, 22, 29, 31, 37, 41, 55, 56, 63, 69, 74], but none of them achieves incrementalization. This paper presents a method and algorithms for incrementalizing aggregate array computations. The method is composed of algorithms for four major problems: 1) identifying an aggregate array computation and how its parameters are updated, 2) ....
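As a rough illustration of what "incrementalization" means (a generic sketch, not the authors' algorithm), consider a sliding-window sum: the aggregate over each new window can be derived from the previous window's result rather than recomputed. The function names and the window-sum example are hypothetical choices made for this sketch.

```python
def window_sums_naive(a, w):
    """Recompute every window sum from scratch: about len(a) * w additions."""
    if w <= 0 or w > len(a):
        return []
    return [sum(a[i:i + w]) for i in range(len(a) - w + 1)]


def window_sums_incremental(a, w):
    """Derive each window sum from the previous one: about len(a) additions."""
    if w <= 0 or w > len(a):
        return []
    s = sum(a[:w])                      # first window is computed directly
    out = [s]
    for i in range(1, len(a) - w + 1):
        s += a[i + w - 1] - a[i - 1]    # add the entering element, drop the leaving one
        out.append(s)
    return out


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(window_sums_incremental(data, 3))   # [8, 6, 10, 15, 16, 17]
```

The incremental version does constant work per window regardless of w, which is the kind of speedup the snippet contrasts with non-incremental array optimizations.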

....application problems. APL compilers optimize aggregate array operations by performing computations in a piecewise and on-demand fashion, avoiding unnecessary storage of large intermediate results in sequences of operations [29, 37, 77]. The same basic idea underlies techniques such as fusion [3, 4, 15, 31, 74], deforestation [73] and transformation of series expressions [75]. These optimizations do not aim to compute each piece of the aggregate operations incrementally using previous pieces and thus cannot produce as much speedup as our method can. Specialization techniques, such as data ....
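A minimal sketch of loop fusion, the technique named above, written by hand in Python (the function names are hypothetical). Fusion is legal here because each iteration of the second loop reads only the element produced by the same iteration of the first, so the intermediate array can be eliminated.

```python
def sum_of_squares_unfused(a):
    squares = [x * x for x in a]   # loop 1 materializes a full intermediate array
    total = 0
    for s in squares:               # loop 2 consumes it
        total += s
    return total


def sum_of_squares_fused(a):
    total = 0
    for x in a:                     # one loop: each element is produced and consumed in place
        total += x * x
    return total


print(sum_of_squares_fused([1, 2, 3, 4]))   # 30
```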

F. E. Allen and J. Cocke. A catalogue of optimizing transformations. In R. Rustin, editor, Design and Optimization of Compilers, pages 1-30. Prentice-Hall, 1971.


Efficient Pipelining of Nested Loops: Unroll-and-Squash - Petkov (2001)

....j++) { b1 = f(a1); b2 = f(a2); a1 = g(b1); a2 = g(b2); } data_out[i] = a1; data_out[i+1] = a2; [Figure 2.2 (DFG of f and g with pipeline registers): A simple example: unroll and jam by 2.] Traditional loop optimizations such as loop unrolling, flattening, permutation and pipelining [29] fail to exploit the parallelism and improve the performance for this set of loops. One successful approach in this case is the application of unroll and jam (Figure 2.2), which unrolls the outer loop but fuses the resulting sequential inner loops to maintain a single inner loop [28] explained ....

....This chapter gives a brief overview of the loop transformation theory, including data dependence analysis and several examples of traditional loop transformations relevant to the unroll-and-squash method. More comprehensive presentations of the loop transformation theory can be found in [27], [29] and [30]. Other applicable compiler analysis and optimization techniques are discussed in Chapter 4. 3.1 Iteration Space Graph. A FOR-style loop nest of depth n can be represented as an iteration space graph with axes corresponding to the different loops in the loop nest (Figure 3.1). The axes ....

[Article contains additional citation context not shown here]

F. E. Allen and J. Cocke. A catalogue of optimizing transformations. In Design and Optimization of Compilers, Prentice-Hall, 1972.


Efficient Pipelining of Nested Loops: Unroll-and-Squash - Petkov, Harr, Amarasinghe (2001)

....and the total time for the loop nest is 2MN. for (i=0; i<M; i++) { a = data_in[i]; for (j=0; j<N; j++) { b = f(a); a = g(b); } data_out[i] = a; } [Figure 2 (DFG of f and g with a pipeline register): A simple loop nest.] Traditional loop optimizations such as loop unrolling, flattening and permutation [13] fail to exploit the parallelism efficiently and improve the performance for this loop nest. One successful approach is the application of unroll and jam (Figure 3), which unrolls the outer loop but fuses the resulting sequential inner loops to maintain a single inner loop [12]. Unroll and jam ....
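The loop nest in the snippet can be sketched in Python to show the shape of unroll and jam by 2. This is an illustration under assumptions: f and g are arbitrary placeholder functions passed in by the caller, the function names are hypothetical, and a cleanup iteration handles an odd trip count. The transformation is legal because each outer iteration reads only data_in[i] and writes only data_out[i].

```python
def loop_nest(data_in, n, f, g):
    """Original nest: the inner loop is one serial chain a -> f -> g -> f -> ..."""
    data_out = [None] * len(data_in)
    for i in range(len(data_in)):
        a = data_in[i]
        for _ in range(n):
            b = f(a)
            a = g(b)
        data_out[i] = a
    return data_out


def loop_nest_unroll_and_jam_2(data_in, n, f, g):
    """Outer loop unrolled by 2; the two inner-loop copies are fused (jammed)."""
    m = len(data_in)
    data_out = [None] * m
    i = 0
    while i + 1 < m:
        a1, a2 = data_in[i], data_in[i + 1]
        for _ in range(n):
            # Two independent chains interleaved in one inner loop body.
            b1 = f(a1)
            b2 = f(a2)
            a1 = g(b1)
            a2 = g(b2)
        data_out[i], data_out[i + 1] = a1, a2
        i += 2
    if i < m:                       # cleanup iteration when m is odd
        a = data_in[i]
        for _ in range(n):
            a = g(f(a))
        data_out[i] = a
    return data_out


def inc(x):
    return x + 1


def dbl(x):
    return 2 * x


print(loop_nest_unroll_and_jam_2([0, 1, 2], 2, inc, dbl))   # [6, 10, 14]
```

In Python this only demonstrates the restructuring; the benefit described in the snippet comes when hardware can overlap the two independent f/g chains in a pipelined datapath.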

F. E. Allen and J. Cocke. A catalogue of optimizing transformations. Design and Optimization of Compilers, Prentice-Hall, 1972.


Comparing and Combining Read Miss Clustering and Software.. - Pai, Adve (2001)   (1 citation)

.... (by sending multiple prefetches in parallel). The read miss clustering transformation is based on a novel adaptation of unroll and jam [22]. Specifically, it extends unroll and jam by mapping memory parallelism in a modern ILP system to the previously studied problem of floating-point pipelining [2, 3, 20]. The new transformation aims to cluster multiple expected read misses together within the same instruction window of an out-of-order processor, without sacrificing data locality. Read miss clustering was experimentally shown to improve latency tolerance and exploit hardware support for memory ....

.... more thoroughly and proposes a compile-time algorithm for increasing read miss clustering for modern out-of-order processors [22]. The algorithm is based on a novel adaptation of unroll and jam, by which an outer loop is unrolled and the resulting inner loop copies are fused (jammed) together [2]. Figures 1(a) and 1(b) provide pseudocode for the base and clustered code, respectively. The pseudocode shows a traversal of a 2-D matrix with dimensions I_o × I_i, before and after unroll and jam. All pseudocode in this paper uses row-major notation. The initial code is a row-wise ....

[Article contains additional citation context not shown here]

F. E. Allen and J. Cocke. A Catalogue of Optimizing Transformations. In R. Rustin, editor, Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1972.


Managing Interprocedural Optimization - Hall (1990)   (41 citations)

....loops) as good targets on which to concentrate optimization. The ALPHA translator optimized parameter passing using information about accesses to the parameters within the called procedure. The Allen-Cocke optimization catalog defined four different ways of implementing procedure call linkages [AC72]. These are open, closed, semi-open and semi-closed linkages. Open linkage is another name for inline substitution. Closed linkage is the usual linkage style for separate compilation. With semi-open linkage, a procedure definition is compiled with its caller. The called procedure and the caller ....

....time. To eliminate dependences and improve parallelism, compound statements such as loops and conditionals must be located and moved outside of the loop. Such extensions to the traditional algorithm have been described [CLZ86, FOW87]. Loop unswitching. A similar technique, loop unswitching [AC72], can be applied when a condition in the loop is loop-invariant but the code guarded by the condition is not. For an if-then-else clause within a loop, two copies of the loop are created. One is guarded by the if condition, eliminating the portion of the loop that the else condition guards. The ....
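A small hand-written sketch of loop unswitching as described above (function names are hypothetical): the loop-invariant test is hoisted out, and one copy of the loop is emitted for each branch of the if-then-else.

```python
def scale_or_shift(a, scale):
    out = []
    for x in a:
        if scale:            # loop-invariant condition re-tested on every iteration
            out.append(2 * x)
        else:
            out.append(x + 1)
    return out


def scale_or_shift_unswitched(a, scale):
    out = []
    if scale:                # condition tested once, outside the loop
        for x in a:          # copy of the loop that keeps only the "then" part
            out.append(2 * x)
    else:
        for x in a:          # copy of the loop that keeps only the "else" part
            out.append(x + 1)
    return out


print(scale_or_shift_unswitched([1, 2, 3], True))   # [2, 4, 6]
```

The trade-off is the usual one for unswitching: the loop body is duplicated, in exchange for removing a branch from every iteration.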

F. E. Allen and J. Cocke. A catalogue of optimizing transformations. In J. Rustin, editor, Design and Optimization of Compilers. Prentice-Hall, Englewood Cliffs, NJ, 1972.


Exploiting Instruction-Level Parallelism for Memory System.. - Pai (2000)

....transformations already known and implemented in compilers for other purposes, providing the analysis needed to relate them to read miss clustering. The key transformation we use is unroll and jam, which was originally proposed for improving floating-point pipelining and for scalar replacement [AC72, CCK88, CK94, Nic87]. We develop an analysis and transformation framework that maps the read miss clustering problem to floating-point pipelining. We evaluate the clustering transformations applied by hand to a latency detection microbenchmark and seven scientific applications running on ....

....4.2(a). This transformation unrolls an outer loop and fuses (jams) the resulting inner loop copies into a single inner loop. Previous work has used unroll and jam for scalar replacement (replacing array memory operations with register accesses), better floating-point pipelining, or cache locality [AC72, CCK88, CK94, Nic87, Car96]. Using unroll and jam for read miss clustering requires different heuristics, and may help even when the previously studied benefits are unavailable. We prefer to use unroll and jam instead of strip-mine and interchange for two reasons. First, unroll and jam allows us ....

[Article contains additional citation context not shown here]

Frances E. Allen and John Cocke. A Catalogue of Optimizing Transformations. In Randall Rustin, editor, Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1972.


JaMake: A Java Compiler Environment - Budimlić, Kennedy (2001)   (1 citation)

....implementation of these optimizations is not straightforward in Java. In particular, the Java exception mechanism prevents or seriously limits most code motion optimizations. Exception hiding is a program transformation that enables more efficient code motion. This technique uses loop peeling [1] and guard insertion to create exception-free zones for code motion. Code motion transformations can then freely move the code within these zones, without concern for exceptions. Static Single Assignment (SSA) form, the standard intermediate representation for programs in a compiler, enables ....
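For readers unfamiliar with loop peeling, here is a generic sketch (hypothetical function names; it does not attempt to reproduce the exception-hiding scheme, which also relies on guard insertion). The first iteration is split off so the remaining loop body no longer needs to test for it.

```python
def pairwise_sums(a):
    out = []
    for i in range(len(a)):
        if i == 0:                   # special case re-tested on every iteration
            out.append(a[0])
        else:
            out.append(a[i] + a[i - 1])
    return out


def pairwise_sums_peeled(a):
    out = []
    if a:
        out.append(a[0])             # peeled first iteration
        for i in range(1, len(a)):   # remaining iterations need no i == 0 test
            out.append(a[i] + a[i - 1])
    return out


print(pairwise_sums_peeled([1, 2, 3, 4]))   # [1, 3, 5, 7]
```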

F. Allen and J. Cocke, A catalogue of optimizing transformations, in Design and Optimization of Compilers, Prentice-Hall, 1972, pp. 1-30.


Compiling Java for High Performance and the Internet - Budimlic (2001)

....optimizations is not straightforward in Java. In particular, the Java exception mechanism prevents or seriously limits most of the code motion optimizations. Chapter 4 introduces exception hiding, a program transformation that enables more efficient code motion. This technique uses loop peeling [7] and guard insertion to create exception-free zones for code motion. Code motion transformations can then freely move the code within these zones, without concern for exceptions. Static Single Assignment (SSA) form, the standard intermediate representation for programs in a compiler, enables ....

F.E. Allen and J. Cocke. A catalogue of optimizing transformations. In Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1972.


A Comparative Study of Static and Dynamic Heuristics for .. - Arnold, Fink, Sarkar.. (2000)   (7 citations)

....[Dynamo '00 submission] 1 Introduction. Procedure inlining (i.e. inline expansion of procedure calls) has been a well-known program transformation for almost three decades [1]. The inlining transformation replaces a call site by an inline copy of the body of the procedure being called. Procedure inlining can result in at least three kinds of benefits. First, the inlining transformation eliminates linkage overhead of the call. Second, the compiler can use data flow ....
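A minimal sketch of the inlining transformation described above, applied by hand at the source level (the function names are hypothetical, and this says nothing about what any particular Python implementation does automatically): the call site is replaced by a copy of the callee's body with formals substituted by actuals.

```python
def axpy_element(alpha, x, y):
    """Callee: one multiply-add."""
    return alpha * x + y


def axpy(alpha, xs, ys):
    # Every element pays the call linkage for axpy_element.
    return [axpy_element(alpha, x, y) for x, y in zip(xs, ys)]


def axpy_inlined(alpha, xs, ys):
    # Call site replaced by the callee's body, with formal parameters
    # substituted by the actual arguments.
    return [alpha * x + y for x, y in zip(xs, ys)]


print(axpy_inlined(2, [1, 2, 3], [10, 20, 30]))   # [12, 24, 36]
```

Besides removing the linkage cost, the inlined body exposes alpha * x + y to the caller's own analysis, which is the second benefit the snippet begins to describe.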

F. E. Allen and J. Cocke. A catalogue of optimizing transformations. In Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1972.


Software Construction Using Components - Neighbors (1980)   (23 citations)

....into an executable program. Optimization-Oriented Transformation Systems. Some early work on optimizing transformation systems stems from the desire to make the optimization process visible to the user [Schneck72]. These systems would like to perform the standard optimizations done by a compiler [Allen72] and exploit standard rules of exchange for the operators of general-purpose languages [Standish76a]. Recent interest in optimizing transformations was renewed by Loveman [Loveman77] in his attempt to define source-to-source transformations which group ....

Allen, F.E., and Cocke, J., A Catalogue of Optimizing Transformations, In Rustin, R., editor, Design and Optimization of Compilers, pages 1-30. Prentice-Hall, 1972.


SETL for Internet Data Processing - Bacon (2000)

.... quotation above, sprang from the strong perception in the late 1960s that there was a need for a set-oriented language capable of expressing concisely the kind of set-intensive algorithm that kept arising in studies of compiler optimization, such as those by Allen, Cocke, Kennedy, and Schwartz [5, 43, 6, 41, 7, 10, 130, 11, 8, 9, 131, 178, 179, 180, 12, 132, 42]. Programming Languages and their Compilers [44], published early in 1970, devoted more than 200 pages to optimization algorithms. It included many of the now familiar techniques such as redundant code elimination and strength reduction, dealt extensively with graphs of control flow and their ....

Frances E. Allen and John Cocke. A catalogue of optimizing transformations. In R. Rustin, editor, Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1971.


A Comparative Study of Static and Profile-Based.. - Arnold, Fink, Sarkar, .. (2000)   (2 citations)

....in this paper can lead to significant speedups in execution time (up to 1.68×) even with modest limits on code size expansion (at most 10%). 1 Introduction. Procedure inlining (i.e. inline expansion of procedure calls) has been a well-known program transformation for almost three decades [1]. The inlining transformation replaces a call site by an inline copy of the body of the procedure being called. Procedure inlining can result in at least three kinds of benefits. First, the inlining transformation eliminates linkage overhead of the call. Second, the compiler can use data flow ....

F. E. Allen and J. Cocke. A catalogue of optimizing transformations. In Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1972.


Garbage Collection and Other Optimizations - Chase (1987)   (6 citations)

....do not squander memory, and have low run-time overhead. 7. Conclusions. Chapter 2: Background. This chapter reviews algorithms, analyses, and terminology that will appear later in the dissertation. Review and tutorial information on data flow analysis and optimizations is widely available [AC72, Ken81, Hec77, ASU86]. The literature on garbage collection has been summarized by Cohen [Coh81] and Nicolau [CN83], though there has been a great deal of recent work [Ung84, BW88, LH83, Rov85, Bro85, Hug85, SCN84, Moo84]. 2.1 Garbage collection. Garbage collection is an implementation technique ....

....support for saving of registers, debugging, and exception handling. Without knowledge of the behavior of a called procedure, a compiler must also make worst-case assumptions about the effects of the procedure on its parameters and global variables. This generality can be costly. Linkage tailoring [AC72] reduces the cost of a procedure linkage by using a special-purpose, non-standard interface. Allen and Cocke describe four classes of procedure linkage. closed: This is a standard procedure linkage. Registers are saved at procedure entry and restored at procedure exit, and results must appear in a ....

Frances E. Allen and John Cocke. A catalogue of optimizing transformations. In Randall Rustin, editor, Design and Optimization of Compilers. Prentice-Hall, 1972.


Instruction-Processing Optimization Techniques For VLSI.. - Bunda (1993)   (1 citation)

....requirements, but since static instruction counts are only weakly correlated to path length, few conclusions about performance can be drawn. This is partly because program execution time is dominated by inner loops. Also, some optimization strategies reduce path length while increasing static size [3]. 5.3 Path Length. Direct comparison of D16 and DLXe path lengths can be more meaningful than comparisons between arbitrary architectures, because both instruction sets are executed on the same pipeline. If execution resources or computations per instruction for both machines were different, ....

Frances E. Allen and John Cocke. A catalogue of optimizing transformations. In Randall Rustin, editor, Proceedings Courant Computer Science Symposium 5. Prentice-Hall, March 1971.


Code Transformations to Improve Memory Parallelism - Pai, Adve (1999)   (6 citations)

....code transformations already known and implemented in compilers for other purposes, providing the analysis needed to relate them to memory parallelism. The key transformation we use is unroll and jam, which was originally proposed for improving floating point pipelining and for scalar replacement [3, 4, 5, 6]. We develop an analysis that maps the memory parallelism problem to floating point pipelining. We evaluate the clustering transformations applied by hand to a latency detection microbenchmark and five scientific applications running on simulated and real uniprocessor and multiprocessor systems. ....

....2(a). This transformation unrolls an outer loop and fuses (jams) the resulting inner loop copies into a single inner loop. Previous work has used unroll and jam for scalar replacement (replacing array memory operations with register accesses), better floating-point pipelining, or cache locality [3, 4, 5, 6, 10]. Using unroll and jam for read miss clustering requires different heuristics, and may help even when the previously studied benefits are unavailable. We prefer to use unroll and jam instead of strip-mine and interchange for two reasons. First, unroll and jam allows us to exploit additional ....

F. E. Allen and J. Cocke, "A Catalogue of Optimizing Transformations," in Design and Optimization of Compilers (R. Rustin, ed.), pp. 1--30, Prentice-Hall, 1972.


Estimating Interlock And Improving Balance For Pipelined .. - Callahan, Cocke, Kennedy (1988)   (51 citations)  Self-citation (Cocke)

....The translated programs have parallelism made explicit by vector statements or parallel DO loops [ACK87]. This paper examines high-level transformations to improve the efficiency of programs running on processors with pipelined floating-point co-processors. In particular, the effects of loop fusion [AC72, War84], loop distribution [Wol82], loop unrolling [AC72], loop interchange [Wol82, All83, Wol86], unroll and jam (section 3.5) and redundant load elimination (section 3.1) on loop balance are discussed. Some of these transformations are already implemented in PFC and others are being ....

....by vector statements or parallel DO loops [ACK87]. This paper examines high-level transformations to improve the efficiency of programs running on processors with pipelined floating-point co-processors. In particular, the effects of loop fusion [AC72, War84], loop distribution [Wol82], loop unrolling [AC72], loop interchange [Wol82, All83, Wol86], unroll and jam (section 3.5) and redundant load elimination (section 3.1) on loop balance are discussed. Some of these transformations are already implemented in PFC and others are being implemented. 2 Balance, Interlock and Efficiency. 2.1 Machine ....
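Loop unrolling, one of the transformations listed above, can be sketched as follows (a hand-written illustration with hypothetical names and an assumed unroll factor of 4). The main loop performs four original iterations per trip, so loop-control overhead drops at the cost of a larger static body, and a cleanup loop handles the leftover elements.

```python
def dot(a, b):
    s = 0
    for i in range(len(a)):
        s += a[i] * b[i]
    return s


def dot_unrolled_by_4(a, b):
    n = len(a)
    s = 0
    i = 0
    while i + 3 < n:            # main loop: four original iterations per trip
        s += a[i] * b[i]
        s += a[i + 1] * b[i + 1]
        s += a[i + 2] * b[i + 2]
        s += a[i + 3] * b[i + 3]
        i += 4
    while i < n:                # cleanup loop for the n % 4 leftover elements
        s += a[i] * b[i]
        i += 1
    return s


print(dot_unrolled_by_4([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))   # 15
```

Each addition is kept as a separate statement in the original order, so the result matches the rolled loop exactly even for floating-point data; regrouping the four products into one expression would change the order of additions.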

[Article contains additional citation context not shown here]

F. E. Allen and J. Cocke. A catalogue of optimizing transformations. In Design and Optimization of Compilers, pages 1-30, Prentice-Hall, 1972.


A Program Integration Algorithm that Accommodates - Semantics-Preserving..

No context found.

Allen, F.E. and Cocke, J.A., "A catalogue of optimizing transformations," pp. 1-30 in Design and Optimization of Compilers, ed. R. Rustin, Prentice-Hall, Englewood Cliffs, NJ (1972).


A New Approach to Data Mining for Software Design - Walid Taha Scott

No context found.

F. Allen and J. Cocke. A catalogue of optimizing transformations. In Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1972.


Loop Transformations for Architectures with Partitioned.. - Huang, Carr, al. (2001)

No context found.

F. Allen and J. Cocke. A catalogue of optimizing transformations. In Design and Optimization of Compilers, pages 1-30. Prentice-Hall, 1972.


Optimizing Loop Performance for Clustered VLIW Architectures - Qian, Carr, Sweany (2002)   (2 citations)

No context found.

F. Allen and J. Cocke. A catalogue of optimizing transformations. In R. Rustin, editor, Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1972.


Improving Register Allocation for Subscripted Variables - Callahan, Carr, Kennedy (1990)   (120 citations)

No context found.

F. E. Allen and J. Cocke. A catalogue of optimizing transformations. In Design and Optimization of Compilers, pages 1--30. Prentice-Hall, 1972.


Optimizing Sparse Matrix-Vector Product Computations.. - Mellor-Crummey, Garvin (2003)

No context found.

F. Allen and J. Cocke. A catalogue of optimizing transformations. In J. Rustin, editor, Design and Optimization of Compilers. Prentice-Hall, 1972.


Improving Effective Bandwidth through Compiler Enhancement of.. - Ding (2000)   (10 citations)

No context found.

F. Allen and J. Cocke. A catalogue of optimizing transformations. In J. Rustin, editor, Design and Optimization of Compilers. Prentice-Hall, 1972.


Branch Elimination via Multi-Variable Condition Merging - Kreahling, Whalley..

No context found.

Allen, F.E., Cocke, J.: A catalogue of optimizing transformations. In Rustin, R., ed.: Design and Optimization of Compilers. Prentice-Hall, Englewood Cliffs, NJ, USA (1971) 1-30.


An Experiment with Inline Substitution - Cooper, Hall, Torczon (1991)   (32 citations)

No context found.

F. E. Allen and J. Cocke, `A catalogue of optimizing transformations', in J. Rustin (ed.), Design and Optimization of a Compiler, Prentice-Hall, 1972, pp. 1--30.

CiteSeer.IST - Copyright Penn State and NEC