14 citations found.
Sharad K. Singhai and Kathryn S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6), 1997.

This paper is cited in the following contexts:
New Results on Array Contraction - Darte, Huard (2002)   (1 citation)

....Figure 4: Different versions for locality. 5 Related Work Loop fusion for optimization of locality and memory reduction has a long history. All experimental studies (see for example the experimental results in [17, 19, 24, 14, 15, 25, 16]) lead to the same conclusions: loop transformations (especially loop fusion) are important for data locality optimization in general and array contraction in particular, and array contraction has an impact both on performance and on memory size. Furthermore, there are benefits to performing these ....

....is different (number of contracted arcs and not number of contracted arrays) and, because of this, they can miss the optimal solution for array contraction (see the code in Figure 2(b) for example). Furthermore, they do not consider loop shifting, but only loop fusion. Kennedy and McKinley (see [17, 11, 24] among other papers) were the first who tried to optimize loop fusion for locality in a well-defined framework (model, NP-completeness, well-defined heuristics, etc.). The main difference is that we target exactly array contraction, while they study more general (but less accurate in terms of ....

Sharad K. Singhai and Kathryn S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6), 1997.


Towards Automatic Synthesis of High-Performance.. - Cociorva.. (2001)

....products. 6 Related Work Much work has been done on improving locality and parallelism through loop fusion. Kennedy and co-workers [11] have developed algorithms for modeling the degree of data sharing and for fusing a collection of loops to improve locality and parallelism. Singhai and McKinley [29] examined the effects of loop fusion on data locality and parallelism together. Although this problem is NP-hard, they were able to find optimal solutions in restricted cases and heuristic solutions for the general case. Gao et al. [6] studied the contraction of arrays into scalars through loop ....

S. Singhai and K. S. McKinley. A Parameterized Loop Fusion Algorithm for Improving Parallelism and Cache Locality. The Computer Journal, 40(6):340--355, 1997.


Loop Optimizations for a Class of Memory-Constrained .. - Cociorva, Wilkins.. (2001)

....with the loop nest that consumes it, the dimensionality of the intermediate array can be reduced, thereby reducing its memory requirement. Loop tiling for enhancing data locality and parallelism has been extensively studied [2, 5, 8, 14, 37, 38, 45, 47, 43]. Loop fusion has also been studied [41, 42, 33, 32] as a means of improving data locality. There has been much less work investigating the use of loop fusion as a means of reducing memory requirements [11]. We have previously investigated the problem of finding optimal loop fusion transformations for minimization of intermediate arrays in the ....

....of tiling called data shackling has been developed [19, 20] together with more recent work by Ahmed et al. [1] which allows a cleaner treatment of locality enhancement in imperfectly nested loops. As mentioned earlier, loop fusion has also been used as a means of improving data locality [18, 41, 42, 33, 32]. 7. CONCLUSION In this paper, we have addressed the memory access and space optimization of a class of nested loop computations that implement multi-dimensional summations of the product of several arrays. We have described a dynamic programming algorithm for finding the optimal fusion and ....

S. Singhai and K. S. McKinley. A Parameterized Loop Fusion Algorithm for Improving Parallelism and Cache Locality. The Computer Journal, 40(6):340--355, 1997.
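
The memory-reduction effect mentioned in the first excerpt above (fusing a producer loop with the loop that consumes its result, so that the intermediate array loses a dimension) can be illustrated with a minimal Python sketch. The function names and the sum-of-products computation are hypothetical and chosen only for illustration; they are not taken from the cited paper.

```python
def sum_of_products_unfused(a, b):
    """Producer and consumer in separate loops: needs an n-element temporary."""
    n = len(a)
    t = [0.0] * n                # intermediate array of size n
    for i in range(n):           # producer loop
        t[i] = a[i] * b[i]
    total = 0.0
    for i in range(n):           # consumer loop
        total += t[i]
    return total


def sum_of_products_fused(a, b):
    """Loops fused: the temporary contracts from an array to a scalar."""
    total = 0.0
    for i in range(len(a)):
        t = a[i] * b[i]          # intermediate value lives for one iteration only
        total += t
    return total
```

Both functions compute the same value; the fused version simply never materializes the temporary array.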


Global Communication Optimization for - Tensor Contraction Expressions

....[11] developed a fast algorithm that allows accurate modeling of data sharing as well as the use of fusion enabling transformations. Ding [6] illustrates the use of loop fusion in reducing storage requirements through an example, but does not provide a general solution. Singhai and McKinley [24] examined the effects of loop fusion on data locality and parallelism together. They viewed the optimization problem as one of partitioning a weighted directed acyclic graph in which the nodes represent loops and the weights on edges represent the amount of locality and parallelism. Although the ....

....discusses the application of fusion directly to array statements in languages such as F90 and ZPL. Callahan et al. [1] present a technique to convert array references to scalar accesses in innermost loops. As mentioned earlier, loop fusion has also been used as a means of improving data locality [11, 24, 22, 21]. There has been much less work investigating the use of loop fusion as a means of reducing memory requirements [8, 23]. Another significant way in which our approach differs from other work that we are aware of, is that we attempt global optimization across a collection of loop nests using ....

S. Singhai and K. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6):340--355, 1997.


Design Memory Mapping - Lindenmaier (2000)

....in a partition should be fused. The algorithm finds a Joses 8503 cacheopt Rel 1. 01 June 5, 2000 6 Optimizations of Loops and Arrays JOSES transformation papers dealing with frameworks that handle this transformation this transformation [WL91] [ST92] [KP93] [CMT94] [KP93] Fusion [KM93, GOST93, SM98] x Distribution x x x Permutation [AK84] x x x x Tiling (capacity) [Wol87, GJG88, CL95] x) x x x x Tiling (conflicts) [LRW91, Ess93, CM95] Reversal x x x x Skewing x x x x Table 1: This table lists papers that describe specific loop transformation and gives an overview of frameworks that ....

....data set sizes by distribution. [GOST93] concurrently developed the same representation to drive loop fusion. They also propose a flow algorithm to optimize with respect to reuse. Their algorithm has a higher complexity than that of [KM93] and the work is not backed by experimental results. [SM98] extend the work of [KM93] to consider register pressure in addition to parallelism and locality. They give an optimal solution for the case where the loop dependency graph is a tree. Their heuristic solution for DAGs finds a spanning tree and then applies the optimal algorithm. Loop Tiling The ....

S. Singhai and K. S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6):340-355, 1998.


Data Locality Enhancement by Memory Reduction - Song, Xu, Wang, Li (2001)   (9 citations)

....Loop fusion has been studied extensively. To name a few publications, Kennedy and McKinley prove maximizing data locality by loop fusion is NP-hard [13]. They provide two polynomial time heuristics. Singhai and McKinley present parameterized loop fusion to improve parallelism and cache locality [25]. They do not perform memory reduction or loop shifting. Recently, Darte analyzes the complexity of loop fusions [5] and claims that the problem of maximum fusion of parallel loops with constant dependence distances is NP-complete when combined with loop shifting. His goal is to find the minimum ....

S. K. Singhai and K. S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6), 1997.


Loop Transformations for Parallel Execution of a Class of.. - Daniel Cociorva John

....loop fusion. There have been a number of previous studies that have addressed loop tiling [1, 5, 12, 13, 17, 18] for enhancing data locality. The optimal alignment of arrays in evaluating array expressions on data-parallel architectures is considered in [3, 4]. Loop fusion has also been studied [15, 16, 11, 10] as a means of improving data locality. However, the problem of optimal loop fusion and multi-dimensional tiling has not been considered together, to the best of our knowledge. As explained in Section 2, the application of both loop tiling and loop fusion transformations is essential in the context ....

S. Singhai and K. S. McKinley. A Parameterized Loop Fusion Algorithm for Improving Parallelism and Cache Locality. The Computer Journal, 40(6):340-355, 1997.


Locality Optimizations for Multi-Level Caches - Rivera, Tseng (1999)   (10 citations)

....L2 group reuse. These techniques easily generalize to three or more cache levels. 4 Loop Fusion Loop fusion is a transformation where adjacent loops are fused into a single loop containing both loop bodies. It can be used to improve locality directly by bringing together memory references [14, 18, 25], or to enable additional locality optimizations such as loop permutation [19] and array contraction [9]. We observe improvements in temporal locality after fusing the loop nests of Figure 2 at the innermost level, obtaining the nest shown in Figure 6. Assuming array sizes exceed the L2 cache ....

....the applicability of loop fusion [18]. They also propose cache partitioning, a version of MAXPAD which does not take severe conflict misses into account. Singhai and McKinley present a parameterized loop fusion algorithm which considers parallelism and register pressure in addition to reuse [25]. In comparison, our fusion algorithm explicitly calculates group reuse benefits for loop fusion in conjunction with inter-variable padding. We also consider multiple levels of cache. Lam, Rothberg, Wolf show conflict misses can severely degrade the performance of tiling [17]. Coleman and McKinley ....

S. Singhai and K. S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6):340--355, 1997.
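
The definition of loop fusion quoted in the first Rivera and Tseng excerpt above (adjacent loops fused into a single loop containing both loop bodies) can be illustrated with a minimal Python sketch. The function names and the arithmetic in the loop bodies are hypothetical, chosen only for illustration, and Python lists stand in for the arrays a compiler would operate on; nothing here reproduces an algorithm from the cited papers.

```python
def unfused(a, b):
    """Two adjacent loops over the same index range."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):           # first loop: reads a[i] and b[i]
        c[i] = a[i] + b[i]
    for i in range(n):           # second loop: reads a[i] again, plus c[i]
        d[i] = a[i] * c[i]
    return c, d


def fused(a, b):
    """Same computation with both loop bodies in a single loop."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = a[i] * c[i]       # a[i] and c[i] are reused right after first use
    return c, d
```

Fusion is legal here because iteration i of the second loop depends only on iteration i of the first; in the fused form the two references to a[i] and c[i] occur close together, which is the "bringing together memory references" effect the excerpt describes.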


Enhancing Branch Prediction via On-Line Statistical Analysis - Dropsho   Self-citation (Kathryn)

No context found.

Singhai, Sharad, and McKinley, Kathryn S. A Parameterized Loop Fusion Algorithm for Improving Parallelism and Cache Locality. The Computer Journal (1997).


Improving Data Locality by Array - Yonghong Song Rong

No context found.

Sharad K. Singhai and Kathryn S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6), 1997.


Compiler-Based Code Partitioning for Intelligent.. - Chen, Chen, Kandemir, .. (2003)

No context found.

S. Singhai and K. S. McKinley. "A Parameterized Loop Fusion Algorithm for Improving Parallelism and Cache Locality." The Computer Journal, 40(6):340--355, 1999.


Improving Effective Bandwidth through Compiler Enhancement of.. - Ding (2000)   (10 citations)

No context found.

Sharad Singhai and Kathryn S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6):340--355, 1997.


New Complexity Results on Array Contraction and Related Problems - Darte, Huard (2002)

No context found.

Sharad K. Singhai and Kathryn S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6), 1997.

CiteSeer.IST - Copyright Penn State and NEC