| N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In ICS, pages 141--152, 2000. |
....$lb_1 \le i_1 \le ub_1, \ldots, lb_n \le i_n \le ub_n$. This representation is not particularly convenient for representing transformations that operate on a collection of loops that are not perfectly nested. For instance, there are two traditional iteration spaces in the code shown in Figure 1. Ahmed et al. [1] and Kelly and Pugh [12] suggest two different methods for constructing a unified iteration space. In this paper we illustrate our methods using the Kelly and Pugh method. For the simple Moldyn example, they would use a four-dimensional space. Each loop corresponds to a pair of dimensions, where the ....
Nawaaz Ahmed, Nikolay Mateev, and Keshav Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Conference Proceedings of the 2000.
....that operate on a collection of loops that are not perfectly nested. For instance, there are three traditional iteration spaces in the code shown in Figure 1, and it is awkward to express how the sparse tiling run-time reordering transformation operates across all three. Ahmed et al. [1] and Kelly and Pugh [16] give two different methods for building a unified iteration space. In this paper, we use the Kelly and Pugh method. For the simplified moldyn example in Figure 1, they would use a four-dimensional space. Each loop corresponds to a pair of dimensions, where the first dimension of ....
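The idea of giving every statement instance coordinates in one shared space can be sketched in Python. This is a minimal illustration, not Kelly and Pugh's formulation or the moldyn code of Figure 1: it assumes a hypothetical program with an outer loop `s` enclosing two inner loops (statements `S1` and `S2`), and it assumes each loop contributes a (position among sibling loops, loop index) pair, which yields a four-dimensional point. Sorting instances lexicographically by these points reproduces the original execution order, so a transformation can then be expressed as a change to this mapping.

```python
from typing import List, Tuple

Instance = Tuple[str, int, int]  # (statement, s, inner index)

# Assumed position of each inner loop among the siblings inside the s loop.
POSITION = {"S1": 0, "S2": 1}


def original_trace(T: int, n: int, m: int) -> List[Instance]:
    """Execution order of the hypothetical program
        for s in range(T):
            for i in range(n): S1(s, i)
            for j in range(m): S2(s, j)
    """
    trace: List[Instance] = []
    for s in range(T):
        for i in range(n):
            trace.append(("S1", s, i))
        for j in range(m):
            trace.append(("S2", s, j))
    return trace


def to_unified(inst: Instance) -> Tuple[int, int, int, int]:
    """Map an instance to a 4-D point: (outer position, s, inner position, inner index)."""
    stmt, s, k = inst
    # The s loop is the only top-level loop, so its position is always 0.
    return (0, s, POSITION[stmt], k)


def unified_order(T: int, n: int, m: int) -> List[Instance]:
    """Execute instances in lexicographic order of their unified coordinates."""
    return sorted(original_trace(T, n, m), key=to_unified)
```

Reordering the tuple components (for example, making the inner index more significant than the inner-loop position) would describe a different, possibly illegal, schedule; checking such a schedule against the dependences is a separate step not shown here.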
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the 2000.
.... enhancing data locality has been studied extensively [27, 33, 30] and analytic models of the impact of tiling on locality have been developed [7, 20, 25]. Recently, a data-centric version of tiling called data shackling has been developed [12, 13] (together with more recent work by Ahmed et al. [1]), which allows a cleaner treatment of locality enhancement in imperfectly nested loops. The approach undertaken in this project bears similarities to some projects in other domains, such as the SPIRAL project, which is aimed at the design of a system to generate efficient libraries for digital ....
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loops. ACM Intl. Conf. on Supercomputing, 2000.
.... locality has been studied extensively [2, 8, 37, 38, 45, 43] and analytic models of the impact of tiling on locality have been developed [12, 27, 34]. Recently, a data-centric version of tiling called data shackling has been developed [19, 20] (together with more recent work by Ahmed et al. [1]), which allows a cleaner treatment of locality enhancement in imperfectly nested loops. As mentioned earlier, loop fusion has also been used as a means of improving data locality [18, 41, 42, 33, 32]. 7. CONCLUSION In this paper, we have addressed the memory access and space optimization of a ....
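Tiling itself is easiest to see on a perfectly nested loop. The sketch below, with hypothetical function names, tiles a textbook triple loop for matrix multiplication so that each block of `A`, `B`, and `C` is reused while it is (ideally) resident in cache. It only demonstrates that the tiled loop order computes the same result; pure Python will not show the cache benefit, and the default tile size of 2 is an arbitrary assumption for small inputs. The imperfectly nested case addressed by Ahmed et al. is harder because the loops cannot simply be strip-mined and interchanged like this.

```python
from typing import List

Matrix = List[List[int]]


def matmul_naive(A: Matrix, B: Matrix) -> Matrix:
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def matmul_tiled(A: Matrix, B: Matrix, tile: int = 2) -> Matrix:
    """Same computation with each loop strip-mined into tile-sized blocks."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    # Outer "tile" loops walk over blocks ...
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for pp in range(0, k, tile):
                # ... inner "point" loops stay inside one block; min() handles ragged edges.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        for p in range(pp, min(pp + tile, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C
```

Integer inputs keep the comparison exact, since tiling changes the order in which partial products are summed.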
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loops. Proc. ACM International Conference on Supercomputing, Santa Fe, NM, 2000.
....lacking. Optimizations similar to incrementalization have been studied for various language features, e.g. [8, 16, 34, 52, 51, 53, 57, 58, 59, 71, 79], but no systematic technique handles aggregate computations on arrays. At the same time, many optimizations have been studied for arrays, e.g. [1, 2, 3, 4, 6, 22, 29, 31, 37, 41, 55, 56, 63, 69, 74], but none of them achieves incrementalization. This paper presents a method and algorithms for incrementalizing aggregate array computations. The method is composed of algorithms for four major problems: 1) identifying an aggregate array computation and how its parameters are updated, 2) ....
....[51] and can be used to compute all the values that are useful. Pruning then eliminates useless data and code, saving space and time. Example 4.5 In the code (18), computing s[i 1] uses s 1 [i] and s 1 [i m]. Therefore, a simple dependence analysis determines that incrementally computing s[1] through s[n 1 m] uses s 1 [0] through s 1 [n 1 1], so no saved additional values are pruned. The incrementalized loop is formed as in Section 3.3, but using the AACs that have been extended with useful additional values. Example 4.6 From the code in (17) and its incremental version in (18) ....
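The core idea of incrementalizing an aggregate array computation can be illustrated with a sliding-window sum, a standard example chosen here as an assumption (it is not the code (17) or (18) referenced in the quoted paper). Instead of recomputing each window sum from scratch in O(m) time, the incremental version derives `s[i + 1]` from `s[i]` by subtracting the element that leaves the window and adding the one that enters it, giving O(n) total work. The function names are hypothetical.

```python
from typing import List


def window_sums_naive(a: List[int], m: int) -> List[int]:
    """s[i] = a[i] + ... + a[i + m - 1], recomputed from scratch: O(n * m)."""
    return [sum(a[i:i + m]) for i in range(len(a) - m + 1)]


def window_sums_incremental(a: List[int], m: int) -> List[int]:
    """Same result, derived incrementally: s[i + 1] = s[i] - a[i] + a[i + m], O(n)."""
    n = len(a)
    if not 1 <= m <= n:
        raise ValueError("window size must satisfy 1 <= m <= len(a)")
    s = [0] * (n - m + 1)
    s[0] = sum(a[:m])  # only the first window is computed in full
    for i in range(n - m):
        s[i + 1] = s[i] - a[i] + a[i + m]
    return s
```

The incremental form trades a full recomputation per window for one subtraction and one addition, which is the kind of saving an incrementalizing transformation aims to derive automatically.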
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. International Journal of Parallel Programming, 29(5):493-544, Oct. 2001.
....loops, in which an outer loop encloses exactly one inner loop. To enhance locality in imperfectly nested loops, Ahmed, Mateev, and Pingali proposed an approach by which the iteration space of each statement is embedded in a special space called the product space using affine embedding functions [3]. These embedding functions effectively generalize transformations like loop fusion and loop fission that have been used for locality enhancement, and they can be used for tiling imperfectly nested loops if the product space can be transformed into a fully permutable one [4]. Tiling can also be applied ....
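A much-simplified Python sketch of the product-space idea follows. It assumes a hypothetical imperfectly nested loop in which `S1(i)` initializes `x[i]` and `S2(i, j)` then accumulates into `x[i]`; the product space has coordinates (i1, i2, j2), and the two embeddings are hand-picked affine maps rather than ones synthesized by Ahmed, Mateev, and Pingali's algorithm. Instances that land on the same point are assumed to run in textual order, and an embedding is treated as legal when every dependence source is ordered before its target.

```python
from typing import Callable, Dict, List, Tuple

Instance = Tuple[str, Tuple[int, ...]]
Embedding = Dict[str, Callable[[Tuple[int, ...]], Tuple[int, int, int]]]

# Assumed tie-break: instances mapped to the same point run in textual order.
TEXT_ORDER = {"S1": 0, "S2": 1}


def instances(n: int) -> List[Instance]:
    """Original order of the hypothetical code:
        for i in range(n):
            S1: x[i] = 0
            for j in range(n):
                S2: x[i] += A[i][j] * y[j]
    """
    out: List[Instance] = []
    for i in range(n):
        out.append(("S1", (i,)))
        for j in range(n):
            out.append(("S2", (i, j)))
    return out


def dependences(n: int) -> List[Tuple[Instance, Instance]]:
    """Source/target pairs on x[i]: S1(i) -> S2(i, j) and S2(i, j) -> S2(i, j + 1)."""
    deps = []
    for i in range(n):
        for j in range(n):
            deps.append((("S1", (i,)), ("S2", (i, j))))
            if j + 1 < n:
                deps.append((("S2", (i, j)), ("S2", (i, j + 1))))
    return deps


def order_key(emb: Embedding, inst: Instance) -> Tuple[int, ...]:
    stmt, idx = inst
    return emb[stmt](idx) + (TEXT_ORDER[stmt],)


def is_legal(emb: Embedding, n: int) -> bool:
    return all(order_key(emb, src) < order_key(emb, dst) for src, dst in dependences(n))


def schedule(emb: Embedding, n: int) -> List[Instance]:
    """Execute instances in lexicographic order of their product-space points."""
    return sorted(instances(n), key=lambda inst: order_key(emb, inst))


# Product-space coordinates: (i1, i2, j2).
fused: Embedding = {
    "S1": lambda v: (v[0], v[0], 0),
    "S2": lambda v: (v[0], v[0], v[1]),
}
late_init: Embedding = {
    "S1": lambda v: (v[0] + 1, v[0], 0),  # pushes S1(i) past every S2(i, j)
    "S2": lambda v: (v[0], v[0], v[1]),
}
```

Here `fused` reproduces the original order, while `late_init` places each `S1(i)` after all `S2(i, j)`, violates the flow dependence on `x[i]`, and is rejected.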
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loops. ACM Intl. Conf. on Supercomputing, 2000.
....deal with perfectly nested loops and uniform dependences, some even with limited dimensionality. These restrictions allow for provable optimality within the methods' frameworks. In contrast, our goal is to extend the applicability of tiling. Recently, Song and Li [9] and Ahmed, Mateev and Pingali [1] developed special tiling methods for imperfectly nested loop programs in the framework of cache optimization. However, tiling methods for cache optimization cannot be used directly in tiling for coarse-grain parallelism, even if both try to improve locality. One of the main differences is the ....
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Conference proceedings of the
....their method cannot be used for our purpose, since their technique restructures the code, which would invalidate our space-time mapping. In addition, the outermost loop is not partitioned, which is usually necessary for our purposes (cf. Section 3.4). Recently, Ahmed, Mateev and Pingali [1, 2] presented a method for tiling imperfectly nested loop nests. The main difference from our approach is that their work treats cache optimization only and does not take parallelism into account. Technically, too, the approaches are quite different: their method uses the product space of all iteration ....
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Conference proceedings of the
.... an arbitrarily nested program the largest outermost fully permutable loop nest using affine transforms [17]. Ahmed et al. have also developed a heuristic procedure that finds outermost fully permutable loop nests in two steps: the first turns arbitrary loop nests into a high-dimensional perfect loop nest [1, 2], and then unimodular techniques are used to create fully permutable loop nests. The approach of just blocking fully permutable loop nests, however, is inadequate. Most programs cannot be made into one fully permutable loop nest but can still benefit from the concept of blocking. By going back to ....
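The notion of a fully permutable loop nest can be made concrete for the simple case of uniform dependences. Assuming each dependence is summarized by a lexicographically positive distance vector, a band of loops is fully permutable when every component of every distance vector is non-negative; any loop order is then legal, and so is rectangular tiling. The helper names below are hypothetical, and the skew shown is a classic unimodular transformation for two-deep nests, not the specific procedure of Ahmed et al.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]


def is_lex_positive(v: Sequence[int]) -> bool:
    """True if the first non-zero component is positive."""
    for c in v:
        if c != 0:
            return c > 0
    return False


def is_fully_permutable(distances: List[Vector]) -> bool:
    # Every component of every distance vector must be >= 0.
    return all(c >= 0 for d in distances for c in d)


def permutation_legal(distances: List[Vector], perm: Sequence[int]) -> bool:
    # A loop permutation is legal if every permuted distance stays lexicographically positive.
    return all(is_lex_positive([d[p] for p in perm]) for d in distances)


def skew2(distances: List[Tuple[int, int]], factor: int) -> List[Tuple[int, int]]:
    # Unimodular skew of the inner loop by the outer: (d0, d1) -> (d0, d1 + factor * d0).
    return [(d0, d1 + factor * d0) for d0, d1 in distances]
```

For example, a nest with the single distance vector (1, -1) cannot be interchanged, but after skewing by a factor of 1 the vector becomes (1, 0) and the nest is fully permutable, so it can then be tiled.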
....of the existing blocking techniques can produce this code. Blocking with unimodular transforms does not apply because the loops are not perfectly nested. Algorithms that only block fully permutable loop nests would also fail, as loops J and K cannot be combined into one fully permutable loop nest [1, 2]. Data shackling cannot change the order of the execution [13]. Iteration space slicing cannot get the blocking effect to improve spatial locality [19]. 4.2 Handling Sets of Independent Threads In this section, we focus on arbitrarily nested loops that can be broken up into independent threads. It ....
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the 2000 ACM International Conference on Supercomputing, pages 141-152, May 2000.
No context found.
Nawaaz Ahmed, Nikolay Mateev, and Keshav Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the 2000.
No context found.
Nawaaz Ahmed, Nikolay Mateev, and Keshav Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proc. International Conference on Supercomputing, Santa Fe, New Mexico, May 2000.
No context found.
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proc. International Conference on Supercomputing, Santa Fe, New Mexico, May 2000.
....JAD storage. The technology described in the rest of this paper accomplishes this. 3 Framework for Data-centric Restructuring In this section, we sketch a data-centric framework for restructuring imperfectly nested dense matrix codes with dependences. It extends the framework we developed in [1] for locality enhancement of dense matrix codes. For lack of space, we only sketch the ideas here; full details are available in [12]. Our framework makes the usual assumptions about programs: i) programs are sequences of statements nested within loops, ii) all memory accesses are through array ....
....is the execution of statement $S_k$ at iteration $\vec{i}$ of the surrounding loops. Flow, anti-, and output dependences from statement instances $S_s(\vec{i}_s)$ to statement instances $S_d(\vec{i}_d)$ can be expressed as matrix inequalities of the form $D \begin{bmatrix} \vec{i}_s \\ \vec{i}_d \end{bmatrix} \ge \vec{d}$, which we call dependence classes [1]. For our running example in Figure 4, it is easy to show that there are two relevant dependence classes. The first dependence class arises because statement S1 writes to a location b[j] which is then read by statement S2; similarly, ....
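A dependence class written as a system of linear inequalities can be tested and enumerated directly for small bounds. The sketch assumes the form D·x + c ≥ 0, where x concatenates the source and target iteration vectors, and uses a hypothetical single loop `0 <= j < n` in which `S1(j1)` writes `b[j1]` and `S2(j2)` reads `b[j2]`; the flow dependence then holds exactly when j1 = j2. This is an illustrative stand-in, not the Figure 4 example, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def flow_dependence_class(n: int) -> Tuple[Matrix, List[int]]:
    """Hypothetical class {(j1, j2) : D (j1, j2) + c >= 0} for
        for j in range(n):
            S1: b[j] = ...
            S2: ... = b[j]
    """
    D = [
        [1, -1],   # j1 - j2 >= 0
        [-1, 1],   # j2 - j1 >= 0   (together: j1 == j2, same memory location)
        [1, 0],    # j1 >= 0
        [-1, 0],   # n - 1 - j1 >= 0
        [0, 1],    # j2 >= 0
        [0, -1],   # n - 1 - j2 >= 0
    ]
    c = [0, 0, 0, n - 1, 0, n - 1]
    return D, c


def in_class(D: Matrix, c: Sequence[int], x: Sequence[int]) -> bool:
    """True if every row satisfies row . x + c_row >= 0."""
    return all(sum(a * b for a, b in zip(row, x)) + k >= 0 for row, k in zip(D, c))


def enumerate_class(D: Matrix, c: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """Brute-force all (j1, j2) in [0, n)^2 that belong to the class."""
    return [x for x in product(range(n), repeat=2) if in_class(D, c, x)]
```

Brute-force enumeration only scales to toy bounds; polyhedral tools such as isl reason about such sets symbolically.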
Nawaaz Ahmed, Nikolay Mateev, and Keshav Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the 2000 International Conference on Supercomputing, Santa Fe, New Mexico, May 8--11, 2000.
....contains a sequence of perfectly nested loop nests. Their algorithm identifies one loop from each loop nest, fuses these together and skews them with respect to the time-step loop. However, this transformation strategy is not applicable to codes such as matrix factorizations. In our previous work [1], we developed a general framework to enhance locality in imperfectly nested loops. Our ....
[Figure: Source code (loops c, r, k; statements S1, S2) → Statement Iteration Spaces → Product Space (embeddings F1, F2) → Transformed Product Space (transformation T) → Code gen → Output code]
....all pairs …. If the intersection of this plane with the system of legal solutions is non-empty, then a solution belonging to the intersection will exploit the reuse. We pick a solution that exploits as many reuses as possible, as discussed in detail in a companion paper [1]. For the code fragment shown in Figure 6, this heuristic will pick the second solution. For our running example, our algorithm picks the following embeddings: …. 3.4 Determining Tile Sizes We ....
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proc. International Conference on Supercomputing, Santa Fe, New Mexico, May 2000.
No context found.
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In ICS, pages 141--152, 2000.
No context found.
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In ACM Supercomputing'00, May 2000.
No context found.
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the International Conference on Supercomputing, 2000.
No context found.
Nawaaz Ahmed, Nikolay Mateev, and Keshav Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Conference Proceedings of the 2000.