33 citations found.
Nicholas Mitchell, Larry Carter, Jeanne Ferrante, and Karin Hogstedt, "Quantifying the multi-level nature of tiling interactions," Tech. Rep. CS97-531, UCSD, Computer Science and Engineering Department, Mar. 1997.

This paper is cited in the following contexts:

Towards Automatic Synthesis of High-Performance.. - Cociorva.. (2001)   (Correct)

....not consider the impact of the amount of required memory; the memory requirement is a key issue for the problems considered in this paper. Loop tiling for enhancing data locality has been studied extensively [27, 33, 30] and analytic models of the impact of tiling on locality have been developed [7, 20, 25]. Recently, a data centric version of tiling called data shackling has been developed [12, 13] together with more recent work by Ahmed et al. [1] which allows a cleaner treatment of locality enhancement in imperfectly nested loops. The approach undertaken in this project bears similarities to ....

N. Mitchell, K. Hogstedt, L. Carter, and J. Ferrante. Quantifying the multi-level nature of tiling interactions. Intl. Journal of Parallel Programming, 26(6):641--670, June 1998.


Loop Optimizations for a Class of Memory-Constrained .. - Cociorva, Wilkins.. (2001)   (Correct)

....the impact of the amount of required memory; the memory requirement is a key issue for the problems considered in this paper. Loop tiling for enhancing data locality has been studied extensively [2, 8, 37, 38, 45, 43] and analytic models of the impact of tiling on locality have been developed [12, 27, 34]. Recently, a data centric version of tiling called data shackling has been developed [19, 20] together with more recent work by Ahmed et al. [1] which allows a cleaner treatment of locality enhancement in imperfectly nested loops. As mentioned earlier, loop fusion has also been used as a ....

N. Mitchell, K. Högstedt, L. Carter, and J. Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 26(6):641--670, June 1998.


Portable Compilation of Vector Expressions for.. - Kalinov.. (1999)   (Correct)

....usually includes sizes of memory hierarchy levels and their blocks as well as time characteristics of read and write operations to different levels of memory hierarchy. Code generation scheme proposed in this paper uses memory hierarchy model which is based on models considered in [4], [10], [12], [13], [15], [17], [18]. All parameters of this model are shown in the table 1. In section 6 in addition to parameters shown in table 1 we will need one derived parameter, called time of ideal sequential reading. To introduce this parameter let us estimate time needed to sequentially read all elements ....

....a case tiling does not decrease the running time of code for the reductive expression. Figure 7.3 just shows this effect for SPARCstation 5 having a 8K primary cache. 8 Related Work Tiling is a well known optimization transformation for programs, performing array based computations [18], [17], [13], [4], [12], [10]. Most of works we know of deal with perfectly nested loops. A few works [18] address tiling of imperfectly nested loops, but they neglect computations between loops not related .... [Plot residue omitted; axis label: Time in seconds.]

N. Mitchell, L. Carter, J. Ferrante, and K. Hogstedt. Quantifying the Multi-Level Nature of Tiling Interactions, 10th International Workshop on Languages and Compilers for Parallel Computing, August, 1997.


Increasing Temporal Locality with Skewing and Recursive.. - Jin, Mellor-Crummey.. (2001)   (3 citations)  (Correct)

....(SC2001, November 2001, Denver.) ....commonly used to increase both spatial and temporal reuse in one or more levels of cache [8, 10, 13, 15, 16]. Tiling reshapes an iteration space over a data domain by partitioning it into pieces that fit comfortably into cache and then completing all computations on each piece before moving to the next. Tiling rearranges the order of computation so that multiple references to a data element occur in ....

....to bring temporal reuse of values within the same iteration or a few iterations away (e.g. unroll and jam [7], scalar replacement) can dramatically reduce load store traffic. Tiling (also known as loop blocking) is one of the key transformations used for improving temporal reuse in cache (e.g. [8, 10, 15, 14, 16]). Most research on tiling has focused on improving reuse for a single level of cache and has been only applied to perfectly nested loops, in which an outer loop encloses exactly one inner loop. To enhance locality in imperfectly nested loops, Ahmed, Mateev, and Pingali proposed an approach by ....

N. Mitchell, K. Hogstedt, L. Carter, and J. Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 26(6), 1998.
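
The tiling transformation described in the excerpts above (partition the iteration space into pieces, finish each piece before moving on) can be sketched on matrix multiplication. This is only an illustration, not code from any of the cited papers: the function names and the `tile` parameter are assumptions introduced here, and pure Python does not show the cache effect itself, only the reordered loop structure.

```python
def matmul_naive(a, b):
    """Reference triple loop computing C = A * B on lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation, with all three loops blocked by `tile`.

    `tile` is a hypothetical parameter; in practice it would be chosen
    so that the working set of one tile fits in the targeted cache level.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops walk over tiles of the iteration space.
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Inner loops complete all work on one tile before moving on.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

The `min(...)` bounds handle matrix dimensions that are not multiples of the tile size, so the tiled version visits exactly the same iterations as the naive one, only in a different order.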


Generating Efficient Tiled Code for Distributed Memory Machines - Tang, Xue (2000)   (3 citations)  (Correct)

....10 Related Work Given a tiling transformation, this paper presents compiler techniques for generating a SPMD program to execute a rectangularly tiled iteration space. Some closely related work is reviewed below. Previous work on tiling includes: improving the performance of a memory hierarchy [6, 7, 23, 25, 35, 40], determining the sizes and shapes of tile to minimise communication overhead on distributed memory machines [10, 29, 30, 33, 38] and determining the tile size to minimise execution time on distributed memory machines [1, 5, 26]. To integrate our techniques with a data parallel compiler, compiler ....

....taking cache optimisation on a single processor into account [1, 5, 26]. The problem of selecting a tiling that minimises the total execution time by considering many factors simultaneously such as communication overhead and cache performance is difficult. Some initial attempt can be found in [25]. Methods for removing anti and output dependences and for transforming programs into single assignment form are many: array expansion [14], node splitting [27], array privatisation [17] and others [4]. Recently, a partial array expansion technique is proposed to reduce the amount of memory usage ....

N. Mitchell, L. Carter, J. Ferrante, and K. Hogstedt. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 26(6):641--670, 1998.


Software Support For Improving Locality in Advanced Scientific Codes - Tseng (2000)   (Correct)

.... that most of the benefits of cache optimizations can be achieved by simply targeting the smallest level of cache where data reuse can be obtained [72]. However, an important exception to this rule is trying to improve translation lookaside buffer (TLB) performance as well as cache performance [60]. Because L2 and L3 caches are becoming so large, it is entirely possible to run out of TLB entries before filling cache if data accesses are widely scattered. On some processors TLBs are only 4-way instead of fully associative, increasing chances of TLB misses significantly. We intend to examine ....

.... code for complex linear algebra codes [48]. Sarkar describes data locality optimizations used in the IBM XL Fortran compilers, including loop transformations and tiling [74]. Mitchell et al. discussed the interactions of multi level tiling for several goals, such as cache, TLB, and parallelism [60]. Song and Li extend tiling techniques to handle multiple loop nests [76]. In many cases their technique can exploit reuse across multiple iterations of the time step loop, yielding major improvements. Their technique does not currently apply to 3D arrays, and they concentrate on only L2 cache ....

N. Mitchell, L. Carter, J. Ferrante, and K. Hogstedt. Quantifying the multi-level nature of tiling interactions. In Proceedings of the Tenth Workshop on Languages and Compilers for Parallel Computing, Minneapolis, MN, August 1997.


Tiling Optimizations for 3D Scientific Computations - Rivera, Tseng (2000)   (6 citations)  (Correct)

.... transformations and tiling [28]. Chame and Moon propose tiling algorithms for choosing tile sizes based on cost models for estimating capacity and cross interference misses [3]. Mitchell et al. discussed the interactions of multi level tiling for several goals, such as cache, TLB, and parallelism [22]. They found explicitly considering multiple levels of the memory hierarchy (cache and TLB) led to the choice of compromise tile sizes which can yield significant improvements in performance. Following their work, we found compiler optimizations targeting higher (smaller) levels of cache can ....

N. Mitchell, L. Carter, J. Ferrante, and K. Högstedt. Quantifying the multi-level nature of tiling interactions. In Proceedings of the Tenth Workshop on Languages and Compilers for Parallel Computing, Minneapolis, MN, August 1997.


Cache-Efficient Multigrid Algorithms (Extended Abstract) - Sellappa, al.   (Correct)

....is well developed, the choice of parameters remains a difficult optimization problem. A major reason for this difficulty is that different components of the memory hierarchy may have conflicting notions of good locality, making the simultaneous optimization of these levels inherently difficult [13]. The importance of locality of reference is even more critical for hierarchical computations based on techniques such as multigrid [3], fast multipole [7], and wavelets [16], which typically perform #### operations on each data element. This is markedly different from dense matrix computations, ....

N. Mitchell, L. Carter, J. Ferrante, and K. Hogstedt. Quantifying the multi-level nature of tiling interactions. In Languages and Compilers for Parallel Computing: 10th Annual Workshop, LCPC'97, number 1366 in Lecture Notes in Computer Science, pages 1--15. Springer, 1998.


Tile Selection Algorithms and Their Performance Models - Hsu, Kremer (1999)   (Correct)

.... [Figure 2 table residue: formula rows for the tile selection algorithms euc [26], moon [23], tli [3], wmc [32], and mhcf [22]; the formulas are not recoverable from this extraction.] Figure 2: Different tile selection algorithms. Underlined algorithms are used for our quantitative comparison. A general tile selection algorithm chooses from the set of qualified candidate blocks the one which ....

.... A general tile selection algorithm chooses from the set of qualified candidate blocks the one which minimizes a particular cost model, i.e. $\arg\min_{(h, w) \in \mathrm{qualified}(B)} \mathrm{cost}(h, w)$, where qualified(B) is a subset of B in which all blocks satisfy certain qualifications such as fit in constraints [22] or shape constraints. Based on the observation that capacity misses and conflict misses are two major sources of cache misses in a loop nest for low associativity caches [20], most algorithms focus on blocks which generate very few self conflicts, and their cost models try to quantify either ....

N. Mitchell, K. Hogstedt, L. Carter, and J. Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 26(6), December 1998.
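
The "general tile selection algorithm" in the excerpt above (an arg min of a cost model over qualified candidate blocks) can be sketched as follows. Everything concrete here is an assumption made for illustration: the capacity test `fits_in_cache`, the toy cost `toy_cost`, and the candidate range are hypothetical stand-ins, not the models of any algorithm named in the excerpt.

```python
def select_tile(candidates, cost, qualified):
    """Return the (h, w) block minimizing cost among qualified candidates.

    Returns None when no candidate satisfies the qualification.
    """
    feasible = [hw for hw in candidates if qualified(*hw)]
    if not feasible:
        return None
    return min(feasible, key=lambda hw: cost(*hw))


CACHE_ELEMENTS = 64  # hypothetical cache capacity, in array elements


def fits_in_cache(h, w):
    """Hypothetical fit-in constraint: the h-by-w block must fit in cache."""
    return h * w <= CACHE_ELEMENTS


def toy_cost(h, w):
    """Hypothetical cost model that favors large, square-ish blocks."""
    return 1.0 / h + 1.0 / w


# Candidate set B: all blocks with height and width between 1 and 16.
CANDIDATES = [(h, w) for h in range(1, 17) for w in range(1, 17)]
```

With these assumed definitions, `select_tile(CANDIDATES, toy_cost, fits_in_cache)` picks the 8-by-8 block, since among blocks with at most 64 elements the sum 1/h + 1/w is smallest when the block is square and as large as the capacity allows.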


A Stable and Efficient Loop Tiling Algorithm - Hsu, Kremer (2000)   (1 citation)  (Correct)

....misses [17, 16]. Most compiler optimizations for improving cache performance have focused on reducing capacity misses, conflict misses, or both. A recent study showed that both capacity misses and conflict misses are equally important in determining the cache performance [25]. Loop tiling [38, 39, 22, 8, 26, 32, 34, 20, 6] is a well known compiler optimization that partitions the iteration space of a loop nest into tiles (or blocks) to avoid replacement misses of those array elements frequently referenced during the computation involving the tile. Early efforts have been to select the tile in such a way that its ....

....cross conflicts, and maximize cache utilization. We propose a new algorithm arguing that the best tile is among those which have few conflict misses and good cache utilization. Also, eucPad and datPad do not take TLB misses into account. We argue that, as other researchers have done before [8, 26], TLB misses are an important performance factor that needs to be considered since a TLB miss costs more than a cache miss and may cause cache stall. The best tile eliminates TLB misses. Finally, we argue that the fixed amount pad choices strategy proposed by eucPad can select bad tiles in ....

[Article contains additional citation context not shown here]

N. Mitchell, K. Hogstedt, L. Carter, and J. Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 26(6), December 1998.


Software Support For Improving Locality in Scientific Codes - Han, Rivera, Tseng (2000)   (8 citations)  (Correct)

.... code for complex linear algebra codes [42]. Sarkar describes data locality optimizations used in the IBM XL Fortran compilers, including loop transformations and tiling [64]. Mitchell et al. discussed the interactions of multi level tiling for several goals, such as cache, TLB, and parallelism [54]. They found explicitly considering multiple levels of the memory hierarchy (cache and TLB) led to the choice of compromise tile sizes which can yield significant improvements in performance. Song and Li extend tiling techniques to handle multiple loop nests [67]. In many cases their technique ....

N. Mitchell, L. Carter, J. Ferrante, and K. Högstedt. Quantifying the multi-level nature of tiling interactions. In Proceedings of the Tenth Workshop on Languages and Compilers for Parallel Computing, Minneapolis, MN, August 1997.


A Compiler Framework for Tiling Imperfectly-Nested Loops - Song, Li (1999)   (6 citations)  (Correct)

....loop. Our work in this paper covers more complex loop nests, including imperfectly nested loops and skewed tiles. Mitchell et al. use matrix multiplication as an example to show that both the TLB misses and the cache misses must be considered simultaneously in order to achieve the best performance [9], although they provide no formal algorithms to select tile sizes. Kodukula and Pingali propose a matrix based framework to represent transformations of imperfectly nested loops [6], including permutation, reversal, skewing, scaling, alignment, distribution and jamming, but not including tiling. ....

Nicholas Mitchell, Karin Hogstedt, Larry Carter, and Jeanne Ferrante. Quantifying the multi-level nature of tiling interactions. In International Journal of Parallel Programming, 1998.


Performance Coupling: A Methodology for Analyzing Application.. - Geisler   (Correct)

....in isolation summed together vs. kernels run together) a single application is represented by multiple coupling parameters (one for each kernel interaction). This is explained further in the following chapter. Larry Carter et al. have studied the performance impacts of hierarchical tiling [7]. Their technique focuses on improving a single kernel within an application, however the additional information that the coupling parameter provides indicates that the technique would be useful across kernels. The coupling parameter can indicate which cross kernel tilings should be pursued and ....

Nicholas Mitchell, Karin Hogstedt, Larry Carter, and Jeanne Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 1998.


Performance Coupling: Case Studies for Measuring the.. - Geisler, Taylor   (Correct)

....in isolation summed together vs. kernels run together) a single application is represented by multiple coupling parameters (one for each kernel interaction). This is explained further in the following chapter. Larry Carter et al. have studied the performance impacts of hierarchical tiling [6]. Their technique focuses on improving a single kernel within an application, however the additional information that the coupling parameter provides indicates that the technique would be useful across kernels. The coupling parameter can indicate which cross kernel tilings should be pursued and ....

Nicholas Mitchell, Karin Hogstedt, Larry Carter, and Jeanne Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 1998.


A Compiler Perspective on Architectural Evolutions - Nicholas Mitchell Larry (1997)   (2 citations)  Self-citation (Mitchell Carter Ferrante)   (Correct)

No context found.

Nicholas Mitchell, Larry Carter, Jeanne Ferrante, and Karin Hogstedt, "Quantifying the multi-level nature of tiling interactions," Tech. Rep. CS97-531, UCSD, Computer Science and Engineering Department, Mar. 1997.


ILP versus TLP on SMT - Mitchell, Carter, Ferrante, Tullsen (1999)   (3 citations)  Self-citation (Mitchell Carter Ferrante)   (Correct)

....and integer sort. Naive matrix multiply can perform poorly, due in part to high communication to computation ratio and high cache and TLB miss rates. However, with proper loop restructuring these latencies can be reduced to the point where floating point operations mask all memory latency [11]. Thus, a naive implementation is latency bound and an optimized one is computation bound. We experiment with varying levels of optimization in twelve implementations of 512 x 512 x 512 matrix multiply. The optimizations include tiling [12] for registers and cache. Tiling is a well known ....

....amount of TLP, and between the phases of integer sort. Bucket size: When bucket tiling integer sort, we must choose the size of the buckets. When tiling a computation for a conventional processor, the choice of tile size depends on the size of the problem and the target machine's characteristics [11]. On SMT, threads share cache and TLB. How does this sharing affect tile choice? Figure 5 shows the effect of the interaction between TLP and locality in integer sort. As we increase TLP, the best bucket size decreases. With one thread the best choice uses a bucket size of 2 6, whereas with ....

Nicholas Mitchell, Karin Högstedt, Larry Carter, and Jeanne Ferrante. Quantifying the multi-level nature of tiling interactions. In International Journal on Parallel Programming, June 1998.


A Modal Model of Memory - Mitchell, Carter, Ferrante (2001)   (2 citations)  Self-citation (Mitchell Carter Ferrante)   (Correct)

....the modal model of memory. Our system combines limited static analysis with bounded experimentation to take advantage of the modal nature of performance. 1.1 Limited Static Analysis Many compilation strategies estimate the profitability of a transformation with a purely static analysis [9, 28, 26, 23, 10, 13], which in many cases can lead to good optimization choices. However, by relying only on static information, the analysis can fail on two counts. First, the underlying mathematical tools, such as integer linear programming, often are restricted to simple program structures. For example, most ....

N. Mitchell, K. Hogstedt, L. Carter, and J. Ferrante. Quantifying the multi-level nature of tiling interactions. In International Journal on Parallel Programming, June 1998.


Guiding Program Transformations with Modal Performance Models - Mitchell (2000)   (2 citations)  Self-citation (Mitchell Carter Ferrante)   (Correct)

....enough. Its syntax and semantics are straightforward: express performance as natural numbers, and use the less than relation to pick the best. Observation 1 posits infeasibility of analysis in this case, not the language of expression. Rather than counting cycles, we can count cache misses [39, 103, 98, 81, 41, 48] or predict execution time [55]. Purely static counting models typically limit the guidance system in the syntax it can handle (e.g. only affine index expressions, only when array dataflow analysis is sufficiently precise). For example, they can do j = 1, M do k = 1, N Y (j) A(j, k) # X(k) ....

....example, Ghosh [48] presents a cost model to quantify the number of cache misses in loop nests. On kernels such as matrix multiply, their algorithm predicts nearly exactly how many cache misses the kernel endures. Others also provide quantitative models for predicting the number of cache misses [39, 26, 98, 81, 22, 41], or execution time [104, 103, 55, 17]. Ladner et al. present a probabilistic quantitative model [67]. Combined static dynamic approaches: Based on user specified performance templates, Brewer derives cost models based on profile feedback [15]. He uses these platform specific cost models to guide ....

Nicholas Mitchell, Karin Hogstedt, Larry Carter, and Jeanne Ferrante. Quantifying the multi-level nature of tiling interactions. In International Journal on Parallel Programming, June 1998.


The UCSD Active Web - Pasquale (1997)   Self-citation (Carter Ferrante)   (Correct)

....Hierarchical Tiling combines partitioning of computation for locality and parallelism with greater compiler control of data movement and storage. Since the UCSD Active Web will be even more complex and heterogeneous than current single architectures, studying multi level tradeoffs [71][100] is increasingly important. G.3.5.2 Application Experience Primarily through our participation in the NPACI partnership, our faculty has access to and experience with many large applications. To highlight a few: AppLeS prototypes are being developed for the molecular interaction program DOT ....

N. Mitchell, L. Carter, J. Ferrante, and K. Hogstedt, "Quantifying the Multi-level Nature of Tiling Interactions," In Proc. of Tenth International Workshop on Languages and Compiler for Parallel Computing, August 1997, to be published as Springer-Verlag Lecture Notes in Computer Science.


Guiding Program Transformations with Modal Performance Models - Mitchell (2000)   (2 citations)  Self-citation (Mitchell Carter Ferrante)   (Correct)

....specifications . scoping, classification, regression once per mode, per platform all three take on the order of tens of hours . modal system implemented in Scheme . bucket tiling implemented using SUIF 2.0 infrastructure contributions . model interactions between TLB and cache [MCF97] [MHCF98] . introduce bucket tiling, algorithms and implementation [MCF99] observe modal phenomenon . framework for modal modeling, design and implementation . specify modal model for locality . modal guidance future work . further refine mode specifications and system . fully integrate modal ....

Nicholas Mitchell, Karin Hogstedt, Larry Carter, and Jeanne Ferrante. Quantifying the multi-level nature of tiling interactions. In International Journal on Parallel Programming, June 1998.


Optimizing Matrix Multiplication with a Classifier Learning.. - Li, Garzaran   (Correct)

No context found.

N. Mitchell, K. Hogstedt, L. Carter, and J. Ferrante. Quantifying the Multi-Level Nature of Tiling Interactions. Int. Journal of Parallel Programming, 26(6):641--670, June 1998.


A Quantitative Analysis of Tile Size Selection Algorithms - Hsu, Kremer   (Correct)

No context found.

N. Mitchell, K. Hogstedt, L. Carter, and J. Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 26(6), December 1998.

