S. Ghosh, M. Martonosi, and S. Malik. Automated Cache Optimizations Using CME Driven Diagnosis. In Proceedings of the International Conference on Supercomputing, pages 316--326, Santa Fe, New Mexico, USA, May 2000.
....taken by the compiler community [WL91, KCRB98, RT98a]. They developed schemes to apply data locality optimizations automatically within a compiler pass. The optimizations within the compiler are guided by cache capacity estimates [FST91, GJG88, TFJ94] and cache miss prediction techniques [GMM99, GMM00]. Available implementations of these techniques, however, are limited to research compilers up to now. Furthermore, these data locality optimization techniques cannot be applied to complex programs as for example multigrid methods since the data dependences within the algorithm are typically to ....
S. Ghosh, M. Martonosi, and S. Malik. Automated Cache Optimizations Using CME Driven Diagnosis. In Proceedings of the International Conference on Supercomputing, pages 316--326, Santa Fe, New Mexico, USA, May 2000.
....us understand the causes behind these misses. These models can then be employed to guide various optimisations to reduce cache misses in a systematic manner. In the last few years, several compile time analytical methods have been proposed to statically predict the cache behaviour of a program [4, 6, 11, 13, 14, 26]. At this early stage, all these research efforts have focused on loop oriented programs operating on arrays. Such a method consists of (a) a procedure for setting up mathematical formulas to characterise the cache misses in a program and (b) an algorithm for finding cache misses (and their causes, ....
....tile and pad sizes by reasoning about these equations rather than solving them for cache misses. However, computing the exact number of cache misses from the CMEs when required is expensive. Some statistics based methods have been reported to produce efficiently a reasonable estimate of such misses [4, 14, 26] from the CMEs. The CMEs are limited to isolated perfect loop nests consisting of straight line assignments. Recently, an attempt for exactly modelling the cache behaviour of loop nests using Presburger formulas is made in [6]. While the cache misses for both perfect and imperfect loop nests ....
[Article contains additional citation context not shown here]
S. Ghosh, M. Martonosi, and S. Malik. Automated cache optimizations using CME driven diagnosis. In International Conference on Supercomputing (ICS'00), pages 316-326, 2000.
....to understand the causes behind cache misses and helps reduce these misses in a systematic manner. However, computing the exact number of cache misses from the CMEs is computationally expensive. Some statistics based methods have been reported to produce an accurate estimate of such misses [2, 12, 24]. In certain compiler transformations, it is possible to reduce the number of cache misses by reasoning about the causes of some cache misses expressed in the CMEs without requiring the CMEs to be solved. Two classic applications are tiling and padding [11]. Unfortunately, the CMEs are limited to ....
....data reuse for uniformly generated references in a perfect loop nest. They also use reuse vectors to derive an estimate of cache misses to guide their data locality algorithm. Gannon, Jalby and Gallivan [10] and Wolfe [26] discuss the use of reference window for predicting cache misses. The CMEs [11, 12] represent a more ambitious analytical method in an attempt to provide a more accurate analysis of cache misses. This framework is targeted at perfectly nested loops with affine loop bounds and data accesses. If all reuse vectors of a reference are used, all cache misses for the reference can be ....
Somnath Ghosh, Margaret Martonosi, and Sharad Malik. Automated cache optimizations using CME driven diagnosis. In International Conference on Supercomputing (ICS'00), pages 316-326, 2000.
....recurrences based on the Euclidean remainder algorithm may be used to quickly compute a sequence of non conflicting tile dimensions [6, 26]. A cost function is used to select the tiles preserving the most reuse. A search space algorithm using a very precise cache model can obtain similar results [12, 14]. To efficiently compute non conflicting tile dimensions for 3D arrays we introduce Euc3D, an extension to the Euc algorithm given in [26]. The pseudocode in Figure 9 presents an overview of Euc3D. Like Euc, the algorithm initially computes several non conflicting array tiles and then selects ....
.... More recently, Ghosh et al. developed symbolic cache representation which are highly accurate in predicting cache misses [11, 12, 13]. Their cache miss equations can be used to predict the number of cache misses for a computation, and also be used to guide compiler transformations such as tiling [14]. A number of researchers have investigated tiling as a means of exploiting reuse. Tiling was first proposed by Irigoin and Triolet [15] and Wolfe [35, 36]. Lam, Rothberg, and Wolf show conflict misses can severely degrade the performance of 2D tiling [20]. Wolf and Lam analyze temporal and ....
S. Ghosh, M. Martonosi, and S. Malik. Automated cache optimizations using CME driven diagnosis. In Proceedings of the 2000 ACM International Conference on Supercomputing, Santa Fe, NM, May 2000.
No context found.
S. Ghosh, M. Martonosi, and S. Malik. Automated cache optimizations using CME driven diagnosis. In International Conference on Supercomputing (ICS'00), pages 316-326, 2000.