82 citations found.
J. Ferrante, V. Sarkar, and W. Thrash, "On estimating and enhancing cache effectiveness," Proceedings of 4th International Workshop in Languages and Compilers for Parallel Computing, pp. 587--616, 1991.

This paper is cited in the following contexts:

Data Locality Optimizations for Multigrid Methods on Structured.. - Weiß   (Correct)

....amount of capacity misses is reduced. To obtain good performance for matrix multiplication the size of the tiles must be tailored to the cache size and other cache parameters like cache line size and set associativity. In some cases, however, blocking will increase the amount of conflict misses [FST91, WL91]. 4.3.4 Data Prefetching: The loop transformations discussed so far aim at reducing the capacity misses of a computation. Misses which are introduced by first time accesses are not addressed by these optimizations. Prefetching [VL00] allows the microprocessor to issue a data request before ....

....Algorithm 4.10 Applying array transpose. 2: double a[N][M] 1: Data structure after transposing: 2: double a[M][N] 4.4.4 Data Copying: In Section 4.3 the loop blocking or tiling have been introduced as a technique to reduce the amount of capacity misses. Several investigations [FST91, WL91] have shown that blocked codes suffer from a high degree of conflict misses introduced by self interference. The interference will be demonstrated by means of Figure 4.7. The figure shows a part (block) of a big array a(i, j) which is to be reused by a blocked algorithm. Suppose that a ....

[Article contains additional citation context not shown here]

J. Ferrante, V. Sarkar, and W. Thrash. On Estimating and Enhancing Cache Effectiveness. In U. Banerjee, editor, Fourth International Workshop on Languages and Compilers for Parallel Computing. Springer, August 1991.


Impact of Memory Hierarchy on Program Partitioning and.. - Kaplow, Maniatty.. (1995)   (Correct)

....of the recent work has focused, as we do, on the loop nests of a program. Porterfield [11] estimates the number of cache lines referenced by a loop, but considers only caches with unit line size. Moreover, his method is applicable only to loops with constant data dependency vectors. Ferrante et al. [7] determine an upper bound for the number of distinct cache lines referenced in a program using a detailed analysis of the data dependency of array references and index expressions in loop nests. Fahringer [3] develops the notion of array access classes that are created by grouping together array ....

J. Ferrante, V. Sarkar, W. Thrash. On Estimating and Enhancing Cache Effectiveness. In Languages and Compilers for Parallel Computing, Fourth International Workshop, Santa Clara, CA. Springer-Verlag, NY, August 1991.


An Integer Linear Programming Approach for Optimizing.. - Kandemir Banerjee.. (1999)   (3 citations)  (Correct)

.... in the loop graph I of the nest graph a: Then we define Cost(VcXt[j] as the number of cache misses incurred due to array Q when the dimension j is its FCD and the loop I is placed in the innermost position in the nest. Although several methods can be used to estimate Cost values (e.g., see [20], [5], [24], [6], [25]), in our experiments we use a slightly modified form of the approach proposed by Sarkar et al. [24], as this approach is relatively easy to implement and results in good estimations for the codes encountered in practice. Since we also want to consider both the first level cache ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proc. Languages and Compilers for Parallel Computing (LCPC'91), pages 328--343, 1991.


Design and Evaluation of a Compiler Algorithm for Prefetching - Mowry, Lam, Gupta (1992)   (320 citations)  (Correct)

....space is simply the set of innermost loops whose volume of data accessed in a single iteration does not exceed the cache size. We estimate the amount of data used for each level of loop nesting, using the reuse vector information. Our algorithm is a simplified version of those proposed previously [8, 11, 23]. We assume loop iteration counts that cannot be determined at compile time to be small (this tends to minimize the number of prefetches. Later, in Section 4.2, we present results where unknown loop iteration counts are assumed to be large). A reuse can be exploited only if it lies within the ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Fourth Workshop on Languages and Compilers for Parallel Computing, Aug 1991.


Improving Memory Hierarchy Performance for Irregular .. - Mellor-Crummey.. (2001)   (6 citations)  (Correct)

....CPU speed and memory speed widens, systems are being constructed with deeper hierarchies. Achieving high performance on such systems requires tailoring the reference behavior of applications to better match the characteristics of a machine's memory hierarchy. Techniques such as loop blocking [1, 2, 3, 4, 5, 6] and data prefetching [4, 7, 8] have significantly improved memory hierarchy utilization for regular applications. A limitation of these techniques is that they aren't as effective for irregular applications. Improving performance for irregular applications is extremely important since large scale ....

....of the paper. 2. Related Work: Blocking for improving the performance of memory hierarchies has been a subject of research for the last few decades. Early papers focused on blocking to improve paging performance [9, 10] but recent work has focused more narrowly on improving cache performance [2, 5, 4, 6]. Techniques similar to blocking have also been effectively applied to improvement of reuse in registers [1]. Most of these methods deal with one level of the memory hierarchy only, although the cache and register techniques can be effectively composed. A recent paper by Navarro et al. examines ....

J. Ferrante, V. Sarkar, and W. Thrash, "On Estimating and Enhancing Cache Effectiveness," Proceedings of Fourth Workshop on Languages and Compilers for Parallel Computing, (Aug 1991).


Automatic Data Layout and Code Restructuring for.. - Jean-Francois Collard.. (1998)   (Correct)

....[4] CMEs now handle associative caches, and techniques were described [4] to automatically pad arrays and select appropriate tile size. But again, automatic restructuring and insertion of I/Os are not addressed in their paper. Nearly all of these papers have benefited from the seminal work in [3]. 7 Future Work: Let us consider the transitive closure of relation #, and the equivalence classes with respect to this transitive relation. The equivalence classes build a partition T1, ..., Tn, like in Section 3.3. These equivalence classes for the transitive relation give directly, for a ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In A. Nicolau, U. Banerjee, D. Gelernter, and D. Padua, editors, Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing, pages 328--343, Santa Clara, CA, August 1991. Springer.


Bit Reversal On Uniprocessors - Karp (1996)   (3 citations)  (Correct)

....that will be used on the sixth reference. That one flushes the line needed for the seventh, and so on. Other power of two strides will also be bad. A stride of half the cache size gives only two rows; one quarter of the cache size, four rows; etc. Surprisingly, other strides can also cause problems [12]. A four-way set-associative, 32 KByte cache with a 128-byte line size will perform poorly when the stride is 103. We can fix our example by changing the stride by a small amount. If the array in our simple program were dimensioned a(2 13 16,5), each reference would be to a different row in the ....

J. Ferrante, V. Sarkar, and W. Thrash, On Estimating and Enhancing Cache Effectiveness, Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing, (1991). To appear in Springer Verlag's Lecture Notes in Computer Science series.
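
The power-of-two stride effect described in the Karp context above can be illustrated with a small sketch. It is only an illustration under stated assumptions: a direct-mapped cache (ways=1), strides measured in bytes, and the snippet's "rows" read as cache sets. The function name distinct_sets and its parameters are hypothetical and do not come from the cited papers, and the sketch does not attempt to reproduce the stride-103 case.

```python
def distinct_sets(stride_bytes, n_refs, cache_bytes=32 * 1024, line_bytes=128, ways=1):
    """Count the cache sets touched by n_refs accesses spaced stride_bytes apart.

    Assumption: the set index is (address // line_bytes) % number_of_sets,
    the usual modulo mapping; real hardware may hash addresses differently.
    """
    n_sets = cache_bytes // (line_bytes * ways)
    touched = set()
    for i in range(n_refs):
        address = i * stride_bytes
        touched.add((address // line_bytes) % n_sets)
    return len(touched)


if __name__ == "__main__":
    cache = 32 * 1024
    print(distinct_sets(cache, 64))        # stride = cache size      -> 1
    print(distinct_sets(cache // 2, 64))   # stride = half the cache  -> 2
    print(distinct_sets(cache // 4, 64))   # stride = a quarter       -> 4
    print(distinct_sets(cache + 128, 64))  # stride padded by a line  -> 64
```

Under these assumptions, padding the stride by one cache line spreads 64 references over 64 different sets, which matches the snippet's remark that changing the stride by a small amount fixes the example.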


A Matrix-Based Approach to Global Locality Optimization - Kandemir, Choudhary.. (1999)   (16 citations)  (Correct)

....5(c) is 4 under optimal layouts. If desired, our model can also accommodate sophisticated techniques that estimate the number of misses in a given program. For example, instead of locality coefficients, we can use cache miss estimations obtained using the techniques proposed by Ferrante et al. [19] or Sarkar et al. [49]. 4.2 Formulation for the general case: In the general case, when we handle a given loop nest during the global optimization process, some of the array layouts might be known, while the layouts of some arrays are yet to be determined. In such a case, we end up with a system ....

....used (LRU) policy. Like Gannon et al. [20], he focuses on estimating the cache miss rates for a given loop nest. These approaches, however, do not propose how to reach the best transformed version, and imply that a number of candidate solutions should be evaluated. The works of Ferrante et al. [19] and Sarkar et al. [49] can also be considered to belong to this category. Wolf and Lam [56] describe reuse vectors and explain how they can be used for optimizing cache locality. Their approach involves first optimizing nest locality using unimodular loop transformations and then applying ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proc. Languages and Compilers for Parallel Computing (LCPC'91), pages 328--343, 1991.


Software Support For Improving Locality in Advanced Scientific Codes - Tseng (2000)   (Correct)

....and engineers. Because of trends in computer architectures, lessons learned here are also likely to prove very useful for other application domains, including image processing and high performance databases. 6 Related Work: There has been much work on improving locality in scientific applications [3, 24, 25, 26, 39, 40, 57, 67, 69, 70, 79, 87]. Here we will focus on the work which is most relevant to our proposed research. A number of researchers have investigated tiling as a means of exploiting reuse. Lam, Rothberg, and Wolf show conflict misses can severely degrade the performance of tiling [51]. Wolf and Lam analyze temporal and ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, Fourth International Workshop, Santa Clara, CA, August 1991. Springer-Verlag.


P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj   (Correct)

....level 0 in E. Further, let $S_E$ denote the set of statements in E outside of loops. Then the accumulated computation time $ct_E(E)$ implied by all statements $S \in S_E$ and loop nests $L \in L_E$ is defined as $ct_E(E) = \sum_{s \in S_E} ct_S(s) + \sum_{l \in L_E} ct_L(l)$. 4.4 Number of Cache Misses: It is well known [18, 45, 36, 33, 24] that inefficient memory access patterns and data mapping into the memory hierarchy (data locality problem) of a single processor cause major program performance degradation. P³T estimates the number of accessed cache lines which correlates with the number of cache misses. This parameter is ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proc. of the 4th Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, Aug 1991.


Tiling Optimizations for 3D Scientific Computations - Rivera, Tseng (2000)   (6 citations)  (Correct)

.... [1, 17, 24, 25]. Data transformations have also been combined with loop transformations [5, 16]. Several cache capacity estimation techniques have been proposed to help guide data locality optimizations [9, 33]. These techniques can also be enhanced to take into account limited cache associativity [8, 30]. More recently, Ghosh et al. developed symbolic cache representation which are highly accurate in predicting cache misses [11, 12, 13]. Their cache miss equations can be used to predict the number of cache misses for a computation, and also be used to guide compiler transformations such as tiling ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, Fourth International Workshop, Santa Clara, CA, August 1991. Springer-Verlag.


P³T+: A Performance Estimator for Distributed and Parallel.. - Pozgaj, Fahringer (2000)   (Correct)

....can be useful to analyze the important communication computation relationship by using also the communication parameters described above. evaluate whether there is enough computation contained in a loop, thus parallelizing the loop may be effective. • Cache misses: It is well known [27, 81, 55, 52, 33] that inefficient memory access patterns and data mapping into the memory hierarchy (data locality problem) of a single processor can cause major program performance degradation. P³T estimates the number of accessed cache lines which correlates with the number of cache misses. All P³T ....

....function. Predicting the distinct number of cache lines accessed inside of a nest of loops has seen considerable research activity during the last few years. In general it is assumed that the smaller this number the smaller the critical number of cache misses. J. Ferrante, V. Sarkar and W. Thrash [33] compute an upper bound for the number of cache lines accessed in a sequential program, which allows them to successfully guide loop interchange. Part of their techniques are based on polynomial evaluations. A. Porterfield [64] obtains an upper bound for the total number of cache misses by ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proc. of the 4th Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, Aug 1991.


A Modal Model of Memory - Mitchell, Carter, Ferrante (2001)   (2 citations)  Self-citation (Ferrante)   (Correct)

....the modal model of memory. Our system combines limited static analysis with bounded experimentation to take advantage of the modal nature of performance. 1. 1 Limited Static Analysis Many compilation strategies estimate the profitability of a transformation with a purely static analysis [9, 28, 26, 23, 10, 13], which in many cases can lead to good optimization choices. However, by relying only on static information, the analysis can fail on two counts. First, the underlying mathematical tools, such as integer linear programming, often are restricted to simple program structures. For example, most ....

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Workshop on Languages and Compilers for Parallel Computing, 1991.


The Deleterious Nature of Interacting Tiling.. - Mitchell, Carter..   Self-citation (Ferrante)   (Correct)

....constraint does not fully utilize block size information, and [1] limits block size to 1. In contrast to these and other approaches to tiling size selection, our multi level approach uses the block size at each level, and uses a global cost function. We employ counting arguments similar to [28, 19, 16, 1, 21] to estimate the number of misses in a module. Some of this work [19, 21] does not apply their results to determine tile size. No previous work has applied these arguments at multiple levels to show that optimization decisions made independently can lead to performance degradation. With respect ....

....size to 1. In contrast to these and other approaches to tiling size selection, our multi level approach uses the block size at each level, and uses a global cost function. We employ counting arguments similar to [28, 19, 16, 1, 21] to estimate the number of misses in a module. Some of this work [19, 21] does not apply their results to determine tile size. No previous work has applied these arguments at multiple levels to show that optimization decisions made independently can lead to performance degradation. With respect to optimization, Bacon, Graham and Sharp [4] observe: There is no single ....

Jeanne Ferrante, Vivek Sarkar, and Wendy Thrash. On estimating and enhancing cache effectiveness. (589), 1991. Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing.


Exploiting Cache Locality At Run-Time - Yan (1998)   (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash, "On estimating and enhancing cache effectiveness," Proceedings of 4th International Workshop in Languages and Compilers for Parallel Computing, pp. 587--616, 1991.


Analysis and Evaluation of The Synchronized - Pipelined Parallelism Model (2006)   (Correct)

No context found.

Jeanne Ferrante, Vivek Sarkar, and W. Thrash. "On Estimating and Enhancing Cache Effectiveness". In Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing, pages 328--343, London, UK, 1992. Springer-Verlag.


P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (1999)   (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proc. of the 4th Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, Aug 1991.


An Integer Linear Programming Approach for Optimizing.. - Kandemir Banerjee.. (1999)   (3 citations)  (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proc. Languages and Compilers for Parallel Computing (LCPC'91), pages 328--343, 1991.


Estimating Cache Performance for Sequential and Data Parallel.. - Fahringer (1997)   (2 citations)  (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proc. of the 4th Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, Aug 1991.


P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (2001)   (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proc. of the 4th Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, Aug 1991.


Inter-array Data Regrouping - Ding, Kennedy (1999)   (9 citations)  (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, Fourth International Workshop, Santa Clara, CA, August 1991. Springer-Verlag.


Exploiting Superword-Level Locality in Multimedia Extension.. - Shin, Chame, Hall (2003)   (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash, "On estimating and enhancing cache effectiveness," in Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing, (Santa Clara, California), pp. 328--343, August 1991.


Projecting Periodic Polyhedra for Loop Nest Analysis - Meister (2004)   (2 citations)  (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Advances in Languages and Compilers for Parallel Processing, pages 328--343. MIT Press, 1991.


Compiler-Controlled Caching in Superword Register Files for.. - Shin, Chame, Hall (2002)   (2 citations)  (Correct)

No context found.

J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing, pages 328--343, Santa Clara, California, August 1991.
