14 citations found.
V. Sarkar, G. Gao, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. the Ninth International Workshop on Languages & Compilers for Parallel Computing (LCPC'96), Santa Clara, California, August 1996.

This paper is cited in the following contexts:
A Matrix-Based Approach to Global Locality Optimization - Kandemir, Choudhary.. (1999)   (16 citations)  (Correct)

....layouts. If desired, our model can also accommodate sophisticated techniques that estimate the number of misses in a given program. For example, instead of locality coefficients, we can use cache miss estimations obtained using the techniques proposed by Ferrante et al. [19] or Sarkar et al. [49]. 4.2 Formulation for the general case In the general case, when we handle a given loop nest during the global optimization process, some of the array layouts might be known, while the layouts of some arrays are yet to be determined. In such a case, we end up with a system of equations of the ....

....Like Gannon et al. [20] he focuses on estimating the cache miss rates for a given loop nest. These approaches, however, do not propose how to reach the best transformed version, and imply that a number of candidate solutions should be evaluated. The works of Ferrante et al. [19] and Sarkar et al. [49] can also be considered to belong to this category. Wolf and Lam [56] describe reuse vectors and explain how they can be used for optimizing cache locality. Their approach involves first optimizing nest locality using unimodular loop transformations and then applying tiling to the loops that ....

V. Sarkar, G. R. Gao, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. 9th Workshop on Languages and Compilers for Parallel Computing (LCPC'96), Santa Clara, California, August 1996.


Solving Tiling Using Optimal Permutations - Rastello, Pande (1997)   (Correct)

....between processors considering the delays of the network. Tiling is one of the most popular techniques used in Automatic Parallelization for minimizing communication in mapping iteration spaces onto the processors [12, 1, 11]. The important motivation behind tiling is to improve locality [7] and data reuse [2, 10, 4] that results in lesser communication. There have been a number of important research approaches which have investigated different aspects of tiling. The important problem to be solved in tiling is to determine the shape and the size of a tile [9]. Once the shape and the ....

Gao, G. R., Sarkar V. and Han S., "Locality Analysis for Distributed Shared Memory Multiprocessors", Proceedings of the 9th Workshop on Languages and Compilers for Parallel Computing, August '96.


The Deleterious Nature of Interacting Tiling.. - Mitchell, Carter..   (Correct)

....constraint does not fully utilize block size information, and [1] limits block size to 1. In contrast to these and other approaches to tiling size selection, our multi level approach uses the block size at each level, and uses a global cost function. We employ counting arguments similar to [28, 19, 16, 1, 21] to estimate the number of misses in a module. Some of this work [19, 21] does not apply their results to determine tile size. No previous work has applied these arguments at multiple levels to show that optimization decisions made independently can lead to performance degradation. With respect ....

....size to 1. In contrast to these and other approaches to tiling size selection, our multi level approach uses the block size at each level, and uses a global cost function. We employ counting arguments similar to [28, 19, 16, 1, 21] to estimate the number of misses in a module. Some of this work [19, 21] does not apply their results to determine tile size. No previous work has applied these arguments at multiple levels to show that optimization decisions made independently can lead to performance degradation. With respect to optimization, Bacon, Graham and Sharp [4] observe: There is no single ....

G. Gao, V. Sarkar, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing, 1996.


A Graph Based Framework to Detect Optimal Memory.. - Kandemir.. (1999)   (Correct)

....columns. e) unacceptable as more than one edge are selected between two neighboring columns. of the edge that connects the i th node of U with the j th node of V. In practice the edge costs should be computed as accurately as possible using the techniques based on miss rate estimations [20]. However, the derivation of exact cost expressions is beyond the scope of this paper. Once the costs are determined, the rest of the technique to be presented is fully automatic. Let Cost'(E_UV[i, j]) be Cost(E_UV[i, j]) if E_UV[i, j] is 1 (i.e. E_UV[i, j] is selected for a given path on the ....

....the total number of misses will be minimized. In the following discussion (for sake of simplicity of presentation) we assume that Cost'(E_UV[i, j]) can take only three possible values: TL, SL, and NL. However, as mentioned above, our technique can accommodate more accurate cost estimations [20]. We impose three conditions to ensure the correctness of the solution: (1) We should select a path from S to T in the LG, (2) We should select a single node from each column, and (3) We should select the edge between two selected nodes. Consider now Figure 2 in order to interpret these ....

[Article contains additional citation context not shown here]

V. Sarkar, G. Gao, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. the Ninth International Workshop on Languages & Compilers for Parallel Computing (LCPC'96), Santa Clara, California, August 1996.


An ILP Approach for Optimizing Cache Locality - Kandemir, Banerjee.. (1998)   (Correct)

....of Q. In addition to the node columns, the LG has a start [Footnote 1: In this paper we assume that the condition c cache line size is always satisfied and we do not consider it further. Notice, however, that different values of c lead to different degrees of spatial locality. The cost model we use [47] captures this aspect.] [Footnote 2: In all the graphs shown we assume that the direction of the edges is from left to right.] [Figure 2: The MLG and an optimal solution for the program fragment shown in Figure 1(a)] node (marked with ....

....nest graph x. Then we define Cost(V Q xl [j]) as the number of cache misses incurred due to array Q when the dimension j is its FCD and the loop l is placed in the innermost position in the nest x. Although several methods can be used to estimate Cost(V Q xl [j]) (e.g. see [35], [13], [50], [47], [14], [51], [44] and the references therein), in our experiments we use a slightly modified form of the approach proposed by Sarkar et al. [47] as this approach is relatively easy to implement and results in good estimations for the codes encountered in practice. Since we also want to consider ....

[Article contains additional citation context not shown here]

V. Sarkar, G. Gao, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. the Ninth International Workshop on Languages & Compilers for Parallel Computing (LCPC'96), Santa Clara, California, August 1996.


Quantifying the Multi-Level Nature of Tiling Interactions - Mitchell, Carter.. (1997)   (23 citations)  (Correct)

....available information increases, the cost functions which guide tiling optimization choices must be similarly expanded. Researchers have tried several solutions to this information expansion problem. Many have simply ignored the multi level information, instead relying on one level cost functions [28, 11, 26]. Others rephrase program optimization as a search problem and invent heuristics to prune the search [15, 30]. We suggest a different solution which first formulates the system to be optimized by quantifying both the effects of tiling choices and the interactions between such choices, and then ....

....2.2 Related Work Related work falls into four categories: quantification of performance, determining tile characteristics for a single level, unifying optimizations for a single level and unifying optimizations for multiple levels. Quantifying performance: We employ counting arguments similar to [21, 13, 11, 1, 26] to estimate the number of misses in a module. No previous work has applied these arguments to multiple levels of the memory hierarchy. Single level tile characteristics: Works such as [21, 11, 1] give methods for choosing tile size in a nested loop; [21] uses a fits in constraint based only on ....

Vivek Sarkar, Guang R. Gao, and Shaohua Han. Locality analysis for distributed shared-memory multiprocessors. In Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing, 1996.


Quantifying the Multi-Level Nature of Tiling Interactions - Mitchell (1997)   (23 citations)  (Correct)

....grant in association with the Intel Corporation. A preliminary version of this paper appeared in the Tenth International Workshop for Languages and Compilers for Parallel Computing, August, 1997. [Running header: International Journal of Parallel Programming, 1998] relying on one level cost functions [1], [10], [11]. Others rephrase program optimization as a search problem and invent heuristics to prune the search [12], [13]. We explore a different solution which first formulates the system to be optimized by quantifying both the effects of tiling choices and the interactions between such choices in a single ....

....work falls into four categories: quantification of performance, determining tile characteristics for a single level, unifying optimizations for a single level and unifying optimizations for multiple levels. Quantifying performance: We employ counting arguments similar to [20], [23], [10], [24], [11] to estimate the number of misses in a module. No previous work has applied these arguments to multiple levels of the memory hierarchy. Single level tile characteristics: Works such as [20], [10], [24] give methods for choosing tile size in a nested loop; [20] uses a fits in constraint based ....

[Article contains additional citation context not shown here]

Vivek Sarkar, Guang R. Gao, and Shaohua Han, "Locality analysis for distributed shared-memory multiprocessors," in Languages and Compilers for Parallel Computing, 1996.


Quantifying the Multi-Level Nature of Tiling Interactions - Nicholas Mitchell (1997)   (23 citations)  (Correct)

.... This work supported in part by NSF CCR 9504150 and a UC MICRO grant in association with the Intel Corporation. Researchers have developed a number of solutions to this information expansion problem. Many have simply ignored the multi level information, instead relying on one level cost functions [27, 11, 25]. Others rephrase program optimization as a search problem and invent heuristics to prune the search [15, 29]. We suggest a different solution which first formulates the system to be optimized by quantifying both the effects of tiling choices and the interactions between such choices in a single ....

....2.2 Related Work Related work falls into four categories: quantification of performance, determining tile characteristics for a single level, unifying optimizations for a single level and unifying optimizations for multiple levels. Quantifying performance: We employ counting arguments similar to [21, 13, 11, 1, 25] to estimate the number of misses in a module. No previous work has applied these arguments to multiple levels of the memory hierarchy. Single level tile characteristics: Works such as [21, 11, 1] give methods for choosing tile size in a nested loop; [21] uses a fits in constraint based only on ....

V. Sarkar, G. R. Gao, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In LCPC, 1996.


Communication-Minimal Tiling of Uniform Dependence Loops - Xue (1996)   (12 citations)  (Correct)

....on the inequality of arithmetic and geometric means and several basic concepts from convex cones. Motivated by the insights provided by this framework, we intend to pursue one important related problem of tiling nested loops to improve cache locality. Some earlier work in this area can be found in [5, 9, 13, 20]. 9 Acknowledgements I would like to thank all referees for their comments and suggestions. I also want to thank Referee B for pointing out a mistake in the formulation of a lemma in the original version of the paper, which has led to a split of that lemma into Lemmas 4, 5, and 6 in this paper. ....

G. R. Gao, V. Sarkar, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. of the 9th Workshop on Languages and Compilers for Parallel Computing, Aug. 1996.


Compiler-Assisted Cache Replacement: Problem.. - Yang, Govindarajan.. (2003)   (3 citations)  Self-citation (Gao)   (Correct)

No context found.

Guang R. Gao, Vivek Sarkar, and Shaohua Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. of the 1996 International Workshop on Languages and Compilers for Parallel Computing (LCPC), San Jose, California, Aug 1996.


Compiler-Assisted Cache Replacement: Problem.. - Yang, Govindarajan.. (2003)   (3 citations)  Self-citation (Gao)   (Correct)

No context found.

Guang R. Gao, Vivek Sarkar, and Shaohua Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. of the 1996 International Workshop on Languages and Compilers for Parallel Computing (LCPC), San Jose, California, Aug 1996.


Guiding Program Transformations with Modal Performance Models - Mitchell (2000)   (2 citations)  Self-citation (Sarkar)   (Correct)

....increases, the cost functions which guide tiling optimization choices must be similarly expanded. Researchers have developed a number of solutions to this information expansion problem. Many have simply ignored the multi level information, instead relying on one level cost functions [112, 26, 99]. Others rephrase program optimization as a search problem and invent heuristics to prune the search [46, 114]. We explore a different solution which first formulates the system to be optimized by quantifying both the effects of tiling choices and the interactions between such choices in a single ....

....N) B k (H, N) N 3 1 H 1 W 1 N 1 S k 2 HW When using this simple miss count formula, we must scale back the size of TLB and cache. The simple miss count only counts lines from A towards capacity (and ignores many other issues, as well). Therefore, similar to [99] and the effective cache size of [114], we modify the fits in constraint with a fudge factor of 75%: B k (H, W) ≤ 0.75 C k (30). Again, we need only use this fudge factor when using M simple k , as this formula ignores many architecture code interactions. Minimizing the M simple k subject ....

Vivek Sarkar, Guang R. Gao, and Shaohua Han. Locality analysis for distributed shared-memory multiprocessors. In Workshop on Languages and Compilers for Parallel Computing, 1996.


An Integer Linear Programming Approach for Optimizing.. - Kandemir Banerjee.. (1999)   (3 citations)  (Correct)

No context found.

V. Sarkar, G. Gao, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. the Ninth International Workshop on Languages & Compilers for Parallel Computing (LCPC'96), Santa Clara, California, August 1996.


On Tiling as a Loop Transformation - Xue (1997)   (19 citations)  (Correct)

No context found.

G. R. Gao, V. Sarkar, and S. Han. Locality analysis for distributed shared-memory multiprocessors. In Proc. of the 9th Workshop on Languages and Compilers for Parallel Computing, Aug. 1996.

CiteSeer.IST - Copyright Penn State and NEC