14 citations found.
M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, 1999. 2.1.1


This paper is cited in the following contexts:
New Results on Array Contraction - Darte, Huard (2002)   (1 citation)  (Correct)

....more and more important. First, as the gap between general purpose single chip processor and memory speeds grew, exploiting memory hierarchy became fundamental to achieve good performance. A large amount of compiler work has therefore focused on loop transformations and optimized data layouts (see [26, 21, 1, 20, 10, 16] to quote but a few) for better cache reuse and data prefetching. Now, memory optimizations become even more important in the context of compilation for embedded processor applications. Performance is not necessarily the only issue, but power consumption, memory design (size, type, etc.) are new ....

Mahmut Kandemir, Alok Choudhary, Nagaraj Shenoy, Prithviraj Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layout. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, February 1999. 13


Data Cache Locking for Higher Program Predictability - Vera, Lisper (2003)   (Correct)

....lately to exploit caches efficiently. Software controlled prefetching [27] hides the memory latency by overlapping a memory access with computation and other accesses. Another useful optimization is applying loop transformations such as tiling [4, 7, 20, 34] and data transformations such as padding [5, 17, 28, 30]. In all cases, a fast and accurate assessment of a program's cache behavior at compile time is needed to make an appropriate choice of parameter values. 1.1 Caches in Real Time Systems Real time systems rely on the assumption that tasks' worst-case execution times (WCETs) are known. In order to ....

M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, Feb. 1999.


Data Cache Locking for Higher Program Predictability - Vera, Lisper, Xue (2003)   (Correct)

....proposed lately to exploit caches efficiently. Software prefetching [26] hides the memory latency by overlapping a memory access with computation and other accesses. Another useful optimization is applying loop transformations such as tiling [4, 7, 19, 34] and data transformations such as padding [5, 16, 27, 30]. In all cases, a fast and accurate assessment of a program's cache behavior at compile time is needed to make an appropriate choice of parameter values. 1.1 Caches in Real Time Systems Real time systems rely on the assumption that tasks' worst-case execution times (WCETs) are known. In order to ....

M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, Feb. 1999.


Coyote Project: The Simulator - Vera   (Correct)

....been proposed to exploit caches efficiently. Software controlled prefetching [12] hides the memory latency by overlapping a memory access with computation and other accesses. Another useful optimization is applying loop transformations such as tiling [4, 6, 11, 17] and data layout transformations [5, 10, 13, 14]. In all cases, a fast and accurate assessment of a program's cache behavior at compile time is needed to make an appropriate choice of parameter values. Real time systems rely on the assumption that tasks' worst-case execution times (WCETs) are known. In order to get an accurate WCET, a tight ....

M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, Feb. 1999.


Scheduling and Partitioning for Multiple Loop Nests - Wang, Zhuge, Sha (2001)   (Correct)

....so as to increase computation granularity and thereby reduce communication time. The traditional loop tiling only considers the singleton loop and lack of the consideration of prefetching and scheduling. Another technique related to the memory access latency is data layout optimizations [3]. They modify the memory storage order of multi-dimensional arrays so as to reduce the cache miss rate. Loop partition is in the higher level above the data layout transformation. We assume full associativity in the first level memory. For the lower associativity, data layout transformation can be ....

M. Kandemir, A. Choudhary, and N. Shenoy. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2), Feb 1999.


Let's Study Whole-Program Cache Behaviour Analytically - Vera, Xue (2002)   (3 citations)  (Correct)

....gap between processor and main memory speeds. However, caches are effective only when programs exhibit sufficient data locality in their memory access patterns. Optimising compilers attempt to apply loop transformations such as tiling [3, 6, 14, 24, 26] and data transformations such as padding [12, 13, 17, 18] to improve the cache performance of a program. The models guiding these transformations (in making an appropriate choice of parameter values such as tile and pad sizes) are mostly heuristic or approximate. Memory system designers often use cache simulators to evaluate alternative design options. ....

M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115-135, Feb. 1999.


Let's Study Whole-Program Cache Behaviour Analytically - Vera, Xue (2001)   (3 citations)  (Correct)

....gap between processor and main memory speeds. However, caches are effective only when programs exhibit sufficient data locality in their memory access patterns. Optimising compilers attempt to apply loop transformations such as tiling [4, 14, 16, 27, 29] and data transformations such as padding [12, 13, 19, 20] to improve the cache performance of a program. The models guiding these transformations (in making an appropriate choice of parameter values such as tile and padding sizes) are mostly heuristic or approximate. Memory system designers often use cache simulators to evaluate alternative design ....

M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115-135, Feb. 1999.


Compiler Optimizations for Parallel Sparse Programs with.. - Chang, Chuang, Lee (1999)   (Correct)

....to reduce it to weighted bipartite graph matching. Kennedy et al. [11] determine data layouts automatically on distributed memory environments by using 0-1 integer programming [3]. Gupta et al. [8] extend the work of Li and Chen by presenting a framework based on weighted graphs. Kandemir et al. [10] present a framework that can automatically determine data layouts with respect to loop transformations. Their work can find optimal data layouts for all arrays at compiler time. Chatterjee, Gilbert, Schreiber, and Teng [5] presents a framework for the automatic determination of array alignments. ....

M. Kandemir, A. Choudhary, N. Shenoy, and P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transaction on Parallel and Distributed Systems, 10(2):115--135, February 1999.


Software Methods to Improve Data Locality and Cache Behavior - Beyls (2004)   (Correct)

No context found.

M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, 1999. 2.1.1


Optimizing Program Locality through CMEs and GAs - Vera, Abella.. (2003)   (2 citations)  (Correct)

No context found.

M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, Feb. 1999.


Software Methods to Improve Data Locality and Cache Behavior - Beyls (2004)   (Correct)

No context found.

M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, 1999. 2.1.1


Optimizing Program Locality through CMEs and GAs - Vera, Abella, Gonzalez, Llosa (2003)   (2 citations)  (Correct)

No context found.

M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, Feb. 1999.


Efficient and Accurate Analytical Modeling of Whole-Program Data .. - Xue, Vera (2003)   (Correct)

No context found.

M. Kandemir, A. Choudhary, P. Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layouts. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, Feb. 1999.


New Complexity Results on Array Contraction and Related Problems - Darte, Huard (2002)   (Correct)

No context found.

Mahmut Kandemir, Alok Choudhary, Nagaraj Shenoy, Prithviraj Banerjee, and J. Ramanujam. A linear algebra framework for automatic determination of optimal data layout. IEEE Transactions on Parallel and Distributed Systems, 10(2):115--135, February 1999.


CiteSeer.IST - Copyright Penn State and NEC