  An ILP Approach for Optimizing Cache Locality

by M. Kandemir, P. Banerjee, A. Choudhary, J. Ramanujam, E. Ayguade
ftp://ftp.ac.upc.es/pub/reports/CEPBA/1998/UPC-CEPBA-1998-28.ps.Z

Abstract:

The delivered performance on modern processors that employ deep memory hierarchies is closely related to the performance of the memory subsystem. Compiler optimizations aimed at improving cache locality are critical in realizing the performance potential of powerful processors. For scientific applications, several loop transformations have been shown to be useful in improving both temporal and spatial locality. Recently, there has been some work in the area of data layout optimizations, i.e., changing the memory layouts of multi-dimensional arrays from the language-defined default such as column-major storage in Fortran. The effect of such memory layout decisions is on the spatial locality characteristics of loop nests. While data layout transformations are not constrained by data dependences, they have no effect on temporal locality. On the other hand, loop transformations are not readily applicable to imperfect loop nests and are constrained by data dependences. More importantly, loop transformations affect the memory access patterns of all the arrays accessed in a loop nest, and as a result, the locality characteristics of some of the arrays may worsen. This paper presents a technique based on integer linear programming (ILP) that attempts to derive the best combination of loop and data layout transformations. Prior attempts to unify loop and data
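As a rough illustration of why loop and data layout choices interact, the following sketch enumerates every combination of loop order and per-array layout for a two-deep nest that reads A[i][j] and B[j][i], and counts the references that lack unit stride in the innermost loop. It is a brute-force enumeration under simplifying assumptions, not the ILP formulation of the paper; the names REFS, LOOP_ORDERS, LAYOUTS, has_unit_stride, cost, and best_combination are hypothetical, and the cost model considers spatial locality only.

```python
from itertools import product

# Each reference is (array name, subscript order): ("A", ("i", "j")) means A[i][j].
# This toy nest and its cost model are assumptions made for illustration.
REFS = [("A", ("i", "j")), ("B", ("j", "i"))]
LOOP_ORDERS = [("i", "j"), ("j", "i")]  # outermost loop first
LAYOUTS = ["row", "col"]                # row-major or column-major


def has_unit_stride(subscripts, layout, innermost):
    """True if the innermost loop index walks the fastest-varying dimension."""
    fastest = subscripts[-1] if layout == "row" else subscripts[0]
    return fastest == innermost


def cost(loop_order, layouts):
    """Count references without unit stride in the innermost loop."""
    innermost = loop_order[-1]
    return sum(
        not has_unit_stride(subs, layouts[name], innermost)
        for name, subs in REFS
    )


def best_combination():
    """Exhaustively search loop orders and per-array layouts for minimum cost."""
    arrays = sorted({name for name, _ in REFS})
    candidates = []
    for order in LOOP_ORDERS:
        for choice in product(LAYOUTS, repeat=len(arrays)):
            layouts = dict(zip(arrays, choice))
            candidates.append((cost(order, layouts), order, layouts))
    return min(candidates, key=lambda c: c[0])
```

With both arrays fixed to row-major, either loop order leaves one reference with poor spatial locality, so a loop transformation alone cannot fix this nest. Allowing the layout of B to change as well yields a combination in which both references have unit stride, which is the kind of trade-off a combined search is meant to capture.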

Citations

3148 Computer Architecture: A Quantitative Approach – Hennessy, Patterson - 1996
676 A data locality optimizing algorithm – Wolf, Lam - 1991
549 High-Performance Compilers for Parallel Computing – Wolfe
487 The cache performance and optimizations of blocked algorithms – Lam, Rothberg, et al. - 1991
253 Improving data locality with loop transformations – McKinley, Carr, et al. - 1996
251 Strategies for cache and local memory management by global program transformation – Gannon, Jalby, et al. - 1988
173 More iteration space tiling – Wolfe - 1989
168 Tile size selection using cache organization and data layout – Coleman, McKinley - 1995
159 Data and computation transformation for multiprocessors – Anderson, Amarasinghe, et al. - 1995
152 Unifying data and control transformations for distributed shared memory machines – Cierniak, Li - 1995
151 Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers – Gupta, Banerjee - 1992
140 The Livermore Fortran kernels: a computer test of the numerical performance range – McMahon - 1986
135 Software methods for improvement of cache performance on supercomputer applications – Porterfield - 1989
129 Data-centric multi-level blocking – Kodukula, Ahmed, et al. - 1997
109 Reducing false sharing on shared memory multiprocessors through compile time data transformations – Jeremiassen, Eggers - 1995
104 Data transformations for eliminating conflict misses – Rivera, Tseng - 1998
100 On estimating and enhancing cache effectiveness – Ferrante, Sarkar, et al. - 1991
87 Parafrase-2: An environment for parallelizing, partitioning, synchronizing, and scheduling programs on multiprocessors – Polychronopoulos, Girkar, et al. - 1989
82 Eliminating false sharing – Eggers, Jeremiassen - 1991
75 Automatic Data Layout for High Performance Fortran – Kremer - 1995
71 The Omega Library Interface Guide – Kelly, Maslov, et al. - 1996
62 A quantitative analysis of loop nest locality – McKinley, Temam - 1996
56 Compiling for NUMA Parallel Machines – Li - 1993
45 Improving the Performance of Virtual Memory Computers – Abu-Sufah - 1979
45 Integer and combinatorial optimization. Wiley-Interscience – Nemhauser, Wolsey - 1999
44 Improving locality using loop and data transformations in an integrated framework – Kandemir, Choudhary, et al. - 1998
39 Non-singular data transformations: Definition, validity, applications – O’Boyle, Knijnenburg - 1996
36 New CPU benchmark suites from SPEC – Dixit - 1992
35 Optimizing data locality by array restructuring – Leung, Zahorjan - 1995
34 A Novel Approach Towards Automatic Data Distribution – Garcia, Ayguade, et al. - 1995
34 Reduction of cache coherence overhead by compiler data layout and loop transformation – Ju, Dietz - 1992
33 A compiler algorithm for optimizing locality in loop nests – Kandemir, Ramanujam, et al. - 1997
26 A matrix-based approach to the global locality optimization problem – Kandemir, Choudhary, et al. - 1998
25 NWChem, A Computational Chemistry Package for Parallel Computers, Version 4.1 – High Performance Computational Chemistry Group
24 Automatic selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers – Palermo, Banerjee - 1995
22 Combining Optimization for Cache and Instruction-Level Parallelism – Carr - 1996
22 Hierarchical tiling: a methodology for high performance – Carter, Ferrante, et al. - 1996
21 Integrating loop and data transformations for global optimisation – O’Boyle, Knijnenburg - 1998
18 Compiling communication efficient programs for massively parallel machines – Li, Chen - 1991
17 Dynamic Data Distribution with Control Flow Analysis – Garcia, Ayguade, et al. - 1996
17 The combined effectiveness of unimodular transformations, tiling, and software prefetching – Saavedra, Mao, et al. - 1996
15 A hyperplane based approach for optimizing spatial locality in loop nests – Kandemir, Choudhary, et al. - 1998
14 Locality analysis for distributed shared-memory multiprocessors – Sarkar, Gao, et al. - 1996
12 Impact of cache interferences on usual numerical dense loop nests – Temam, Fricker, et al. - 1993
11 Data-distribution support on distributed-shared memory multiprocessors – Chandra, Chen, et al. - 1997
11 Automatic partitioning of data and computations on scalable shared memory multiprocessors – Tandri, Abdelrahman - 1997
9 Transformations for imperfectly nested loops – Kodukula, Pingali - 1996
9 The Perfect Club Benchmarks: Effective performance evaluation of supercomputers – The Perfect Club - 1989
3 A graph based framework to detect optimal memory layouts for improving data locality – Kandemir, Choudhary, et al. - 1999
2 lp_solve version 2.1, available from ftp://ftp.es.ele.tue.nl/pub/lp_solve – Berkelaar