Abstract: Previous research has used program transformation to
introduce parallelism and to exploit data locality. Unfortunately,
these two objectives have usually been considered
independently. This work explores the tradeoffs
between effectively utilizing parallelism and memory
hierarchy on shared-memory multiprocessors. We
present a simple, but surprisingly accurate, memory
model to determine cache line reuse from both multiple
accesses to the same memory location and from
consecutive memory access....
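The abstract names two sources of cache line reuse but does not give the model itself. The following Python sketch is one hypothetical, simplified cost function in that spirit; it is not the paper's formulation. For a candidate innermost loop, a reference whose subscripts do not vary with that loop costs one cache line, because every iteration hits the same location. A reference that walks the contiguous array dimension with stride s costs about trip·s/cls lines, because consecutive accesses share a line. Any other reference costs one line per iteration. The function names (`lines_touched`, `loop_cost`, `memory_order`), the Fortran column-major layout, and the line size `cls` measured in array elements are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a reference is a tuple of subscripts, one per
# array dimension; each subscript maps loop index names to integer
# coefficients, e.g. A(I,K) -> ({"I": 1}, {"K": 1}). Constant offsets ignored.
Subscript = Dict[str, int]
Reference = Tuple[Subscript, ...]


def lines_touched(ref: Reference, loop: str, trip: int, cls: int) -> int:
    """Cache lines one reference touches while `loop` runs innermost.

    Assumes column-major layout (first dimension contiguous) and a
    cache line holding `cls` array elements.
    """
    coeffs = [sub.get(loop, 0) for sub in ref]
    if all(c == 0 for c in coeffs):
        # Loop-invariant: every iteration accesses the same location.
        return 1
    stride = abs(coeffs[0])
    if all(c == 0 for c in coeffs[1:]) and stride < cls:
        # Varies only in the contiguous dimension with a small stride,
        # so consecutive iterations fall on the same cache line.
        return math.ceil(trip * stride / cls)
    # No reuse assumed: each iteration touches a new line.
    return trip


def loop_cost(refs: List[Reference], trips: Dict[str, int], inner: str, cls: int) -> int:
    """Estimated lines touched by the whole nest with `inner` innermost."""
    per_inner = sum(lines_touched(r, inner, trips[inner], cls) for r in refs)
    outer = math.prod(t for name, t in trips.items() if name != inner)
    return per_inner * outer


def memory_order(refs: List[Reference], trips: Dict[str, int], cls: int) -> List[str]:
    """Loops ordered outermost to innermost by decreasing estimated cost."""
    return sorted(trips, key=lambda name: loop_cost(refs, trips, name, cls), reverse=True)


# Fortran matrix multiply: C(I,J) = C(I,J) + A(I,K) * B(K,J)
refs = [
    ({"I": 1}, {"J": 1}),  # C(I,J), read and written: one reference group
    ({"I": 1}, {"K": 1}),  # A(I,K)
    ({"K": 1}, {"J": 1}),  # B(K,J)
]
trips = {"I": 100, "J": 100, "K": 100}

print({name: loop_cost(refs, trips, name, cls=4) for name in trips})
print(memory_order(refs, trips, cls=4))
```

With 100 iterations per loop and 4 elements per line, the sketch prints costs of 510000 (I innermost), 2010000 (J) and 1260000 (K), so it ranks the nest as J, K, I from outermost to innermost. This keeps the unit-stride I loop innermost for C and A while B(K,J) stays invariant. Ranking loops by such a cost is one plausible way a model of this kind could guide loop permutation.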
Cited by:
Exploiting Cache Locality At Run-Time - Yan (1998)
Synthesizing Transformations for Locality Enhancement of.. - Ahmed, Mateev, Pingali (2001)
P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (1999)
Similar documents (at the sentence level):
78.9%: Automatic and Interactive Parallelization - McKinley (1994)
Active bibliography (related documents):
0.1: Improving Data Locality with Loop Transformations - McKinley, Carr, Tseng (1996)
0.1: Practical Dependence Testing - Goff, Kennedy, Tseng (1991)
0.1: Maximizing Loop Parallelism and Improving Data Locality via.. - Kennedy, McKinley (1994)
Similar documents based on text:
0.2: The ParaScope Parallel Programming Environment - Cooper (1993)
0.2: Interprocedural Transformations for Parallel Code Generation - Hall, Kennedy, McKinley (1991)
0.2: Analysis and Transformation in the ParaScope Editor - Kennedy, McKinley, Tseng (1991)
Related documents from co-citation:
57: A Data Locality Optimizing Algorithm - Wolf, Lam - 1991
33: The Cache Performance and Optimizations of Blocked Algorithms - Lam, Rothberg et al. - 1991
31: Strategies for cache and local memory management by global program transformation.. - Gannon, Jalby et al. - 1988
BibTeX entry:
K. Kennedy and K.S. McKinley. Optimizing for Parallelism and Data Locality. In International Conference on Supercomputing 1992, pages 323--334, Washington D.C., July 1992. http://citeseer.ist.psu.edu/kennedy92optimizing.html
@inproceedings{ kennedy92optimizing,
author = "K. Kennedy and K. S. McKinley",
title = "Optimizing for Parallelism and Data Locality",
booktitle = "Proceedings of the 1992 {ACM} International Conference on Supercomputing",
pages = "323--334",
address = "Washington, DC",
month = jul,
year = "1992",
url = "citeseer.ist.psu.edu/kennedy92optimizing.html" }
Citations (may not include all citations):
474: A data locality optimizing algorithm - Wolf, Lam - 1991
376: The cache performance and optimizations of blocked algorithm.. - Lam, Rothberg et al. - 1991
283: Optimizing Supercompilers for Supercomputers - Wolfe - 1989
258: Automatic translation of Fortran programs to vector form - Allen, Kennedy - 1987
245: An extended set of Fortran basic linear algebra subprograms - Dongarra, Du Croz et al. - 1988
216: Strategies for cache and local memory management by global p.. - Gannon, Jalby et al. - 1988
180: LINPACK User's Guide - Dongarra, Bunch et al. - 1979
178: Supernode partitioning - Irigoin, Triolet - 1988
171: Dependence graphs and compiler optimizations - Kuck, Kuhn et al. - 1981
168: The parallel execution of DO loops - Lamport - 1974
158: Improving register allocation for subscripted variables - Callahan, Carr et al. - 1990
146: Unimodular transformations of double loops - Banerjee - 1990
111: More iteration space tiling - Wolfe - 1989
110: The Livermore Fortran Kernels: A computer test of the numeri.. - McMahon - 1986
107: Software Methods for Improvement of Cache Performance - Porterfield - 1989
82: On estimating and enhancing cache effectiveness - Ferrante, Sarkar et al. - 1991
71: Supercomputer performance evaluation and the Perfect benchma.. - Cybenko, Kipp et al. - 1990
71: Data dependence and its application to parallel processing - Wolfe, Banerjee - 1987
59: The number of operations simultaneously executable in Fortra.. - Kuck, Muraoka et al. - 1972
54: Automatic decomposition of scientific programs for parallel .. - Allen, Callahan et al. - 1987
44: A Global Approach to Detection of Parallelism - Callahan - 1987
43: Automatic loop interchange - Allen, Kennedy - 1984
34: A theory of loop permutations - Banerjee - 1990
30: Improving the Performance of Virtual Memory Computers - Abu-Sufah - 1979
19: Automatic and Interactive Parallelization - McKinley - 1992
11: Maximizing parallelism via loop transformations - Wolf, Lam - 1990
10: Static performance estimation in a parallelizing compiler - Kennedy, McIntosh et al. - 1991
9: A static performance estimator in the Fortran D programming .. - Balasundaram, Fox et al. - 1992
2: Improving data locality - Kennedy, McKinley et al. - 1992
Documents on the same site (http://softlib.rice.edu/CRPC/softlib/TRs_online.html):
Experiences on Data-Parallel Programming - Clark, von Hanxleden, Kennedy (1994)
A Priori Estimates for Mixed Finite Element.. - Cowsar, Dupont, Wheeler
An Empirical Evaluation of Dependence Analysis in Parallel Program .. - Monk (1995)
CiteSeer.IST - Copyright Penn State and NEC