Abstract: We present a novel, compile-time method for determining the cache performance of
the loop nests in a program. The cache hit-rates are produced by applying the reference
string, determined during compilation, to an architecturally parameterized cache simulator.
We also describe a heuristic that uses this method for compile-time optimization
of loop ranges in iteration-space blocking. The results of the loop program optimizations
are presented for different parallel program benchmarks and various ...
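As an illustration of the idea in the abstract, replaying a loop nest's reference string through a cache model whose geometry is a parameter, here is a minimal Python sketch. It is not the authors' simulator: the function names `hit_rate` and `loop_nest_refs`, the LRU replacement policy, and the default cache geometry are all assumptions made for this example.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_bytes=1024, line_bytes=32, ways=1):
    """Replay byte addresses through an LRU set-associative cache (assumed model)."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = total = 0
    for addr in addresses:
        line = addr // line_bytes          # cache line holding this address
        cache_set = sets[line % num_sets]  # set the line maps to
        total += 1
        if line in cache_set:
            hits += 1
            cache_set.move_to_end(line)    # mark as most recently used
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used line
            cache_set[line] = True
    return hits / total if total else 0.0


def loop_nest_refs(n, elem_bytes=8, row_major=True):
    """Reference string of a doubly nested loop reading an n x n row-major array.

    row_major=True reads a[i][j]; row_major=False reads a[j][i] (hypothetical example).
    """
    for i in range(n):
        for j in range(n):
            r, c = (i, j) if row_major else (j, i)
            yield (r * n + c) * elem_bytes


row_rate = hit_rate(loop_nest_refs(64))
col_rate = hit_rate(loop_nest_refs(64, row_major=False))
print(row_rate, col_rate)
```

With this small direct-mapped configuration, the column-order traversal keeps evicting the lines it will need again, so comparing the two rates shows the kind of signal a loop-range heuristic could act on.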
Context of citations to this paper:
...An execution or benchmark method can be used, or a cache performance estimation technique can be employed. We describe a method in [7] that can be used to determine quickly the optimal range values by partial simulated execution of the application code on an...
Cited by:
P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (1999)
P³T+: A Performance Estimator for Distributed and.. - Pozgaj, Fahringer (2000)
Fast and Accurate Method for Determining a Lower Bound .. - Fursin, O'Boyle.. (2004)
Similar documents (at the sentence level):
41.6%: Program Optimization Based on Compile-Time Cache Performance.. - Wesley Kaplow (1996)
6.8%: COP - Cache Optimization Tools for Scientific Computing - Szymanski (1997)
Active bibliography (related documents):
0.1: Tiling for Parallel Execution - Optimizing Node Cache.. - Kaplow, Szymanski (1996)
0.1: Impact of Memory Hierarchy on Program Partitioning and.. - Wesley Kaplow William (1995)
0.0: On Estimating the Useful Work Distribution of Parallel Programs.. - Fahringer (1996)
Similar documents based on text:
0.2: Run-Time Reference Clustering for Cache Performance.. - Kaplow, Szymanski..
0.1: Languages, Compilers And Run-Time Systems For Scalable.. - Szymanski, (Eds.)
0.1: Network Management and Control Using Collaborative .. - Ye, Kalyanaraman, .. (2001)
Related documents from co-citation:
8: The Cache Performance and Optimizations of Blocked Algorithms - Lam, Rothberg et al. - 1991
7: The network weather service: A distributed resource performance forecasting serv.. - Wolski, Spring et al. - 1998
7: VFC: The Vienna Fortran Compiler - Benkner - 1998
BibTeX entry:
W. K. Kaplow and B. K. Szymanski. Program optimization based on compile-time cache performance prediction. Parallel Processing Letters, 6(1):173--184, 1996. http://citeseer.ist.psu.edu/article/kaplow96program.html
@article{ kaplow96program,
author = "Wesley K. Kaplow and Boleslaw K. Szymanski",
title = "Program optimization based on compile-time cache performance prediction",
journal = "Parallel Processing Letters",
volume = "6",
number = "1",
pages = "173--184",
year = "1996",
url = "citeseer.ist.psu.edu/article/kaplow96program.html" }
Citations (may not include all citations):
376: The Cache Performance and Optimizations of Blocked Algorithm.. - Lam, Rothberg et al. - 1991
216: Strategies for Cache and Local Memory Management by Global P.. - Gannon, Jalby et al. - 1988
137: Compiler Optimizations for Improving Data Locality - Carr, McKinley et al. - 1994
109: Cache Profiling and the SPEC Benchmarks: A Case Study - Lebeck, Wood - 1994
68: the Granularity and Clustering of Directed Acyclic Task Grap.. - Gerasoulis, Yang - 1993
62: Computer Organization and Design: The Hardware/Software Int.. - Patterson, Hennessy - 1993
58: MemSpy: Analyzing Memory System Bottlenecks in Programs - Gupta, Martonosi et al. - 1992
30: Performance Debugging Shared-Memory Multiprocessor Programs.. - Goldberg, Hennessy
11: Automatic Cache Performance Prediction in a Parallelizing Co.. - Fahringer - 1993
2: Impact of Memory Hierarchy on Program Partitioning and Sched.. - Kaplow, Maniatty et al. - 1995
1: Personal Communication - Decyk
1: Integrating Data and Task Parallelism in Scientific Programs - Tannenbaum, Deelman et al.
Documents on the same site (http://fermivista.math.jussieu.fr/ftp/ftp.cs.rpi.edu.html):
ILP-Based Scheduling with Time and Resource Constraints in.. - Chaudhuri, Walker (1994)
Rationale for Adding Hash Tables to the C++ Standard Template.. - Musser (1995)
Adaptive Local Refinement with Octree.. - Flaherty, Loy.. (1997)