
Program Optimization Based on Compile-Time Cache Performance Prediction (1996) (12 citations)
Parallel Processing Letters, World Scientific Publishing Company; Wesley K....




 
View or download:
rpi.edu/pub/szymansk/cacheopt.ps

From: rpi.edu

Abstract: We present a novel compile-time method for determining the cache performance of the loop nests in a program. Cache hit rates are produced by applying the reference string, determined during compilation, to an architecturally parameterized cache simulator. We also describe a heuristic that uses this method for compile-time optimization of loop ranges in iteration-space blocking. Results of these loop optimizations are presented for different parallel program benchmarks and various ...
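The abstract's central step, applying a reference string to an architecturally parameterized cache simulator, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' simulator: LRU replacement, row-major arrays of 8-byte elements placed contiguously, a naive matrix-multiply loop nest as the example program, and the names CacheSim and matmul_refs are all hypothetical choices made here.

```python
from collections import OrderedDict


class CacheSim:
    """Set-associative cache simulator parameterized by total size, line size
    and associativity. LRU replacement is an assumption of this sketch."""

    def __init__(self, size_bytes, line_bytes, assoc):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (line_bytes * assoc)
        # One OrderedDict per set: keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.accesses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        self.accesses += 1
        if tag in ways:
            self.hits += 1
            ways.move_to_end(tag)          # mark as most recently used
        else:
            if len(ways) >= self.assoc:
                ways.popitem(last=False)   # evict the least recently used line
            ways[tag] = True

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


def matmul_refs(n, elem=8):
    """Hypothetical reference string for C[i][j] += A[i][k] * B[k][j] on n x n
    row-major arrays of `elem`-byte elements, with A, B, C laid out contiguously."""
    size = n * n * elem
    base_a, base_b, base_c = 0, size, 2 * size
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield base_a + (i * n + k) * elem   # read A[i][k]
                yield base_b + (k * n + j) * elem   # read B[k][j]
                yield base_c + (i * n + j) * elem   # update C[i][j]


if __name__ == "__main__":
    cache = CacheSim(size_bytes=8192, line_bytes=32, assoc=2)
    for addr in matmul_refs(32):
        cache.access(addr)
    print(f"simulated hit rate: {cache.hit_rate():.3f}")
```

In this sketch, changing size_bytes, line_bytes or assoc re-targets the same reference string to a different cache organization without touching the loop-nest model, which is one way to read "architecturally parameterized".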

Context of citations to this paper:

...An execution or benchmark method can be used, or a cache performance estimation technique can be employed. We describe a method in [7] that can be used to determine quickly the optimal range values by partial simulated execution of the application code on an...
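Choosing loop range (block-size) values by partial simulated execution can be sketched as a search that simulates only a prefix of each candidate's reference string and keeps the candidate with the best hit rate. Everything here is an assumption for illustration, not a reproduction of the method cited as [7]: the direct-mapped cache model, the j/k-blocked matrix-multiply loop nest, the reference budget, and the hypothetical helper names.

```python
from itertools import islice


def direct_mapped_hit_rate(refs, cache_bytes=8192, line_bytes=32):
    """Hit rate of a direct-mapped cache (an assumed model) over an address stream."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines   # memory line currently held in each slot
    hits = total = 0
    for addr in refs:
        line = addr // line_bytes
        slot = line % num_lines
        if resident[slot] == line:
            hits += 1
        else:
            resident[slot] = line
        total += 1
    return hits / total if total else 0.0


def blocked_matmul_refs(n, b, elem=8):
    """Hypothetical reference string of C += A * B with the j and k loops
    blocked by b, on n x n row-major arrays laid out contiguously."""
    size = n * n * elem
    base_a, base_b, base_c = 0, size, 2 * size
    for jj in range(0, n, b):
        for kk in range(0, n, b):
            for i in range(n):
                for k in range(kk, min(kk + b, n)):
                    for j in range(jj, min(jj + b, n)):
                        yield base_a + (i * n + k) * elem   # A[i][k]
                        yield base_b + (k * n + j) * elem   # B[k][j]
                        yield base_c + (i * n + j) * elem   # C[i][j]


def choose_block_size(n, candidates, budget=100_000, **cache_params):
    """Return (block size, hit rate) for the candidate whose truncated
    simulation (first `budget` references only) has the highest hit rate."""
    best = None
    for b in candidates:
        prefix = islice(blocked_matmul_refs(n, b), budget)
        rate = direct_mapped_hit_rate(prefix, **cache_params)
        if best is None or rate > best[1]:
            best = (b, rate)
    return best


if __name__ == "__main__":
    block, rate = choose_block_size(64, [8, 16, 32, 64])
    print(f"chosen block size {block}, partial-simulation hit rate {rate:.3f}")
```

Truncating each reference string with islice bounds the cost of evaluating a candidate; the trade-off is that a prefix may not be representative of the whole loop nest. A direct-mapped model is used here because it exposes conflict misses, which make the hit rate sensitive to the block size.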

Cited by:
P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (1999)
P³T+: A Performance Estimator for Distributed and.. - Pozgaj, Fahringer (2000)
Fast and Accurate Method for Determining a Lower Bound .. - Fursin, O'Boyle.. (2004)

Similar documents (at the sentence level):
41.6%: Program Optimization Based on Compile-Time Cache Performance.. - Wesley Kaplow (1996)
6.8%: COP - Cache Optimization Tools for Scientific Computing - Szymanski (1997)

Active bibliography (related documents):
0.1: Tiling for Parallel Execution - Optimizing Node Cache.. - Kaplow, Szymanski (1996)
0.1: Impact of Memory Hierarchy on Program Partitioning and.. - Wesley Kaplow William (1995)
0.0: On Estimating the Useful Work Distribution of Parallel Programs.. - Fahringer (1996)

Similar documents based on text:
0.2: Run-Time Reference Clustering for Cache Performance.. - Kaplow, Szymanski..
0.1: Languages, Compilers And Run-Time Systems For Scalable.. - Szymanski, (Eds.)
0.1: Network Management and Control Using Collaborative .. - Ye, Kalyanaraman, .. (2001)

Related documents from co-citation:
8: The cache performance and Optimizations of Blocked Algorithms - Lam, Rothberg et al. - 1991
7: The network weather service: A distributed resource performance forecasting serv.. - Wolski, Spring et al. - 1998
7: VFC: The Vienna Fortran Compiler - Benkner - 1998

BibTeX entry:

W. K. Kaplow and B. K. Szymanski. Program optimization based on compile-time cache performance prediction. Parallel Processing Letters, 6(1):173--184, 1996. http://citeseer.ist.psu.edu/article/kaplow96program.html

@article{ kaplow96program,
    author = "Wesley K. Kaplow and Boleslaw K. Szymanski",
    title = "Program optimization based on compile-time cache performance prediction",
    journal = "Parallel Processing Letters",
    volume = "6",
    number = "1",
    pages = "173--184",
    year = "1996",
    url = "citeseer.ist.psu.edu/article/kaplow96program.html" }
Citations (may not include all citations):
376   The Cache Performance and Optimizations of Blocked Algorithm.. - Lam, Rothberg et al. - 1991
216   Strategies for Cache and Local Memory Management by Global P.. - Gannon, Jalby et al. - 1988
137   Compiler Optimizations for Improving Data Locality - Carr, McKinley et al. - 1994
109   Cache Profiling and the SPEC Benchmarks: A Case Study - Lebeck, Wood - 1994
68   the Granularity and Clustering of Directed Acyclic Task Grap.. - Gerasoulis, Yang - 1993
62   Computer Organization and Design: The Hardware/Software Int.. - Patterson, Hennessy - 1993
58   MemSpy: Analyzing Memory System Bottlenecks in Programs - Gupta, Martonosi et al. - 1992
30   Performance Debugging Shared-Memory Multiprocessor Programs.. - Goldberg, Hennessy
11   Automatic Cache Performance Prediction in a Parallelizing Co.. - Fahringer - 1993
2   Impact of Memory Hierarchy on Program Partitioning and Sched.. - Kaplow, Maniatty et al. - 1995
1   Personal Communication - Decyk
1   Integrating Data and Task Parallelism in Scientific Programs - Tannenbaum, Deelman et al.





Documents on the same site (http://fermivista.math.jussieu.fr/ftp/ftp.cs.rpi.edu.html):
ILP-Based Scheduling with Time and Resource Constraints in.. - Chaudhuri, Walker (1994)
Rationale for Adding Hash Tables to the C++ Standard Template.. - Musser (1995)
Adaptive Local Refinement with Octree.. - Flaherty, Loy.. (1997)


CiteSeer.IST - Copyright Penn State and NEC