
Memory Latency Reduction via Data Prefetching and Data Forwarding in Shared Memory Multiprocessors (1994)
David Kristian Poulsen




 
View or download:
uiuc.edu/reports/1377.ps
From:  uiuc.edu/reports/

Abstract: This dissertation considers the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency due to interprocessor communication in cache coherent, shared memory multiprocessors. The benefits of prefetching and forwarding are considered for large, numerical application codes with loop-level and vector parallelism. Data prefetching is applied to these applications using two different multiprocessor prefetching algorithms implemented within a parallelizing...
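The abstract contrasts two ways of reducing communication latency: with prefetching, the consumer issues a fetch for shared data ahead of its use; with forwarding, the producer sends the data to the consumer's cache after writing it. The toy Python model below only illustrates that distinction. It is not either of the dissertation's prefetching algorithms or its simulator. The function `run_loop` and its parameters are hypothetical, and the model assumes a fixed remote-miss latency, one shared read per loop iteration, no bandwidth or contention effects, and forwarding that completes before the consumer loop begins.

```python
def run_loop(n, work, latency, mode, distance=1):
    """Hypothetical toy model: total cycles for a consumer loop that reads one
    producer-written element per iteration. Ignores contention and bandwidth."""
    if mode not in ("none", "prefetch", "forward"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "prefetch" and distance < 1:
        raise ValueError("prefetch distance must be >= 1")
    ready = {}  # element index -> cycle at which it is present in the local cache
    if mode == "forward":
        # Assumption: the producer forwarded every element before the loop began.
        ready = {i: 0 for i in range(n)}
    elif mode == "prefetch":
        # Prologue: prefetch the first `distance` elements at cycle 0.
        for i in range(min(distance, n)):
            ready[i] = latency
    t = 0
    for i in range(n):
        if mode == "prefetch" and i + distance < n:
            # Issue a non-blocking prefetch for the element used `distance` iterations later.
            ready[i + distance] = t + latency
        t += work  # computation for this iteration
        # With no prefetch or forwarding, the read misses now and waits `latency` cycles.
        arrival = ready.get(i, t + latency)
        t = max(t, arrival)  # stall until the data has arrived
    return t


for mode, d in [("none", 1), ("prefetch", 1), ("prefetch", 3), ("forward", 1)]:
    print(mode, d, run_loop(4, 10, 30, mode, distance=d))
```

With `n=4`, `work=10` and `latency=30`, this model gives 160 cycles with no optimization, 70 with a prefetch distance of 1, 60 with a distance of 3 (only the first access still stalls), and 40 with forwarding. A distance of `ceil(latency / work)` is a common rule of thumb for hiding the full latency of software prefetches; forwarding avoids the consumer-side miss altogether, at the cost of the producer having to know who will consume the data.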

Similar documents (at the sentence level):
9.2%:   Data Prefetching And Data Forwarding In Shared Memory.. - Poulsen, Yew (1994)
6.8%:   Execution-Driven Tools for Parallel Simulation of Parallel.. - Poulsen, Yew (1993)

Active bibliography (related documents):
1.1:   Integrating Fine-Grained Message Passing In Cache Coherent.. - Yew, Poulsen (1996)
0.6:   Performance Evaluation for Parallel Systems: A Survey - Hu, Gorton (1997)
0.5:   Dynamically Reconfigurable Architecture for a Class of Real-Time.. - Ohkami (1992)

BibTeX entry:

@misc{ poulsen-memory,
  author = "David Kristian Poulsen",
  title = "Memory Latency Reduction via Data Prefetching and Data Forwarding in Shared
    Memory Multiprocessors",
  url = "citeseer.ist.psu.edu/poulsen94memory.html" }
Citations (may not include all citations):
866   Techniques and Tools - Aho, Sethi et al. - 1986
496   SPLASH: Stanford parallel applications for shared-memory - Singh, Weber et al. - 1991
480   The program dependence graph and its use in optimization - Ferrante, Ottenstein et al. - 1987
478   The Stanford Dash multiprocessor - Lenoski, Laudon et al. - 1992
474   A data locality optimizing algorithm - Wolf, Lam - 1991
358   The Tera computer system - Alverson, Callahan et al. - 1990
344   Design and evaluation of a compiler algorithm for prefetchin.. - Mowry, Lam et al. - 1992
299   Dependence Analysis for Supercomputing - Banerjee - 1988
277   Advanced compiler optimizations for supercomputers - Padua, Wolfe - 1986
249   Tolerating latency through software-controlled prefetching i.. - Mowry, Gupta - 1991
249   Tolerating latency through software-controlled data prefetch.. - Mowry - 1994
216   Strategies for cache and local memory management by global p.. - Gannon, Jalby et al. - 1988
166   The Wisconsin Wind Tunnel: virtual prototyping of parallel c.. - Reinhardt, Hill et al. - 1993
164   A practical algorithm for exact array dependence analysis - Pugh - 1992
150   Proteus: a high performance parallel architecture simulator - Brewer, Dellarocas et al. - 1991
149   Software prefetching - Callahan, Kennedy et al. - 1991
146   Unimodular transformations of double loops - Banerjee - 1990
142   Guided self-scheduling: a practical scheduling scheme for pa.. - Polychronopoulos, Kuck - 1987
122   An effective on-chip preloading scheme to reduce data access.. - Baer, Chen - 1991
121   An architecture for software-controlled data prefetching - Klaiber, Levy - 1991
113   The performance of multistage interconnection networks for m.. - Kruskal, Snir - 1983
112   Efficient synchronization primitives for large-scale cache-c.. - Goodman, Vernon et al. - 1989
109   Comparative evaluation of latency reducing and tolerating te.. - Gupta, Hennessy et al. - 1991
109   Multiprocessor simulation and tracing using Tango - Davis, Goldschmidt et al. - 1991
107   Software methods for improvement of cache performance on sup.. - Porterfield - 1989
99   Adaptive software cache management for distributed shared me.. - Bennett, Carter et al. - 1990
94   Stride directed prefetching in scalar processors - Fu, Patel et al. - 1992
94   Performance analysis of parallelizing compilers on the Perfe.. - Blume, Eigenmann - 1992
92   Cooperative shared memory: software and hardware for scalabl.. - Hill, Larus et al. - 1992
90   Reducing memory latency via non-blocking and prefetching cac.. - Chen, Baer - 1992
83   Compiler-directed data prefetching in multiprocessors with m.. - Gornish, Granston et al. - 1990
78   Data prefetching in multiprocessor vector cache memories - Fu, Patel - 1991
75   Measuring parallelism in computation-intensive scientific/en.. - Kumar - 1988
71   Improving locality and parallelism in nested loops - Wolf - 1992
67   The Rice Parallel Processing Testbed - Covington, Madala et al. - 1988
65   Eliminating false sharing - Eggers, Jeremiassen - 1991
60   Techniques for efficient inline tracing on a shared-memory m.. - Eggers, Keppel et al. - 1990
59   Analysis of cache invalidation patterns in multiprocessors - Weber, Gupta - 1989
58   Dynamic dependency analysis of ordinary programs - Austin, Sohi - 1992
57   Compiling Fortran D for MIMD distributed-memory machines - Hiranandani, Kennedy et al. - 1992
57   The detection and elimination of useless misses in multiproc.. - Dubois, Skeppstedt et al. - 1993
51   Reducing memory and traffic requirements for scalable direct.. - Gupta, Weber et al. - 1990
50   Data access microarchitectures for superscalar processors wi.. - Chen, Mahlke et al. - 1991
50   Parafrase-2: an environment for parallelizing, partitioning,.. - Polychronopoulos, Girkar et al. - 1989
45   Simulation of multiprocessors: accuracy and performance - Goldschmidt - 1993
44   Optimizing compilers for supercomputers - Wolfe - 1982
42   Lockup-free instruction fetch/prefetch cache organization - Kroft - 1981
42   Cache performance of blocked algorithms - Lam, Wolf - 1991
42   Program improvement by source-to-source transformation - Loveman - 1977
42   Hiding memory latency using dynamic scheduling in shared-mem.. - Gharachorloo, Gupta et al. - 1992
41   The impact of hierarchical memory systems on linear algebra .. - Gallivan, Jalby et al. - 1988
40   Limitations of cache prefetching on a bus-based multiprocess.. - Tullsen, Eggers - 1993
36   TRAPEDS: producing traces for multicomputers via execution-d.. - Stunkel, Fuchs - 1989
35   The impact of synchronization and granularity on parallel sy.. - Chen, Su et al. - 1990
35   An efficient data dependence analysis for parallelizing comp.. - Li, Yew et al. - 1990
32   The design and analysis of Dash: a scalable directory-based .. - Lenoski - 1992
31   Data prefetching in shared memory multiprocessors - Lee, Yew et al. - 1987
31   Execution-driven tools for parallel simulation of parallel a.. - Poulsen, Yew - 1993
31   Restructuring Fortran programs for Cedar - Eigenmann, Hoeflinger et al. - 1993
31   A pipelined, shared resource MIMD computer - Smith - 1978
30   Decoupled access/execute computer architectures - Smith - 1984
30   Dynamic instruction scheduling and the Astronautics ZS-1 - Smith - 1989
29   Shared data placement optimizations to reduce multiprocessor.. - Torrellas, Lam et al. - 1990
27   The accuracy of trace-driven simulations of multiprocessors - Goldschmidt, Hennessy - 1993
25   Data prefetching for high-performance processors - Chen - 1993
25   Structured memory access architecture - Pleszkun, Davidson - 1983
23   Kendall Square Research Corporation - Principles - 1991
22   Data prefetching and data forwarding in shared memory multip.. - Poulsen, Yew - 1994
20   Address tracing for parallel machines - Stunkel, Janssens et al. - 1991
18   Processor mapping techniques toward efficient data redistrib.. - Kalns, Ni - 1994
18   Hardware and software for functional and fine grain parallel.. - Beckmann - 1993
18   Architectural primitives for a scalable shared memory multip.. - Lee, Ramachandran - 1991
15   A critique of trace-driven simulation for shared-memory mult.. - Bitar - 1989
15   Notification and multicast networks for synchronization and .. - Andrews, Beckmann et al. - 1992
12   Synchronous parallel discrete-event simulation on shared-mem.. - Konas, Yew - 1992
12   Chief: a parallel simulation environment for parallel system.. - Bruner, Cheong et al. - 1990
10   An efficient architecture for loop based data preloading - Chen, Bringmann et al. - 1992
9   Scientific benchmark characterizations - Berry, Cybenko et al. - 1991
8   An empirical study of DOACROSS loops - Chen, Yew - 1991
8   MaxPar: an execution driven simulator for studying parallel .. - Chen - 1989
8   The effect of barrier synchronization and scheduling overhea.. - Beckmann, Polychronopoulos - 1989
7   Toward auto-scheduling compilers - Polychronopoulos - 1988
7   Improving the performance of virtual memory computers - Abu-Sufah - 1978
6   Measuring limits of parallelism and characterizing its vulne.. - Rauchwerger, Dubey et al. - 1993
6   An overview of interprocedural analysis techniques for high .. - Schouten - 1990
6   Parallelism in numeric and symbolic programs - Larus - 1990
6   A testbed for studying parallel programs and parallel execut.. - Grunwald, Nutt et al. - 1992
6   EPG source code instrumentation tools - user manual - Poulsen, Yew - 1994
5   Dynamic dependence analysis: a novel method for data depende.. - Petersen, Padua - 1992
5   Evaluation of programs and parallelizing compilers using dyn.. - Petersen - 1993
5   Loop-level parallelism in numeric and symbolic programs - Larus - 1993
5   Tango introduction and tutorial - Goldschmidt, Davis - 1991
5   Efficient doacross synchronization on distributed shared-mem.. - Su, Yew - 1991
5   The Horizon supercomputer system: architecture and software - Kuehn, Smith - 1988
4   Analysis of a Cedar implementation of TRFD - Andrews, Gallivan - 1993
4   An efficient hardware algorithm for exploiting multiple arit.. - Tomasulo - 1967
4   CARL: an architecture simulation language - Beckmann - 1990
3   A cache technique for synchronization variables in highly pa.. - Berke - 1988
3   The Cedar Fortran project - Padua, Hoeflinger et al. - 1992
3   Parsim user interface reference manual - Bruner - 1990
2   An empirical study on the effectiveness of branch bypassing - Chang - 1991
2   Issues on the design of parallelizing compilers - Leung - 1990
2   Chief: a simulation environment for studying parallel system.. - Konas, Poulsen et al. - 1994
2   Success and limitations in automatic parallelization of Perf.. - Blume - 1992
2   SIMPLE: an execution-driven multiprocessor simulator - Lin, Abraham - 1991
2   The IBM System/360 model 91: machine philosophy and instruct.. - Anderson, Sparacio et al. - 1967
1   Using compile-time analysis to adapt the cache coherence enf.. - Mounes-Toussi, Lilja - 1994
1   Reducing memory access delays in large-scale shared-memory m.. - Granston - 1992
1   The Perfect benchmark suite - some imperfections - Poulsen, Yew - 1994
1   Perfect Benchmarks documentation, suite 1 - Kipp - 1990
1   Parafrase-2 programmer's manual - Girkar, Haghighat et al. - 1991
1   EPG source code instrumentation tools - internal documentati.. - Poulsen, Yew - 1994
1   CA: Waterside Associates - The, Report - 1990
1   MA: Alliant Computer Systems Corporation - Series, Littleton - 1986
1   EPG-sim critical path simulation tools - tutorial - Poulsen, Yew - 1994
1   Effect of storage allocation/reclamation methods on parallel.. - Kumar - 1987

Documents on the same site (http://polaris.cs.uiuc.edu/reports/):
Implementation Of Run Time Techniques In The Polaris Fortran.. - Lawrence (1996)
Multiprocessor Cache Coherence: The Compiler-Directed Approach - Choi, Lim et al. (1996)
A Rational Lanczos Algorithm for Model Reduction II: Interpolation Point Selection

CiteSeer.IST - Copyright Penn State and NEC