Alternate document: Data Preload For Superscalar And VLIW Processors (93) William Yu-Wei Chen, Jr.
Abstract: Data Prefetching for High-Performance Processors
by Tien-Fu Chen
Chairperson of Supervisory Committee: Professor Jean-Loup Baer, Department of Computer Science and Engineering
Recent technological advances are such that the gap between processor cycle times and memory cycle times is growing. Techniques to reduce or tolerate large memory latencies become essential for achieving high processor utilization. In this dissertation, we propose and evaluate data prefetching techniques that address the...
Context of citations to this paper:
...26 list scheduling algorithm and the memory is not partitioned. In hardware prefetching scheduling, we use the model presented in [4]. In this model, to take advantage of the data locality, the next block in the remote memory is also loaded whenever a block is loaded...
...to schedule the ALU operations, but the memory is not partitioned. In hardware prefetching scheduling, we use the model presented in [3]. In this method, to take advantage of the data locality, the next block in the higher level memory is also loaded whenever a block is...
Cited by:
Memory Latency Reduction via Data Prefetching and Data Forwarding .. - Poulsen (1994)
Next-Generation Memory Systems - Wang (2004)
Parallel Vector Access: A Technique for Improving Memory System.. - Mathew (2000)
Similar documents (at the sentence level):
9.5%: Reducing Memory Latency via Non-blocking and Prefetching Caches - Chen (1992)
Active bibliography (related documents):
0.5: Processor Management Policies for Multiprocessors - Yu (1994)
0.5: Designing Memory Consistency Models For Shared-Memory.. - Adve (1993)
0.4: Cohesion : An Efficient Distributed Shared Memory System.. - De Ls
Similar documents based on text:
0.4: Rotation Patterns In The - Large-Scale Solar Corona
0.3: "prefetching For Reducing Cache Misses" - Prepared By Ayse
0.1: Computer-Generated Pen-and-Ink Illustration - Winkenbach (1996)
Related documents from co-citation:
25: Tolerating latency through software-controlled prefetching in shared-memory mult.. - Mowry, Gupta - 1991
13: An architecture for software-controlled data prefetching - Klaiber, Levy - 1991
13: Design and evaluation of a compiler algorithm for prefetching - Mowry, Lam et al. - 1992
BibTeX entry:
Chen, T.-F. Data prefetching for high-performance processors. Ph.D. dissertation, Technical Report 93-07-01, Department of Computer Science and Engineering, University of Washington, July 1993. (23) http://citeseer.ist.psu.edu/chen93data.html
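@techreport{chen93data,
author = "Tien-Fu Chen",
title = "Data Prefetching for High-Performance Processors",
institution = "Department of Computer Science and Engineering, University of Washington",
number = "TR-93-07-01",
month = jul,
year = "1993",
url = "http://citeseer.ist.psu.edu/chen93data.html" }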
Citations (may not include all citations):
1575: Computer Architecture: A Quantitative Approach - Hennessy, Patterson - 1990
496: SPLASH: Stanford parallel applications for shared-memory - Singh, Weber et al. - 1992
474: A data locality optimizing algorithm - Wolf, Lam - 1991
443: Improving direct-mapped cache performance by the addition of.. - Jouppi - 1990
358: The Tera computer system - Alverson, Callahan et al. - 1990
249: Tolerating latency through software-controlled prefetching i.. - Mowry, Gupta - 1991
234: Cache memories - Smith
213: Weak ordering - a new definition - Adve, Hill - 1990
195: A new solution to coherence problems in multicache systems - Censier, Feautrier - 1978
185: Branch prediction strategies and branch target buffer design - Lee, Smith - 1984
165: Memory access buffering in multiprocessors - Dubois, Scheurich et al. - 1986
155: Cache coherence protocols: evaluation using a multiprocessor.. - Archibald, Baer - 1986
147: Alternative implementation of two-level adaptive branch pred.. - Yeh, Patt - 1992
137: Lockup-free instruction fetch/prefetch cache organization - Kroft - 1981
122: An effective on-chip preloading scheme to reduce data access.. - Baer, Chen - 1991
121: An architecture for software-controlled data prefetching - Klaiber, Levy - 1991
112: The priority-based coloring approach to register allocation - Chow, Hennessy - 1990
110: Improving the accuracy of dynamic branch prediction using br.. - Pan, So et al. - 1992
109: Multiprocessor simulation and tracing using Tango - Davis, Goldschmidt et al. - 1991
107: Software methods for improvement of cache performance on sup.. - Porterfield - 1989
96: Branch prediction for free - Ball, Larus - 1993
94: Stride directed prefetching in scalar processors - Fu, Patel - 1992
93: High-bandwidth data memory systems for superscalar processor - Sohi, Franklin - 1991
92: Performance evaluation of memory consistency models for shar.. - Gharachorloo, Gupta et al.
83: Compiler-directed data prefetching in multiprocessors with me.. - Gornish, Granston et al. - 1990
78: Data prefetching in multiprocessor vector cache memories - Fu, Patel - 1991
77: Efficient instruction scheduling for a pipelined architectur.. - Gibbons, Muchnick - 1986
66: Boosting beyond static scheduling in a superscalar processor - Smith, Lam et al. - 1990
55: Exploring the benefits of multiple hardware contexts in a mu.. - Weber, Gupta - 1989
53: Software support for speculative loads - Rogers, Li - 1992
50: Data access microarchitectures for superscalar processors wi.. - Chen, Mahlke et al. - 1991
48: and event ordering in multiprocessors - Dubois, Scheurich et al. - 1988
42: Hiding memory latency using dynamic scheduling in shared-mem.. - Gharachorloo, Gupta et al. - 1992
40: Limitation of cache prefetching on a bus-based multiprocesso.. - Tullsen, Eggers - 1993
39: Balanced scheduling: instruction scheduling when memory late.. - Kerns, Eggers - 1992
37: Prefetch unit for vector operations on scalar computers - Sklenar - 1992
35: Improved multithreading techniques for hiding communication .. - Boothe, Ranade - 1992
34: The performance impact of block sizes and fetch strategies - Przybylski - 1990
34: Branch target buffer design and optimization - Perleberg, Smith - 1989
33: Parallel MIMD Computation: the HEP Supercomputer and its app.. - Kowalik - 1985
33: Delayed consistency and its effects on miss rate of parallel.. - Dubois, Wang et al. - 1991
33: Circular scheduling: a new technique to perform software pip.. - Jain
32: A performance study of memory consistency models - Zucker, Baer - 1992
31: Data prefetching in shared memory multiprocessors - Lee, Yew et al.
18: Latency tolerance through multithreading in large-scale mult.. - Kurihara, Chaiken et al. - 1991
16: RP3 processor-memory element - Brantley, McAuliffe et al. - 1985
16: execute computer architectures - Smith
12: Tolerating data access latency with register preloading - Chen, Mahlke et al. - 1992
12: Architectural and implementation tradeoffs in the design of.. - Laudon, Gupta et al. - 1992
11: Write caches as an alternative to write buffers - Bray, Flynn - 1991
10: Lockup-free caches in high-performance multiprocessors - Scheurich, Dubois - 1991
9: A brief survey of papers on scheduling for pipelined process.. - Krishnamurthy - 1990
9: IBM RISC System/6000 processor architecture - Oehler, Groves - 1990
4: Multi-level cache hierarchies: Organizations - Baer, Wang - 1989
3: Relaxed Consistency and Synchronization in Parallel Processo.. - Zucker - 1992
3: single instruction stream / multiple instruction pipelining).. - Murakami, Irie et al. - 1989
3: A parallel execution evaluation testbed - Grunwald, Nutt et al. - 1991
3: Efficient support of concurrent threads in a hybrid dataflow.. - Hum, Gao - 1991
2: The Art of Computer System Performance Analysis - Jain
2: APRIL: A processor architecture for multithreading - Agarwal, Lim et al. - 1990
1: Design and evaluation of a compiler algorithm for prefetching - Mowry, Lam et al. - 1992
1: A multithreaded massively parallel architecture - Nikhil, Papadopoulos et al. - 1991
1: Two techniques to enhance the performance of memory consist.. - Gharachorloo, Gupta et al.
1: A timing based simulation study of prefetching in a second l.. - Smith, Archibald et al. - 1991
1: Computing per-processor summary side-effect information - Jeremiassen, Eggers - 1992
1: PROTEUS: A parallel-architecture simulator - Brewer, Dellarocas et al. - 1991
1: Multiprocessor cache design considerations - Lee, Yew et al.
1: A lockup-free multiprocessor cache design - Stenstrom, Dahlgren et al. - 1991
1: Software pipelining: An effective scheduling technique for V.. - Lam - 1988
Documents on the same site (http://fermivista.math.jussieu.fr/ftp/ftp.cs.washington.edu.html):
Mobisaic - Voelker, Bershad (1995)
Time-Space Tradeoffs for Undirected Graph Traversal - Beame, Borodin, Raghavan.. (1993)
Automatic SAT-Compilation of Planning Problems - Ernst, Millstein, Weld (1997)
CiteSeer.IST - Copyright Penn State and NEC