Abstract: Processor speeds are increasing much faster than memory speeds, and thus memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly those whose inner loops linearly traverse streams of vector-like data. Because they execute sustained accesses, these streaming computations are limited more by bandwidth than by latency. Examples of these kinds of programs include vector (scientific) computations, multi-media compression and decompression, encryption,...
Cited by:
Bottlenecks in Multimedia Processing with SIMD Style.. - Talla, John, Burger (2003)
Parallel Vector Access: A Technique for Improving Memory System.. - Mathew (2000)
Hardware Support to Reduce Overhead in Fine-Grain Media Codes - Talla, John, Burger (2001)
Active bibliography (related documents):
2.2: Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)
1.8: Access Order and Memory-Conscious Cache Utilization - McKee, Wulf (1995)
1.7: Evaluation of Dynamic Access Ordering Hardware - McKee, Oliver, Wulf, Wright.. (1995)
Similar documents based on text:
0.3: A Framework for Effective Scheduling of Data-Parallel.. - Walker (2001)
0.3: Caches As Filters: A Framework for the Analysis of Caching Systems - Weikle (2001)
0.2: Experimental Implementation of Dynamic Access Ordering - McKee, Klenke, Schwab.. (1993)
Related documents from co-citation:
9: Code Generation for Streaming: An Access/Execute Mechanism - Davidson, Benitez - 1991
8: Access ordering and memory-conscious cache utilization - McKee - 1994
7: Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Assoc.. - Jouppi - 1990
BibTeX entry:
S.A. McKee, "Maximizing Memory Bandwidth for Streamed Computations", Ph.D. Dissertation, University of Virginia, May, 1995. http://citeseer.ist.psu.edu/mckee95maximizing.html
@phdthesis{ mckee95maximizing,
author = "S. A. McKee",
title = "Maximizing Memory Bandwidth for Streamed Computations",
school = "University of Virginia",
month = may,
year = "1995",
url = "http://citeseer.ist.psu.edu/mckee95maximizing.html" }
Citations (may not include all citations):
2003: The Art of Computer Programming - Knuth - 1973
1575: Computer Architecture: A Quantitative Approach - Hennessy, Patterson - 1990
723: Memory Coherence in Shared Virtual Memory Systems - Li, Hudak - 1989
606: How to Make a Multiprocessor Computer that Correctly Execute.. - Lamport - 1979
468: Memory Consistency and Event Ordering in Scalable Shared-Memo.. - Gharachorloo, Lenoski et al. - 1990
422: Implementation and Performance of Munin - Carter, Bennett et al. - 1991
376: The Cache Performance and Optimizations of Blocked Algorithm.. - Lam, Rothberg et al. - 1991
357: The Directory-Based Cache Coherence Protocol for the DASH Mu.. - Lenoski, Laudon et al. - 1990
345: Basic Linear Algebra Subprograms for Fortran Usage - Lawson, Hanson et al. - 1979
344: Design and Evaluation of a Compiler Algorithm for Prefetchin.. - Mowry, Lam et al. - 1992
283: Optimizing Supercompilers for Supercomputers - Wolfe - 1989
248: Solving Linear Systems on Vector and Shared Memory Computers - Dongarra, Duff et al. - 1991
222: MIPS RISC Architecture - Kane, Heinrich - 1992
213: Weak Ordering --- A New Definition - Adve, Hill - 1990
195: A New Solution to Coherence Problems in Multicache Systems - Censier, Feautrier - 1978
165: Memory Access Buffering in Multiprocessors - Dubois, Scheurich et al. - 1986
156: An Evaluation of Directory Schemes for Cache Coherence - Agarwal, Simoni et al. - 1988
155: Cache Coherence Protocols: Evaluation Using a Multiprocessor.. - Archibald, Baer - 1986
149: Software Prefetching - Callahan, Kennedy et al. - 1991
142: High-Performance Computer Architecture - Stone - 1993
122: Firefly: A Multiprocessor Workstation - Thacker, Stewart - 1987
122: An Effective On-Chip Preloading Scheme to Reduce Data Access.. - Baer, Chen - 1991
121: An Architecture for Software-Controlled Data Prefetching - Klaiber, Levy - 1991
111: Using Cache Memory to Reduce Processor-Memory Traffic - Goodman - 1983
110: The Livermore Fortran Kernels: A Computer Test of the Numeri.. - McMahon - 1986
109: Comparative Evaluation of Latency Reducing and Tolerating Te.. - Gupta, Hennessy et al. - 1991
107: Software Methods for Improvement of Cache Performance on Sup.. - Porterfield - 1989
99: Adaptive Software Cache Management for Distributed Shared Me.. - Bennett, Carter et al. - 1990
98: A set of Level 3 Basic Linear Algebra Subprograms - Dongarra, DuCroz et al. - 1990
97: The Architecture of Pipelined Computers - Kogge - 1981
94: Stride Directed Prefetching in Scalar Processors - Fu, Patel et al. - 1992
90: Reducing Memory Latency via Non-blocking and Prefetching Cac.. - Chen, Baer - 1992
88: The Working Set Model for Program Behavior - Denning - 1968
87: The Implementation of a Coherent Memory Abstraction on a NUM.. - Cox, Fowler - 1989
83: Compiler-directed Data Prefetching in Multiprocessor with Me.. - Gornish, Granston et al. - 1990
82: To Copy or Not to Copy: A Compile-Time Technique for Assessi.. - Temam, Granston et al. - 1993
76: The Wisconsin Multicube: A New Large-Scale Cache-Coherent Mul.. - Goodman, Woest - 1988
70: Simple But Effective Techniques for NUMA Memory Management - Bolosky, Fitzgerald et al. - 1989
66: Implementing a Cache Consistency Protocol - Katz, Eggers et al. - 1985
65: Surpassing the TLB Performance of Superpages with Less Opera.. - Talluri, Hill - 1994
65: PIPE: A VLSI Decoupled Architecture - Goodman, Hsieh et al. - 1985
59: Analysis of Cache Invalidation Patterns in Multiprocessors - Weber, Gupta - 1989
49: Sequential Hardware Prefetching in Shared-Memory Multiproces.. - Dahlgren, Dubois et al. - 1995
49: Memory Hierarchy Management - Carr - 1989
48: Software-Extended Coherent Shared Memory: Performance and Co.. - Chaiken, Agarwal - 1994
46: Cache Coherence in Large-Scale Shared-Memory Multiprocessors.. - Lilja - 1993
43: Software-Controlled Caches in the VMP Multiprocessor - Cheriton, Slavenburg et al. - 1986
43: Software Prefetching and Caching for Translation Lookaside B.. - Bala, Kaashoek et al. - 1994
42: Comparison of Hardware and Software Cache Coherence Schemes - Adve, Adve et al. - 1991
40: Code Generation for Streaming: An Access/Execute Mechanism - Benitez, Davidson - 1991
38: Pseudo-Randomly Interleaved Memory - Rau - 1991
38: The Organization and Use of Parallel Memories - Budnik, Kuck - 1971
38: Automatic Management of Programmable Caches - Cytron, Karlovsky et al. - 1988
38: Digital Equipment Corporation - Handbook - 1992
37: Prefetch Unit for Vector Operation on Scalar Computers - Sklenar - 1992
37: Analysis of the Impact of Memory in Distributed Parallel Pro.. - Peris, Squillante et al. - 1994
36: An Empirical Evaluation of Two Memory-Efficient Directory Me.. - O'Krafka, Newton - 1990
36: PowerPC 601 RISC Microprocessor User's Manual - Inc - 1993
34: The Performance Impact of Block Sizes and Fetch Strategies - Przybylski - 1990
34: Access Ordering and Memory-Conscious Cache Utilization - McKee, Wulf - 1995
33: Vector Access Performance in Parallel Memories Using a Skewe.. - Harper, Jump - 1987
33: Evaluation of the WM Architecture - Wulf - 1992
33: Paging Tradeoffs in Distributed-Shared-Memory Multiprocessor.. - Burger, Hyder et al. - 1994
33: Synchronization, Coherence, and Event Ordering in Multiproce.. - Dubois, Scheurich et al. - 1988
32: Intel Corporation - XP, Book - 1991
32: A Performance Study of Memory Consistency Models - Zucker, Baer - 1992
31: Data Prefetching in Shared Memory Multiprocessors - Lee, Yew et al. - 1987
30: The Declining Effectiveness of Dynamic Caching for General-P.. - Burger, Goodman et al. - 1995
25: Medusa: An Experiment in Distributed Operating System Struct.. - Ousterhout, Scelza et al. - 1980
24: Guide to Parallel Programming on Sequent Computer Systems - Osterhaug - 1989
24: Vector Computer Memory Bank Contention - Bailey - 1987
23: Memory Access Coalescing: a Technique for Eliminating Redund.. - Davidson, Jinturkar - 1994
23: High-speed DRAMs - Quinnell - 1991
22: Access Ordering and Effective Memory Bandwidth - Moyer - 1993
22: A Cache Coherence Approach for Large Multiprocessor Systems - Archibald - 1988
21: Hierarchical Cache/Bus Architecture for Shared Memory Multip.. - Wilson - 1987
20: A Simulation Study of the CRAY X-MP Memory System - Cheung, Smith - 1986
19: Address Transformation to Increase Memory Performance - Harper - 1989
19: Performance of the iPSC/860 Node Architecture - Moyer - 1991
16: Mountain View - Overview, Inc - 1992
16: A Vectorizing Software Pipelining Compiler for LIW and Super.. - Meadows, Nakamoto et al.
14: the Floating Point Performance of the i860 Microprocessor - Lee - 1990
14: Interleaved Parallel Schemes: Improving Memory Throughput on.. - Seznec, Lenfant - 1992
14: Issues in Multiprogrammed Multiprocessor Scheduling - Leutenegger - 1990
13: Experimental Implementation of Dynamic Access Ordering - McKee, Klenke et al. - 1994
13: Sunder: A Programmable Hardware Prefetch Architecture for Nu.. - Chiueh - 1994
12: The Chinese Remainder Theorem and the Prime Memory System - Gao - 1993
12: Increasing Memory Bandwidth for Vector Computations - McKee, Moyer et al. - 1994
12: Decoupled Access/Execute Architectures - Smith - 1984
12: Tolerating Data Access Latency with Register Preloading - Chen, Mahlke et al. - 1992
11: The CONVEX C-1 64-bit Supercomputer - Wallach - 1985
11: Memory Bandwidth Optimizations for Wide-Bus Machines - Alexander, Bailey et al. - 1993
10: Evaluation of Memory System Extensions - Li, Petersen - 1991
10: Behavioral Characterization of Multiprocessor Memory Systems.. - Gallivan, Gannon et al. - 1989
10: An Efficient Architecture for Loop Based Data Preloading - Chen, Bringmann et al. - 1992
10: Scientific Computation: An Introduction with Parallel Comput.. - Golub, Ortega - 1993
9: The Influence of Memory Hierarchy on Algorithm Organization:.. - Gannon, Jalby - 1987
9: High Bandwidth Memory Systems for Superscalar Processors - Sohi, Franklin - 1991
9: Block, Multistride Vector, and FFT Accesses in Parallel Memo.. - Harper - 1991
9: Cascade Design Automation - Manual - 1993
8: Estimating the Performance Advantages of Relaxing Consistenc.. - Torrellas, Hennessy - 1990
8: The NAS860 Library User's Manual - Lee - 1993
8: Code Restructuring to Exploit Page Mode and Read-Ahead Featu.. - Palacharla, Kessler - 1995
8: Achieving High Performance on the i860 Microprocessor - Lee - 1991
7: Computer Organization and Architecture: Principles of Princi.. - Stallings - 1990
7: An Analytic Model of SMC Performance - McKee - 1993
7: Analytic Models of SMC Performance - McKee - 1994
7: A Comparison of Three Current Superscalar Designs - Laird - 1992
6: On Array Storage for Conflict-Free Memory Access for Paralle.. - Balakrishnan, Jain et al. - 1988
6: Breaking the Memory Bottleneck, Parts 1 & 2 - Loshin, Budge - 1992
6: A Compiler-assisted Scheme for Adaptive Cache Coherence Enfo.. - Nguyen, Mounes-Toussi et al. - 1994
5: A Novel Cache Design for Vector Processing - Yang, Yang - 1992
5: Special Report - Up - 1992
5: Bounds on Memory Bandwidth in Streamed Computations - McKee, Wulf et al. - 1995
4: Mentor Graphics Corporation - Quicksim, Manual - 1993
4: An Empirical Study of the Work Load Distribution Under Stati.. - Li, Nguyen - 1994
4: Uniprocessor SMC Performance on Vectors with Non-Unit Stride.. - McKee - 1993
3: Cascade Design Automation - Calculation, Document et al. - 1994
3: Hardware Support for Dynamic Access Ordering: Performance of.. - McKee - 1993
3: Dynamic Access Ordering for Symmetric Shared-Memory Multipro.. - McKee - 1994
3: Using Lookahead to Reduce Memory Bank Contention for Decoupl.. - Bird, Uhlig - 1991
3: Advanced Microprocessors - Tabak - 1991
3: The Dragon Processor - Atkinson, McCreight - 1987
3: Design of a Processor Bus Interface ASIC for the Stream Memo.. - McGee, Klenke et al. - 1994
2: Software Assistance for Directory-Based Caches - Li - 1993
2: Reducing Memory Contention in Shared Memory Multiprocessors - Harper - 1991
2: Logic Modeling Corporation - Reference - 1992
1: Optimizing Synthesized High-Speed ASICs - Landon - 1995
1: School of Engineering and Applied Science - Benitez, Allocation et al. - 1994
1: personal communication - Wolski - 1994
1: An Approach for Optimizing Synthesized High.. - Landon, Klenke et al. - 1995
1: Prefetching in Multiprocessor Vector Cache Memories - Fu, Patel - 1991
1: Iterative Methods for Sparse Matrices - Evans - 1985
1: Data-Specific Optimizations - Jinturkar - 1994
1: Data Structures, Algorithms and Software for Sparse Matrices - Duff - 1985
1: Improving the Performance of a Directory-Based Cache Coheren.. - Li, Mounes-Toussi et al. - 1994
1: A Conflict-Free Memory Design for Multiproc.. - Shing, Ni - 1991
1: Extended Data Out - Book - 1994
1: Effectiveness of Hardware-based Sequential and Stride Prefet.. - Dahlgren, Stenstrom - 1994
1: to be published August - Aluwihare - 1995
Documents on the same site (http://www.cs.virginia.edu/~sam3a/papers.html):
Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)
Access Order and Memory-Conscious Cache Utilization - McKee, Wulf (1995)
Compiling for Efficient Memory Utilization - McKee (1996)