
Maximizing Memory Bandwidth for Streamed Computations (1995)  (15 citations)
Sally A. McKee




 

From:  virginia.edu/~sam3a/papers


Abstract: Processor speeds are increasing much faster than memory speeds, and thus memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly those whose inner loops linearly traverse streams of vector-like data. Because they execute sustained accesses, these streaming computations are limited more by bandwidth than by latency. Examples of these kinds of programs include vector (scientific) computations, multi-media compression and decompression, encryption,...

Cited by:
Bottlenecks in Multimedia Processing with SIMD Style.. - Talla, John, Burger (2003)
Parallel Vector Access: A Technique for Improving Memory System.. - Mathew (2000)
Hardware Support to Reduce Overhead in Fine-Grain Media Codes - Talla, John, Burger (2001)

Active bibliography (related documents):
2.2:   Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)
1.8:   Access Order and Memory-Conscious Cache Utilization - McKee, Wulf (1995)
1.7:   Evaluation of Dynamic Access Ordering Hardware - McKee, Oliver, Wulf, Wright.. (1995)

Similar documents based on text:
0.3:   A Framework for Effective Scheduling of Data-Parallel.. - Walker (2001)
0.3:   Caches As Filters: A Framework for the Analysis of Caching Systems - Weikle (2001)
0.2:   Experimental Implementation of Dynamic Access Ordering - McKee, Klenke, Schwab.. (1993)

Related documents from co-citation:
9:   Code Generation for Streaming: An Access/Execute Mechanism - Davidson, Benitez - 1991
8:   Access ordering and memory-conscious cache utilization - McKee - 1994
7:   Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Assoc.. - Jouppi - 1990

BibTeX entry:

S.A. McKee, "Maximizing Memory Bandwidth for Streamed Computations", Ph.D. Dissertation, University of Virginia, May, 1995. http://citeseer.ist.psu.edu/mckee95maximizing.html

@misc{ mckee95maximizing,
  author = "S. McKee",
  title = "Maximizing Memory Bandwidth for Streamed Computations",
  text = "S.A. McKee, Maximizing Memory Bandwidth for Streamed Computations, Ph.D.
    Dissertation, University of Virginia, May, 1995.",
  year = "1995",
  url = "citeseer.ist.psu.edu/mckee95maximizing.html" }





Documents on the same site (http://www.cs.virginia.edu/~sam3a/papers.html):
Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)
Access Order and Memory-Conscious Cache Utilization - McKee, Wulf (1995)
Compiling for Efficient Memory Utilization - McKee (1996)
