
Comparative Evaluation of Latency Reducing and Tolerating Techniques (1991)  (109 citations)
Anoop Gupta, John Hennessy, Kourosh Gharachorloo, Todd Mowry, Wolf-Dietrich Weber
ACM Computer Architecture News, SIGARCH



View or download:
toronto.edu/~tcm/tcm_pape...isca91.ps.Z
cmu.edu/user/tcm/www/tcm...isca91.ps.gz

Abstract: Techniques that can cope with the large latency of memory accesses are essential for achieving high processor utilization in large-scale shared-memory multiprocessors. In this paper, we consider four architectural techniques that address the latency problem: (i) hardware coherent caches, (ii) relaxed memory consistency, (iii) software-controlled prefetching, and (iv) multiple-context support. While some studies of benefits of the individual techniques have been done, no study evaluates all of...

Cited by:
Memory Latency Reduction via Data Prefetching and Data Forwarding .. - Poulsen (1994)
Fast Accurate Simulation of Large Shared-Memory Multiprocessors Revised
High-Performance Frontends for Trace Processors - Jacobson (1999)

Similar documents (at the sentence level):
10.3%:   Tolerating Latency Through Software-Controlled Prefetching in.. - Mowry, Gupta (1991)
5.3%:   Performance Evaluation of Memory Consistency Models.. - Gharachorloo, Gupta.. (1991)
5.3%:   Memory Consistency Models for Shared-Memory Multiprocessors - Gharachorloo (1995)

Active bibliography (related documents):
0.4:   Exploiting Thread-Level Parallelism On . . . - Lo (1998)
0.3:   Architectural and Implementation Tradeoffs in the Design of.. - James Laudon (1992)
0.2:   Balanced Multithreading: Increasing Throughput via a.. - Tune, Kumar, Tullsen, .. (2004)

Similar documents based on text:
0.3:   A Comparative Evaluation of Software Techniques to Hide Memory.. - Lizy Kurian (1995)
0.3:   Optimizing Supercompilers for - Supercomputers The Mit
0.3:   Comparative Evaluation of Latency Tolerance Techniques for.. - Mowry, Chan, Lo (1998)

Related documents from co-citation:
46:   Tolerating latency through software-controlled prefetching in shared-memory mult.. - Mowry, Gupta - 1991
28:   SPLASH: Stanford parallel applications for shared memory - Singh, Weber et al. - 1992
24:   How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Progr.. - Lamport - 1979

BibTeX entry:

Anoop Gupta, John Hennessy, Kourosh Gharachorloo, Todd Mowry, and Wolf-Dietrich Weber. Comparative evaluation of latency reducing and tolerating techniques. In Proceedings of the 18th International Symposium on Computer Architecture, pages 254--263. IEEE, May 1991. http://citeseer.ist.psu.edu/gupta91comparative.html

@inproceedings{ gupta91comparative,
    author = "A. Gupta and J. Hennessy and K. Gharachorloo and T. Mowry and W.-D. Weber",
    title = "Comparative Evaluation of Latency Reducing and Tolerating Techniques",
    booktitle = "Proceedings of the 18th International Symposium on Computer Architecture ({ISCA})",
    journal = "ACM Computer Architecture News, SIGARCH",
    volume = "19",
    number = "3",
    publisher = "ACM Press",
    address = "New York, NY",
    pages = "254--263",
    year = "1991",
    url = "citeseer.ist.psu.edu/gupta91comparative.html" }
Citations (may not include all citations):
468   Memory consistency and event ordering in scalable shared-mem.. - Gharachorloo, Lenoski et al. - 1990
357   The directory-based cache coherence protocol for the DASH mu.. - Lenoski, Laudon et al. - 1990
249   Tolerating latency through software-controlled prefetching in.. - Mowry, Gupta - 1991
213   Weak ordering - A new definition - Adve, Hill - 1990
212   April: A processor architecture for multiprocessing - Agarwal, Lim et al. - 1990
165   Memory access buffering in multiprocessors - Dubois, Scheurich et al. - 1986
157   Architecture and applications of the HEP multiprocessor comp.. - Smith - 1981
155   Cache coherence protocols: Evaluation using a multiprocessor.. - Archibald, Baer - 1986
107   Software Methods for Improvement of Cache Performance on Sup.. - Porterfield - 1989
92   Performance evaluation of memory consistency models for shar.. - Gharachorloo, Gupta et al. - 1991
90   The IBM research parallel processor prototype - Pfister, Brantley et al. - 1985
83   Compiler-directed data prefetching in multiprocessors with me.. - Gornish, Granston et al. - 1990
72   MASA: A multithreaded processor architecture for parallel sy.. - Halstead, Fujita - 1988
55   Exploring the benefits of multiple hardware contexts in a mu.. - Weber, Gupta - 1989
48   Portable Programs for Parallel Processors - Lusk, Overbeek - 1987
48   Evaluating the performance of four snooping cache coherency .. - Eggers, Katz - 1989
43   Performance tradeoffs in multithreaded processors - Agarwal - 1989
42   Lockup-free instruction fetch/prefetch cache organization - Kroft - 1981
31   Data prefetching in shared memory multiprocessors - Lee, Yew et al. - 1987
17   The Effectiveness of Caches and Data Prefetch Buffers in Lar.. - Lee - 1987
10   Technical Report no - Goodman, sequential - 1989
8   Technical Report CSL-TR - Goldschmidt, Davis et al. - 1990
7   The Butterfly parallel processor - Schmidt - 1987
7   and improvement of the cache behavior of shared data in cach.. - Torrellas, Lam et al. - 1990
7   Vectorization of a particle simulation method for hypersonic.. - McDonald, Baganoff - 1988
6   Parallel distributed-time logic simulation - Soule, Gupta - 1989
6   Hierarchical cache/bus architecture for shared memory multiprocess.. - Wilson - 1987
6   Toward a dataflow/von Neumann hybrid architecture - Iannucci - 1988
4   Analysis of multithreaded architectures for parallel computi.. - Saavedra-Barrera, Culler et al. - 1990
1   How to make a multiprocessor computer that correctly execute.. - Lamport - 1979





Documents on the same site (http://www.eecg.toronto.edu/~tcm/Papers.html):
Informing Loads: Enabling Software To Observe And.. - Horowitz.. (1995)
Compiler-Based Prefetching for Recursive Data Structures - Luk (1996)
Informing Memory Operations: Providing Memory Performance.. - Horowitz (1996)


CiteSeer.IST - Copyright Penn State and NEC