Abstract: Memory latency is an important bottleneck in system performance that cannot be adequately solved by
hardware alone. Several promising software techniques have been shown to address this problem successfully
in specific situations. However, the generality of these software approaches has been limited because
current architectures do not provide a fine-grained, low-overhead mechanism to observe memory behavior
directly. To fill this need, we propose a new set of memory operations called informing ...
Context of citations to this paper:
...mechanism offers quicker control transfers than current cache miss counters. Our second method (evaluated more fully in an earlier study [HMMS95]) removes the explicit user state for the hit/miss information, but retains the explicit dispatch instruction. In this case, the...
...which can be collected using informing memory operations is the precise miss rate of all memory references. A previous study [10] has demonstrated that per reference miss rates can be captured with low runtime overheads (less than a 25 ) and tolerable data cache...
Cited by:
Compiler Orchestrated Prefetching via Speculation.. - Rabbah.. (2004)
Hybrid Compiler/Hardware Prefetching for Multiprocessors.. - Skeppstedt, Dubois (1997)
Predicting Data Cache Misses in Non-Numeric Applications.. - Chi-Keung Luk (1997)
Similar documents (at the sentence level):
15.2%: Informing Memory Operations: Memory Performance.. - Horowitz.. (1998)
Active bibliography (related documents):
0.7: Integrating Performance Monitoring and Communication in .. - Martonosi, Ofelt.. (1996)
0.4: Informing Memory Operations: Providing Memory Performance.. - Horowitz (1996)
0.2: Improving Balanced Scheduling with Compiler Optimizations that.. - Lo, Eggers (1995)
Similar documents based on text:
0.3: Tuning Memory Performance in Sequential and Parallel Programs - Martonosi, Gupta, Anderson (1995)
0.3: Resume - Ghosh
0.3: Memory Referencing Behavior in Compiler-Parallelized.. - Torrie, Martonosi..
Related documents from co-citation:
4: Design and evaluation of a compiler algorithm for prefetching - Mowry, Lam et al. - 1992
3: Informing memory operations: Providing memory performance feedback in modern pro.. - Horowitz, Martonosi et al. - 1996
2: Performance Tradeoffs with Non-Blocking Loads - Farkas, Jouppi - 1994
BibTeX entry:
M. Horowitz, M. Martonosi, T. C. Mowry, and M. D. Smith. Informing Loads: Enabling Software to Observe and React to Memory Behavior. Stanford CSL Technical Report CSL-TR-95-673. Stanford University. July 1995. http://citeseer.ist.psu.edu/article/horowitz95informing.html
@techreport{ horowitz95informing,
author = "Mark Horowitz and Margaret Martonosi and Todd C. Mowry and Michael D. Smith",
title = "Informing Loads: Enabling Software to Observe and React to Memory Behavior",
number = "CSL-TR-95-673",
pages = "23",
year = "1995",
url = "citeseer.ist.psu.edu/article/horowitz95informing.html" }
Citations (may not include all citations):
2441: Johns Hopkins University Press - Golub, Van Loan - 1989
1575: Computer Architecture: A Quantitative Approach - Hennessy, Patterson - 1990
496: SPLASH: Stanford Parallel Applications for Shared Memory - Singh, Weber et al. - 1991
474: A data locality optimizing algorithm - Wolf, Lam - 1991
443: Improving direct-mapped cache performance by the addition of.. - Jouppi - 1990
407: Trace scheduling: A technique for global microcode compactio.. - Fisher - 1981
376: The Cache Performance and Optimizations of Blocked Algorithm.. - Lam, Rothberg et al. - 1991
362: The Stanford FLASH Multiprocessor - Kuskin, Ofelt et al. - 1994
344: Design and evaluation of a compiler algorithm for prefetchin.. - Mowry, Lam et al. - 1992
249: Tolerating Latency Through Software-Controlled Data Prefetch.. - Mowry - 1994
166: The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel C.. - Reinhardt, Hill et al. - 1993
137: Lockup-free instruction fetch/prefetch cache organization - Kroft - 1981
109: Cache Profiling and the SPEC Benchmarks: A Case Study - Lebeck, Wood - 1994
107: Software Methods for Improvement of Cache Performance on Sup.. - Porterfield - 1989
87: The implementation of a coherent memory abstraction on a NUM.. - Cox, Fowler - 1989
80: Avoiding conflict misses dynamically in large direct-mapped .. - Bershad, Lee et al. - 1994
70: Simple but effective techniques for NUMA memory management - Bolosky, Fitzgerald et al. - 1989
61: Experimental comparison of memory management policies for NU.. - Jr, Carla - 1991
60: Scheduling and page migration for multiprocessor compute ser.. - Chandra, Devine et al. - 1994
50: Data access microarchitectures for superscalar processors wi.. - Chen, Mahlke et al. - 1991
50: Mtool: An Integrated System for Performance Debugging Shared.. - Goldberg, Hennessy - 1993
41: The impact of hierarchical memory systems on linear algebra .. - Gallivan, Jalby et al. - 1987
40: Interleaving: A Multithreading Technique Targeting Multiproc.. - Laudon, Gupta et al. - 1994
28: Technical Report CSL-TR - Smith, Pixie - 1991
27: New CPU Benchmark Suites from SPEC - Dixit - 1992
26: Performance-Measurement Tools in a Multiprocessor Environmen.. - Burkhart, Millen - 1989
25: dual-issue CMOS Microprocessor - Dobberpuhl, MHz - 1992
24: Support for Speculative Execution in High-Performance Proces.. - Smith - 1992
24: Integrating Scalar Optimizations and Parallelization - Tjiang, Wolf et al. - 1991
19: Two High-performance Workstations - Dutton, Eiref et al. - 1992
10: Assembly Language Programming - Paul - 1994
10: Page placement algorithms for large real-index caches - Kessler, Hill - 1992
9: Automatic program transformations for virtual memory compute.. - Abu-Sufah, Kuck et al. - 1979
8: Analyzing and Tuning Memory Performance in Sequential and Pa.. - Martonosi - 1993
8: The organization of matrices and matrix operations in a page.. - McKeller, Coffman - 1969
5: Architectural and Implementation Tradeoffs for Multiple-Cont.. - Laudon - 1994
4: fills out PowerPC product line - Gwennap - 1994
2: Technical report - DECChip, Preliminary et al. - 1992
1: Special Report: Memory - Comerford, Watson et al. - 1992
1: Instruction Set Reference Manual - PA-RISC - 1992
Documents on the same site (http://www.cs.cmu.edu/~tcm/Papers.html):
Predicting Data Cache Misses in Non-Numeric Applications.. - Mowry, Luk (1997)
Automatic Compiler-Inserted I/O Prefetching for.. - Mowry, Demke, Krieger (1996)
Cooperative Prefetching: Compiler and Hardware Support for.. - Luk, Mowry (1998)
CiteSeer.IST - Copyright Penn State and NEC