Data Prefetch Mechanisms for Accelerating Symbolic and Numeric Computation (1996)

by Sharad Mehrotra
ftp://ftp.csrd.uiuc.edu/pub/CSRD_Reports/reports/1488.ps.gz

Abstract:

Despite rapid increases in CPU performance, the primary obstacles to achieving higher performance in contemporary processor organizations remain control and data hazards. Primary data cache misses are responsible for the majority of the data hazards. With CPU primary cache sizes limited by clock cycle time constraints, the performance of future CPUs is effectively going to be limited by the number of primary data cache misses whose penalty cannot be masked. To address this problem, this dissertation takes a detailed look at memory access patterns in complex, real-world programs. A simple memory reference pattern classification is introduced, which is applicable to a broad range of computations, including pointer-intensive and numeric codes. To exploit the new classification, a data prefetch device called the Indirect Reference Buffer (IRB) is proposed. The IRB extends data prefetching to indirect memory address sequences, while also handling dense scientific codes. It is distinguished from previous designs in its seamless integration of linear and indirect address prefetching. The behavior of the IRB on a suite of programs drawn from
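
The abstract distinguishes linear from indirect memory address sequences but does not define its classification here. As a rough illustration only, the Python sketch below models the two kinds of sequence in the generic sense of the terms: a dense loop over `a[i]` and a gather of the form `a[idx[i]]`. The function names, the base address, and the index values are hypothetical, and nothing here models the IRB itself.

```python
def linear_addresses(base, stride, count):
    """Addresses touched by a dense loop such as `for i in range(count): a[i]`."""
    return [base + i * stride for i in range(count)]


def indirect_addresses(base, index_values, elem_size):
    """Addresses touched by `a[idx[i]]`: each depends on a loaded index value."""
    return [base + k * elem_size for k in index_values]


def has_constant_stride(addresses):
    """True when consecutive addresses differ by one fixed amount."""
    deltas = {b - a for a, b in zip(addresses, addresses[1:])}
    return len(deltas) <= 1


# Hypothetical example data: an 8-byte element array starting at 0x1000.
dense = linear_addresses(base=0x1000, stride=8, count=5)
sparse = indirect_addresses(base=0x1000, index_values=[7, 2, 9, 4, 0], elem_size=8)

print([hex(a) for a in dense])
print([hex(a) for a in sparse])
print(has_constant_stride(dense), has_constant_stride(sparse))
```

The dense sequence advances by a fixed stride, so its next address can be predicted from the last two. The gathered sequence has no fixed stride, since each address is only known once the corresponding index value has been read. That is why a purely stride-based scheme cannot follow it.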
