Results 1 - 10 of 231
Microarchitecture optimizations for exploiting memory-level parallelism
- In ISCA-31, 2004
"... The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a pro ..."
Cited by 97 (3 self)
in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction and better value prediction in addition to runahead execution.
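To make the notion of memory-level parallelism concrete, a toy C sketch (not taken from the paper) is given below: a pointer chase serializes its misses, while independent loads let an out-of-order or runahead core keep several misses in flight at once.

/* Toy illustration of memory-level parallelism (MLP): how many long-latency
 * misses the core can keep outstanding at the same time. */
#include <stddef.h>

struct node { struct node *next; long payload; };

/* Low MLP: each load depends on the previous one, so misses serialize. */
long chase(const struct node *p, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n && p; i++) {
        sum += p->payload;
        p = p->next;              /* next address unknown until this miss returns */
    }
    return sum;
}

/* Higher MLP: the loads are independent, so many misses can overlap. */
long gather(const long *data, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];      /* all idx[i] are known up front */
    return sum;
}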
Prefetching using Markov predictors
- In ISCA, 1997
"... Prefetching is one approach to reducing the latency of memory op-erations in modem computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an interface between the on-chip and off-chip cache, and can be added to existing com-puter designs. The Markov prefetcher is ..."
Cited by 308 (1 self)
is distinguished by prefetching multiple reference predictions from the memory subsystem, and then prioritizing the delivery of those references to the processor. This design results in a prefetching system that provides good coverage, is accurate and produces timely results that can be effectively used
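A hedged sketch of the Markov-prediction idea described above: a table maps a miss address to the miss addresses that have followed it, and on each miss it is consulted to produce prioritized prefetch candidates. The table size, indexing, and four-way successor list below are illustrative assumptions, not the paper's parameters.

#include <stdint.h>
#include <string.h>

#define ENTRIES 1024
#define WAYS    4

typedef struct {
    uint64_t tag;            /* miss address this entry describes */
    uint64_t next[WAYS];     /* successor misses, next[0] = most recent */
} markov_entry;

static markov_entry table[ENTRIES];
static uint64_t prev_miss;

static markov_entry *lookup(uint64_t addr) {
    return &table[(addr >> 6) % ENTRIES];   /* index by cache-block address */
}

/* Called on every cache miss; fills 'pred' with prioritized prefetch
 * candidates and returns how many were produced. */
int markov_miss(uint64_t addr, uint64_t pred[WAYS]) {
    /* Learn: the previous miss was followed by this one. */
    markov_entry *e = lookup(prev_miss);
    if (e->tag != prev_miss) {
        e->tag = prev_miss;
        memset(e->next, 0, sizeof e->next);
    }
    memmove(&e->next[1], &e->next[0], (WAYS - 1) * sizeof e->next[0]);
    e->next[0] = addr;
    prev_miss = addr;

    /* Predict: return the recorded successors of the current miss,
     * most recent (highest priority) first. */
    markov_entry *p = lookup(addr);
    int n = 0;
    if (p->tag == addr)
        for (int i = 0; i < WAYS && p->next[i] != 0; i++)
            pred[n++] = p->next[i];
    return n;
}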
Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors
- Journal of Parallel and Distributed Computing, 1991
"... The large latency of memory accesses is a major obstacle in obtaining high processor utilization in large scale shared-memory multiprocessors. Although the provision of coherent caches in many recent machines has alleviated the problem somewhat, cache misses still occur frequently enough that they s ..."
Cited by 302 (18 self)
that they significantly lower performance. In this paper we evaluate the effectiveness of non-binding software-controlled prefetching, as proposed in the Stanford DASH Multiprocessor, to address this problem. The prefetches are non-binding in the sense that the prefetched data is brought to a cache close
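The flavor of non-binding software prefetching can be sketched with the GCC/Clang __builtin_prefetch hint; the loop and the lookahead distance below are illustrative assumptions, not the DASH compiler's output.

#include <stddef.h>

#define PREFETCH_DIST 16   /* iterations of lookahead; a machine-dependent guess */

double dot(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n) {
            /* Non-binding hint: the line is pulled toward the cache but stays
             * coherent; if it is invalidated before use, the load just misses. */
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);   /* 0 = read */
            __builtin_prefetch(&b[i + PREFETCH_DIST], 0, 3);
        }
        sum += a[i] * b[i];
    }
    return sum;
}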
An effective on-chip preloading scheme to reduce data access penalty
- In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, Supercomputing '91, 1991
"... Conventional cache prefetching approaches can be either hardware-based, generally by using a one-block-Iookahead technique, or compiler-directed, with inser-tions of non-blocking prefetch instructions. We intro-duce a new hardware scheme based on the prediction of the execution of the instruction st ..."
Cited by 255 (4 self)
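For reference, the one-block-lookahead baseline that the abstract contrasts with can be modeled in a few lines; the cache_contains/cache_fetch hooks are hypothetical.

#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SHIFT 6   /* 64-byte cache blocks (assumed) */

/* Hypothetical cache interface for the sketch. */
bool cache_contains(uint64_t block);
void cache_fetch(uint64_t block, bool is_prefetch);

void access_one_block_lookahead(uint64_t addr) {
    uint64_t block = addr >> BLOCK_SHIFT;
    if (!cache_contains(block))
        cache_fetch(block, false);       /* demand fetch of the missing block */
    if (!cache_contains(block + 1))
        cache_fetch(block + 1, true);    /* prefetch the next sequential block */
}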
Reducing Memory Latency via Non-blocking and Prefetching Caches
- 1992
"... Non-blocking caches and prefetching caches are two techniques for hiding memory latency by exploiting the overlap of processor computations with data accesses. A non-blocking cache allows execution to proceed concurrently with cache misses as long as dependency constraints are observed, thus exploit ..."
Cited by 164 (2 self)
on the combination of these approaches. We also consider compiler-based optimizations to enhance the effectiveness of non-blocking caches. Results from instruction level simulations on the SPEC benchmarks show that the hardware prefetching caches generally outperform non-blocking caches. Also, the relative
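A rough sketch of the mechanism that makes a cache non-blocking: miss status holding registers (MSHRs) track outstanding misses so execution can continue past them. The register count and interface below are assumptions.

#include <stdint.h>
#include <stdbool.h>

#define NUM_MSHR 8   /* number of outstanding misses supported (assumed) */

typedef struct {
    bool     valid;
    uint64_t block;   /* block address of the miss being serviced */
} mshr_t;

static mshr_t mshr[NUM_MSHR];

/* Returns true if the miss was accepted and execution may continue past it;
 * false means all MSHRs are busy and the pipeline must stall. */
bool issue_miss(uint64_t block) {
    for (int i = 0; i < NUM_MSHR; i++)
        if (mshr[i].valid && mshr[i].block == block)
            return true;                  /* merge with an outstanding miss */
    for (int i = 0; i < NUM_MSHR; i++)
        if (!mshr[i].valid) {
            mshr[i].valid = true;         /* allocate; request goes to memory */
            mshr[i].block = block;
            return true;
        }
    return false;                         /* structural stall: MSHRs full */
}

/* Called when the fill returns from memory. */
void miss_returned(uint64_t block) {
    for (int i = 0; i < NUM_MSHR; i++)
        if (mshr[i].valid && mshr[i].block == block)
            mshr[i].valid = false;
}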
Non-Referenced Prefetch (NRP) Cache for Instruction Prefetching
- IEE Proceedings of Computers and Digital Tech., vol. 143, no. 1, 1996
"... A new conceptual cache, NRP (Non-Referenced Prefetch) cache, is proposed to improve the performance of instruction prefetch mechanisms which try to prefetch both the sequential and non-sequential blocks under the limited memory bandwidth. The NRP cache is used in storing prefetched blocks which were ..."
Cited by 1 (0 self)
Threaded Prefetching: An Adaptive Instruction Prefetch Mechanism
- 1993
"... We propose and analyze an adaptive instruction prefetch scheme, called threaded prefetching, that makes use of history information to guide the prefetching. The scheme is based on the observation that control flow paths are likely to repeat themselves. In the proposed scheme, we associate with each ..."
Cited by 1 (0 self)
, in effect, encode the causal relationship between an instruction block and the instruction blocks that have been brought into the cache by the block. The results from trace-driven simulations using SPEC benchmarks show that the proposed scheme improves the prefetch accuracy by more than 100 % on average
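A minimal sketch of the history idea in the snippet: each instruction block records which block was fetched because of it, and that record drives a prefetch the next time the block executes. The direct-mapped table and the hook below are assumptions, not the paper's design.

#include <stdint.h>

#define TBL_SIZE 4096   /* direct-mapped history table (size assumed) */

static uint64_t successor[TBL_SIZE];   /* block -> block it caused to be fetched */
static uint64_t last_block;            /* most recently fetched instruction block */

void prefetch_block(uint64_t block);   /* hypothetical cache hook */

void on_instruction_block_fetch(uint64_t block) {
    /* Learn: record that 'block' followed 'last_block' in the fetch stream. */
    successor[last_block % TBL_SIZE] = block;
    last_block = block;

    /* Predict: prefetch the block this one pulled in last time, on the
     * observation that control-flow paths tend to repeat. */
    uint64_t next = successor[block % TBL_SIZE];
    if (next != 0)
        prefetch_block(next);
}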
Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors
- In 31st International Symposium on Microarchitecture, 1998
"... Instruction cache miss latency is becoming an increasingly importantperformance bottleneck, especially for commercial applications. Although instruction prefetching is an attractive technique for tolerating this latency, we find that existing prefetching schemes are insufficient for modern superscal ..."
Cited by 28 (2 self)
sequential prefetching combined with a novel prefetch filtering mechanism to allow it to get far ahead without polluting the cache. To hide the latency of non-sequential accesses, we propose and implement a novel compiler algorithm which automatically inserts instruction-prefetch instructions
Effective Instruction Prefetching in Chip Multiprocessors
- In Proc. of 11th Int'l Symp. on HPCA, 2005
"... In this paper, we study the instruction cache miss behavior of four modern commercial applications (a database workload, TPC-W, SPECjAppServer2002 and SPECweb99). These applications exhibit high instruction cache miss rates for both the L1 and L2 caches, and a sizable performance improvement can be ..."
Branch History Guided Instruction Prefetching
- In Proceedings of the Seventh International Conference on High Performance Computer Architecture (HPCA), 2001
"... Instruction cache misses stall the fetch stage of the processor pipeline and hence affect instruction supply to the processor. Instruction prefetching has been proposed as a mechanism to reduce instruction cache (I-cache) misses. However, a prefetch is effective only if accurate and initiated suffic ..."
Cited by 14 (1 self)
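One way to picture the branch-history-guided idea: a global branch history register indexes a table of instruction blocks that previously missed under that history, so a prefetch can be initiated several branches ahead of the miss. History length, table size, and the hook below are illustrative assumptions, not the paper's exact mechanism.

#include <stdint.h>

#define HIST_BITS 12
#define TBL_SIZE  (1u << HIST_BITS)

static uint32_t ghr;                    /* global branch history register */
static uint64_t miss_table[TBL_SIZE];   /* history -> I-cache block to prefetch */

void prefetch_icache_block(uint64_t block);   /* hypothetical hook */

/* Update the history on every branch and launch any learned prefetch early,
 * several branches before the fetch unit would reach the miss. */
void on_branch(int taken) {
    ghr = ((ghr << 1) | (uint32_t)(taken & 1)) & (TBL_SIZE - 1u);
    uint64_t pred = miss_table[ghr];
    if (pred != 0)
        prefetch_icache_block(pred);
}

/* Learn: remember which history preceded this instruction-cache miss. */
void on_icache_miss(uint64_t block) {
    miss_table[ghr] = block;
}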