CiteSeerX
Results 1 - 10 of 231

Microarchitecture optimizations for exploiting memory-level parallelism

by Yuan Chou, Brian Fahs, Santosh Abraham (Sun Microsystems) - In ISCA-31, 2004
"... The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these applications and that microarchitecture has a pro ..."
Abstract - Cited by 97 (3 self) - Add to MetaCart
in improving MLP and overall performance by implementing effective instruction prefetching, more accurate branch prediction, and better value prediction in addition to runahead execution.

Prefetching using Markov predictors

by Doug Joseph, Dirk Grunwald - In ISCA, 1997
"... Prefetching is one approach to reducing the latency of memory op-erations in modem computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an interface between the on-chip and off-chip cache, and can be added to existing com-puter designs. The Markov prefetcher is ..."
Abstract - Cited by 308 (1 self) - Add to MetaCart
is distinguished by prefetching multiple reference predictions from the memory subsystem, and then prioritizing the delivery of those references to the processor. This design results in a prefetching system that provides good coverage, is accurate, and produces timely results that can be effectively used
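The snippet above conveys the core mechanism: remember which miss address tended to follow each miss address, and use that history to predict and prefetch likely successors on the next miss. Below is a minimal software sketch of that idea; the table size, the two-prediction fan-out, and the prefetch() hook are illustrative assumptions, not the hardware parameters from the paper.

#include <stdint.h>
#include <string.h>

#define TABLE_ENTRIES 1024            /* assumed predictor table size      */
#define PREDICTIONS_PER_ENTRY 2       /* assumed fan-out per miss address  */

struct markov_entry {
    uint64_t miss_addr;                         /* key: a previous miss address    */
    uint64_t next_addr[PREDICTIONS_PER_ENTRY];  /* addresses that followed it      */
};

static struct markov_entry table[TABLE_ENTRIES];
static uint64_t last_miss;                      /* most recent miss address        */

/* Stub: in hardware this would launch the memory request for 'addr'. */
static void prefetch(uint64_t addr) { (void)addr; }

/* Called on every cache miss; 64-byte blocks assumed for the index hash. */
void markov_on_miss(uint64_t addr)
{
    struct markov_entry *e = &table[(last_miss >> 6) % TABLE_ENTRIES];

    /* Learn: record that 'addr' followed 'last_miss'. */
    if (e->miss_addr == last_miss) {
        memmove(&e->next_addr[1], &e->next_addr[0],
                (PREDICTIONS_PER_ENTRY - 1) * sizeof(uint64_t));
        e->next_addr[0] = addr;
    } else {
        e->miss_addr = last_miss;
        memset(e->next_addr, 0, sizeof e->next_addr);
        e->next_addr[0] = addr;
    }

    /* Predict: prefetch the successors previously seen after 'addr'. */
    struct markov_entry *p = &table[(addr >> 6) % TABLE_ENTRIES];
    if (p->miss_addr == addr)
        for (int i = 0; i < PREDICTIONS_PER_ENTRY; i++)
            if (p->next_addr[i])
                prefetch(p->next_addr[i]);

    last_miss = addr;
}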

Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors

by Todd Mowry, Anoop Gupta - Journal of Parallel and Distributed Computing, 1991
"... The large latency of memory accesses is a major obstacle in obtaining high processor utilization in large scale shared-memory multiprocessors. Although the provision of coherent caches in many recent machines has alleviated the problem somewhat, cache misses still occur frequently enough that they s ..."
Abstract - Cited by 302 (18 self) - Add to MetaCart
that they significantly lower performance. In this paper we evaluate the effectiveness of non-binding software-controlled prefetching, as proposed in the Stanford DASH Multiprocessor, to address this problem. The prefetches are non-binding in the sense that the prefetched data is brought to a cache close
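As a rough illustration of non-binding, software-controlled prefetching, the sketch below inserts prefetch hints a fixed distance ahead of the loads that will consume the data. The GCC/Clang __builtin_prefetch hint stands in for the DASH prefetch operation, and the prefetch distance of 16 iterations is an assumed tuning value, not a figure from the paper.

#include <stddef.h>

#define PREFETCH_DISTANCE 16   /* assumed look-ahead distance in iterations */

double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Non-binding: these hints only move data toward the cache; they
             * never fault, and the loads below are unaffected if the lines
             * are invalidated before use. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0 /* read */, 3);
            __builtin_prefetch(&b[i + PREFETCH_DISTANCE], 0, 3);
        }
        sum += a[i] * b[i];
    }
    return sum;
}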

An effective on-chip preloading scheme to reduce data access penalty

by Jean-loup Baer, Tien-fu Chen - In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing ’91), 1991
"... Conventional cache prefetching approaches can be either hardware-based, generally by using a one-block-Iookahead technique, or compiler-directed, with inser-tions of non-blocking prefetch instructions. We intro-duce a new hardware scheme based on the prediction of the execution of the instruction st ..."
Abstract - Cited by 255 (4 self) - Add to MetaCart
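For contrast, here is a toy model of the conventional one-block-lookahead policy that the abstract names: on a demand miss to block i, block i+1 is fetched as well. The direct-mapped tag array and 64-byte block size are illustrative assumptions, not the paper's configuration.

#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE 64      /* bytes per block (assumed)        */
#define NUM_SETS   256     /* direct-mapped toy cache (assumed) */

static uint64_t tags[NUM_SETS];
static bool     valid[NUM_SETS];

static bool lookup(uint64_t block)
{
    unsigned set = block % NUM_SETS;
    return valid[set] && tags[set] == block;
}

static void fill(uint64_t block)
{
    unsigned set = block % NUM_SETS;
    valid[set] = true;
    tags[set]  = block;
}

/* Called on every memory reference. */
void obl_access(uint64_t addr)
{
    uint64_t block = addr / BLOCK_SIZE;

    if (!lookup(block)) {
        fill(block);                  /* demand fetch of block i            */
        if (!lookup(block + 1))
            fill(block + 1);          /* one-block-lookahead: prefetch i+1  */
    }
}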

Reducing Memory Latency via Non-blocking and Prefetching Caches

by Tien-Fu Chen, Jean-loup Baer, 1992
"... Non-blocking caches and prefetching caches are two techniques for hiding memory latency by exploiting the overlap of processor computations with data accesses. A non-blocking cache allows execution to proceed concurrently with cache misses as long as dependency constraints are observed, thus exploit ..."
Abstract - Cited by 164 (2 self) - Add to MetaCart
on the combination of these approaches. We also consider compiler-based optimizations to enhance the effectiveness of non-blocking caches. Results from instruction level simulations on the SPEC benchmarks show that the hardware prefetching caches generally outperform non-blocking caches. Also, the relative

Non-Referenced Prefetch (NRP) Cache for Instruction Prefetching

by Gi-ho Park, Oh-young Kwon, Tack-Don Han, Shin-dug Kim - IEE Proceedings - Computers and Digital Techniques, vol. 143, no. 1, 1996
"... A new conceptual cache, NRP (Non-Referenced Prefetch) cache, is proposed to improve the performance of instruction prefetch mechanisms which try to prefetch both the sequential and non-sequential blocks under the limited memory bandwidth. The NRP cache is used in storing prefetched blocks which were ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
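Reading the title and abstract together, one plausible sketch of the NRP idea is a small side buffer that holds prefetched blocks until they are actually referenced, so never-referenced prefetches cannot displace useful lines in the instruction cache. The organisation and sizes below are assumptions for illustration, not the paper's design.

#include <stdint.h>
#include <stdbool.h>

#define ICACHE_SETS 512   /* direct-mapped toy instruction cache (assumed) */
#define NRP_ENTRIES 16    /* small side buffer for prefetched blocks       */

static uint64_t icache_tag[ICACHE_SETS];
static bool     icache_valid[ICACHE_SETS];
static uint64_t nrp_tag[NRP_ENTRIES];
static bool     nrp_valid[NRP_ENTRIES];

static bool icache_hit(uint64_t blk)
{
    unsigned s = blk % ICACHE_SETS;
    return icache_valid[s] && icache_tag[s] == blk;
}

static void icache_fill(uint64_t blk)
{
    unsigned s = blk % ICACHE_SETS;
    icache_valid[s] = true;
    icache_tag[s]   = blk;
}

/* Prefetched blocks land in the NRP buffer, not in the instruction cache. */
void nrp_prefetch(uint64_t blk)
{
    unsigned i = blk % NRP_ENTRIES;   /* trivial replacement policy */
    nrp_valid[i] = true;
    nrp_tag[i]   = blk;
}

/* Demand fetch: promote from the NRP buffer on a hit there, otherwise go to
 * memory; either way a block enters the I-cache only once it is referenced. */
void nrp_fetch(uint64_t blk)
{
    if (icache_hit(blk))
        return;
    unsigned i = blk % NRP_ENTRIES;
    if (nrp_valid[i] && nrp_tag[i] == blk)
        nrp_valid[i] = false;         /* referenced: leave the side buffer */
    icache_fill(blk);
}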

Threaded Prefetching: An Adaptive Instruction Prefetch Mechanism

by Seong Baeg Kim, Myung Soon Park, Sun-ho Park, Chong Sang Kim, Sang Lyul Min, Heonshik Shin, Deog-kyoon Jeong, 1993
"... We propose and analyze an adaptive instruction prefetch scheme, called threaded prefetching, that makes use of history information to guide the prefetching. The scheme is based on the observation that control flow paths are likely to repeat themselves. In the proposed scheme, we associate with each ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
, in effect, encode the causal relationship between an instruction block and the instruction blocks that have been brought into the cache by the block. The results from trace-driven simulations using SPEC benchmarks show that the proposed scheme improves the prefetch accuracy by more than 100 % on average

Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors

by Chi-Keung Luk, Todd C. Mowry - In 31st International Symposium on Microarchitecture, 1998
"... Instruction cache miss latency is becoming an increasingly importantperformance bottleneck, especially for commercial applications. Although instruction prefetching is an attractive technique for tolerating this latency, we find that existing prefetching schemes are insufficient for modern superscal ..."
Abstract - Cited by 28 (2 self) - Add to MetaCart
sequential prefetching combined with a novel prefetch filtering mechanism to allow it to get far ahead without polluting the cache. To hide the latency of non-sequential accesses, we propose and implement a novel compiler algorithm which automatically inserts instruction prefetch instructions

Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications

by Lawrence Spracklen, Yuan Chou, Santosh G. Abraham - In Proc. of 11th Int'l Symp. on HPCA, 2005
"... In this paper, we study the instruction cache miss behavior of four modern commercial applications (a database workload, TPC-W, SPECjAppServer2002 and SPECweb99). These applications exhibit high instruction cache miss rates for both the L1 and L2 caches, and a sizable performance improvement can be ..."
Abstract - Add to MetaCart

Branch History Guided Instruction Prefetching

by Viji Srinivasan, Edward S. Davidson, Gary S. Tyson, Mark J. Charney, Thomas R. Puzak - In Proceedings of the Seventh International Conference on High Performance Computer Architecture (HPCA), 2001
"... Instruction cache misses stall the fetch stage of the processor pipeline and hence affect instruction supply to the processor. Instruction prefetching has been proposed as a mechanism to reduce instruction cache (I-cache) misses. However, a prefetch is effective only if accurate and initiated suffic ..."
Abstract - Cited by 14 (1 self) - Add to MetaCart