Results 1 - 10 of 1,372
Next Cache Line and Set Prediction
In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
"... Accurate instruction fetch and branch prediction is increasingly important on today's wide-issue architectures. Fetch prediction is the process of determining the next instruction to request from the memory subsystem. Branch prediction is the process of predicting the likely out-come of branch ..."
Cited by 63 (3 self)
... target address. We call such an index a next cache line and set (NLS) predictor. An NLS predictor is a pointer into the instruction cache, indicating the target instruction of a branch. In this paper we examine the use of NLS predictors for efficient and accurate fetch and branch prediction. Previous ...
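A minimal sketch of the NLS idea in C, assuming a small direct-mapped predictor table; the table size, field widths, and indexing are illustrative assumptions, not the paper's exact hardware:

    #include <stdint.h>

    /* Hypothetical size; the paper evaluates several configurations. */
    #define NLS_ENTRIES 1024

    /* An NLS entry is a pointer into the instruction cache rather than
       a full target address: the predicted set and way (line) that hold
       the branch target. */
    typedef struct {
        uint16_t set;    /* predicted i-cache set of the target */
        uint8_t  way;    /* predicted way within that set       */
        uint8_t  valid;
    } nls_entry;

    static nls_entry nls_table[NLS_ENTRIES];

    /* Index the table with low bits of the fetch address. */
    static nls_entry *nls_lookup(uint32_t fetch_pc) {
        return &nls_table[(fetch_pc >> 2) % NLS_ENTRIES];
    }

    /* On a resolved taken branch, remember where the target lives in
       the i-cache so the next fetch can index it directly. */
    static void nls_update(uint32_t branch_pc, uint16_t target_set,
                           uint8_t target_way) {
        nls_entry *e = nls_lookup(branch_pc);
        e->set   = target_set;
        e->way   = target_way;
        e->valid = 1;
    }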
Limiting the Number of Dirty Cache Lines
"... Abstract—Caches often employ write-back instead of writethrough, since write-back avoids unnecessary transfers for multiple writes to the same block. For several reasons, however, it is undesirable that a significant number of cache lines will be marked “dirty”. Energy-efficient cache organizations, ..."
Cited by 1 (0 self)
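A sketch in C of one way to enforce such a limit, assuming a simple dirty-line counter and eager write-back; the sizes and victim choice are illustrative, not the paper's design:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical parameters. */
    #define NUM_LINES   512
    #define DIRTY_LIMIT  64

    static bool dirty[NUM_LINES];
    static int  dirty_count;

    /* Placeholder for the actual memory transfer. */
    static void write_back(size_t line) {
        dirty[line] = false;
        dirty_count--;
    }

    /* Called when a store hits line `idx`: mark it dirty, and if the
       dirty budget is exceeded, eagerly clean another dirty line so the
       number of dirty lines stays bounded. */
    static void on_store(size_t idx) {
        if (!dirty[idx]) {
            dirty[idx] = true;
            dirty_count++;
        }
        if (dirty_count > DIRTY_LIMIT) {
            for (size_t i = 0; i < NUM_LINES; i++) {
                if (i != idx && dirty[i]) { write_back(i); break; }
            }
        }
    }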
Adapting Cache Line Size to Application Behavior
In ICS, 1999
"... A cache line size has a significant effect on miss rate and memory traffic. Today's computers use a fixed line size, typically 32B, which may not be optimal for a given application. Optimal size may also change during application execution. This paper describes a cache in which the line (fetch) ..."
Cited by 55 (11 self)
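A sketch in C of one plausible adaptation policy, assuming per-line usage feedback; the thresholds and bounds are illustrative assumptions rather than the paper's mechanism:

    /* Hypothetical fetch-size bounds, in bytes. */
    #define MIN_FETCH  16
    #define MAX_FETCH 128

    static int fetch_size = 32;   /* current line (fetch) size */

    /* After a line is evicted, `used_bytes` says how much of it was
       actually referenced. Grow the fetch size when lines are well used
       (good spatial locality); shrink it when most fetched bytes go
       untouched. */
    static void adapt_fetch_size(int used_bytes) {
        if (used_bytes > (3 * fetch_size) / 4 && fetch_size < MAX_FETCH)
            fetch_size *= 2;
        else if (used_bytes < fetch_size / 4 && fetch_size > MIN_FETCH)
            fetch_size /= 2;
    }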
Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines
"... Caches are organized at a line-size granularity to exploit spatial locality. However, when spatial locality is low, many words in the cache line are not used. Unused words occupy cache space but do not contribute to cache hits. Filtering these words can allow the cache to store more cache lines. We ..."
Cited by 24 (1 self)
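A sketch in C of the bookkeeping such filtering needs, assuming a per-word usage mask on each line; the field names and sizes are illustrative:

    #include <stdint.h>

    #define WORDS_PER_LINE 8

    /* Track which words of a resident line were referenced. */
    typedef struct {
        uint32_t words[WORDS_PER_LINE];
        uint8_t  used_mask;   /* bit i set => word i was read/written */
    } line_t;

    static void touch_word(line_t *l, int w) {
        l->used_mask |= (uint8_t)(1u << w);
    }

    /* Count the words that were actually used; a distillation cache
       could keep only those words on eviction, freeing space to hold
       the used words of additional lines. */
    static int used_words(const line_t *l) {
        int n = 0;
        for (int w = 0; w < WORDS_PER_LINE; w++)
            if (l->used_mask & (1u << w)) n++;
        return n;
    }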
Cache Line Boundary Allocation for Garbage Collected Systems
2007
"... Garbage-collected systems became increasingly popular with the release of the Java programming language. Cache performance of garbage-collected systems has been a heavily researched area. Past work has shown that cache-line utilization has been poor in garbage-collected systems. This work aims to re ..."
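A sketch in C of a bump allocator that keeps small objects from straddling cache line boundaries, an illustrative take on boundary-aware allocation rather than the paper's exact scheme:

    #include <stddef.h>
    #include <stdint.h>

    #define LINE_SIZE 64

    static uint8_t heap[1 << 20];
    static size_t  bump;   /* next free offset */

    /* If a small object would straddle a cache line boundary, skip
       ahead to the next boundary first so the whole object lands in
       one line. */
    static void *alloc(size_t size) {
        size_t line_off = bump % LINE_SIZE;
        if (size <= LINE_SIZE && line_off + size > LINE_SIZE)
            bump += LINE_SIZE - line_off;   /* align to next line */
        if (bump + size > sizeof heap)
            return NULL;                    /* out of memory */
        void *p = &heap[bump];
        bump += size;
        return p;
    }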
Enabling Partial Cache Line Prefetching
In International Conference on Parallel Processing (ICPP), 2003
"... Hardware prefetching is a simple and effective technique for hiding cache miss latency and thus improving the overall performance. However, it comes with addition of prefetch buffers and causes significant memory traffic increase. In this paper we propose a new prefetching scheme which improves perf ..."
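One plausible reading of partial-line prefetching, sketched in C: a miss fetches its own line in full but prefetches only a sub-block of the next line, so a useless prefetch costs a fraction of the usual traffic. The stubs and policy here are assumptions, not the paper's scheme:

    #include <stdint.h>

    /* Placeholders for the actual memory requests. */
    static void fetch_line(uint64_t line)        { (void)line; }
    static void fetch_sub(uint64_t line, int sb) { (void)line; (void)sb; }

    /* A miss on line N demand-fetches all of N but prefetches only the
       leading sub-block of line N + 1, trading coverage for traffic. */
    static void on_miss(uint64_t line) {
        fetch_line(line);
        fetch_sub(line + 1, 0);   /* partial-line prefetch */
    }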
Efficient Procedure Mapping using Cache Line Coloring
In Proceedings of the SIGPLAN '97 Conference on Programming Language Design and Implementation, 1997
"... As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replace ..."
Cited by 81 (13 self)
... replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this ...
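A sketch in C of the basic objects such a mapping works with, assuming a direct-mapped instruction cache where a line's color is its set index; the sizes are illustrative:

    #include <stdint.h>

    #define LINE_SIZE  32
    #define NUM_SETS  256   /* direct-mapped i-cache: colors == sets */

    /* The cache "color" of an address is the set its line maps to. */
    static unsigned color_of(uint32_t addr) {
        return (addr / LINE_SIZE) % NUM_SETS;
    }

    /* Two procedures conflict if their code occupies any common color;
       a coloring-based layout places frequent caller/callee pairs so
       their color ranges do not overlap. */
    static int colors_overlap(uint32_t a_start, uint32_t a_size,
                              uint32_t b_start, uint32_t b_size) {
        for (uint32_t a = a_start; a < a_start + a_size; a += LINE_SIZE)
            for (uint32_t b = b_start; b < b_start + b_size; b += LINE_SIZE)
                if (color_of(a) == color_of(b))
                    return 1;
        return 0;
    }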
Handling Cross Interferences by Cyclic Cache Line Coloring
In Proceedings of the 1998 Parallel Architectures and Compilation Techniques Conference (PACT '98), 1998
"... Cross interference, conflicting data from several arrays, is particularly grave for caches with limited associativity. We present a uniform scheme that reduces both self and cross interference. Techniques for cyclic register allocation, namely the meeting graph, help to improve the usage of cache li ..."
Cited by 5 (3 self)
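A sketch in C of one simple way to stagger array starting colors so co-traversed arrays map to different sets; this illustrates the goal of avoiding cross interference, not the paper's meeting-graph technique:

    #include <stdint.h>
    #include <stdlib.h>

    #define LINE_SIZE 64
    #define NUM_SETS  128

    /* Allocate the k-th array of a group with a starting color of k,
       so arrays walked together start in different cache sets. (The
       raw pointer would need to be kept for free(); omitted here.) */
    static void *alloc_colored(size_t bytes, int k) {
        size_t span = (size_t)LINE_SIZE * NUM_SETS;
        uint8_t *raw = malloc(bytes + span);
        if (!raw) return NULL;
        size_t target = (size_t)(k % NUM_SETS) * LINE_SIZE;
        size_t adjust = (target + span - (uintptr_t)raw % span) % span;
        return raw + adjust;
    }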
Cache Line Aware Optimizations for ccNUMA Systems
"... Current shared memory systems utilize complex memory hi-erarchies to maintain scalability when increasing the num-ber of processing units. Although hardware designers aim to hide this complexity from the programmer, ignoring the detailed architectural characteristics can harm performance significant ..."
... significantly. We propose to expose the block-based design of caches in parallel computers to middleware designers to allow semi-automatic performance tuning with the systematic translation from algorithms to an analytic performance model. For this, we design a simple interface for cache line aware (CLa ...
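A common cache-line-aware idiom, sketched in C11, of the kind such an interface might automate: padding per-thread slots to a full line so cores never write-share a line. The struct and sizes are illustrative assumptions:

    #include <stdint.h>

    #define CACHE_LINE 64

    /* Each counter occupies its own cache line, so writers on
       different cores cause no false sharing and no needless
       coherence traffic across NUMA nodes. */
    typedef struct {
        _Alignas(CACHE_LINE) uint64_t count;
        char pad[CACHE_LINE - sizeof(uint64_t)];
    } padded_counter;

    static padded_counter counters[16];   /* one per thread/core */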
Compiler-Directed Cache Line Size Adaptivity
2000
"... The performance of a computer system is highly dependent on the performance of the cache memory system. The traditional cache memory system has an organization with a line size that is fixed at design time. Miss rates for different applications can be improved if the line size could be adjusted dyna ..."
Cited by 1 (0 self)