Results 1 - 10 of 1,580
Caching in the Sprite Network File System
ACM Transactions on Computer Systems, 1988
Cited by 296 (12 self)
The Sprite network operating system uses large main-memory disk block caches to achieve high performance in its file system. It provides non-write-through file caching on both client and server machines. A simple cache consistency mechanism permits files to be shared by multiple clients without danger ...
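To make the consistency idea concrete, here is a minimal C++ sketch, with invented names and structure (not Sprite's actual protocol or code), of a client cache that validates its blocks against a server-supplied file version on open:

```cpp
// Illustrative sketch (not Sprite's code): a client-side block cache that
// validates cached file data against a server-supplied version number on
// open, one simple way to keep non-write-through caches consistent.
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct CachedFile {
    uint64_t version;  // version seen when blocks were cached
    bool     dirty;    // non-write-through: blocks may not be on the server yet
};

class ClientCache {
    std::unordered_map<int, CachedFile> files_;  // file id -> cached state
public:
    // On open, compare the cached version with the server's current version;
    // a mismatch means another client wrote the file, so drop stale blocks.
    bool openIsFresh(int fileId, uint64_t serverVersion) {
        auto it = files_.find(fileId);
        if (it == files_.end() || it->second.version != serverVersion) {
            files_[fileId] = {serverVersion, false};  // (re)load lazily
            return false;  // cached copy was missing or stale
        }
        return true;       // safe to reuse cached blocks
    }
    void markDirty(int fileId) { files_[fileId].dirty = true; }
};

int main() {
    ClientCache cache;
    std::cout << cache.openIsFresh(42, 7) << '\n';  // 0: first open, nothing cached
    std::cout << cache.openIsFresh(42, 7) << '\n';  // 1: version unchanged, reuse
    std::cout << cache.openIsFresh(42, 8) << '\n';  // 0: another client wrote it
}
```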
Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors
Journal of Parallel and Distributed Computing, 1991
Cited by 302 (18 self)
The large latency of memory accesses is a major obstacle in obtaining high processor utilization in large-scale shared-memory multiprocessors. Although the provision of coherent caches in many recent machines has alleviated the problem somewhat, cache misses still occur frequently enough that they ...
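The technique is simple to illustrate. In the hedged sketch below, the loop issues a prefetch a fixed distance ahead of the element being processed, using the GCC/Clang __builtin_prefetch intrinsic; the prefetch distance is an assumed tuning parameter, not a value from the paper.

```cpp
// A minimal sketch of software-controlled prefetching: issue a prefetch a
// fixed distance ahead of the current element so the cache miss overlaps
// with useful work. DIST is an assumed tuning parameter.
#include <cstddef>
#include <vector>

double sum_with_prefetch(const std::vector<double>& a) {
    constexpr std::size_t DIST = 16;  // elements to prefetch ahead (assumed)
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (i + DIST < a.size())
            __builtin_prefetch(&a[i + DIST]);  // hint: load this line soon
        s += a[i];  // by the time we reach i+DIST, its line should be cached
    }
    return s;
}
```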
Lock-free Dynamically Resizable Arrays
Cited by 12 (8 self)
We present a first lock-free design and practical implementation of a dynamically resizable array (vector). The most extensively used container in the C++ Standard Library is vector, offering a combination of dynamic memory management and efficient random access. Our approach is based on a ... by a factor of 10. The implemented approach is also applicable across a variety of symmetric multiprocessing (SMP) platforms. The performance evaluation on an 8-way AMD system with non-shared L2 cache demonstrated timing results comparable to the best available lock-based techniques for such systems.
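A heavily simplified sketch of the core idea, assuming a fixed capacity (the paper's design adds lock-free resizing via bucket arrays and descriptor objects): slots are reserved with an atomic fetch_add, so concurrent push_back calls never block.

```cpp
// A simplified, bounded sketch of lock-free push_back. Capacity is fixed
// here so the lock-free slot reservation stands out.
#include <atomic>
#include <cstddef>
#include <optional>

template <typename T, std::size_t Capacity>
class BoundedLockFreeVector {
    std::atomic<T>           slots_[Capacity] = {};
    std::atomic<std::size_t> size_{0};
public:
    // Reserve a slot with fetch_add (lock-free), then publish the element.
    bool push_back(T value) {
        std::size_t idx = size_.fetch_add(1, std::memory_order_acq_rel);
        if (idx >= Capacity) return false;  // full; a real vector would grow
        slots_[idx].store(value, std::memory_order_release);
        return true;
    }
    // Note: a concurrent reader can observe a reserved slot before its store
    // completes; the paper's descriptor objects close exactly this window.
    std::optional<T> read(std::size_t i) const {
        if (i >= Capacity || i >= size_.load(std::memory_order_acquire))
            return std::nullopt;
        return slots_[i].load(std::memory_order_acquire);
    }
};

int main() {
    BoundedLockFreeVector<int, 128> v;
    v.push_back(42);
    return v.read(0).value_or(-1) == 42 ? 0 : 1;
}
```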
The filter cache: An energy efficient memory structure
In Proceedings of the 1997 International Symposium on Microarchitecture, 1997
Cited by 222 (4 self)
Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often consume a significant amount of power. In many applications, such as portable devices, low power is more important than performance. We propose to trade performance for power consumption by filtering cache references through an unusually small L1 cache. An L2 cache, which is similar in size and structure to a typical L1 cache, is positioned behind ...
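The filtering idea can be modeled in a few lines. The toy simulator below (sizes and reference stream are illustrative, not the paper's configurations) counts how many references a tiny filter cache absorbs before they reach the larger array:

```cpp
// A toy simulator for the filter-cache idea: a very small direct-mapped
// cache in front of a conventional one. References that hit in the tiny
// filter never activate the larger (more power-hungry) array.
#include <cstdint>
#include <iostream>
#include <vector>

struct DirectMapped {
    std::vector<uint64_t> tags;
    std::vector<bool>     valid;
    explicit DirectMapped(std::size_t lines) : tags(lines), valid(lines) {}
    bool access(uint64_t lineAddr) {             // returns true on hit
        std::size_t set = lineAddr % tags.size();
        if (valid[set] && tags[set] == lineAddr) return true;
        valid[set] = true; tags[set] = lineAddr; // fill on miss
        return false;
    }
};

int main() {
    DirectMapped filter(8);    // unusually small "L1" filter cache
    DirectMapped backing(512); // conventional cache behind it
    std::size_t filterHits = 0, backingAccesses = 0;
    for (uint64_t i = 0; i < 10000; ++i) {
        uint64_t line = i % 8;  // a small working set that fits the filter
        if (filter.access(line)) { ++filterHits; continue; }
        ++backingAccesses;      // only misses reach the big array
        backing.access(line);
    }
    std::cout << "filter hits: " << filterHits
              << ", backing accesses: " << backingAccesses << '\n';
}
```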
Cache Coherence Support for Non-Shared Bus Architecture on Heterogeneous MPSoCs
In Proceedings of the 42nd Design Automation Conference, 2005
Cited by 3 (1 self)
We propose two novel integration techniques — bypass and bookkeeping — in the memory controller to address the cache coherence compatibility issue of a non-shared bus heterogeneous MPSoC. The bypass approach is an inexpensive and efficient solution for computation-bound applications while the bookkeeping ...
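A sketch of the bookkeeping idea under assumed structure (the names are not the paper's): the memory controller records which lines the non-coherent processor holds, so writes from the other side trigger invalidations only when actually needed.

```cpp
// Illustrative sketch of "bookkeeping" in a memory controller: track the
// lines fetched by the non-coherent CPU so that a write from the coherent
// side costs an invalidation only when the line is really cached there.
#include <cstdint>
#include <iostream>
#include <unordered_set>

class BookkeepingController {
    std::unordered_set<uint64_t> cachedByP2;  // lines held by the non-coherent CPU
public:
    void onP2Fill(uint64_t line)  { cachedByP2.insert(line); }
    void onP2Evict(uint64_t line) { cachedByP2.erase(line); }
    // Returns true when an invalidation message must be sent.
    bool onP1Write(uint64_t line) { return cachedByP2.erase(line) != 0; }
};

int main() {
    BookkeepingController mc;
    mc.onP2Fill(0x40);
    std::cout << mc.onP1Write(0x40) << '\n';  // 1: invalidate needed
    std::cout << mc.onP1Write(0x80) << '\n';  // 0: no action, traffic saved
}
```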
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
IEEE/ACM International Symposium on Microarchitecture, 2006
Cited by 134 (11 self)
This paper presents and studies a distributed L2 cache management approach through OS-level page allocation for future many-core processors. L2 cache management is a crucial multicore processor design aspect to overcome non-uniform cache access latency for good program performance and to reduce on-chip ...
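A minimal sketch of the mechanism, assuming the cache slice is determined by low bits of the physical page number (the slice function and free-list are illustrative): the OS steers a core's data to its local slice by choosing pages whose bits match.

```cpp
// OS-level page allocation for a distributed L2: pick a free physical page
// that maps to the requesting core's local cache slice.
#include <cstdint>
#include <iostream>
#include <optional>
#include <vector>

constexpr unsigned kSlices = 16;  // one L2 slice per tile (assumed)

unsigned sliceOf(uint64_t physPage) { return physPage % kSlices; }

// Return a free page that maps to the requested slice, if one exists.
std::optional<uint64_t> allocNear(std::vector<uint64_t>& freePages, unsigned slice) {
    for (std::size_t i = 0; i < freePages.size(); ++i) {
        if (sliceOf(freePages[i]) == slice) {
            uint64_t page = freePages[i];
            freePages.erase(freePages.begin() + i);
            return page;
        }
    }
    return std::nullopt;  // a real OS would fall back to a remote slice
}

int main() {
    std::vector<uint64_t> freePages = {100, 101, 102, 117, 118, 133};
    auto p = allocNear(freePages, 5);  // core on tile 5 asks for a local page
    if (p) std::cout << "page " << *p << " -> slice " << sliceOf(*p) << '\n';
}
```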
Disco: Running commodity operating systems on scalable multiprocessors
ACM Transactions on Computer Systems, 1997
Cited by 253 (10 self)
In this paper we examine the problem of extending modern operating systems to run efficiently on large-scale shared memory multiprocessors without a large implementation effort. Our approach brings back an idea popular in the 1970s, virtual machine monitors. We use virtual machines to run multiple commodity operating systems ... where the virtual machines transparently share major data structures such as the program code and the file system buffer cache. We use the distributed system support of modern operating systems to export a partial single system image to the users. The overall solution achieves most of the benefits ...
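A minimal sketch, with assumed structure rather than Disco's code, of transparent sharing through the monitor: VMs reading the same disk block map the same machine page, and a write breaks the sharing with a private copy.

```cpp
// Sketch: a monitor-level global buffer cache. Identical disk blocks map
// to one machine page across VMs, copied only when a VM writes (COW).
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct Page { uint64_t phys; int refs; };

class Monitor {
    std::unordered_map<uint64_t, Page> byDiskBlock_;  // global buffer cache
    uint64_t nextPhys_ = 0x1000;
public:
    // All VMs reading the same disk block get the same physical page.
    uint64_t mapRead(uint64_t diskBlock) {
        auto [it, inserted] = byDiskBlock_.try_emplace(diskBlock, Page{0, 0});
        if (inserted) it->second.phys = nextPhys_ += 0x1000;
        ++it->second.refs;
        return it->second.phys;
    }
    // A write breaks sharing: the writing VM gets a private copy.
    uint64_t mapWrite(uint64_t diskBlock) {
        auto& p = byDiskBlock_[diskBlock];
        if (p.refs > 1) { --p.refs; return nextPhys_ += 0x1000; }
        return p.phys;
    }
};

int main() {
    Monitor vmm;
    std::cout << std::hex << vmm.mapRead(7) << '\n';   // VM1 reads block 7
    std::cout << std::hex << vmm.mapRead(7) << '\n';   // VM2 shares the page
    std::cout << std::hex << vmm.mapWrite(7) << '\n';  // VM2 writes: new page
}
```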
Exploiting Coarse Grain Non-Shared Regions in Snoopy Coherent Multiprocessors
2003
Cited by 3 (1 self)
It has been shown that many requests miss in all remote nodes in shared memory multiprocessors. We are motivated by the observation that this behavior extends to much coarser grain areas of memory. We define a region to be a continuous, aligned memory area whose size is a power of two and observe that ... non-shared regions. A node with a RegionScout filter can determine in advance that a request will miss in all remote nodes. RegionScout filters are implemented as a layered extension over existing snoop-based coherence systems. They require no changes to existing coherence protocols or caches ...
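A sketch of a RegionScout-style filter with assumed sizes: each node hashes the regions it caches into a small counter table; a zero counter is a guarantee of absence, which is what makes skipping the snoop safe.

```cpp
// RegionScout-style filter (sizes are assumptions): counters track how many
// blocks a node caches per hashed region. False positives are possible
// (hash conflicts); false negatives are not, so a zero proves absence.
#include <array>
#include <cstdint>
#include <iostream>

constexpr unsigned kRegionBits = 12;   // 4 KB regions (assumed)
constexpr unsigned kTableSize  = 256;  // counters per node (assumed)

class RegionFilter {
    std::array<uint32_t, kTableSize> counts_{};
    static std::size_t index(uint64_t addr) {
        return (addr >> kRegionBits) % kTableSize;
    }
public:
    void onFill(uint64_t addr)  { ++counts_[index(addr)]; }
    void onEvict(uint64_t addr) { --counts_[index(addr)]; }
    bool mayHoldRegion(uint64_t addr) const { return counts_[index(addr)] != 0; }
};

int main() {
    RegionFilter remote;
    remote.onFill(0x1000);                              // remote node caches a block
    std::cout << remote.mayHoldRegion(0x1400) << '\n';  // 1: same 4 KB region
    std::cout << remote.mayHoldRegion(0x9000) << '\n';  // 0: snoop avoidable
}
```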
Managing Wire Delay in Large Chip-Multiprocessor Caches
IEEE/ACM International Symposium on Microarchitecture, 2004
Cited by 157 (4 self)
In response to increasing (relative) wire delay, architects have proposed various technologies to manage the impact of slow wires on large uniprocessor L2 caches. Block migration (e.g., D-NUCA and NuRapid) reduces average hit latency by migrating frequently used blocks towards the lower-latency banks. Transmission Line Caches (TLC) use on-chip transmission lines to provide low latency to all banks. Traditional stride-based hardware prefetching strives to tolerate, rather than reduce, latency. Chip multiprocessors (CMPs) present additional challenges. First, CMPs often share the on-chip L2 ...
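Block migration is easy to model. In the toy sketch below (geometry and policy are assumptions, not D-NUCA's exact design), a block's bank index stands in for wire distance and each hit promotes it one bank closer to the requester:

```cpp
// Toy model of gradual block migration: hot blocks converge on the
// low-latency banks because every hit promotes them one step closer.
#include <cstdint>
#include <iostream>
#include <unordered_map>

class MigratingNUCA {
    std::unordered_map<uint64_t, int> bankOf_;  // block -> bank (0 = closest)
    int farBank_;
public:
    explicit MigratingNUCA(int banks) : farBank_(banks - 1) {}
    // Return the bank the hit was served from, then migrate one step closer.
    int access(uint64_t block) {
        auto it = bankOf_.try_emplace(block, farBank_).first;
        int served = it->second;
        if (it->second > 0) --it->second;  // promotion toward bank 0
        return served;
    }
};

int main() {
    MigratingNUCA cache(8);
    for (int i = 0; i < 4; ++i)
        std::cout << "hit in bank " << cache.access(0xBEEF) << '\n';  // 7,6,5,4
}
```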
The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches
2010
Cited by 2 (0 self)
Graphics Processing Units (GPUs) have recently emerged as a new platform for high performance, general-purpose computing. Because current GPUs employ deep multithreading to hide latency, they only have small, per-core caches to capture reuse and eliminate unnecessary off-chip accesses. This paper ... caches imprecisely, because it is only a performance hint. This simplifies the implementation and is so effective at capturing inter-core reuse that the L2 can be eliminated entirely. The sharing tracker is motivated by but not specific to the GPU and could be used in other manycore organizations.
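A sketch of the tracker under assumed structure: a table maps a line to one core that probably caches it. Because the entry is only a hint, it may go stale without harming correctness; a wrong hint merely costs a wasted probe before the off-chip fetch.

```cpp
// Sharing-tracker-style directory (assumed structure): on a per-core miss,
// consult a hint table for a peer cache that may hold the line. Unlike a
// real coherence directory, entries may be imprecise or stale.
#include <cstdint>
#include <iostream>
#include <unordered_map>

class SharingTracker {
    std::unordered_map<uint64_t, int> lastFiller_;  // line -> core id (a hint)
public:
    void onFill(uint64_t line, int core) { lastFiller_[line] = core; }
    // Returns a peer core to probe before going off-chip, or -1 for none.
    int hint(uint64_t line) const {
        auto it = lastFiller_.find(line);
        return it == lastFiller_.end() ? -1 : it->second;
    }
};

int main() {
    SharingTracker tracker;
    tracker.onFill(0x2A0, /*core=*/3);
    int peer = tracker.hint(0x2A0);
    if (peer >= 0) std::cout << "probe core " << peer << " for the line\n";
    else           std::cout << "no hint; fetch from off-chip memory\n";
}
```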