CiteSeerX

Results 1 - 10 of 1,580

Caching in the Sprite Network File System

by Michael N. Nelson, Brent B. Welch, John K. Ousterhout - ACM Transactions on Computer Systems, 1988
"... The Sprite network operating system uses large main-memory disk block caches to achieve high performance in its file system. It provides non-write-through file caching on both client and server machines. A simple cache consistency mechanism permits files to be shared by multiple clients without dang ..."
Cited by 296 (12 self)

Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors

by Todd Mowry, Anoop Gupta - Journal of Parallel and Distributed Computing, 1991
"... The large latency of memory accesses is a major obstacle in obtaining high processor utilization in large scale shared-memory multiprocessors. Although the provision of coherent caches in many recent machines has alleviated the problem somewhat, cache misses still occur frequently enough that they s ..."
Cited by 302 (18 self)

Lock-free Dynamically Resizable Arrays

by Damian Dechev, Peter Pirkelbauer, Bjarne Stroustrup
"... We present a first lock-free design and practical implementation of a dynamically resizable array (vector). The most extensively used container in the C++ Standard Library is vector, offering a combination of dynamic memory management and efficient random access. Our approach is based on a ..."
Cited by 12 (8 self)
by a factor of 10. The implemented approach is also applicable across a variety of symmetric multiprocessing (SMP) platforms. The performance evaluation on an 8-way AMD system with non-shared L2 cache demonstrated timing results comparable to the best available lock-based techniques for such systems

The filter cache: An energy efficient memory structure

by Johnson Kin, Munish Gupta, William H. Mangione-Smith - In Proceedings of the 1997 International Symposium on Microarchitecture, 1997
"... Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often consume a significant amount of power. In many ..."
Cited by 222 (4 self)
applications, such as portable devices, low power is more important than performance. We propose to trade performance for power consumption by filtering cache references through an unusually small L1 cache. An L2 cache, which is similar in size and structure to a typical L1 cache, is positioned behind

Cache Coherence Support for NonShared Bus Architecture on Heterogeneous MPSoCs

by Taeweon Suh - In Proceedings of the 42nd Design Automation Conference, 2005
"... We propose two novel integration techniques — bypass and bookkeeping — in the memory controller to address the cache coherence compatibility issue of a non-shared bus heterogeneous MPSoC. The bypass approach is an inexpensive and efficient solution for computation-bound applications while the bookke ..."
Cited by 3 (1 self)

Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

by Sangyeun Cho, Lei Jin - IEEE/ACM International Symposium on Microarchitecture, 2006
"... This paper presents and studies a distributed L2 cache management approach through OS-level page allocation for future many-core processors. L2 cache management is a crucial multicore processor design aspect to overcome non-uniform cache access latency for good program performance and to reduce on-c ..."
Cited by 134 (11 self)

Disco: Running commodity operating systems on scalable multiprocessors

by Edouard Bugnion, Scott Devine, Mendel Rosenblum - ACM Transactions on Computer Systems, 1997
"... In this paper we examine the problem of extending modern operating systems to run efficiently on large-scale shared memory multiprocessors without a large implementation effort. Our approach brings back an idea popular in the 1970s, virtual machine monitors. We use virtual machines to run multiple c ..."
Cited by 253 (10 self)
where the virtual machines transparently share major data structures such as the program code and the file system buffer cache. We use the distributed system support of modern operating systems to export a partial single system image to the users. The overall solution achieves most of the benefits

Exploiting Coarse Grain Non-Shared Regions in Snoopy Coherent Multiprocessors

by Andreas Moshovos, 2003
"... It has been shown that many requests miss in all remote nodes in shared memory multiprocessors. We are motivated by the observation that this behavior extends to much coarser grain areas of memory. We define a region to be a continuous, aligned memory area whose size is a power of two and observe th ..."
Cited by 3 (1 self)
non-shared regions. A node with a RegionScout filter can determine in advance that a request will miss in all remote nodes. RegionScout filters are implemented as a layered extension over existing snoop-based coherence systems. They require no changes to existing coherence protocols or caches

Managing Wire Delay in Large Chip-Multiprocessor Caches

by Bradford M. Beckmann, David A. Wood - IEEE/ACM International Symposium on Microarchitecture, 2004
"... In response to increasing (relative) wire delay, architects have proposed various technologies to manage the impact of slow wires on large uniprocessor L2 caches. Block migration (e.g., D-NUCA and NuRapid) reduces average hit latency by migrating frequently used blocks towards the lower-latency bank ..."
Cited by 157 (4 self)
-latency banks. Transmission Line Caches (TLC) use on-chip transmission lines to provide low latency to all banks. Traditional stride-based hardware prefetching strives to tolerate, rather than reduce, latency. Chip multiprocessors (CMPs) present additional challenges. First, CMPs often share the on-chip L2

The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches

by David Tarjan, Kevin Skadron, 2010
"... Graphics Processing Units (GPUs) have recently emerged as a new platform for high performance, general-purpose computing. Because current GPUs employ deep multithreading to hide latency, they only have small, per-core caches to capture reuse and eliminate unnecessary off-chip accesses. This paper s ..."
Cited by 2 (0 self)
caches imprecisely, because it is only a performance hint. This simplifies the implementation and is so effective at capturing inter-core reuse that the L2 can be eliminated entirely. The sharing tracker is motivated by but not specific to the GPU and could be used in other manycore organizations.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University