Informed Prefetching and Caching
- In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles
, 1995
Cited by 321 (8 self)
The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I/O-intensive applications. In particular, we show how to use application-disclosed access patterns (hints) to expose and exploit I/O parallelism and to dynamically allocate file buffers among three competing demands: prefetching hinted blocks, caching hinted blocks for reuse, and caching recently used data for unhinted accesses. Our approach estimates the impact of alternative buffer allocations on application execution time and applies a cost-benefit analysis to allocate buffers where they will have the greatest impact. We implemented informed prefetching and caching in DEC’s OSF/1 operating system and measured its performance on a 150 MHz Alpha equipped with 15 disks running a range of applications including text search, 3D scientific visualization, relational database queries, speech recognition, and computational chemistry. Informed prefetching reduces the execution time of the first four of these applications by 20% to 87%. Informed caching reduces the execution time of the fifth application by up to 30%.
UNIX Disk Access Patterns
, 1993
Cited by 242 (20 self)
Disk access patterns are becoming ever more important to understand as the gap between processor and disk performance increases. The study presented here is a detailed characterization of every low-level disk access generated by three quite different systems over a two-month period. The contributions of this paper are the detailed information we provide about the disk accesses on these systems (many of our results are significantly different from those reported in the literature, which provide summary data only for file-level access on small-memory systems); and the analysis of a set of optimizations that could be applied at the disk level to improve performance. Our traces show that the majority of all operations are writes; disk accesses are rarely sequential; 25-50% of all accesses are asynchronous; only 13-41% of accesses are to user data (the rest result from swapping, metadata, and program execution); and I/O activity is very bursty: mean request queue lengths seen by an incoming request range from 1.7 to 8.9 (1.2-1.9 for reads, 2.0-14.8 for writes), while we saw 95th percentile queue lengths as large as 89 entries, and maxima of over 1000. Using a simulator to analyze the effect of write caching at the disk level, we found that using a small non-volatile cache at each disk allowed writes to be serviced considerably faster than with a regular disk. In particular, short bursts of writes go much faster, and such bursts are common: writes rarely come singly. Adding even 8KB of non-volatile memory per disk could reduce disk traffic by 10-18%, and 90% of metadata write traffic can be absorbed with as little as 0.2MB per disk of non-volatile RAM. Even 128KB of NVRAM cache in each disk can improve write performance by as much as a factor of three. FCFS scheduling...
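One way a small non-volatile write cache can reduce disk traffic is by absorbing repeated writes to the same block before they reach the disk. The following is a minimal sketch of that effect, not the simulator used in the paper: the function name `disk_writes`, the LRU flush policy, and the toy trace are all assumptions made for illustration.

```python
from collections import OrderedDict


def disk_writes(trace, nvram_blocks):
    """Count writes that reach the disk for a sequence of block numbers.

    Hypothetical model: NVRAM holds up to `nvram_blocks` dirty blocks,
    flushes the least recently written block when full, and flushes
    everything that remains at the end of the trace.
    """
    if nvram_blocks == 0:
        return len(trace)  # no cache: every write goes to disk

    dirty = OrderedDict()  # block number -> True, oldest first
    flushed = 0
    for block in trace:
        if block in dirty:
            # Overwrite of a block still in NVRAM: absorbed, no disk traffic.
            dirty.move_to_end(block)
            continue
        dirty[block] = True
        if len(dirty) > nvram_blocks:
            dirty.popitem(last=False)  # flush the oldest dirty block
            flushed += 1
    return flushed + len(dirty)  # remaining dirty blocks are flushed last


# A bursty toy trace with repeated writes to blocks 7 and 3.
trace = [7, 7, 7, 3, 7, 3, 9, 3]
print(disk_writes(trace, 0), disk_writes(trace, 2))
```

On this toy trace a two-block cache cuts eight disk writes to three. The numbers illustrate the mechanism only and say nothing about the percentages reported in the abstract.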
Input/Output Behavior of Supercomputing Applications
- ACM/IEEE CONFERENCE ON SUPERCOMPUTING
, 1991
Cited by 96 (9 self)
This paper describes the collection and analysis of supercomputer I/O traces and their use in a collection of buffering and caching simulations. This serves two purposes. First, it gives a model of how individual applications running on supercomputers request file system I/O, allowing system designers to optimize I/O hardware and file system algorithms to that model. Second, the buffering simulations show what resources are needed to maximize the CPU utilization of a supercomputer given a very bursty I/O request rate. By using read-ahead and write-behind in a large solid-state disk, one or two applications were sufficient to fully utilize a Cray Y-MP CPU.
My cache or yours? Making storage more exclusive
- In Proceedings of the 2002 USENIX Annual Technical Conference
, 2002
Cited by 88 (0 self)
Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it were only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive. We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful (sometimes substantial) speedups. During our investigations, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.
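The difference between inclusive and exclusive two-level caching can be sketched with a small simulation. This is an illustrative model under stated assumptions, not the paper's simulator: both levels use plain LRU, there is a single client, and the names `LRU` and `simulate` are hypothetical. In the exclusive mode, a block read from the array leaves the array cache, and a block evicted by the client is sent down to the array, which stands in for the DEMOTE operation.

```python
from collections import OrderedDict


class LRU:
    """A fixed-capacity LRU set of block numbers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def hit(self, block):
        """Return True and refresh recency if the block is cached."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def remove(self, block):
        self.blocks.pop(block, None)

    def insert(self, block):
        """Insert as most recent; return the evicted block, or None."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            victim, _ = self.blocks.popitem(last=False)
            return victim
        return None


def simulate(trace, client_size, array_size, exclusive):
    """Return the number of disk reads for a read-only block trace."""
    client, array = LRU(client_size), LRU(array_size)
    disk_reads = 0
    for block in trace:
        if client.hit(block):
            continue
        if exclusive:
            if block in array.blocks:
                array.remove(block)  # array gives up its copy to the client
            else:
                disk_reads += 1  # read from disk, not kept in the array
            victim = client.insert(block)
            if victim is not None:
                array.insert(victim)  # DEMOTE the client's evicted block
        else:
            if not array.hit(block):
                disk_reads += 1
                array.insert(block)  # array keeps a duplicate copy
            client.insert(block)
    return disk_reads


# Cyclic scan over 6 blocks with two 4-block caches.
trace = list(range(6)) * 10
print(simulate(trace, 4, 4, exclusive=False))  # every access misses
print(simulate(trace, 4, 4, exclusive=True))   # only the first pass misses
```

With a working set of 6 blocks, neither 4-block cache alone can hold the scan, so the inclusive arrangement reads from disk on all 60 accesses. The exclusive arrangement behaves like one 8-block cache and reads from disk only 6 times. The sketch ignores the transfer cost of demotions, which the abstract notes as an overhead.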
Flash Memory File Caching for Mobile Computers
, 1994
Cited by 58 (5 self)
In this paper we examine the impact of using flash memory as a second-level file system buffer cache to reduce power consumption and file access latency on a mobile computer. We use trace-driven simulation to evaluate the impact of what we call a FlashCache. We relate the power consumption and access latency of the storage subsystem to the characteristics of the FlashCache: its size, the unit of erasure, and access costs. We find that a FlashCache can reduce the power consumption of the storage subsystem by 20-40% and improve overall response time by 30-70% when combined with an aggressive disk management policy. When combined with a more conservative policy, power is reduced by 40-70% while overall response time is improved by 20-60%. We also find that durability is not a problem; a 4 MB FlashCache will last 33 years. 1 Introduction The storage subsystem on a mobile computer, usually consisting of DRAM and magnetic disk, is an important consumer of battery power. Table 1 lists the ...
Practical Prefetching Techniques for Parallel File Systems
- In Proceedings of the First International Conference on Parallel and Distributed Information Systems
, 1991
Cited by 52 (2 self)
Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper we describe experiments with practical prefetching policies, and show that prefetching can be implemented efficiently even for the more complex parallel file access patterns. We also test the ability of these policies across a range of architectural parameters.
A Status Report on Research in Transparent Informed Prefetching
- ACM Operating Systems Review
, 1993
Cited by 47 (4 self)
This paper focuses on extending the power of caching and prefetching to reduce file read latencies by exploiting application-level hints about future I/O accesses. We argue that systems that disclose high-level knowledge can transfer optimization information across module boundaries in a manner consistent with sound software engineering principles. Such Transparent Informed Prefetching (TIP) systems provide a technique for converting the high throughput of new technologies such as disk arrays and log-structured file systems into low latency for applications. Our preliminary experiments show that even without a high-throughput I/O subsystem TIP yields reduced execution time of up to 30% for applications obtaining data from a remote file server and up to 13% for applications obtaining data from a single local disk. These experiments indicate that greater performance benefits will be available when TIP is integrated with low-level resource management policies and highly parallel I/O subsys...
Practical prefetching techniques for multiprocessor file systems
- Journal of Distributed and Parallel Databases
, 1993
Cited by 45 (6 self)
Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper we describe experiments with practical prefetching policies that base decisions only on on-line reference history, and that can be implemented efficiently. We also test the ability of those policies across a range of architectural parameters. Keywords: multiprocessor file systems, parallel I/O, file caching, prefetching
Memory Servers for Multicomputers
, 1993
Cited by 42 (1 self)
In this paper, we investigate a virtual memory management technique for multicomputers called memory servers. The memory server model extends the memory hierarchy of multicomputers by introducing a remote memory server layer. Memory servers are multicomputer nodes whose memory is used for fast backing storage and logically lie between the local physical memory and disks. The paper presents the model, describes how the model supports sequential programs, message-passing programs and shared virtual memory systems, discusses several design issues, and shows preliminary results of a prototype implementation on an Intel iPSC/860. Keywords: distributed memory, memory server, multicomputer, memory management unit, virtual memory. Introduction Multicomputers can provide very high processor performance and tremendous total storage capacity at relatively low costs. Scalable massively parallel multicomputers are attractive architectures because they take advantage of the performance curves of...
Performance Modeling for Realistic Storage Devices
, 1997
Cited by 36 (8 self)
Managing large amounts of storage is difficult and becoming more so as both the complexity and number of storage devices are increasing. One approach to this problem is a self-managing storage system. Since a self-managing storage system is a real-time system, it requires a model that quickly approximates the behavior of the storage device in a workload-dependent fashion. We develop such a model.
Our approach to modeling storage devices is to model the individual physical components of the device, such as queues, caches, and disk mechanisms, and then compose the component models. Each component model determines its behavior from the specification of the entering workload and the lower-level device behavior. To support the lower level component model in determining its behavior, each component model creates a modified workload specification to support the manner that the physical component would modify the entering workload. Modifying the workload specification allows us, for example, to capture the altered spatial locality that occurs when queues reorder their requests.
Our model predicts the device behavior in terms of response time within a relative error ranging from 2% to 30% for interesting subsets of the domain of devices and workloads. To demonstrate this, the model has been validated with synthetic traces of parallel scientific file system workloads and video-on-demand applications and traces of transaction processing applications.
Our contributions to the area of performance modeling for storage devices include the following:
- An infrastructure for developing a composite model. The infrastructure
supports the development of more complicated devices and workloads
than we have validated.
- Methods to approximate the mean seek time and rotational latency of
a disk mechanism using measures of workload spatial locality.
- Methods to approximate the miss probability and the full- and partial-hit
probabilities in an I/O system's data caches using measures of workload
spatial locality.
- Methods to approximate the queue delay for non-FCFS scheduling algorithms
using a description of the workload arrival process.
These methods can be composed to provide analytic estimation procedures for the behavior of a subset of current storage devices.

