Results 1 - 10 of 88
Design Tradeoffs for SSD Performance
"... Solid-state disks (SSDs) have the potential to revolutionize the storage system landscape. However, there is little published work about their internal organization or the design choices that SSD manufacturers face in pursuit of optimal performance. This paper presents a taxonomy of such design choi ..."
Abstract
-
Cited by 187 (10 self)
- Add to MetaCart
(Show Context)
Solid-state disks (SSDs) have the potential to revolutionize the storage system landscape. However, there is little published work about their internal organization or the design choices that SSD manufacturers face in pursuit of optimal performance. This paper presents a taxonomy of such design choices and analyzes the likely performance of various configurations using a trace-driven simulator and workload traces extracted from real systems. We find that SSD performance and lifetime are highly workload-sensitive, and that complex systems problems that normally appear higher in the storage stack, or even in distributed systems, are relevant to device firmware.
DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings
- Penn State University, 2008
"... Recent technological advances in the development of flashmemory based devices have consolidated their leadership position as the preferred storage media in the embedded systems market and opened new vistas for deployment in enterprise-scale storage systems. Unlike hard disks, flash devices are free ..."
Abstract
-
Cited by 110 (6 self)
- Add to MetaCart
(Show Context)
Recent technological advances in the development of flash-memory based devices have consolidated their leadership position as the preferred storage media in the embedded systems market and opened new vistas for deployment in enterprise-scale storage systems. Unlike hard disks, flash devices are free from any mechanical moving parts, have no seek or rotational delays, and consume less power. However, the internal idiosyncrasies of flash technology make its performance highly dependent on workload characteristics. The poor performance of random writes has been a cause of major concern, which needs to be addressed to better utilize the potential of flash in enterprise-scale environments. We examine one of the important causes of this poor performance: the design of the Flash Translation Layer (FTL).
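A minimal sketch of the demand-based caching idea the DFTL abstract describes: keep only the recently used page-level mappings in a small LRU-managed cache, and fetch the rest from translation pages on flash on a miss. The class, field names, and sizes below are illustrative assumptions, not DFTL's actual data structures.

```python
from collections import OrderedDict

class DemandFTL:
    """Toy page-level FTL: a small LRU cache (the 'cached mapping
    table') in front of a full logical->physical map kept on flash."""

    def __init__(self, cache_slots, flash_map):
        self.cache = OrderedDict()          # lpn -> ppn, in LRU order
        self.cache_slots = cache_slots
        self.flash_map = flash_map          # stands in for translation pages on flash
        self.flash_reads = 0                # extra reads caused by cache misses

    def translate(self, lpn):
        if lpn in self.cache:               # hit: refresh LRU position
            self.cache.move_to_end(lpn)
            return self.cache[lpn]
        self.flash_reads += 1               # miss: fetch mapping from flash
        ppn = self.flash_map[lpn]
        self.cache[lpn] = ppn
        if len(self.cache) > self.cache_slots:
            self.cache.popitem(last=False)  # evict least recently used
        return ppn

ftl = DemandFTL(cache_slots=2, flash_map={i: 100 + i for i in range(8)})
for lpn in [0, 1, 0, 0, 2, 1]:              # temporal locality keeps misses low
    ftl.translate(lpn)
```

With a workload that revisits a few logical pages, most translations hit the small cache, which is why the full page-level map can stay on flash.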
To Infinity and Beyond: Time-Warped Network Emulation
- In Proceedings of the 3rd USENIX Symposium on Networked Systems Design and Implementation, 2006
"... The goal of this work is to subject unmodified applications running on commodity operating systems and stock hardware to network speeds orders of magnitude faster than available at any given point in time. This paper describes our approach to time dilation, a technique to uniformly and accurately sl ..."
Abstract
-
Cited by 46 (5 self)
- Add to MetaCart
(Show Context)
The goal of this work is to subject unmodified applications running on commodity operating systems and stock hardware to network speeds orders of magnitude faster than available at any given point in time. This paper describes our approach to time dilation, a technique to uniformly and accurately slow the passage of time from the perspective of an operating system by a specified factor. As a side effect, physical devices, including the network, appear relatively faster to both applications and operating systems. Both qualitative and statistical evaluations indicate our prototype implementation is accurate across several orders of magnitude. We demonstrate time dilation's utility by conducting high-bandwidth head-to-head TCP stack comparisons and application evaluation.
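The core arithmetic of time dilation can be sketched in a few lines: divide elapsed real time by a time dilation factor (TDF) so the "guest" perceives time passing TDF times slower, which makes real devices appear TDF times faster. The class and fake clock below are illustrative assumptions, not the paper's implementation.

```python
class DilatedClock:
    """Toy dilated clock: reports real elapsed time scaled down by TDF."""

    def __init__(self, tdf, real_clock):
        self.tdf = tdf
        self.real_clock = real_clock        # callable returning real seconds
        self.epoch = real_clock()

    def now(self):
        # perceived time advances 1/tdf as fast as real time
        return (self.real_clock() - self.epoch) / self.tdf

# A link that moves 10 MB in 1 real second looks like 100 MB/s to a
# guest dilated by TDF = 10: the same 10 MB in 0.1 perceived seconds.
t = iter([0.0, 1.0])                        # fake real clock: start, then +1 s
clock = DilatedClock(tdf=10, real_clock=lambda: next(t))
perceived = clock.now()                     # 1.0 real second -> 0.1 perceived
throughput = 10 / perceived                 # MB per perceived second
```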
Storage Device Performance Prediction with CART Models
, 2004
Cited by 45 (5 self)
Storage device performance prediction is a key element of self-managed storage systems and application planning tasks, such as data assignment. This work explores the application of a machine learning tool, CART models, to storage device modeling. Our approach predicts a device's performance as a function of input workloads, requiring no knowledge of the device internals. We propose two uses of CART models: one that predicts per-request response times (and then derives aggregate values) and one that predicts aggregate values directly from workload characteristics. After being trained on our experimental platforms, both provide accurate black-box models across a range of test traces from real environments. Experiments show that these models predict the average and 90th percentile response time with a relative error as low as 16% when the training workloads are similar to the testing workloads, and interpolate well across different workloads.
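To make the black-box idea concrete, here is a toy regression-tree flavor of it: predict a response time from a workload feature by choosing the binary split that minimizes squared error. Real CART recurses over many features; this stump does a single split, and all the data is made up for illustration.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total SSE."""
    best = None
    for t in sorted(set(xs))[1:]:                  # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

# feature: request size (KB); target: measured response time (ms)
sizes = [4, 4, 8, 8, 64, 64, 128, 128]
times = [1.0, 1.2, 1.1, 0.9, 5.0, 5.2, 5.1, 4.9]
thr, small_ms, large_ms = best_split(sizes, times)

def predict(size):
    """Leaf prediction: mean response time of the matching partition."""
    return small_ms if size < thr else large_ms
```

The learned tree never looks inside the device; it only maps observed workload characteristics to observed response times, which is the sense in which the model is "black box".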
DULO: An effective buffer cache management scheme to exploit both temporal and spatial localities
- In USENIX Conference on File and Storage Technologies (FAST), 2005
"... Sequentiality of requested blocks on disks, or their spatial locality, is critical to the performance of disks, where the throughput of accesses to sequentially placed disk blocks can be an order of magnitude higher than that of accesses to randomly placed blocks. Unfortunately, spatial locality of ..."
Abstract
-
Cited by 43 (12 self)
- Add to MetaCart
(Show Context)
Sequentiality of requested blocks on disks, or their spatial locality, is critical to the performance of disks, where the throughput of accesses to sequentially placed disk blocks can be an order of magnitude higher than that of accesses to randomly placed blocks. Unfortunately, spatial locality of cached blocks is largely ignored and only temporal locality is considered in system buffer cache management. Thus, disk performance for workloads without dominant sequential accesses can be seriously degraded. To address this problem, we propose a scheme called DULO (DUal LOcality), which exploits both temporal and spatial locality in buffer cache management. Leveraging the filtering effect of the buffer cache, DULO can influence the I/O request stream by making the requests passed to disk more sequential, significantly increasing the effectiveness of I/O scheduling and prefetching for disk performance improvements. DULO has been extensively evaluated by both trace-driven simulations and a prototype implementation in Linux 2.6.11. In the simulations and system measurements, various application workloads have been tested, including Web server, TPC benchmarks, and scientific programs. Our experiments show that DULO can significantly increase system throughput and reduce program execution times.
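A toy sketch of the dual-locality intuition (illustrative only, not DULO's actual algorithm): group cached blocks into sequences of consecutive disk block numbers, then prefer evicting blocks that belong to long sequences, since sequential blocks are cheap to re-fetch from disk, while random blocks (short sequences) are kept in the cache.

```python
def sequences(blocks):
    """Partition block numbers into runs of consecutive blocks."""
    bs = sorted(blocks)
    runs, run = [], [bs[0]]
    for b in bs[1:]:
        if b == run[-1] + 1:
            run.append(b)               # extend the current sequence
        else:
            runs.append(run)            # sequence broken; start a new one
            run = [b]
    runs.append(run)
    return runs

def eviction_order(blocks):
    """Blocks from the longest sequences come out first; random
    (short-sequence) blocks are retained the longest."""
    runs = sorted(sequences(blocks), key=len, reverse=True)
    return [b for run in runs for b in run]

cached = [7, 100, 101, 102, 103, 42, 9]
order = eviction_order(cached)          # the 100-103 run goes before 7, 9, 42
```

Evicting the long run first is what makes the request stream reaching the disk more sequential: the re-fetch of 100-103 is one cheap sequential read, whereas re-fetching 7, 9, and 42 would cost three seeks.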
DieCast: Testing Distributed Systems with an Accurate Scale Model
- In Proc. of NSDI, 2008
"... Large-scale network services can consist of tens of thousands of machines running thousands of unique software configurations spread across hundreds of physical networks. Testing such services for complex performance problems and configuration errors remains a difficult problem. Existing testing tec ..."
Abstract
-
Cited by 43 (4 self)
- Add to MetaCart
(Show Context)
Large-scale network services can consist of tens of thousands of machines running thousands of unique software configurations spread across hundreds of physical networks. Testing such services for complex performance problems and configuration errors remains a difficult problem. Existing testing techniques, such as simulation or running smaller instances of a service, have limitations in predicting overall service behavior. Although technically and economically infeasible at this time, testing should ideally be performed at the same scale and with the same configuration as the deployed service. We present DieCast, an approach to scaling network services in which we multiplex all of the nodes in a given service configuration as virtual machines (VMs) spread across a much smaller number of physical machines in a test harness. CPU, network, and disk are then accurately scaled to provide the illusion that each VM matches a machine from the original service in terms of both available computing resources and communication behavior to remote service nodes. We present the architecture and evaluation of a system to support such experimentation and discuss its limitations. We show that for a variety of services, including a commercial, high-performance, cluster-based file system, and resource utilization levels, DieCast matches the behavior of the original service while using a fraction of the physical resources.
Mambo: a full system simulator for the PowerPC architecture
- ACM SIGMETRICS Performance Evaluation Review, 2004
"... Mambo is a full-system simulator for modeling PowerPCbased systems. It provides building blocks for creating simulators that range from purely functional to timing-accurate. Functional versions support fast emulation of individual PowerPC instructions and the devices necessary for executing operatin ..."
Abstract
-
Cited by 37 (4 self)
- Add to MetaCart
(Show Context)
Mambo is a full-system simulator for modeling PowerPC-based systems. It provides building blocks for creating simulators that range from purely functional to timing-accurate. Functional versions support fast emulation of individual PowerPC instructions and the devices necessary for executing operating systems. Timing-accurate versions add the ability to account for device timing delays, and support the modeling of the PowerPC processor microarchitecture. We describe our experience in implementing the simulator and its uses within IBM to model future systems, support early software development, and design new system software.
Transactional flash
- In Proc. Symposium on Operating Systems Design and Implementation (OSDI), 2008
"... Transactional flash (TxFlash) is a novel solid-state drive (SSD) that uses flash memory and exports a transactional interface (WriteAtomic) to the higher-level software. The copy-on-write nature of the flash translation layer and the fast random access makes flash memory the right medium to support ..."
Abstract
-
Cited by 32 (0 self)
- Add to MetaCart
(Show Context)
Transactional flash (TxFlash) is a novel solid-state drive (SSD) that uses flash memory and exports a transactional interface (WriteAtomic) to the higher-level software. The copy-on-write nature of the flash translation layer and the fast random access make flash memory the right medium to support such an interface. We further develop a novel commit protocol called cyclic commit for TxFlash; the protocol has been specified formally and model checked. Our evaluation, both on a simulator and an emulator on top of a real SSD, shows that TxFlash does not increase the flash firmware complexity significantly and provides transactional features with very small overheads (less than 1%), thereby making file systems easier to build. It further shows that the new cyclic commit protocol significantly outperforms traditional commit for small transactions (95% improvement in transaction throughput) and completely eliminates the space overhead due to commit records.
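A simplified sketch of the cyclic-commit intuition (not the paper's full protocol): each page written by a transaction carries a link to the next page of the same transaction, and the last page links back to the first, closing a cycle. After a crash, recovery treats a transaction as committed iff its cycle is complete, so no separate commit record is needed. The record keys and structures below are made-up assumptions.

```python
def is_committed(pages, start):
    """pages maps (lpn, version) -> (next_lpn, next_version).
    Follow the links from start; committed iff the cycle closes."""
    node = start
    seen = set()
    while True:
        if node in seen:
            return False                # loop that skips start: malformed
        seen.add(node)
        nxt = pages.get(node)
        if nxt is None:
            return False                # broken link: crash before completion
        if nxt == start:
            return True                 # cycle closed: transaction committed
        node = nxt

# A transaction wrote pages A, B, C at version 1, linked A -> B -> C -> A.
full = {("A", 1): ("B", 1), ("B", 1): ("C", 1), ("C", 1): ("A", 1)}
torn = {("A", 1): ("B", 1), ("B", 1): ("C", 1)}   # crash before C landed
```

Because the commit evidence lives in the per-page metadata that the copy-on-write FTL writes anyway, there is no separate commit record to write or to store, which matches the abstract's claim about eliminating that space overhead.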
Atropos: A disk array volume manager for orchestrated use of disks
- Proceedings of the USENIX Conference on File and Storage Technologies, 2004
"... Permission is granted for noncommercial reproduction of the work for educational or research purposes. ..."
Abstract
-
Cited by 28 (11 self)
- Add to MetaCart
(Show Context)
Permission is granted for noncommercial reproduction of the work for educational or research purposes.
EERAID: Energy efficient redundant and inexpensive disk arrays
- In Proc. 11th ACM SIGOPS European Workshop (SIGOPS EW'04), 2004
"... I/O subsystems, in which RAID as a building block, prove to consume a large portion of energy in both low-end and highend server environments. Most of previous research works have been presented on conserving energy in multi-disk systems either at a single disk drive level or at a storage system cac ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
(Show Context)
I/O subsystems, in which RAID serves as a building block, prove to consume a large portion of energy in both low-end and high-end server environments. Most previous research on conserving energy in multi-disk systems has targeted either the single disk drive level or the storage system cache level. This paper studies several new redundancy-based, power-aware, dynamic I/O request scheduling and cache management policies at the RAID controller level, by exploiting disk power states coupled with the redundant information of disk arrays in two popular RAID architectures, RAID 1 and RAID 5. For RAID 1, we develop a Windowed Round Robin (WRR) request scheduling policy. For RAID 5, we introduce a new N-chance Power-Aware cache replacement algorithm (NPA) for disk writes and a Power-Directed, Exchangeable (PDE) request scheduling policy for disk reads to save energy.
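A toy version of the windowed idea behind WRR for a two-disk RAID 1 mirror (illustrative; the paper's policy is more involved): instead of alternating reads per request, send a whole window of W consecutive reads to one disk before switching, so the idle mirror sees long gaps and gets a chance to drop to a low-power state.

```python
def wrr_assign(n_requests, window):
    """Return the mirror disk (0 or 1) serving each of n_requests reads:
    W consecutive requests go to one disk, then the window switches."""
    return [(i // window) % 2 for i in range(n_requests)]

plan = wrr_assign(n_requests=8, window=4)
# Per-request round robin would interleave [0, 1, 0, 1, ...], so neither
# disk ever idles long enough to justify a power-state transition.
```

This is exactly the redundancy-based leverage the abstract describes: because every block exists on both mirrors, the scheduler is free to concentrate load on one disk purely for energy reasons without losing any data availability.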