Results 1 - 10 of 27
FlashTier: a Lightweight, Consistent and Durable Storage Cache
"... The availability of high-speed solid-state storage has introduced a new tier into the storage hierarchy. Low-latency and high-IOPS solid-state drives (SSDs) cache data in front of high-capacity disks. However, most existing SSDs are designed to be a drop-in disk replacement, and hence are mismatched ..."
Abstract
-
Cited by 24 (3 self)
The availability of high-speed solid-state storage has introduced a new tier into the storage hierarchy. Low-latency and high-IOPS solid-state drives (SSDs) cache data in front of high-capacity disks. However, most existing SSDs are designed to be a drop-in disk replacement, and hence are mismatched for use as a cache. This paper describes FlashTier, a system architecture built upon solid-state cache (SSC), a flash device with an interface designed for caching. Management software at the operating system block layer directs caching. The FlashTier design addresses three limitations of using traditional SSDs for caching. First, FlashTier provides a unified logical address space to reduce the cost of cache block management within both the OS and the SSD. Second, FlashTier provides cache consistency guarantees allowing the cached data to be used following a crash. Finally, FlashTier leverages cache behavior to silently evict data blocks during garbage collection to improve performance of the SSC. We have implemented an SSC simulator and a cache manager in Linux. In trace-based experiments, we show that FlashTier reduces address translation space by 60% and silent eviction improves performance by up to 167%. Furthermore, FlashTier can recover from the crash of a 100 GB cache in only 2.4 seconds.
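To make the caching model in this abstract concrete, here is a minimal sketch (not the paper's code; the SSC operation names are assumptions) of a block-layer cache manager that uses disk block numbers directly as SSC logical addresses, so the OS keeps no separate mapping table, and that treats a "not present" reply as an ordinary miss, which is what makes silent eviction safe:

    # Hedged sketch, not FlashTier's actual implementation. Interface names
    # (read, write_clean, NotPresent) are assumptions for illustration.

    class NotPresent(Exception):
        """Raised by the SSC when a block was silently evicted or never cached."""

    class CacheManager:
        def __init__(self, ssc, disk):
            self.ssc = ssc      # cache device addressed by disk LBA (unified address space)
            self.disk = disk

        def read(self, lba):
            try:
                return self.ssc.read(lba)          # hit: SSC still holds the block
            except NotPresent:
                data = self.disk.read(lba)         # miss: fetch from disk...
                self.ssc.write_clean(lba, data)    # ...cache it as clean, eligible
                return data                        # for silent eviction later

        def write(self, lba, data):
            self.disk.write(lba, data)             # write-through keeps the disk durable
            self.ssc.write_clean(lba, data)        # cached copy stays clean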
HybridStore: A Cost-Efficient, High-Performance Storage System Combining SSDs and HDDs
In MASCOTS’11, 2011
"... Unlike the use of DRAM for caching or buffering, certain idiosyncrasies of NAND Flash-based solid-state drives (SSDs) make their integration into existing systems non-trivial. Flash memory suffers from limits on its reliability, is an order of magnitude more expensive than the magnetic hard disk dr ..."
Abstract
-
Cited by 11 (1 self)
Unlike the use of DRAM for caching or buffering, certain idiosyncrasies of NAND flash-based solid-state drives (SSDs) make their integration into existing systems non-trivial. Flash memory suffers from limited reliability, is an order of magnitude more expensive than magnetic hard disk drives (HDDs), and can sometimes be as slow as an HDD (due to excessive garbage collection (GC) induced by a high intensity of random writes). Given these trade-offs between HDDs and SSDs in terms of cost, performance, and lifetime, the current consensus among several storage experts is to view SSDs not as a replacement for the HDD but rather as a complementary device within the high-performance storage hierarchy. We design and evaluate such a hybrid system, called HybridStore, to provide: (a) HybridPlan, an improved capacity-planning technique that helps administrators operate within cost budgets, and (b) HybridDyn, improved performance/lifetime guarantees during episodes of deviation from expected workloads, achieved through two novel mechanisms: write regulation and fragmentation busting. As an illustrative example of HybridStore's efficacy, HybridPlan finds the most cost-effective storage configuration for a large-scale Microsoft Research workload and suggests one MLC SSD with ten 7.2K RPM HDDs instead of fourteen 7.2K RPM HDDs alone. HybridDyn reduces the average response time for an enterprise-scale, random-write-dominant workload by about 71% compared to an HDD-based system.
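A minimal sketch of the write-regulation idea mentioned above, assuming a simple rate threshold over a sliding window (both illustrative choices, not taken from the paper):

    # Hedged sketch: when the recent write rate aimed at the SSD exceeds a
    # threshold (the regime where GC slows the SSD and wears it out), excess
    # writes spill to the HDD instead. Threshold and window are assumptions.

    import collections
    import time

    class WriteRegulator:
        def __init__(self, ssd, hdd, max_ssd_writes_per_s=2000, window_s=1.0):
            self.ssd, self.hdd = ssd, hdd
            self.max_rate = max_ssd_writes_per_s
            self.window_s = window_s
            self.recent = collections.deque()          # timestamps of recent SSD writes

        def _ssd_write_rate(self, now):
            while self.recent and now - self.recent[0] > self.window_s:
                self.recent.popleft()
            return len(self.recent) / self.window_s

        def write(self, lba, data):
            now = time.monotonic()
            if self._ssd_write_rate(now) < self.max_rate:
                self.recent.append(now)
                self.ssd.write(lba, data)              # normal path
            else:
                self.hdd.write(lba, data)              # regulate: divert to HDD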
Write Policies for Host-side Flash Caches
"... Host-side flash-based caching offers a promising new direction for optimizing access to networked storage. Current work has argued for using host-side flash primarily as a read cache and employing a write-through policy which provides the strictest consistency and durability guarantees. However, wri ..."
Abstract
-
Cited by 8 (4 self)
Host-side flash-based caching offers a promising new direction for optimizing access to networked storage. Current work has argued for using host-side flash primarily as a read cache and employing a write-through policy, which provides the strictest consistency and durability guarantees. However, write-through requires synchronous updates over the network for every write. For write-mostly or write-intensive workloads, it significantly under-utilizes the high-performance flash cache layer. The write-back policy, on the other hand, better utilizes the cache for workloads with significant write I/O requirements. However, conventional write-back performs out-of-order eviction of data and unacceptably sacrifices data consistency at the network storage. We develop and evaluate two consistent write-back caching policies, ordered and journaled, that are designed to perform increasingly better than write-through. These policies enable new trade-off points across the performance, data consistency, and data staleness dimensions. Using benchmark workloads such as PostMark, TPC-C, Filebench, and YCSB, we evaluate the new write policies alongside conventional write-through and write-back. We find that ordered write-back performs better than write-through. Additionally, we find that journaled write-back can trade off staleness for performance, approaching, and in some cases exceeding, conventional write-back performance. Finally, a variant of journaled write-back that utilizes consistency hints from the application can provide straightforward application-level storage consistency, a stricter form of consistency than the transactional consistency provided by write-through.
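As a rough illustration of the ordered write-back policy described above (my reading of the abstract, not the authors' implementation), a cache could acknowledge writes locally and evict dirty blocks to the networked storage strictly in issue order, so the remote image always corresponds to a consistent prefix of the write history:

    # Hedged sketch of ordered write-back; class and method names are assumptions.

    import collections

    class OrderedWriteBack:
        def __init__(self, flash_cache, remote):
            self.cache = flash_cache                 # fast local flash device
            self.remote = remote                     # networked storage backend
            self.dirty_fifo = collections.deque()    # (lba, data) in issue order

        def write(self, lba, data):
            self.cache.write(lba, data)              # acknowledge after the local write only
            self.dirty_fifo.append((lba, data))      # remember issue order for eviction

        def flush_some(self, n):
            # Background eviction: drain the oldest writes first, never reorder.
            for _ in range(min(n, len(self.dirty_fifo))):
                lba, data = self.dirty_fifo.popleft()
                self.remote.write(lba, data)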
HyCache: a User-Level Caching Middleware for Distributed File Systems
"... Abstract—One of the bottlenecks of distributed file systems deals with mechanical hard drives (HDD). Although solid-state drives (SSD) have been around since the 1990’s, HDDs are still dominant due to large capacity and relatively low cost. Hybrid hard drives with a small built-in SSD cache does not ..."
Abstract
-
Cited by 7 (2 self)
One of the bottlenecks of distributed file systems is mechanical hard drives (HDDs). Although solid-state drives (SSDs) have been around since the 1990s, HDDs are still dominant due to their large capacity and relatively low cost. Hybrid hard drives with a small built-in SSD cache do not meet the needs of a large variety of workloads. This paper proposes a middleware that manages the underlying heterogeneous storage devices in order to allow distributed file systems to leverage SSD performance while retaining the capacity of HDDs. We design and implement a user-level file system, HyCache, that can offer SSD-like performance at a cost similar to an HDD. We show how HyCache can be used to improve performance in distributed file systems, such as the Hadoop HDFS. Experiments show that HyCache achieves up to 7X higher throughput and 76X higher IOPS than the Linux Ext4 file system, and can accelerate HDFS by 28% at 32-node scale compared to vanilla HDFS.

Index Terms—distributed file systems, user-level file systems, hybrid file systems, heterogeneous storage, SSD

An example of a hybrid hard drive (HHD) is the Momentus XT [6], which encapsulates both a 4 GB SSD and a 500 GB HDD in a single physical device. The advantage of such an HHD is that it is a drop-in replacement for an HDD; however, its small fixed SSD cache (< 1% of capacity) limits its ability to accelerate a large variety of workloads. Furthermore, the small SSD cache typically has an inexpensive and relatively slow controller in order to keep costs low. Compounding these limitations, such HHDs often use the SSD cache only to accelerate read operations, missing a significant opportunity to accelerate writes as well. These drawbacks limit the ability to fully leverage low-cost HHD architectures for their potential higher performance.
Janus: Optimal Flash Provisioning for Cloud Storage Workloads
In Proceedings of the 2013 USENIX Annual Technical Conference (USENIX ATC), 2013
"... Abstract Janus is a system for partitioning the flash storage tier between workloads in a cloud-scale distributed file system with two tiers, flash storage and disk. The file system stores newly created files in the flash tier and moves them to the disk tier using either a First-In-First-Out (FIFO) ..."
Abstract
-
Cited by 5 (0 self)
Janus is a system for partitioning the flash storage tier between workloads in a cloud-scale distributed file system with two tiers, flash storage and disk. The file system stores newly created files in the flash tier and moves them to the disk tier using either a First-In-First-Out (FIFO) policy or a Least-Recently-Used (LRU) policy, subject to per-workload allocations. Janus constructs compact metrics of the cacheability of the different workloads, using sampled distributed traces because of the large scale of the system. From these metrics, we formulate and solve an optimization problem to determine the flash allocation to workloads that maximizes the total reads sent to the flash tier, subject to operator-set priorities and bounds on flash write rates. Using measurements from production workloads in multiple data centers that use these recommendations, as well as traces of other production workloads, we show that the resulting allocation improves the flash hit rate by 47-76% compared to a unified tier shared by all workloads. Based on these results and an analysis of several thousand production workloads, we conclude that flash storage is a cost-effective complement to disks in data centers.
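One way to read the allocation problem sketched in this abstract is as the following optimization; the notation is mine and not necessarily the paper's. Let x_i be the flash given to workload i, R_i(x_i) the read rate served from flash at that allocation (derived from the per-workload cacheability metrics), w_i(x_i) the induced flash write rate, and p_i an operator-set priority:

    \begin{aligned}
      \max_{x_1,\dots,x_n \ge 0} \quad & \sum_{i=1}^{n} p_i \, R_i(x_i) \\
      \text{subject to} \quad & \sum_{i=1}^{n} x_i \le C_{\mathrm{flash}}, \\
      & w_i(x_i) \le W_i \quad \text{for each workload } i,
    \end{aligned}

where C_flash is the total flash capacity and W_i the per-workload bound on flash write rate. If each R_i is concave (diminishing returns from extra flash), the capacity constraint can be handled greedily by granting flash in small increments to whichever workload currently gains the most weighted reads per byte, with the write-rate bounds acting as per-workload caps.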
Improving flash-based disk cache with lazy adaptive replacement
In Proc. of MSST, 2013
"... Abstract-The increasing popularity of flash memory has changed storage systems. Flash-based solid state drive(SSD) is now widely deployed as cache for magnetic hard disk drives(HDD) to speed up data intensive applications. However, existing cache algorithms focus exclusively on performance improvem ..."
Abstract
-
Cited by 4 (0 self)
The increasing popularity of flash memory has changed storage systems. Flash-based solid-state drives (SSDs) are now widely deployed as caches for magnetic hard disk drives (HDDs) to speed up data-intensive applications. However, existing cache algorithms focus exclusively on performance improvements and ignore the write endurance of the SSD. In this paper, we propose a novel cache management algorithm for flash-based disk caches, named Lazy Adaptive Replacement Cache (LARC). LARC filters out seldom-accessed blocks and prevents them from entering the cache. This avoids cache pollution and keeps popular blocks in the cache for a longer period of time, leading to a higher hit rate. Meanwhile, LARC reduces the number of cache replacements and thus incurs less write traffic to the SSD, especially for read-dominant workloads. In this way, LARC improves performance and extends SSD lifetime at the same time. LARC is self-tuning and has low overhead. It has been extensively evaluated by both trace-driven simulations and a prototype implementation in flashcache. Our experiments show that LARC outperforms state-of-the-art algorithms and reduces write traffic to the SSD by up to 94.5% for read-dominant workloads and 11.2-40.8% for write-dominant workloads.
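A simplified sketch of the admission filtering this abstract describes (a fixed-size illustration that omits LARC's adaptive sizing of the filter queue; it is not the published algorithm): a block enters the SSD cache only on its second recent miss, so one-touch blocks never pollute the cache or cost an SSD write.

    # Hedged sketch; class name, sizes, and interfaces are assumptions.

    from collections import OrderedDict

    class LazyFilterCache:
        def __init__(self, cache_size, ghost_size):
            self.cache = OrderedDict()   # lba -> data, LRU order (SSD-resident blocks)
            self.ghost = OrderedDict()   # lba -> None, addresses of recent one-time misses
            self.cache_size, self.ghost_size = cache_size, ghost_size

        def access(self, lba, fetch_from_disk):
            if lba in self.cache:                     # cache hit: refresh recency
                self.cache.move_to_end(lba)
                return self.cache[lba]
            data = fetch_from_disk(lba)               # miss: read from the HDD
            if lba in self.ghost:                     # second miss: admit to the SSD cache
                del self.ghost[lba]
                self.cache[lba] = data
                if len(self.cache) > self.cache_size:
                    self.cache.popitem(last=False)    # evict the LRU block
            else:                                     # first miss: remember the address only
                self.ghost[lba] = None
                if len(self.ghost) > self.ghost_size:
                    self.ghost.popitem(last=False)
            return data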
NVMFS: A Hybrid File System for Improving Random Write in NAND-flash SSD
"... been widely used as secondary storage for their better performance and lower power consumption compared to traditional Hard Disk Drives (HDDs). However, the random write performance is still a concern for SSDs. Random writes could also result in lower lifetimes for SSDs. In this paper, we propose a ..."
Abstract
-
Cited by 3 (0 self)
NAND flash-based solid-state drives (SSDs) have been widely used as secondary storage for their better performance and lower power consumption compared to traditional hard disk drives (HDDs). However, random write performance is still a concern for SSDs, and random writes can also shorten SSD lifetimes. In this paper, we propose a hybrid file system, NVMFS, to resolve the random-write issue of SSDs. First, NVMFS distributes data dynamically between NVRAM and the SSD: hot data can be stored permanently on NVRAM without being written back to the SSD, while relatively cold data can be cached temporarily on NVRAM with another copy on the SSD. Second, NVMFS absorbs random writes on NVRAM and uses long-term data access patterns when allocating space on the SSD. As a result, NVMFS experiences reduced erase overheads at the SSD. Third, NVMFS uses different write policies on NVRAM and the SSD: in-place updates on NVRAM and non-overwrite on the SSD. We exploit the maximum write bandwidth of the SSD by transforming random writes at the file system level into sequential writes at the SSD level. We have implemented a prototype of NVMFS in the Linux kernel and compared it with several modern file systems such as ext3, btrfs, and NILFS2. We also compared it with another hybrid file system, Conquest, which was originally designed for NVRAM and HDD. The experimental results show that NVMFS improves I/O throughput by an average of 98.9% when segment cleaning is not active, and by an average of 19.6% under high disk utilization (over 85%), compared to the other file systems. We also show that our file system reduces erase operations and overheads at the SSD.
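A hedged sketch of the write path this abstract describes (my interpretation; names and the segment size are assumptions): writes are absorbed in NVRAM with in-place updates, and a background migrator later flushes relatively cold pages to the SSD in large, sequential, non-overwriting segments, which is what turns random writes into sequential ones at the SSD level.

    # Hedged sketch, not the NVMFS code.

    SEGMENT_PAGES = 256      # illustrative segment size, not from the paper

    class HybridWritePath:
        def __init__(self, nvram, ssd_log):
            self.nvram = nvram          # dict-like: page_id -> data (in-place updates)
            self.ssd_log = ssd_log      # append-only log: accepts a list of pages
            self.heat = {}              # page_id -> access count

        def write(self, page_id, data):
            self.nvram[page_id] = data                  # absorb the write in NVRAM
            self.heat[page_id] = self.heat.get(page_id, 0) + 1

        def migrate_cold(self):
            # Coldest pages go first; hot pages stay resident in NVRAM.
            cold = sorted(self.nvram, key=lambda p: self.heat.get(p, 0))
            batch = cold[:SEGMENT_PAGES]
            if batch:
                self.ssd_log.append_segment([(p, self.nvram[p]) for p in batch])
                for p in batch:
                    del self.nvram[p]                   # reclaim NVRAM space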
Integrating Flash-based SSDs into the Storage Stack
"... Abstract—Over the past few years, hybrid storage architectures that use high-performance SSDs in concert with highdensity HDDs have received significant interest from both industry and academia, due to their capability to improve performance while reducing capital and operating costs. These hybrid a ..."
Abstract
-
Cited by 2 (0 self)
Over the past few years, hybrid storage architectures that use high-performance SSDs in concert with high-density HDDs have received significant interest from both industry and academia, due to their capability to improve performance while reducing capital and operating costs. These hybrid architectures differ in their approach to integrating SSDs into the traditional HDD-based storage stack. Of several such possible integrations, two have seen widespread adoption: Caching and Dynamic Storage Tiering (DST). Although the effectiveness of these architectures under certain workloads is well understood, a systematic side-by-side analysis of these approaches remains difficult due to the range of design alternatives and configuration parameters involved. Such a study is required now more than ever in order to design effective hybrid storage solutions for deployment in increasingly virtualized modern storage installations that blend several workloads into a single stream. In this paper, we first present our extensions to the Loris storage stack that transform it into a framework for designing hybrid storage systems. We then illustrate the flexibility of the framework by designing several Caching-based and DST-based hybrid systems. Following this, we present a systematic side-by-side analysis of these systems under a range of individual workload types and offer insights into the advantages and disadvantages of each architecture. Finally, we discuss the ramifications of our findings on the design of future hybrid storage systems in light of recent changes in the hardware landscape and application workloads.
Responding rapidly to service level violations using virtual appliances
In SIGOPS Oper. Syst. Rev., 2012
"... One of the key goals in the data center today is provid-ing storage services with service-level objectives (SLOs) for performance metrics such as latency and throughput. Meet-ing such SLOs is challenging due to the dynamism observed in these environments. In this position paper, we propose dynamic i ..."
Abstract
-
Cited by 2 (1 self)
One of the key goals in the data center today is providing storage services with service-level objectives (SLOs) for performance metrics such as latency and throughput. Meeting such SLOs is challenging due to the dynamism observed in these environments. In this position paper, we propose dynamic instantiation of virtual appliances, that is, virtual machines with storage functionality, as a mechanism to meet storage SLOs efficiently. For dynamic instantiation to be realistic in rapidly changing environments, it should be automated. Therefore, an important goal of this paper is to show that such automation is feasible. We do so through a caching case study. Specifically, we build an automation framework for dynamically instantiating virtual caching appliances. This framework identifies sets of interfering workloads that can benefit from caching, determines the cache-size requirements of workloads, non-disruptively migrates the application to use the cache, and warms the cache to quickly return to acceptable service levels. We show through an experiment that this approach addresses SLO violations while using resources efficiently.
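The automation workflow described above could look roughly like the following sketch; every helper and object here is hypothetical, since the abstract does not give an API.

    # Hedged sketch of the control loop implied by the abstract:
    # detect interference, size a cache, spawn a virtual caching appliance,
    # redirect I/O to it, and warm it before declaring the violation handled.

    def handle_slo_violation(monitor, planner, hypervisor):
        victims = monitor.interfering_workloads()        # workloads hurting each other
        for wl in victims:
            size = planner.cache_size_for(wl)            # working-set estimate
            appliance = hypervisor.spawn_cache_vm(size)  # virtual caching appliance
            wl.redirect_io_through(appliance)            # non-disruptive migration
            appliance.warm(wl.recent_hot_blocks())       # prefetch hot data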
On the importance of evaluating storage systems’ $costs
In Proceedings of the 6th USENIX Conference on Hot Topics in Storage and File Systems, 2014
"... Abstract Modern storage systems are becoming more complex, combining different storage technologies with different behaviors. Performance alone is not enough to characterize storage systems: energy efficiency, durability, and more are becoming equally important. We posit that one must evaluate stor ..."
Abstract
-
Cited by 2 (0 self)
Modern storage systems are becoming more complex, combining different storage technologies with different behaviors. Performance alone is not enough to characterize storage systems: energy efficiency, durability, and more are becoming equally important. We posit that one must evaluate storage systems from a monetary-cost perspective as well as a performance perspective, and that a cost evaluation should consider the workloads run over the storage system's expected lifetime. We designed and developed a versatile hybrid storage system under Linux that combines an HDD and an SSD. The SSD can be used as a cache or as primary storage for hot data. Our system includes tunable parameters to enable trading off performance, energy use, and durability. We built a cost model and evaluated our system under a variety of workloads and parameters, to illustrate the importance of cost evaluations of storage systems.
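As a rough illustration of what a workload-aware cost evaluation can include (an assumed example, not the paper's model), the lifetime cost of a hybrid HDD+SSD configuration might be written as:

    \mathrm{Cost}(T) \;=\;
      \underbrace{C_{\mathrm{hdd}} + C_{\mathrm{ssd}}}_{\text{acquisition}}
      \;+\; \underbrace{\bar{P} \cdot T \cdot c_{\mathrm{kWh}}}_{\text{energy}}
      \;+\; \underbrace{\frac{B_{\mathrm{written}}(T)}{E_{\mathrm{ssd}}} \cdot C_{\mathrm{ssd}}}_{\text{amortized SSD wear}}

where T is the deployment period, \bar{P} the average power drawn under the workload, c_kWh the electricity price, B_written(T) the bytes the workload writes to the SSD over T, and E_ssd the SSD's rated write endurance. The last term is how durability and write traffic enter the monetary picture, which is why performance numbers alone do not settle which configuration is cheaper.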