Results 1 - 10 of 38
A nine year study of file system and storage benchmarking
- ACM Transactions on Storage
, 2008
"... Benchmarking is critical when evaluating performance, but is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is rather difficult to analyze. Moreover, systems have different features ..."
Abstract - Cited by 55 (8 self)
Benchmarking is critical when evaluating performance, but is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is rather difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. The large variety of workloads that these systems experience in the real world also adds to this difficulty. In this article we survey 415 file system and storage benchmarks from 106 recent papers. We found that most popular benchmarks are flawed and many research papers do not provide a clear indication of true performance. We provide guidelines that we hope will improve future performance evaluations. To show how some widely used benchmarks can conceal or overemphasize overheads, we conducted a set of experiments. As a specific example, slowing down read operations on ext2 by a factor of 32 resulted in only a 2–5% wall-clock slowdown in a popular compile benchmark. Finally, we discuss future work to improve file system and storage benchmarking.
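The ext2 example is easy to reproduce with a back-of-envelope Amdahl-style calculation. The sketch below is illustrative only: the assumed I/O fractions are my own, not measurements from the survey.

```python
# Back-of-envelope model: only the read share of the runtime is slowed down.
def relative_runtime(io_fraction, io_slowdown):
    """Runtime relative to the original (1.0) when the I/O share is slowed."""
    return (1.0 - io_fraction) + io_fraction * io_slowdown

for io_fraction in (0.0007, 0.0015):   # assume reads are ~0.07-0.15% of compile time
    slowdown = relative_runtime(io_fraction, 32) - 1.0
    print(f"read share {io_fraction:.2%} -> wall-clock slowdown {slowdown:.1%}")
```

With those assumed shares, a 32x read slowdown produces roughly a 2-5% wall-clock slowdown, which is consistent with the compile-benchmark observation above.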
Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads
- Proc. 10th USENIX Conf. File and Storage Technologies
, 2012
"... Abstract To reduce storage overhead, cloud file systems are transitioning from replication to erasure codes. This process has revealed new dimensions on which to evaluate the performance of different coding schemes: the amount of data used in recovery and when performing degraded reads. We present ..."
Abstract - Cited by 42 (4 self)
To reduce storage overhead, cloud file systems are transitioning from replication to erasure codes. This process has revealed new dimensions on which to evaluate the performance of different coding schemes: the amount of data used in recovery and when performing degraded reads. We present an algorithm that finds the optimal number of codeword symbols needed for recovery for any XOR-based erasure code and produces recovery schedules that use a minimum amount of data. We differentiate popular erasure codes based on this criterion and demonstrate that the differences improve I/O performance in practice for the large block sizes used in cloud file systems. Several cloud systems [Ford10, Calder11] have adopted Reed-Solomon (RS) codes because of their generality and their ability to tolerate larger numbers of failures. We define a new class of rotated Reed-Solomon codes that perform degraded reads more efficiently than all known codes, but otherwise inherit the reliability and performance properties of Reed-Solomon codes. The online home for this paper may be found at: http://web.eecs.utk.edu/~plank/plank/papers/FAST-2012
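As a minimal illustration of XOR-based recovery and degraded reads (a single-parity toy stripe, far simpler than the codes analyzed in the paper; block count and sizes are arbitrary):

```python
import os
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy RAID-4-like stripe: k data blocks protected by one XOR parity block.
k, block_size = 4, 16
data = [os.urandom(block_size) for _ in range(k)]
parity = xor_blocks(data)

# Degraded read of block 2 after its disk fails: read the k-1 surviving data
# blocks plus the parity block, then XOR them back together.
survivors = [data[i] for i in range(k) if i != 2] + [parity]
assert xor_blocks(survivors) == data[2]
print(f"degraded read served by reading {len(survivors)} surviving symbols")
```

The paper's point is precisely that more sophisticated codes can answer the same degraded read while touching fewer symbols than this naive scheme.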
Optimal recovery of single disk failure in RDP code storage systems
- ACM SIGMETRICS Performance Evaluation Review
"... Modern storage systems use thousands of inexpensive disks to meet the storage requirement of applications. To enhance the data availability, some form of redundancy is used. For example, conventional RAID-5 systems provide data availability for single disk failure only, while recent advanced coding ..."
Abstract - Cited by 25 (1 self)
Modern storage systems use thousands of inexpensive disks to meet the storage requirements of applications. To enhance data availability, some form of redundancy is used. For example, conventional RAID-5 systems provide data availability for single disk failures only, while recent advanced coding techniques such as row-diagonal parity (RDP) can provide data availability with up to two disk failures. To reduce the probability of data unavailability, whenever a single disk fails, disk recovery (or rebuild) is carried out. We show that the conventional recovery scheme of the RDP code for a single disk failure is inefficient and suboptimal. In this paper, we propose an optimal and efficient disk recovery scheme, Row-Diagonal Optimal Recovery (RDOR), for single disk failure of the RDP code, which has the following properties: (1) it is read optimal in the sense that it issues the smallest number of disk reads to recover the failed disk; (2) it has the load-balancing property that all surviving disks are subjected to the same amount of additional workload in rebuilding the failed disk. We carefully explore the design space and theoretically show the optimality of RDOR. We carry out performance evaluation to quantify the merits of RDOR on some widely used disks.
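The read-optimality claim can be checked at toy scale. The brute-force sketch below assumes the standard RDP layout, a (p-1) x (p+1) array with a row-parity and a diagonal-parity column; the function and variable names are mine, not the paper's, and it only illustrates the idea of mixing row and diagonal rebuilds.

```python
from itertools import combinations

def rdp_min_reads(p, failed):
    """Fewest distinct symbols read to rebuild data column `failed` of a
    (p-1) x (p+1) RDP array, found by trying every row/diagonal assignment."""
    rows = range(p - 1)

    def row_reads(r):               # rebuild symbol (r, failed) from row parity
        return {(r, j) for j in range(p) if j != failed}

    def diag_reads(r):              # rebuild it from diagonal parity, if possible
        d = (r + failed) % p
        if d == p - 1:              # the symbol lies on the unstored diagonal
            return None
        reads = {(d, p)}            # the diagonal parity symbol itself
        for j in range(p):
            i = (d - j) % p
            if i <= p - 2 and (i, j) != (r, failed):
                reads.add((i, j))
        return reads

    diag_ok = [r for r in rows if diag_reads(r) is not None]
    best = float("inf")
    for n in range(len(diag_ok) + 1):
        for chosen in combinations(diag_ok, n):
            reads = set()
            for r in rows:
                reads |= diag_reads(r) if r in chosen else row_reads(r)
            best = min(best, len(reads))
    return best

p, failed = 5, 0
print((p - 1) ** 2, "reads conventionally vs", rdp_min_reads(p, failed), "reads at best")
```

For p = 5 the row-parity-only rebuild reads 16 symbols, while the best mixed schedule found by the search reads 12, the roughly one-quarter saving the paper quantifies in general.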
In search of I/O-optimal recovery from disk failures
- Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems
, 2011
"... We address the problem of minimizing the I/O needed to recover from disk failures in erasure-coded storage systems. The principal result is an algorithm that finds the optimal I/O recovery from an arbitrary number of disk failures for any XOR-based erasure code. We also describe a family of codes wi ..."
Abstract - Cited by 21 (2 self)
We address the problem of minimizing the I/O needed to recover from disk failures in erasure-coded storage systems. The principal result is an algorithm that finds the optimal I/O recovery from an arbitrary number of disk failures for any XOR-based erasure code. We also describe a family of codes with high fault tolerance and low recovery I/O; for example, one instance tolerates up to 11 failures and recovers a lost block in 4 I/Os. While we have determined I/O-optimal recovery for any given code, it remains an open problem to identify codes with the best recovery properties. We describe our ongoing efforts toward characterizing space overhead versus recovery I/O tradeoffs and generating codes that realize these bounds.
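A much-simplified version of the search described here can be written for a single lost symbol: since XORing any two valid parity equations of an XOR-based code yields another valid equation, a brute-force pass over GF(2) combinations finds the decoding equation that touches the fewest surviving symbols. The tiny code and its equations below are hypothetical, chosen only to make the search visible; they are not from the paper.

```python
from itertools import combinations

def min_reads_for(lost, parity_equations):
    """Smallest set of surviving symbols whose XOR reconstructs `lost`.
    Each parity equation is the set of symbol indices that XOR to zero."""
    eqs = [frozenset(e) for e in parity_equations]
    best = None
    for n in range(1, len(eqs) + 1):
        for combo in combinations(eqs, n):
            merged = frozenset()
            for eq in combo:        # XOR of equations = symmetric difference of their supports
                merged ^= eq
            if lost in merged and (best is None or len(merged) - 1 < len(best)):
                best = merged - {lost}
    return best

# Hypothetical code: 4 data symbols (0-3), parity 4 covers all data,
# parity 5 covers only symbols 0 and 1.
equations = [{0, 1, 2, 3, 4}, {0, 1, 5}]
print(sorted(min_reads_for(0, equations)))   # reading {1, 5} beats reading {1, 2, 3, 4}
```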
WorkOut: I/O Workload Outsourcing for Boosting RAID Reconstruction Performance
"... User I/O intensity can significantly impact the performance of on-line RAID reconstruction due to contention for the shared disk bandwidth. Based on this observation, this paper proposes a novel scheme, called WorkOut (I/O Workload Outsourcing), to significantly boost RAID reconstruction performance ..."
Abstract - Cited by 17 (1 self)
User I/O intensity can significantly impact the performance of on-line RAID reconstruction due to contention for the shared disk bandwidth. Based on this observation, this paper proposes a novel scheme, called WorkOut (I/O Workload Outsourcing), to significantly boost RAID reconstruction performance. WorkOut effectively outsources all write requests and popular read requests originally targeted at the degraded RAID set to a surrogate RAID set during reconstruction. Our lightweight prototype implementation of WorkOut and extensive trace-driven and benchmark-driven experiments demonstrate that, compared with existing reconstruction approaches, WorkOut significantly reduces both the total reconstruction time and the average user response time. Importantly, WorkOut is orthogonal to and can be easily incorporated into any existing reconstruction algorithm. Furthermore, it can be extended to improve the performance of other background RAID support tasks, such as re-synchronization and disk scrubbing.
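A toy sketch of the outsourcing decision follows; it reflects my reading of the abstract, with invented names and a simplistic popularity rule, not the actual WorkOut policy or its data migration machinery.

```python
from collections import defaultdict

class WorkOutRedirector:
    """During reconstruction, route each request either to the degraded RAID
    set or to a surrogate set."""

    def __init__(self, popularity_threshold=2):
        self.read_counts = defaultdict(int)
        self.outsourced = set()            # blocks whose latest copy lives on the surrogate
        self.popularity_threshold = popularity_threshold

    def route_write(self, block):
        self.outsourced.add(block)         # every write is outsourced during rebuild
        return "surrogate"

    def route_read(self, block):
        self.read_counts[block] += 1
        if block in self.outsourced:       # must follow the freshest copy
            return "surrogate"
        if self.read_counts[block] >= self.popularity_threshold:
            self.outsourced.add(block)     # popular block: copy once, then serve from surrogate
            return "surrogate"
        return "degraded"                  # cold read still hits the degraded set

r = WorkOutRedirector()
print(r.route_write(42), r.route_read(7), r.route_read(7))   # surrogate degraded surrogate
```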
A Hybrid Approach to Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation
- ACM Trans. on Storage
"... The current parallel storage systemsuse thousandsof inexpensive disks to meet the storage requirement of applications.Dataredundancyand/orcodingareusedtoenhancedataavailability,e.g., Row-diagonalparity (RDP) and EVENODD codes, which are widely used in RAID-6 storage systems, provide data availabilit ..."
Abstract - Cited by 11 (3 self)
Current parallel storage systems use thousands of inexpensive disks to meet the storage requirements of applications. Data redundancy and/or coding are used to enhance data availability; e.g., row-diagonal parity (RDP) and EVENODD codes, which are widely used in RAID-6 storage systems, provide data availability with up to two disk failures. To reduce the probability of data unavailability, whenever a single disk fails, disk recovery will be carried out. We find that the conventional recovery schemes of RDP and EVENODD codes for a single failed disk use only one parity disk. However, there are two parity disks in the system, and both can be used for single disk failure recovery. In this paper, we propose a hybrid recovery approach which uses both parities for single disk failure recovery, and we design efficient recovery schemes for the RDP code (RDOR-RDP) and the EVENODD code (RDOR-EVENODD). Our recovery scheme has the following attractive properties: (1) "read optimality" in the sense that our scheme issues the smallest number of disk reads to recover a single failed disk, reducing disk reads by approximately 1/4 compared with conventional schemes; (2) the "load balancing property" in that all surviving disks are subjected to the same (or almost the same) amount of additional workload in rebuilding the failed disk. We carry out performance evaluation to quantify the merits of RDOR-RDP and RDOR-EVENODD on some widely used disks with DiskSim. The off-line experimental results show that RDOR-RDP and RDOR-EVENODD ...
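The headline read reduction is easy to sanity-check numerically; the formulas below reflect my understanding of the RDP geometry for a full (p-1) x (p+1) array, and are not figures quoted from the article.

```python
# Row-parity-only rebuild of one data disk reads (p-1)^2 symbols; a hybrid
# row/diagonal schedule needs roughly 3*(p-1)^2/4, i.e. about 1/4 fewer reads.
for p in (5, 7, 13, 17):                       # p is the prime parameter of RDP
    conventional = (p - 1) ** 2
    hybrid = 3 * (p - 1) ** 2 // 4
    saved = 1 - hybrid / conventional
    print(f"p={p:2d}: {conventional:3d} -> {hybrid:3d} reads ({saved:.0%} fewer)")
```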
HPDA: A Hybrid Parity-based Disk Array for Enhanced Performance and Reliability
"... ... (SSD) can not satisfy the capacity, performance and reliability requirements of a modern storage system supporting increasingly demanding data-intensive computing applications. Applying RAID schemes to SSDs to meet these requirements, while a logical and viable solution, faces many challenges. I ..."
Abstract - Cited by 10 (2 self)
... (SSD) cannot satisfy the capacity, performance, and reliability requirements of a modern storage system supporting increasingly demanding data-intensive computing applications. Applying RAID schemes to SSDs to meet these requirements, while a logical and viable solution, faces many challenges. In this paper, we propose a Hybrid Parity-based Disk Array architecture, HPDA, which combines a group of SSDs and two hard disk drives (HDDs) to improve the performance and reliability of SSD-based storage systems. In HPDA, the SSDs (data disks) and part of one HDD (parity disk) compose a RAID4 disk array. Meanwhile, a second HDD and the free space of the parity disk are mirrored to form a RAID1-style write buffer that temporarily absorbs small write requests and acts as a surrogate set during recovery when a disk fails. The write data is reclaimed back to the data disks during lightly loaded or idle periods of the system. Reliability analysis shows that the reliability of HPDA, in terms of MTTDL (Mean Time To Data Loss), is better than that of either a pure HDD-based or a pure SSD-based disk array. Our prototype implementation of HPDA and performance evaluations show that HPDA significantly outperforms either an HDD-based or an SSD-based disk array.
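A rough sketch of the write path as described above, with invented names and an assumed small-write threshold; the real HPDA prototype operates at the block-device level rather than as a Python dictionary.

```python
SMALL_WRITE_LIMIT = 64 * 1024              # assumed threshold, not from the paper

class HPDASketch:
    def __init__(self):
        self.hdd_log = {}                  # RAID-1-style write buffer on the HDDs
        self.ssd_raid4 = {}                # data on the SSD RAID-4 set

    def write(self, block, data):
        if len(data) <= SMALL_WRITE_LIMIT:
            self.hdd_log[block] = data     # small write absorbed by the mirrored HDD buffer
        else:
            self.ssd_raid4[block] = data   # large write goes straight to the SSDs

    def read(self, block):
        # The buffer holds the freshest copy of recently written blocks.
        return self.hdd_log.get(block, self.ssd_raid4.get(block))

    def destage_when_idle(self):
        self.ssd_raid4.update(self.hdd_log)   # reclaim buffered writes to the data disks
        self.hdd_log.clear()

array = HPDASketch()
array.write(1, b"x" * 512)                 # small: lands in the HDD write buffer
array.destage_when_idle()
print(array.read(1) == b"x" * 512)         # True
```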
Improving the availability of supercomputer job input data using temporal replication
"... Supercomputers are stepping into the Peta-scale and Exascale era, wherein handling hundreds of concurrent system failures is an urgent challenge. In particular, storage system failures have been identified as a major source of service interruptions in supercomputers. RAID solutions alone cannot prov ..."
Abstract - Cited by 9 (3 self)
Supercomputers are stepping into the petascale and exascale era, wherein handling hundreds of concurrent system failures is an urgent challenge. In particular, storage system failures have been identified as a major source of service interruptions in supercomputers. RAID solutions alone cannot provide sufficient storage protection because (1) average disk recovery time is projected to grow, making RAID groups increasingly vulnerable to additional failures during data reconstruction, and (2) disk-level data protection cannot mask higher-level faults, such as software/hardware failures of entire I/O nodes. This paper presents a complementary approach based on the observation that files in the supercomputer scratch space are typically accessed by batch jobs, whose execution can be anticipated. Therefore, we propose to transparently, selectively, and temporarily replicate "active" job input data by coordinating the parallel file system with the batch job scheduler. We have implemented the temporal replication scheme in the popular Lustre parallel file system and evaluated it with both real-cluster experiments and trace-driven simulations. Our results show that temporal replication allows for fast online data reconstruction, with a reasonably low overall space and I/O bandwidth overhead.
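A simplified sketch of the scheduling side of this idea follows; the function, the lead-time window, and the queue format are invented for illustration and are not the Lustre/scheduler interface the paper implements.

```python
import heapq

def plan_replication(job_queue, now, lead_time=600):
    """Pick jobs whose anticipated start is within `lead_time` seconds so the
    parallel file system can add a temporary replica of their input files.
    job_queue holds (expected_start_time, job_id, input_files) tuples."""
    heap = list(job_queue)
    heapq.heapify(heap)
    to_replicate = []
    while heap and heap[0][0] - now <= lead_time:
        _, job_id, inputs = heapq.heappop(heap)
        to_replicate.append((job_id, inputs))      # replicas are dropped after the job runs
    return to_replicate

queue = [(1000, "jobA", ["/scratch/a.dat"]), (5000, "jobB", ["/scratch/b.dat"])]
print(plan_replication(queue, now=500))            # only jobA starts soon enough
```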
Prefetching with Adaptive Cache Culling for Striped Disk Arrays
"... Conventional prefetching schemes regard prediction accuracy as important because useless data prefetched by a faulty prediction may pollute the cache. If prefetching requires considerably low read cost but the prediction is not accurate, it may or may not be beneficial depending on the situation. Ho ..."
Abstract - Cited by 9 (0 self)
Conventional prefetching schemes regard prediction accuracy as important because useless data prefetched by a faulty prediction may pollute the cache. If prefetching incurs a very low read cost but the prediction is not accurate, it may or may not be beneficial depending on the situation. However, the problem of low prediction accuracy can be dramatically reduced if we efficiently manage prefetched data by considering the total hit rate for both prefetched data and cached data. To achieve this goal, we propose an adaptive strip prefetching (ASP) scheme, which provides low prefetching cost and evicts prefetched data at the proper time by using differential feedback that maximizes the hit rate of both prefetched data and cached data in a given cache management scheme. Additionally, ASP controls prefetching by using an online disk simulation that investigates whether prefetching is beneficial for the current workloads and stops prefetching if it is not. Finally, ASP provides methods that resolve both the independency loss and the parallelism loss that may arise in striped disk arrays. We implemented our scheme as a RAID-5 driver in a Linux 2.6.18 kernel module, and it outperforms the sequential prefetching of Linux by several times to an order of magnitude in a variety of realistic workloads.
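A deliberately simplified caricature of the feedback idea follows; this is not the ASP algorithm itself, and the counters and decision rule are assumptions. It tracks how often prefetched strips are actually hit versus how often evictions made for them cost a hit, and keeps prefetching only while the first outweighs the second.

```python
class PrefetchController:
    def __init__(self, alpha=0.05):
        self.alpha = alpha                 # smoothing factor for both running estimates
        self.prefetch_hit_rate = 0.0       # how often prefetched strips are actually used
        self.evicted_hit_loss = 0.0        # how often we miss on data evicted for prefetches

    def record_prefetched_access(self, was_hit):
        delta = float(was_hit) - self.prefetch_hit_rate
        self.prefetch_hit_rate += self.alpha * delta

    def record_eviction_outcome(self, caused_miss):
        delta = float(caused_miss) - self.evicted_hit_loss
        self.evicted_hit_loss += self.alpha * delta

    def prefetching_enabled(self):
        # Differential decision: prefetch only while the gained hits exceed the lost ones.
        return self.prefetch_hit_rate > self.evicted_hit_loss
```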
MICRO: A Multilevel Caching-Based Reconstruction Optimization for Mobile Storage Systems
- IEEE Transactions on Computers
"... Abstract—High performance, highly reliable, and energy-efficient storage systems are essential for mobile data-intensive applications such as remote surgery and mobile data center. Compared with conventional stationary storage systems, mobile disk-array-based storage systems are more prone to disk f ..."
Abstract - Cited by 9 (0 self)
High performance, highly reliable, and energy-efficient storage systems are essential for mobile data-intensive applications such as remote surgery and mobile data centers. Compared with conventional stationary storage systems, mobile disk-array-based storage systems are more prone to disk failures due to their severe application environments. Further, they have a very limited power supply. Therefore, data reconstruction algorithms, which are executed in the presence of disk failure, must be performance-driven, reliability-aware, and energy-efficient for mobile storage systems. Unfortunately, existing reconstruction schemes cannot fulfill these three goals simultaneously because they largely overlook the fact that mobile disks have much higher failure rates than stationary disks. In addition, they normally ignore energy saving. In this paper, we develop a novel reconstruction strategy, called multilevel caching-based reconstruction optimization (MICRO), which can be applied to RAID-structured mobile storage systems to noticeably shorten reconstruction times and user response times while saving energy. MICRO collaboratively utilizes the storage cache and the disk array controller cache to diminish the number of physical disk accesses caused by reconstruction. Experimental results demonstrate that, compared with two representative algorithms, DOR and PRO, MICRO reduces reconstruction times on average by 20.22 percent and 9.34 percent, while saving energy by no less than 30.4 percent and 13 percent, respectively.
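A bare-bones sketch of the cache-first rebuild read: the two cache levels come from the abstract, but the function signature and bookkeeping below are mine, not the MICRO implementation.

```python
def rebuild_read(block, controller_cache, storage_cache, read_from_disk, stats):
    """Fetch a surviving block for reconstruction, touching the disk only on a
    miss in both the disk array controller cache and the storage cache."""
    if block in controller_cache:
        stats["controller_hits"] += 1
        return controller_cache[block]
    if block in storage_cache:
        stats["storage_hits"] += 1
        return storage_cache[block]
    stats["disk_reads"] += 1               # only this path spends a physical I/O (and energy)
    return read_from_disk(block)

stats = {"controller_hits": 0, "storage_hits": 0, "disk_reads": 0}
print(rebuild_read(7, {7: b"cached"}, {}, lambda b: b"from disk", stats), stats)
```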