Safestore: A durable and practical storage system
In USENIX Annual Technical Conference, 2007
Cited by 21 (4 self)
This paper presents SafeStore, a distributed storage system designed to maintain long-term data durability despite conventional hardware and software faults, environmental disruptions, and administrative failures caused by human error or malice. The architecture of SafeStore is based on fault isolation, which SafeStore applies aggressively along administrative, physical, and temporal dimensions by spreading data across autonomous storage service providers (SSPs). However, current storage interfaces provided by SSPs are not designed for high end-to-end durability. In this paper, we propose a new storage system architecture that (1) spreads data efficiently across autonomous SSPs using informed hierarchical erasure coding that, for a given replication cost, provides several additional 9’s of durability over what can be achieved with existing black-box SSP interfaces, (2) performs an efficient end-to-end audit of SSPs to detect data loss that, for a 20% cost increase, improves data durability by two 9’s by reducing MTTR, and (3) offers durable storage with cost, performance, and availability competitive with traditional storage systems. We instantiate and evaluate these ideas by building a SafeStore-based file system with an NFS-like interface.
Self-adaptive disk arrays
In Proc. 8th Int. Symp. on Stabilization, Safety, and Security of Distributed Systems, 2006
Cited by 7 (6 self)
We present a disk array organization that adapts itself to successive disk failures. When all disks are operational, all data are replicated on two disks. Whenever a disk fails, the array reorganizes itself by selecting a disk containing redundant data and replacing these data with their exclusive or (XOR) with the other copy of the data contained on the disk that failed. This will protect the array against any single disk failure until the failed disk gets replaced and the array can revert to its original condition. Hence data will remain protected against the successive failures of up to one half of the original number of disks, provided that no critical disk failure happens while the array is reorganizing itself. As a result, our scheme achieves the same access times as a replicated organization under normal operational conditions while having a much lower likelihood of losing data under abnormal conditions. In addition, it tolerates much longer repair times than static disk arrays.
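The reorganization step described in this abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical four-disk array holding two mirrored blocks; the disk names, the dictionary layout, and the helper `xor_blocks` are illustrative assumptions and are not taken from the paper.

```python
def xor_blocks(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(x) != len(y):
        raise ValueError("blocks must have the same length")
    return bytes(a ^ b for a, b in zip(x, y))


# Hypothetical layout: block A is mirrored on disks a1/a2, block B on b1/b2.
block_a = b"data-A!!"
block_b = b"data-B??"
disks = {"a1": block_a, "a2": block_a, "b1": block_b, "b2": block_b}

# Disk a1 fails. The array picks b2 (a redundant copy of B) and overwrites it
# with B XOR A, where A is read from the surviving copy on a2.
del disks["a1"]
disks["b2"] = xor_blocks(disks["b2"], disks["a2"])

# Any single further failure is now survivable:
recovered_a = xor_blocks(disks["b1"], disks["b2"])  # rebuild A if a2 were lost
recovered_b = xor_blocks(disks["a2"], disks["b2"])  # rebuild B if b1 were lost
```

After the reorganization the three surviving disks hold A, B, and A XOR B, so losing any one of them still leaves enough information to rebuild both blocks.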
Evaluating the reliability of storage systems
2006
Cited by 2 (1 self)
Keywords: repairable systems, k-out-of-n systems
Modern storage systems are often large complex distributed systems. Current techniques for evaluating their reliability function require the solution of a system of differential equations. We present a more elementary, intuitive approach that focuses on the steady-state behavior of each storage organization when it goes through repeated cycles of failures followed by repairs. As a result, our approach immediately provides a purely algebraic method for computing both the average failure rate and mean time to failure. We show how to apply our technique to model the high infant mortality of disk drives and the behavior of the so-called S.M.A.R.T. drives, which can warn users of impending disk failures.
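As a rough illustration of what a purely algebraic mean-time-to-failure computation looks like, the sketch below evaluates the standard textbook closed form for a two-disk mirrored pair with repair, MTTF = (3λ + μ) / (2λ²). This is not necessarily the method the paper develops; the function name and the assumption of constant, exponentially distributed failure rate λ and repair rate μ are introduced here for illustration only.

```python
def mirrored_pair_mttf(failure_rate: float, repair_rate: float) -> float:
    """MTTF of a two-disk mirror, starting with both disks operational.

    Assumes independent disks with constant failure rate `failure_rate`
    (lambda) and constant repair rate `repair_rate` (mu), both per hour.
    """
    lam, mu = failure_rate, repair_rate
    if lam <= 0 or mu < 0:
        raise ValueError("failure_rate must be > 0 and repair_rate >= 0")
    # Solving the two expected-time equations
    #   T2 = 1/(2*lam) + T1
    #   T1 = 1/(lam + mu) + (mu/(lam + mu)) * T2
    # for T2 gives the closed form below.
    return (3 * lam + mu) / (2 * lam ** 2)


# Illustrative (assumed) parameters: one failure per 100,000 hours per disk,
# and a mean repair time of 24 hours.
mttf_hours = mirrored_pair_mttf(failure_rate=1 / 100_000, repair_rate=1 / 24)
```

With no repair at all (μ = 0) the formula reduces to 3 / (2λ), the expected time for both of two independent disks to fail.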

