Interposed Request Routing for Scalable Network Storage
- In Proceedings of the Fourth Symposium on Operating System Design and Implementation (OSDI), 2000
Cited by 82 (11 self)
This paper presents Slice, a new storage system architecture for high-speed LANs incorporating network-attached block storage. Slice interposes a request switching filter, called a µproxy, along the network path between the client and the network storage system (e.g., in a network adapter or switch). The purpose of the µproxy is to route requests among a server ensemble that implements the file service. We present a prototype that uses this approach to virtualize the standard NFS file protocol to provide scalable, high-bandwidth file service to ordinary NFS clients. The paper presents and justifies the architecture, proposes and evaluates several request routing policies realizable within the architecture, and explores the effects of these policies on service structure.
FAB: enterprise storage systems on a shoestring
- In Operating Systems (Lihue, HI, 18–21 May 2003), 2003
Cited by 31 (3 self)
A Federated Array of Bricks (FAB) is a logical disk system that provides the reliability and performance of enterprise-class disk arrays, at a fraction of the cost and with better scalability. The unit of deployment in FAB is a brick, a small rack-mounted storage appliance built from commodity components including disks, a CPU, NVRAM, and network cards. Bricks federate themselves in a completely decentralized manner to provide users with a set of logical volumes. This paper motivates FAB and introduces our data replication algorithm based on majority-voting. We argue that majority voting is practical for ultra-reliable, high-throughput storage systems like FAB, and present several techniques that improve both the performance and space overhead of our protocol.
Active Disk Paxos with infinitely many processes
- In Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC'02), 2002
Cited by 31 (4 self)
We present an improvement to the Disk Paxos protocol by Gafni and Lamport which utilizes extended functionality and flexibility provided by Active Disks and supports unmediated concurrent data access by an unlimited number of processes. The solution facilitates coordination by an infinite number of clients using finite shared memory. It is based on a collection of read-modify-write objects with faults, that emulate a new, reliable shared memory abstraction called a ranked register. The required read-modify-write objects are readily available in Active Disks and in Object Storage Device controllers, making our solution suitable for state-of-the-art Storage Area Network (SAN) environments.
A Decentralized Algorithm for Erasure-Coded Virtual Disks
- In DSN, 2004
Cited by 21 (4 self)
A Federated Array of Bricks is a scalable distributed storage system composed of inexpensive storage bricks. It achieves high reliability with low cost by using erasure coding across the bricks to maintain data reliability in the face of brick failures. Erasure coding generates n encoded blocks from m data blocks (n > m) and permits the data blocks to be reconstructed from any m of these encoded blocks. We present a new fully decentralized erasure-coding algorithm for an asynchronous distributed system. Our algorithm provides fully linearizable read-write access to erasure-coded data and supports concurrent I/O controllers that may crash and recover. Our algorithm relies on a novel quorum construction where any two quorums intersect in m processes.
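The quorum property stated in the abstract above (any two quorums share m processes) can be checked by brute force for small parameters. The sketch below is only an illustration and not the paper's construction: it assumes quorums are all subsets of a fixed size q drawn from n bricks, and the function names `min_quorum_size` and `min_intersection` are hypothetical.

```python
from itertools import combinations
from math import ceil


def min_quorum_size(n: int, m: int) -> int:
    """Smallest q such that any two q-subsets of n bricks share at least m bricks."""
    # Two subsets of size q overlap in at least 2q - n bricks,
    # so the requirement 2q - n >= m gives q >= (n + m) / 2.
    return ceil((n + m) / 2)


def min_intersection(n: int, q: int) -> int:
    """Brute-force the smallest overlap between any two q-subsets of range(n)."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


# Assumed example parameters: n = 5 encoded blocks, m = 3 data blocks.
n, m = 5, 3
q = min_quorum_size(n, m)
print(q, min_intersection(n, q))
```

With these assumed parameters, a quorum one brick smaller would no longer guarantee an overlap of m, which is why the size bound matters.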
D-SPTF: Decentralized Request Distribution in Brick-based Storage Systems
- In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2004
Cited by 10 (0 self)
Distributed Shortest-Positioning Time First (D-SPTF) is a request distribution protocol for decentralized systems of storage servers. D-SPTF exploits high-speed interconnects to dynamically select which server, among those with a replica, should service each read request. In doing so, it simultaneously balances load, exploits the aggregate cache capacity, and reduces positioning times for cache misses. For network latencies expected in storage clusters (e.g., 10–200 µs), D-SPTF performs as well as would a hypothetical centralized system with the same collection of CPU, cache, and disk resources. Compared to popular decentralized approaches, D-SPTF achieves up to 65% higher throughput and adapts more cleanly to heterogeneous server capabilities.
Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software (distributed systems, performance evaluation)
General Terms: Management, Performance
Keywords: Storage systems, Brick Based Storage, Distributed Systems, Disk Scheduling, Decentralized Systems
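The selection idea in the D-SPTF abstract above (serve each read from the replica holder that can do so most cheaply) can be sketched as follows. This is a centralized simplification under stated assumptions, not the decentralized protocol itself: the `Replica` fields, the cost model, and the rule that a cache hit costs nothing are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    cached: bool           # assumption: the block is already in this server's cache
    positioning_ms: float  # assumption: estimated seek plus rotational delay
    queue_ms: float        # assumption: estimated wait behind queued requests


def estimated_cost(r: Replica) -> float:
    """Hypothetical cost model: a cache hit is free, a miss pays queueing plus positioning."""
    return 0.0 if r.cached else r.queue_ms + r.positioning_ms


def choose_replica(replicas: list[Replica]) -> Replica:
    """Pick the replica holder with the lowest estimated cost for this read."""
    return min(replicas, key=estimated_cost)


servers = [
    Replica("a", cached=False, positioning_ms=4.0, queue_ms=2.0),
    Replica("b", cached=False, positioning_ms=1.5, queue_ms=3.0),
    Replica("c", cached=False, positioning_ms=0.5, queue_ms=6.0),
]
print(choose_replica(servers).name)
```

Folding queueing delay into the cost is what lets a single rule cover both load balancing and positioning time in this toy model; marking a server as `cached=True` makes it win regardless of its disk state.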
Failure-Atomic File Access in an Interposed Network Storage System
- In Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing (HPDC), 2000
Cited by 9 (6 self)
This paper presents a recovery protocol for block I/O operations in Slice, a storage system architecture for high-speed LANs incorporating network-attached block storage. The goal of the Slice architecture is to provide a network file service with scalable bandwidth and capacity while preserving compatibility with off-the-shelf clients and file server appliances. The Slice prototype "virtualizes" the Network File System (NFS) protocol by interposing a request switching filter at the client's interface to the network storage system (e.g., in a network adapter or switch).
Managing Scalability in Object Storage Systems for HPC Linux
- In Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies, 2004
Cited by 9 (2 self)
This paper describes the performance and manageability of scalable storage systems based on Object Storage Devices (OSD). Object-based storage was invented to provide scalable performance as the storage cluster scales in size. For example, in our large file tests a 10-OSD system provided 325 MB/sec read bandwidth to 5 clients (from disk), and a 299-OSD system provided 10,334 MB/sec read bandwidth to 151 clients. This shows linear scaling of 30x speedup with 30x more client demand and 30x more storage resources. However, the system must not become more difficult to manage as it grows. Otherwise, the performance benefits can be quickly overshadowed by the administrative burden of managing the system. Instead, the storage cluster must feel like a single system image from the management perspective, even though it may be internally composed of tens, hundreds, or thousands of object storage devices. For the HPC market, which is characterized as having unusually large clusters with usually small IT budgets, it is important that the storage system "just work" with relatively little administrative overhead.
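The "30x" scaling claim in the abstract above can be checked directly from the figures it quotes. The short calculation below uses only those numbers; the dictionary layout and the helper name `ratios` are illustrative choices.

```python
# Figures quoted in the abstract: OSD count, client count, read bandwidth in MB/sec.
small = {"osds": 10, "clients": 5, "read_mb_s": 325}
large = {"osds": 299, "clients": 151, "read_mb_s": 10_334}


def ratios(a: dict, b: dict) -> dict:
    """Ratio of each quantity in the larger configuration to the smaller one."""
    return {key: b[key] / a[key] for key in a}


scale = ratios(small, large)
per_osd_small = small["read_mb_s"] / small["osds"]
per_osd_large = large["read_mb_s"] / large["osds"]

for key, value in scale.items():
    print(f"{key}: {value:.1f}x")
print(f"per-OSD bandwidth: {per_osd_small:.1f} -> {per_osd_large:.1f} MB/sec")
```

All three ratios come out close to 30, and per-OSD bandwidth stays roughly constant, which is what linear scaling means here.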
Efficient Consistency for Erasure-Coded Data Via Versioning Servers
- 2003
Cited by 6 (1 self)
This paper describes the design, implementation and performance of a family of protocols for survivable, decentralized data storage. These protocols exploit storage-node versioning to efficiently achieve strong consistency semantics. These protocols allow erasure-codes to be used that achieve network and storage efficiency (and optionally data confidentiality in the face of server compromise). The protocol family is general in that its parameters accommodate a wide range of fault and timing assumptions, up to asynchrony and Byzantine faults of both storage-nodes and clients, with no changes to server implementation or client-server interface. Measurements of a prototype storage system using these protocols show that the protocol performs well under various system model assumptions, numbers of failures tolerated, and degrees of reader-writer concurrency.
Towards Global Storage Management and Data Placement
- 2001
Cited by 6 (0 self)
As users' and companies' dependence on shared, networked information services continues to increase, we will see continued growth in large data centers and service providers. This will happen both as new services arise, and as services and servers are consolidated on one hand (for ease of management, outsourcing, and reduced duplication), and further distributed on the other hand (for fault-tolerance of critical services and to accommodate the global reach of companies and customers). This paper outlines the key research issues associated with the deployment and management of a global storage system to support this infrastructure. We build on our success in automatically managing local storage systems, and discuss how moving to a system of global data placement raises new challenges and areas of research. We believe that one of the key attributes of such a storage system is the ability to flexibly adapt to a variety of application semantics and requirements as they arise (many applicat...
Decentralized Storage Consistency via Versioning Servers
- 2002
Cited by 6 (2 self)
This paper describes a consistency protocol that exploits versioning storage-nodes. The protocol provides linearizability with the possibility of read aborts in an asynchronous system that may suffer client and storage-node crash failures. The protocol supports both replication and erasure coding (which precludes post hoc repair of partial writes), and avoids the excess work of two-phase commits. Versioning storage-nodes allow the protocol to avoid excess communication in the common case of no write sharing and no failures of writing clients.

