Results 1 - 10 of 335
Disconnected Operation in the Coda File System
ACM Transactions on Computer Systems, 1992
Cited by 1015 (36 self)
Disconnected operation is a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository. An important, though not exclusive, application of disconnected operation is in supporting portable computers. In this paper, we show that disconnected operation is feasible, efficient and usable by describing its design and implementation in the Coda File System. The central idea behind our work is that caching of data, now widely used for performance, can also be exploited to improve availability.
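To make the caching-for-availability idea concrete, here is a minimal sketch under simplifying assumptions: a dict stands in for the shared repository, reads are served from the cache while disconnected, and updates made while disconnected are logged and replayed on reconnection (loosely analogous to what Coda calls emulation and reintegration). The class and method names are hypothetical, and conflict detection, hoarding policy, and persistence are all omitted.

```python
class DisconnectableClient:
    """Hypothetical client that keeps serving cached files while the server is unreachable."""

    def __init__(self, server: dict):
        self.server = server      # stand-in for the shared data repository
        self.cache: dict = {}     # locally cached file contents
        self.log: list = []       # updates made while disconnected
        self.connected = True

    def read(self, path: str) -> str:
        if self.connected:
            # Connected: fetch from the server and refresh the cache
            self.cache[path] = self.server[path]
        elif path not in self.cache:
            # Disconnected: only files already in the cache are available
            raise FileNotFoundError(f"{path} is not cached")
        return self.cache[path]

    def write(self, path: str, data: str) -> None:
        self.cache[path] = data
        if self.connected:
            self.server[path] = data
        else:
            self.log.append((path, data))  # defer until reconnection

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self) -> None:
        # Replay deferred updates in order; conflict detection is omitted
        for path, data in self.log:
            self.server[path] = data
        self.log.clear()
        self.connected = True
```

The design point the sketch tries to capture is that the same cache that normally saves round trips becomes the only copy the client relies on during a failure, so what was cached beforehand determines what remains usable.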
A Low-bandwidth Network File System
2001
Cited by 394 (3 self)
This paper presents LBFS, a network file system designed for low bandwidth networks. LBFS exploits similarities between files or versions of the same file to save bandwidth. It avoids sending data over the network when the same data can already be found in the server's file system or the client's cache. Using this technique, LBFS achieves up to two orders of magnitude reduction in bandwidth utilization on common workloads, compared to traditional network file systems.
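LBFS chooses chunk boundaries from file contents (using Rabin fingerprints over a sliding window) and identifies chunks by their SHA-1 hashes, so an edit only disturbs the chunks near it. The sketch below illustrates that idea under stated assumptions: it substitutes a plain polynomial rolling hash for Rabin fingerprints, uses toy window and mask sizes, and the function names are hypothetical.

```python
import hashlib
from typing import List, Set

BASE = 257
MOD = (1 << 61) - 1
WINDOW = 16          # hypothetical sliding-window size in bytes
MASK = (1 << 6) - 1  # toy value: a boundary roughly every 64 bytes


def chunk(data: bytes) -> List[bytes]:
    """Split data where the rolling hash of the last WINDOW bytes has its
    low bits equal to zero (content-defined boundaries)."""
    chunks: List[bytes] = []
    start, h = 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                     # add the newest byte
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunks_to_send(data: bytes, receiver_has: Set[str]) -> List[bytes]:
    """Return only the chunks whose SHA-1 digest the receiver does not already hold."""
    return [c for c in chunk(data) if hashlib.sha1(c).hexdigest() not in receiver_has]
```

Because boundaries depend on content rather than fixed offsets, inserting one byte in the middle of a file typically changes only the chunk or two around the insertion; everything else hashes to chunks the receiver already has.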
Frangipani: A Scalable Distributed File System
Cited by 320 (1 self)
The ideal distributed file system would provide all its users with coherent, shared access to the same set of files, yet would be arbitrarily scalable to provide more storage space and higher performance to a growing user community. It would be highly available in spite of component failures. It would require minimal human administration, and administration would not become more complex as more components were added. Frangipani is a new file system that approximates this ideal, yet was relatively easy to build because of its two-layer structure. The lower layer is Petal (described in an earlier paper), a distributed storage service that provides incrementally scalable, highly available, automatically managed virtual disks. In the upper layer, multiple machines run the same Frangipani file system code on top of a shared Petal virtual disk, using a distributed lock service to ensure coherence. Frangipani is meant to run in a cluster of machines that are under a common administration and can communicate securely. Thus the machines trust one another and the shared virtual disk approach is practical. Of course, a Frangipani file system can be exported to untrusted machines using ordinary network file access protocols. We have implemented Frangipani on a collection of Alphas running DIGITAL Unix 4.0. Initial measurements indicate that Frangipani has excellent single-server performance and scales well as servers are added.
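As a rough illustration of how a lock service keeps several file servers coherent over one shared disk, the sketch below models a single, centralized table of reader/writer locks keyed by resource name. It assumes multiple-reader/single-writer semantics; distribution of the lock service, leases, and flushing cached data when a lock is revoked are omitted, and all names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set


class LockService:
    """Hypothetical reader/writer lock table shared by all file-server machines."""

    def __init__(self) -> None:
        self.readers: DefaultDict[str, Set[str]] = defaultdict(set)
        self.writer: Dict[str, str] = {}  # resource -> exclusive holder

    def acquire_read(self, client: str, name: str) -> bool:
        holder = self.writer.get(name)
        if holder is not None and holder != client:
            return False  # another machine is writing
        self.readers[name].add(client)
        return True

    def acquire_write(self, client: str, name: str) -> bool:
        holder = self.writer.get(name)
        other_readers = self.readers[name] - {client}
        if (holder is not None and holder != client) or other_readers:
            return False  # writes need exclusive access
        self.writer[name] = client
        return True

    def release(self, client: str, name: str) -> None:
        self.readers[name].discard(client)
        if self.writer.get(name) == client:
            del self.writer[name]
```

A machine would acquire the appropriate lock before reading or modifying a structure on the shared virtual disk, which is what lets identical server code run on every machine without corrupting shared state.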
Sharp: An architecture for secure resource peering
In Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003
Cited by 193 (36 self)
This paper presents Sharp, a framework for secure distributed resource management in an Internet-scale computing infrastructure. The cornerstone of Sharp is a construct to represent cryptographically protected resource claims (promises or rights to control resources for designated time intervals), together with secure mechanisms to subdivide and delegate claims across a network of resource managers. These mechanisms enable flexible resource peering: sites may trade their resources with peering partners or contribute them to a federation according to local policies. A separation of claims into tickets and leases allows coordinated resource management across the system while preserving site autonomy and local control over resources. Sharp also introduces mechanisms for controlled, accountable oversubscription of resource claims as a fundamental tool for dependable, efficient resource management. We present experimental results from a Sharp prototype for PlanetLab, and illustrate its use with a decentralized barter economy for global PlanetLab resources. The results demonstrate the power and practicality of the architecture, and the effectiveness of oversubscription for protecting resource availability in the presence of failures.
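The ticket/lease split and oversubscription can be sketched in a few lines. This is a simplified illustration, not Sharp's actual design: the names are hypothetical, cryptographic signing of claims is omitted entirely, and the site counts leased units without checking time-interval overlap. Each sub-claim must fit inside its parent, but sibling tickets may together promise more than exists; the site resolves that only when tickets are redeemed for leases.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ticket:
    """Hypothetical soft claim: a promise of `units` over the interval [start, end)."""
    holder: str
    units: int
    start: int
    end: int

    def contains(self, other: "Ticket") -> bool:
        return (other.units <= self.units
                and self.start <= other.start and other.end <= self.end)


@dataclass(frozen=True)
class Lease:
    """Hypothetical hard claim granted by the site that owns the resources."""
    holder: str
    units: int
    start: int
    end: int


def delegate(parent: Ticket, holder: str, units: int, start: int, end: int) -> Ticket:
    child = Ticket(holder, units, start, end)
    if not parent.contains(child):
        raise ValueError("sub-claim exceeds its parent claim")
    return child


class Site:
    """Redeems tickets for leases until local capacity runs out."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.leased = 0  # simplification: ignores whether intervals overlap
        self.leases: List[Lease] = []

    def redeem(self, ticket: Ticket) -> Optional[Lease]:
        if self.leased + ticket.units > self.capacity:
            return None  # an oversubscribed ticket is refused at redemption
        self.leased += ticket.units
        lease = Lease(ticket.holder, ticket.units, ticket.start, ticket.end)
        self.leases.append(lease)
        return lease
```

For example, two 6-unit tickets delegated from one 10-unit claim are each valid on their own, but only the first one redeemed becomes a lease.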
The Timed Asynchronous Distributed System Model
1999
Cited by 191 (19 self)
We propose a formal definition for the timed asynchronous distributed system model. We present extensive measurements of actual message and process scheduling delays and hardware clock drifts. These measurements confirm that this model adequately describes current distributed systems such as a network of workstations. We also give an explanation of why practically needed services, such as consensus or leader election, which are not implementable in the time-free model, are implementable in the timed asynchronous system model.
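In the timed asynchronous model, processes can read hardware clocks whose drift rate is bounded, while message delays are judged against a chosen one-way timeout. The sketch below assumes the usual parameters, a drift bound ρ (`rho`) and a one-way timeout δ (`delta`), and shows how a process turns local clock readings into guaranteed bounds on real time. It relies only on the drift bound (1 - ρ)Δt ≤ ΔH ≤ (1 + ρ)Δt for a real interval Δt and the corresponding clock interval ΔH; the function names are hypothetical.

```python
from typing import Tuple


def local_timer_for(real_timeout: float, rho: float) -> float:
    """Local-clock duration guaranteeing at least `real_timeout` of real time,
    since a local interval T spans real time >= T / (1 + rho)."""
    return real_timeout * (1 + rho)


def real_elapsed_bounds(local_elapsed: float, rho: float) -> Tuple[float, float]:
    """Lower and upper bounds on real time for a locally measured interval."""
    return local_elapsed / (1 + rho), local_elapsed / (1 - rho)


def round_trip_timely(local_rtt: float, delta: float, rho: float) -> bool:
    """True if the real round-trip time is guaranteed to be at most 2 * delta."""
    _, upper = real_elapsed_bounds(local_rtt, rho)
    return upper <= 2 * delta
```

The asymmetry is the point: a timer must be stretched by (1 + ρ) to be safe, while a measured delay must be inflated by 1 / (1 - ρ) before it can be trusted as an upper bound.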
Dynamic virtual clusters in a grid site manager
In Proceedings of the Twelfth International Symposium on High Performance Distributed Computing (HPDC-12), 2003
Cited by 154 (28 self)
This paper presents new mechanisms for dynamic resource management in a cluster manager called Cluster-on-Demand (COD). COD allocates servers from a common pool to multiple virtual clusters (vclusters), with independently configured software environments, name spaces, user access controls, and network storage volumes. We present experiments using the popular Sun GridEngine batch scheduler to demonstrate that dynamic virtual clusters are an enabling abstraction for advanced resource management in computing utilities and grids. In particular, they support dynamic, policy-based cluster sharing between local users and hosted grid services, resource reservation and adaptive provisioning, scavenging of idle resources, and dynamic instantiation of grid services. These goals are achieved in a direct and general way through a new set of fundamental cluster management functions, with minimal impact on the grid middleware itself.
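The core bookkeeping behind allocating servers from a common pool to vclusters can be sketched as below. This is a hypothetical illustration, not COD's interface: it tracks only which vcluster owns each server, and a `reclaim` hook stands in for policy-based sharing (for example, moving nodes from local users to a hosted grid service). Reimaging nodes with the destination vcluster's software environment is only noted in a comment.

```python
from typing import Dict, List


class ServerPool:
    """Hypothetical COD-style manager: servers from one pool are bound to
    named virtual clusters and can be moved between them."""

    def __init__(self, servers: List[str]) -> None:
        self.free: List[str] = list(servers)
        self.owner: Dict[str, str] = {}  # server -> vcluster name

    def allocate(self, vcluster: str, count: int) -> List[str]:
        granted = self.free[:count]  # may grant fewer than requested
        del self.free[:count]
        for s in granted:
            self.owner[s] = vcluster
        return granted

    def members(self, vcluster: str) -> List[str]:
        return sorted(s for s, v in self.owner.items() if v == vcluster)

    def reclaim(self, src: str, dst: str, count: int) -> List[str]:
        # Policy hook: shift servers from one vcluster to another
        moved = self.members(src)[:count]
        for s in moved:
            self.owner[s] = dst  # a real system would reconfigure the node here
        return moved

    def release(self, server: str) -> None:
        if self.owner.pop(server, None) is not None:
            self.free.append(server)
```

Keeping ownership in one table is what makes the policy decisions cheap: resizing a vcluster is a reassignment rather than a change to the batch scheduler running inside it.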
Paxos made live: an engineering perspective
In Proc. of PODC, 2007
Cited by 151 (0 self)
We describe our experience building a fault-tolerant database using the Paxos consensus algorithm. Despite the existing literature in the field, building such a database proved to be non-trivial. We describe selected algorithmic and engineering problems encountered, and the solutions we found for them. Our measurements indicate that we have built a competitive system.
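For readers who want the algorithm itself in view, here is a minimal in-process sketch of single-decree Paxos: acceptors make promises in phase 1 and accept values in phase 2, and a proposer that learns of an already-accepted value must adopt it. This is the textbook algorithm, not the production system described above, which builds many rounds (Multi-Paxos) plus substantial engineering on top; networking, persistence, and failures are omitted, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                # highest proposal number promised
    accepted_n: int = -1              # number of the last accepted proposal
    accepted_v: Optional[str] = None  # value of the last accepted proposal

    def prepare(self, n: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1: promise to ignore proposals numbered below n,
        # and report any value already accepted
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, -1, None

    def accept(self, n: int, v: str) -> bool:
        # Phase 2: accept unless a higher-numbered prepare has been promised
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one round with proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, adopt the highest-numbered one
    _, highest_v = max(granted, key=lambda r: r[0])
    if highest_v is not None:
        value = highest_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

Once "A" is chosen by a majority, a later proposer with a higher number still ends up proposing "A", and a proposer with a stale number fails phase 1.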
A Toolkit for User-Level File Systems
In Proc. USENIX Technical Conference, 2001
Cited by 148 (12 self)
This paper describes a C toolkit for easily extending the Unix file system. The toolkit exposes the NFS interface, allowing new file systems to be implemented portably at user level. A number of programs have implemented portable, user-level file systems. However, they have been plagued by low performance, deadlock, restrictions on file system structure, and the need to reboot after software errors. The toolkit makes it easy to avoid the vast majority of these problems. Moreover, the toolkit supports user-level access to existing file systems through the NFS interface, a heretofore rarely employed technique. NFS gives software an asynchronous, low-level interface to the file system that can greatly benefit the performance, security, and scalability of certain applications. The toolkit uses a new asynchronous I/O library that makes it tractable to build large, event-driven programs that never block.
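The "event-driven programs that never block" style can be illustrated with a minimal callback loop built on Python's standard `selectors` module. This is a generic sketch of the programming model, not the toolkit's C library or API; the `EventLoop` class and its methods are hypothetical. Sockets are set non-blocking, and work happens only in callbacks invoked when the selector reports a socket readable.

```python
import selectors
import socket
from typing import Callable


class EventLoop:
    """Minimal single-threaded loop: dispatch callbacks when sockets become readable."""

    def __init__(self) -> None:
        self.sel = selectors.DefaultSelector()

    def on_readable(self, sock: socket.socket,
                    callback: Callable[[socket.socket], None]) -> None:
        sock.setblocking(False)  # callbacks must never block the loop
        self.sel.register(sock, selectors.EVENT_READ, callback)

    def remove(self, sock: socket.socket) -> None:
        self.sel.unregister(sock)
        sock.close()

    def run(self) -> None:
        # Keep dispatching until no sockets remain registered
        while self.sel.get_map():
            for key, _events in self.sel.select(timeout=1.0):
                key.data(key.fileobj)
```

A handler would typically call `recv` once per readiness event, append any data it gets, and call `remove` when `recv` returns `b""` (the peer closed). Because no handler ever waits, one thread can serve many connections at once.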
Boxwood: Abstractions as the Foundation for Storage Infrastructure
2004
Cited by 132 (8 self)
Writers of complex storage applications such as distributed file systems and databases are faced with the challenges of building complex abstractions over simple storage devices like disks. These challenges are exacerbated by the additional requirements for fault tolerance and scaling. This paper explores the premise that high-level, fault-tolerant abstractions supported directly by the storage infrastructure can ameliorate these problems. We have built a system called Boxwood to explore the feasibility and utility of providing high-level abstractions or data structures as the fundamental storage infrastructure. Boxwood currently runs on a small cluster of eight machines. The Boxwood abstractions perform very close to the limits imposed by the processor, disk, and the native networking subsystem. Using these abstractions directly, we have implemented an NFSv2 file service that demonstrates the promise of our approach.
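To illustrate the premise that the storage layer, rather than each application, should absorb device failures, here is a deliberately tiny sketch: a key-value map mirrored across simulated disks. It is not Boxwood's design or API; both classes are hypothetical, and replica resynchronization after a disk recovers is omitted.

```python
from typing import Dict, List


class Disk:
    """Hypothetical block device: named blocks, with a switch to simulate failure."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.failed = False

    def write(self, key: str, data: bytes) -> None:
        if self.failed:
            raise OSError("disk failed")
        self.blocks[key] = data

    def read(self, key: str) -> bytes:
        if self.failed:
            raise OSError("disk failed")
        return self.blocks[key]


class ReplicatedMap:
    """Hypothetical high-level abstraction: a key-value map mirrored across disks,
    so callers never handle individual disk failures themselves."""

    def __init__(self, disks: List[Disk]) -> None:
        self.disks = disks

    def put(self, key: str, value: bytes) -> None:
        written = 0
        for d in self.disks:
            try:
                d.write(key, value)
                written += 1
            except OSError:
                pass  # tolerate a failed replica
        if written == 0:
            raise OSError("no replica accepted the write")

    def get(self, key: str) -> bytes:
        for d in self.disks:
            try:
                return d.read(key)
            except (OSError, KeyError):
                continue  # fall back to the next replica
        raise KeyError(key)
```

An application built on `ReplicatedMap` sees only `put` and `get`, which is the kind of narrowing of the interface that the abstract argues makes file systems and databases easier to write.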
Constrained Mirror Placement on the Internet
IEEE Journal on Selected Areas in Communications, 2002
Cited by 114 (9 self)
Web content providers and content distribution network (CDN) operators often set up mirrors of popular content to improve performance. Due to the scale and decentralized administration of the Internet, companies have a limited number of sites (relative to the size of the Internet) where they can place mirrors. We formalize the mirror placement problem as a case of constrained mirror placement, where mirrors can only be placed on a preselected set of candidates. We study performance improvement in terms of client round-trip time (RTT) and server load when clients are clustered by the autonomous systems (AS) in which they reside. Our results show that, regardless of the mirror placement algorithm used, increasing the number of mirror sites (under the constraint) is effective in reducing client-to-server RTT and server load only over a surprisingly small range of values. In this range, we show that greedy placement performs best.
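A greedy placement of this kind can be sketched as follows, assuming a known RTT matrix from each client AS cluster to each candidate site and an objective of total (optionally demand-weighted) RTT to the nearest chosen mirror. The objective, the weights, and the function names are assumptions for illustration; the paper's exact greedy variant and its server-load metric may differ. The constraint shows up as the fact that only preselected candidates are ever considered.

```python
from typing import Dict, List, Optional


def placement_cost(rtt: Dict[str, Dict[str, float]], sites: List[str],
                   weight: Dict[str, float]) -> float:
    # Each client cluster is served by its nearest chosen mirror
    return sum(weight[c] * min(rtt[c][s] for s in sites) for c in rtt)


def greedy_placement(rtt: Dict[str, Dict[str, float]], candidates: List[str], k: int,
                     weight: Optional[Dict[str, float]] = None) -> List[str]:
    """Pick up to k mirrors from the candidate set, one at a time."""
    weight = weight or {c: 1.0 for c in rtt}
    chosen: List[str] = []
    for _ in range(min(k, len(candidates))):
        # Add the candidate that lowers total weighted RTT the most
        best = min((s for s in candidates if s not in chosen),
                   key=lambda s: placement_cost(rtt, chosen + [s], weight))
        chosen.append(best)
    return chosen
```

Each step re-evaluates every remaining candidate against the mirrors already chosen, so the marginal benefit of an extra mirror is visible directly; once every cluster is already close to some mirror, further additions barely change the cost.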