Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing
, 1988
Cited by 199 (13 self)
In a distributed system using message logging and checkpointing to provide fault tolerance, there is always a unique maximum recoverable system state, regardless of the message logging protocol used. The proof of this relies on the observation that the set of system states that have occurred during any single execution of a system forms a lattice, with the sets of consistent and recoverable system states as sublattices. The maximum recoverable system state never decreases, and if all messages are eventually logged, the domino effect cannot occur. This paper presents a general model for reasoning about recovery in such a system and, based on this model, an efficient algorithm for determining the maximum recoverable system state at any time. This work unifies existing approaches to fault tolerance based on message logging and checkpointing, and improves on existing methods for optimistic recovery in distributed systems.
The Duality of Memory and Communication in the Implementation of a Multiprocessor Operating System
- In Proceedings of the 11th ACM Symposium on Operating Systems Principles
, 1987
Cited by 139 (7 self)
Mach is a multiprocessor operating system being implemented at Carnegie-Mellon University. An important component of the Mach design is the use of memory objects which can be managed either by the kernel or by user programs through a message interface. This feature allows applications such as transaction management systems to participate in decisions regarding secondary storage management and page replacement. This paper explores the goals, design and implementation of Mach and its external memory management facility. The relationship between memory and communication in Mach is examined as it relates to overall performance, applicability of Mach to new multiprocessor architectures, and the structure of application programs. This research was sponsored by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 4864, monitored by the Space and Naval Warfare Systems Command under contract N00039-85-C-1034. The views expressed are those of the authors alone. Permission to copy...
Design, Implementation, and Performance Evaluation of a Distributed Shared Memory Server for Mach
- In 1988 Winter USENIX Conference
, 1988
Cited by 40 (0 self)
This report describes the design, implementation and performance evaluation of a virtual shared memory server for the Mach operating system. The server provides unrestricted sharing of read-write memory between tasks running on either strongly coupled or loosely coupled architectures, and any mixture thereof. A number of memory coherency algorithms have been implemented and evaluated, including a new distributed algorithm that is shown to outperform centralized ones. Some of the features of the server include support for machines with multiple page sizes, for heterogeneous shared memory, and for fault tolerance. Extensive performance measures of applications are presented, and the intrinsic costs evaluated. 1. Introduction. Shared memory multiprocessors are becoming increasingly available, and with them a faster way to program applications and system services via the use of shared memory. Currently, the major limitation in using shared memory is that it is not extensible network-wi...
Strongbox: A System for Self-Securing Programs
, 1991
Cited by 10 (7 self)
Introduction. Security is a pressing problem for distributed systems. Distributed systems exchange data among a variety of users over a variety of sites, which may be geographically separated. A user who stores important data on processor A must trust not just processor A but also the processors B, C, D, ... with which A communicates. The distributed security problem is difficult, and few major distributed systems attempt to address it. In fact, conventional approaches to computer security are so complex that they actually discourage designers from trying to build a secure distributed system: A software engineer who wishes to build a secure distributed data application finds that he or she must depend on the security of a distributed database which depends on the security of a distributed file system which depends on the security of a distributed operating system kernel, etc. Under
Output-driven distributed optimistic message logging and checkpointing
- Department of Computer Science, Rice University
, 1990
Cited by 4 (2 self)
Although optimistic fault-tolerance methods using message logging and checkpointing have the potential to provide highly efficient, transparent fault tolerance in distributed systems, existing methods are limited by several factors. Coordinating the asynchronous message logging progress among all processes of the system may cause significant overhead, limiting their ability to scale to large systems and offsetting some of the performance gains over simpler pessimistic methods. Furthermore, logging all messages received by each process may place a substantial load on the network and file server in systems with high communication rates. Finally, existing methods do not support nondeterministic process execution, such as occurs in multithreaded processes and those that handle asynchronous interrupts. This paper presents a new method using optimistic message logging and checkpointing that addresses these limitations. Any fault-tolerance method must delay output from the system to the outside world until it can guarantee that no future failure can force the system to roll back to a state before the output was sent. With this new method, only this need to commit output forces any process to log received messages or to checkpoint. Each process commits its own output, with the cooperation of the minimum number of other processes, and any messages not needed to allow pending output to be committed need not be logged. Individual processes may also dynamically switch to checkpointing without message logging, to avoid the expense of logging a large number of messages or to support their own nondeterministic execution.
Issues in Logging Techniques through a Study of Four Systems
- Univ. of Glasgow (Scotland), Tech. Rep. FIDE
, 1994
Cited by 1 (1 self)
This paper presents the main logging techniques issues through a study of four existing logging systems implemented as independent storage systems: Clio, Camelot DLF, KitLog and Sprite LFS. It aims to provide a clear view of problems and actual solutions. Key words and phrases: Logging techniques, compaction, log end, quick data lookup.
Issues in Logging Techniques through a Study of Four Systems
, 1994
This paper presents the main logging techniques issues through a study of four existing logging systems implemented as independent storage systems: Clio, Camelot DLF, K I T

