B. W. Lampson. Atomic Transactions, pages 246-265. Volume 105 of Lecture Notes in Computer Science, Springer-Verlag, New York, N.Y., 1981. This is a revised version of Lampson and Sturgis's unpublished Crash Recovery in a Distributed Data Storage System.
....are undefined; max viewid has the initial value 0, my id) and orig config contains the ids of all the replicas in the system. One view change is necessary to let the replicas have a common view and viewid to work with. We assume that the entire replica state is stored on stable storage [19]; we discuss this assumption in section 5.9. 5.2 Probes The topological changes in the network are detected by sending and receiving probes. This is accomplished using two processes at each replica, one that sends probes and the other that receives them. The probing procedure is shown in Figure ....
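A minimal sketch of the two-process probing arrangement described above, in Python. The procedure in the cited figure is not reproduced in this snippet, so everything here is an assumption for illustration: the probe format (a bare sender id), the fixed probe interval, the silence timeout used to decide reachability, and the hypothetical names `ProbeReceiver`, `sender_loop`, and `receiver_loop`.

```python
import queue
import threading
import time


class ProbeReceiver:
    """State kept by the receiving process: when each replica was last heard from."""

    def __init__(self, timeout):
        self.timeout = timeout   # assumed: seconds of silence before a replica is presumed unreachable
        self.last_heard = {}     # replica id -> arrival time of its most recent probe
        self.lock = threading.Lock()

    def on_probe(self, sender_id, now):
        with self.lock:
            self.last_heard[sender_id] = now

    def reachable(self, now):
        # A replica counts as reachable if a probe from it arrived within the timeout.
        with self.lock:
            return {r for r, t in self.last_heard.items() if now - t <= self.timeout}


def sender_loop(my_id, peer_inboxes, interval, stop):
    """Sending process: periodically put a probe (just our id) in every peer's inbox."""
    while not stop.is_set():
        for inbox in peer_inboxes:
            inbox.put(my_id)
        stop.wait(interval)


def receiver_loop(inbox, receiver, stop):
    """Receiving process: drain probes from our inbox and record their arrival times."""
    while not stop.is_set():
        try:
            sender_id = inbox.get(timeout=0.05)
        except queue.Empty:
            continue
        receiver.on_probe(sender_id, time.monotonic())
```

Here `queue.Queue` stands in for the network; a real system would exchange probes between machines. A change in the set returned by `reachable` is the kind of topological change the probes are meant to detect.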
....a replica recovers with all its pre-crash state restored. That is, no information is lost during crashes. This subsection discusses two extant implementations and gives a reference to a method that can be used when this assumption does not hold. An easy solution is to provide stable storage [19] at each replica. Each replica has some form of nonvolatile storage (for example, disks). The updates to the replica state are recorded on the log in the order they occur. The log is kept in the stable storage. During a recovery from a crash, the contents of the log are replayed in the order ....
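The log-and-replay scheme can be sketched as follows, assuming a replica whose state is a simple key-value map. Stable storage is approximated here by an ordinary file forced to the device with `os.fsync`; that is an assumption, and a single file is weaker than true stable storage [19]. The class name `LoggedReplica` and the JSON record format are hypothetical.

```python
import json
import os


class LoggedReplica:
    """Replica state (a dict) that is rebuilt after a crash by replaying a log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._recover()

    def update(self, key, value):
        # 1. Append the update to the log, in the order updates occur,
        #    and force it to the device before touching in-memory state.
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only then apply it to the volatile copy of the state.
        self.state[key] = value

    def _recover(self):
        # Replay logged updates in log order to restore the pre-crash state.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    # A crash during the final append can leave a torn last
                    # record; update() never returned for it, so stop replay here.
                    break
                self.state[record["key"]] = record["value"]
```

Constructing a new `LoggedReplica` on the same path plays the role of recovery after a crash. Replay must follow log order: if `x` is set to 1 and later to 3, only replaying in order leaves `x` at 3.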
B. W. Lampson. Atomic Transactions, pages 246-265. Volume 105 of Lecture Notes in Computer Science, Springer-Verlag, New York, N.Y., 1981. This is a revised version of Lampson and Sturgis's unpublished Crash Recovery in a Distributed Data Storage System.
....is an impossible goal to achieve, so systems are designed to be fault tolerant. This means that the system continues to provide the specified service despite a subset of all possible failures. Specifically, a failure model is required to define exactly the types of failures that can be tolerated [4]. Component failures are classified as either errors, which are expected to happen and from which recovery is possible, or disasters, which are unexpected and from which recovery is not possible. Most failure models include node failures, communication failures and media failures as errors. A node ....
....any incorrect state transformations. Communication failures are detectable errors occurring when a message is sent from one node to another node across the network (e.g. a message may be corrupted or lost). Media failures are detectable errors in a secondary storage device (e.g. a hard disk) [4]. There are several ways of approaching fault tolerance, but in general these involve introducing redundancy into the system. One approach, which comes from ideas concerning fault tolerant hardware, is to form groups of replicated processes [1]. Each process in the group performs the same set of ....
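The idea of a group of replicated processes that each perform the same set of operations can be illustrated with a small Python sketch. The snippet is cut off before it says how the group's outputs are combined; the majority vote used here, in the spirit of triple modular redundancy in fault tolerant hardware, is an assumption, as are the hypothetical names `Replica` and `group_perform` and the `faulty` flag used to simulate a failed member.

```python
from collections import Counter


class Replica:
    """A deterministic process: the same operations in the same order give the same state."""

    def __init__(self, faulty=False):
        self.total = 0
        self.faulty = faulty  # hypothetical switch that makes this member return wrong results

    def perform(self, amount):
        self.total += amount
        # A faulty member is simulated by corrupting its reply.
        return self.total + 1000 if self.faulty else self.total


def group_perform(replicas, amount):
    """Give every member the same operation and return the majority reply."""
    replies = [r.perform(amount) for r in replicas]
    value, votes = Counter(replies).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority: too many members disagree")
    return value
```

With three members, one faulty reply is outvoted. This works only because every member is deterministic and receives the same operations in the same order; with two members a single disagreement leaves no majority.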
Butler Lampson. Atomic Transactions, volume 105 of Lecture Notes in Computer Science, chapter 11, pages 246-265. Springer-Verlag, 1981. Distributed Systems: Architecture and Implementation, An Advanced Course.