22 citations found.
A. Goldberg et al. Transparent recovery of Mach applications. In Usenix Mach Workshop, pages 169--183, 1990.


This paper is cited in the following contexts:
A Preemptive Deterministic Scheduling Algorithm for.. - Basile, Kalbarczyk, Iyer (2003)   (6 citations)

....a compatible sequence of state updates in all replicas without requiring the same thread interleaving. This is achieved by intercepting the mutex lock/unlock operations performed by application threads when accessing the shared data. Intercepting mutex lock/unlock operations was first suggested in [7] for message-logging-based recovery. In the LSA algorithm, one replica (the leader) decides the mutex acquisition order and propagates it to the other replicas (followers), which enforce the leader-dictated order on the execution of their threads. While the method preserves a large degree of concurrency, ....

A. Goldberg et al. Transparent recovery of Mach applications. In Usenix Mach Workshop, pages 169--183, 1990.
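
The leader/follower scheme described in the excerpt above can be illustrated with a minimal Python sketch. It is not the LSA implementation: the class names `LeaderMutex` and `FollowerMutex`, the string thread labels, and the in-process list that stands in for the message propagating the acquisition order to followers are all hypothetical. The leader's mutex only records who acquires it; the follower's mutex blocks each thread until the recorded order says it is that thread's turn.

```python
import threading


class LeaderMutex:
    """Leader-side mutex (hypothetical): records the order in which threads acquire it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.order = []  # stands in for the order message sent to followers

    def acquire(self, label):
        self._lock.acquire()
        self.order.append(label)  # appended while holding the lock, so the order is exact

    def release(self):
        self._lock.release()


class FollowerMutex:
    """Follower-side mutex (hypothetical): grants acquisitions only in the leader's order."""

    def __init__(self, order):
        self._order = list(order)
        self._next = 0  # position of the next grant in the leader's order
        self._cond = threading.Condition()

    def acquire(self, label):
        with self._cond:
            # Wait until the leader-dictated order says this thread goes next.
            self._cond.wait_for(
                lambda: self._next < len(self._order) and self._order[self._next] == label
            )

    def release(self):
        with self._cond:
            self._next += 1  # hand the mutex to the next thread in the leader's order
            self._cond.notify_all()


def run_replica(mutex, ops_per_thread):
    """Run one thread per label; each appends its label to shared state under the mutex."""
    shared = []

    def worker(label, n_ops):
        for _ in range(n_ops):
            mutex.acquire(label)
            shared.append(label)  # the protected state update
            mutex.release()

    threads = [threading.Thread(target=worker, args=item) for item in ops_per_thread.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    ops = {"T1": 5, "T2": 5, "T3": 5}
    leader = LeaderMutex()
    leader_state = run_replica(leader, ops)  # interleaving chosen by the scheduler
    follower_state = run_replica(FollowerMutex(leader.order), ops)
    assert follower_state == leader_state
    print(leader_state)
```

On the leader the interleaving depends on the scheduler, but the follower reproduces the same sequence of state updates while its threads still run concurrently outside the mutex. A real follower would receive the order incrementally from the leader rather than as a complete list, and failure handling is not modeled; both are simplifications assumed here.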


A Preemptive Deterministic Scheduling Algorithm for.. - Kalbarczyk, Iyer (2002)   (6 citations)

....a compatible sequence of state updates in all replicas without requiring the same thread interleaving. This is achieved by intercepting the mutex lock/unlock operations performed by application threads when accessing the shared data. Intercepting mutex lock/unlock operations was first suggested in [2] for message-logging-based recovery. In the LSA algorithm, one replica (the leader) decides the mutex acquisition order and propagates it to the other replicas (followers), which enforce the leader-dictated order on the execution of their threads. While the method preserves a large degree of concurrency, ....

A. Goldberg et al. Transparent recovery of Mach applications. In Usenix Mach Workshop, pages 169--183, 1990.


Loose Synchronization of Multithreaded Replicas - Basile, Whisnant, Kalbarczyk.. (2002)   (4 citations)

....leader determines the next preemption point at which it will be served. This information is sent to followers. Replicas are assumed to be fail-silent [8]. Some of the issues related to handling nondeterminism due to multithreading have been studied in the context of log-based rollback recovery. In [2] it is suggested adding support .... Because of the single-failure assumption, only the leader can crash. Since (possibly contaminated) correct followers cannot perform an invalid computation, it would be sufficient to choose any follower as the single surviving replica and exclude all the other ....

A. Goldberg et al. Transparent recovery of Mach applications. In Usenix Mach Workshop, pages 169--183, 1990.


Active Replication of Multithreaded Applications - Basile, Kalbarczyk, Whisnant, .. (2002)   (3 citations)

....mutexes). This also contrasts with nonpreemptive, deterministic schedulers [6, 7], which limit concurrency by allowing only one physical thread to execute at a time. Although intercepting mutex requests to track the order of state updates has been proposed in the context of rollback recovery [1], it has not been applied to active replication, nor has it been demonstrated on a substantial application. 2. Loose Synchronization Algorithm. The loose synchronization algorithm exploits the nondeterminism in replica behavior that does not impact the replica outputs. Assuming no prior knowledge ....

A. Goldberg et al. Transparent recovery of Mach applications. In Usenix Mach Workshop, pages 169--183, 1990.


Active Replication of Multithreaded Applications - Basile, Kalbarczyk, Whisnant, .. (2002)   (3 citations)

....contrast with approaches employing nonpreemptive, deterministic schedulers [21, 26], which limit concurrency by allowing only one physical thread to execute at a time. Although intercepting mutex requests to record the order of state updates has been proposed in the context of rollback recovery [2], it has not been applied to active replication, nor has it been demonstrated on a substantial application. To evaluate the proposed algorithm, a transparent active replication framework has been developed. The framework consists of an implementation of the loose synchronization algorithm, a ....

....been to integrate fault tolerance via replication of CORBA applications [20]. Recent years have brought studies on replicating multithreaded applications. Some of the issues related to handling nondeterminism due to multithreading have been studied in the context of log-based rollback recovery. [2] suggests adding support to the Mach operating system to track and log the order in which threads access locks and semaphores. The data preserved in the log is used to support rollback recovery of failed processes (i.e., the thread execution is replayed following the order dictated by the log) ....

A. Goldberg et al. Transparent recovery of Mach applications. In Usenix Mach Workshop, pages 169--183, 1990.


Causality Considerations in Distributed.. - Vaughan, Dearle.. (1994)   (1 citation)

....presented in which each process is aware of its causal dependencies and can control appropriate recovery after failure. Recovery consists of finding some earlier checkpoint and replaying messages where possible. An implementation of this scheme based on the Mach operating system is described in [8]. Johnson and Zwaenepoel [11] provide an extended treatment that uses checkpoints and message logs to find a maximal recoverable state, applicable to both optimistic and pessimistic logging protocols. The algorithms presented by Johnson and Zwaenepoel form the basis for recovery control in ....

Goldberg, A., Gopal, A., Li, K., Strom, R. and Bacon, D. "Transparent Recovery of Mach Applications", 1st International Mach Workshop, Vermont, 1990.


Optimistic Recovery in Multi-threaded Distributed Systems - Damani, Tarafdar, Garg (1999)

....n is the number of processes in the system [15]. While extending this solution to multithreaded processes, we have two natural choices: a process-centric approach and a thread-centric approach. In the process-centric approach, the internal nondeterministic events caused by threads are logged [7, 14]. With this provision, other researchers have used traditional optimistic protocols. This, however, gives rise to the problem of false causality between threads of a process. This problem has two serious repercussions. First, during failure-free mode, it causes the unnecessary blocking of outputs ....

....apart from the order of message receives. Depending on the scheduling, the threads may access shared objects in a different order. Therefore, after a failure, replaying the message log to a process is not sufficient to recreate the desired states. To solve this problem, Goldberg et al. [7] require that shared objects be accessed only in locked regions. The order in which threads acquire locks is logged. During a replay, the same locking order is enforced. This trace-and-replay technique has also been used in concurrent debuggers [10, 16]. Another approach has been used by Elnozahy ....

[Article contains additional citation context not shown here]

A. P. Goldberg, A. Gopal, K. Li, R. E. Strom, and D. F. Bacon. Transparent Recovery of Mach Applications. 1st USENIX Mach Workshop, 1990.


An Architectural Overview Of The Alpha Real-Time.. - Clark, Jensen, Reynolds (1993)   (29 citations)

....provides examples of this: MIG [35] is used not just for RPC but also sometimes for local IPC, and is the only IPC facility provided in a fault-tolerant system built on Mach [36]; an approach to transparent recovery which does use Mach's asynchronous messages is significantly complicated by them [37]. Similarly, the asynchronous message-passing communication hardware of large multicomputers is often abstracted into a more productive synchronous programming model with software development tools [38]. Asynchronous RPC was removed from Amoeba 2.0 as having been a truly dreadful decision and ....

Goldberg, A., A. Gopal, K. Li, R. Strom, and D. Bacon, Transparent Recovery of Mach Applications, Proceedings of the USENIX Mach Workshop, October 1990.


Methods and Models for Management of Distributed and Persistent.. - Feeley (1995)

....performing a global synchronization of every processor. Pessimistic checkpointing [12, 13, 88, 63, 37] duplicates messages; every message is sent to a target processor and to a designated backup processor. If a processor fails, its backup is capable of standing in for it. Optimistic checkpointing [99, 47, 64, 62] allows processors to checkpoint independently; individual checkpoints are coordinated to form a consistent global checkpoint only in the event of a failure of one of the processors. The global checkpoint is formed by tracking the dependencies among individual processor checkpoints; it is ....

Arthur Goldberg, Ajei Gopal, Kong Li, Rob Strom, and David F. Bacon. Transparent recovery of Mach applications. In Proceedings of the USENIX Mach Conference, pages 169--183, July 1990.
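
The dependency tracking mentioned in the excerpt above (forming a consistent global checkpoint from independently taken ones) can be illustrated with a small rollback-propagation routine. This Python sketch is not the algorithm of any cited protocol. It assumes, hypothetically, that each checkpoint is summarized by the sets of message ids sent and received before it, and it treats a global checkpoint as consistent when no chosen checkpoint records receiving a message that the sender's chosen checkpoint has not yet sent (an orphan message). Starting from each process's latest checkpoint, any process holding an orphan is rolled back one checkpoint, repeatedly, until none remain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    sent: frozenset = frozenset()      # ids of messages sent before this checkpoint
    received: frozenset = frozenset()  # ids of messages received before this checkpoint


def recovery_line(checkpoints, sender_of):
    """Pick one checkpoint index per process so that no message is orphaned.

    checkpoints: dict mapping process -> list of Checkpoint, oldest first; index 0
                 is assumed to be the initial state (nothing sent or received).
    sender_of:   dict mapping message id -> sending process.
    """
    line = {p: len(cps) - 1 for p, cps in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for q in line:
            for m in checkpoints[q][line[q]].received:
                p = sender_of[m]
                if m not in checkpoints[p][line[p]].sent:
                    # Orphan: q's checkpoint records receiving m, p's does not record sending it.
                    line[q] -= 1
                    changed = True
                    break
    return line


if __name__ == "__main__":
    # P sends m1 to Q after P's last checkpoint; Q's last checkpoint records receiving m1.
    # Q sent m2 before its last checkpoint, and P's last checkpoint records receiving m2.
    cps = {
        "P": [Checkpoint(), Checkpoint(received=frozenset({"m2"}))],
        "Q": [Checkpoint(), Checkpoint(sent=frozenset({"m2"}), received=frozenset({"m1"}))],
    }
    print(recovery_line(cps, {"m1": "P", "m2": "Q"}))  # {'P': 0, 'Q': 0}
```

In the example, rolling Q back makes m2 an orphan at P, which forces P back as well: a two-process instance of the domino effect. The loop always terminates because the initial checkpoint records no receives and is never rolled back past.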


Lessons from FTM: an Experiment in the Design and.. - Muller, al. (1995)   (2 citations)

....executes normally without having to synchronize with others. The drawbacks of independent checkpointing are that it is either prone to the domino effect [Wood 81, Bhargava & Lian 88, Merlin & Randell 78] or that message replay requires deterministic processes [Strom & Yemini 85, Borg et al. 89, Goldberg et al. 90, Juang & Venkatesan 91, Elnozahy & Zwaenepoel 92]. Forcing processes to be deterministic means that the sources of system nondeterminism, such as multithreading, interrupts, and memory-mapped I/O [Gleeson 93], must be removed. Consequently, solutions for deterministic processes are always ....

A. Goldberg, A. Gopal, K. Li, R. Strom, & D.F. Bacon. Transparent recovery of Mach applications. In USENIX Mach Workshop, pages 169--183, Burlington (VT), October 1990.


Experience with Chorus - Bac, Bernard, Conan, Nguyen, Taconet

....provided, the binding scope is only local to a machine. In Chorus, port names are global, but when a machine failure occurs, the localization service broadcasts a request in order to find the new locations of the ports. Because this mechanism is too expensive, we need the concept of reliable ports [Gold90]. 2. A global naming scheme for every object handled by the system. In UNIX and Chorus, most external and internal names are local, particularly process identifiers. Some of these names are very important for interactive and real-time applications. Thus, we don't support those applications in a ....

A.P. Goldberg, A. Gopal, K. Li, R. Strom, and D.F. Bacon. Transparent Recovery of Mach Applications. In Proc. 1st USENIX Mach Symposium, 1990.


Formal Semantics for Expressing Optimism: The Meaning of HOPE - Cowan, Lutfiyya (1995)   (4 citations)

....primitives is indeed provided. Finally, Section 7 presents our conclusions and future research. 2 RELATED WORK. Use of optimism has largely been limited to embedded systems. For instance, numerous optimistic recovery protocols have been designed [24, 18, 19], and a few have even been implemented [14]. These protocols allow separate components of a distributed system to asynchronously checkpoint their state while retaining the ability to recover the whole system to a consistent state. The basic mechanism is to optimistically assume that the sender of a message will checkpoint its state to ....

Arthur Goldberg, Ajei Gopal, Kong Li, Rob Strom, and David F. Bacon. Transparent Recovery of Mach Applications. In First USENIX Mach Workshop, Burlington, VT, October 1990.


Using Time to Improve the Performance of Coordinated.. - Neves, Fuchs (1996)   (5 citations)

....of protocols have their own advantages. However, coordinated protocols have shown better performance than uncoordinated protocols when used with parallel applications [7]. Additionally, coordinated protocols do not need any piecewise determinism assumption about the execution of the processes [8] and can tolerate failures that affect multiple processes simultaneously. Nevertheless, previous coordinated protocols have several overheads that should be avoided. In a typical coordinated protocol, the coordinator has to exchange three messages with each process. This overhead can become ....

A. Goldberg, A. Gopal, K. Li, R. Strom, and D. Bacon. Transparent recovery of Mach applications. In Proceedings of the Usenix Mach Workshop, pages 169--184, July 1990.


Support for Software Interrupts in Log-Based Rollback-Recovery - Slye, Elnozahy (1997)   (1 citation)

....from a checkpoint that occurred after the events affected the computation. This solution however has a very large overhead, especially when a checkpoint has to involve several processes. Other systems converted these forms of nondeterminism into synchronous messages that can be logged efficiently [6,16]. This solution is not satisfactory because it entails substantial changes to applications and operating systems. Additionally, it cannot support existing applications that rely on asynchronous notification through software signals. 1.2 Our Solution We present in this paper an efficient solution ....

....of the piecewise deterministic execution model can be removed at a reasonable cost. The solution in this paper does not handle other types of nondeterminism, such as results from system calls and values read from an external input source or message. These have been handled by previous work [6, 11, 16, 17]. The performance study is also independent of any particular checkpointing or logging protocol, to prevent any interference with the measurements. The rest of the paper is organized as follows. Section 2 describes the method proposed for tracking interrupts in modern RISC and CISC architectures. ....

[Article contains additional citation context not shown here]

A. Goldberg, A. Gopal, K. Li, R. Strom, and D. Bacon. Transparent recovery of Mach applications. In Proceedings of the Usenix Mach Workshop, pages 169--184, Oct. 1990.


On the Use and Implementation of Message Logging - Elnozahy (1994)   (2 citations)

....of SBML and Manetho. .... checkpointing protocols, where message logging is added to provide efficient interactions with the outside world for those applications where tracking nondeterminism can be done efficiently. 6 Related Work. Many message-logging protocols have been proposed in the literature [2, 3, 5, 6, 13, 15-20, 22, 28-30, 33, 35, 36]. To the best of our knowledge, our paper is the first to advocate the use of coordinated checkpointing as the method of choice for saving process states in message-logging protocols. We are also not aware of any work that compares message-logging protocols with coordinated checkpointing ....

A. Goldberg, A. Gopal, K. Li, R. Strom, and D. Bacon. Transparent recovery of Mach applications. In Proceedings of the Usenix Mach Workshop, pages 169--184, October 1990.


Language Support for the Application-Oriented Fault Tolerance .. - Lutfiyya, Cowan (1995)

....autonomy and avoid synchronization delay. Possible rollback propagation in case of a fault is handled by searching for a consistent system state based on the dependency information. Several optimistic recovery protocols have been proposed [1, 14, 15, 16, 31] but few have even been implemented [4, 11]. These protocols allow separate components of a distributed system to asynchronously checkpoint their state while retaining the ability to recover the whole system to a consistent state. The basic mechanism is to optimistically assume that the sender of a message will checkpoint its state to ....

Arthur Goldberg, Ajei Gopal, Kong Li, Rob Strom, and David F. Bacon. Transparent Recovery of Mach Applications. In First USENIX Mach Workshop, Burlington, VT, October 1990.


A Survey of Rollback-Recovery Protocols in Message-Passing.. - Elnozahy, Johnson, Wang (1996)   (161 citations)

....low, and it can be adjusted depending on how many failures the system is willing to tolerate [137]. 5.2.2 System-Level versus User-Level Implementations. Support for checkpointing can be implemented in the kernel [48, 86, 135], or it can be implemented by a library linked with the user program [62, 106, 136, 159, 165, 191]. Kernel-level implementations are more powerful because they can also capture kernel data structures that support the checkpointed process. However, these implementations are necessarily not portable. Checkpointing can also be implemented at user level. System calls that manipulate memory ....

....be implemented at user level. System calls that manipulate memory protection, such as mprotect of UNIX, can emulate concurrent and incremental checkpointing. The fork system call of UNIX can implement concurrent checkpointing if the operating system implements fork using copy-on-write protection [62]. User-level implementations, however, cannot access the kernel's data structures that belong to the process, such as open file descriptors and message buffers, but these data structures can be emulated at user level [149, 191]. 5.2.3 Compiler Support. A compiler can be instrumented to generate code that ....

[Article contains additional citation context not shown here]

A. P. Goldberg, A. Gopal, K. Li, R. E. Strom, and D. F. Bacon. Transparent recovery of Mach applications. In First USENIX Mach Workshop, October 1990.
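
As a rough illustration of the incremental checkpointing that memory-protection calls make possible (see the survey excerpt above), the Python sketch below replaces the page-protection fault with an explicit `write()` method: a write marks its pages dirty, and the next checkpoint saves only the dirty pages. The `TrackedMemory` class and the 4096-byte page size are assumptions for illustration; a real user-level implementation would write-protect pages with mprotect and catch the resulting faults in a signal handler instead.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class TrackedMemory:
    """Hypothetical memory region with dirty-page tracking.

    write() plays the role of the protection fault raised on the first write
    to a write-protected page; here every write simply records its pages.
    """

    def __init__(self, size):
        self.data = bytearray(size)
        self.dirty = set()  # pages modified since the last checkpoint

    def write(self, offset, payload):
        if payload:
            first = offset // PAGE_SIZE
            last = (offset + len(payload) - 1) // PAGE_SIZE
            self.dirty.update(range(first, last + 1))
        self.data[offset:offset + len(payload)] = payload

    def incremental_checkpoint(self):
        """Copy only the dirty pages, then clear the set (i.e. 're-protect' all pages)."""
        saved = {
            page: bytes(self.data[page * PAGE_SIZE:(page + 1) * PAGE_SIZE])
            for page in sorted(self.dirty)
        }
        self.dirty.clear()
        return saved


if __name__ == "__main__":
    mem = TrackedMemory(4 * PAGE_SIZE)
    mem.write(10, b"abc")                  # dirties page 0
    mem.write(2 * PAGE_SIZE - 2, b"wxyz")  # spans pages 1 and 2
    print(sorted(mem.incremental_checkpoint()))  # [0, 1, 2]
    mem.write(3 * PAGE_SIZE, b"q")
    print(sorted(mem.incremental_checkpoint()))  # [3]
```

The second checkpoint copies a single page because only page 3 was written after the first one; restoring a process would mean applying the saved pages from successive checkpoints in order.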


Performance of Consistent Checkpointing in a Modular.. - Muller, Hue, Peyrouze (1994)   (4 citations)

....deterministic processes, which are generally hard to implement in a standard operating system due to shared memory, multithreading, and interrupts [Strom & Yemini 85, Borg et al. 89, Goldberg et al. 90, Juang & Venkatesan 91, Elnozahy & Zwaenepoel 92]. The resulting system is thus not easily portable. In the consistent checkpointing approach, processors coordinate their local checkpointing action so that the global state is guaranteed to be consistent [Chandy & Lamport 85]. When a failure .... (Also in Proceedings of the First European Dependable Computing Conference, Berlin, Germany, October 1994.)

A. Goldberg, A. Gopal, K. Li, R. Strom, & D.F. Bacon. Transparent recovery of Mach applications. In USENIX Mach Workshop, pages 169--183, Burlington (Vermont), October 1990.


A Checkpoint Protocol for an Entry Consistent Shared Memory.. - Neves, Castro, Guedes (1994)   (38 citations)

....processes, all the object versions acquired by its threads between the checkpoint and the failure. Next, the recovering process's threads re-execute, acquiring the same versions of the same objects as they did before the failure. This assumes that threads execute in a piecewise deterministic manner [9]. After recovery, in the event of a single process failure, the system will be in a consistent state. On the other hand, if multiple process failures occurred, it might be impossible to recover the system to a consistent state. The recovery mechanism detects this situation and aborts the ....

....are delivered reliably and in FIFO order. Each process is viewed as a collection of resources, which provides an execution environment for multiple threads. These resources include an address space, where a subset of the shared objects is mapped. Threads run in a piecewise deterministic manner [9], i.e., the execution of a thread is divided into deterministic intervals started by nondeterministic events. A new interval starts when a thread acquires an object for reading or writing and ends at the next acquire. If the piecewise determinism assumption holds, a process can be recovered by ....

A. Goldberg, A. Gopal, K. Li, R. Strom, and D. Bacon. Transparent recovery of Mach applications. In Proceedings of the Usenix Mach Workshop, pages 169--184, July 1990.
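
The piecewise determinism assumption in the Neves, Castro, and Guedes excerpts above can be made concrete with a toy example. In this Python sketch (all names hypothetical), a thread's only nondeterministic event is the object version returned by each `acquire()`, and everything between two acquires is deterministic. Logging those versions during normal execution is then enough to re-execute the thread from its checkpoint and reach the same state. The random version numbers merely stand in for whatever other processes happened to supply.

```python
import random


def thread_body(acquire, state, intervals=4):
    """Each loop iteration is one deterministic interval started by an acquire."""
    for _ in range(intervals):
        version = acquire()           # nondeterministic event (the version obtained)
        state = state * 31 + version  # deterministic computation within the interval
    return state


def normal_execution(checkpoint_state, rng):
    """Run the thread from its checkpoint, logging the result of every acquire."""
    log = []

    def acquire():
        version = rng.randint(0, 100)  # stands in for the version supplied by another process
        log.append(version)
        return version

    return thread_body(acquire, checkpoint_state), log


def recover(checkpoint_state, log):
    """Re-execute from the checkpoint, replaying the logged acquire results in order."""
    replay = iter(log)
    return thread_body(lambda: next(replay), checkpoint_state)


if __name__ == "__main__":
    final_state, log = normal_execution(checkpoint_state=7, rng=random.Random())
    assert recover(7, log) == final_state
    print("recovered state matches:", final_state)
```

Recovery never calls the nondeterministic source again; it only consumes the log, which is why the piecewise determinism assumption must hold for every interval between acquires.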


Implementing Dynamic Atomic Actions Using Reliable Servers - Hue, Muller, Peyrouze.. (1993)

....synchronization during normal operation. In order to reduce the number of dependent processors involved in a checkpoint or a rollback operation, Koo and Toueg [17] considered the dependencies between processes that are introduced by the exchange of messages. In the asynchronous approach [14, 11, 16], each processor takes checkpoints independently. Messages exchanged between applications are logged during normal execution and are replayed after a crash. Using optimistic protocols, which log messages asynchronously, each processor continues to execute normally without ....

....is that determinism is generally hard to implement in a standard operating system due to shared memory, multithreading, and interrupts. Moreover, determinism makes the management of a distributed shared memory subsystem difficult without explicit synchronization within the application program [14]. In [21], Leu and Bhargava have shown that consistent checkpointing can be modeled as a distributed transaction (e.g., ARGUS [22], CAMELOT [12], RELAX [30], ARJUNA [23], QuickSilver [15, 29]). In traditional transaction-based systems, the system programmer must still explicitly define the units of ....

A. Goldberg, A. Gopal, K. Li, R. Strom and D. F. Bacon. Transparent Recovery of Mach Applications. In USENIX Mach Workshop, pp. 169-183, Burlington, Vermont, October 1990.


Implementation and Performance of Transparent.. - Elnozahy, Zwaenepoel

....implementation records the following internal nondeterministic events: synchronization operations between different threads of an RU, kernel calls, and timeouts. Manetho's treatment of these internal nondeterministic events is similar to that described in the implementation of Optimistic Recovery [9]. Manetho records in the AG the order in which synchronization operations occur, the results of kernel calls, and the occurrence of timeouts. During recovery, Manetho uses the information in the AG to force synchronization calls by the different threads of an RU to occur in the same order as ....

A. Goldberg, A. Gopal, K. Li, R. Strom, and D. Bacon. Transparent recovery of Mach applications. In Proceedings of the Usenix Mach Workshop, pages 169--184, October 1990.


Real-Time Scheduling and Synchronization in Real-Time Mach - Tokuda, Nakajima (1991)

....of platforms and the Macintosh OS on the Macintosh II [10]. A new challenge is to extend the pure kernel to create a set of new servers to support real-time, secure, or fault-tolerant computing domains. There are a few results related to a secure version of Mach [8] and a fault-tolerant extension [4, 9, 2]. However, no investigations were reported for extending Mach to the distributed real-time computing domain. Our group has been working on a real-time version of the Mach kernel in order to bridge the gap between a traditional real-time executive and a time-sharing operating system, like UNIX. Unlike ....

A. Goldberg, A. Gopal, K. Li, R. Strom, and D. Bacon, "Transparent Recovery of Mach Applications," in Proceedings of USENIX Mach Workshop, October, 1990.


CiteSeer.IST - Copyright Penn State and NEC