60 citations found.
B. Bhargava, S.R. Lian, Independent Checkpointing and Concurrent Rollback for Recovery -- an Optimistic Approach, Proc. IEEE Symposium on Reliable Distributed Systems, pp. 3-12, 1988

This paper is cited in the following contexts:

An Efficient Optimistic Message Logging Scheme for Recoverable .. - Park, al.   (Correct)

....of a set of processes during the recovery if the distributed recovery scheme is employed. Hence, either the rollback of the related processes have to be synchronized as in the synchronous recovery scheme [14] or a centralized coordination is required as for the centralized recovery scheme [6]. One way to support the asynchronous recovery [11, 27] is to use the message logging in addition to the checkpointing. With the asynchronous recovery, a process can independently decide its rollback in case of the system failures; and after the rollback, the process can immediately resume its ....

B. Bhargava and S.R. Lian, "Independent Checkpointing and Concurrent Rollback for Recovery - An Optimistic Approach," Proc. 4th Int'l Conf. on Data Engineering, 1988, pp. 182--189.


An Efficient Recovery Scheme for Fault-Tolerant Mobile.. - Park, Woo, Yeom   (Correct)

....when the recovery is concerned, checkpointing only schemes have a common problem in rollback. Because of the livelock problem [13] which causes recursive rollbacks, either the rollback of the related processes has to be synchronized as proposed in [13] or a centralized coordination is required [6]. One way to guarantee the asynchronous recovery [10, 23] is to use the message logging in addition to the checkpointing. Causal logging scheme [3, 4, 11] however, requires a large size of log space and a large amount of dependency information to be carried in a message, which can be a serious ....

B. Bhargava and S.R. Lian. Independent Checkpointing and Concurrent Rollback for Recovery - An Optimistic Approach. In Proc. of the Int'l Conf. on Data Engineering, pp. 182-189, 1988.


Transparent Fault Tolerance for Web Services based.. - Dialani, Miles.. (2002)   (Correct)

....process to the previous checkpoint will not affect dependent processes. Optimistic independent checkpointing requires that dependencies are explicitly recorded somewhere in the system, so that on rollback of a process, dependent processes will be informed appropriately and possibly also rolled back [6, 11, 20]. The pessimistic approach places more restrictions on a process autonomy in checkpointing and may require more checkpointing than optimistic approaches. Optimistic mechanisms will have more overhead in rollback, on the other hand. However, it should be noted that no single mechanism is ....

B. Bhargava and S. Lian. Independent checkpointing and concurrent rollback for recovery in distributed systems--an optimistic approach. In Proceedings of the 7th IEEE Symposium on Reliable Distributed Systems, pages 3--12, 1988.


An Efficient Recovery Scheme for Mobile Computing Environment - Park, Woo, Yeom (2001)   (1 citation)  (Correct)

....when the recovery is concerned, checkpointing only schemes have a common problem in rollback. Because of the livelock problem [13] which causes recursive rollbacks, either the rollback of the related processes have to be synchronized as proposed in [13] or a centralized coordination is required [6]. One way to guarantee the asynchronous recovery [10, 22] is to use the message logging in addition to the checkpointing. Causal logging scheme [3, 4, 11] however, requires a large size of log space and also a large amount of dependency information to be carried in a message, which can be a ....

B. Bhargava and S.R. Lian, "Independent checkpointing and concurrent rollback for recovery - an optimistic approach," In Proc. of the Int'l Conf. on Data Engineering, pp. 182-189, 1988.


Completely Asynchronous Optimistic Recovery with Minimal.. - Smith, Johnson, Tygar (1995)   (22 citations)  (Correct)

....treating each nondeterministic influence as a message, logging it and replaying it during recovery. The message logging approach allows states of a process in addition to those saved in a checkpoint to be recovered. Recovery protocols based instead on checkpointing without message logging (e.g. [1, 5, 6, 7, 8, 15, 16, 17, 31]) can recover only process states that have been checkpointed, often forcing processes to roll back further than otherwise required after a failure. Message logging allows each process to be checkpointed less frequently, and may in general reduce failure free overhead since logging a message is ....

B. Bhargava and S. Lian. "Independent Checkpointing and Concurrent Rollback Recovery for Distributed Systems --- An Optimistic Approach." Seventh Symposium on Reliable Distributed Systems. 3--12. IEEE, 1988.


Minimizing Timestamp Size for Completely Asynchronous.. - Smith, Johnson (1996)   (Correct)

....the rollback protocol can decide to discard the message only when the message is a knowable orphan. 1.3. Previous Work In this paper, we concentrate on rollback based on optimistic message logging and replay. Recovery protocols based instead on checkpointing without message logging (e.g. [1, 3, 4, 5, 8, 15, 16, 29]) may force processes to roll back further than otherwise required, since processes can only recover states that have been checkpointed. Recovery protocols based on pessimistic message logging (e.g. [2, 9, 11, 21]) can cause processes to delay execution until incoming messages are logged to ....

B. Bhargava and S. Lian. "Independent Checkpointing and Concurrent Rollback Recovery for Distributed Systems--- An Optimistic Approach." Seventh Symposium on Reliable Distributed Systems. 3--12. IEEE, 1988.


Coherence-Based Coordinated Checkpointing for.. - Kongmunvattana.. (2000)   (2 citations)  (Correct)

....distributed snapshots resulted in algorithms for reducing the number of messages required for synchronization under coordinated checkpointing [19, 22]. Elnozahy et al. evaluated two techniques for tolerating checkpoint latency and for reducing the checkpoint size [11]. Independent checkpointing [4, 23, 25] allows each process to create a checkpoint individually at any time. It does not guarantee bounded rollback recovery, so previous checkpoints cannot be discarded and garbage collection is necessary to limit the number of checkpoints stored. We focus on coordinated checkpointing in this study. ....

B. Bhargava and S. R. Lian. Independent Checkpointing and Concurrent Rollback Recovery for Distributed Systems - An Optimistic Approach. In Proc. of the 7th IEEE Symp. on Reliable Distributed Systems, pages 3--12, October 1988.


Checkpointing and Rollback of Wide-Area Distributed.. - Cao, Chan, Jia, Dillon (2001)   (Correct)

....In log based rollback recovery, both checkpointing and logging are used [14, 9]. In this paper, we are concerned with only checkpoint based approaches. There exist three primary approaches to checkpoint-based checkpointing and rollback in distributed systems: 1. Independent checkpointing [16, 2, 15]: Processes do not coordinate their actions for checkpointing during normal execution and each process takes its local checkpoint independently. When a failure occurs, consistent global checkpoints are established by dependency tracking and possibly message logging. 2. Synchronous (coordinated) ....

B. Bhargava, Shy-Renn Lian, "Independent Checkpointing and Concurrent Rollback Recovery for Distributed Systems -- An Optimistic Approach", IEEE Proc. 7th Symp. on Reliability in Distributed Systems, Oct. 1988, pp3-12.


Comprehensive Low-overhead Process Recovery Based on.. - Manivannan, Singhal (1995)   (Correct)

....of some process, then the send operation of that message must have been recorded also. In the literature, several checkpointing schemes have been proposed for distributed systems. They can be broadly classified into two categories asynchronous and synchronous. In asynchronous checkpointing [4, 11, 22], processes take checkpoints periodically without any coordination with others. To recover from a failure, a process communicates with other processes to determine if their local states are causally related. If they are, processes that received messages which are responsible for causal ....

B. Bhargava and S. R. Lian. "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems--An Optimistic Approach." In Proc. 7th IEEE Symp. Reliable Distributed Syst., pages 3--12, 1988.


Checkpointing and Rollback Recovery in Object-based Systems - Katsuya Tanaka Hiroaki (1996)   (Correct)

....taking c_i, m is an orphan [4]. c is consistent if there is no orphan. In Fig. 3(1) the checkpoint ⟨c_i, c_j⟩ is consistent from the definition. In Fig. 3(2) ⟨c_i, c_j⟩ is inconsistent because m is an orphan if o_i and o_j are rolled back to c_i and c_j, respectively. Papers [2,4,5,13] discuss how to take the consistent global checkpoint. [Figure 3: Consistent checkpoint. Panels: (1) Consistent, (2) Inconsistent; objects o_i and o_j, message m, checkpoints c_i and c_j, time axis.] 3.2 Types of invocations We have to consider how the request, response, and data messages give influence to ....

Bhargava, B. and Lian, S. R., "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems --- An Optimistic Approach," Proc. of the 7th Symp. on Reliable Distributed Systems , 1988, pp. 3-12.


User-Triggered Checkpointing And Rollback In Massively Parallel.. - Deconinck (1996)   (1 citation)  (Correct)

....logging, or combine user directed with user-transparent ideas. See section 3.6. Our user triggered checkpointing approach, presented in chapter 4 belongs to this class. Nevertheless, several checkpointing mechanisms have been proposed, where the domino effect is not inherently avoided, e.g. [BhLi88]. The underlying reason is that if the communication pattern is favourable, and if checkpoints are saved frequently and if failures are infrequent, the probability that the domino effect to the initial state will occur is small. Advantageous is the small overhead in the fault free execution of the ....

....channel properties may be assumed. Some schemes are based on reliable communication channels between the processes [StYe84, StBY88, WW90]. This includes guaranteed message delivery, automatic retransmission, etc. Communication protocols can further require FIFO (First In, First Out) channels [BhLi88], sliding window protocols [KoTo87] etc. They may require node-to-node or end-to-end acknowledgements. Other schemes tolerate that messages are delivered out of order, are lost, multiplicated or altered, etc. Control messages to execute the mechanism may require more stringent characteristics than ....

B. Bhargava, S.R. Lian, "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems", Proc. 7th Symp. on Reliable Distributed Systems, 1988, pp. 3-12.


A Communication-Induced Checkpointing Protocol that . . . - Baldoni, al. (1997)   (5 citations)  (Correct)

....with mobile hosts and minimal checkpoint coordination) 3.1 Rollback Dependency Graph The sequence of events occurring at P_i between C_{i,x-1} and C_{i,x} (x > 0) is called checkpoint interval and is denoted by I_{i,x}. The Rollback Dependency Graph (or R-graph) is defined as follows ([3, 18]): • each node represents a local checkpoint. • a directed edge from C_{i,x} to C_{j,y} exists if and only if: 1. i = j and y = x + 1, or 2. i ≠ j and a message m is sent in I_{i,x} and delivered in I_{j,y}. Figure 1.b depicts the R-graph corresponding to the checkpoint and communication pattern ....

Bhargava, B and Lian, S.R. Independent Checkpointing and Concurrent Rollback for Recovery - An Optimistic Approach. Proc. IEEE Symp. Reliable Distributed Systems, 1988, pp.3-12.


A Quasi-synchronous Algorithm for Checkpointing in.. - Manivannan, Singhal (1995)   (Correct)

....the global state, then the send operation of that message must also have been recorded. In the literature, several checkpointing schemes have been proposed for distributed systems. They can be broadly classified in to two categories asynchronous and synchronous. In asynchronous checkpointing [3, 13, 23], processes take checkpoints periodically without any coordination with each other. To recover from a failure, processes communicate with each other to restore the system to a consistent set of local states. If their local states are causally related, sites that received messages which are ....

B. Bhargava and S. R. Lian. "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems--An Optimistic Approach." In Proc. 7th IEEE Symp. Reliable Distributed Syst., pages 3--12, 1988.


Distributed Checkpointing Based on Influential Messages - Katsuya Tanaka And (1996)   (1 citation)  (Correct)

....checkpoint ⟨cp_i, cp_j⟩ because o_i sends m after taking cp_i and o_j receives m before cp_j. Here, if o_i and o_j are rolled back to cp_i and cp_j, respectively, the global state denoted by cp_i and cp_j is inconsistent because o_j has received m which o_i had not sent. Many papers [3,5,6,17] discuss how to take the consistent global checkpoint by the cooperation of the objects. [Figure 2: Consistent checkpoint. Panels: (1) consistent, (2) inconsistent; objects o_i and o_j, message m, checkpoints c_i and c_j, time axis.] Let us consider an example shown in Figure 3, where four objects o_i, o_j, ....

Bhargava, B. and Lian, S. R., "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems --- An Optimistic Approach," Proceeding of the 7th Symposium on Reliable Distributed Systems, 1988, pp. 3-12.


Extended Recovery Protocol in Distributed Systems - Hiroaki Higaki Makoto (1998)   (Correct)

....no orphan message. However, C 0 is inconsistent because c i 0 c j 0 , i.e. m 2 is an orphan message. 2.2. Checkpointing There are two approaches toward taking a consistent global checkpoint C: asynchronous checkpointing and synchronous checkpointing. In the asynchronous checkpointing [2, 5], each process takes local checkpoints independently of the other processes. If some process fails, all the processes are coordinated to determine a consistent global checkpoint. This approach implies less overhead during the failure free execution because it requires no additional communication ....

Bhargava, B. and Lian, S.R., "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems," Proc. of the 7th International Symposium on Reliable Distributed Systems, pp. 3--12 (1988).


An Asynchronous Recovery Scheme based on Optimistic Message.. - Park, Yeom (2000)   (2 citations)  (Correct)

....if the distributed recovery scheme (denoted by DR) is employed. Hence, either rollbacks of the related processes have to be synchronized as in the synchronous recovery scheme (denoted by SR) [9] or a centralized coordination is required as for the centralized recovery scheme (denoted by CR) [3]. One way to support the asynchronous recovery (denoted by AR) is to use message logging in addition to the checkpointing [6, 15]. Asynchronous recovery means that a process can independently decide its rollback and after the roll [Table: columns Checkpointing Only (DR, SR, CR) and Logging (AR); row "Asynchronous": Yes, No, No ....]

B. Bhargava and S.R. Lian. Independent checkpointing and concurrent rollback for recovery - an optimistic approach. In Proc. of the 4th Int'l Conf. on Data Engineering, pages 182--189, 1988.


Object-Based Checkpoints in Distributed Systems - Katsuya Tanaka Hiroaki (1998)   (1 citation)  (Correct)

....by some object. If the sending event of a message m_1 happens before [12] m_2, m_1 causally precedes m_2 [3]. If o is rolled back, objects which have received messages causally preceded by the messages sent by o have to be rolled back in order to prevent from the orphan messages. Papers [2, 4, 10, 14, 15, 17, 20] discuss how to take the consistent global checkpoints in the message based systems. Koo and Toueg [10] present synchronous protocols for taking the global consistent checkpoint and rolling back the processes, which are similar to the two phase commitment protocol [1, 8]. Leong and Agrawal [14] ....

....an orphan. c is consistent if there is no orphan [4] at c. In Figure 3, an object o_i sends a message m to o_j. In Figure 3(1) the checkpoint ⟨c_i, c_j⟩ is consistent with the message based definition. In Figure 3(2) ⟨c_i, c_j⟩ is inconsistent because m is an orphan. Many papers [2, 4, 6, 17] discuss how to take the consistent checkpoints in the message based system. [Figure 3: Consistent checkpoint. Panels: (1) Consistent, (2) Inconsistent; objects o_i and o_j, message m, checkpoints c_i and c_j, time axis.] Leong and Agrawal [14] discuss the concept of significant messages. For example, if a message ....

Bhargava, B. and Lian, S. R., "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems --- An Optimistic Approach," Proc. of the 7th Symp. on Reliable Distributed Systems , pp. 3-12, 1988.


Semantics of Recovery Lines for Backward Recovery in.. - Brzezinski, Helary.. (1995)   (Correct)

....during failure free computations. Indeed, if there are no process failure, recovery messages and checkpointing activity may delay the application computation. 5.2 Independent checkpointing In independent checkpointing, controllers save checkpoints asynchronously regardless of consistency ([5, 19, 31, 34, 38, 40, 46]). The aim is to reduce interference of the checkpointing with respect to application computation. However, as a consequence of this independence the last checkpoints might not constitute a consistent recovery line (with respect to the relation D). Hence, after a failure controllers must cooperate ....

B. Bhargava, S-R. Lian, Independent checkpointing and concurrent rollback for recovery in distributed system -- an optimistic approach, Symp. Reliable Distributed Systems SRDS'88, 1988 pp.


Limited-size Logging for Fault-Tolerant Distributed.. - Sultan, Nguyen, Iftode (2000)   (1 citation)  (Correct)

....systems. We then summarize previous work on recoverable DSM and present several fault tolerant DSM systems that use log based recovery and/or independent checkpointing. An exhaustive survey of general rollback recovery in distributed systems is given in [10]. In uncoordinated checkpointing [1, 42] processes take checkpoints independently and track message dependencies between them in order to determine a consistent global checkpoint by rolling back processes in response to a failure. After a failure and rollback, a recovering process collects and aggregates dependency information from all ....

....message dependencies between them in order to determine a consistent global checkpoint by rolling back processes in response to a failure. After a failure and rollback, a recovering process collects and aggregates dependency information from all processes in the form of a rollback dependency graph [1] or a checkpoint graph [42]. It then determines the recovery line and implicitly which processes need to also rollback. This approach suffers from the domino effect [32]. An optimal checkpoint garbage collection algorithm based on dependency tracking was devised in [43]. It was also proved that ....

B. Bhargava, S. R. Lian. Independent Checkpointing and Concurrent Rollback for Recovery - An Optimistic Approach. Proc. Symposium on Reliable Distributed Systems, pp. 3-12, June 1988.


Significant Checkpoint in Distributed System - Katsuya Tanaka Hiroaki (1996)   (Correct)

....is no orphan. In Fig. 3, an object o_i sends a message m to o_j. In Fig. 3(1) the checkpoint ⟨c_i, c_j⟩ is consistent from the definition. In Fig. 3(2) ⟨c_i, c_j⟩ is inconsistent because m is an orphan if o_i and o_j are rolled back to c_i and c_j, respectively. Many papers [2, 4, 5, 14] discuss how to take the consistent global checkpoint. [Fig. 3. Consistent checkpoint. Panels: (1) consistent, (2) inconsistent; objects o_i and o_j, message m, checkpoints c_i and c_j, time axis.] Leong and Agrawal [11] discuss the concept of significant messages. For example, suppose that m is write in Fig. 3. ....

Bhargava, B. and Lian, S. R., "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems --- An Optimistic Approach," Proc. of the 7th Symp. on Reliable Distributed Systems , 1988, pp. 3-12.


Recovery Protocol for Mobile Checkpointing - Higaki, Takizawa (1998)   (Correct)

....with the mobile stations may be disconnected. However, some applications are computed on mobile and fixed stations and are required to be continued even while the communication channel is disconnected. Many papers[4, 9, 11] discuss how to handle the disconnected operations. The checkpoint restart[5, 6, 10, 12, 17, 20, 21, 22] is one of the well known methods to realize reliable distributed systems. Every station s i takes a checkpoint c i where the local state information of s i is stored in the stable storage. If some station fails, s i restarts the computation from c i . A set of checkpoints taken by all the ....

....fail to take c M i due to the lack of battery capacity or the movement to the outside of the cell. If the checkpoints are taken synchronously, all the stations have to give up to take the checkpoints if some mobile station fails to take the checkpoint. Hence, asynchronous checkpointing protocols[5, 10, 21, 22] are preferable for the mobile stations. Papers[1, 15] propose the mobile asynchronous checkpointing protocols. Here, the protocol overhead is high since S i is required to take a new checkpoint of M i each time a message is transmitted between them. In this paper, we newly propose a hybrid ....

Bhargava, B. and Lian, S.R., "Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems," Proc. of the 7th International Symposium on Reliable Distributed Systems, pp. 3--12 (1988).


A Fast Rollback-Recovery Scheme based on Optimistic Message.. - Park, Yeom   (Correct)

....domino effect is that the amount of computation lost due to the rollback, called the rollback distance, is not bounded. In the worst case, the recovery line consists of a set of the initial points; i.e. the total loss of the computation in spite of checkpointing efforts. A solution proposed in [4] avoids the recursive rollbacks in such a way that the consistent recovery line is determined first by the coordination among the processes, so that the processes can directly roll back to the selected checkpoint without successive rollback trials. Though this solution avoids the recursive ....

B. Bhargava and S.R. Lian. Independent checkpointing and concurrent rollback for recovery - an optimistic approach. In Int. Conference on Data Engineering, pages 182--189, 1988.


Application Controlled Checkpointing Coordination for.. - Park, Yeom (2000)   (3 citations)  (Correct)

....is that the amount of computation lost due to the rollback, called the rollback distance, is not bounded. In the worst case, the only consistent recovery line consists of a set of the initial points; i.e. the total loss of the computation in spite of checkpointing efforts. A solution proposed in [5] avoids the recursive rollbacks by one coordinator determining the consistent set of checkpoints, so that the other processes can directly roll back to the selected points. In order to solve the unbounded rollback distance problem, various checkpointing coordination schemes have been suggested. ....

B. Bhargava and S.R. Lian, "Independent checkpointing and concurrent rollback for recovery - an optimistic approach," In Proc. of the 4th Int'l. Conference on Data Engineering, pp. 182--189, 1988.


Fault-tolerant Parallel Applications with Dynamic Parallel.. - Gerlach, Hersch (2005)   (Correct)

No context found.

B. Bhargava, S.R. Lian, Independent Checkpointing and Concurrent Rollback for Recovery -- an Optimistic Approach, Proc. IEEE Symposium on Reliable Distributed Systems, pp. 3-12, 1988


Guaranteed Deadlock Recovery: Deadlock Resolution with.. - Wang, Merritt.. (1995)   (Correct)

No context found.

B. Bhargava and S. R. Lian, "Independent checkpointing and concurrent rollback for recovery - An optimistic approach," in Proc. IEEE Symp. Reliable Distributed Syst., pp. 3--12, 1988.

CiteSeer.IST - Copyright Penn State and NEC