18 citations found.
J. S. Plank. An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, July 1997.


This paper is cited in the following contexts:
SRS - A Framework for Developing Malleable and Migratable.. - Vadhiyar, Dongarra (2002)   (6 citations)

....migratable applications provide added functionality and flexibility to the scheduling and resource management systems for distributed computing. In order to start and stop the parallel applications, the state of the applications has to be checkpointed. Elnozahy [16] and Plank [29] have surveyed several checkpointing strategies for sequential and parallel applications. Checkpointing systems for sequential [30, 3] and parallel applications [15, 10, 4, 34, 20] have been built. Checkpointing systems are of different types depending on the transparency to the user and the ....

....on which the checkpoints are stored also fail when the machines on which the IBP depots are located fail. 5. The machine on which the RSS daemon is executing must be failure-free for the duration of the application. 7. Related Work. Checkpointing parallel applications has been widely studied in [16, 29, 25], and checkpointing systems for parallel applications have been developed [12, 10, 33, 38, 31, 15, 20, 34, 3, 23, 20, 4, 22, 21, 27]. Some of the systems were developed for homogeneous systems [12, 11, 33, 34], while some checkpointing systems allow applications to be checkpointed and restarted on ....

James S. Plank. An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance. Technical Report UT-CS-97-372, 1997.


System Checkpointing using Reflection and Program Analysis - Whaley (2001)

....has been other work on checkpointing in the context of migrating applications [10], using extra processors for fault tolerance [12], post-mortem and replay debugging, elimination of boundary-condition errors [13], etc. Our work is very similar to user-level transparent checkpointing techniques [11]. Such techniques usually work by compiling the application program with a special checkpointing library. Our technique, on the other hand, relies on program analysis and therefore can optimize the result for size and speed. Our system also shares many of the same restrictions as user-level ....

J. S. Plank. An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Tech Report UT-CS-97-372, 1997.


Low-Cost Flexible Software Fault Tolerance for Distributed.. - Tai, Tso (2001)

....[Figure 7: Reliability as a Function of Internal Message Sending Rate; y-axis: Reliability, x-axis: Internal Message Sending Rate; curves: MDCD, baseline] .... protocol. We also plan to investigate the size-reducing technique called incremental checkpointing [16], which can be applied to augment our useless-checkpoint avoidance strategy for further reduction of performance overhead. ....

J. S. Plank, "An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance," Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, Knoxville, TN, July 1997.


Dynamic Software Updating - Hicks (2001)   (37 citations)

....program construction. Furthermore, we would rather that the system perform the encoding and decoding of the state automatically, as opposed to requiring the programmer to do so. A number of general-purpose approaches to enable state transfer have been developed. For example, checkpointing [Pla97] and general-purpose persistence [PJW96] are means to capture a program's state generally and automatically for later restart, e.g., to support process migration [Smi88]. However, these approaches have a number of problems: 1. Like application-specific state transfer, OS-level data structures ....

....are checked for accuracy with programmer-provided acceptance tests, while messages that would have been sent by the old version are logged. If an erroneous message is detected, the old version is restored. So that this switchover is semantically consistent, GSU employs checkpointing technology [Pla97] to checkpoint the state of the old version when it is known to be consistent, so that the system can roll back to that state on a failure. Checkpointing is also used to enable the new version to begin with the state of the old version. Unfortunately, enabling GSU error detection and recovery ....

[Article contains additional citation context not shown here]

James S. Plank. An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Technical Report UT-CS-97-372, 1997.


An Overview of the NOMADS Mobile Agent System - Suri, Bradshaw, Breedy.. (2000)   (7 citations)

....performance, or other reasons. Strong mobility requires that the entire state of the running agent, including its execution stack, be saved prior to a move so that it can be restored once the agent has moved to its new location. The standard term describing this process is checkpointing [23]. Over the last few years, the more general concept of orthogonal persistence has also been developed by the research community [2]. The goal of orthogonal persistence research is to define language-independent principles and language-specific mechanisms by which persistence can be made available ....

Plank, J. S. (1997). An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Technical report UT-CS-97-372. Department of Computer Science, University of Tennessee, July.


Efficient and Flexible Fault Tolerance and Migration of.. - Kohl, Papadopoulos (1998)   (1 citation)

....is often adjusted to hedge against failures, and balance this loss against the periodic checkpoint overhead. CUMULVS's user-level approach to checkpointing is considered non-transparent because it requires the programmer to modify the application source to coordinate the checkpointing activity [11]. In this case, the application defines the point(s) in its computation where checkpoints can be consistently collected across concurrent sets of tasks, and which data should be included in checkpoints. Consistency here relates to identifying a global state for all tasks such that the ....

J. S. Plank, "An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance," Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, Knoxville, TN, July 1997.


NOMADS: Toward an Environment for Strong and Safe.. - Suri, Bradshaw.. (2000)

....performance, or other reasons. Strong mobility requires that the entire state of the running agent, including its execution stack, be saved prior to a move so that it can be restored once the agent has moved to its new location. The standard term describing this process is checkpointing [22]. Over the last few years, the more general concept of orthogonal persistence has also been developed by the research community [2]. The goal of orthogonal persistence research is to define language-independent principles and language-specific mechanisms by which persistence can be made available ....

Plank, J. S. (1997). An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Technical report UT-CS-97-372. Department of Computer Science, University of Tennessee, July.


High Availability for Software DSM Systems - Vellanki, Harel, Jeong, Lee..

....node to use computation performed by the failed node, the failed node periodically writes a checkpoint record containing its computation state. The checkpoint record contains sufficient information to restart computation from the point of writing the checkpoint record. A number of recent studies [24, 23] have addressed the issue of checkpointing a process and restarting it at a later time; the state of computation can be recovered by checkpointing the contents of the stack, the CPU context, and the values of private and heap variables. In this paper, we focus on checkpointing and restarting the ....

....the cost of recovering from a failure. A survey of recoverable distributed shared memory systems is provided in [18]. To support high availability, the run-time system has to save the state of the machine periodically. Mechanisms for checkpointing are discussed in [24] for a uniprocessor and in [23] for a cluster of machines. In coordinated checkpointing [28, 15], the system checkpoints the state of all threads simultaneously by stopping computation periodically. In order to avoid expensive checkpointing operations, some designs [21, 22, 8] checkpoint infrequently while logging changes to ....

James S. Plank. An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Technical report, University of Tennessee, July 1997.


Kckpt: Checkpoint and Recovery Facility on UnixWare Kernel - Hong, Ahn, Han, Park..

....status of open files [4]. After a failure, the state of memory and registers can be reconstructed and the target program can be restarted from the most recent checkpoint. In a uniprocessor system environment, a checkpoint and recovery facility can be provided either at the kernel level or at the user level [1, 3, 9]. A user-level checkpoint and recovery library can save the state of a process by linking the user program with the checkpointing library. This facility can provide flexibility to programmers, but there are also many restrictions. First, a user-level checkpoint library should take a checkpoint through a ....

....on the stable storage. The overhead of sequential checkpointing is the same as the time it takes to write the checkpoint file to disk [8]. Kckpt provides forked checkpointing to reduce the checkpoint overhead. Since the most time-consuming portions of checkpointing do not require the use of the CPU [9], in the forked checkpointing scheme a child process can take responsibility for writing the checkpoint to disk. Thus, the parent process can continue to execute the application program concurrently while the child process writes the checkpoint to disk. In this approach, when a process takes a ....

James S. Plank, "An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance," Technical Report UT-CS-97-372, University of Tennessee, July 1997.
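The forked-checkpointing idea in the Kckpt excerpt above (a child process writes the checkpoint to disk while the parent keeps running) can be illustrated at user level. The following is a minimal sketch, assuming a POSIX system where Python's `os.fork()` is available; the function name `forked_checkpoint` and the pickle-based checkpoint format are hypothetical choices for illustration, not Kckpt's kernel-level mechanism. It relies on `fork()` giving the child a copy-on-write image of the parent's memory, so the child serializes the state exactly as it was at the moment of the fork.

```python
import os
import pickle
import tempfile


def forked_checkpoint(state, path):
    """Hypothetical user-level forked checkpoint (POSIX only).

    The child inherits a copy-on-write snapshot of `state` at fork time,
    writes it to `path`, and exits; the parent returns immediately.
    """
    pid = os.fork()
    if pid == 0:
        # Child: serialize the snapshot, then exit without running the
        # parent's cleanup handlers (os._exit instead of sys.exit).
        try:
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(state, f)
                f.flush()
                os.fsync(f.fileno())  # push the data to stable storage
            os.replace(tmp, path)  # atomic rename: no half-written checkpoint
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: keep computing; reap the child later with os.waitpid.
    return pid


if __name__ == "__main__":
    state = {"iteration": 0, "data": list(range(5))}
    path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
    pid = forked_checkpoint(state, path)
    state["iteration"] = 1  # parent continues; the snapshot is unaffected
    os.waitpid(pid, 0)
    with open(path, "rb") as f:
        print(pickle.load(f)["iteration"])  # prints 0
```

The parent pays only for the `fork()` call itself; the serialization and disk writes happen in the child. As a design caveat, forking a process that has other running threads can deadlock the child, so a sketch like this is only safe in single-threaded programs.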


An Empirical Study of Tracing Techniques - From Failure Analysis

No context found.

J. S. Plank. An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, July 1997.


Virtual Machine Based Heterogeneous Checkpointing - Adnan Agbaria Roy (2000)

No context found.

J. S. Plank. An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance. Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, July 1997.


Market-based Cluster Resource Management - Chun (2001)   (2 citations)

No context found.

James S. Plank. An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Technical Report UT-CS-97-372, University of Tennessee, Department of Computer Science, 1997.


Distributed Snapshots for Mobile Computing Systems - Adnan Agbaria William

No context found.

J. S. Plank. An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance. Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, July 1997.


Overcoming Byzantine Failures Using Checkpointing - Adnan Agbaria Roy

No context found.

J. S. Plank. An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance. Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, July 1997.


Open and Survivable Embedded Systems - Angelos Keromytis Stephen

No context found.

J. S. Plank. An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. Technical Report UT-CS-97-372, 1997.


Using Lightweight Checkpoint/Recovery to Improve the Availability.. - Sorin (2002)

No context found.

James S. Plank. An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance. Technical Report UT-CS-97-372, Department of Computer Science, University of Tennessee, July 1997.


Techniques for Efficiently Recording State Changes of a Computer.. - II (2001)

No context found.

J. S. Plank, "An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance," Tech. Report UT-CS-97-372, Department of Computer Science, University of Tennessee, Knoxville, Tenn., 1997.


Agents for the Masses? - Bradshaw, Greaves, Holmback.. (1999)

No context found.

J. S. Plank, An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance, Tech. Report UT-CS-97-372, Dept. of Computer Science, Univ. of Tennessee, Knoxville, Tenn., 1997.


CiteSeer.IST - Copyright Penn State and NEC