Abstract: Diskless checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems.
In this paper, we motivate diskless checkpointing and present the basic diskless checkpointing scheme along with several variants for improved performance. The performance of the basic scheme and its variants is evaluated on a...
Context of citations to this paper:
...these tasks store their part of the state in their local memory. This is similar to the work of Li and Plank with diskless checkpointing [9]. In memory checkpointing provides the opportunity for very fast state recovery and avoids the problem that disk storage may be very far...
Cited by:
Performance Modelling and Experimental Evaluation of Systems.. - Weerasinghe (2002)
Development of Naturally Fault Tolerant Algorithms for.. - Geist, Engelmann (2002)
A Distributed Fault-Tolerant Asynchronous Algorithm for.. - Weerasinghe, Lipsky (2001)
Active bibliography (related documents):
0.8: CLIP: A Checkpointing Tool for Message-Passing Parallel Programs - Chen (1997)
0.8: A Survey of Rollback-Recovery Protocols in.. - Elnozahy, Alvisi.. (1996)
0.6: Improving the Performance of Coordinated Checkpointers on Networks .. - Plank (1996)
Similar documents based on text:
0.7: Memory Exclusion: Optimizing the Performance of.. - Plank, Chen, Li.. (1996)
0.7: Compiler-Assisted Memory Exclusion for Fast Checkpointing - Plank, Beck, Kingsley (1995)
0.7: Algorithm-Based Diskless Checkpointing for Fault Tolerant.. - Plank, Kim, Dongarra (1995)
Related documents from co-citation:
3: Egida: An Extensible Toolkit for Low-overhead Fault-tolerance - Rao, Alvisi et al. - 1999
3: A Distributed Fault-Tolerant Asynchronous Algorithm for Performing N Tasks - Weerasinghe, Lipsky - 2001
2: MPI: A Message-Passing Interface Standard - Interface - 1994
BibTeX entry:
J. S. Plank, K. Li, and M. A. Puening. "Diskless checkpointing." IEEE Transactions on Parallel and Distributed Systems, 9(10):972-986, Oct. 1998. http://citeseer.ist.psu.edu/plank97diskles.html
@article{ plank98diskless,
author = "J. S. Plank and K. Li and M. A. Puening",
title = "Diskless Checkpointing",
journal = "IEEE Transactions on Parallel and Distributed Systems",
volume = "9",
number = "10",
pages = "972--986",
year = "1998",
url = "citeseer.ist.psu.edu/plank97diskles.html" }
Citations (may not include all citations):
191: Introduction to Parallel Computing - Kumar, Grama et al. - 1994
180: A survey of rollback-recovery protocols in message-passing s.. - Elnozahy, Johnson et al. - 1996
156: reliable secondary storage - Chen, Lee et al. - 1994
133: Manetho: Transparent rollback-recovery with low overhead - Elnozahy, Zwaenepoel - 1992
120: The performance of consistent checkpointing - Elnozahy, Johnson et al. - 1992
117: Libckpt: Transparent checkpointing under unix - Plank, Beck et al. - 1995
95: Virtual memory primitives for user programs - Appel, Li - 1991
56: Checkpointing and its applications - Wang, Huang et al. - 1995
49: PVM --- A Users' Guide and Tutorial for Networked Parallel C.. - Geist, Beguelin et al. - 1994
46: The Condor distributed processing system - Tannenbaum, Litzkow - 1995
45: MIST: PVM with transparent migration and checkpointing - Casas, Clark et al. - 1995
41: Igor: A system for program debugging via reversible executio.. - Feldman, Brown - 1989
38: A tutorial on Reed-Solomon coding for fault-tolerance in RAI.. - Plank - 1997
38: A longitudinal survey of internet host reliability - Long, Muir et al. - 1995
38: Redundant Disk Arrays: Reliable - Gibson - 1992
32: Ickp --- a consistent checkpointer for multicomputers - Plank, Li - 1994
32: EVENODD: An optimal scheme for tolerating double disk failur.. - Blaum, Brady et al. - 1994
30: Application level fault tolerance in heterogeneous networks .. - Beguelin, Seligman et al. - 1997
29: IEEE Transactions on Parallel and Distributed Systems - Li, Naughton et al. - 1994
28: Lightweight logging for lazy release consistent distributed .. - Costa, Guedes et al. - 1996
26: The checkpoint mechanism in KeyKOS - Landau - 1992
23: A case for two-level distributed recovery schemes - Vaidya - 1995
20: CATCH -- Compiler-assisted techniques for checkpointing - Li, Fuchs - 1990
20: Consistent checkpoints of PVM applications - Stellner - 1994
18: Demonic memory for process histories - Wilson, Moher - 1989
17: Checkpointing SPMD applications on transputer networks - Silva, Veer et al. - 1994
15: Impact of checkpoint latency on overhead ratio of a checkpoi.. - Vaidya - 1997
14: Faster checkpointing with N + 1 parity - Plank, Li - 1994
12: Improving the performance of coordinated checkpointers on ne.. - Plank - 1996
10: Fault tolerant matrix operations for networks of workstation.. - Plank, Kim et al. - 1997
10: Fault tolerant matrix operations for networks of workstation.. - Kim, Plank et al. - 1997
8: Transparent fault tolerance for parallel applications on net.. - Scales, Lam - 1996
7: Job and process recovery in a UNIX-based operating system - Kingsbury, Kline - 1989
7: Compressed differences: An algorithm for fast incremental ch.. - Plank, Xu et al. - 1995
5: Efficient checkpoint mechanisms for massively parallel machi.. - Chiueh, Deng - 1996
5: Parallelization of the fast multipole algorithm: Algorithm a.. - Leathrum - 1992
4: Fault-tolerance for off-the-shelf applications and hardware - Russinovich, Segall - 1995
3: Solutions to the shallow water test set using the spectral t.. - Hack, Jakob et al. - 1993
Documents on the same site (http://www.cs.utk.edu/~plank/plank/papers/):
Netsolve: An Environment for Deploying Fault-Tolerant Computing - James Plank
Memory Exclusion: Optimizing the Performance of.. - Plank, Chen, Li.. (1996)
An Efficient Checkpointing Method for Multicomputers with.. - Kai Li (1992)