Diskless Checkpointing (1997)  (5 citations)
James S. Plank, Kai Li, Michael Puening
IEEE Transactions on Parallel and Distributed Systems



Abstract: Diskless Checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motivate diskless checkpointing and present the basic diskless checkpointing scheme along with several variants for improved performance. The performance of the basic scheme and its variants is evaluated on a...

Context of citations to this paper:

...these tasks store their part of the state in their local memory. This is similar to the work of Li and Plank with diskless checkpointing [9]. In memory checkpointing provides the opportunity for very fast state recovery and avoids the problem that disk storage may be very far...

Cited by:
Performance Modelling and Experimental Evaluation of Systems.. - Weerasinghe (2002)
Development of Naturally Fault Tolerant Algorithms for.. - Geist, Engelmann (2002)
A Distributed Fault-Tolerant Asynchronous Algorithm for.. - Weerasinghe, Lipsky (2001)

Active bibliography (related documents):
0.8:   CLIP: A Checkpointing Tool for Message-Passing Parallel Programs - Chen (1997)
0.8:   A Survey of Rollback-Recovery Protocols in.. - Elnozahy, Alvisi.. (1996)
0.6:   Improving the Performance of Coordinated Checkpointers on Networks .. - Plank (1996)

Similar documents based on text:
0.7:   Memory Exclusion: Optimizing the Performance of.. - Plank, Chen, Li.. (1996)
0.7:   Compiler-Assisted Memory Exclusion for Fast Checkpointing - Plank, Beck, Kingsley (1995)
0.7:   Algorithm-Based Diskless Checkpointing for Fault Tolerant.. - Plank, Kim, Dongarra (1995)

Related documents from co-citation:
3:   Egida: An Extensible Toolkit for Low-overhead Fault-tolerance - Rao, Alvisi et al. - 1999
3:   A Distributed Fault-Tolerant Asynchronous Algorithm for Performing N Tasks - Weerasinghe, Lipsky - 2001
2:   MPI: A Message-Passing Interface Standard - Message Passing Interface Forum - 1994

BibTeX entry:

J. S. Plank, K. Li, and M. A. Puening. "Diskless checkpointing." IEEE Transactions on Parallel and Distributed Systems, 9(10):972-986, Oct. 1998. http://citeseer.ist.psu.edu/plank97diskles.html

@article{ plank98diskless,
    author = "J. S. Plank and K. Li and M. A. Puening",
    title = "Diskless Checkpointing",
    journal = "IEEE Transactions on Parallel and Distributed Systems",
    volume = "9",
    number = "10",
    pages = "972--986",
    year = "1998",
    url = "http://citeseer.ist.psu.edu/plank97diskles.html" }

Citations (may not include all citations):
191   Introduction to Parallel Computing - Kumar, Grama et al. - 1994
180   A survey of rollback-recovery protocols in message-passing s.. - Elnozahy, Johnson et al. - 1996
156   reliable secondary storage - Chen, Lee et al. - 1994
133   Manetho: Transparent rollback-recovery with low overhead - Elnozahy, Zwaenepoel - 1992
120   The performance of consistent checkpointing - Elnozahy, Johnson et al. - 1992
117   Libckpt: Transparent checkpointing under unix - Plank, Beck et al. - 1995
95   Virtual memory primitives for user programs - Appel, Li - 1991
56   Checkpointing and its applications - Wang, Huang et al. - 1995
49   PVM --- A Users' Guide and Tutorial for Networked Parallel C.. - Geist, Beguelin et al. - 1994
46   The Condor distributed processing system - Tannenbaum, Litzkow - 1995
45   MIST: PVM with transparent migration and checkpointing - Casas, Clark et al. - 1995
41   Igor: A system for program debugging via reversible executio.. - Feldman, Brown - 1989
38   A tutorial on Reed-Solomon coding for fault-tolerance in RAI.. - Plank - 1997
38   A longitudinal survey of internet host reliability - Long, Muir et al. - 1995
38   Redundant Disk Arrays: Reliable - Gibson - 1992
32   Ickp --- a consistent checkpointer for multicomputers - Plank, Li - 1994
32   EVENODD: An optimal scheme for tolerating double disk failur.. - Blaum, Brady et al. - 1994
30   Application level fault tolerance in heterogeneous networks .. - Beguelin, Seligman et al. - 1997
29   IEEE Transactions on Parallel and Distributed Systems - Li, Naughton et al. - 1994
28   Lightweight logging for lazy release consistent distributed .. - Costa, Guedes et al. - 1996
26   The checkpoint mechanism in KeyKOS - Landau - 1992
23   A case for two-level distributed recovery schemes - Vaidya - 1995
20   CATCH -- Compiler-assisted techniques for checkpointing - Li, Fuchs - 1990
20   Consistent checkpoints of PVM applications - Stellner - 1994
18   Demonic memory for process histories - Wilson, Moher - 1989
17   Checkpointing SPMD applications on transputer networks - Silva, Veer et al. - 1994
15   Impact of checkpoint latency on overhead ratio of a checkpoi.. - Vaidya - 1997
14   Faster checkpointing with N + 1 parity - Plank, Li - 1994
12   Improving the performance of coordinated checkpointers on ne.. - Plank - 1996
10   Fault tolerant matrix operations for networks of workstation.. - Plank, Kim et al. - 1997
10   Fault tolerant matrix operations for networks of workstation.. - Kim, Plank et al. - 1997
8   Transparent fault tolerance for parallel applications on net.. - Scales, Lam - 1996
7   Job and process recovery in a UNIX-based operating system - Kingsbury, Kline - 1989
7   Compressed differences: An algorithm for fast incremental ch.. - Plank, Xu et al. - 1995
5   Efficient checkpoint mechanisms for massively parallel machi.. - Chiueh, Deng - 1996
5   Parallelization of the fast multipole algorithm: Algorithm a.. - Leathrum - 1992
4   Fault-tolerance for off-the-shelf applications and hardware - Russinovich, Segall - 1995
3   Solutions to the shallow water test set using the spectral t.. - Hack, Jakob et al. - 1993



Documents on the same site (http://www.cs.utk.edu/~plank/plank/papers/):
Netsolve: An Environment for Deploying Fault-Tolerant Computing - James Plank
Memory Exclusion: Optimizing the Performance of.. - Plank, Chen, Li.. (1996)
An Efficient Checkpointing Method for Multicomputers with.. - Kai Li (1992)
