Abstract: Most distributed and multiprocessor recovery schemes proposed
in the literature are designed to tolerate an arbitrary number of
failures. In this paper, we demonstrate that it is often advantageous
to use "two-level" recovery schemes. A two-level
recovery scheme tolerates the more probable failures with low
performance overhead, while the less probable failures may be
tolerated with a higher overhead. By minimizing the overhead
for the more frequently occurring failure scenarios, our approach
is ...
Context of citations to this paper:
.... selection of such an interval is for the most part a solved problem [25, 34]. There has been important research in parallel systems [16, 33, 36], but the results are less unified. No previous work has addressed the issue of processor availability following a failure in...
Cited by:
Hazim Shafi - Ibm Research Burnet
A Large-Scale Study of Failures in High-Performance.. - Bianca Schroeder Garth
Processor Allocation and Checkpoint Interval Selection in.. - Plank, Thomason (2001)
Similar documents (at the sentence level):
19.4%: A Case for Multi-Level Distributed Recovery Schemes - Vaidya (1994)
15.9%: Another Two-Level Failure Recovery Scheme: Performance Impact of.. - Vaidya (1994)
Active bibliography (related documents):
0.5: On Checkpoint Latency - Vaidya (1995)
0.4: A Survey of Rollback-Recovery Protocols in.. - Elnozahy, Alvisi.. (1996)
0.3: Some Thoughts on Distributed Recovery - Vaidya (1994)
Similar documents based on text:
0.2: Recovery Schemes for High Availability and High Performance .. - Lundberg, Häggander
0.2: Staggered Consistent Checkpointing - Vaidya (1999)
0.2: On Staggered Checkpointing - Vaidya (1996)
Related documents from co-citation:
12: Libckpt: Transparent checkpointing under Unix - Plank, Beck et al. (1995)
9: A longitudinal survey of internet host reliability - Long, Muir et al. (1995)
9: A survey of rollback-recovery protocols in message-passing systems - Elnozahy, Johnson et al. (1996)
BibTeX entry:
N. H. Vaidya, "A case for two-level distributed recovery schemes", in Proc. of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1995, pp. 64--73. http://citeseer.ist.psu.edu/vaidya95case.html
@inproceedings{ vaidya95case,
author = "Nitin H. Vaidya",
title = "A Case for Two-Level Distributed Recovery Schemes",
booktitle = "Measurement and Modeling of Computer Systems",
pages = "64--73",
year = "1995",
url = "citeseer.ist.psu.edu/vaidya95case.html" }
Citations (may not include all citations):
572: Distributed snapshots: Determining global states in distribu.. - Chandy, Lamport (1985)
109: Sender-based message logging - Johnson, Zwaenepoel (1987)
58: Nonblocking and orphan-free message logging protocols - Alvisi, Hoppe et al. (1993)
31: Queueing and Computer Science Applications - Trivedi, Statistics (1988)
27: Efficient Checkpointing on MIMD Architectures - Plank (1993)
22: A first order approximation to the optimum checkpoint interv.. - Young (1974)
21: Computer Organization & Design: The Hardware/Software Interf.. - Patterson, Hennessy (1994)
20: Roll-forward checkpointing scheme: A novel fault-tolerant ar.. - Pradhan, Vaidya (1994)
18: Analytic models for rollback and recovery strategies in data.. - Chandy, Browne et al. (1975)
15: Fail-safe PVM: A portable package for distributed programmin.. - León, Fisher et al. (1993)
15: Performance analysis of checkpointing strategies - Tantawi, Ruschitzka (1984)
11: Comparative analysis of different models of checkpointing an.. - Nicola, van Spanje (1990)
10: Performance of rollback recovery systems under intermittent .. - Gelenbe, Derochette (1978)
6: Optimal message logging protocols - Alvisi, Marzullo (1994)
5: Another two-level failure recovery scheme: Performance impac.. - Vaidya (1994)
4: Analysis of checkpointing schemes for multiprocessor systems - Ziv, Bruck (1993)
3: A case for multi-level distributed recovery schemes - Vaidya (1994)
3: Analysis of an improved distributed checkpointing algorithm - Garg, Wong (1993)
3: Efficient checkpointing over local area network - Ziv, Bruck (1994)
2: A model for roll-back recovery with multiple checkpoints - Gelenbe (1976)
Documents on the same site (http://disys.korea.ac.kr/~kibom/Recovery/ftpaper.html):
Rapid Prototyping of Parallel Fault Tolerant Systems - Nixon, Birkinshaw, Croll.. (1994)
Recovery in Multicomputers with Finite Error Detection.. - Krishna, Vaidya, Pradhan (1994)
Distributed Recovery Units: An Approach for Hybrid and.. - Nitin Vaidya (1993)
CiteSeer.IST - Copyright Penn State and NEC