by David Powell, Michel Cukier, Jean Arlat, Yves Crouzet
ftp://ftp.laas.fr/pub/Publications/1996/96466.ps
Abstract:
It is well known that the dependability achievable by a fault-tolerant system is particularly sensitive both to the asymptotic value of coverage and to the time distribution of coverage. However, most previous work on coverage evaluation by statistical processing of the results of fault-injection experiments has been concerned only with estimating asymptotic coverage. In this paper, we tackle the problem of estimating the parameters of models that also account for coverage latency. After discussing some data sets resulting from fault-injection experiments on practical systems, we propose a series of coverage latency models that might be considered to account for the observed phenomena in a system dependability evaluation. We consider both exponential and non-exponential models, and assess their pertinence by means of a sensitivity study. We confirm previous results that latency can have an extremely important effect on the achievable dependability. We also show that the shape of the latency distribution has only a minor impact in the practical case of systems with high asymptotic coverage. A simple action model based on an exponential latency distribution is therefore proposed. We show how worst-case confidence limits can be obtained for the parameters of this action model and study the effects of the data truncation that is unavoidable in any practical measurement of latency. We conclude with a critical assessment of the proposed estimation technique and a demonstration of its application to practical data sets.
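To make the idea of an exponential latency model with truncated observations concrete, here is a minimal Python sketch. It is not the estimation procedure of the paper: it assumes a coverage function of the form C(t) = coverage * (1 - exp(-rate * t)), assumes every injected fault is observed for the same window `t_max`, and computes maximum-likelihood point estimates only (no confidence limits). The function name `fit_coverage_latency` and its parameters are hypothetical, chosen for illustration.

```python
import math

def fit_coverage_latency(latencies, n_injected, t_max):
    """Point estimates (coverage, rate) for C(t) = coverage * (1 - exp(-rate * t)).

    latencies  : non-empty list of latencies of faults covered within the window
    n_injected : total number of injected faults
    t_max      : observation window; later coverage events are never seen
    All three are assumptions of this sketch, not quantities defined in the source.
    """
    k = len(latencies)
    mean = sum(latencies) / k
    if not 0.0 < mean < t_max / 2.0:
        raise ValueError("sample mean must lie strictly between 0 and t_max / 2")

    def truncated_mean(rate):
        # Mean of an exponential restricted to [0, t_max], written with
        # exp(-x) so that a large rate * t_max cannot overflow.
        x = rate * t_max
        return 1.0 / rate - t_max * math.exp(-x) / (-math.expm1(-x))

    # truncated_mean decreases with rate, so bisect until it matches the sample mean.
    lo, hi = 1e-6 / t_max, 1.0 / mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) > mean:
            lo = mid
        else:
            hi = mid
    rate = 0.5 * (lo + hi)

    # Correct the raw fraction k / n for events that would occur after t_max;
    # clipping at 1 is a simplification of this sketch.
    coverage = min(1.0, k / (n_injected * -math.expm1(-rate * t_max)))
    return coverage, rate
```

For example, `fit_coverage_latency([0.5, 1.0, 1.5, 2.0, 3.0], 10, 10.0)` returns a coverage estimate slightly above the raw fraction 5/10, because the model attributes a small share of the uncovered faults to the finite observation window.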