Results 1 - 10 of 143
Basic concepts and taxonomy of dependable and secure computing
- IEEE TDSC, 2004
Cited by 779 (6 self)
This paper gives the main definitions relating to dependability, a generic concept including as special case such attributes as reliability, availability, safety, integrity, maintainability, etc. Security brings in concerns for confidentiality, in addition to availability and integrity. Basic definitions are given first. They are then commented upon, and supplemented by additional definitions, which address the threats to dependability and security (faults, errors, failures), their attributes, and the means for their achievement (fault prevention, fault tolerance, fault removal, fault forecasting). The aim is to explicate a set of general concepts, of relevance across a wide range of situations and, therefore, helping communication and cooperation among a number of scientific and technical communities, including ones that are concentrating on particular types of system, of system failures, or of causes of system failures.
The Timed Asynchronous Distributed System Model
- 1999
Cited by 191 (19 self)
We propose a formal definition for the timed asynchronous distributed system model. We present extensive measurements of actual message and process scheduling delays and hardware clock drifts. These measurements confirm that this model adequately describes current distributed systems such as a network of workstations. We also give an explanation of why practically needed services, such as consensus or leader election, which are not implementable in the time-free model, are implementable in the timed asynchronous system model.
A Metaobject Architecture for Fault Tolerant Distributed Systems: The FRIENDS Approach
- IEEE Transactions on Computers, 1998
Cited by 88 (12 self)
The FRIENDS system developed at LAAS-CNRS is a metalevel architecture providing libraries of metaobjects for fault tolerance, secure communications and group-based distributed applications. The use of metaobjects provides a nice separation of concerns between mechanisms and applications. Metaobjects can be used transparently by applications and can be composed according to the needs of a given application, a given architecture and its underlying properties. In FRIENDS, metaobjects are used recursively to add new properties to applications. They are designed using an object-oriented design method and implemented on top of basic system services. This paper describes the FRIENDS software-based architecture, the object-oriented development of metaobjects, the experiments that we have done and summarises the advantages and drawbacks of a metaobject approach for building fault tolerant systems.
The Timely Computing Base Model and Architecture
- IEEE Transactions on Computers, 2002
Fault Injection and Dependability Evaluation of Fault-Tolerant Systems
- IEEE Trans. Computers, 1993
Cited by 73 (13 self)
This paper describes a dependability evaluation method based on fault injection that establishes the link between the experimental evaluation of the fault tolerance process and the fault occurrence process. The main characteristics of a fault injection test sequence aimed at evaluating the coverage of the fault tolerance process are presented. Emphasis is given to the derivation of experimental measures. The various steps by which the fault occurrence and fault tolerance processes are combined to evaluate dependability measures are identified and their interactions are analyzed. The method is illustrated by an application to the dependability evaluation of the distributed fault-tolerant architecture of the ESPRIT Delta-4 Project. Index Terms: Coverage, dependability modeling and evaluation, experimental evaluation, fault injection, fault tolerance, Markov chains.
Monitoring, Testing, and Debugging of Distributed Real-Time Systems
- 2000
Cited by 56 (1 self)
Testing is an important part of any software development project, and can typically account for more than half of the development cost. For safety-critical computer-based systems, testing is even more important due to stringent reliability and safety requirements. However, most safety-critical computer-based systems are real-time systems, and the majority of current testing and debugging techniques have been developed for sequential (non-real-time) programs. These techniques are not directly applicable to real-time systems, since they disregard issues of timing and concurrency. This means that existing techniques for reproducible testing and debugging cannot be used. Reproducibility is essential for regression testing and cyclic debugging, where the same test cases are run repeatedly with the intention of verifying modified program code or tracking down errors. The current trend in consumer and industrial applications is a move from single microcontrollers to sets of distributed microcontrollers, which are even more challenging to handle than real-time per se, since multiple loci of observation and control must additionally be considered. In this thesis we try to remedy these problems by presenting an integrated approach to monitoring, testing, and debugging of distributed real-time systems. For monitoring...
Implementing Fault Tolerant Applications using Reflective Object-Oriented Programming
- Proceedings of the 25th IEEE International Symposium on Fault-Tolerant Computing, Pasadena (CA), 1995
Cited by 52 (8 self)
This paper shows how reflection and object-oriented programming can be used to ease the implementation of classical fault tolerance mechanisms in distributed applications. When the underlying runtime system does not provide fault tolerance transparently, classical approaches to implementing fault tolerance mechanisms often imply mixing functional programming with non-functional programming (e.g. error processing mechanisms). The use of reflection improves the transparency of fault tolerance mechanisms to the programmer and more generally provides a clearer separation between functional and non-functional programming. The implementations of some classical replication techniques using a reflective approach are presented in detail and illustrated by several examples, which have been prototyped on a network of Unix workstations. Lessons learnt from our experiments are drawn and future work is discussed.
Dependable computing: concepts, limits, challenges
- In Proceedings 25th IEEE International Symposium on Fault-Tolerant Computing, 1995
Cited by 52 (0 self)
Our society is faced with an ever increasing dependence on computing systems, which leads us to question the limits of their dependability and the challenges raised by those limits. In order to respond to these questions, a global conceptual and terminological framework is needed, which is first given. The limits and challenges in dependability are then addressed, from technical and financial viewpoints. The recognition that design faults are the major limiting factor leads to recommending the extension of fault tolerance from products to their production process.
Adaptive Distributed and Fault-Tolerant Systems
- International Journal of Computer Systems Science and Engineering, 1995
Cited by 51 (6 self)
An adaptive computing system is one that modifies its behavior based on changes in the environment. Since sites connected by a local-area network inherently have to deal with network congestion and the failure of other sites, distributed systems can be viewed as an important subclass of adaptive systems. As such, use of adaptive methods in this context has the same potential advantages of improved efficiency and structural simplicity as for adaptive systems in general. This paper describes a model for adaptive systems that can be applied in many scenarios arising in distributed and fault-tolerant systems. This model divides the adaptation process into three different phases (change detection, agreement, and action) that can be used to describe existing algorithms that deal with change, as well as to develop new adaptive algorithms. In addition to clarifying the logical structure of such algorithms, this model can also serve as a unifying implementation framework. Several ad...
Fail-Awareness in Timed Asynchronous Systems
- 2003
Cited by 49 (15 self)
We address the problem of the impossibility of implementing synchronous fault-tolerant service specifications in asynchronous distributed systems. We introduce a method for weakening a synchronous service specification so that it becomes implementable in "timed" asynchronous systems, that is, asynchronous systems in which processes have access to local hardware clocks. The method (1) adds to a service interface an exception indicator so that a client knows at any time if a server is currently providing its standard "synchronous" semantics or some other specified exceptional semantics, (2) requires that the standard behavior provided when the exception indicator does not signal an exception be "similar" to the original synchronous service behavior, and (3) requires a server to provide its standard semantics whenever the underlying communication and process services exhibit "synchronous behavior". To illustrate our method, we show how the specification of a synchronous datagram service and an internal clock synchronization service can be transformed into a fail-aware service specification. Further illustrations of the usefulness of fail-aware services are provided by describing a railway crossing service and a fail-aware weak group membership service.
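The exception-indicator idea in the abstract above can be illustrated with a rough sketch. This is not the paper's specification: the names `FailAwareResult` and `fail_aware_call`, the `deadline` parameter, and the use of a local monotonic clock as a stand-in for a hardware clock are all assumptions made for illustration. The sketch only shows the interface shape, where every reply carries a flag telling the client whether the standard timing semantics held.

```python
import time
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class FailAwareResult(Generic[T]):
    """Hypothetical reply type: a value plus an exception indicator."""
    value: T
    exception: bool   # True: the standard ("synchronous") semantics are NOT guaranteed
    elapsed: float    # time measured on the local clock, in seconds


def fail_aware_call(
    operation: Callable[[], T],
    deadline: float,
    clock: Callable[[], float] = time.monotonic,
) -> FailAwareResult[T]:
    """Run `operation` and flag the result if it overran `deadline`.

    The local clock is read before and after the call. If the measured
    duration exceeds the deadline, the result is still returned, but the
    exception indicator is raised so the client knows the timing
    assumption was violated for this call.
    """
    start = clock()
    value = operation()
    elapsed = clock() - start
    return FailAwareResult(value=value, exception=elapsed > deadline, elapsed=elapsed)
```

A client would inspect `result.exception` before relying on any timing guarantee, and fall back to its own exceptional handling when the flag is set. Passing the clock as a parameter is a design choice that makes the sketch testable with a simulated clock.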