A. L. Liestman and R. H. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, SE-12(11):1089--1095, November 1986.
....our contribution and discussing areas for future work. 2. Architecting Dependable Systems with Coordinated Atomic Actions. CA Actions. The CA actions [8] are a structuring mechanism for developing dependable concurrent systems through the generalization of the concepts of atomic actions [3] and transactions [4]. Atomic actions are used for controlling cooperative concurrency among a set of participating processes and for realizing coordinated forward error recovery using exception handling; transactions are used for maintaining the coherency of shared external resources that are ....
....out of several CA actions. If several exceptions have been raised concurrently they are resolved using a resolution tree imposing a partial order on all action exceptions, and the participants handle the resolved exception [3]. Unlike classical action nesting where a subset of action participants enters into a nested action, composed CA actions are autonomous entities with their own participants and external objects. In this model, a participant of a CA action can dynamically initiate the creation of a composed CA action (or dynamically nested action). The internal structure of a ....
R. H. Campbell and B. Randell. Error recovery in asynchronous systems. IEEE Transactions on Software Engineering, SE-12(8), 1986.
....are well known: they are distributed transactions and atomic actions. Distributed transactions [8] use backward error recovery as the main fault tolerance measure in order to satisfy completely or partially the ACID (atomicity, consistency, isolation, durability) properties. Atomic actions [3] allow programmers to apply both backward and forward error recovery. The latter relies on coordinated handling of action exceptions that involves all action participants. Backward error recovery has a limited applicability, and in spite of all its advantages, modern systems are increasingly ....
....can access resources that have ACID properties. Action participants either reach the end of the action and produce a normal outcome or, if one or more exceptions are raised, they are all involved in their coordinated handling. If several exceptions have been raised concurrently, they are resolved [3] using a resolution tree imposing a partial order on all action exceptions, and the participants handle the resolved exception. If this handling is successful the action completes normally, but if handling is not possible then all responsibility for recovery is passed to the containing action ....
R. H. Campbell and B. Randell. Error recovery in asynchronous systems. IEEE Transactions on Software Engineering, SE-12(8), 1986.
....Index terms: Checkpointing, Fault Tolerance, Frequency Scaling, Power Management, Real-time systems, Reliability, Voltage Scaling. 1 Introduction. Slack that exists in a schedule has been used for fault tolerance purposes, for example, to restart a task, or a part of a task, after a fault occurs [9, 18, 20, 21, 30, 33]. Checkpoints may need to be inserted in situations where the slack in scheduling may not allow an entire task to be restarted [8]. This paper presents a study showing how this slack can also be exploited to simultaneously tolerate failures and reduce the energy consumption of the system. The idea ....
....hardware is required. In space and aviation applications, reducing the hardware is important since that decreases weight, size, power consumption, and cost. Other studies have dealt with real-time scheduling providing tolerance to transient faults using a timeline and a primary-backup approach [20, 21, 26]. 8 Conclusions. We have presented two checkpointing policies that allow a real-time system to recover from failure and reduce power consumption. Both policies enable the reduction of the processor speed to the level that yields minimum energy consumption during failure-free operation. If a ....
A. Liestman and R. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, SE-12(11):1089--1095, November 1986.
....or impose restrictions on the way in which fault tolerance is carried out. Moreover, those approaches, which are briefly described below, assume that alternative tasks run with the same priorities as their respective primaries. Here we refer only to approaches which deal with temporary faults. In [10] only periodic tasks are considered; the task periods have to be multiples of each other; and the execution times of alternative tasks have to be shorter than the execution times of their respective primaries. The approach presented in [6] considers only re-execution of faulty tasks to tolerate ....
A. L. Liestman and R. H. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, SE-12(11):1089-1095, 1986.
....be used for cost or weight constraints, time redundancy becomes an effective method for achieving fault tolerance in a real-time system. The problem of task scheduling in real-time and fault-tolerant systems using time redundancy did not receive much attention in the literature. Only a few works [5, 1, 3, 4, 2] provide exact schedulability analysis for fault-tolerant real-time task sets. Usually, such approaches assume each task is composed of two different copies (primary, backup). Hence, whenever a task has a failure, the scheduling algorithm is able to guarantee the execution of the backup copy. The ....
A. L. Liestman and R. H. Campbell, "A Fault-Tolerant Scheduling Problem". IEEE Transactions on Software Engineering, 12(11), pp. 1089-1095, November 1986.
.... for participants to interact and to coordinate their execution (external objects can be used as well). The CA action mechanism also provides a basic framework for exception handling, which can support a variety of fault tolerance mechanisms aimed at tolerating both hardware and software faults [3, 12]. The use of CA action design makes it easier to prove formally that the system has certain properties, since each CA action guarantees a set of properties. These can be used to construct the proof of global system properties. Guidelines: The design phase of dependable distributed applications ....
R. H. Campbell and B. Randell. Error recovery in asynchronous systems. IEEE Transactions on Software Engineering, 12(8):811--826, 1986.
....error, and an unhandled exception in a task provokes the task termination, and the exception is lost. TransLib deals with multithreaded and/or multi-process (distributed) transactions that can raise exceptions concurrently. In this case exception resolution is needed. Exception resolution [36, 37] is used to choose an exception that represents all the exceptions that have been concurrently raised. TransLib provides a default resolution scheme that is applied when multiple fingers of a transaction raise exceptions. This default resolution scheme propagates the predefined exception ....
R. H. Campbell and B. Randell. Error Recovery in Asynchronous Systems. IEEE Transactions on Software Engineering, 12(8):811--826, August 1986.
....information, amongst a collection of peer entities. This paradigm appeared as early as in [75] where it is called multipoint association, and also in [71] where it is called conversation, a term that we avoid in order not to cause confusion with a different paradigm with the same name described in [19], and also discussed below. Multipeer interactions are the kind of interaction one might wish among managers of a distributed database, a group of commerce servers, a group of TTP servers, or a group of participants running a cryptographic agreement (e.g. contract signing). Communication ....
R. H. Campbell and B. Randell. Error recovery in asynchronous systems. IEEE Transactions on Software Engineering, 12(8):811-826, 1986.
.... cooperation should be a system design concern as we do not want to reason about it using single two-participant interactions (which can be done but can dramatically increase the responsibility of programmers and as such be error prone). The general concept of atomic actions, proposed in [4], answers all these concerns. Several participants (threads, processes, objects, etc.) enter an action and cooperate inside it to achieve joint goals (Fig. 7). They are designed to cooperate inside the action and are aware of this cooperation. These participants share work and explicitly exchange ....
....of both cooperation and competition and that it is important to allow them to be combined within one system. CA actions provide a framework for dealing with different kinds of concurrency and achieving fault tolerance by integrating and extending two complementary concepts: atomic actions [4] and atomic transactions [8]. Atomic actions are used to control cooperative concurrency and to implement coordinated error recovery whilst transactions are used to maintain the consistency of shared resources in the presence of failures and competitive concurrency (Fig. 8). Fig. 8. CA atomic ....
Campbell, R.H., Randell, B.: Error Recovery in Asynchronous Systems. IEEE Transactions on Software Engineering, SE-12, 8 (1986) 811-826
....an alternative section of the program. The point to which a process is restored is called a recovery point. To establish a recovery point it is necessary to save appropriate system state information at run time. Further details about forward and backward error recovery can be found in [And81], [Cam86] and [Rom97]. 2.6 Resumption, Termination and Signal Models. After an exception has been handled, an important consideration is whether the process that caused the exception should continue its execution. Three models have been studied to cope with this problem: ....
Campbell, R. H., and Randell, B., Error Recovery in Asynchronous Systems, IEEE Transactions on Software Engineering, August 1986, pp. 811-826
....support) to each individual conversation. Basically, these features provide error detection and recovery within conversations: when an error has been detected, the corresponding recovery starts. Conversations can use backward error recovery, forward error recovery, or a combination of these [3, 8]. In any case, recovery has to be coordinated, and all conversation participants have to be involved in it. Backward error recovery does not depend much on the application and can be implemented in a transparent way (or provided, to a considerable degree, by the conversation support) because it ....
....degree, by the conversation support) because it uses the rollback of all conversation participants to recover the system. Forward recovery usually relies on an exception mechanism and may incorporate an additional mechanism to resolve multiple exceptions raised in several conversation participants [3] (see section 3.2.1 for a more detailed discussion). This recovery is application dependent by nature and this is why only basic support and a general structuring mechanism are provided by conversations. Conversations can be nested; in this case, the execution of the nested conversation is ....
R.H. Campbell and B. Randell. Error Recovery in Asynchronous Systems. IEEE Transactions on Software Engineering, SE-12(8):811-826, August 1986.
....(PB) scheme is an example of the temporal redundancy. The triple modular redundancy (TMR) is an example of the spatial redundancy [14]. There have been some previous works on the integration of fault tolerance with real-time computing; the work from Ghosh, Melhem and Mossé [6] is one example. In [10] Liestman and Campbell present a fault-tolerant scheduling algorithm to handle transient faults. The tasks are assumed to be periodic, and the short backup copies of all the tasks are scheduled on a uniprocessor system to guarantee minimum performance. On the other hand in [13] Oh and Son present ....
A.L. Liestman and R.H. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, 12(11):1089--1095, 1986.
....second recovery block by slack scheduling. 2.5 Scheduling Fault Tolerant Tasks. In the literature, there are only very few papers which specifically deal with scheduling aspects of fault-tolerant systems. Now we provide brief outlines of research reported in this area. Liestman and Campbell [56] have proposed a set of scheduling algorithms for fault-tolerant real-time systems, executing on time-shared, frame-based, single-processor hardware. Here if the primary algorithm fails, an alternate is guaranteed to be executed before the deadline of that task. If the primary algorithm executes ....
....then the alternate need not be executed and the time reserved for it is reused (i.e. adds to the slack in the system). They considered simply periodic systems in which the arrival period of a job is fixed and is a multiple of the next smallest period. A schedule is defined as FT feasible [56], if all the requests will be serviced before their deadlines, even if all of the scheduled primary algorithms fail. A schedule is FT optimal, if it is feasible and has the maximum number of primaries scheduled compared to other feasible schedules. The method suggested involves three algorithms. ....
A. L. Liestman and R. H. Campbell. A Fault-Tolerant Scheduling Problem. IEEE Transactions on Software Engineering, 12(11):1089--95, November 1986.
No context found.
A. L. Liestman and R. H. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, SE-12(11):1089--1095, November 1986.
No context found.
A. L. Liestman and R. H. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, SE-12(11):1089--1095, November 1986.
....of application, due to space, weight and cost considerations it may not be feasible to provide space redundancy. Such systems need to exploit time redundancy techniques. Few works treat the scheduling problem in real time and fault tolerant systems using time redundancy. Liestman and Campbell [8] proposed a set of scheduling algorithms for frame based, simply periodic uniprocessor systems. Burns, Davis and Punnekkat [3] provided exact schedulability tests for fault tolerant task sets, assuming that faults are detected at the end of tasks executions and can only affect one task at a time. ....
A. L. Liestman and R. H. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, 12(11):1089--95, November 1986.
....by a binary distribution of task periods, they mean that if the tasks are ordered in terms of increasing period, then T_{i+1} = 2 T_i. The optimal result can be generalized to include conditions in which tasks are related by T_{i+1} = k T_i, where k is an integer. Though there have been several works in the literature [4, 5, 36, 45] which deal with allocation algorithms for fault-tolerant systems, they are developed under vastly different assumptions and are only remotely related to our work. Here we mention several. In order to tolerate processor failures, Balaji et al. [4] presented an algorithm to dynamically distribute ....
A.L. Liestman and R.H. Campbell, A Fault Tolerant Scheduling Problem, IEEE Transactions on Software Engineering 12(11), November 1986, 1089-1095.
....worst case load level (e.g. unexpected network congestion), a transient overload occurs, potentially causing some deadlines to be missed. The real-time system must remain robust and maintain an acceptable level of performance under a transient overload. The imprecise computation technique [1-5] was introduced as a way to deal with transient overloads. The technique is motivated by the fact that one can often trade off precision for timeliness. It prevents missed deadlines and provides graceful degradation during a transient overload by ensuring that an approximate result of acceptable ....
A. L. Liestman and R. H. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, SE-12(11):1089--1095, November 1986.
No context found.
Campbell, R., Randell, B.: Error recovery in asynchronous systems. IEEE Transactions on Software Engineering (SE) SE-12 number 8 (1986) 811--826
No context found.
A. Liestman and R. Campbell. A fault-tolerant scheduling problem. IEEE Transactions on Software Engineering, SE-12(11):1089--1095, November 1986.
No context found.
R.H. Campbell, B. Randell. Error Recovery in Asynchronous Systems. IEEE Transactions on Software Engineering, 12(8), 811-826, 1986.
No context found.
R. H. Campbell and B. Randell. Error recovery in asynchronous systems. IEEE Transactions on Software Engineering, SE-12(8):811--826, 1986.
No context found.
R. H. Campbell and B. Randell. Error recovery in asynchronous systems. IEEE Transactions on Software Engineering, SE-12(8):811--826, 1986.
No context found.
A.L. Liestman and R.H. Campbell. A Fault-Tolerant Scheduling Problem. IEEE Transactions on Software Engineering, vol. SE-12, pp. 1089-1095, Nov. 1986.