An Efficient Fault-tolerant Scheduling Algorithm for Real-time Tasks with Precedence Constraints
- Proc. 31st Int’l Conf. Parallel Processing
, 2002
Cited by 46 (15 self)
In this paper, we investigate an efficient off-line scheduling algorithm in which real-time tasks with precedence constraints are executed in a heterogeneous environment. It provides more features and capabilities than existing algorithms that schedule only independent tasks in real-time homogeneous systems. In addition, the proposed algorithm takes the heterogeneities of computation, communication and reliability into account, thereby improving the reliability. To provide fault-tolerant capability, the algorithm employs a primary-backup copy scheme that enables the system to tolerate permanent failures in any single processor. In this scheme, a backup copy is allowed to overlap with other backup copies on the same processor, as long as their corresponding primary copies are allocated to different processors. Tasks are judiciously allocated to processors so as to reduce the schedule length as well as the reliability cost, defined to be the product of processor failure rate and task execution time. In addition, the time for detecting and handling a permanent fault is incorporated into the scheduling scheme, thus making the algorithm more practical. To quantify the combined performance of fault-tolerance and schedulability, the performability measure is introduced. Compared with the existing scheduling algorithms in the literature, our scheduling algorithm achieves an average of 16.4% improvement in reliability and an average of 49.3% improvement in performability.
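As a rough illustration of two ideas in this abstract, the sketch below computes a reliability cost and checks the backup-overlap rule. It is only a minimal sketch under stated assumptions, not the paper's algorithm: the abstract defines the cost as the product of processor failure rate and task execution time, and summing that product over all allocated tasks is an assumption made here. The names `reliability_cost`, `backups_may_overlap`, `allocation`, `exec_time`, and `failure_rate` are hypothetical.

```python
from typing import Dict


def reliability_cost(allocation: Dict[str, str],
                     exec_time: Dict[str, Dict[str, float]],
                     failure_rate: Dict[str, float]) -> float:
    """Return the total of failure_rate[p] * exec_time[t][p] over the schedule.

    allocation maps each task to the processor it runs on.
    Summing the per-task products is an assumption of this sketch.
    """
    return sum(failure_rate[proc] * exec_time[task][proc]
               for task, proc in allocation.items())


def backups_may_overlap(primary_proc_a: str, primary_proc_b: str) -> bool:
    """Two backup copies on the same processor may share a time slot only if
    their primary copies sit on different processors, so that a single
    processor failure can never activate both backups at once."""
    return primary_proc_a != primary_proc_b


# Hypothetical two-processor, two-task example.
failure_rate = {"p1": 0.001, "p2": 0.002}
exec_time = {"t1": {"p1": 10.0, "p2": 5.0},
             "t2": {"p1": 4.0, "p2": 8.0}}
allocation = {"t1": "p2", "t2": "p1"}
cost = reliability_cost(allocation, exec_time, failure_rate)
```

A lower value of `cost` would correspond to a schedule that spends less time on failure-prone processors, which is the quantity the abstract says the allocation tries to reduce alongside schedule length.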
Fault-Tolerant Deployment of Embedded Software for Cost-Sensitive Real-Time Feedback-Control Applications
- In Procs. of Design Automation and Test in Europe
, 2004
Cited by 32 (5 self)
Designing cost-sensitive real-time control systems for safety critical applications requires a careful analysis of the cost/coverage trade-offs of fault-tolerant solutions. This further complicates the difficult task of deploying the embedded software that implements the control algorithms on the execution platform that is often distributed around the plant (as it is typical, for instance, in automotive applications). We propose a synthesis-based design methodology that relieves the designers from the burden of specifying detailed mechanisms for addressing platform faults, while involving them in the definition of the overall fault-tolerance strategy. Thus, they can focus on addressing plant faults within their control algorithms, selecting the best components for the execution platform, and defining an accurate fault model. Our approach is centered on a new model of computation, Fault Tolerant Data Flows (FTDF), that enables the integration of formal validation techniques.
An Algorithm for Automatically Obtaining Distributed and Fault-Tolerant Static Schedules
- In International Conference on Dependable Systems and Networks, DSN’03
, 2003
Cited by 30 (7 self)
Our goal is to automatically obtain a distributed and fault-tolerant embedded system: distributed because the system must run on a distributed architecture; fault-tolerant because the system is critical. Our starting point is a source algorithm, a target distributed architecture, some distribution constraints, some indications on the execution times of the algorithm operations on the processors of the target architecture, some indications on the communication times of the data-dependencies on the communication links of the target architecture, a number Npf of fail-silent processor failures that the obtained system must tolerate, and finally some real-time constraints that the obtained system must satisfy. In this article, we present a scheduling heuristic which, given all these inputs, produces a fault-tolerant, distributed, and static scheduling of the algorithm on the architecture, with an indication whether or not the real-time constraints are satisfied. The algorithm we propose consists of a list scheduling heuristic based on an active replication strategy that allows at least Npf+1 replicas of an operation to be scheduled on different processors, which are run in parallel to tolerate at most Npf failures. Due to the strategy used to schedule operations, simulation results show that the proposed heuristic improves the performance of our method, both in the absence and in the presence of failures.
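The active replication idea can be sketched as a greedy placement: each operation gets one more replica than the number of processor failures to tolerate, and every replica goes on a different processor. This is only an illustrative sketch, not the heuristic from the paper. It assumes the operations are already listed in a valid order, and it ignores communication times, precedence delays, and distribution constraints. The function name `schedule_with_replicas` and the parameter `npf` (the number of tolerated failures) are hypothetical.

```python
from typing import Dict, List, Tuple

Placement = Tuple[str, float, float]  # (processor, start, finish)


def schedule_with_replicas(ops: List[str],
                           exec_time: Dict[str, Dict[str, float]],
                           processors: List[str],
                           npf: int) -> Dict[str, List[Placement]]:
    """Place npf + 1 replicas of each operation on distinct processors.

    Greedy rule (an assumption of this sketch): pick the processors on
    which the replica would finish earliest.
    """
    if npf + 1 > len(processors):
        raise ValueError("need at least npf + 1 processors")
    ready = {p: 0.0 for p in processors}  # time at which each processor is free
    schedule: Dict[str, List[Placement]] = {}
    for op in ops:
        # Rank processors by the finish time the replica would have there.
        ranked = sorted(processors,
                        key=lambda p: (ready[p] + exec_time[op][p], p))
        placements = []
        for proc in ranked[:npf + 1]:
            start = ready[proc]
            finish = start + exec_time[op][proc]
            ready[proc] = finish
            placements.append((proc, start, finish))
        schedule[op] = placements
    return schedule


# Hypothetical example: three processors, tolerate one failure.
exec_time = {"a": {"p1": 2.0, "p2": 3.0, "p3": 4.0},
             "b": {"p1": 1.0, "p2": 1.0, "p3": 1.0}}
sched = schedule_with_replicas(["a", "b"], exec_time, ["p1", "p2", "p3"], npf=1)
```

Because all replicas run in parallel, any single fail-silent processor failure still leaves at least one replica of every operation on a working processor.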
A Formal Approach to Fault Tree Synthesis for the Analysis of Distributed Fault Tolerant Systems
- PROCS. OF THE 5TH ACM INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE
, 2005
Cited by 10 (2 self)
Designing cost-sensitive real-time control systems for safety-critical applications requires a careful analysis of both performance versus cost aspects and fault coverage of fault tolerant solutions. This further complicates the difficult task of deploying the embedded software that implements the control algorithms on a possibly distributed execution platform (for instance in automotive applications). In this paper, we present a novel technique for constructing a fault tree that models how component faults may lead to system failure. The fault tree enables us to use existing commercial analysis tools to assess a number of dependability metrics of the system. Our approach is centered on a model of computation, Fault Tolerant Data Flow (FTDF), that enables the integration of formal verification techniques. This new analysis capability is added to an existing design framework, also based on FTDF, that enables a synthesis-based, correct-by-construction, design methodology for the deployment of real-time feedback control systems in safety critical applications.
A scheduling heuristics for distributed real-time embedded systems tolerant to processor . . .
, 2004
Fault-tolerant distributed deployment of embedded control software
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 2008
Cited by 7 (0 self)
Safety-critical feedback-control applications may suffer faults in the controlled plant as well as in the execution platform, i.e., the controller. Control theorists design the control laws to be robust with respect to the former kind of faults while assuming an idealized scenario for the latter. The execution platforms supporting modern real-time embedded systems, however, are distributed architectures made of heterogeneous components that may incur transient or permanent faults. Making the platform fault tolerant involves the introduction of design redundancy with obvious impact on the final cost. We present a design flow that enables the efficient exploration of redundancy/cost tradeoffs. After providing a system-level specification of the target platform and the fault model, designers can rely on the synthesis of the low-level fault-tolerance mechanisms. This is performed automatically as part of the embedded software deployment through the combination of the following three steps: replication, mapping, and scheduling. Our approach has a sound foundation in fault-tolerant data flow, a novel model of computation that simplifies the integration of formal validation techniques. Finally, we report on the application of our design flow to two case studies from the automotive industry: a steer-by-wire system from General Motors and a drive-by-wire system from BMW. Index Terms: Automotive electronics, embedded control software, fault tolerance, real-time embedded systems.
An Active Replication Scheme that Tolerates Failures in Distributed Embedded Real-Time Systems
- in Proceedings of IFIP Working Conference on Distributed and Parallel Embedded Systems, DIPES’04
, 2004
Cited by 7 (1 self)
Embedded real-time systems are being increasingly used in a major part of critical applications. In these systems, critical real-time constraints must be satisfied even in the presence of failures. In this paper, we present a new method based on graph transformation that introduces fault-tolerance in building embedded real-time systems. The proposed method targets distributed architectures and can tolerate a fixed number of arbitrary processor and communication link failures. Because of the resource limitation in embedded systems, our method uses a software-based replication technique to provide fault-tolerance. Finally, since we use graph transformation to perform replication, our method may be used by any off-line distribution-scheduling algorithm to generate a fault-tolerant distributed schedule.
A Fault-tolerant Real-time Scheduling Algorithm for Precedence-Constrained Tasks in Distributed Heterogeneous Systems
, 2001
Cited by 2 (2 self)
In this paper, we propose and evaluate a fault-tolerant real-time scheduling algorithm that can tolerate one processor's permanent fault in a heterogeneous distributed system. Workload in this study consists of a stream of real-time jobs where each job contains multiple precedence-constrained tasks with individual deadlines. A Primary Backup (PB) model is employed, where each real-time task has two copies, i.e. a primary one and a backup one, that are allocated to two different processors. The backup copy executes only if the primary copy fails due to the failure of its assigned processor. The proposed scheduling algorithm also takes the reliability measure into account, in order to further enhance the reliability of the heterogeneous system. In addition, the detection time for permanent fault is incorporated into the scheduling scheme so as to make the scheduling result more realistic and accurate. Simulation results show that the proposed algorithm provides significantly improved reliability and schedulability.
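The Primary Backup (PB) model described in this abstract can be illustrated with a small sketch that decides which copy of each task runs after a single permanent processor failure. This is an illustration only, not the scheduling algorithm from the paper: it ignores deadlines, precedence constraints, and fault detection time. The function name `copies_to_run` and the shape of `allocation` are hypothetical.

```python
from typing import Dict, Optional, Tuple


def copies_to_run(allocation: Dict[str, Tuple[str, str]],
                  failed: Optional[str]) -> Dict[str, str]:
    """Return, for each task, the processor whose copy actually executes.

    allocation maps task -> (primary_processor, backup_processor).
    The backup runs only when the primary's processor is the failed one.
    """
    result = {}
    for task, (primary, backup) in allocation.items():
        if primary == backup:
            # The PB model requires the two copies on different processors.
            raise ValueError(f"task {task}: primary and backup share {primary}")
        result[task] = backup if primary == failed else primary
    return result


# Hypothetical allocation of two tasks on three processors.
allocation = {"t1": ("p1", "p2"), "t2": ("p2", "p3")}
no_fault = copies_to_run(allocation, failed=None)
p1_fault = copies_to_run(allocation, failed="p1")
```

With no fault every primary runs; when `p1` fails, only `t1` falls back to its backup on `p2`, while `t2` is unaffected because its primary is on a healthy processor.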
Reducing the Cost of Redundant Execution in Safety-Critical Systems using Relaxed Dedication
Cited by 1 (1 self)
We introduce on-demand redundancy, a set of architectural techniques that leverage the tightly-coupled nature of components in systems-on-chip to reduce the cost of safety-critical systems. On-demand redundancy eases the assumptions that traditionally segregate the execution of critical and non-critical tasks (NCTs), making resources available for critical tasks at potentially arbitrary points in both space and time, and otherwise freeing resources to execute non-critical tasks when critical tasks are not executing. Relaxed dedication is one such technique that allows non-critical tasks to execute on critical task resources. Our results demonstrate that for a wide variety of applications and architectures, relaxed dedication is more cost-effective than a traditional approach that employs dedicated resources executing in lockstep. Applied to dual-modular redundancy (DMR), relaxed dedication exposes 73% more NCT cycles than traditional DMR on average, across a wide variety of usage scenarios.
Cost-effective Safety and Fault Localization using Distributed Temporal Redundancy
Cited by 1 (1 self)
Cost pressure is driving vendors of safety-critical systems to integrate previously distributed systems. One natural approach we have previously introduced is On-Demand Redundancy (ODR), which allows safety-critical and non-critical tasks, traditionally isolated to limit interference, to execute on shared resources. Our prior work has shown that relaxed dedication (RD), one ODR strategy which allows non-critical tasks (NCTs) to execute on idle critical task resources (CTRs), significantly increases NCT throughput. Unfortunately, there are circumstances under which, in spite of this opportunity, it is difficult to effectively schedule NCTs. In this paper, we introduce distributed temporal redundancy (DTR), which allows critical tasks, which traditionally execute in lockstep, to execute asynchronously. In doing so, DTR increases scheduling flexibility, resulting in systems that achieve much closer to the optimal NCT throughput than with relaxed dedication alone; in one set of experiments, DTR schedules no less than 93% of the theoretical NCT cycles across a variety of synthetic benchmarks, outperforming RD by over 11%, on average. Furthermore, by distributing all redundant tasks across different resources, triple-modular redundancy, and therefore fault localization, can be achieved. We demonstrate that this can be accomplished with little additional cost and complexity: in practice, relatively few DTR tasks are in flight simultaneously, limiting the additional buffering needed to support DTR.