Results 1-5 of 5
Evaluating the Error Resiliency of GPGPU Applications
In Proceedings of the 2012 ACM/IEEE Conference on Supercomputing, 2012
Cited by 3 (2 self)
Over the past years, GPUs (Graphics Processing Units) have gained wide adoption as accelerators for general purpose computing. A number of studies [1, 2] have shown that significant performance gains can be achieved by deploying
Towards Analyzing and Improving Robustness of Software Applications to Intermittent and Permanent Faults in Hardware
Abstract—Although a significant fraction of emerging failure and wearout mechanisms result in intermittent or permanent faults in hardware, their impact (as distinct from that of transient faults) on software applications has not been well studied. In this paper, we develop a distinguishing application characteristic, referred to as similarity, from a fundamental circuit-level understanding of the failure mechanisms. We present a mathematical definition and a procedure for similarity computation for practical software applications, and experimentally verify the relationship between similarity and fault rate. Leveraging the dependence of application robustness on the similarity metric, we present example architecture-independent code transformations that reduce similarity and thereby the worst-case fault rate with minimal performance degradation. Our experimental results with arithmetic-unit faults show as much as 74% improvement in the worst-case fault rate on benchmark kernels, with less than 10% runtime penalty.
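The abstract above defines similarity only informally. As a rough illustration of the general idea, a bitwise notion of similarity over a trace of operand values could be sketched as follows; the Hamming-based metric, the function names, and the 32-bit width are all assumptions for illustration, not the paper's actual definition:

```python
def hamming_similarity(a: int, b: int, width: int = 32) -> float:
    """Fraction of bit positions (out of `width`) where a and b agree."""
    differing = bin((a ^ b) & ((1 << width) - 1)).count("1")
    return 1.0 - differing / width

def trace_similarity(operands: list[int], width: int = 32) -> float:
    """Average similarity between consecutive operand values in a trace.

    High values mean successive operands exercise the arithmetic unit
    with nearly identical bit patterns; a transformation that lowers
    this average would, under the paper's premise, reduce worst-case
    exposure to intermittent/permanent faults.
    """
    if len(operands) < 2:
        return 1.0
    pairs = zip(operands, operands[1:])
    return sum(hamming_similarity(a, b, width) for a, b in pairs) / (len(operands) - 1)
```

For example, a trace of identical values scores 1.0, while alternating all-zeros and all-ones values scores 0.0.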
Summary
Embedded systems are becoming increasingly complex and face tight, competing constraints in terms of performance, cost, energy consumption, dependability, flexibility, security, and so on. The objective of this thesis is to propose design methods and tools for supporting the tradeoff analysis of competing design objectives during the early design phases, which are characterized by uncertainties. We consider safety-critical real-time applications modeled as task graphs, to be implemented on distributed heterogeneous architectures consisting of processing elements (PEs) interconnected by a shared communication channel. Tasks are scheduled using fixed-priority preemptive scheduling, and messages are scheduled non-preemptively. As a first step, we address the problem of function-to-task decomposition. In this context we assume that the application functionality is captured by a set of functional blocks with different safety requirements. We propose a Genetic Algorithm-based metaheuristic to solve the function-to-task decomposition problem. Our algorithm also decides the mapping of tasks to the PEs of a distributed architecture and the reliability of each PE in the architecture, such that the safety and integrity constraints are
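The task-to-PE mapping step described above can be sketched in miniature with a standard genetic algorithm. The chromosome encoding (one PE index per task), the load-balance fitness, and all parameter values below are illustrative assumptions; the thesis's actual metaheuristic additionally handles function-to-task decomposition, PE reliability selection, and safety/integrity constraints:

```python
import random

def evaluate(mapping, task_costs, n_pes):
    # Fitness: negative of the most-loaded PE under a naive
    # load-sum model (illustrative only; no dependencies or deadlines).
    loads = [0.0] * n_pes
    for task, pe in enumerate(mapping):
        loads[pe] += task_costs[task]
    return -max(loads)

def ga_map(task_costs, n_pes, pop_size=30, gens=100, seed=0):
    """Evolve a task-to-PE mapping: one gene per task, valued 0..n_pes-1."""
    rng = random.Random(seed)
    n = len(task_costs)
    pop = [[rng.randrange(n_pes) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        # Elitist selection: keep the fitter half as survivors.
        pop.sort(key=lambda m: evaluate(m, task_costs, n_pes), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:             # mutation: reassign one task
                child[rng.randrange(n)] = rng.randrange(n_pes)
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: evaluate(m, task_costs, n_pes))
```

A usage example: `ga_map([3.0, 3.0, 3.0, 3.0], n_pes=2)` searches for a balanced split of four equal-cost tasks across two PEs.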
Characterizing the Effects of Intermittent Faults on a Processor for Dependability Enhancement Strategy
Copyright © 2014 Chao (Saul) Wang et al. This is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. As semiconductor technology scales into the nanometer regime, intermittent faults have become an increasing threat. This paper focuses on the effects of intermittent faults on NET versus REG on the one hand, and the implications for a dependability strategy on the other. First, the vulnerability characteristics of representative units in OpenSPARC T2 are revealed, and in particular the highly sensitive modules are identified. Second, an architecture-level dependability enhancement strategy is proposed, showing that events such as core/strand running status and core-memory interface events can serve as detectable symptoms; a simple watchdog can be deployed to detect application running status (IEXE event), and the SDC (silent data corruption) rate is evaluated to demonstrate the strategy's potential. Third, the effects of traditional protection schemes in the target CMT against intermittent faults are quantitatively studied in terms of the contribution of each trap type, demonstrating the necessity of taking this factor into account in the strategy.
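A heartbeat-style watchdog of the kind mentioned above, one that flags a hung application once it stops reporting progress, might look like this in outline. The class name, API, and timeout policy are assumptions for illustration, not the paper's implementation (which monitors hardware-level events such as IEXE on OpenSPARC T2):

```python
import threading
import time

class Watchdog:
    """Flags the application as hung if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def heartbeat(self):
        # Called by the monitored application at each unit of progress.
        with self._lock:
            self._last = time.monotonic()

    def expired(self) -> bool:
        # Called by a monitor thread; True means no recent progress.
        with self._lock:
            return time.monotonic() - self._last > self.timeout
```

In use, the workload calls `heartbeat()` on each iteration while a separate monitor polls `expired()`; a `True` result is the detectable symptom that would trigger recovery.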