ANB- and ANBDmem-Encoding: Detecting Hardware Errors in Software
- In Computer Safety, Reliability, and Security, 2010
Cited by 4 (2 self)
Abstract. It is expected that commodity hardware is becoming less reliable because of the continuously decreasing feature sizes of integrated circuits. Nevertheless, more and more commodity hardware with insufficient error detection is used in critical applications. One possible solution is to detect hardware errors in software using arithmetic AN-codes. These codes detect hardware errors independent of the actual failure modes of the underlying hardware. However, measurements have shown that AN-codes still exhibit large rates of undetected silent data corruptions (SDC). These high rates of undetected SDCs are caused by the insufficient protection of control and data flow through AN-codes. In contrast, ANB- and ANBD-codes promise much higher error detection rates because they also detect errors in control and data flow. We present our encoding compiler that automatically applies either an AN-, ANB-, or ANBD-code to an application. Our error injections show that AN-, ANB-, and ANBD-codes successfully detect errors and, more important, that indeed ANB- and ANBD-codes reduce the SDC rate more effectively than AN-codes. The difference between ANBD- and ANB-codes is also visible but less pronounced.
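As a rough illustration of the idea behind arithmetic codes, the sketch below encodes integers as multiples of a constant A (an AN-code) and, in a second variant, adds a per-variable signature B (an ANB-code). The constant `A`, the signature values, and all function names are assumptions chosen for this example; they are not taken from the paper or its compiler.

```python
# Minimal sketch of AN- and ANB-style encoding (illustrative only).
A = 58659  # assumed constant for this example, not from the paper


def an_encode(x):
    """AN-code: a valid code word is a multiple of A."""
    return A * x


def an_check(xc):
    """A code word that is not divisible by A indicates an error."""
    return xc % A == 0


def an_decode(xc):
    if not an_check(xc):
        raise ValueError("AN-code check failed")
    return xc // A


def anb_encode(x, sig):
    """ANB-code: add a per-variable signature with 0 < sig < A."""
    return A * x + sig


def anb_check(xc, expected_sig):
    """The remainder must equal the signature expected at this point."""
    return xc % A == expected_sig


def flip_bit(value, bit):
    """Simulate a single bit flip in a stored value."""
    return value ^ (1 << bit)


# Addition works directly on code words: A*x + A*y = A*(x + y).
total = an_encode(3) + an_encode(4)
print(an_decode(total))

# A bit flip turns the code word into a non-multiple of A.
print(an_check(flip_bit(an_encode(7), 3)))

# With signatures, using the wrong operand is also visible.
x_c = anb_encode(3, 11)  # hypothetical signature 11 for x
y_c = anb_encode(4, 22)  # hypothetical signature 22 for y
print(anb_check(x_c + y_c, 11 + 22))  # signatures add up as expected
print(anb_check(y_c, 11))             # y used where x was expected
```

The second variant hints at why signatures help with data-flow errors: a plain AN-code accepts any valid code word, even if it belongs to the wrong variable, while the signature check rejects it.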
On the Soundness of Silence: Investigating Silent Failures Using Fault Injection Experiments
Abstract. Fault injection campaigns have been used extensively to characterize the behavior of systems under errors. Traditional characterization studies, however, focus only on analyzing fail-stop behavior, incorrect test results, and other obvious failures observed during the experiment. More research is needed to evaluate the impact of silent failures (a relevant and insidious class of real-world failures) and to do so in a fully automated way in a fault injection setting. This paper presents a new methodology to identify fault injection-induced silent failures and assess their impact in a fully automated way. Drawing inspiration from system call-based anomaly detection, we compare faulty and fault-free execution runs and pinpoint behavioral differences that result in externally visible changes, not reported to the user, to detect silent failures. Our investigation across several different programs demonstrates that the impact of silent failures is relevant, consistent with field data, and should be carefully considered to avoid compromising the soundness of fault injection results.
Keywords: silent failure; fail-stop; fault injection; LLVM; system call tracing
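The comparison of faulty and fault-free runs described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the trace format (a list of `(syscall, argument)` tuples), the `EXTERNALLY_VISIBLE` set, and the function names are hypothetical and do not reflect the paper's actual tooling.

```python
from difflib import SequenceMatcher

# Assumed set of system calls whose effects are visible outside the process.
EXTERNALLY_VISIBLE = {"write", "sendto", "unlink", "rename"}


def visible(trace):
    """Keep only events that change externally visible state."""
    return [event for event in trace if event[0] in EXTERNALLY_VISIBLE]


def behavioral_diff(golden, faulty):
    """Return the non-matching regions between two filtered traces."""
    g, f = visible(golden), visible(faulty)
    matcher = SequenceMatcher(a=g, b=f, autojunk=False)
    return [
        (tag, g[i1:i2], f[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def is_silent_failure(golden, faulty, error_reported):
    """Visible behavior changed, but nothing was reported to the user."""
    return bool(behavioral_diff(golden, faulty)) and not error_reported


# Hypothetical traces: the faulty run writes corrupted data but exits cleanly.
golden_run = [("open", "/tmp/out"), ("write", "hello"), ("close", "3")]
faulty_run = [("open", "/tmp/out"), ("write", "hellp"), ("close", "3")]

print(behavioral_diff(golden_run, faulty_run))
print(is_silent_failure(golden_run, faulty_run, error_reported=False))
```

Filtering before diffing keeps benign differences (for example, extra reads) from being flagged; which calls count as externally visible is a design choice the sketch leaves open.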