SUPPORT FOR FAULT TOLERANCE IN VLSI PROCESSORS
Abstract:
Fault tolerance techniques allow computer systems to continue correct operation despite component failures. Such systems often require hardware-supported concurrent error detection and limited fault tolerance in their components, implemented through coding or replication. Detection latency can be reduced by increasing the visibility of internal module state using compressed "signatures" of internal values. Encoders, decoders, comparators, and data compression circuitry are therefore of critical importance in fault-tolerant VLSI systems. In this paper we describe alternative implementations of such circuits and various ways in which they can be connected in VLSI modules. We also describe possible performance enhancements through a technique called micro rollback, which allows error detection to be performed in parallel with inter-module communication. As a concrete example, we present area and performance measurements of alternative microarchitectures and circuits that can be used to add error detection and correction to a VLSI RISC processor we are implementing.
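The abstract does not say which circuit compresses internal values into signatures. A common choice in the built-in self-test literature is a multiple-input signature register (MISR). The sketch below is a minimal software model of that idea, assuming a 16-bit Galois-form register with the CRC-CCITT feedback polynomial; the width, polynomial, and the function names `misr_step`, `signature`, and `duplicate_and_compare` are illustrative assumptions, not the paper's design. Two replicated modules each compress their stream of internal values, and a comparator checks only the final signatures.

```python
# Hypothetical model of signature compression with a MISR.
# WIDTH and POLY are illustrative assumptions, not taken from the paper.

WIDTH = 16
MASK = (1 << WIDTH) - 1
# Low-order feedback taps of x^16 + x^12 + x^5 + 1 (the x^16 term is implicit).
POLY = 0x1021


def misr_step(state: int, word: int) -> int:
    """Shift the register once with polynomial feedback, then XOR in a new input word."""
    msb = (state >> (WIDTH - 1)) & 1
    state = (state << 1) & MASK
    if msb:
        state ^= POLY  # reduce modulo the feedback polynomial
    return state ^ (word & MASK)


def signature(values, seed: int = 0) -> int:
    """Compress a sequence of internal values into a single WIDTH-bit signature."""
    state = seed & MASK
    for v in values:
        state = misr_step(state, v)
    return state


def duplicate_and_compare(trace_a, trace_b) -> bool:
    """Return True if the signatures of two replicated modules agree (no error detected)."""
    return signature(trace_a) == signature(trace_b)


if __name__ == "__main__":
    good = [0x1234, 0xBEEF, 0x0F0F, 0x00FF]
    bad = list(good)
    bad[1] ^= 0x0004  # inject a single-bit fault into one internal value
    print(duplicate_and_compare(good, good))  # True
    print(duplicate_and_compare(good, bad))   # False
```

Because the register is linear and the polynomial has a nonzero constant term, an error confined to a single input word always changes the signature in this model. Errors spread over several words can alias (cancel out) with probability of roughly 2^-16, which is the usual trade-off for comparing a few signature bits instead of every internal value.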

