23 citations found.
G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson. Failure correction techniques for large disk arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 123--132, Boston, MA, April 1989.


This paper is cited in the following contexts:
Note: Correction to the 1997 Tutorial on Reed-Solomon Coding - James Plank Ying

....will be immense; hence the need to correct the error of the 1997 Reed Solomon coding tutorial. There are other erasure coding techniques in addition to the one which this tutorial addresses. Examples are Tornado codes [13, 14], Cauchy Reed Solomon codes [1], and other parity based schemes [6]. Of these, Tornado codes are worth special mention, as they form the backbone of the Digital Fountain content dispersal system [2]. Tornado codes have a randomized structure so that with the addition of m extra parity blocks, a file may be reconstructed from any n blocks. The randomized ....

G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson. Failure correction techniques for large disk arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 123--132, Boston, MA, April 1989.


Node-covering, Error-correcting Codes and Multiprocessors.. - Dutt, Mahapatra (1997)   (1 citation)

....number of processor faults that can always be reconfigured around) of the FT multiprocessor, as we will shortly see. 5) Average erasure correctability, which is defined similarly to average fault tolerance k_avg (see Section 1). The concept of erasure correctability, which was introduced in [18], will be explained shortly. This metric provides a lower bound on the average fault tolerance of the multiprocessor, as shown later. We will describe four useful ECCs, the 2D parity, full two, 3D parity, and full three codes, that provide different trade offs of these metrics. The partitioned ....

.... of the r columns is sufficient to yield a nonzero sum; however, it is not necessary (for example, a subset of the r columns can be summed to yield a 0 vector, but the sum of all r columns may not be zero). A more relevant concept for our purpose is erasure correctability of a code used in [18], where ECCs were applied to design redundant disk arrays (RAIDs) in which disk erasures (disk failures in which the data on the disk is lost) can be tolerated by reproducing the data of the failed disks using parity equations. In such RAIDs, each primary (spare) disk corresponds to an information ....

[Article contains additional citation context not shown here]

G.A. Gibson, L. Hellerstein, R.M. Karp, R.H. Katz, and D.A. Patterson, "Failure Correction Techniques for Large Disk Arrays," Proc. ASPLOS `89, pp. 123-132, 1989.


High Performance File System Design - Staelin (1991)   (3 citations)

....and the system cannot use multiple devices for redundancy to provide reliable service. By limiting themselves to a single device, most file systems increase system administration tasks and limit reliability and performance. There are well known techniques, such as mirroring and checksumming [16, 51, 52], that improve file system reliability dramatically but require multiple disks. Other techniques, such as load balancing and data striping [96] optimize throughput for a set of disks. Some attempts have been made to add mirroring to UNIX invisibly, but, for the most part, file system reliability ....

....general case of shadow sets) is to have duplicate copies of the data on two (or more) disks [16]. Recovering data after media failure is simple: just read the data from the mirror disk. However, there is significant cost in wasted disk space. Other, less costly techniques have been proposed [52, 118]. One approach is to have n disks in an array, with one disk serving as the parity or checksum disk, which allows the system to recover from the loss of any single disk using the data from the n - 1 remaining disks. This and other configurations that allow the system to recover from ....
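
The single-parity recovery described in the excerpt above can be sketched as follows. This is a minimal illustration and not code from any of the cited papers: the block contents, the `xor_blocks` helper, and the choice of which disk fails are all assumptions made for the example, and parity is assumed to be a bytewise XOR.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks plus one parity disk, so n = 4 disks in total.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data)
array = data + [parity]

# Assume disk 1 is lost; rebuild it from the n - 1 surviving disks.
lost = 1
survivors = [block for i, block in enumerate(array) if i != lost]
recovered = xor_blocks(survivors)
```

Because XOR is its own inverse, the same routine rebuilds any single missing disk, including the parity disk itself.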

[Article contains additional citation context not shown here]

G. Gibson, L. Hellerstein, R. Karp, R. Katz, and D. Patterson. Failure correction techniques for large disk arrays. In Proceedings Architectural Support for Programming Languages and Operating Systems, pages 123--132, Boston, Apr. 1989.


RAID Organization and Performance - Schwarz (1992)   (3 citations)

....A key feature of our approach is that reliability groups can contain several check data disks beyond the single parity disk. Introduction Redundant arrays of inexpensive disks (RAIDs) introduced by Patterson, Gibson and Katz [9] and further studied by Chen, Menon and Mattson, Muntz and Lui, [2] [4] [7] [8] achieve tolerance for a single disk failure by introducing redundancy. Previous work regarding RAIDs has been concerned with the cost and run time performance for both realistic and synthesized workloads for rather small arrays. We introduce a generalization of the five level RAID ....

Garth A. Gibson, Lisa Hellerstein, Richard Karp, Randy H. Katz and David A. Patterson. "Failure Correction Techniques for Large Disk Arrays," Proceedings of the Third International Conference on Architectural Support of Programming Languages and Operating Systems, pp.123-132, 1989.


Bigfoot-NFS: A Parallel File-Striping NFS Server (Extended.. - Kim, Minnich, McVoy (1994)

.... massively parallel machines (e.g. the CM 2 and CM 5 with their parallel disk arrays), on supercomputers (e.g. Crays and the Maxximum Strategies systems) and on networks of machines (e.g. Zebra file system). However, the most common system in use for managing an array of disks is known as RAID [4]. Several models of RAID exist, but a theme common to all is the interleaved storage system. In an interleaved storage system, data is stored over a set of storage units. Often motivating the use of RAID is increasing apparent disk throughput. Moving data to disk is essentially a serial ....

Garth A. Gibson, Lisa Hellerstein, Richard M. Karp, Randy H. Katz, and David A. Patterson. Failure correction techniques for large disk arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 123--132, April 1989.


Data Partitioning and Load Balancing in Parallel Disk Systems - Scheuermann, al. (1998)   (19 citations)

.... the proposed fault tolerance techniques in that they can be combined, in a straightforward manner, with arbitrary variants of either mirroring (e.g. mirrored disks, interleaved declustering, or chained declustering [8, 15, 35, 64, 74]) or error correcting codes (e.g. parity groups of some type [30, 31, 36, 37, 53, 54, 56-58, 60, 61, 66, 70]) or simply conventional logging [34]. However, the placement of data replicas or error correcting information itself provides additional degrees of freedom that should be taken into account by an integrated approach in order to ensure the best possible performance and availability for given ....

Gibson GA, Hellerstein L, Karp RM, Katz RH, Patterson DA (1989) Failure Correction Techniques for Large Disk Arrays. Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pp 123--132


Design and Evaluation of Gracefully Degradable Disk Arrays - Reddy, Chandy, Banerjee (1993)   (7 citations)

....disk as a hot spare, which is used as replacement for the failed disk. The overhead is 2 disks for n data disks in this system. Other forms of providing redundancy in disk arrays, such as 2 dimensional parity, full 2 codes, 3 dimensional parity, full 3 codes and additive 3 codes were considered in [9]. The parity striping approach [10] is another approach for achieving high data availability. This approach is similar to RAID in achieving the fault tolerance and differs mainly in the data organization on the disks. For our discussion, the above description of these techniques is sufficient. One ....

Gibson, G. et al. Failure correction techniques for large disk arrays. Proc. 3rd Int. Conf. on Architectural Support for Programming Languages and Operating Systems ASPLOS, April 1989.


Algorithm-Based Diskless Checkpointing for Fault Tolerant.. - Plank, Kim, Dongarra (1995)   (10 citations)

....N processors into m groups G_0, ..., G_{m-1}, and have P_{N+j} be responsible for checkpointing the processors in G_j, for 0 <= j < m. This is basically a 1 dimensional parity scheme, which can tolerate up to m simultaneous processor failures, as long as each failure occurs in a different group [18]. The extreme we have presented is m = 1. At the other extreme are systems like Isis [4] or Targon [7] where m = N, and every processor has a backup processor to which it sends checkpoints. As m grows, the overhead of checkpointing and recovery will decrease because there is less contention for ....
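
The tolerance condition of the 1 dimensional parity scheme in the excerpt above (at most one failure per group) can be checked with a short sketch. Everything named here is an assumption for illustration: the round-robin assignment in `make_groups`, the `recoverable` helper, and the choice to count only application processors. The excerpt does not say how processors are assigned to groups.

```python
from collections import Counter

def make_groups(n, m):
    """Assign processors 0..n-1 to m groups round-robin (an assumed layout)."""
    return {p: p % m for p in range(n)}

def recoverable(failed, group_of):
    """True if no group contains more than one failed processor."""
    counts = Counter(group_of[p] for p in failed)
    return all(c <= 1 for c in counts.values())

# N = 4 processors in m = 2 groups: {0, 2} and {1, 3}.
group_of = make_groups(4, 2)
ok_case = recoverable({0, 1}, group_of)    # one failure in each group
bad_case = recoverable({0, 2}, group_of)   # two failures in the same group
```

With m = 1 every processor shares one group, so any second failure is unrecoverable under this check.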

G. A. Gibson et al. Failure correction techniques for large disk arrays. 3rd Int. Conf. on Arch. Sup. for Prog. Lang. and Op. Sys., pp. 123--132, Apr 1989.


Dual Crosshatch Disk Array: A Highly Reliable Disk Array System - Mishra, Mohapatra

....from concurrent failure of no more than one disk per parity group will have inadequate reliability for such large storage requirement. In this paper, we present a novel and efficient, low overhead parity organization, the interleaved 2d parity scheme, which is a variant of the 2d parity scheme [4]. This parity code can correct any three disk failure and most four disk failures. This encoding scheme is used to introduce a highly reliable and robust disk array architecture, the Dual Crosshatch Disk Array (DCDA) that is capable of tolerating any three disk and controller failures with ....

....in redundant disk arrays are required to correct erasures instead of arbitrary errors. The basic metric of a code is the number and types of erasure it can correct to enhance reliability. Besides this metric, three other commonly used metrics are update penalty, check disk overhead and group size [4]. The update penalty of a code is the number of check disks whose contents must be modified when the contents of an information disk is updated. Redundant updates contribute to performance degradation associated with any error correcting code, and should be minimized. The check disk overhead for ....
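
The update penalty defined in the excerpt above can be computed directly from a description of which information disks each check disk covers. The sketch below assumes a plain 2d parity layout with one check disk per row and one per column. The function names and the 3 x 3 grid are hypothetical and are not taken from the cited paper.

```python
def two_d_parity_groups(rows, cols):
    """Map each check disk name to the set of (row, col) information disks it covers."""
    groups = {}
    for r in range(rows):
        groups[f"row{r}"] = {(r, c) for c in range(cols)}
    for c in range(cols):
        groups[f"col{c}"] = {(r, c) for r in range(rows)}
    return groups

def update_penalty(disk, groups):
    """Number of check disks that must be rewritten when `disk` is updated."""
    return sum(1 for members in groups.values() if disk in members)

groups = two_d_parity_groups(3, 3)
penalty = update_penalty((1, 2), groups)   # covered by row1 and col2
```

Under this layout every information disk lies in exactly one row group and one column group, so each update touches two check disks.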

[Article contains additional citation context not shown here]

G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson, "Failure Correction Techniques for Large Disk Arrays," International Conference on Architectural Support for Programming Language and Operating Systems, pp. 123-132, April 1989.


Swift: Using Distributed Disk Striping to Provide High I/O.. - Cabrera, Long (1991)   (32 citations)

....detecting capabilities of the disks, a single parity disk is sufficient to tolerate a single failure [17, 3]. In this way, if a disk fails it can be reconstructed using the information on the other disks. Higher level erasure correcting codes can be used if more than one failure is to be tolerated [18]. 3 Ethernet based Prototype of Swift A simplified prototype of the Swift architecture has been built as a set of libraries that use the standard filing and interprocess communication facilities of the UNIX operating system. We have used the UNIX file system facilities to name and store objects, ....

G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson, "Failure correction techniques for large disk arrays," in Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 123--32, Apr. 1989.


Disk Array Storage System Reliability - Burkhard, Menon (1993)   (23 citations)

....in the sense that for a given number of data disks it can maintain data integrity through c concurrent disk failures with exactly c check disks. Several schemes for providing better reliability than RAID Level 5 have been proposed. Gibson et al. present the multidimensional parity schemes [8]. Blaum et al. present a scheme that accommodates two concurrent disk failures [2] that is also based on a novel variety of MDS codes [3]. Cheung and Kumar study multi parity schemes that accommodate a fixed number of concurrent failures [6]. Neither the multidimensional or multi parity ....

Garth A. Gibson, Lisa Hellerstein, Richard M. Karp, Randy H. Katz, and David A. Patterson. Failure Correction Techniques for Large Disk Arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 123--132, Boston, April 1989.


Fault Tolerant Matrix Operations for Networks of.. - Plank, Kim, Dongarra (1997)   (6 citations)

....n processors into m groups G_0, ..., G_{m-1}, and have P_{n+j} be responsible for checkpointing the processors in G_j, for 0 <= j < m. This is basically a 1 dimensional parity scheme, which can tolerate up to m simultaneous processor failures, as long as each failure occurs in a different group [21]. The extreme we have presented is m = 1. At the other extreme are systems like Isis [5] or Targon [7] where m = n, and every processor has a backup processor to which it sends checkpoints. As m grows, the overhead of checkpointing and recovery will decrease because there is less contention for ....

G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson. Failure correction techniques for large disk arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 123--132, Boston, MA, April 1989.


Data Partitioning and Load Balancing in Parallel Disk.. - Scheuermann, Weikum.. (1994)   (19 citations)

.... the proposed fault tolerance techniques in that they can be combined, in a straightforward manner, with arbitrary variants of either mirroring (e.g. mirrored disks, interleaved declustering, or chained declustering [8, 15, 35, 64, 74]) or error correcting codes (e.g. parity groups of some type [30, 31, 36, 37, 53, 54, 60, 61, 57, 56, 58, 66, 70]) or simply conventional logging [34]. However, the placement of data replicas or error correcting information itself provides additional degrees of freedom that should be taken into account by an integrated approach in order to ensure the best possible performance and availability for given ....

G.A. Gibson, L. Hellerstein, R.M. Karp, R.H. Katz, and D.A. Patterson, Failure Correction Techniques for Large Disk Arrays, Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, 1989, pp. 123--132


Fault Tolerant Matrix Operations for Networks of.. - Kim, Plank, Dongarra (1997)   (6 citations)

....mesh, with a checkpointing processor dedicated to each row of processors. This model enables the algorithm to tolerate a certain set of multiple failures simultaneously, one failure for each group (e.g. each row or column of processors). This is called the one dimensional parity scheme [13]. [Figure 2. A multiple failure recovery model, before and after two failures (P_0 and P_3 fail); N = 4, m = 2.] 2.4. Multiple Checkpointing in Matrix Operations In most ....

....amount of extra memory while checkpointing at smaller checkpointing intervals. There are several more complicated schemes for configuring multiple checkpointing processors to tolerate more general sets of multiple failures. These schemes include two dimensional parity and multi dimensional parity [13], the Reed Solomon coding scheme [21, 22] and Evenodd parity [3] ....

G. A. Gibson, L. Hellerstein, R. M. Karp, and D. A. Patterson. Failure correction techniques for large disk arrays. pages 123--132, April 1989.


Data Partitioning and Load Balancing in Parallel Disk.. - Scheuermann, Weikum.. (1994)   (19 citations)

.... the proposed fault tolerance techniques in that they can be combined, in a straightforward manner, with arbitrary variants of either mirroring (e.g. mirrored disks, interleaved declustering, or chained declustering [8, 15, 35, 64, 74]) or error correcting codes (e.g. parity groups of some type [30, 31, 36, 37, 53, 54, 60, 61, 57, 56, 58, 66, 70]) or simply conventional logging [34]. However, the placement of data replicas or error correcting information does itself provide additional degrees of freedom that should be taken into account by an integrated approach in order to ensure the best possible performance and availability for given ....

G.A. Gibson, L. Hellerstein, R.M. Karp, R.H. Katz, and D.A. Patterson, Failure Correction Techniques for Large Disk Arrays, Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, 1989, pp. 123--132


MDS Disk Array Reliability - Burkhard, Menon (1992)

....in the sense that for a given number of data disks it can maintain data integrity through c concurrent disk failures with exactly c check disks. Several schemes for providing better reliability than RAID Level 5 have been proposed. Gibson et al. present the multidimensional parity schemes [7]. Blaum et al. present a scheme that accommodates two concurrent disk failures [2] that is also based on a novel variety of MDS codes [3]. Cheung and Kumar study multi parity schemes that accommodate a fixed number of concurrent failures [5]. Neither the multidimensional or multi parity schemes ....

Garth A. Gibson, Lisa Hellerstein, Richard M. Karp, Randy H. Katz, and David A. Patterson. Failure Correction Techniques for Large Disk Arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 123--132, Boston, April 1989.


Fault Tolerant Matrix Operations for Parallel and Distributed.. - Kim (1996)

....[Figure 2.4: A multiple failure recovery model; m = 3, N = 4.] algorithm to tolerate a certain set of multiple failures simultaneously, one failure for each group (e.g. each row or column of processors). This is often called the one dimensional parity scheme [GHKP89]. Several schemes have been developed to configure extra checkpointing processors to tolerate multiple processor failures. For example, the paper [GHKP89] presents two dimensional parity or multidimensional parity, in which the coding information is distributed in two dimensional or ....

....simultaneously, one failure for each group (e.g. each row or column of processors). This is often called the one dimensional parity scheme [GHKP89]. Several schemes have been developed to configure extra checkpointing processors to tolerate multiple processor failures. For example, the paper [GHKP89] presents two dimensional parity or multidimensional parity, in which the coding information is distributed in two dimensional or multidimensional fashion, respectively. Another paper [BBBM94] introduces EVENODD parity, with which two extra processors may be used to tolerate any two failures in ....

[Article contains additional citation context not shown here]

G. A. Gibson, L. Hellerstein, R. M. Karp, and D. A. Patterson. Failure correction techniques for large disk arrays. pages 123--132, April 1989.


Dual-Crosshatch Disk Array: A highly reliable.. - Mishra, Vemulapalli, ..

....array architecture, the Dual Crosshatch Disk Array (DCDA) that is capable of tolerating any three disk failures with minimum number of redundant disks, and any five controller failures. The DCDA uses a novel and efficient parity scheme, the interleaved 2d parity, a variant of the 2d parity scheme [3]. It is a hybrid approach of RAID 4 and RAID 5 in the sense that one of the parity groups uses block interleaved data and striped parity while the other uses dedicated parity disks, and hence the name hybrid RAID architecture. The DCDA architecture has extremely high reliability with low check ....

....resulting in greater MTTDL. 2.2 Multiple Erasure Correcting Codes In order to maintain data integrity under more than one disk failure, more than one redundant disk is required. According to coding theory, C concurrent disk failures can be tolerated using at least C redundant check disks [3]. Gibson et al. have presented the 1d parity, 2d parity, additive 3, and in general, the multidimensional parity scheme [3]. The 2d parity, a double erasure correcting code, can tolerate all sets of 3 erasures except the bad 3 erasures, and additive 3 code can correct all sets of 4 erasures except ....

[Article contains additional citation context not shown here]

Gibson et al., "Failure Correction Techniques for Large Disk Arrays," Third Intl. Conf. on Architectural Support for Programming Language and Operating Systems, pp. 123-132, April 1989.


Multi-Dimensional Disk Array Reliability - Schwarz, Burkhard (1993)

....NCR Rancho Bernardo, California and IBM Almaden Research Center, San Jose, California. 2 October 28, 1993 distance separable (MDS) codes, retain some of the performance advantages of RAID Level 5 while providing higher reliability [8]. Gibson et al. present the multi dimensional parity schemes [3] which form the basis for the data organizations discussed here. Our data organizations incorporate novel combinations of spare disks [5] and strings [4]. Our paper is organized as follows. Section 2 contains a brief overview of the terminology we will use throughout. Within section 3 we present ....

....reliability group contains data disks and a pair of check disks [8]. Level 5 organizations can tolerate a single disk failure while the MDS organization withstands a pair of concurrent disk failures. Our orthogonal organization of strings and reliability groups is similar to that of Gibson [3]. A string is a group of disks that share hardware components such as power supply and cabling, cooling, SCSI controller and cabling and host bus adapter (HBA) [4]. We consider three varieties of hardware redundancy within strings. A string is soft if it contains only the basic set of hardware ....

[Article contains additional citation context not shown here]

Garth A. Gibson, Lisa Hellerstein, Richard M. Karp, Randy H. Katz, and David A. Patterson. Failure Correction Techniques for Large Disk Arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 123--132, Boston, April 1989.


Unknown - We See The

No context found.

Garth A. Gibson, Lisa Hellerstein, Richard M. Karp, Randy H. Katz, and David A. Patterson. Failure correction techniques for large disk arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 123-132, April 1989.


Algorithm-Based Diskless Checkpointing for Fault Tolerant Matrix.. - Plank (1995)   (10 citations)

No context found.

G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson. Failure correction techniques for large disk arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 123--132, Boston, MA, April 1989.


Adaptive Load Balancing in Disk Arrays - Scheuermann, Weikum, Zabback (1993)   (9 citations)

No context found.

Gibson, G.A., Hellerstein, L., Karp, R.M., Katz, R.H. and Patterson, D.A., "Failure Correction Techniques for Large Disk Arrays", Proc. Third ACM Intern. Conf., on Architectural Support for Programming Languages and Operating Systems, 1989, pp. 123--132.


A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like .. - Plank (1997)   (30 citations)

No context found.

G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson. Failure correction techniques for large disk arrays. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 123--132, Boston, MA, April 1989.

