
Fault Tolerant Matrix Operations for Networks of Workstations Using Multiple Checkpointing (1997)  (10 citations)
Youngbae Kim, James S. Plank




 
View or download:
netlib.org/utk/papers/hpc97/hpc97.ps
hpc2204.etl.go.jp/mirrors/n...hpc97.ps

From:  utk.edu/utk/people/JackD...papers (more)

Abstract: Recently, an algorithm-based approach using diskless checkpointing has been developed to provide fault tolerance for high-performance matrix operations. With this approach, since fault tolerance is incorporated into the matrix operations, the matrix operations become resilient to any single processor failure or change with low overhead. In this paper, we present a technique called multiple checkpointing to enable the matrix operations to tolerate a certain set of multiple processor failures by ...

Context of citations to this paper:

.... or load balancing (e.g. [27,35]) or modified algorithms for performing certain specific computations in a fault tolerant manner (e.g. [7,20,30]). While the effectiveness of these techniques has been demonstrated experimentally, none of them have made a large impact on the...

.... dependent strategies for incorporating fault tolerance have already received attention in the scientific computing community; see, e.g. [21]. These approaches rely primarily on the use of diskless checkpointing, a significant improvement over traditional approaches. The nature...

Cited by:
A Diskless Checkpointing Algorithm for Super-scale.. - Engelmann, Geist (2003)
Asynchronous Parallel Pattern Search For Nonlinear.. - Hough, Kolda, Torczon (2000)
Design, Implementations and Robustness in Parallel.. - Roucairol, Cung, Yahfoufi (2000)

Similar documents (at the sentence level):
68.5%:   Fault Tolerant Matrix Operations for Networks of.. - Kim, Plank, Dongarra (1997)
30.7%:   Fault Tolerant Matrix Operations Using Checksum and Reverse.. - Kim (1996)
18.8%:   Fault Tolerant Matrix Operations for Parallel and Distributed.. - Kim (1996)

Similar documents based on text:
0.4:   Algorithm-Based Diskless Checkpointing for Fault Tolerant.. - Plank, Kim, Dongarra (1995)
0.3:   Survivability of Multiple Fiber Duct Failures - Schupke, Autenrieth, Fischer (2001)
0.2:   Diskless Checkpointing - Plank, Li, Puening (1997)

Related documents from co-citation:
5:   MIST: PVM with Transparent Migration and Checkpointing - Casas, Clark et al. - 1995
5:   Checkpointing SPMD applications on transputer networks - Silva, Veer et al. - 1994
4:   Consistent Checkpoints of PVM Applications - Stellner - 1994

BibTeX entry:

Y. Kim, J. S. Plank, and J. J. Dongarra. Fault tolerant matrix operations for networks of workstations using multiple checkpointing. In High Performance Computing on the Information Superhighway, HPC Asia '97, pages 460--465, Seoul, Korea, April 1997. http://citeseer.ist.psu.edu/article/kim97fault.html

@inproceedings{ kim97fault,
  author = "Y. Kim and J. S. Plank and J. J. Dongarra",
  title = "Fault tolerant matrix operations for networks of workstations using multiple
    checkpointing",
  booktitle = "High Performance Computing on the Information Superhighway,
    HPC Asia '97",
  pages = "460--465",
  address = "Seoul, Korea",
  month = apr,
  year = "1997",
  url = "http://citeseer.ist.psu.edu/article/kim97fault.html" }
Citations (may not include all citations):
2   DOME: Parallel programming - Arabe, Beguelin et al.





Documents on the same site (http://netlib2.cs.utk.edu/utk/people/JackDongarra/papers.html):
Message-Passing Performance of Various Computers - Dongarra, Dunigan (1995)
High-Performance Computing in Industry - Strohmaier, Dongarra
Determining the Idle Time of a Tiling: New Results - Desprez, Dongarra, Rastello, .. (1997)

