Alternate document: Error Management in the Pluggable File System - Douglas Thain, Miron Livny (2002)
Abstract: Error propagation is a central problem in grid computing. We re-learned this while adding a Java feature to the Condor computational grid. Our initial experience with the system was negative, due to the large number of new ways in which the system could fail. To reason about this problem, we developed a theory of error propagation. Central to our theory is the concept of an error's scope, defined as the portion of a system that it invalidates. With this theory in hand, we recognized that the...
Cited by:
Synthetic Grid Workloads With Ibis, Koala, And Grenchmark - Alexandru Iosup And
Distributed Computing in Practice: The Condor Experience - Thain, Tannenbaum, Livny (2004)
Phoenix: Making Data-intensive Grid Applications Fault-tolerant - George Kola Tevfik (2004)
Similar documents (at the sentence level):
46.5%: Coordinating Access to Computation and Data in Distributed Systems - Thain (2004)
Active bibliography (related documents):
0.5: A Semantic Characterisation for Faults in Replicated Systems - Krishnan
0.4: Parrot: Transparent User-Level Middleware for Data Intensive.. - Thain, Livny (2003)
0.4: Parrot: Transparent User-Level Middleware for Data-Intensive.. - Thain, Livny (2003)
Similar documents based on text:
0.5: Multiple Bypass: Interposition Agents for Distributed Computing - Thain, Livny (2001)
0.5: Bypass: A Tool for Building Split Execution Systems - Thain, Livny (2000)
0.5: Utilizing Widely Distributed Computational Resources.. - Basney, Livny, Mazzanti (2000)
Related documents from co-citation:
5: A computation management agent for multi-institutional grids - Frey, Tannenbaum et al.
4: Condor -- a distributed job scheduler - Tannenbaum, Wright et al. - 2001
3: Stork: Making Data Placement a First Class Citizen in the Grid - Kosar, Livny - 2004
BibTeX entry:
Douglas Thain and Miron Livny. Error scope on a computational grid: Theory and practice. In Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC), July 2002. http://citeseer.ist.psu.edu/thain02error.html
@inproceedings{ thain-error,
author = "Douglas Thain and Miron Livny",
title = "Error Scope on a Computational Grid: Theory and Practice",
booktitle = "Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC-11)",
address = "Edinburgh, Scotland",
month = "July",
date = "24--26",
year = 2002,
url = "citeseer.ist.psu.edu/thain02error.html" }
Citations (may not include all citations):
1274: Object-Oriented Software Construction - Meyer - 1997
737: The Java Programming Language - Arnold, Gosling - 1997
650: An axiomatic basis for computer programming - Hoare - 1969
566: Condor - a hunter of idle workstations - Litzkow, Livny et al. - 1988
476: Implementing remote procedure calls - Birrell, Nelson - 1984
423: End-to-end arguments in system design - Saltzer, Reed et al. - 1984
380: Design and implementation of the Sun network filesystem - Sandberg, Goldberg et al. - 1985
317: Kerberos: An authentication service for open network systems - Steiner, Neuman et al. - 1988
289: The Legion vision of a worldwide virtual computer - Grimshaw, Wulf - 1997
242: Reference Manual - Ellis, Stroustrup - 1992
148: A security architecture for computational grids - Foster, Kesselman et al. - 1998
101: Supporting checkpointing and process migration outside the U.. - Solomon, Litzkow - 1992
64: The structure of the THE multiprogramming system - Dijkstra - 1967
61: A world-wide distributed system using Java and the Internet - Chandy, Dimitrov et al. - 1996
54: Exception handling: Issues and a proposed notation - Goodenough - 1975
54: A computation management agent for multi-institutional grids - Frey, Tannenbaum et al. - 2001
52: Reliability issues in computing system design - Randell, Lee et al. - 1978
47: SuperWeb: research issues in Java-based global computing - Alexandrov, Ibel et al. - 1997
44: Remote UNIX - Turning idle workstations into cycle servers - Litzkow - 1987
44: When the CRC and TCP checksum disagree - Stone, Partridge - 2000
36: High-throughput resource management - Livny, Raman - 1998
30: Dependable computing: From concepts to design diversity - Avizienis, Laprie - 1986
30: Replica selection in the globus data grid - Vazhkudai, Tuecke et al. - 2001
29: Providing resource management services to parallel applicati.. - Pruyne, Livny - 1994
28: Java for parallel computing and as a general language for sc.. - Fox, Furmanski - 1997
28: Asynchronous exceptions in Haskell - Marlow, Jones et al. - 2001
20: Protocols and services for distributed data-intensive science - Allcock, Chervenak et al. - 2000
19: Matchmaking Frameworks for Distributed Resource Management - Raman - 2000
16: Netsolve: A network solver for solving computational science.. - Casanova, Dongarra - 1995
15: Autoconf: Generating automatic configuration scripts - Mackenzie, McGrath et al. - 1994
10: Cheap cycles from the desktop to the dedicated cluster: comb.. - Wright - 2001
10: Globus: A metacomputing infrastructure toolkit - Foster, Kesselman - 1997
9: How Java's floating-point hurts everyone everywhere - Kahan, Darcy - 1998
8: Exception handling: The case against - Black - 1982
7: IEEE Transactions on Software Engineering - Liskov, Snyder et al. - 1979
6: Abstract machines for programming language implementation - Diehl, Hartel et al. - 2000
5: Increasing relevance of memory hardware errors: A case for r.. - Milojicic, Messer et al. - 2000
5: A standard for the transmission of IP datagrams on avian car.. - Waitzman - 1990
4: Exception handling in large Ada systems - Howell, Mularz - 1991
4: Structured Computer Organization - Tanenbaum - 1984
3: JavaGenes and Condor: Cycle-scavenging genetic algorithms - Globus, Langhirt et al. - 2000
3: Some new transitions in hierarchical level structures - Ekandham, Bernstein - 1978
3: Using reflection for incorporating fault-tolerance technique.. - Nguyen-Tuong, Grimshaw - 1999
3: A proposed solution to the problem of levels in error-messag.. - Efe - 1987
2: Resourceful systems for fault tolerance - Abbot - 1990
1: Making APL error messages kinder and gentler - Jr - 1989
1: Fundamentals of fault-tolerance distributed computing in asy.. - Gartner - 1999
1: Integrating Fault-Tolerance Techniques in Grid Applications - Nguyen-Tuong - 2002
Documents on the same site (http://www.cs.wisc.edu/condor/publications.html):
High Throughput Computing Resource Management - Historically Users
High Throughput Monte Carlo - Basney, Livny, Tannenbaum (1999)
Matchmaking: Distributed Resource Management for High.. - Raman, Livny, Solomon (1998)
CiteSeer.IST - Copyright Penn State and NEC