Alternate document: Error Management in the Pluggable File System (02), Douglas Thain, Miron Livny

Error Scope on a Computational Grid: Theory and Practice (2002)  (4 citations)
Douglas Thain, Miron Livny
Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC-11)



View or download:
wisc.edu/condor/doc/errorscope.ps
wisc.edu/~thain/libra...errorscope.pdf

From: wisc.edu/condor/publications

Abstract: Error propagation is a central problem in grid computing. We re-learned this while adding a Java feature to the Condor computational grid. Our initial experience with the system was negative, due to the large number of new ways in which the system could fail. To reason about this problem, we developed a theory of error propagation. Central to our theory is the concept of an error's scope, defined as the portion of a system that it invalidates. With this theory in hand, we recognized that the...

Cited by:
Synthetic Grid Workloads With Ibis, Koala, And Grenchmark - Alexandru Iosup And
Distributed Computing in Practice: The Condor Experience - Thain, Tannenbaum, Livny (2004)
Phoenix: Making Data-intensive Grid Applications Fault-tolerant - George Kola Tevfik (2004)

Similar documents (at the sentence level):
46.5%:   Coordinating Access to Computation and Data in Distributed Systems - Thain (2004)

Active bibliography (related documents):
0.5:   A Semantic Characterisation for Faults in Replicated Systems - Krishnan
0.4:   Parrot: Transparent User-Level Middleware for Data Intensive.. - Thain, Livny (2003)
0.4:   Parrot: Transparent User-Level Middleware for Data-Intensive.. - Thain, Livny (2003)

Similar documents based on text:
0.5:   Multiple Bypass: Interposition Agents for Distributed Computing - Thain, Livny (2001)
0.5:   Bypass: A Tool for Building Split Execution Systems - Thain, Livny (2000)
0.5:   Utilizing Widely Distributed Computational Resources.. - Basney, Livny, Mazzanti (2000)

Related documents from co-citation:
5:   A computation management agent for multi-institutional grids - Frey, Tannenbaum et al.
4:   Condor -- a distributed job scheduler - Tannenbaum, Wright et al. - 2001
3:   Stork: Making Data Placement a First Class Citizen in the Grid - Kosar, Livny - 2004

BibTeX entry:

Douglas Thain and Miron Livny. Error scope on a computational grid: Theory and practice. In Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC), July 2002. http://citeseer.ist.psu.edu/thain02error.html

@inproceedings{ thain-error,
  author = "Douglas Thain and Miron Livny",
  title = "Error Scope on a Computational Grid: Theory and Practice",
  booktitle = "Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC-11)",
  address = "Edinburgh, Scotland",
  month = "July",
  date = "24--26",
  year = 2002,
  url = "citeseer.ist.psu.edu/thain02error.html" }
Citations (may not include all citations):
1274   Object-Oriented Software Construction - Meyer - 1997
737   The Java Programming Language - Arnold, Gosling - 1997
650   An axiomatic basis for computer programming - Hoare - 1969
566   Condor - a hunter of idle workstations - Litzkow, Livny et al. - 1988
476   Implementing remote procedure calls - Birrell, Nelson - 1984
423   End-to-end arguments in system design - Saltzer, Reed et al. - 1984
380   Design and implementation of the Sun network filesystem - Sandberg, Goldberg et al. - 1985
317   Kerberos: An authentication service for open network systems - Steiner, Neuman et al. - 1988
289   The Legion vision of a worldwide virtual computer - Grimshaw, Wulf - 1997
242   Reference Manual - Ellis, Stroustrup - 1992
148   A security architecture for computational grids - Foster, Kesselman et al. - 1998
101   Supporting checkpointing and process migration outside the U.. - Solomon, Litzkow - 1992
64   The structure of the THE multiprogramming system - Dijkstra - 1967
61   A world-wide distributed system using Java and the Internet - Chandy, Dimitrov et al. - 1996
54   Exception handling: Issues and a proposed notation - Goodenough - 1975
54   A computation management agent for multi-institutional grids - Frey, Tannenbaum et al. - 2001
52   Reliability issues in computing system design - Randell, Lee et al. - 1978
47   SuperWeb: research issues in Java-based global computing - Alexandrov, Ibel et al. - 1997
44   Remote UNIX - Turning idle workstations into cycle servers - Litzkow - 1987
44   When the CRC and TCP checksum disagree - Stone, Partridge - 2000
36   High-throughput resource management - Livny, Raman - 1998
30   Dependable computing: From concepts to design diversity - Avizienis, Laprie - 1986
30   Replica selection in the globus data grid - Vazhkudai, Tuecke et al. - 2001
29   Providing resource management services to parallel applicati.. - Pruyne, Livny - 1994
28   Java for parallel computing and as a general language for sc.. - Fox, Furmanski - 1997
28   Asynchronous exceptions in Haskell - Marlow, Jones et al. - 2001
20   Protocols and services for distributed data-intensive science - Allcock, Chervenak et al. - 2000
19   Matchmaking Frameworks for Distributed Resource Management - Raman - 2000
16   Netsolve: A network solver for solving computational science.. - Casanova, Dongarra - 1995
15   Autoconf: Generating automatic configuration scripts - Mackenzie, McGrath et al. - 1994
10   Cheap cycles from the desktop to the dedicated cluster: comb.. - Wright - 2001
10   Globus: A metacomputing infrastructure toolkit - Foster, Kesselman - 1997
9   How Java's floating-point hurts everyone everywhere - Kahan, Darcy - 1998
8   Exception handling: The case against - Black - 1982
7   IEEE Transactions on Software Engineering - Liskov, Snyder et al. - 1979
6   Abstract machines for programming language implementation - Diehl, Hartel et al. - 2000
5   Increasing relevance of memory hardware errors: A case for r.. - Milojicic, Messer et al. - 2000
5   A standard for the transmission of IP datagrams on avian car.. - Waitzman - 1990
4   Exception handling in large Ada systems - Howell, Mularz - 1991
4   Structured Computer Organization - Tanenbaum - 1984
3   JavaGenes and Condor: Cycle-scavenging genetic algorithms - Globus, Langhirt et al. - 2000
3   Some new transitions in hierarchical level structures - Ekandham, Bernstein - 1978
3   Using reflection for incorporating fault-tolerance technique.. - Nguyen-Tuong, Grimshaw - 1999
3   A proposed solution to the problem of levels in error-messag.. - Efe - 1987
2   Resourceful systems for fault tolerance - Abbot - 1990
1   Making APL error messages kinder and gentler - Jr - 1989
1   Fundamentals of fault-tolerant distributed computing in asy.. - Gartner - 1999
1   Integrating Fault-Tolerance Techniques in Grid Applications - Nguyen-Tuong - 2002

Documents on the same site (http://www.cs.wisc.edu/condor/publications.html):
High Throughput Computing Resource Management - Historically Users
High Throughput Monte Carlo - Basney, Livny, Tannenbaum (1999)
Matchmaking: Distributed Resource Management for High.. - Raman, Livny, Solomon (1998)
