
Towards Fingerpointing in the Emulab Dynamic Distributed System
Michael P. Kasick, Priya Narasimhan Electrical Computer Engineering...



View or download:
cmu.edu/PDLFTP/st...eportworlds06.pdf

From:  cmu.edu/Publications/index

Abstract: In the large-scale Emulab distributed system, the many failure reports make skilled operator time a scarce and costly resource, as shown by statistics on failure frequency and root cause. We describe the lessons learned with error reporting in Emulab, along with the design, initial implementation, and results of a new local error-analysis approach that is running in production. Through structured error reporting, association of context with each error-type, and propagation of both error-type ...

Active bibliography (related documents):
1.1:   Group Communication: - Helping Or Obscuring
0.6:   Using Queries for Distributed Monitoring and Forensics - Singh, Roscoe, Maniatis.. (2006)
0.1:   Towards self-predicting systems: What if you could ask.. - Thereska, Narayanan.. (2005)


BibTeX entry:

@misc{ priya-towards,
  author = "Michael P. Kasick and Priya Narasimhan",
  title = "Towards Fingerpointing in the Emulab Dynamic Distributed System",
  url = "citeseer.ist.psu.edu/752816.html" }
Citations (may not include all citations):
117   An integrated experimental environment for distributed syste.. - WHITE, LEPREAU et al. - 2002
16   Performance debugging for distributed systems of black boxes - AGUILERA, MOGUL et al. - 2003
11   Using Magpie for request extraction and workload modelling - BARHAM, DONNELLY et al. - 2004
2   and retrieving system history - COHEN, ZHANG et al. - 2005
2   Pip: Detecting the unexpected in distributed systems - REYNOLDS, KILLIAN et al. - 2006
2   Detecting application-level failures in component-based inte.. - KICIMAN, FOX - 2005
1   Advanced tools for operators at Amazon - BODIK, FOX et al. - 2006


