18 citations found.
D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? In Proceedings of 4th USITS, Mar. 2003.

This paper is cited in the following contexts:
SkipNet: A Scalable Overlay Network with Practical Locality.. - Michael (2003)   (84 citations)  (Correct)

....performed over any naming subtree of the SkipNet but not over an arbitrary subset of the nodes of the overlay network. Another limitation is that CLB domain is encoded in the name of a data object. Thus, transparent remapping to a different load balancing domain is not possible. Previous studies [17, 19] indicate that network connectivity failures in the Internet today are due primarily to Border Gateway Protocol (BGP) misconfigurations and faults. Other hardware, software and human failures play a lesser role. As a result, node failures in overlay networks are not independent; instead, nodes ....

D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? In Proceedings of 4th USITS, Mar. 2003.


SkipNet: A Scalable Overlay Network with Practical .. - Harvey, Dunagan.. (2003)   (84 citations)  (Correct)

....over any naming subtree of the SkipNet but not over an arbitrary subset of the nodes of the overlay network. Another limitation is that CLB domain is encoded in the name of a data object. Thus, transparent remapping to a different load balancing domain is not possible. Previous studies [21, 24] indicate that network connectivity failures in the Internet today are due primarily to Border Gateway Protocol (BGP) misconfigurations and faults. Other hardware, software and human failures play a lesser role. As a result, node failures in overlay networks are not independent; instead, nodes ....

....used to repair the SkipNet overlay when such failures occur. One key benefit of SkipNet's locality properties is graceful degradation in response to disconnection, one of the more common forms of Internet failure, which can be caused by router misconfigurations and link and router faults [21, 24]. Because SkipNet orders nodes according to their names, and assuming that organizations assign node names with one or a few organizational prefixes, an organization's nodes are naturally arranged into a few contiguous overlay segments. Should an organization become disconnected, its segments ....

D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? In Proceedings of Fourth USENIX Symposium on Internet Technologies and Systems (USITS '03), Mar. 2003.


Efficient Recovery from Organizational Disconnects in.. - Harvey, Jones, Theimer.. (2003)   (Correct)

....be obtained without having to worry about the associated message traffic having to traverse external nodes that might be either hostile or unavailable. One of the more common forms of Internet failure is disconnection of an organization due to router misconfigurations and link and router faults [6, 7]. When such a disconnection occurs, SkipNet's locality properties enable a graceful degradation of functionality wherein local overlay traffic, and hence access to data stored in locally defined DHTs, still remains possible. Assuming that organizations assign node names with one or a few ....

D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? In Proceedings of 4th USITS, Mar. 2003.


Improving Availability with Recursive Micro-Reboots: A.. - Candea, Cutler, Fox (2003)   (1 citation)  (Correct)

.... hardware manufacturers exclude operator error and environmental failures from their calculations, even though they account for 7-28% of all unplanned downtime in some cluster and mainframe installations [94] and more than half of unplanned downtime in a selection of contemporary Internet services [80]. On the other hand, MTTR can be directly measured, making MTTR claims independently verifiable. In the case of software, for example, MTTFs are on the order of days or months, while MTTR varies from minutes to hours. For end-user interactive services, such as Web sites, lowering MTTR can ....

D. Oppenheimer, A. Ganapathi, and D. Patterson. Why do internet services fail, and what can be done about it? In Proc. 4th USENIX Symposium on Internet Technologies and Systems, Seattle, WA, 2003.


Experience with Evaluating Human-Assisted Recovery Processes - Aaron Brown Leonard   Self-citation (Patterson)   (Correct)

No context found.

D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? 4th USENIX Symposium on Internet Technologies and Systems (USITS '03). Seattle, WA, March 2003.


Path-Based Failure and Evolution Management - Mike Chen Anthony (2004)   (7 citations)  Self-citation (Patterson)   (Correct)

No context found.

OPPENHEIMER, D., GANAPATHI, A., AND PATTERSON, D. A. Why do Internet services fail, and what can be done about it? In USITS (March 2003).


The Importance of Understanding Distributed System Configuration - Oppenheimer   Self-citation (Oppenheimer)   (Correct)

....to follow cause and effect chains back to the problem's root cause. We recently conducted a study of the causes of user-visible failures in Internet services by reading over 500 problem reports from three large-scale (10-100 million hits per day) geographically distributed Internet services [2]. Qualitatively we found significant difficulties related to both of the above configuration issues, often leading directly to service unavailability or lengthening the time to diagnose or repair a failure. Quantitative evidence indicates trouble with establishing configurations: we found that (i) ....

....problems in 11 of 40 failures and would have reduced time to repair problems in 12 of 40 failures. Three incidents were particularly instructive in demonstrating the importance of operators understanding system configuration. These and other such case studies are described in more detail in [2]. Our first case study demonstrates principles (1) and (2) above. Operations staff were notified that users were complaining that the service was occasionally losing their newsgroup postings. In correct operation, a user's posting to a local service newsgroup is intercepted by the service's ....

David Oppenheimer, Archana Ganapathi, and David A. Patterson. Why do Internet services fail, and what can be done about it? To appear in 4th USENIX Symposium on Internet Technologies and Systems, 2003.


USENIX Association - Usits Th Usenix (1992)   (2 citations)  (Correct)

No context found.

D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? In Proceedings of 4th USITS, Mar. 2003.


A Large-Scale Study of Failures in High-Performance.. - Bianca Schroeder Garth   (Correct)

No context found.

D. L. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do internet services fail, and what can be done about it? In USENIX Symp. on Internet Technologies and Systems, 2003.


Improving End-to-End Availability Using Overlay Networks - Andersen (2005)   (1 citation)  (Correct)

No context found.

David Oppenheimer, Archana Ganapathi, and David A. Patterson. Why do Internet services fail, and what can be done about it? In Proc. 4th USENIX Symposium on Internet Technologies and Systems (USITS), Seattle, Washington, March 2003.


Presence-Based Availability and P2P Systems - Dunn, Zahorjan, Gribble, Levy (2005)   (Correct)

No context found.

D. Oppenheimer, A. Ganapathi, and D. Patterson. Why do Internet services fail, and what can be done about it? In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems (USITS), Seattle, WA, March 2003.


Reducing Enterprise Java Bean Deployment Costs via.. - White, Schmidt   (Correct)

No context found.

Oppenheimer, D., Ganapathi, A., Patterson, D.: Why do Internet Services Fail, and What can be Done about It?, In: Proc. USENIX Symposium on Internet Technologies and Systems (2003)


Byzantine Fault Tolerance in Long-Lived Systems - Rodrigo Rodrigues And (2004)   (1 citation)  (Correct)

No context found.

D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do internet services fail, and what can be done about it? In Proc. 4th USITS, Mar. 2003.


Dependency Isolation for Thread-based Multi-tier Internet .. - Chu, Shen, Tang, Yang.. (2003)   (Correct)

No context found.

D. Oppenheimer, A. Ganapathi, and D. A. Patterson. Why do Internet services fail, and what can be done about it? In Proc. of the 4th USENIX Symposium on Internet Technologies and Systems (USITS '03), Seattle, WA, Mar. 2003.


Detecting Application-Level Failures in Component-based.. - Emre Kıcıman And (2004)   (Correct)

No context found.

D. Oppenheimer, A. Ganapathi, and D. Patterson. Why do internet services fail, and what can be done about it? In 4th USENIX Symposium on Internet Technologies and Systems (USITS '03), 2003.


Discovering Correctness Constraints for Self-Management of.. - Configuration Emre Kıcıman   (Correct)

No context found.

D. Oppenheimer, A. Ganapathi, and D. Patterson. Why do Internet services fail, and what can be done about it? In 4th USENIX Symposium on Internet Technologies and Systems (USITS '03), 2003.


Nonintrusive Failure Detection and Recovery for.. - Sultan, Bohra..   (Correct)

No context found.

D. Oppenheimer, A. Ganapathi, and D. Patterson. Why Do Internet Services Fail, and What Can Be Done About It? In Proc. 4th USENIX Symp. on Internet Technologies and Systems (USITS), Mar. 2003.

CiteSeer.IST - Copyright Penn State and NEC