John Chapin. Hive: Operating system fault containment for shared-memory multiprocessors. Ph.D. Thesis, Stanford University, 1997.


This paper is cited in the following contexts:
Virtual Clusters: Resource Management on Large Shared-Memory.. - Govil (2000)

....an independent probability of failure. Therefore, large multiprocessors are at a higher risk of experiencing faults. On a fault-unaware system, any single fault can crash the entire machine, thus affecting every task running on the system, even those that were not using the resource that failed [10, 56]. These designs have the undesirable property that the probability that a task is affected by a fault is proportional to the size of the system, not the number of resources being used by the task. Fault tolerance is a well-known technique for designing systems that can withstand faults without ....



End-To-End Fault Containment In Scalable Shared-Memory.. - Teodosiu (2000)   (1 citation)

....requires a careful design of both its hardware and its system software. Current multiprocessors and operating systems are unable to cope with the loss of any essential hardware resource, such as the failure of a processor or of a memory board. Previous work on providing fault containment [Chapin95][Chapin97] has focused on adding support to the operating system, but has not covered the hardware aspects that are instrumental in achieving a complete fault containment solution, such as diagnosing the machine after a fault, isolating the failed resources, and recovering the state of the shared memory ....

....benefits in terms of administration costs and resource utilization efficiency. Distributed systems (such as networks of workstations) are able to offer superior reliability by providing fault containment. With fault containment the effects of a fault are limited to a small portion of the system [Chapin97], instead of being allowed to spread to the entire system. For instance, workstation users on a LAN (Local Area Network) hardly expect their machines to all crash when any individual machine goes down. Such behavior is extremely unlikely and can usually be attributed to malicious intent. In a ....

[Article contains additional citation context not shown here]

J. Chapin. "Hive: Operating System Fault Containment for Shared-Memory Multiprocessors." Ph.D. Thesis, Stanford University CSL-TR-97-712, January 1997.

