12 citations found.
D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood. Safetynet: Improving the availability of shared-memory multiprocessors with global checkpoint/recovery. In Proceedings of the 29th Annual International Symposium on Computer Architecture, pages 123--134. IEEE Computer Society, 2002.


This paper is cited in the following contexts:
Appears in the Proceedings of the 30th Annual International - Symposium On Computer

....As the computing industry continues to mature, the issues of reliability, maintainability, cost of ownership, and end user satisfaction are receiving more attention. As a result, there is now more emphasis on devising novel architectural support for avoiding or tolerating hardware failures (e.g. [2, 16, 20, 24]). While this is a welcome trend, we note that some studies show that software failures account for as much as 40% of computer system failures [14]. Unfortunately, removing software bugs is a task that requires enormous human labor. Entire teams of people are dedicated to test the software and ....

D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood. SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery. In 29th Intl. Symp. on Computer Architecture, pages 123--134, 2002.


Detailed Design and Evaluation of Redundant Multithreading.. - Mukherjee et al. (2002)   (10 citations)

....on the processor's intrinsic checkpointed state for recovery. Unlike SRTR, the RMT techniques in this paper assume that instructions are compared for faults after the instructions retire and rely on explicit software checkpoints (e.g. as in Tandem systems [30]) or hardware checkpoints (e.g. [25], [13]) for recovery. 9. CONCLUSIONS with reductions in voltage levels, has made microprocessors extremely vulnerable to transient faults. In a multithreaded environment, we can detect these faults by running two copies of the same program as separate threads, feeding them identical inputs, and ....

Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, and David A. Wood, "SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery," Proc. 29th Annual Int'l Symp. on Computer Architecture, May 2002.


Speculation-Based Techniques for Lockfree Execution of Lock-Based .. - Rajwar (2002)

....section and does not restrict the dynamic number of store instructions executed in the critical section. Use the processor cache. Alternatively, the speculative memory state can be exposed to the processor caches. Other proposals have been made for allowing caches to buffer speculative state [42, 52, 61, 155] and these proposals can be adapted for use in SLE. The requirement, as is the case for any speculative technique that allows stores to retire speculatively, is that an architecturally correct value of the speculatively modified cache block must be available in the event of a misspeculation. This ....

....is that an architecturally correct value of the speculatively modified cache block must be available in the event of a misspeculation. This can be achieved by using a special buffer below the level one cache to store the architecturally correct values or using the level two cache for doing so [155]. 3.9.3 Committing speculative state The discussion in this section focuses on committing register state and committing speculative memory state when speculation is successful. 3.9.3.1 Committing processor register state If the reorder buffer approach is used to implement SLE, the processor ....

Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, and David A. Wood. SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery. In Proceedings of the 29th Annual International Symposium on Computer Architecture, pages 123--134, May 2002.


Variability in Architectural Simulations of Multi-threaded.. - Alameldeen, Wood (2003)   (8 citations)

....commercial workloads to increase accuracy. Our work distinguishes itself from other studies by focussing on the variability phenomenon in simulation and providing a methodology to address it. Very few previous studies report results from multiple simulation runs to account for space variability [23, 24, 33]. Changes in program phase behavior were explored for SPEC benchmarks [20, 30]. Simulation errors introduced by selecting particular program phases were investigated by Sherwood et al. [31]. Statistical simulation based on program traces was used by Oskin et al. [29]. Some architectural studies ....

Daniel J. Sorin, Milo M.K. Martin, Mark D. Hill, and David A. Wood. SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery. In Proceedings of the 29th Annual International Symposium on Computer Architecture, pp. 123--134, May 2002.


Dynamic Data Replication: an Approach to Providing.. - Christodoulopoulou, .. (2003)   (1 citation)

....I/O) and failover to backup servers to provide continuous service operation at the application level. Our approach is orthogonal and aims to improve the availability of a single, software shared memory server, by means of checkpointing and rollback recovery. Similarly to our work, the authors in [25, 24, 2] address fault tolerance in the context of distributed shared memory (DSM) machines but their approach requires somewhat extensive hardware support. Software based approaches to building fault tolerant systems of commodity, off-the-shelf components include active replication [3], development of ....

D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood. Safetynet: Improving the availability of shared memory multiprocessors with global checkpoint/recovery. In Proceedings of the 29th Int'l Symposium on Computer Architecture, May 2002.


WaveScalar - Swanson, Michelson, Oskin (2003)

....systems can speculate by initiating speculative execution at a node with one message and completing (or squashing) it with a second message once the correct path or value is available. We are currently exploring several existing techniques for control and fine and coarse grained memory speculation [55, 40, 69] and determining whether and how to integrate them into our design. Defect tolerance: Large WaveCache systems will suffer from defective nodes, clusters, and communication networks. The fact that our architecture is uniform and decentralized means that we should be able to map around such ....

D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood, "Safetynet: improving the availability of shared memory multiprocessors with global checkpoint/recovery," in Proceedings of the 29th annual international symposium on Computer architecture, pp. 123--134, IEEE Computer Society, 2002.


BugNet: Continuously Recording Program Execution for.. - Narayanasamy, Pokam.. (2005)   Self-citation (Symposium Architecture)

No context found.

D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood. Safetynet: Improving the availability of shared-memory multiprocessors with global checkpoint/recovery. In Proceedings of the 29th Annual International Symposium on Computer Architecture, pages 123--134. IEEE Computer Society, 2002.


Fingerprinting: Bounding Soft-Error Detection.. - Smolens, Gold.. (2004)   (1 citation)  Self-citation (Architecture)

No context found.

D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood. SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery. In Proceedings of the 29th Annual International Symposium on Computer Architecture, June 2002.


Using Speculation to Simplify Multiprocessor Design - Sorin et al. (2004)   Self-citation (Hill)

No context found.

D. J. Sorin, M. M. Martin, M. D. Hill, and D. A. Wood. SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery. In Proceedings of the 29th Annual International Symposium on Computer Architecture, pages 123--134, May 2002.


Using Lightweight Checkpoint/Recovery to Improve the Availability.. - Sorin (2002)   Self-citation (Availability Memory)

No context found.

Daniel J. Sorin, Milo M.K. Martin, Mark D. Hill, and David A. Wood. SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery. In Proceedings of the 29th Annual International Symposium on Computer Architecture, pages 123--134, May 2002.


Checkpointing Shared Memory Programs at the.. - Bronevetsky, Marques, .. (2004)

No context found.

D. Sorin, M. Martin, M. Hill, and D. Wood. SafetyNet: Improving the availability of shared memory multiprocessors with global checkpoint/recovery. In Proceedings of the International Symposium on Computer Architecture (ISCA), 2002.


Application-level Checkpointing for Shared Memory.. - Bronevetsky, Marques.. (2004)

No context found.

D. Sorin, M. Martin, M. Hill, and D. Wood. SafetyNet: Improving the availability of shared memory multiprocessors with global checkpoint/recovery. In Proceedings of the International Symposium on Computer Architecture (ISCA), 2002.

