Operating Systems: Fault Tolerance

Tutorials/surveys/introductory articles (ordered by the degree of citation of authoritative articles)

This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher level categories.

14758.6   A Survey of Multiprocessor Operating System Kernels - Mukherjee, Schwan, Gopinath (1993)   (Correct)
Multiprocessors have been accepted as vehicles for improved computing speeds, cost/performance, and enhanced reliability or availability. However, the added performance requirements of user programs a... / A Survey of Multiprocessor Operating System Kernels DRAFT

14171.4   Configurable Fault-Tolerant Distributed Services - Hiltunen (1996)   (Correct)
Fault tolerance---that is, the ability of a system to continue providing its specified service despite failures---is becoming more important as computers are increasingly used in application areas suc... / . . Operating Systems br models different parts of the system operation are encapsulated in object

13156.9   Building Secure and Reliable Network Applications - Birman (1996)   (Correct)
ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation res... / . Related Readings . Operating System Support For High Performance

8661.5   Abstractions for Constructing Dependable Distributed Systems - Mishra, Schlichting (1992)   (Correct)
ions for Constructing Dependable Distributed Systems Shivakant Mishra 1 and Richard D. Schlichting TR 92-19 Abstract Distributed systems, in which multiple machines are connected by a communications n... / to standard hardware or operating system services but with improved br is as a single Operating System Operating System Services Services

8084.7   PVS Bibliography - Rushby (1998)   (Correct)
this report, including the BibTeX bibliography, are available at http://www.csl.sri.com/pvs-bib.html. PVS users are encouraged to use the BibTeX entries from these files, which are as accurate, co... / the impression that the full operating system was verified as opposed to br to critical algorithms for fault tolerance in automobile and aircraft

8038.7   An Adaptive Resource Management Architecture For Global Distributed.. - Venkatasubramanian (1998)   (Correct)
Advances in networking, communication, storage, computing, and multimedia technologies coupled with many emerging application areas is fueling the merger of computing and communication systems. This w... / In early object-oriented operating systems such as Choices and br The short version of ART system operational semantics using

7644.0   HFS: A flexible file system for shared-memory multiprocessors - Krieger (1994)   (Correct)
The HURRICANE File System (HFS) is designed for large-scale, shared-memory multiprocessors. Its architecture is based on the principle that a file system must support a wide variety of file structures... / HFS as part of the HURRICANE operating system running on the HECTOR br . . Logical file system operations

7632.2   A System For Constructing Configurable High-Level Protocols - Bhatti (1996)   (Correct)
13 CHAPTER 1: INTRODUCTION : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 15 1.1 Distributed Systems : : : : ... / a user level task on the Mach operating system. Additional micro-protocols br for many conversations about fault-tolerance and the event-driven model

7071.5   Operating System Services for Wide-Area Applications - Vahdat (1998)   (Correct)
Operating System Services for Wide-Area Applications by Mohammad Amin Vahdat Doctor of Philosophy in Computer Science University of California, Berkeley Professor Thomas E. Anderson, Cochair Pro... / Operating System Services for Wide-Area br . . . Scalability and Fault Tolerance .

6931.7   Safety-Critical Systems, Formal Methods and Standards - Bowen, Stavridou (1993)   (Correct)
Standards concerned with the development of safety-critical systems, and the software in such systems in particular, abound today as the software crisis increasingly affects the world of embedded comp... / The binding of application to operating system to architecture is a prime br failures was about minutes. Fault-tolerance was achieved by detecting

6283.1   Asynchrony in parallel computing: From dataflow to multithreading - Silc, Robic, Ungerer (1997)   (Correct)
The paper presents an overview of the parallel computing models, architectures, and research projects that are based on asynchronous instruction scheduling. It starts with pure dataflow computing mode... / e.g. explicit operating system calls A MIMD computer in br of Computer Design and Fault Tolerance University of Karlsruhe

6102.3   On the Integration of Concurrency, Distribution and Persistence - Munro (1993)   (Correct)
The principal tenet of the persistence model is that it abstracts over all the physical properties of data such as how long it is stored, where it is stored, how it is stored, what form it is kept in ... / NH decentralise the operating system across a number of nodes. br under the VAX VMS system. This system operated by mapping a file holding

5961.0   Supporting Fault-Tolerant Parallel Programming In Linda - Bakken (1994)   (Correct)
17 CHAPTER 1: INTRODUCTION : : : : : : : : : : : : : : : : : : : : : : : : : : 19 1.1 Motivation for Parallel Programming : : ... / turn with the x-kernel an operating system kernel that provides support br . . Fault Tolerance Abstractions

5499.4   The Freeze Free Algorithm For Process Migration - Roush (1995)   (Correct)
tical pages without page faults. The program counter identifies the current code page, and the stack pointer identifies the current stack page. A heuristic identifies the current heap page by examinin... / . . Operating System br of a distributed file system. Operating Systems Review

5401.3   COYOTE: A System for Constructing Fine-Grain Configurable.. - Bhatti, Hiltunen, Schlichting, Chiu (1998)   (Correct)
Communication-oriented abstractions such as atomic multicast, group RPC, and protocols for location-independent mobile computing can simplify the development of complex applications built on distribut... / Reusable software D. . Operating Systems Communication Management br Systems Reliability -Fault-tolerance D. . Operating Systems

5026.4   Hive: Operating System Fault Containment For Shared-Memory.. - Chapin (1997)   (Correct)
Reliability and scalability are major concerns when designing general-purpose operating systems for large-scale shared-memory multiprocessors. This dissertation describes Hive, an operating system wit... / I Hive Operating System Fault Containment For

4961.8   Report of the Working Group on Storage I/O for Large-Scale Computing - Gibson, Vitter, Wilkes (1996)   (Correct)
We discuss the strategic directions and challenges in the management and use of storage systems -- those components of computer systems responsible for the storage and retrieval of data. The performan... / methodology D. . Operating Systems Storage br Systems Reliability-Fault-tolerance D. . Operating Systems

4832.7   The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture - Bedichek (1994)   (Correct)
The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture by Robert C. Bedichek Co-Chairpersons of Supervisory Committee: Professor Henry M. Levy Professor Edward D. Lazowska Department of C... / . . Operating System Implications of Meerkat- br the Intel Touchstone Delta System operated by Caltech on behalf of the

4741.1   NASA Langley's Research and Technology-Transfer Program in Formal.. - Butler, Carreño, Di Vito.. (1998)   (Correct)
This paper presents an overview of NASA Langley's research program in formal methods. The major goals of this work are to make formal methods practical for use on high integrity systems, to orchestrat... / and actuators. The RCP operating system provides the applications br Often the physical fault-tolerance features of these systems are

4729.7   Performance Availability for Networks of Workstations - Arpaci-Dusseau (1999)   (Correct)
Performance Availability for Networks of Workstations by Remzi H. Arpaci-Dusseau Software systems for large-scale distributed and parallel machines are difficult to build. When run in dynamic, pro... / . . . Operating System . br are unaware of the specifics of system operation. The problem of attaining

4644.4   Program Representation And Execution In Real-Time Multiprocessor.. - Niehaus (1994)   (Correct)
PROGRAM REPRESENTATION AND EXECUTION IN REAL-TIME MULTIPROCESSOR SYSTEMS FEBRUARY 1994 DOUGLAS NIEHAUS, B.S., NORTHWESTERN UNIVERSITY M.S., UNIVERSITY OF MICHIGAN Ph.D., UNIVERSITY OF MASSACHUSETTS AMH... / methods a predictable operating system implementation and real-time br which performs the required file system operation and writes the appropriate

4615.2   The Enterprise Executive - Wong (1992)   (Correct)
Enterprise is a graphical programming environment for designing, coding, debugging, testing, monitoring, profiling and executing programs in a distributed hardware environment. Enterprise code looks l... / refers to the computer's operating system. As Comer wrote br . . . Fault Tolerance

4586.4   System Support for Software Fault Tolerance in Highly Available.. - Sullivan (1992)   (Correct)
Today, software errors are the leading cause of outages in fault tolerant systems. System availability can be improved despite software errors by fast error detection and recovery techniques that mini... / Ibm Systems Programs The Mvs Operating System And The Ims Dbms And Db

4536.3   Formal Verification for Fault-Tolerant Architectures: Prolegomena to.. - Owre, Rushby, Shankar, von Henke (1995)   (Correct)
PVS is the most recent in a series of verification systems developed at SRI. Its design was strongly influenced, and later refined, by our experiences in developing formal specifications and mechanica... / normally associated with an operating system kernel e.g.interrupt br clock synchronization fault tolerance flight control formal

4296.9   Research Topics for Graduate Students - Science, York (1994)   (Correct)
This document outlines some of the present research interests of most members of the Department who are in a position to supervise the research of students entering during 1999/2000 academic year. The... / research interests include operating systems and kernels virtual binary br in software functionality fault-tolerance is achieved. Problems with

4288.2   End-To-End Fault Containment In Scalable Shared-Memory Multiprocessors - Teodosiu (2000)   (Correct)
Current shared-memory multiprocessors suffer from an inherent fragility, since a single hardware or system software failure can cause the entire machine to crash. This dissertation describes a combina... / unmodified off-the-shelf operating systems. I have validated this

4260.7   Local Anonymity In The Internet - Martin, Jr. (1999)   (Correct)
Packet-switched computer networks of all sizes are widely used for personal, professional, and governmental communication. However, the speed, versatility, and largely unregulated nature of computer n... / . . Operating System . br . . . Fault Tolerance .

4237.7   A Proxy Based Filtering Mechanism for the Mobile Environment - Zenel (1998)   (Correct)
A Proxy Based Filtering Mechanism for the Mobile Environment Bruce Zenel Host mobility complicates the standard networking model in unexpected ways. It increases network heterogeneity, causing diff... / Related Work . Operating System Support for Dynamic Systems . br to the area in which my system operates. . Implementing another

4217.6   Scheduling Algorithms and Operating Systems Support for Real-Time.. - Ramamritham, Stankovic (1994)   (Correct)
This paper summarizes the state of the real-time field in the areas of scheduling and operating system kernels. Given the vast amount of work that has been done by both the operations research and com... / Scheduling Algorithms and Operating Systems Support for Real-Time br . . Scheduling with Fault Tolerance Constraints

4055.3   An Agent-Based Approach to the Design of Rapidly Deployable Fault.. - Paredis (1996)   (Correct)
There exists a need for manipulators that are more flexible and reliable than the current fixed configuration manipulators. Indeed, robot manipulators can be easily reprogrammed to perform different t... / . . The Chimera Real-Time Operating br . Fault Tolerance and

4010.4   Efficient Implementations of Software Architectures via Partial.. - Marlet, Thibault, Consel (1999)   (Correct)
The notion of flexibility (that is, the ability to adapt to changing requirements or execution contexts) is recognized as a key concern in structuring software, and many architectures have been desi... / available platforms hardware operating systems etc.and features the br as well as safety fault tolerance and quality of service.

4000.2   WebOS: Operating System Services for Wide Area Applications - Vahdat (1997)   (Correct)
In this paper, we argue for the power of providing a common set of OS services to wide area applications, including mechanisms for resource discovery, a global namespace, remote process execution, res... / WebOS Operating System Services for Wide Area br resolution load balancing and fault tolerance. Second we provide a file

3938.9   Modular Specification Of Interaction Policies In Distributed Computing - Sturman (1996)   (Correct)
Software executing on distributed systems consists of many asynchronous, autonomous components which interact in order to coordinate local activity. The need for such coordination, as well as requirem... / the underlying hardware and operating system as well as on the br that exports the desired system operations to the programming

3909.2   The UCSD Active Web - Pasquale (1997)   (Correct)
The UCSD Department of Computer Science and Engineering recently submitted a proposal for large-scale Research Infrastructure funding to the National Science Foundation. The theme of the proposal is... / its strengths in network and operating systems design security br This includes monitoring system operation overseeing backups and

3851.0   Competitive Execution in a Distributed Environment - Cho (1996)   (Correct)
of the Dissertation Competitive Execution in a Distributed Environment by Sung Hyun Cho Doctor of Philosophy in Computer Science University of California, Los Angeles, 1996 Professor David R. Jeffer... / protocols are transparent operating system facilities that involve br . . Replication to Improve Fault Tolerance

3834.9   WebOS: Software Support for Scalable Web Services - Amin Vahdat (1997)   (Correct)
The burgeoning popularity of the Web is pushing against the performance limits of the underlying infrastructure, presenting a number of difficult challenges for the Web as a system. We believe that re... / building a higher level Web operating system to efficiently manage these br communication scheduling fault tolerance and authentication. To this

3679.0   Active Names: Programmable Location and Transport of Wide-Area.. - Vahdat, Anderson, Dahlin (1999)   (Correct)
Active Names are a general framework for the development and composition of wide-area applications. The key insight behind Active Names is the need to introduce programmability of name binding to supp... / for Programming Languages and Operating Systems Cambridge MA . Fox

3679.0   Active Naming: Programmable Location and Transport of Wide-Area.. - Vahdat, Anderson, Dahlin (1998)   (Correct)
Active Names are a general framework for the development and composition of wide-area applications. The key insight behind Active Names is the need to introduce programmability of name binding to supp... / for Programming Languages and Operating Systems Cambridge MA . Fox

3657.1   Efficient Reliable Group Communication For Distributed Systems - Kaashoek, Tanenbaum (1994)   (Correct)
Many applications can profit from broadcast communication, but few operating systems provide primitives that make broadcast communication available to user applications. In this paper we introduce pri... / communication but few operating systems provide primitives that br trade performance against fault tolerance. . Introduction Many

3626.9   TACOMA - fundamental abstractions supporting agent computing in a.. - Sudmann (1996)   (Correct)
The concept of migrating processes between networked computers is not a new one. However, a new computing paradigm is emerging in which an agent is able to migrate between nodes in a heterogeneous net... / of this project is to provide operating system support for agents and br agent model. Privacy security fault tolerance and heterogeneity are

3538.1   Distributed Software Engineering - Invited State-of-the-Art Report - Kramer   (Correct)
The term "Distributed Software Engineering" is ambiguous. It includes both the engineering of distributed software and the process of distributed development of software, such as cooperative work... / computers with independent operating systems connected to the network br services with replication for fault tolerance than to try and provide the

3518.8   Cluster-Based Scalable Network Services - Fox, Gribble, Chawathe, Brewer.. (1997)   (Correct)
This paper has benefited from the detailed and perceptive comments of our reviewers, especially our shepherd Hank Levy. We thank Randy Katz and Eric Anderson for their detailed readings of early draft... / it is normally viewed as an operating system Multics Multiplexed br policies are left to the system operator. We describe our experiments

3394.8   Extensible Cluster-Based Scalable Network Services - Fox (1997)   (Correct)
This paper has benefited from the detailed and perceptive comments of our reviewers, especially our shepherd Hank Levy. We also thank Randy Katz, Eric Anderson, David Culler provided valuable feedback... / it is normally viewed as an operating system Multics Multiplexed br policies are left to the system operator. We describe our

3381.0   From Requirements to Services: A Study on Group Communication Support .. - Mauthe, Hutchison, Coulson, Namuye (1995)   (Correct)
In recent years computers have developed very rapidly from simple processing machines to sophisticated communication systems employing multiple media. Computers are increasingly used for all kinds o... / architectures including operating systems networks communication br such as distribution and fault tolerance. These systems typically

3345.9   Programming Languages: Specification - Rajan (1998)   (Correct)
This thesis introduces Multiclock Esterel, a synchronous language, suitable for application areas including embedded reactive control and digital hardware design where considerable effort is directed ... / th Workshop on Real-Time Operating Systems and Software pages -

3336.8   A brief survey of systems providing process or object migration.. - Nuttall (1994)   (Correct)
Migration is the movement of an active entity from one machine to another during execution. Such migration may be used for dynamic load balancing purposes with the aim of gaining increased performance... / facilities As published in Operating Systems Review October Volume br object persistence improved fault tolerance and potentially more

3314.1   Toward The Design Of Large-Scale, Shared-Memory Multiprocessors - Scott (1992)   (Correct)
The state-of-the-art in multiprocessing today employs thousands of high-performance microprocessors. As system sizes continue to grow, increasing care must be taken to design cost-efficient, balanced ... / software front as well. Many operating system issues such as memory br attention must be paid to fault tolerance which becomes increasingly

3286.5   Microkernel Operating Systems In Parallel Architectures - Blum (1994)   (Correct)
MICROKERNEL OPERATING SYSTEMS IN PARALLEL ARCHITECTURES by JOACHIM BLUM In the past few years operating systems' complexity has increased substantially because of the growing number of required serv... / Computer Science Microkernel Operating Systems In Parallel Architectures

3250.9   Market-Based Massively Parallel Internet Computing - Cappello, Christiansen, Neary.. (1997)   (Correct)
Recent advances in Internet connectivity and implementations of safer distributed computing through languages such as Java provide the foundation for transforming computing resources into tradable com... / sets word sizes or operating systems. The infrastructure proposed br for faking computations. Fault tolerance In a potentially

3244.3   Flexible and Adaptive Control of Real-Time Distributed Object.. - Loyall, Atlas, Schantz, Gill.. (1999)   (Correct)
Next-generation distributed systems have growing demands for real-time quality of service (QoS), flexibility, and control over the often unpredictable environments in which they are deployed. These... / and the underlying operating systems protocol stacks and br and preserve QoS during system operation. TAO's inband mechanisms

3240.9   GLUnix: a Global Layer Unix for a Network of Workstations - Ghormley (1997)   (Correct)
ions To provide remote execution of both parallel and sequential jobs, GLUnix extends some existing UNIX abstractions and introduces new abstractions, borrowing heavily from MPP environments such as ... / and implementation of GLUnix operating system middleware for a cluster of br provide hooks for application fault-tolerance mechanisms.

3137.5   Fault Tolerant Matrix Operations for Networks of Workstations Using.. - Plank (1997)   (Correct)
Networks of workstations (NOWs) offer a cost effective platform for high-performance, long-running parallel computations. However, these computations must be able to tolerate the changing and often fa... / a general-purpose time-sharing operating system and each is often owned by a br scientific computing. The fault-tolerance is based on diskless

3105.6   Computing in the RAIN: A Reliable Array of Independent Nodes - Vasken Bohossian (1998)   (Correct)
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data storage systems for future spaceborne missions. The goal of the project is to identify and d... / run in conjunction with operating system services and standard br Through software-implemented fault tolerance the system tolerates

3080.3   ULTRA III: Implementing a Scalable Shared-Memory Multiprocessor - Project (1989)   (Correct)
MIMD system currently available, beyond bus-connected systems with modest numbers of processors, is the BBN Butterfly. This relative scarcity of highly-parallel shared-memory machines is due to a commo... / programming environments operating systems coordination primitives br compilers I O issues and fault tolerance. Parallelizing compilers are

3077.9   The Interaction of Architecture and Operating System Design - Anderson, Levy, Bershad, Lazowska (1991)   (Correct)
Today's high-performance RISC microprocessors have been highly tuned for integer and floating point application performance. These architectures have paid less attention to operating system requiremen... / of Architecture and Operating System Design Thomas E. Anderson

3028.5   HFS: A Performance-Oriented Flexible File System Based on.. - Krieger (1996)   (Correct)
ing with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and... / HFS as part of the Hurricane operating system running on the Hector shared br and of some basic file system operations. Section . presents the

3007.8   Algorithm-Based Diskless Checkpointing for Fault Tolerant Matrix.. - Plank (1995)   (Correct)
This paper is an exploration of diskless checkpointing for distributed scientific computations. With the widespread use of the "Network Of Workstation" (NOW) platform for distributed computing, long-r... / type and each runs a special operating system so that every node is a br the algorithms are tuned for fault-tolerance and present the performance

2957.0   A Synopsis of the Legion Project - Grimshaw, Wulf, French, Weaver, Jr. (1994)   (Correct)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer comprised of a variety of geographically distributed high-performance machines and workstations. ... / cannot replace existing host operating systems we cannot significantly br file and data access fault-tolerance ease-of-use and user

2955.5   Graduate Course: Reactive and Real-Time Systems - Koren, Tyszberowicz   (Correct)
This article describes a graduate course on the subject of "Reactive and Real-Time Systems", which serves as the basis for courses taught by the authors at Bar-Ilan University and at Tel-Aviv Universi... / networks computer operating systems man-machine interfaces br require a high degree of fault tolerance and are embedded in larger

2944.5   The Design, Implementation and Evaluation of RETHER: A Real-Time.. - Venkatramani (1996)   (Correct)
of the Dissertation The Design, Implementation and Evaluation of RETHER : A Real-Time Ethernet Protocol by Chitra Venkatramani Doctor of Philosophy in Computer Science State University of New York a... / receiver ends run real-time operating systems and the jitter to and from br . v . . Fault Tolerance .

2933.1   Differentiated and Predictable Quality of Service in Web Server.. - Aron (2000)   (Correct)
As the World Wide Web experiences increasing commercial and mission-critical use, server systems are expected to deliver high and predictable performance. The phenomenal improvement in microprocessor ... / management facilities in the operating system software are studied. This

2834.7   Persistent Store In A Dynamic Resource Management Environment - Bridgland (1994)   (Correct)
9 Acknowledgements 11 Dedication 12 Definition of Terms 13 1 Introduction 14 1.1 Persistent Store Resource Management : : : : : : : : : : : : : : : 15 1.2 Trends in Operating System Design : : : : : ... / . Trends in Operating System Design br showing the average number of system operations' per HWO call for various

2805.9   Lessons from FTM: an Experiment in the Design and Implementation of a .. - Muller, al. (1995)   (Correct)
This report describes an experiment in the design of a general purpose fault tolerant system, FTM. The main objective of the FTM design was to implement a "low-cost" fault tolerant system that could... / standard workstations. At the operating system level our goal was to br systems while offering fault tolerance transparency to user

2801.2   Secondary Storage Garbage Collection for Decentralized Object-Based.. - Björnerstedt (1990)   (Correct)
This paper describes a mechanism for secondary storage garbage collection that may be used to reclaim inaccessible resources in decentralized persistent object based systems. Schemes for object addres... / virtual as provided by the operating system and hardware for the lowest br memory architecture such as fault tolerance and non-volatility. The

2797.6   Object Models for Distributed or Persistent Programming - Cahill Nixon (1997)   (Correct)
As use of object orientation for application development has increased, many researchers have investigated the design of object-based programming languages for the distributed and persistent programmi... / systems network systems operating systems and computer architecture br constraints performance fault-tolerance Support for

2739.5   Network Multicomputing Using Recoverable Distributed Shared Memory - John Carter (1993)   (Correct)
A network multicomputer is a multiprocessor in which the processors are connected by general-purpose networking technology, in contrast to current distributed-memory multiprocessors where a dedicated s... / may involve traps into the operating system kernel interrupts context br checkpointing to provide fault tolerance. Measurements of our

2724.3   Using Group Communication to Implement a Fault-Tolerant Directory.. - Kaashoek, Tanenbaum, Verstoep (1993)   (Correct)
Group communication is an important paradigm for building distributed applications. This paper discusses a fault-tolerant distributed directory service based on group communication, and compares it wi... / the claim that a distributed operating system should provide both remote br which does not provide any fault tolerance at all. The paper concludes

2723.3   An Overview of the NYU Ultracomputer Project - Gottlieb (1986)   (Correct)
The NYU Ultracomputer is a shared memory MIMD parallel computer designed to contain thousands of processors connected by an Omega network to a like number of memory modules. A new coordination primitive... / designing and implementing the operating system. The Ultracomputer machine

2707.8   Enterprise: An Interactive Graphical Programming Environment For.. - Chan, Lu, al. (1992)   (Correct)
Workstation environments have been in use for more than a decade now. Although a network of workstations together represents a large amount of aggregate computing power, single users often cannot util... / heterogeneous computers and operating systems and the complexity of br synchronization and fault tolerance allowing the rapid

2707.1   An Optically Interconnected Distributed Shared Memory System.. - Bogineni, Dowd (1992)   (Correct)
This paper introduces an optically interconnected distributed shared memory (OIDSM) system. The distributed shared memory (DSM) approach integrates both shared memory and distributed memory system ide... / implementation in the Aegis operating system of the Apollo domain. The br networks have high fault-tolerance due to their passive nature

2677.2   Group Orientation: a Paradigm for Modern Distributed Systems - Paulo Ver'issimo (1992)   (Correct)
Increasing use of distributed systems, with the corresponding decentralisation, stimulates the need for structuring activities around groups of participants, for reasons of consistency, user-friendlin... / are penetrating too slowly in operating systems technology. Two important br activity for performance or fault-tolerance reasons e.g. replicated

2669.6   Extensible Resource Management For Cluster Computing - Islam, Prodromidis, Squillante.. (1996)   (Correct)
this paper we present a new resource management system for allocating resources among such applications in general-purpose distributed-memory parallel computers. Our system, Octopus, consists of sever... / these systems also provide operating system infrastructures for creating br scheduling strategies and a fault-tolerance strategy. Our hierarchical

2637.8   A Survey of Object-Oriented Concepts - Nierstrasz (1989)   (Correct)
The object-oriented paradigm has gained popularity in various guises not only in programming languages, but in user interfaces, operating systems, databases, and other areas. We argue that the fundame... / but in user interfaces operating systems databases and other areas.

2588.9   Cluster I/O with River: Making the Fast Case Common - Arpaci-Dusseau, Anderson, Treuhaft..   (Correct)
We introduce River, a data-flow programming environment and I/O substrate for clusters of computers. River is designed to provide maximum performance in the common case --- even in the face of nonunif... / of sources unexpected operating system activity uneven load br be useful Application Fault Tolerance. The ultimate goal is to

2578.7   Fast Group Communication for Standard Workstations - Vogels, Rodrigues, Veríssimo (1992)   (Correct)
This paper presents a Group Communication Service suitable for standard workstations. The communication service is designed to take advantage of the technology offered by modern standard Local Area Ne... / exploit specific network and operating system properties. Additionally br activity for performance or fault-tolerance reasons e.g. replicated

2544.4   Microkernels Meet Recursive Virtual Machines - Ford, Hibler, Lepreau, Tullmann.. (1996)   (Correct)
This paper describes a novel approach to providing modular and extensible operating system functionality, and encapsulated environments, based on a synthesis of microkernel and virtual machine concept... / modular and extensible operating system functionality and br management demand paging fault tolerance and debugging support to

2539.4   FT-SR: A Programming Language For Constructing Fault-Tolerant.. - Thomas (1993)   (Correct)
CHAPTER 1: INTRODUCTION; 1.1 Dependable System Construction---Princip... / of the x-kernel HP an operating system kernel designed for br can possibly be violated during system operation. The more restrictive the

2501.5   Hints for Computer System Design - Lampson (1983)   (Correct)
Studying the design and implementation of a number of computer systems has led to some general hints for system design. They are described here and illustrated by many examples, ranging from hardware such as ... / at the. th ACM Symposium on Operating Systems Principles and appeared in br and use of a distributed file system. Operating Systems Review

2468.0   Formal Methods Technology Transfer: A View from NASA - Caldwell (1996)   (Correct)
Since 1988 NASA Langley Research Center has supported a formal methods research group. From its inception, a primary goal of the program has been the transfer of formal methods technology into aerospa... / systems. Because adding fault-tolerance mechanisms to a system

2462.9   An Evaluation of the Amoeba Group Communication System - Kaashoek, Tanenbaum (1996)   (Correct)
The Amoeba group communication system has two unique aspects: (1) it uses a sequencer-based protocol with negative acknowledgements for achieving a total order on all group messages; and (2) users cho... / of the Amoeba distributed operating system The delay for a br users choose the degree of fault tolerance they desire. This paper

2447.3   Optimizing Parallel Applications for Wide-Area Clusters - Bal, Plaat, Bakker, Dozy, Hofman (1998)   (Correct)
Recent developments in networking technology cause a growing interest in connecting local-area clusters of workstations over wide-area links, creating multilevel clusters. Often, latency and bandwidth... / GFLOPS peak performance. The operating system used on DAS is BSD OS from br such as heterogeneity fault-tolerance security accounting and

2444.0   Integration of Resource Management Activities in Distributed Systems - Nalini Venkatasubramanian (1995)   (Correct)
We present a two-level model of distributed computation based on the actor model. This two-level model is the basis for developing a semantic framework that supports dynamic customizability and sepa... / has been used in the Muse Operating System for dynamically br service failure semantics and fault tolerance protocols and resource

2442.2   Fault Manager for Distributed Operating Environments Design.. - Sens (1998)   (Correct)
This paper presents the design, implementation, and performance evaluation of a software fault manager for distributed applications. Dubbed STAR, it uses the natural redundancy existing in networks of... / portable to UNIXTM-like operating systems. The current implementation br to offer a high level of fault tolerance. Fault management is

2439.7   Customizing Dependability with Reusable Software Components - Sturman, Agha (1996)   (Correct)
Many large software systems have different components with varying requirements for robustness and performance. Moreover, dependability requirements often change throughout their software life-cycle. ... / techniques are fixed in the operating system or application code must be br correcting codes. The software fault-tolerance techniques used with each

2429.5   Practical Byzantine Fault Tolerance - Castro, Liskov (1999)   (Correct)
This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantine-fault-tolerant algorithms will be increasingly important in the future because mal... / of the Third Symposium on Operating Systems Design and Implementation br snfsd executes file system operations directly in the memory

2423.4   Process Introspection: A Checkpoint Mechanism for High Performance.. - Ferrari (1996)   (Correct)
The Process Introspection project is a design and implementation effort, the main goal of which is to construct a general purpose, flexible, efficient checkpoint/restart mechanism appropriate for use ... / on one architecture or operating system platform must be restartable br to implement a number of basic fault tolerance and load balancing schemes.

2362.3   RAIDframe: Rapid prototyping for disk arrays - Gibson, al. (1996)   (Correct)
The complexity of advanced disk array architectures makes accurate representation necessary, arduous, and error-prone. In this paper, we present RAIDframe, an array framework that separates architectu... / device driver in the Sprite operating system Chen b Lee To this br RAID level provides the same fault-tolerance but rotates parity units over

2345.2   Brevix Design 1.01 - The Brevix   (Correct)
ions of devices for replication, partitioning, striping, and so on. Physical storage media and the drives they are accessible through Automatic migration of parcels between levels of the storage hiera... / Operating Systems Research Department br Future design work . Fault tolerance .

2342.7   Legion: The Next Logical Step Toward a Nationwide Virtual Computer - Grimshaw, Wulf, French, Weaver.. (1994)   (Correct)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer comprised of a variety of geographically distributed high-performance machines and workstations. ... / cannot replace existing host operating systems see below we cannot br file and data access fault-tolerance ease-of-use and user

2312.4   Automated Synthesis and Optimization of Robot Configurations: An.. - Leger (1999)   (Correct)
Robot configuration design is hampered by the lack of established, well-known design rules, and designers cannot easily grasp the space of possible designs and the impact of all design variables on a ... /

2307.6   Parallel Operating Systems - Garcia, Ferreira, Guedes   (Correct)
ion Layer (HAL) - All these components are layered on top of a hardware abstraction layer. This layer hides hardware specific details, such as I/O interfaces and interrupt controllers, from the NT exe... / IV. Parallel Operating Systems João Garcia Paulo

2292.7   Isatis: A Customizable Distributed Object-Based Runtime System - Michel Banatre   (Correct)
This paper discusses the design and implementation of a customizable distributed object-based runtime system. Our main goal in the system's design was to provide a distributed object-based system su... / the abstractions provided by operating systems and the ones offered by br management mechanisms e.g.fault tolerance mechanism for Gothic and

2291.2   The DIOM Approach to Large-scale Interoperable Database Systems - Ling Liu   (Correct)
A large-scale interoperable database system operating in a dynamic environment should provide a uniform access user interface to its components, scalability to larger networks, evolution of database s... / e.g.hardware platforms operating systems DBMS's Distributed br interoperable database system operating in a dynamic environment

2282.7   An Overview of Checkpointing in Uniprocessor and Distributed Systems, .. - Plank (1997)   (Correct)
Checkpointing is the act of saving the state of a running program so that it may be reconstructed later in time. It is an important basic functionality in computing systems that paves the way for powe... / is performed by the operating system. Typically any program can br facilitated by checkpointing Fault-tolerance rollback recovery Debugging

2250.8   Applying Adaptive Middleware to Manage End-to-End QoS for.. - Gill, Levine, Kuhns, Schmidt, al.   (Correct)
Delivering end-to-end quality of service (QoS) for diverse classes of distributed applications remains a significant R&D challenge. While individual technologies based on prior research have touched u... / of research and commercial operating systems networks and protocols now br a higher-level description of system operating regions decouples the

2250.3   Group Support in Multimedia Communications Systems - Mauthe, Coulson, Hutchison, Namuye   (Correct)
Communication among multiple entities is becoming more and more widespread in computing and telecommunications. Although many existing communications protocols and services do offer some limited sup... / Workshop on Network and Operating System Support for Digital Audio and

2246.8   ORCHESTRA: A Fault Injection Environment for Distributed Systems - Scott Dawson (1996)   (Correct)
This paper reports on orchestra, a portable fault injection environment for testing implementations of distributed protocols. The paper focuses on architectural features of orchestra that provide port... / on the Real-Time Mach operating system and later ported to other br and validation of the fault-tolerance and timing characteristics of

2237.2   The Distributed Interoperable Object Model and Its Application to.. - Liu, Pu (1995)   (Correct)
A large-scale interoperable database system operating in a dynamic environment should provide uniform access user interface to its components, scalability to larger networks, evolution of database sch... / e.g.hardware platforms operating systems DBMS's Distributed br interoperable database system operating in a dynamic environment

2214.7   Programming Language Support for Writing Fault-Tolerant Distributed.. - Schlichting, Thomas (1995)   (Correct)
Good programming language support can simplify the task of writing fault-tolerant distributed software. Here, an approach to providing such support is described in which a general high-level distribut... / using the x-kernel an operating system designed for experimenting br augmented with mechanisms for fault tolerance. Unlike approaches based on

2214.5   Safety Kernel Enforcement of Software Safety Policies - Wika (1995)   (Correct)
Computing systems in which the consequences of failure are very serious are termed safety-critical. Many such systems exist in application areas such as aerospace, defense, transportation, power-gene... / also an issue and that the operating system and network implementation br . . . System Operation .

2208.3   Filters: QoS Support Mechanisms for Multipeer Communications - Yeadon, García, Hutchison, Shepherd (1996)   (Correct)
The nature of distributed multimedia applications is such that they require multipeer communication support mechanisms. The multimedia traffic needs to be delivered to end-systems, networks and end-us... / developed within a UNIX-like operating system. I. INTRODUCTION D br are mainly employed to support fault tolerance and distribution. They

2196.0   StratOSphere: Mobile Processing of Distributed Objects in Java - Wu, al. (1998)   (Correct)
We describe the design and implementation of StratOSphere, a framework which unifies distributed objects and mobile code applications. We begin by first examining different mobile code paradigms that ... / and packages distributed operating systems Gos and distributed br processes for load-balancing fault-tolerance and resilience. Systems

2153.3   Frangipani: A Scalable Distributed File System - Thekkath, Mann, Lee (1997)   (Correct)
The ideal distributed file system would provide all its users with coherent, shared access to the same set of files, yet would be arbitrarily scalable to provide more storage space and higher performanc... / through the standard operating system call interface. Programs br much of its scalability fault tolerance and easy administration from

2117.4   CESIUMSPRAY: A Precise and Accurate Global Time Service for.. - Veríssimo, Rodrigues, Casimiro (1997)   (Correct)
In large-scale systems, such as Internet-based distributed systems, classical clock synchronization solutions become impractical or poorly performing, due to the number of nodes and/or the distance amo... / machinery network and operating system may exhibit a precision br with at least one GPS-node. Fault-tolerance is achieved by replicating

2108.9   Transparent Result Caching - Vahdat, Anderson (1998)   (Correct)
The goal of this work is to develop a general framework for transparently managing the interactions and dependencies among input files, development tools, and output files. By unobtrusively monitoring... / of the th ACM Symposium on Operating Systems Principles pp. - br Note that techniques from the fault tolerance community could potentially

2086.6   Open Heterogeneous Computing in ActorSpace - Christian Callsen (1994)   (Correct)
A number of efforts in heterogeneous computing involve the development of basic architecture independent communication primitives. We present a new programming paradigm, called ActorSpace, which provi... / and transparency at the operating system level. The important issue is br discussed as a way of achieving fault-tolerance in operating systems

2083.3   Automated Fault-Inject Based Dependability Analysis of Distributed.. - Stott   (Correct)
Recently, there has been interest in developing dependability benchmarks for computer systems. This will require a way to inject several different types of faults into many different platforms and a... / the exit status for most operating systems this includes any uncaught br are non-intrusive, that is the system operates exactly the same while faults

2076.3   The Alloc Stream Facility: A Redesign of Application-Level Stream I/O - Krieger, Stumm, Unrau (1994)   (Correct)
This paper introduces a new application level I/O facility called the Alloc Stream Facility (ASF). ASF has several key advantages. First, performance is substantially improved as a result of a) the st... / The success of the Unix operating system is partly attributable to

2073.6   GATOSTAR: A Fault Tolerant Load Sharing Facility for Parallel.. - Bertil Folliot (1994)   (Correct)
This paper presents how and why to unify load sharing and fault tolerance facilities. A realization of a fault tolerant load sharing facility, GATOSTAR, is presented and discussed. It is based on th... / software methods avoiding all operating system modification and hardware br why to unify load sharing and fault tolerance facilities. A realization of

2072.1   A Structured Approach to Redundant Disk Array Implementation - II., al. (1996)   (Correct)
Error recovery in redundant disk arrays is typically performed in an ad hoc fashion, requiring architecture-specific code which limits extensibility and is difficult to verify. In this paper, we descr... / device driver in the Sprite operating system Chen a Lee To this br Drapeau Menon multiple fault tolerance ATC Blaum STC

2066.7   The Role Of Network Traffic Statistics In Devising Object Migration.. - Jonnalagadda (1997)   (Correct)
OF THE THESIS The Role of Network Traffic Statistics in Devising Object Migration Policies by Lakshmikanth S. Jonnalagadda Thesis Director: Professor James L. Flanagan We target the problem of improvi... / were some attempts to provide operating system support for object migration. br to lowly loaded ones Fault Tolerance Objects can be copied or

2048.4   Chameleon: A Software Infrastructure for Adaptive Fault Tolerance - Kalbarczyk, Bagchi (1999)   (Correct)
This paper presents Chameleon, an adaptive infrastructure, which allows different levels of availability requirements to be simultaneously supported in a networked environment. Chameleon provides depe... / ARMORs the hardware and the operating system. Keywords adaptive fault br Infrastructure for Adaptive Fault Tolerance Z. Kalbarczyk S. Bagchi

2018.2   Failure Recovery Algorithms for Multi-Disk Multimedia Servers - Shenoy, Vin   (Correct)
In this paper, we present two novel disk failure recovery methods that utilize the inherent characteristics of video streams for efficient failure recovery. Whereas the first method exploits the seque... / to software failures and operating system crashes customers of br redundant disk arrays RAID fault tolerance video compression algorithms

2013.6   The synchronous dataflow programming language LUSTRE - Halbwachs, Caspi, Raymond, Pilaud (1991)   (Correct)
This paper describes the language Lustre, which is a dataflow synchronous language, designed for programming reactive systems --- such as automatic control and monitoring systems --- as well as for de... / by PRC C CNRS operating systems Reactive systems apply br for reasons of performance fault tolerance and functionality

2012.7   Previous Work in Distributed Operating Systems NOW Retreat - Kim Keeton   (Correct)
this paper: servers broadcast (un)availability when status changes; unknown Previous Work in Distributed Operating Systems NOW Retreat Kim Keeton, Steve Rodrigues, and Drew Roselli April 3, 1995 Remo... / Previous Work in Distributed Operating Systems NOW Retreat Kim Keeton br Distributed Operating System Operating Systems Review v. no.

2010.6   Highly Reliable Upgrading of Components - Cook, Dage (1999)   (Correct)
After a system is deployed, fixes, enhancements, and modifications all occur that change the components that make up the system. Unfortunately, new versions of components can introduce new errors and ... / have not addressed software fault tolerance many of the issues

1981.0   Mapping Software Architectures to Efficient Implementations via.. - Marlet (1997)   (Correct)
Flexibility is recognized as a key feature in structuring software, and many architectures have been designed to that effect. However, they often come with performance and code size overhead, result... / such as graphics and operating systems However these

1976.5   Compiler-Assisted Checkpointing - Micah Beck (1994)   (Correct)
In this paper we present compiler-assisted checkpointing, a new technique which uses static program analysis to optimize the performance of checkpointing. We achieve this performance gain using libckp... / library runtime system or operating system and require no effort on br library runtime system or operating system and require no

1962.6   Failure Recovery Algorithms for Multimedia servers - Shenoy, Vin (1999)   (Correct)
In this paper, we present two novel disk failure recovery methods that utilize the inherent characteristics of video streams for efficient recovery. Whereas the first method exploits the inherent redu... / to software failures and operating system crashes customers of br redundant disk arrays RAID fault tolerance video compression

1941.1   SuperWeb: Research Issues in Java-Based Global Computing - Alexandrov, Ibel, Schauser, Scheiman (1996)   (Correct)
The Internet, in particular the World-Wide-Web, continues to expand at an amazing pace. We propose a new infrastructure, SuperWeb, to harness global resources, such as CPU cycles or disk storage, and ... / the computation or require operating system modifications. As a br Some work in the area of fault-tolerance with malicious failures can

1928.4   Mobility and Extensibility in the StratOSphere Framework - Wu, Agrawal, Abbadi (1999)   (Correct)
We describe the design and implementation of our StratOSphere project, a framework which unifies distributed objects and mobile code applications. We begin by first examining different mobile code par... / and packages distributed operating systems Gos and distributed br processes for load-balancing fault-tolerance and resilience. Systems such

1918.8   Perspectives for High Performance Computing in Workstation Networks - Strumpen, Ramkumar, Casavant, Reddy   (Correct)
Networks of workstations have become increasingly popular for high performance computing. However, in order to become a real alternative for MPPs, reliability and efficiency issues must be tackled. I... / reliability of conventional operating systems are likely to hinder br at user-level. Up to now fault tolerance issues have been treated

1914.6   The MultiSpace: an Evolutionary Platform for Infrastructural Services - Gribble, Welsh, Brewer, Culler (1999)   (Correct)
This paper presents the architecture for a Base, a clustered environment for building and executing highly available, scalable, but flexible and adaptable infrastructure services. Our architecture has t... / c application and feature set operating system and hardware platform. br all of the difficult service fault-tolerance availability and

1867.1   Adaptive Recovery for Mobile Environments - Nuno Neves (1997)   (Correct)
Mobile computing allows ubiquitous and continuous access to computing resources while the users travel or work at a client's site. The flexibility introduced by mobile computing brings new challenges ... / contents are lost or the operating system crashes. The first type of br new challenges to the area of fault tolerance. Failures that were rare

1861.1   Dynamic Reconfiguration of Distributed Applications - Hofmeister (1993)   (Correct)
Applications requiring concurrency or access to specialized hardware are naturally written as distributed applications, where each software component (module) can execute on a different machine, and m... / on top of existing operating systems and compilers requiring no br for load balancing software fault tolerance adaptation to changes in

1860.9   Performance of Consistent Checkpointing in a Modular Operating.. - Muller, Hue, Peyrouze (1994)   (Correct)
This paper presents an evaluation of the performance of a consistent checkpointing mechanism that has been integrated into a modular Mach micro-kernel based operating system. We have measured the perf... / Checkpointing in a Modular Operating System Results of the FTM br our objective was to provide fault tolerance transparency to user

1854.8   Checkpointing Distributed Shared Memory - Luis Silva (1997)   (Correct)
Distributed shared memory (DSM) is a very promising programming model for exploiting the parallelism of distributed memory systems, since it provides a higher level of abstraction than simple message ... / facility of the operating system neither requires the use of br have some kind of support for fault-tolerance. In this paper we present a

1853.9   A Web-Based Distributed Programming Environment - Aoki (1999)   (Correct)
A Web-Based Distributed Programming Environment Kiyoko F. Aoki A Java-based system, the GeoJAVA System, that allows a user to remotely compile his/her own C/C++ programs and execute them for visualiz... / space without specialized operating systems to handle such a procedure br which performs some checks for fault tolerance. For example in the case

1853.7   Testing of Fault-Tolerant and Real-Time Distributed Systems via.. - Scott Dawson (1996)   (Correct)
As software for distributed systems becomes more complex, ensuring that a system meets its prescribed specification is a growing challenge that confronts software developers. This is particularly impo... / on the Real-Time Mach operating system and later ported to other br and validation of the fault-tolerance and timing characteristics

1829.3   Supporting Customized Failure Models for Distributed Software - Hiltunen, Immanuel, Schlichting (1999)   (Correct)
The cost of employing software fault-tolerance techniques in distributed systems is strongly related to the type of failures to be tolerated. For example, in terms of the amount of redundancy required... / On The Osf ri Mk . Mach Operating System And Cords tmr A Variant br The cost of employing software fault-tolerance techniques in distributed

1819.3   Formal Design and Verification of a Reliable Computing Platform For.. - Butler, Di Vito (1992)   (Correct)
In this paper the design and formal verification of the Reliable Computing Platform (RCP), a fault-tolerant computing system for digital flight control applications, are presented. The RCP utilizes NM... / of a fault-tolerant operating system that schedules and executes br this limit is exceeded during system operation the system will mask the

1804.2   A Coherent Distributed File Cache With Directory Write-behind - Mann (1994)   (Correct)
Extensive caching is a key feature of the Echo distributed file system. Echo client machines maintain coherent caches of file and directory data and properties, with write-behind (delayed write-back) ... / for the module in the client operating system that performs these br seen by applications on file system operations and reducing the read

1788.8   Angel: Resource Unification in a 64-bit Micro-Kernel - Murray, Stiemerling, Wilkinson, Kelly (1993)   (Correct)
The appearance of 64-bit processors allows a new approach to microkernel design --- a single unified address space. This paper describes this kind of approach as adopted in Angel. From our experience ... / based message passing operating system relatively typical in br the operating system as certain system operations performed by one thread

1788.5   Fail-Safe Concurrency in the Eclipse System - Knop, Rego (1996)   (Correct)
Local or wide-area heterogeneous workstation clusters are relatively cheap and highly effective, though inherently unstable operating environments for long-running distributed computations. We found t... / issued by any process the operating system is made to provide combined br computations Without fault tolerance these computations may never

1785.6   Microlanguages for Operating System Specialization - Pu (1997)   (Correct)
Specialization is a technique that has the potential to provide operating system clients with the performance and functionality that they need, while still retaining the advantages of a simple generic... / Microlanguages for Operating System Specialization Calton br performance reliability and fault tolerance of the resulting kernel and

1784.8   Formal Specification and Analysis of Active Networks and.. - Denker, Meseguer, Talcott (1999)   (Correct)
Rewriting logic and the Maude language make possible a new methodology in which formal modeling and analysis can be used from the earliest phases of system design to uncover many errors and inconsiste... / communication services such as fault-tolerance security and so on that

1783.9   Fault Tolerance Issues in Data Declustering for Parallel Database.. - Golubchik, Muntz (1994)   (Correct)
Maintaining the integrity of data and its accessibility are crucial tasks in database systems. Although each component in the storage hierarchy can be fairly reliable, a large collection of such compo... / interested in a continuously operating system we use the term data loss br in data loss. Furthermore the system operates at a degraded level of

1768.5   Cost-Effective Software Based Fault-Tolerant Routing in Pipelined.. - Young-Joo Suh   (Correct)
This paper presents a software based approach to fault-tolerant routing in networks using wormhole or virtual cut-through switching. When a message encounters a faulty output link, it is removed from ... / layer of the local node's operating system. The message passing software br virtual cut-through switching fault tolerance interconnection networks

1766.8   Accessing Files in an Internet: The Jade File System - Rao, Peterson (1993)   (Correct)
This paper introduces the Jade File System, which provides a uniform way to name and access files in an internet environment. Jade is a logical system that integrates a heterogeneous collection of exi... / supported by the Unix operating system to access files on the local br remote access methods and fault tolerance. Designing an internet-wide

1765.0   Javelin: Internet-Based Parallel Computing Using Java - Cappello, Christiansen, Ionescu.. (1997)   (Correct)
Java offers the basic infrastructure needed to integrate computers connected to the Internet into a seamless parallel computational resource: a flexible, easily-installed infrastructure for running co... / different CPUs and different operating systems. Also many machines are br high priority enhancement. Fault tolerance The Broker is responsible

1759.0   2K: A Reflective, Component-Based Operating System for Rapidly.. - Kon (1998)   (Correct)
Modern computing environments face both low-frequency infrastructural changes, such as software and hardware upgrades, and frequent changes, such as fluctuations in the network bandwidth and CPU load.... / A Reflective Component-Based Operating System for Rapidly Changing br mobility load balancing fault tolerance and quality of service for

1758.1   Supporting High-performance I/O in QoS-enabled ORB Middleware - Kuhns, Levine, Schmidt, O'Ryan (2000)   (Correct)
To be an effective platform for high-performance distributed applications, off-the-shelf Object Request Broker (ORB) middleware, such as CORBA, must preserve communication-layer quality of service (Qo... / and overview of the Solaris operating system. Supporting br concurrency control and fault tolerance. This requires an efficient

1716.4   ARMADA Middleware and Communication Services - Abdelzaher, Bjorklund, Dawson, Feng, .. (1997)   (Correct)
Real-time embedded systems have evolved during the past several decades from small custom-designed digital hardware to large distributed processing systems. As these systems become more complex, thei... / and emerging standards in operating systems and communication services. br that provide support for fault-tolerance and end-to-end guarantees

1709.5   Using Reflection for Flexibility and Extensibility in a Metacomputing .. - Nguyen-Tuong, Chapin, Grimshaw, Viles   (Correct)
We present system developers with a reflective model, the Reflective Graph and Event model (RGE), for building metacomputing applications, incorporating our design goals of flexibility, extensibilit... / in several contexts such as operating systems programming languages br including scheduling security fault tolerance programming languages and

1702.7   Fail-safe PVM: A portable package for distributed programming with.. - Leon (1993)   (Correct)
Many scientific problems benefit from computations that are parallel at a coarse grain. Collections of loosely coupled, heterogeneous computers are increasingly being applied to these problems. While in... / require modifications to the operating system. We describe the design and br Pvm Distributed Computing Fault-Tolerance Checkpoint Abd Asap

1698.2   An Overview of MSHN: The Management System for Heterogeneous Networks - Hensgen, Kidd, John, Schnaidt.. (1999)   (Correct)
The Management System for Heterogeneous Networks (MSHN) is a resource management system for use in heterogeneous environments. This paper describes the goals of MSHN, its architecture, and both comple... / is similar to a distributed operating system in that it views the set of br level of availability and more fault tolerance than would be available from

1697.6   Execution-Driven Simulation Of Error Recovery Techniques For.. - Frazier, Tamir (1997)   (Correct)
DERT (Distributed Error Recovery Testbed) is a testbed for simulation and performance evaluation of several classes of application-transparent distributed error recovery schemes. DERT is built on top ... / the target architecture and operating system. Thus simulation accuracy br is less disruptive to system operation e.g. because only

1693.7   Dynamic User Management System for web sites - Christian (2000)   (Correct)
With the growing quantity of information around the world, besides the software development community, many other fields are interested in finding solutions for efficient information management. In t... / of the approach on an operating system. In the development of this br concurrency scalability fault tolerance and transparency. Though the

1681.9   Group Orientation: a Paradigm for Distributed Systems of the Nineties - Veríssimo, Rodrigues (1992)   (Correct)
Increasing use of distributed systems, with the corresponding decentralization of activities, stimulates the need for structuring those activities around groups of participants, for reasons of consist... / using large objects operating system support providing threads br Ciencia. performance or fault-tolerance reasons e.g. replicated

1680.2   CesiumSpray: a Precise and Accurate Global Clock Service for.. - Veríssimo, Rodrigues, al. (1997)   (Correct)
In large-scale systems, such as Internet-based distributed systems, classical clock synchronization solutions become impractical or poorly performing, due to the number of nodes and/or the distance. ... / machinery network and operating system may exhibit a precision in br in a legitimate situation of system operation i the system has stopped

1676.2   A Flexible Software Architecture for High Availability Computing - Iyer, Kalbarczyk, Whisnant (1998)   (Correct)
This paper presents an overview of the Chameleon architecture for supporting a wide range of criticality requirements in a heterogeneous network environment. Chameleon employs ARMORs--- Adaptive, Reco... / we can insert hooks in the operating system to trap all network I O br the dependability of the system. Operation in a network of

1661.9   Design Principles of Parallel Operating Systems - A PEACE Case Study - Schröder-Preikschat (1993)   (Correct)
Forthcoming massively parallel systems are distributed memory architectures. They consist of several hundreds to thousands of autonomous processing nodes interconnected by a high-speed network. A majo... / Design Principles of Parallel Operating Systems A Peace Case Study-

1661.7   Fault-Tolerant RT-Mach (FT-RT-Mach) and an Application to Real-Time.. - Egan, Kutz, Mikulin, Melhem..   (Correct)
Even though real-time systems have the stringent constraint of completing tasks before their deadlines, many existing real-time operating systems do not implement fault tolerance capabilities. In this... / many existing real-time operating systems do not implement fault br systems do not implement fault tolerance capabilities. In this paper

1656.2   Process State Capture and Recovery in High-Performance Heterogeneous.. - Ferrari (1998)   (Correct)
Process Introspection is a fundamentally new solution to the process state capture and recovery problem suitable for use in high-performance heterogeneous distributed systems. A process state capture ... / system of one architecture or operating system platform must be recoverable br . . . Fault Tolerance .

1642.3   Multi-Domain WDM Network Structures for Large-Scale Reconfigurable.. - Khaled Aly   (Correct)
Multi-domain wavelength-division multiplexing (M-WDM) is proposed as a generalization of single star-coupled WDM networks. The objective is to provide scalable interconnection schemes for different pa... /

1639.6   A Compiler-based Approach to Fault-Tolerance in Real-Time Systems - Ganesh, Marlowe   (Correct)
Real-time systems are characterized as systems in which the correctness of the system depends both on agreement of the result of the computation with the intended semantics, and on the timeliness with... / from the language compiler operating system and architecture. We sketch br A Compiler-based Approach to Fault-Tolerance in Real-Time Systems A. K.

1635.4   Chameleon: Software Infrastructure for Adaptive Fault Tolerance - Bagchi, Whisnant, Kalbarczyk (1999)   (Correct)
This paper presents Chameleon, an adaptive infrastructure, which allows different levels of availability requirements to be simultaneously supported in a networked environment. Chameleon provides depe... / ARMORs the hardware or the operating system. Keywords adaptive fault br the dependability of the system. Operation in a network of

1635.4   Automated Techniques for Designing Embedded Signal Processors on.. - Kang, Gerber, Golubchik (1998)   (Correct)
In this paper, we present a performance-based technique to help synthesize high-bandwidth radar processors on commodity platforms. This problem is innately complex, for a number of reasons. Contempora... / CPU Scheduler for Multimedia Operating Systems. In Proceedings of br Finally there is the problem of fault-tolerance. Traditionally

1630.6   Implementing Dynamic Atomic Actions Using Reliable Servers - Hue, Muller, Peyrouze, Rochat (1993)   (Correct)
In this paper we present the overall implementation of dynamic atomic actions using reliable servers. We describe an environment for building reliable servers which insulate programmers from mechanisms u... / to implement in a standard operating system due to shared memory br mechanisms used to achieved fault tolerance. To provide availability of

1627.4   Technology Transfer Project with ENEA 1996 - Mellin (1996)   (Correct)
This document describes a technology transfer activity of distributed active real-time database management systems in general, and the DeeDS prototype in particular, from University of Skovde to ENEA ... / using the OSE Delta real-time operating system kernel will visit ENEA br applications and how the fault-tolerance of DeeDS can be improved.

1615.8   I/O Performance of Scientific-Parallel Applications under PAFS - Toni Cortes (1996)   (Correct)
In this paper we present the behavior of PAFS in a scientific environment where big parallel applications are run. PAFS is a parallel/distributed file system with a cooperative cache that avoids the c... / network runs a micro-kernel operating system and all services are handled br is also the case for the file-system operations. This operating-system

1607.9   Use of Imprecise Computation to Enhance Dependability of Real-Time.. - Liu (1994)   (Correct)
In a system based on the imprecise-computation technique, each time-critical task is designed in such a way that it can produce a usable, approximate result in time whenever a failure or overload prev... / is being built on the Mach operating system to implement this br together with traditional fault-tolerance methods to reduce the costs

1599.4   On-line Error Monitoring for Several Data Structures - Jonathan Bright (1995)   (Correct)
In this paper, we consider the problem of detecting errors in the answers given in response to data structure queries. For many programs a substantial fraction of the intricate error-prone code resides i... / to perform software based fault tolerance including recovery blocks

1596.4   Pact - A Fault Tolerant Parallel Programming Environment - Maier (1993)   (Correct)
In this article, a new approach to parallel programming is presented which not only makes parallel programming very easy, but also provides user-transparent fault-tolerance. Programming ease in Pact is o... / Introduction Since operating systems for parallel computers have br guarantees user-transparent fault-tolerance with low overhead by using

1592.1   CLIP: A Checkpointing Tool for Message-Passing Parallel Programs - Chen (1997)   (Correct)
Checkpointing is a useful technique for rollback recovery of parallel applications. While extensive research has been performed on checkpointing in parallel environments, there are few checkpointers a... / but often requires operating system modifications However it br tool like CLIP can provide fault-tolerance on a massively parallel

1587.9   Stardust: an Environment for Parallel Programming on Networks of.. - Cabillic, Puaut (1996)   (Correct)
This paper describes Stardust, an environment for parallel programming on networks of heterogeneous machines. Stardust runs on distributed memory multicomputers and networks of workstations. Applicati... / workstations and different operating systems more computing power is br be chosen. . . Operating system Operating systems running on the

1585.4   Discovery and Hot Replacement of Replicated Read-Only File Systems.. - Zadok (1993)   (Correct)
We describe a mechanism for replacing files, including open files, of a read-only file system while the file system remains mounted; the act of replacement is transparent to the user. Such a "hot repl... / and outside the operating system we have made use of the Amd br desirable because running file system operations over many network hops is

1582.7   HFS: A Flexible File System for large-scale Multiprocessors - Orran Krieger (1993)   (Correct)
The Hurricane File System (HFS) is a new file system being developed for large-scale shared memory multiprocessors with distributed disks. The main goal of this file system is scalability; that is, th... / systems the application and operating system must cooperate to maximize br multiprocessors. We ignore fault tolerance at the level of disk failures

1562.6   Fault Detection Using Hints from the Socket Layer - Nuno Neves (1997)   (Correct)
Fault detection in distributed systems is usually accomplished using a variation of the polling or watch-dog techniques. With these techniques, however, a tradeoff has to be made between the speed of ... / a process. We considered the operating system as a black box and looked br in performance due to the fault-tolerance mechanisms. In this paper we

1561.6   A Software Overview of HARTS: A Distributed Real-Time System - Shin, Kandlur, Kiskis, Dodd..   (Correct)
It has become a common practice to use digital computers for such embedded real-time applications as computer-integrated manufacturing, industrial process control, def... / environment is comprised of an operating system called HARTOS and three br SWG and HMON to validate the fault-tolerance mechanisms of HARTS.

1559.7   The Industrial Take-up of Formal Methods in Safety-Critical and Other .. - Bowen, Stavridou (1993)   (Correct)
Formal methods may be at the crossroads of acceptance by a wider industrial community. In order for the techniques to become widely used, the gap between theorists and practitioners must be bridged ef... / Abstraction provided by operating systems high level programming

1552.9   Secure High Performance Group Communication - McDaniel (1997)   (Correct)
The growth in collaborative applications has mirrored the expansion of distributed networks. Group based applications provide users more flexibility in the form and content of computer... / The Amoeba distributed operating system uses the group br provide reliability security fault tolerance and ordering semantics have

1551.9   Software Fault-Tolerant Distributed Applications in LiPS - Setz   (Correct)
This paper illustrates how software fault-tolerant distributed applications are implemented within LiPS version 2.4, a system for distributed computing using idle-cycles in networks of workstations. Th... / environment of different operating systems network protocols or br hypercomputing software fault-tolerance Linda idle-time recovery

1551.7   Using Reflection for Incorporating Fault-Tolerance Techniques into.. - Nguyen-Tuong, Grimshaw (1999)   (Correct)
As part of the Legion metacomputing project, we have developed a reflective model, the Reflective Graph & Event (RGE) model, for incorporating functionality into applications. In this paper we apply... / Second USENIX Symposium on Operating Systems Design and Implementation br Reflection for Incorporating Fault-Tolerance Techniques into Distributed

1542.4   Adaptability Using Reflection - Sonntag, Härtig, Kowalski.. (1994)   (Correct)
Adaptability, i.e. the ability of a system to adapt dynamically to changes in its execution environment, is considered as an important property of computer systems. Scaling directory replication in na... / features into the BirliX operating system So we are able to br in performance security or fault tolerance. A few examples are

CiteSeer - citeseer.org - Terms of Service - Privacy Policy - Copyright © 1997-2002 NEC Research Institute