This directory is created automatically and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.
14171.4 Configurable Fault-Tolerant Distributed Services - Hiltunen (1996)(Correct)
Fault tolerance---that is, the ability of a system to continue providing its specified service despite failures---is becoming more important as computers are increasingly used in application areas suc... / . . Operating Systems br models different parts of the system operation are encapsulated in object
13156.9 Building Secure and Reliable Network Applications - Birman (1996)(Correct)
ly, the remote procedure call problem, which an RPC protocol undertakes to solve,
consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure
invocation res... / . Related Readings . Operating System Support For High Performance
8661.5 Abstractions for Constructing Dependable Distributed Systems - Mishra, Schlichting (1992)(Correct)
ions for Constructing Dependable Distributed Systems Shivakant Mishra 1 and Richard D. Schlichting TR 92-19 Abstract Distributed systems, in which multiple machines are connected by a communications n... / to standard hardware or operating system services but with improved br is as a single Operating System Operating System Services Services
8084.7 PVS Bibliography - Rushby (1998)(Correct)
this report, including the BibTeX bibliography, are available
at http://www.csl.sri.com/pvs-bib.html. PVS users are encouraged to use the
BibTeX entries from these files, which are as accurate, co... / the impression that the full operating system was verified as opposed to br to critical algorithms for fault tolerance in automobile and aircraft
8038.7 An Adaptive Resource Management Architecture For Global Distributed.. - Venkatasubramanian (1998)(Correct)
Advances in networking, communication, storage, computing, and multimedia technologies
coupled with many emerging application areas is fueling the merger of computing and communication
systems. This w... / In early object-oriented operating systems such as Choices and br The short version of ART system operational semantics using
7644.0 HFS: A flexible file system for shared-memory multiprocessors - Krieger (1994)(Correct)
The HURRICANE File System (HFS) is designed for large-scale, shared-memory multiprocessors. Its architecture
is based on the principle that a file system must support a wide variety of file structures... / HFS as part of the HURRICANE operating system running on the HECTOR br . . Logical file system operations
7632.2 A System For Constructing Configurable High-Level Protocols - Bhatti (1996)(Correct)
13
CHAPTER 1: INTRODUCTION : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 15
1.1 Distributed Systems : : : : ... / a user level task on the Mach operating system. Additional micro-protocols br for many conversations about fault-tolerance and the event-driven model
7071.5 Operating System Services for Wide-Area Applications - Vahdat (1998)(Correct)
Operating System Services for Wide-Area Applications
by
Mohammad Amin Vahdat
Doctor of Philosophy in Computer Science
University of California, Berkeley
Professor Thomas E. Anderson, Cochair
Pro... / Operating System Services for Wide-Area br . . . Scalability and Fault Tolerance .
6931.7 Safety-Critical Systems, Formal Methods and Standards - Bowen, Stavridou (1993)(Correct)
Standards concerned with the development of safety-critical systems, and the software in such systems in particular, abound today as the software crisis increasingly affects the world of embedded comp... / The binding of application to operating system to architecture is a prime br failures was about minutes. Fault-tolerance was achieved by detecting
6283.1 Asynchrony in parallel computing: From dataflow to multithreading - Silc, Robic, Ungerer (1997)(Correct)
The paper presents an overview of the parallel computing models, architectures, and research projects that are based on asynchronous instruction scheduling. It starts with pure dataflow computing mode... / e.g. explicit operating system calls A MIMD computer in br of Computer Design and Fault Tolerance University of Karlsruhe
6102.3 On the Integration of Concurrency, Distribution and Persistence - Munro (1993)(Correct)
The principal tenet of the persistence model is that it abstracts over all the
physical properties of data such as how long it is stored, where it is stored, how it
is stored, what form it is kept in ... / NH decentralise the operating system across a number of nodes. br under the VAX VMS system. This system operated by mapping a file holding
5961.0 Supporting Fault-Tolerant Parallel Programming In Linda - Bakken (1994)(Correct)
17
CHAPTER 1: INTRODUCTION : : : : : : : : : : : : : : : : : : : : : : : : : : 19
1.1 Motivation for Parallel Programming : : ... / turn with the x-kernel an operating system kernel that provides support br . . Fault Tolerance Abstractions
5499.4 The Freeze Free Algorithm For Process Migration - Roush (1995)(Correct)
tical pages without page faults.
The program counter identifies the current code page, and the stack pointer identifies
the current stack page. A heuristic identifies the current heap page by examinin... / . . Operating System br of a distributed file system. Operating Systems Review
5401.3 COYOTE: A System for Constructing Fine-Grain Configurable.. - Bhatti, Hiltunen, Schlichting, Chiu (1998)(Correct)
Communication-oriented abstractions such as atomic multicast, group RPC, and protocols for location-independent
mobile computing can simplify the development of complex applications built on distribut... / Reusable software D. . Operating Systems Communication Management br Systems Reliability -Fault-tolerance D. . Operating Systems
5026.4 Hive: Operating System Fault Containment For Shared-Memory.. - Chapin (1997)(Correct)
Reliability and scalability are major concerns when designing general-purpose operating systems for large-scale shared-memory multiprocessors. This dissertation describes Hive, an operating system wit... / I Hive Operating System Fault Containment For
4832.7 The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture - Bedichek (1994)(Correct)
The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture
by Robert C. Bedichek
Co-Chairpersons of Supervisory Committee: Professor Henry M. Levy
Professor Edward D. Lazowska
Department of C... / . . Operating System Implications of Meerkat- br the Intel Touchstone Delta System operated by Caltech on behalf of the
4741.1 NASA Langley's Research and Technology-Transfer Program in Formal.. - Butler, Carreño, Di Vito.. (1998)(Correct)
This paper presents an overview of NASA Langley's research program in formal methods.
The major goals of this work are to make formal methods practical for use on high integrity
systems, to orchestrat... / and actuators. The RCP operating system provides the applications br Often the physical fault-tolerance features of these systems are
4729.7 Performance Availability for Networks of Workstations - Arpaci-Dusseau (1999)(Correct)
Performance Availability
for Networks of Workstations
by
Remzi H. Arpaci-Dusseau
Software systems for large-scale distributed and parallel machines are difficult to build.
When run in dynamic, pro... / . . . Operating System . br are unaware of the specifics of system operation. The problem of attaining
4644.4 Program Representation And Execution In Real-Time Multiprocessor.. - Niehaus (1994)(Correct)
PROGRAM REPRESENTATION AND EXECUTION
IN REAL-TIME MULTIPROCESSOR SYSTEMS
FEBRUARY 1994
DOUGLAS NIEHAUS,
B.S., NORTHWESTERN UNIVERSITY
M.S., UNIVERSITY OF MICHIGAN
Ph.D., UNIVERSITY OF MASSACHUSETTS AMH... / methods a predictable operating system implementation and real-time br which performs the required file system operation and writes the appropriate
4615.2 The Enterprise Executive - Wong (1992)(Correct)
Enterprise is a graphical programming environment for designing, coding, debugging,
testing, monitoring, profiling and executing programs in a distributed hardware environment.
Enterprise code looks l... / refers to the computer's operating system. As Comer wrote br . . . Fault Tolerance
4586.4 System Support for Software Fault Tolerance in Highly Available.. - Sullivan (1992)(Correct)
Today, software errors are the leading cause of outages in fault tolerant systems. System availability can be improved despite software errors by fast error detection and recovery techniques that mini... / Ibm Systems Programs The Mvs Operating System And The Ims Dbms And Db
4296.9 Research Topics for Graduate Students - Science, York (1994)(Correct)
This document outlines some of the present research interests of
most members of the Department who are in a position to supervise
the research of students entering during 1999/2000 academic year. The... / research interests include operating systems and kernels virtual binary br in software functionality fault-tolerance is achieved. Problems with
4260.7 Local Anonymity In The Internet - Martin, Jr. (1999)(Correct)
Packet-switched computer networks of all sizes are widely used for personal, professional, and governmental
communication. However, the speed, versatility, and largely unregulated nature of computer n... / . . Operating System . br . . . Fault Tolerance .
4237.7 A Proxy Based Filtering Mechanism for the Mobile Environment - Zenel (1998)(Correct)
A Proxy Based Filtering Mechanism for the Mobile
Environment
Bruce Zenel
Host mobility complicates the standard networking model in unexpected ways. It
increases network heterogeneity, causing diff... / Related Work . Operating System Support for Dynamic Systems . br to the area in which my system operates. . Implementing another
4217.6 Scheduling Algorithms and Operating Systems Support for Real-Time.. - Ramamritham, Stankovic (1994)(Correct)
This paper summarizes the state of the real-time field in the areas of scheduling and
operating system kernels. Given the vast amount of work that has been done by both the
operations research and com... / Scheduling Algorithms and Operating Systems Support for Real-Time br . . Scheduling with Fault Tolerance Constraints
4010.4 Efficient Implementations of Software Architectures via Partial.. - Marlet, Thibault, Consel (1999)(Correct)
The notion of flexibility (that is, the ability to adapt to changing requirements or
execution contexts) is recognized as a key concern in structuring software, and many architectures
have been desi... / available platforms hardware operating systems etc.and features the br as well as safety fault tolerance and quality of service.
4000.2 WebOS: Operating System Services for Wide Area Applications - Vahdat (1997)(Correct)
In this paper, we argue for the power of providing a common
set of OS services to wide area applications, including
mechanisms for resource discovery, a global namespace, remote
process execution, res... / WebOS Operating System Services for Wide Area br resolution load balancing and fault tolerance. Second we provide a file
3938.9 Modular Specification Of Interaction Policies In Distributed Computing - Sturman (1996)(Correct)
Software executing on distributed systems consists of many asynchronous, autonomous components
which interact in order to coordinate local activity. The need for such coordination, as well
as requirem... / the underlying hardware and operating system as well as on the br that exports the desired system operations to the programming
3909.2 The UCSD Active Web - Pasquale (1997)(Correct)
The UCSD Department of Computer Science and Engineering recently submitted a
proposal for large-scale Research Infrastructure funding to the National Science
Foundation. The theme of the proposal is... / its strengths in network and operating systems design security br This includes monitoring system operation overseeing backups and
3851.0 Competitive Execution in a Distributed Environment - Cho (1996)(Correct)
of the Dissertation
Competitive Execution
in a Distributed Environment
by
Sung Hyun Cho
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 1996
Professor David R. Jeffer... / protocols are transparent operating system facilities that involve br . . Replication to Improve Fault Tolerance
3834.9 WebOS: Software Support for Scalable Web Services - Amin Vahdat (1997)(Correct)
The burgeoning popularity of the Web is pushing against
the performance limits of the underlying infrastructure, presenting
a number of difficult challenges for the Web as a system.
We believe that re... / building a higher level Web operating system to efficiently manage these br communication scheduling fault tolerance and authentication. To this
3657.1 Efficient Reliable Group Communication For Distributed Systems - Kaashoek, Tanenbaum (1994)(Correct)
Many applications can profit from broadcast communication, but few operating systems provide primitives
that make broadcast communication available to user applications. In this paper we introduce pri... / communication but few operating systems provide primitives that br trade performance against fault tolerance. . Introduction Many
3626.9 TACOMA - fundamental abstractions supporting agent computing in a.. - Sudmann (1996)(Correct)
The concept of migrating processes between networked computers is not a new
one. However, a new computing paradigm is emerging in which an agent is
able to migrate between nodes in a heterogeneous net... / of this project is to provide operating system support for agents and br agent model. Privacy security fault tolerance and heterogeneity are
3538.1 Distributed Software Engineering - Invited State-of-the-Art Report - Kramer(Correct)
The term "Distributed Software Engineering" is
ambiguous. It includes both the engineering of
distributed software and the process of distributed
development of software, such as cooperative work... / computers with independent operating systems connected to the network br services with replication for fault tolerance than to try and provide the
3518.8 Cluster-Based Scalable Network Services - Fox, Gribble, Chawathe, Brewer.. (1997)(Correct)
This paper has benefited from the detailed and perceptive comments of our reviewers, especially our shepherd Hank Levy. We thank Randy Katz and Eric Anderson for their detailed readings of early draft... / it is normally viewed as an operating system Multics Multiplexed br policies are left to the system operator. We describe our experiments
3394.8 Extensible Cluster-Based Scalable Network Services - Fox (1997)(Correct)
This paper has benefited from the detailed and perceptive comments of our reviewers, especially our shepherd Hank Levy. We also thank Randy Katz, Eric Anderson, David Culler provided valuable feedback... / it is normally viewed as an operating system Multics Multiplexed br policies are left to the system operator. We describe our
3345.9 Programming Languages: Specification - Rajan (1998)(Correct)
This thesis introduces Multiclock Esterel, a synchronous language, suitable for application
areas including embedded reactive control and digital hardware design where
considerable effort is directed ... / th Workshop on Real-Time Operating Systems and Software pages -
3336.8 A brief survey of systems providing process or object migration.. - Nuttall (1994)(Correct)
Migration is the movement of an active entity from one machine to another during execution. Such
migration may be used for dynamic load balancing purposes with the aim of gaining increased performance... / facilities As published in Operating Systems Review October Volume br object persistence improved fault tolerance and potentially more
3314.1 Toward The Design Of Large-Scale, Shared-Memory Multiprocessors - Scott (1992)(Correct)
The state-of-the-art in multiprocessing today employs thousands of high-performance
microprocessors. As system sizes continue to grow, increasing care must be taken to design
cost-efficient, balanced ... / software front as well. Many operating system issues such as memory br attention must be paid to fault tolerance which becomes increasingly
3286.5 Microkernel Operating Systems In Parallel Architectures - Blum (1994)(Correct)
MICROKERNEL
OPERATING SYSTEMS
IN
PARALLEL ARCHITECTURES
by
JOACHIM BLUM
In the past few years operating systems' complexity has increased substantially
because of the growing number of required serv... / Computer Science Microkernel Operating Systems In Parallel Architectures
3250.9 Market-Based Massively Parallel Internet Computing - Cappello, Christiansen, Neary.. (1997)(Correct)
Recent advances in Internet connectivity and implementations of safer distributed computing through languages such as Java provide the foundation for transforming computing resources into tradable com... / sets word sizes or operating systems. The infrastructure proposed br for faking computations. Fault tolerance In a potentially
3244.3 Flexible and Adaptive Control of Real-Time Distributed Object.. - Loyall, Atlas, Schantz, Gill.. (1999)(Correct)
Next-generation distributed systems have growing demands for real-time quality of service
(QoS), flexibility, and control over the often unpredictable environments in which they are
deployed. These... / and the underlying operating systems protocol stacks and br and preserve QoS during system operation. TAO's inband mechanisms
3240.9 GLUnix: a Global Layer Unix for a Network of Workstations - Ghormley (1997)(Correct)
ions
To provide remote execution of both parallel and sequential jobs, GLUnix extends some existing UNIX abstractions and introduces
new abstractions, borrowing heavily from MPP environments such as ... / and implementation of GLUnix operating system middleware for a cluster of br provide hooks for application fault-tolerance mechanisms.
3137.5 Fault Tolerant Matrix Operations for Networks of Workstations Using.. - Plank (1997)(Correct)
Networks of workstations (NOWs) offer a cost effective platform for high-performance, long-running parallel
computations. However, these computations must be able to tolerate the changing and often fa... / a general-purpose time-sharing operating system and each is often owned by a br scientific computing. The fault-tolerance is based on diskless
3105.6 Computing in the RAIN: A Reliable Array of Independent Nodes - Vasken Bohossian (1998)(Correct)
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed
computing and data storage systems for future spaceborne missions. The goal of the project is to
identify and d... / run in conjunction with operating system services and standard br Through software-implemented fault tolerance the system tolerates
3080.3 ULTRA III: Implementing a Scalable Shared-Memory Multiprocessor - Project (1989)(Correct)
MIMD system currently available, beyond bus-connected systems
with modest numbers of processors, is the BBN Butterfly. This relative scarcity of highly-parallel shared-memory
machines is due to a commo... / programming environments operating systems coordination primitives br compilers I O issues and fault tolerance. Parallelizing compilers are
3028.5 HFS: A Performance-Oriented Flexible File System Based on.. - Krieger (1996)(Correct)
ing with credit is permitted. To copy otherwise, to republish, to post on
servers, to redistribute to lists, or to use any component of this work in other works, requires prior
specific permission and... / HFS as part of the Hurricane operating system running on the Hector shared br and of some basic file system operations. Section . presents the
3007.8 Algorithm-Based Diskless Checkpointing for Fault Tolerant Matrix.. - Plank (1995)(Correct)
This paper is an exploration of diskless checkpointing for distributed scientific computations. With the widespread use of the "Network Of Workstation" (NOW) platform for distributed computing, long-r... / type and each runs a special operating system so that every node is a br the algorithms are tuned for fault-tolerance and present the performance
2957.0 A Synopsis of the Legion Project - Grimshaw, Wulf, French, Weaver, Jr. (1994)(Correct)
The coming of giga-bit networks makes possible the realization of a single nationwide virtual computer
comprised of a variety of geographically distributed high-performance machines and workstations. ... / cannot replace existing host operating systems we cannot significantly br file and data access fault-tolerance ease-of-use and user
2955.5 Graduate Course: Reactive and Real-Time Systems - Koren, Tyszberowicz(Correct)
This article describes a graduate course on the
subject of "Reactive and Real-Time Systems", which
serves as the basis for courses taught by the authors at
Bar-Ilan University and at Tel-Aviv Universi... / networks computer operating systems man-machine interfaces br require a high degree of fault tolerance and are embedded in larger
2944.5 The Design, Implementation and Evaluation of RETHER: A Real-Time.. - Venkatramani (1996)(Correct)
of the Dissertation
The Design, Implementation and Evaluation of RETHER : A
Real-Time Ethernet Protocol
by
Chitra Venkatramani
Doctor of Philosophy
in
Computer Science
State University of New York a... / receiver ends run real-time operating systems and the jitter to and from br . v . . Fault Tolerance .
2933.1 Differentiated and Predictable Quality of Service in Web Server.. - Aron (2000)(Correct)
As the World Wide Web experiences increasing commercial and mission-critical use, server systems are expected to deliver high and predictable performance. The phenomenal improvement in microprocessor ... / management facilities in the operating system software are studied. This
2834.7 Persistent Store In A Dynamic Resource Management Environment - Bridgland (1994)(Correct)
9
Acknowledgements 11
Dedication 12
Definition of Terms 13
1 Introduction 14
1.1 Persistent Store Resource Management : : : : : : : : : : : : : : : 15
1.2 Trends in Operating System Design : : : : : ... / . Trends in Operating System Design br showing the average number of system operations' per HWO call for various
2805.9 Lessons from FTM: an Experiment in the Design and Implementation of a .. - Muller, al. (1995)(Correct)
This report describes an experiment in the design of a general purpose
fault tolerant system, FTM. The main objective of the FTM design was to implement
a "low-cost" fault tolerant system that could... / standard workstations. At the operating system level our goal was to br systems while offering fault tolerance transparency to user
2801.2 Secondary Storage Garbage Collection for Decentralized Object-Based.. - Björnerstedt (1990)(Correct)
This paper describes a mechanism for secondary storage garbage collection that may be used
to reclaim inaccessible resources in decentralized persistent object based systems.
Schemes for object addres... / virtual as provided by the operating system and hardware for the lowest br memory architecture such as fault tolerance and non-volatility. The
2797.6 Object Models for Distributed or Persistent Programming - Cahill Nixon (1997)(Correct)
As use of object orientation for application development has increased,
many researchers have investigated the design of object-based programming
languages for the distributed and persistent programmi... / systems network systems operating systems and computer architecture br constraints performance fault-tolerance Support for
2739.5 Network Multicomputing Using Recoverable Distributed Shared Memory - John Carter (1993)(Correct)
A network multicomputer is a multiprocessor in which the
processors are connected by general-purpose networking
technology, in contrast to current distributed-memory multiprocessors
where a dedicated s... / may involve traps into the operating system kernel interrupts context br checkpointing to provide fault tolerance. Measurements of our
2724.3 Using Group Communication to Implement a Fault-Tolerant Directory.. - Kaashoek, Tanenbaum, Verstoep (1993)(Correct)
Group communication is an important paradigm for building distributed applications. This paper discusses a fault-tolerant distributed directory service based on group communication, and compares it wi... / the claim that a distributed operating system should provide both remote br which does not provide any fault tolerance at all. The paper concludes
2723.3 An Overview of the NYU Ultracomputer Project - Gottlieb (1986)(Correct)
The NYU Ultracomputer is a shared memory MIMD parallel computer designed to contain
thousands of processors connected by an Omega network to a like number of
memory modules. A new coordination primitive... / designing and implementing the operating system. The Ultracomputer machine
2707.8 Enterprise: An Interactive Graphical Programming Environment For.. - Chan, Lu, al. (1992)(Correct)
Workstation environments have been in use for more than a decade now. Although a network
of workstations together represents a large amount of aggregate computing power, single users
often cannot util... / heterogeneous computers and operating systems and the complexity of br synchronization and fault tolerance allowing the rapid
2707.1 An Optically Interconnected Distributed Shared Memory System.. - Bogineni, Dowd (1992)(Correct)
This paper introduces an optically interconnected distributed shared memory (OIDSM) system. The
distributed shared memory (DSM) approach integrates both shared memory and distributed memory
system ide... / implementation in the Aegis operating system of the Apollo domain. The br networks have high fault-tolerance due to their passive nature
2677.2 Group Orientation: a Paradigm for Modern Distributed Systems - Paulo Veríssimo (1992)(Correct)
Increasing use of distributed systems, with the corresponding
decentralisation, stimulates the need for
structuring activities around groups of participants,
for reasons of consistency, user-friendlin... / are penetrating too slowly in operating systems technology. Two important br activity for performance or fault-tolerance reasons e.g. replicated
2669.6 Extensible Resource Management For Cluster Computing - Islam, Prodromidis, Squillante.. (1996)(Correct)
this paper we present a new resource management system for allocating resources among such
applications in general-purpose distributed-memory parallel computers. Our system, Octopus, consists of sever... / these systems also provide operating system infrastructures for creating br scheduling strategies and a fault-tolerance strategy. Our hierarchical
2637.8 A Survey of Object-Oriented Concepts - Nierstrasz (1989)(Correct)
The object-oriented paradigm has gained popularity in various guises not only in programming languages, but in user interfaces,
operating systems, databases, and other areas. We argue that the fundame... / but in user interfaces operating systems databases and other areas.
2588.9 Cluster I/O with River: Making the Fast Case Common - Arpaci-Dusseau, Anderson, Treuhaft..(Correct)
We introduce River, a data-flow programming environment and I/O
substrate for clusters of computers. River is designed to provide maximum
performance in the common case --- even in the face of nonunif... / of sources unexpected operating system activity uneven load br be useful Application Fault Tolerance. The ultimate goal is to
2578.7 Fast Group Communication for Standard Workstations - Vogels, Rodrigues, Veríssimo (1992)(Correct)
This paper presents a Group Communication Service suitable for standard
workstations. The communication service is designed to take
advantage of the technology offered by modern standard Local Area
Ne... / exploit specific network and operating system properties. Additionally br activity for performance or fault-tolerance reasons e.g. replicated
2544.4 Microkernels Meet Recursive Virtual Machines - Ford, Hibler, Lepreau, Tullmann.. (1996)(Correct)
This paper describes a novel approach to providing modular and extensible operating system functionality, and encapsulated environments, based on a synthesis of microkernel and virtual machine concept... / modular and extensible operating system functionality and br management demand paging fault tolerance and debugging support to
2539.4 FT-SR: A Programming Language For Constructing Fault-Tolerant.. - Thomas (1993)(Correct)
13
CHAPTER 1: INTRODUCTION : : : : : : : : : : : : : : : : : : : : : : : : : : 15
1.1 Dependable System Construction---Princip... / of the x-kernel HP an operating system kernel designed for br can possibly be violated during system operation. The more restrictive the
2501.5 Hints for Computer System Design - Lampson (1983)(Correct)
Studying the design and implementation of a number of computer systems has led to some general hints for system design. They are described here and illustrated by many examples, ranging from hardware such as ... / at the. th ACM Symposium on Operating Systems Principles and appeared in br and use of a distributed file system. Operating Systems Review
2468.0 Formal Methods Technology Transfer: A View from NASA - Caldwell (1996)(Correct)
Since 1988 NASA Langley Research Center has supported a formal methods research group. From its inception, a primary goal of the program has been the transfer of formal methods technology into aerospa... / systems. Because adding fault-tolerance mechanisms to a system
2462.9 An Evaluation of the Amoeba Group Communication System - Kaashoek, Tanenbaum (1996)(Correct)
The Amoeba group communication system has two
unique aspects: (1) it uses a sequencer-based protocol with
negative acknowledgements for achieving a total order on
all group messages; and (2) users cho... / of the Amoeba distributed operating system The delay for a br users choose the degree of fault tolerance they desire. This paper
2447.3 Optimizing Parallel Applications for Wide-Area Clusters - Bal, Plaat, Bakker, Dozy, Hofman (1998)(Correct)
Recent developments in networking technology cause a growing interest in connecting local-area
clusters of workstations over wide-area links, creating multilevel clusters. Often, latency and bandwidth... / GFLOPS peak performance. The operating system used on DAS is BSD OS from br such as heterogeneity fault-tolerance security accounting and
2442.2 Fault Manager for Distributed Operating Environments Design.. - Sens (1998)(Correct)
This paper presents the design, implementation, and performance evaluation of a
software fault manager for distributed applications. Dubbed STAR, it uses the natural
redundancy existing in networks of... / portable to UNIXTM-like operating systems. The current implementation br to offer a high level of fault tolerance. Fault management is
2439.7 Customizing Dependability with Reusable Software Components - Sturman, Agha (1996)(Correct)
Many large software systems have different components with varying requirements
for robustness and performance. Moreover, dependability requirements often change
throughout their software life-cycle. ... / techniques are fixed in the operating system or application code must be br correcting codes. The software fault-tolerance techniques used with each
2429.5 Practical Byzantine Fault Tolerance - Castro, Liskov (1999)(Correct)
This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantine-fault-tolerant algorithms will be increasingly important in the future because mal... / of the Third Symposium on Operating Systems Design and Implementation br snfsd executes file system operations directly in the memory
2423.4 Process Introspection: A Checkpoint Mechanism for High Performance.. - Ferrari (1996)(Correct)
The Process Introspection project is a design and implementation effort, the main goal of which is to construct a general purpose, flexible, efficient checkpoint/restart mechanism appropriate for use ... / on one architecture or operating system platform must be restartable br to implement a number of basic fault tolerance and load balancing schemes.
2362.3 RAIDframe: Rapid prototyping for disk arrays - Gibson, al. (1996)(Correct)
The complexity of advanced disk array architectures makes accurate representation necessary, arduous,
and error-prone. In this paper, we present RAIDframe, an array framework that separates architectu... / device driver in the Sprite operating system Chen b Lee To this br RAID level provides the same fault-tolerance but rotates parity units over
2345.2 Brevix Design 1.01 - The Brevix(Correct)
ions of devices for replication,
partitioning, striping, and so on.
Physical storage media and the drives
they are accessible through
Automatic migration of
parcels between levels
of the storage hiera... / Operating Systems Research Department br Future design work . Fault tolerance .
2307.6 Parallel Operating Systems - Garcia, Ferreira, Guedes(Correct)
ion Layer (HAL) - All these components are layered
on top of a hardware abstraction layer. This layer hides hardware specific
details, such as I/O interfaces and interrupt controllers, from the NT exe... / IV. Parallel Operating Systems João Garcia Paulo
2292.7 Isatis: A Customizable Distributed Object-Based Runtime System - Michel Banatre(Correct)
This paper discusses the design and implementation of a
customizable distributed object-based runtime system. Our main goal
in the system's design was to provide a distributed object-based system
su... / the abstractions provided by operating systems and the ones offered by br management mechanisms e.g.fault tolerance mechanism for Gothic and
2291.2 The DIOM Approach to Large-scale Interoperable Database Systems - Ling Liu(Correct)
A large-scale interoperable database system operating in a dynamic environment should provide a
uniform access user interface to its components, scalability to larger networks, evolution of database
s... / e.g.hardware platforms operating systems DBMS's Distributed br interoperable database system operating in a dynamic environment
2282.7 An Overview of Checkpointing in Uniprocessor and Distributed Systems, .. - Plank (1997)(Correct)
Checkpointing is the act of saving the state of a running program so that it may be reconstructed later
in time. It is an important basic functionality in computing systems that paves the way for powe... / is performed by the operating system. Typically any program can br facilitated by checkpointing Fault-tolerance rollback recovery Debugging
2250.8 Applying Adaptive Middleware to Manage End-to-End QoS for.. - Gill, Levine, Kuhns, Schmidt, al.(Correct)
Delivering end-to-end quality of service (QoS) for diverse
classes of distributed applications remains a significant R&D
challenge. While individual technologies based on prior research
have touched u... / of research and commercial operating systems networks and protocols now br a higher-level description of system operating regions decouples the
2246.8 ORCHESTRA: A Fault Injection Environment for Distributed Systems - Scott Dawson (1996)(Correct)
This paper reports on orchestra, a portable fault injection environment for testing implementations of
distributed protocols. The paper focuses on architectural features of orchestra that provide port... / on the Real-Time Mach operating system and later ported to other br and validation of the fault-tolerance and timing characteristics of
2237.2 The Distributed Interoperable Object Model and Its Application to.. - Liu, Pu (1995)(Correct)
A large-scale interoperable database system operating in a dynamic environment should provide uniform
access user interface to its components, scalability to larger networks, evolution of database
sch... / e.g.hardware platforms operating systems DBMS's Distributed br interoperable database system operating in a dynamic environment
2214.7 Programming Language Support for Writing Fault-Tolerant Distributed.. - Schlichting, Thomas (1995)(Correct)
Good programming language support can simplify the task of writing fault-tolerant
distributed software. Here, an approach to providing such support is described in which
a general high-level distribut... / using the x-kernel an operating system designed for experimenting br augmented with mechanisms for fault tolerance. Unlike approaches based on
2214.5 Safety Kernel Enforcement of Software Safety Policies - Wika (1995)(Correct)
Computing systems in which the consequences of failure are very serious are termed
safety-critical. Many such systems exist in application areas such as aerospace, defense,
transportation, power-gene... / also an issue and that the operating system and network implementation br . . . System Operation .
2208.3 Filters: QoS Support Mechanisms for Multipeer Communications - Yeadon, García, Hutchison, Shepherd (1996)(Correct)
The nature of distributed multimedia applications
is such that they require multipeer communication support
mechanisms. The multimedia traffic needs to be delivered to
end-systems, networks and end-us... / developed within a UNIX-like operating system. I. INTRODUCTION D br are mainly employed to support fault tolerance and distribution. They
2196.0 StratOSphere: Mobile Processing of Distributed Objects in Java - Wu, al. (1998)(Correct)
We describe the design and implementation of StratOSphere, a framework which unifies distributed objects and mobile code applications. We begin by first examining different mobile code paradigms that ... / and packages distributed operating systems Gos and distributed br processes for load-balancing fault-tolerance and resilience. Systems
2153.3 Frangipani: A Scalable Distributed File System - Thekkath, Mann, Lee (1997)(Correct)
The ideal distributed file system would provide all its users with coherent,
shared access to the same set of files, yet would be arbitrarily
scalable to provide more storage space and higher performanc... / through the standard operating system call interface. Programs br much of its scalability fault tolerance and easy administration from
2117.4 CESIUMSPRAY: A Precise and Accurate Global Time Service for.. - Veríssimo, Rodrigues, Casimiro (1997)(Correct)
In large-scale systems, such as Internet-based distributed systems, classical clock-synchronization
solutions become impractical or poorly performing, due to the number of
nodes and/or the distance amo... / machinery network and operating system may exhibit a precision br with at least one GPS-node. Fault-tolerance is achieved by replicating
2108.9 Transparent Result Caching - Vahdat, Anderson (1998)(Correct)
The goal of this work is to develop a general framework
for transparently managing the interactions and dependencies
among input files, development tools, and output
files. By unobtrusively monitoring... / of the th ACM Symposium on Operating Systems Principles pp. - br Note that techniques from the fault tolerance community could potentially
2086.6 Open Heterogeneous Computing in ActorSpace - Christian Callsen (1994)(Correct)
A number of efforts in heterogeneous computing involve the development of basic
architecture independent communication primitives. We present a new programming
paradigm, called ActorSpace, which provi... / and transparency at the operating system level. The important issue is br discussed as a way of achieving fault-tolerance in operating systems
2083.3 Automated Fault-Inject Based Dependability Analysis of Distributed.. - Stott(Correct)
Recently, there has been interest in developing dependability benchmarks for computer systems. This will
require a way to inject several different types of faults into many different platforms and a... / the exit status for most operating systems this includes any uncaught br are non-intrusive-that is the system operates exactly the same while faults
2073.6 GATOSTAR: A Fault Tolerant Load Sharing Facility for Parallel.. - Bertil Folliot (1994)(Correct)
This paper presents how and why to unify load sharing and fault
tolerance facilities. A realization of a fault tolerant load sharing facility,
GATOSTAR, is presented and discussed. It is based on th... / software methods avoiding all operating system modification and hardware br why to unify load sharing and fault tolerance facilities. A realization of
2072.1 A Structured Approach to Redundant Disk Array Implementation - II., al. (1996)(Correct)
Error recovery in redundant disk arrays is typically performed in an ad hoc fashion, requiring architecture-specific
code which limits extensibility and is difficult to verify. In this paper, we descr... / device driver in the Sprite operating system Chen a Lee To this br Drapeau Menon multiple fault tolerance ATC Blaum STC
2066.7 The Role Of Network Traffic Statistics In Devising Object Migration.. - Jonnalagadda (1997)(Correct)
OF THE THESIS
The Role of Network Traffic Statistics in Devising Object
Migration Policies
by Lakshmikanth S. Jonnalagadda
Thesis Director: Professor James L. Flanagan
We target the problem of improvi... / were some attempts to provide operating system support for object migration. br to lowly loaded ones Fault Tolerance Objects can be copied or
2048.4 Chameleon: A Software Infrastructure for Adaptive Fault Tolerance - Kalbarczyk, Bagchi (1999)(Correct)
This paper presents Chameleon, an adaptive infrastructure, which allows different levels of
availability requirements to be simultaneously supported in a networked environment. Chameleon
provides depe... / ARMORs the hardware and the operating system. Keywords adaptive fault br Infrastructure for Adaptive Fault Tolerance Z. Kalbarczyk S. Bagchi
2018.2 Failure Recovery Algorithms for Multi-Disk Multimedia Servers - Shenoy, Vin(Correct)
In this paper, we present two novel disk failure recovery methods that utilize the inherent characteristics
of video streams for efficient failure recovery. Whereas the first method exploits the seque... / to software failures and operating system crashes customers of br redundant disk arrays RAID fault tolerance video compression algorithms
2013.6 The synchronous dataflow programming language LUSTRE - Halbwachs, Caspi, Raymond, Pilaud (1991)(Correct)
This paper describes the language Lustre, which is a dataflow synchronous language, designed for programming reactive systems --- such as automatic control and monitoring systems --- as well as for de... / by PRC C CNRS operating systems Reactive systems apply br for reasons of performance fault tolerance and functionality
2012.7 Previous Work in Distributed Operating Systems NOW Retreat - Kim Keeton(Correct)
this paper: servers broadcast (un)availability when status changes; unknown Previous Work in Distributed Operating Systems
NOW Retreat
Kim Keeton, Steve Rodrigues, and Drew Roselli
April 3, 1995
Remo... / Previous Work in Distributed Operating Systems NOW Retreat Kim Keeton br Distributed Operating System Operating Systems Review v. no.
2010.6 Highly Reliable Upgrading of Components - Cook, Dage (1999)(Correct)
After a system is deployed, fixes, enhancements, and modifications all occur that change
the components that make up the system. Unfortunately, new versions of components
can introduce new errors and ... / have not addressed software fault tolerance many of the issues
1976.5 Compiler-Assisted Checkpointing - Micah Beck (1994)(Correct)
In this paper we present compiler-assisted checkpointing, a new technique which uses static
program analysis to optimize the performance of checkpointing. We achieve this performance
gain using libckp... / library runtime system or operating system and require no effort on br library runtime system or operating system and require no
1962.6 Failure Recovery Algorithms for Multimedia servers - Shenoy, Vin (1999)(Correct)
In this paper, we present two novel disk failure recovery methods that utilize the inherent characteristics of video
streams for efficient recovery. Whereas the first method exploits the inherent redu... / to software failures and operating system crashes customers of br redundant disk arrays RAID fault tolerance video compression
1941.1 SuperWeb: Research Issues in Java-Based Global Computing - Alexandrov, Ibel, Schauser, Scheiman (1996)(Correct)
The Internet, in particular the World-Wide-Web, continues to expand at an amazing pace.
We propose a new infrastructure, SuperWeb, to harness global resources, such as CPU cycles or
disk storage, and ... / the computation or require operating system modifications. As a br Some work in the area of fault-tolerance with malicious failures can
1928.4 Mobility and Extensibility in the StratOSphere Framework - Wu, Agrawal, Abbadi (1999)(Correct)
We describe the design and implementation of our StratOSphere project, a framework which unifies distributed objects and mobile code applications. We begin by first examining different mobile code par... / and packages distributed operating systems Gos and distributed br processes for load-balancing fault-tolerance and resilience. Systems such
1918.8 Perspectives for High Performance Computing in Workstation Networks - Strumpen, Ramkumar, Casavant, Reddy(Correct)
Networks of workstations have become increasingly popular for high
performance computing. However, in order to become a real alternative for MPPs,
reliability and efficiency issues must be tackled. I... / reliability of conventional operating systems are likely to hinder br at user-level. Up to now fault tolerance issues have been treated
1867.1 Adaptive Recovery for Mobile Environments - Nuno Neves (1997)(Correct)
Mobile computing allows ubiquitous and continuous access
to computing resources while the users travel or work
at a client's site. The flexibility introduced by mobile computing
brings new challenges ... / contents are lost or the operating system crashes. The first type of br new challenges to the area of fault tolerance. Failures that were rare
1861.1 Dynamic Reconfiguration of Distributed Applications - Hofmeister (1993)(Correct)
Applications requiring concurrency or access to specialized hardware are naturally
written as distributed applications, where each software component (module) can execute
on a different machine, and m... / on top of existing operating systems and compilers requiring no br for load balancing software fault tolerance adaptation to changes in
1860.9 Performance of Consistent Checkpointing in a Modular Operating.. - Muller, Hue, Peyrouze (1994)(Correct)
This paper presents an evaluation of the performance of a consistent checkpointing mechanism
that has been integrated into a modular Mach micro-kernel based operating system. We
have measured the perf... / Checkpointing in a Modular Operating System Results of the FTM br our objective was to provide fault tolerance transparency to user
1854.8 Checkpointing Distributed Shared Memory - Luis Silva (1997)(Correct)
Distributed shared memory (DSM) is a very promising programming model for exploiting
the parallelism of distributed memory systems, since it provides a higher level of abstraction
than simple message ... / facility of the operating system neither requires the use of br have some kind of support for fault-tolerance. In this paper we present a
1853.9 A Web-Based Distributed Programming Environment - Aoki (1999)(Correct)
A Web-Based Distributed Programming
Environment
Kiyoko F. Aoki
A Java-based system, the GeoJAVA System, that allows a user to remotely
compile his/her own C/C++ programs and execute them for visualiz... / space without specialized operating systems to handle such a procedure br which performs some checks for fault tolerance. For example in the case
1853.7 Testing of Fault-Tolerant and Real-Time Distributed Systems via.. - Scott Dawson (1996)(Correct)
As software for distributed systems becomes more complex, ensuring
that a system meets its prescribed specification is a growing
challenge that confronts software developers. This is particularly
impo... / on the Real-Time Mach operating system and later ported to other br and validation of the fault-tolerance and timing characteristics
1829.3 Supporting Customized Failure Models for Distributed Software - Hiltunen, Immanuel, Schlichting (1999)(Correct)
The cost of employing software fault-tolerance techniques in distributed systems is
strongly related to the type of failures to be tolerated. For example, in terms of the
amount of redundancy required... / On The Osf ri Mk . Mach Operating System And Cords tmr A Variant br The cost of employing software fault-tolerance techniques in distributed
1819.3 Formal Design and Verification of a Reliable Computing Platform For.. - Butler, Di Vito (1992)(Correct)
In this paper the design and formal verification of the Reliable Computing Platform
(RCP), a fault-tolerant computing system for digital flight control applications,
are presented. The RCP utilizes NM... / of a fault-tolerant operating system that schedules and executes br this limit is exceeded during system operation the system will mask the
1804.2 A Coherent Distributed File Cache With Directory Write-behind - Mann (1994)(Correct)
Extensive caching is a key feature of the Echo distributed file system. Echo client machines maintain coherent caches of file and directory data and properties, with write-behind (delayed write-back) ... / for the module in the client operating system that performs these br seen by applications on file system operations and reducing the read
1788.8 Angel: Resource Unification in a 64-bit Micro-Kernel - Murray, Stiemerling, Wilkinson, Kelly (1993)(Correct)
The appearance of 64-bit processors allows a new approach to microkernel design --- a single unified address space. This paper describes this kind of approach as adopted in Angel. From our experience ... / based message passing operating system relatively typical in br the operating system as certain system operations performed by one thread
1788.5 Fail-Safe Concurrency in the Eclipse System - Knop, Rego (1996)(Correct)
Local or wide-area heterogeneous workstation clusters are relatively cheap and highly effective, though inherently unstable operating environments for long-running distributed computations. We found t... / issued by any process the operating system is made to provide combined br computations Without fault tolerance these computations may never
1785.6 Microlanguages for Operating System Specialization - Pu (1997)(Correct)
Specialization is a technique that has the potential to provide operating system clients with the performance and functionality that they need, while still retaining the advantages of a simple generic... / Microlanguages for Operating System Specialization Calton br performance reliability and fault tolerance of the resulting kernel and
1783.9 Fault Tolerance Issues in Data Declustering for Parallel Database.. - Golubchik, Muntz (1994)(Correct)
Maintaining the integrity of data and its accessibility are crucial tasks in database systems. Although
each component in the storage hierarchy can be fairly reliable, a large collection of such
compo... / interested in a continuously operating system we use the term data loss br in data loss. Furthermore the system operates at a degraded level of
1768.5 Cost-Effective Software Based Fault-Tolerant Routing in Pipelined.. - Young-Joo Suh(Correct)
This paper presents a software based approach to fault-tolerant routing in networks using
wormhole or virtual cut-through switching. When a message encounters a faulty output link, it is
removed from ... / layer of the local node's operating system. The message passing software br virtual cut-through switching fault tolerance interconnection networks
1766.8 Accessing Files in an Internet: The Jade File System - Rao, Peterson (1993)(Correct)
This paper introduces the Jade File System, which provides a uniform way to name and access
files in an internet environment. Jade is a logical system that integrates a heterogeneous
collection of exi... / supported by the Unix operating system to access files on the local br remote access methods and fault tolerance. Designing an internet-wide
1765.0 Javelin: Internet-Based Parallel Computing Using Java - Cappello, Christiansen, Ionescu.. (1997)(Correct)
Java offers the basic infrastructure needed to integrate computers connected to the Internet into a
seamless parallel computational resource: a flexible, easily-installed infrastructure for running co... / different CPUs and different operating systems. Also many machines are br high priority enhancement. Fault tolerance The Broker is responsible
1759.0 2K: A Reflective, Component-Based Operating System for Rapidly.. - Kon (1998)(Correct)
Modern computing environments face both low-frequency infrastructural changes, such as
software and hardware upgrades, and frequent changes, such as fluctuations in the network
bandwidth and CPU load.... / A Reflective Component-Based Operating System for Rapidly Changing br mobility load balancing fault tolerance and quality of service for
1758.1 Supporting High-performance I/O in QoS-enabled ORB Middleware - Kuhns, Levine, Schmidt, O'Ryan (2000)(Correct)
To be an effective platform for high-performance distributed
applications, off-the-shelf Object Request Broker (ORB) middleware,
such as CORBA, must preserve communication-layer
quality of service (Qo... / and overview of the Solaris operating system. Supporting br concurrency control and fault tolerance. This requires an efficient
1716.4 ARMADA Middleware and Communication Services - Abdelzaher, Bjorklund, Dawson, Feng, .. (1997)(Correct)
Real-time embedded systems have evolved during the past several decades from small custom-designed
digital hardware to large distributed processing systems. As these systems become more complex,
thei... / and emerging standards in operating systems and communication services. br that provide support for fault-tolerance and end-to-end guarantees
1702.7 Fail-safe PVM: A portable package for distributed programming with.. - Leon (1993)(Correct)
Many scientific problems benefit from computations that are parallel at a coarse grain. Collections of loosely coupled, heterogeneous computers are increasingly being applied to these problems. While in... / require modifications to the operating system. We describe the design and br Pvm Distributed Computing Fault-Tolerance Checkpoint Abd Asap
1698.2 An Overview of MSHN: The Management System for Heterogeneous Networks - Hensgen, Kidd, John, Schnaidt.. (1999)(Correct)
The Management System for Heterogeneous Networks
(MSHN) is a resource management system for use in
heterogeneous environments. This paper describes the
goals of MSHN, its architecture, and both comple... / is similar to a distributed operating system in that it views the set of br level of availability and more fault tolerance than would be available from
1697.6 Execution-Driven Simulation Of Error Recovery Techniques For.. - Frazier, Tamir (1997)(Correct)
DERT (Distributed Error Recovery Testbed) is a testbed for simulation and performance evaluation of several classes of application-transparent distributed error recovery schemes. DERT is built on top ... / the target architecture and operating system. Thus simulation accuracy br is less disruptive to system operation e.g. because only
1693.7 Dynamic User Management System for web sites - Christian (2000)(Correct)
With the growing quantity of information around the world, besides the software development
community, many other fields are interested in finding solutions for efficient information
management.
In t... / of the approach on an operating system. In the development of this br concurrency scalability fault tolerance and transparency. Though the
1681.9 Group Orientation: a Paradigm for Distributed Systems of the Nineties - Veríssimo, Rodrigues (1992)(Correct)
Increasing use of distributed systems, with the corresponding
decentralization of activities, stimulates the
need for structuring those activities around groups
of participants, for reasons of consist... / using large objects ffl operating system support providing threads br Ciencia. performance or fault-tolerance reasons eg. replicated
1680.2 CesiumSpray: a Precise and Accurate Global Clock Service for.. - Veríssimo, Rodrigues, al. (1997)(Correct)
In large-scale systems, such as Internet-based distributed systems, classical clock-synchronization
solutions become impractical or poorly performing, due to the number of nodes
and/or the distance. ... / machinery network and operating system may exhibit a precision in br in a legitimate situation of system operation i the system has stopped
1676.2 A Flexible Software Architecture for High Availability Computing - Iyer, Kalbarczyk, Whisnant (1998)(Correct)
This paper presents an overview of the Chameleon architecture for supporting a wide range of
criticality requirements in a heterogeneous network environment. Chameleon employs ARMORs---
Adaptive, Reco... / we can insert hooks in the operating system to trap all network I O br the dependability of the system. Operation in a network of
1661.7 Fault-Tolerant RT-Mach (FT-RT-Mach) and an Application to Real-Time.. - Egan, Kutz, Mikulin, Melhem..(Correct)
Even though real-time systems have the stringent constraint of completing tasks before
their deadlines, many existing real-time operating systems do not implement fault tolerance
capabilities. In this... / many existing real-time operating systems do not implement fault br systems do not implement fault tolerance capabilities. In this paper
1656.2 Process State Capture and Recovery in High-Performance Heterogeneous.. - Ferrari (1998)(Correct)
Process Introspection is a fundamentally new solution to the process state capture and recovery
problem suitable for use in high-performance heterogeneous distributed systems. A process
state capture ... / system of one architecture or operating system platform must be recoverable br . . . Fault Tolerance .
1639.6 A Compiler-based Approach to Fault-Tolerance in Real-Time Systems - Ganesh Marlowe(Correct)
Real-time systems are characterized as systems in
which the correctness of the system depends both on
agreement of the result of the computation with the
intended semantics, and on the timeliness with... / from the language compiler operating system and architecture. We sketch br A Compiler-based Approach to Fault-Tolerance in Real-Time Systems A. K.
1635.4 Chameleon: Software Infrastructure for Adaptive Fault Tolerance - Bagchi, Whisnant, Kalbarczyk (1999)(Correct)
This paper presents Chameleon, an adaptive infrastructure, which allows different levels of availability requirements to be simultaneously supported in a networked environment. Chameleon provides depe... / ARMORs the hardware or the operating system. Keywords adaptive fault br the dependability of the system. Operation in a network of
1635.4 Automated Techniques for Designing Embedded Signal Processors on.. - Kang, Gerber, Golubchik (1998)(Correct)
In this paper, we present a performance-based technique to help synthesize high-bandwidth
radar processors on commodity platforms. This problem is innately complex, for a number of
reasons. Contempora... / CPU Scheduler for Multimedia Operating Systems. In Proceedings of br Finally there is the problem of fault-tolerance. Traditionally
1630.6 Implementing Dynamic Atomic Actions Using Reliable Servers - Hue, Muller, Peyrouze, Rochat (1993)(Correct)
this paper we present the overall implementation of dynamic atomic
actions using reliable servers. We describe an environment for building reliable servers which insulate
programmers from mechanisms u... / to implement in a standard operating system due to shared memory br mechanisms used to achieve fault tolerance. To provide availability of
1627.4 Technology Transfer Project with ENEA 1996 - Mellin (1996)(Correct)
This document describes a technology transfer activity of distributed active
real-time database management systems in general, and the DeeDS prototype
in particular, from University of Skovde to ENEA ... / using the OSE Delta real-time operating system kernel will visit ENEA br applications and how the fault-tolerance of DeeDS can be improved.
1615.8 I/O Performance of Scientific-Parallel Applications under PAFS - Toni Cortes (1996)(Correct)
In this paper we present the behavior of PAFS in a scientific environment where big parallel
applications are run. PAFS is a parallel/distributed file system with a cooperative cache that
avoids the c... / network runs a micro-kernel operating system and all services are handled br is also the case for the file-system operations. This operating-system
1607.9 Use of Imprecise Computation to Enhance Dependability of Real-Time.. - Liu (1994)(Correct)
In a system based on the imprecise-computation technique, each time-critical task is designed in such a way that it can produce a usable, approximate result in time whenever a failure or overload prev... / is being built on the Mach operating system to implement this br together with traditional fault-tolerance methods to reduce the costs
1599.4 On-line Error Monitoring for Several Data Structures - Jonathan Bright (1995)(Correct)
this paper, we consider the problem of detecting
errors in the answers given in response to data structure
queries. For many programs a substantial fraction of the
intricate error-prone code resides i... / to perform software based fault tolerance including recovery blocks
1596.4 Pact - A Fault Tolerant Parallel Programming Environment - Maier (1993)(Correct)
this article, a new approach to parallel programming is presented which not only makes parallel programming very easy, but also provides user-transparent fault-tolerance. Programming ease in Pact is o... / Introduction Since operating systems for parallel computers have br guarantees user-transparent fault-tolerance with low overhead by using
1592.1 CLIP: A Checkpointing Tool for Message-Passing Parallel Programs - Chen (1997)(Correct)
Checkpointing is a useful technique for rollback recovery of parallel applications. While extensive research has been performed on checkpointing in parallel environments, there are few checkpointers a... / but often requires operating system modifications However it br tool like CLIP can provide fault-tolerance on a massively parallel
1587.9 Stardust: an Environment for Parallel Programming on Networks of.. - Cabillic, Puaut (1996)(Correct)
This paper describes Stardust, an environment for parallel programming on networks of heterogeneous
machines. Stardust runs on distributed memory multicomputers and networks of workstations.
Applicati... / workstations and different operating systems more computing power is br be chosen. . . Operating system Operating systems running on the
1585.4 Discovery and Hot Replacement of Replicated Read-Only File Systems.. - Zadok (1993)(Correct)
We describe a mechanism for replacing files, including open files, of a read-only file
system while the file system remains mounted; the act of replacement is transparent to
the user. Such a "hot repl... / and outside the operating system we have made use of the Amd br desirable because running file system operations over many network hops is
1582.7 HFS: A Flexible File System for large-scale Multiprocessors - Orran Krieger (1993)(Correct)
The Hurricane File System (HFS) is a new file system
being developed for large-scale shared memory multiprocessors
with distributed disks. The main goal of this file
system is scalability; that is, th... / systems the application and operating system must cooperate to maximize br multiprocessors. We ignore fault tolerance at the level of disk failures
1562.6 Fault Detection Using Hints from the Socket Layer - Nuno Neves (1997)(Correct)
Fault detection in distributed systems is usually accomplished using a variation of the polling or
watch-dog techniques. With these techniques, however, a tradeoff has to be made between the speed
of ... / a process. We considered the operating system as a black box and looked br in performance due to the fault-tolerance mechanisms. In this paper we
1561.6 A Software Overview of HARTS: A Distributed Real-Time System - Shin, Kandlur, Kiskis, Dodd..(Correct)
It has become a common practice to use digital computers for such embedded realtime
applications as computer-integrated manufacturing, industrial process control,
def... / environment is comprised of an operating system called HARTOS and three / SWG and HMON to validate the fault-tolerance mechanisms of HARTS.
1552.9 Secure High Performance Group Communication - McDaniel (1997)
The growth in collaborative applications has mirrored the expansion of distributed networks. Group
based applications provide users more flexibility in the form and content of computer... / The Amoeba distributed operating system uses the group / provide reliability security fault tolerance and ordering semantics have
1551.9 Software Fault-Tolerant Distributed Applications in LiPS - Setz
This paper illustrates how software fault-tolerant distributed applications
are implemented within LiPS version 2.4, a system for distributed computing
using idle-cycles in networks of workstations.
Th... / environment of different operating systems network protocols or / hypercomputing software fault-tolerance Linda idle-time recovery
1551.7 Using Reflection for Incorporating Fault-Tolerance Techniques into.. - Nguyen-Tuong, Grimshaw (1999)
As part of the Legion metacomputing project, we have developed a reflective
model, the Reflective Graph & Event (RGE) model, for incorporating
functionality into applications. In this paper we apply... / Second USENIX Symposium on Operating Systems Design and Implementation / Reflection for Incorporating Fault-Tolerance Techniques into Distributed
1542.4 Adaptability Using Reflection - Sonntag, Härtig, Kowalski.. (1994)
Adaptability, i.e., the ability of a system to adapt dynamically
to changes in its execution environment, is
considered an important property of computer systems.
Scaling directory replication in na... / features into the BirliX operating system So we are able to / in performance security or fault tolerance. A few examples are