This directory is created automatically, and some papers may be mislabeled. Only documents within the CiteSeer database are listed. The directory is intended to provide entry points for browsing the database and is not intended to be authoritative. Papers may not appear in all relevant categories. For example, papers in a sub-category may not appear in higher-level categories.
Distributed Shared State (Position Paper) - Scott, Chen, Dwarkadas, Tang (2003)
Increasingly, Internet-level distributed systems are oriented as much toward information access as they are toward computation. From computer-supported collaborative work to peer-to-peer computing, e-... / integrated S-DSM into the operating system kernel in order to support br scalability latency and fault tolerance most distributed
Improving Availability with Recursive Micro-Reboots: A Soft-State.. - Candea, Cutler, Fox (2003)
Even after decades of software engineering research, complex computer systems still fail. This paper makes the case for
increasing research emphasis on dependability and, specifically, on improving av... / synchronous disk writes many operating systems cache metadata updates in br daemon process to provide fault tolerance for long-running UNIX
A Group Membership Protocol For An Intrusion-Tolerant Group.. - Ramasamy (2002)
Group Communication Systems have been developed to address the problem of maintaining consistency of replicated information. This thesis describes the research work that resulted in the design, develo... / the middleware level the operating system level or the hardware level. br tolerance focuses on keeping the system operational in spite of benign
STORM: Lightning-Fast Resource Management - Frachtenberg, Petrini, Fernandez.. (2002)
Although clusters are a popular form of high-performance computing (HPC), they remain more difficult to manage than sequential systems, or even symmetric multiprocessors. Furthermore, as cluster sizes... / GB I O buses node Operating system Red Hat Linux . br process-scheduling algorithms fault tolerance or usage policies can be
Memory Mapped Networks: a new deal for Distributed Shared Memories ? - The Scifs Experience (2002)
Distributed Shared Memories (DSM) performance has always
suffered from high network latencies and software
communication layers with a large overhead. Memory
mapped networks such as Scalable Coherent ... / memory without involving the operating system. To show how DSM systems can br be kept smaller and hardware fault tolerance is improved. It is also
Scalable, Efficient Range Queries for Grid Information Services - Andrzejak, Xu (2002)
Recent Peer-to-Peer (P2P) systems such as Tapestry, Chord or CAN act primarily as a Distributed Hash Table (DHT). A DHT is a data structure for distributed storing of pairs (key, data) which allows fa... / for example the type of the operating system network address CPU speed ... by adding self-organization fault-tolerance and an ability to efficiently
Loose Synchronization of Multithreaded Replicas - Basile, Whisnant, Kalbarczyk, Iyer (2002)
Although multithreading can improve performance, it is a
source of nondeterminism in application behavior. Existing
approaches to replicating multithreaded applications either
synchronize replicas at ... / algorithm interferes with the operating system scheduler only when granting br and middlewares providing fault tolerance to CORBA objects. . Loose
Requirements of a Middleware for Managing a large - Hughes (2002)
Programmable networking is an increasingly popular area of research in both industry and academia. Although most programmable network research projects seem to focus on the router architecture rather ... / it is located between the operating system and the application. The br and transactions. Security fault tolerance and usability are also
Flexible Distributed Process Topologies for Enterprise Applications - Hartwich (2002)
Enterprise applications can be viewed as topologies of distributed processes that access
business data objects stored in one or more transactional datastores. There are several well-known
topology patt... / distributed heavyweight operating system level processes address br properties like scalability fault tolerance or response time.
Group Communications And Database Replication: Techniques, Issues and .. - Wiesmann (2002)
Databases are an important part of today's IT infrastructure: both companies and state institutions rely on database systems to store most of their important data. As we are more and more dependent on... / to all the people in the Operating System Lab and the Distributed br One way to ensure the fault-tolerance of a system is by replicating
On Improving Thread Migration: Safety and Performance - Jiang, Chaudhary (2002)
Application-level migration schemes have been paid more attention recently because of their great potential for heterogeneous migration.
... / migration is a part of the operating system. Threads are moved around ... dynamic load distribution fault tolerance eased system administration
UMLinux - A Versatile SWIFI Tool - Sieh, Buchacker (2002)
This tool presentation describes UMLinux, a versatile framework for
testing the behavior of networked machines running the Linux operating system
in the presence of faults. UMLinux can inject a vari... /
Architecture-Based Exception Handling - Issarny, Banâtre (2001)
Architecture-based development environments are becoming
an effective solution towards the construction of robust distributed
systems. Through the abstract description of complex software
systems conf... / or the underlying operating system in which case it takes the br has been paid to software fault tolerance and in particular exception
Towards Global Storage Management and Data Placement - Veitch, Riedel, Towers, Wilkes (2001)
As users' and companies' dependence on shared, networked information services continues to increase, we will see continued growth in large data centers and service providers. This will happen both as ... / in networking security operating system design systems management br on the other hand for fault-tolerance of critical services and to
Practical Byzantine Fault Tolerance - Castro (2001)
Our growing reliance on online services accessible on the Internet demands highly-available systems
that provide correct service without interruptions. Byzantine faults such as software bugs, operator... /
Dynamic Software Updating - Hicks, Moore, Nettles (2001)
Many important applications must run continuously and
without interruption, yet must be changed to fix bugs or upgrade
functionality. To date, no existing dynamic updating
system has achieved a practic... /
Introducing Fault Tolerance to Distributed Maple - Schreiner, Kusper, Bosa (2001)
We have extended the parallel computer algebra environment Distributed
Maple by fault tolerance mechanisms such that the time spent
in a long running computation is not any more wasted by the eventu... / our control machine network operating system faults may happen in any br the root creates a task the system operates according to one of two modes
First Specification of APIs and Protocols for the MAFTIA Middleware - Neves, Verissimo (2001)
This document describes the first specification of the APIs and Protocols for the MAFTIA Middleware. The architecture of the middleware subsystem has been described in a previous document, where the seve... / TTCB and of course the Operating System OS This deliverable is ... Malicious- and Accidental-Fault Tolerance for Internet Applications
Fault-Tolerant Cluster Management For Reliable High-Performance.. - Li, Goldberg, Tao, Tamir (2001)
Clusters of COTS workstations/PCs are commonly used to implement cost-effective high-performance systems. A central coordinator/manager is often the simplest way to implement many of the operations re... / Recovery . Introduction Operating systems such as Amoeba that br as high-level coordination of fault tolerance mechanisms and interactions
Proactive Management of Software Aging - Castelli, Harper, Heidelberger.. (2001)
this
paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of
this paper must be... / process group or entire operating system depending on the br another process. Most current fault-tolerance techniques are reactive in
LocALE: a Location-Aware Lifecycle Environment for Ubiquitous.. - de Ipina (2001)
The LocALE (Location-Aware Lifecycle Environment)
framework provides a simple management interface for
controlling the lifecycle of CORBA distributed objects. It
supports mechanisms for the remote con... / is programmed its underlying operating system or any other system aspects br automatic activation and fault-tolerance facilities for the services
Middleware Support for Voting and Data Fusion - Zhiyuan (2001)
Middleware is a class of software systems above the operating system which is becoming widely used for programming distributed systems. Voting is a fundamental operation when distributed systems invol... / of software systems above the operating system which is becoming widely br with respect to performance fault tolerance precision and correctness
The Architecture of a Secure Group Communication System Based on.. - Correia, Veríssimo, Neves (2001)
This paper presents the architecture of a secure group
communication system with the fortress model of trust, where
the participants of the group equally trust one another. We
consider that only a sma... / they all assume that the operating system can be considered to be a br the recursive use of fault tolerance and fault prevention
The Design and Implementation of a Fault-Tolerant Cluster Manager - Ming (2001)
Cluster management middleware schedules tasks on a cluster, controls access to shared resources, provides for task submission and monitoring, and coordinates the cluster's fault tolerance mechanisms. ... / local copy of an off-the-shelf operating system that is not designed to br and coordinates the cluster's fault tolerance mechanisms. Thus reliable
Enhancing Survivability of Security Services using Redundancy - Hiltunen, Schlichting, Ugarte (2001)
Traditional distributed system services that provide guarantees
related to confidentiality, integrity, and authenticity enhance
security, but are not survivable since each attribute is
implemented by ... / of security where the existing operating system already provides br failure when considering fault-tolerance attributes. This problem is
The Marvel Programming Model: a higher-order distributed process.. - Schmitt, Stefani (2001)
Contents: 1 Introduction; 1.1 Requirements for a distributed programming model; 1.2 Introducing the M-calculus ... / programming language e.g. operating system processes failure ... abstractions e.g. for fault-tolerance or multi-party communication
Susceptibility of Modern Systems and Software to Soft Errors - Messer, Bernadat, Fu, Chen.. (2001)
It is widely understood that most downtime is accounted for by
programming errors and administration time. However, recent
work has indicated an increasing cause of downtime may stem
from transient ha... / and the susceptibility of operating systems and applications to them br Cornell's Hypervisor-based fault tolerance system provides a similar
M-Calculus: A Higher-Order Distributed Process Calculus - Schmitt, Stefani (2001)
this paper a new process calculus, called the M-calculus, which represents an attempt at
defining a formal distributed programming model. Key insights for the calculus are similar to those laid out
in... / programming language e.g. operating system processes failure br abstractions e.g. for fault-tolerance or multi-party communication
A Tutorial Of Lustre - Halbwachs, Raymond (2001)
This document is an introduction to the language Lustre V4 and its associated
tools. We will not give a systematic presentation of the language, but
a complete bibliography is added. The basic referen... /
Performance Evaluation of the Quadrics Interconnection - Petrini, Coll, Frachtenberg, Hoisie (2001)
In this paper we present an in-depth description of the Quadrics interconnection network (QsNET) and an experimental performance evaluation on a 64-node Alphaserver cluster. We expose the performance ... / User-level Communication Operating System Bypass. Introduction br for Quality of Service QoS fault-tolerance remote direct memory access
Handoff of Application Sessions Across Time and Space - Phan, Xu, Guy, Bagrodia (2001)
Personal computing on mobile platforms such as laptops and
personal digital assistants, rather than in a traditional desktop
environment, is becoming increasingly more common. In this paper
we address... / of the architecture and operating system thus allowing a session to br migration in the fields of fault-tolerance and load-balancing. Process
Fault Tolerance for Cluster Computing Based on Functional Tasks - Schreiner, Kusper, Bosa (2001)
We have extended the parallel computer algebra environment
Distributed Maple by fault tolerance mechanisms such that the
time spent in a long running computation is not any more wasted by
the eve... / our control machine network operating system faults may happen in any br hti message is issued the system operates analogously to a task
Gang Scheduling with Lightweight User-Level Communication - Frachtenberg, Petrini, Coll, Feng (2001)
In this paper, we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster, the scheduler can take advantage of this network's unique capab... / delay removing the operating system from the communication br communication patterns and fault tolerance. The Elan network interface
JVM Susceptibility to Memory Errors - Deqing Chen Alan (2001)
Modern computer systems are becoming more powerful and are using larger memories. However, except for very high end systems,
little attention is being paid to high availability. This is particularly t... / CPU status and can notify the operating system to handle the exception. br techniques to study the fault tolerance of UNIX systems. Fine is a
IP Network Configuration for Intradomain Traffic Engineering - Feldmann, Rexford (2001)
The smooth operation of the Internet depends on the careful configuration of routers in thousands
of autonomous systems throughout the world. Configuring routers is extremely complicated because
of ... / For example Cisco's Internet Operating System IOS has over commands. br or peer for load balancing and fault tolerance. The delivery of traffic
The L4Ka Vision - Dannowski, Elphinstone, Liedtke.. (2001)
Microkernels are minimal but highly flexible kernels. Both conventional and non-classical operating systems can be built on top or adapted to run on top of them. Microkernel-based architectures should... / and non-classical operating systems can be built on top or br including reliability and fault tolerance protection and security.
A Highly Adaptable Infrastructure for Service Discovery and.. - Lalana Kagal Vladimir (2001)
In an age where wirelessly networked appliances and devices are becoming commonplace, there is a necessity for providing
a standard interface to them that is easily accessible by any mobile user. The ... / components managed by an operating system Gaia OS which acts as a br state management and increased fault tolerance. Even in the event of
Building Modern Distributed Systems - Pautet, Quinot, Tardieu (2001)
Ada 95 has been the first standardized language to include distribution
in the core language itself. However, the set of features required by the Distributed
Systems Annex of the Reference Manual is... / hardware with a working operating system. More advanced concepts such br account advanced needs such as fault tolerance code migration or persistent
RTLinux with Address Spaces - Mehnert, Hohmuth, Schönberg, Härtig (2001)
The combination of a real-time executive and an off-the-shelf time-sharing operating system has the
potential of providing both predictability and the comfort of a large application base. To isolate t... / an off-the-shelf time-sharing operating system has the potential of ... This increased level of fault tolerance is desirable for many
An Erlang-based hierarchical distributed VoD System - Juan Sanchez Jose (2001)
Video on Demand (VoD) is a service that enables users to request
any multimedia content at any time, without being constrained by any
pre-established scheduling. Current commercial solutions tend to... / tools has allowed for easy operating system and hardware neutrality. br time for any user. Fault tolerance with x expected uptime
Dealing with Denial-of-Service Attacks in Agent-enabled Active and.. - Karnouskos (2001)
Denial of Service (DoS) attacks is a well-known
problem with victims even among prestigious commercial
sites. Such attacks in traditional networking are difficult
to recognize and to handle. An active... / buffer etc The Node Operating System NodeOS provides the basic br traffic and dependencies fault tolerance etc. The number of
Automatic Failure Detection and Recovery for Java Servers - Klemm, Singh (2001)
Increasingly, server systems such as e-commerce and telecommunications servers are partially or
completely implemented in the Java programming language. One reason why many developers
prefer Java over... / are often only caught by the operating system which can result in the ... in C or C++ with no additional fault tolerance provisions. Suppose this
Differentiated and Predictable Quality of Service in Web Server.. - Aron (2000)
As the World Wide Web experiences increasing commercial and mission-critical use, server systems are expected to deliver high and predictable performance. The phenomenal improvement in microprocessor ... / management facilities in the operating system software are studied. This
Supporting High-performance I/O in QoS-enabled ORB Middleware - Kuhns, Levine, Schmidt, O'Ryan (2000)
To be an effective platform for high-performance distributed
applications, off-the-shelf Object Request Broker (ORB) middleware,
such as CORBA, must preserve communication-layer
quality of service (Qo... / and overview of the Solaris operating system. Supporting br concurrency control and fault tolerance. This requires an efficient
Dynamic User Management System for web sites - Christian (2000)
With the growing quantity of information around the world, besides the software development
community, many other fields are interested in finding solutions for efficient information
management.
In t... / of the approach on an operating system. In the development of this br concurrency scalability fault tolerance and transparency. Though the
Hierarchical Error Detection in a Software Implemented Fault.. - Bagchi, Srinivasan, Whisnant.. (2000)
This paper proposes a hierarchical error detection framework for a Software Implemented Fault
Tolerance (SIFT) layer of a distributed system. A four-level error detection hierarchy is proposed in the
... / of platforms hardware and operating systems They can migrate from one br in a Software Implemented Fault Tolerance SIFT Environment
Slipstream Processors: Improving both Performance and Fault Tolerance - Sundaramoorthy, al. (2000)
Processors execute the full dynamic instruction stream to arrive at
the final output of a program, yet there exist shorter instruction
streams that produce the same overall effect. We propose creating... / effect. Therefore the operating system creates two redundant br Improving both Performance and Fault Tolerance ABSTRACT Processors
Supporting a Flexible Parallel Programming Model on a Network of.. - Huang (2000)
Execution Model
We provide an abstract parallel machine with shared memory to the programmer,
so the users are not concerned with message passing, data and execution distribution,
machine failures, a... / modification of the underlying operating system. Nested parallelism br high reliability distributed system. Operating Systems Review
Design and Implementation of QoS enabled OO Middleware - Kachroo, Krishnamurthy, Akers.. (2000)
The current interest in the commodity Internet for commercial
purposes has helped to fuel R&D into advanced networks and
distributed applications. Much of this research is addressing a
common problem ... / endsystems Advances in operating system technology and techniques br availability and fault tolerance. Adaptive applications are
Scheduling with Global Information in Distributed Systems - Petrini, Feng (2000)
One of the major problems faced by the developers of parallel programs is the lack of a clear separation between the programming model and the operating system. In this paper, we present a new methodo... / the programming model and the operating system. In this paper we present a br non-trivial implementation of fault tolerance and the lack of a
On the Integration of Configuration and Meta-Level Programming.. - Loques, Sztajnberg, Leite, Lobosco (2000)
Configuration Programming, based on Architecture Description
Languages, and Meta-Level Programming are considered promising
approaches in the software engineering field. This paper shows that ther... / facilities of a particular operating system or by any mix of resources br from application modules during system operation. Configuration programming
Applying a Pattern Language to Develop Application-level Gateways - Schmidt (2000)
Developers of communication applications must address recurring
design challenges related to efficiency, extensibility,
and robustness. These challenges are often independent of
application-specific r... / and relationships. Moreover operating system OS platform features br event loop integration and fault tolerance. Successful communication
Distributing Trust on the Internet - Cachin (2000)
This paper describes an architecture for secure and fault-tolerant service replication in an
asynchronous network such as the Internet, where a malicious adversary may corrupt some
servers and contr... / vary in their configuration operating system physical location load etc. ... way for enhancing the fault tolerance of centralized components is
Time-Sharing Parallel Jobs in the Presence of Multiple Resource.. - Fabrizio Petrini And (2000)
Buffered coscheduling is a new methodology that can substantially
increase resource utilization, improve response time, and simplify
the development of the run-time support in a parallel machine. I... / Job Scheduling Distributed Operating Systems Communication Protocols br at kernel level to provide fault tolerance in the communication. For
EtheReal: A Fault Tolerant Host-Transparent Mechanism for Bandwidth.. - Varadarajan (2000)
of the Dissertation "EtheReal: A Fault Tolerant Host-Transparent Mechanism for Bandwidth Guarantees over Switched Ethernet Networks" by Srinidhi Varadarajan, Doctor of Philosophy in Computer Scien... / any changes to the end host operating system and network ... they carry this legacy in their fault tolerance mechanisms which while
A Study of Slipstream Processors - Purser, al. (2000)
A slipstream processor reduces the length of a running
program by dynamically skipping computation non-essential
for correct forward progress. The shortened program
runs faster as a result, but it is ... / is instantiated twice by the operating system and each copy has its own br performance trends and fault tolerance are related. Time redundancy
An Enabling Framework for Master-Worker Applications on the.. - Jean-Pierre Goux Department (2000)
We describe MW -- a software framework that allows
users to quickly and easily parallelize scientific computations
using the master-worker paradigm on the computational
grid. MW provides both a "top l... / include the architecture operating system amount of memory disk br must address issues such as fault tolerance task scheduling and
Concepts for Dependable Distributed Discrete Event Simulation - Lüthi, Berchtold (2000)
In many situations, parallel and distributed simulation is a well-suited approach to overcome performance as well as capacity limitations of complex simulation models. However, if distributed simulati... / which is provided by the operating system of each PE. The br Dependable Systems Fault Tolerance Hla Abstract In Many
Design Principles for Dynamic Object Systems - Salzmann (2000)
Dynamic distributed object systems (e.g. Jini, Salutation) are a new generation of distributed object middleware that enable the system to adapt its configuration during runtime. Those kind of middlew... / Internet Transport Operating System Virtual Platform Middleware br example need a high grade of fault tolerance. Fallout may cause serious
DOORS: Towards High-performance Fault Tolerant CORBA - Balachandran Natarajan Dept (2000)
An increasing number of applications are being developed
using distributed object computing middleware, such
as CORBA. Many of these applications require the underlying
middleware, operating systems, ... / the underlying middleware operating systems and networks to provide br scalability and fault tolerance. The Object Management Group
Failure Recovery Algorithms for Multimedia servers - Shenoy, Vin (2000)
In this paper, we present two novel disk failure
recovery methods that utilize the inherent characteristics
of video streams for efficient recovery. Whereas the first
method exploits the inherent re... / to software failures and operating-system crashes customers of br disk arrays -RAID -Fault tolerance -Video compression
Integrating Subscription-based and Connection-oriented Communications .. - Kim, Hong, Kim, Kim (2000)
Recently emerging component-based middleware technologies such as CORBA are widely believed
to be a viable solution to the software complexity problem of a distributed embedded computer
control syst... / on the mArx real-time operating system. Our measurements reveal br timing constraints and fault tolerance requirements. Recently
Data Replication Strategies for Fault Tolerance and Availability on.. - Amza, Cox, Zwaenepoel (2000)
Recent work has shown the advantages of using persistent
memory for transaction processing. In particular, the
Vista transaction system uses recoverable memory to avoid
disk I/O, thus improving perfor... / namely power failures and operating system crashes. An un-interruptible br Data Replication Strategies for Fault Tolerance and Availability on Commodity
Implementing Journaling in a Linux Shared Disk File System - Kenneth Preslan Sistina (2000)
In computer systems today, speed and responsiveness is often determined by network
and storage subsystem performance. Faster, more scalable networking interfaces
like Fibre Channel and Gigabit Ether... / code to the open source Linux operating system. We did this for several br caching and aggregating file system operations to improve performance by
S-DSM for Heterogeneous Machine Architectures - Eduardo Pinheiro Deqing (2000)
Many---indeed most---distributed applications employ
some notion of distributed shared state: information required
at more than one location. For applications that
span the Internet, this state is alm... / program and signals from the operating system. The user calls support br for access control and fault tolerance. They must also accommodate
HADES: A distributed System for Dependable Hard Real-Time.. - Chevochot, Puaut, Cabillic, Colin.. (2000)
Most dependable embedded real-time systems designed in the past have been specialized to meet the specific requirements of the application domain for which they were targeted, leading to inflexible an... / hard real-time operating system COTS components performance ... significant overhead for the basic system operations task creation context
Transparent Migration of Distributed Communicating Processes - Nasika, Dasgupta (2000)
A Computing Community is a group of cooperating
machines that behave like a single system and runs all
general-purpose applications---without any modifications
to the shrink-wrapped binary applicat... / binary applications or the operating system. In order to realize such a br global scheduling fault tolerance and application
iMW: A Web-Based Problem Solving Environment for Grid Computing.. - Good, Goux (2000)
Grid-enabled solvers are tied to complex grid computing platforms
and are therefore difficult to distribute. To make such solvers useful to a
wider community of users, remote access tools are needed... / and M. Humphrey. Legion An operating system for wide-area computing. br difficult issues such as fault tolerance task scheduling and
Highly configurable operating systems : the VVM approach - Piumarta, Folliot, Seinturier.. (2000)
this paper is not to
tackle this problem, but rather to show that the VVM can be used as a "weaver constructor" for
any given weaving model (AspectJ-like, D-like, etc.). The natural way to do this is ... / Highly configurable operating systems the VVM approach Ian ... for communication fault tolerance mobility replication and
Extending MINIX with Real-Time Services and Fault Tolerance.. - Rogina, Wainer (2000)
The MINIX operating system was extended with real-time services, ranging from A/D drivers to new scheduling algorithms and statistics collection. A testbed was constructed to test several sensor repl... / Key-words Fault Tolerance Operating Systems Real-time Systems Sensing ... with Real-Time Services and Fault Tolerance Capabilities Pablo J.
Operating System Management of MEMS-based Storage Devices - Griffin, Schlosser, Ganger, Nagle (2000)
MEMS-based storage devices promise significant
performance, reliability, and power improvements
relative to disk drives. This paper compares and
contrasts these two storage technologies and explores
ho... / of the th Symposium on Operating Systems Design and Implementation br to manage performance fault tolerance and power consumption. For
User-Level Infrastructure for System Call Interposition: A Platform.. - Jain, Sekar (2000)
Several new approaches for detecting malicious attacks on computer systems and/or confining untrusted or malicious applications have emerged over the past several years. These techniques often rely on... / are implemented within the operating system kernel. We explore an ... e.g. data encryption or fault-tolerance e.g. data replication It
Supporting Component-Based Software Development Using Domain Knowledge - Baum, al. (2000)
A consistent implementation of component-based reuse
bears several implications for the design of the software
development process. For instance, requirements engineering
has to be tailored to particu... / at the domain of embedded operating systems which is perceived small br aspects such as scalability or fault tolerance the architecture as a whole
Architecture for a Grid Operating System - Krauter, Maheswaran (2000)
A Grid architecture is proposed that is motivated by the large-scale routing principles in the Internet to provide an extensible, high-performance, scalable, and secure Grid. Central to the proposed a... / Architecture for a Grid Operating System Klaus Krauter and br of an intentional naming system Operating Systems Review Vol.
Evaluating The Performance of Non-Blocking Synchronisation on Modern.. - Tsigas, Zhang (2000)
Parallel programs running on shared memory multiprocessors coordinate via shared data objects/structures. To ensure the consistency of the shared data structures, programs typically rely on some forms... / parallel application or by the operating system very often need to share data br locks . they provide high fault tolerance processor failures will
A Flexible, Interoperable Framework for Active Spaces - Kon, Hess, Román, Campbell, Mickunas (2000)(Correct)
this paper we describe the requirements faced by such a system and propose an integrated
architecture meeting these requirements. The paper focuses on a representation of Active
Spaces using standard ... / properly. Conventional operating systems already have a hard time br security and privacy fault-tolerance and quality of service.
Programming with Object Groups in CORBA - Felber, Guerraoui, Wiesmann (2000)(Correct)
Our Object Group Service extends CORBA with the ability to gather
several objects inside a group and to transparently handle the group membership
and the consistent invocations of the group members. W... / in Maf VB nor on any operating system facility e.g.as in br Request For Proposal on CORBA Fault Tolerance. This paper does not detail
An optimized MPI library for VIA/SCI cards - Sven Schindler Wolfgang (2000)(Correct)
Rapid developments in computer architecture and in networking
technology have driven the construction of clusters
of cluster. Now cluster computers are an inexpensive alternative
to parallel computers... / is necessary to involve the operating system kernel for starting DMA
Experimental evaluation of the fail-silent behavior of a distributed.. - Chevochot, Puaut (2000)(Correct)
Mainly for economic and maintainability reasons, more and more dependable real-time systems are built from Commercial Off-The-Shelf (COTS) components. To build these systems, a commonly-used assumptio... / COTS components hardware operating system The results show that with br they must be complemented by fault tolerance mechanisms error detection
Holistic Schedulability Analysis of a Fault-Tolerant Real-Time.. - Chevochot, Puaut (2000)(Correct)
The feasibility test of a hard real-time system must not
only take into account the temporal behavior of the application
tasks but also the behavior of the run-time support
in charge of executing appl... / components hardware and operating system COTS hardware and operating br a complex run-time support with fault-tolerance capabilities and made of
Request Sequencing: Optimizing Communication for the Grid - Arnold, Bachmann, Dongarra (2000)(Correct)
As we research to make the use of Computational Grids
seamless, the allocation of resources in these dynamic environments is
proving to be very unwieldy. In this paper, we introduce, describe and
... / popular variants of the UNIX operating system and parts of the system are br tasks. Users Applications Fault Tolerance Load Balancing Server NS
Fault Tolerant Wide-Area Parallel Computing - Weissman (2000)(Correct)
Executing parallel applications across distributed networks introduces
the problem of fault tolerance. A viable solution for fault tolerance must keep
overhead manageable and not compromise the hi... / in distributed systems and operating systems in which generic br introduces the problem of fault tolerance. A viable solution for fault
A Skeleton-Based Approach for the Design and Implementation of.. - Fethi Rabhi School (2000)(Correct)
It has long been argued that developing distributed software
is a difficult and error-prone activity. Based on previous
work on design patterns and skeletons, this paper
proposes a template-based appr... / programming language or host operating system. However such standards br data distribution and fault-tolerance. Part of the problem comes
A Simple, Fast and Scalable Non-Blocking Concurrent FIFO Queue for.. - Tsigas, Zhang (2000)(Correct)
A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper. The algorithm is very simple, fast and scales very well in both symmetric and non-symmetric mul... / applications algorithms and operating systems for multiprocessor systems. br locks . they provide high fault tolerance processor failures will
Grid-Based File Access: The Legion I/O Model - White (2000)(Correct)
The unprecedented scale, heterogeneity, and varied usage patterns of grids pose significant technical challenges to any underlying file system that will support them. While grids present a host of new... / is an object-based grid operating system charged with reconciling a br scalability programming ease fault tolerance security and site autonomy.
Symbolic Program Execution Using the Erlang Verification Tool - Earle (2000)(Correct)
this article is as follows. First, we introduce Erlang, the
logic and the tool we use. Second, we present the symbolic program execution
and debugging techniques and sketch some of the ideas behind th... / are handled in an distributed operating system while maintaining the br by special architectures for fault tolerance robust hardware and
Reanimating SAFER in VDM-SL Using CORBA - Fenkam (2000)(Correct)
This paper presents a method for visual validation of systems based on their VDM-SL
specification. In a traditional development process acceptance tests are carried out too
late when a first release o... /
Nomad: A Scalable Operating System for Clusters of Uni and.. - Eduardo Souza De (2000)(Correct)
The recent improvements in workstation and interconnection network performance have popularized
the clusters of off-the-shelf workstations. However, the usefulness of these clusters is yet to
be ful... / Nomad A Scalable Operating System for Clusters of Uni and br high disk I O throughput and fault tolerance anyway For instance in
Home-based Release Consistency in Object-based Software DSM Systems - Markus Zahn Computing (2000)(Correct)
This paper discusses the application of consistency models in object-based
software distributed shared memory (DSM) systems. In particular,
we propose a home-based release consistency protocol as appli... / the same multithreaded operating system is used for single- and br of Computer Design and Fault Tolerance University of Karlsruhe
Writing High-Performance Server Applications in Haskell Case Study: A .. - Marlow (2000)(Correct)
Server applications, and in particular network-based
server applications, place a unique combination
of demands on a programming language:
lightweight concurrency and high I/O throughput
are both impor... / access to the log file. . Operating-System Threads Operating system br a malevolent client. Fault tolerance is as important as
An Approach For Network Communications Systems Recovery - Mitchell, Brown (2000)(Correct)
In this paper we examine the problem of failures within network communications and telecom systems and outline a localised Invisible Recovery solution to such systems. We introduce a new approach that... / pattern which is neither operating system nor protocol specific. This br know -way hand shake. Our IR system operates in the form of an envelope
Application-Level Fault Tolerance as a Complement to System-Level.. - Joshua Haines Jhaines (2000)(Correct)
As multiprocessor systems become more complex, their reliability will need to
increase as well. In this paper we propose a novel technique which is applicable to a wide variety
of distributed real-t... / system software includes the operating system and components such as the br Application-Level Fault Tolerance as a Complement to
Modeling and Analysis of Software Aging and Rejuvenation - Trivedi, Vaidyanathan.. (2000)(Correct)
Software systems are known to suffer from outages due
to transient errors. Recently, the phenomenon of "software
aging", one in which the state of the software system degrades
with time, has been repo... / data collected from the UNIX operating system over a period of time. The br design diversity techniques for fault tolerance in software systems such as
Distributed architectures - Fernandez (2000)(Correct)
evels enforce authorization constraints. Each level and
mappings can be described using OO models and patterns.
Authentication--- Uses cryptographic protocols
Filtering---Some objects need to be fil... / Heterogeneity-A variety of Operating systems Unix several varieties br legal or security reasons Fault tolerance-Ability to stay up in the
A Universal Framework for Managing Metadata in the Distributed Dragon .. - Wedde, Siepmann (2000)(Correct)
In the multimedia field, metadata are becoming increasingly important for efficiently cataloguing the abundant flood of information. (Metadata are data on information structures.) The number of electr... / in order to provide uniform operating system support running on a network br This results in higher system fault tolerance and faster local read
Injecting Distributed Capabilities into Legacy Applications Through.. - Boyd, Dasgupta (2000)(Correct)
Applications and operating systems can be augmented
with extra functionality by injecting additional middleware
into the boundary layer between them, without
tampering with their binaries. Using this ... / Abstract Applications and operating systems can be augmented with extra br are extractable during the system operation they are best captured
An object-oriented concurrent and distributed programming platform.. - Reinfelds (2000)(Correct)
y.
Today's demands on program and data dependability
are so strong and so important, that the design and
development of distributed concurrent multi-platform
application programs is becoming the rule ... / data files were handled by the operating system. The management of br distribution structure fault tolerance and security. The current
Supporting the Design of Adaptable Operating Systems Using.. - Netinant Constantinides Elrad (2000)(Correct)
Supporting separation of concerns in the design of operating
systems can provide a number of benefits such as
reusability, extensibility and reconfigurability. However,
in order to maximize these bene... / the Design of Adaptable Operating Systems Using Aspect-Oriented br scheduling and fault tolerance cut across the basic
Giotto: A Time-triggered Language for Embedded Programming - Henzinger, Horowitz, Kirsch (2000)(Correct)
Giotto provides an abstract programmer's model for the implementation
of embedded control systems with hard real-time constraints. A
typical control application consists of periodic software tasks t... / together with a real-time operating system Typical activities of the br and achieving a degree of fault tolerance through replication and error
The Cactus Approach to Building Configurable Middleware - Hiltunen, Schlichting (2000)(Correct)
Introduction
A number of fundamental abstractions and supporting software mechanisms have been developed for simplifying
the problems associated with programming highly dependable distributed systems... / the application and above the operating system. For each type of br state machine approach to fault tolerance by ensuring that changes to
Providing Infrastructure And Interface To High-Performance.. - Arnold, Dongarra, Lee, Wheeler (2000)(Correct)
The NetSolve project was established to aid scientists
who prefer not to be concerned with the usual tedium associated
with finding and maintaining software libraries
which they use to create programs,... / popular variants of the UNIX operating system and parts of the system are br Users Applications Fault Tolerance Load Balancing Server NS
Virtualizing Operating Systems for Seamless Distributed Environments - Boyd, Dasgupta (2000)(Correct)
Applications and operating systems can be augmented
with extra functionality by injecting additional middleware
into the boundary layer between them, without tampering
with their binaries. Using this ... / Virtualizing Operating Systems for Seamless Distributed br the support for the overall system operation within the context of the
Towards Rapid Development of Configurable, Reliable, and Scalable.. - Buskens, Sabnani (2000)(Correct)
This paper presents Aurora, a software toolkit that dramatically reduces the effort required to develop
configurable, reliable, and scalable wireless applications. The toolkit consists of software lib... / in these systems excluding operating system and third party class br that provide initialization and fault tolerance support typically needed by
Integration of CORBA Services with a Dynamic Real-time Architecture - Andreas Polze Janek (2000)(Correct)
The Common Object Request Broker Architecture (CORBA) is the most
successful representative for an object-based distributed computing
architecture. Although CORBA simplifies the implementation of co... / assumes that the underlying operating system supports either the priority br CORBA The idea of providing fault tolerance as additional feature to
Replication of CORBA Objects - Pascal Felber Rachid (2000)(Correct)
Distributed computing is one of the major trends in the computer
industry. As systems become more distributed, they also become
more complex and have to deal with new kinds of problems, such as
par... /
Harmonious Internal Clock Synchronization - Wedde, Freund (2000)(Correct)
Internal clock synchronization has been investigated, or employed, for quite a number of years, under the requirement of good upper bounds for the deviation, or accuracy, between a predefined master n... / capacity each The local operating system is Suse Linux . .This br for the purpose of achieving fault tolerance. Nearly all previous
The USC Autonomous Flying Vehicle (AFV) Project: Year 2000 Status - Montgomery (2000)(Correct)
this document. The BIT and health checks software has not yet been
implemented. The vision processing software runs on a ground-based computer but has not
yet been ported to execute onboard the AVATAR... / contains both the realtime operating system RTOS and flight software. br software for increasing robot fault tolerance vision processing
Systematic Customization of Middleware - Zarras (2000)(Correct)
The urgent need to deal with problems that are frequently met in many different families of
application led to the evolution and standardization of a software layer that lies between the
application a... / application and the underlying operating system. This layer is widely known br security transactions fault tolerance etc. Middleware is typically
Extending the Execution Environment with DITools - Serra, Navarro (2000)(Correct)
This document describes the Dynamic Interposition Tools (DITools), a set of tools bringing an
environment in which dynamically-linked executables can be extended at runtime with unforeseen
functionali... / UPC-DAC- - Keywords operating systems extensibility br improvements e.g. to provide fault tolerance data stream encryption
Survivability Measure - Millen (2000)(Correct)
Figure 1: Service Hierarchy
Services were given a survivability ordering: one service is no more survivable
than another if every service set that supports t... / as a mail application and an operating system. The hardware is also built br and software. To study fault tolerance and reconfiguration we
Using Redundancy to Increase Survivability - Hiltunen, Schlichting, Ugarte (2000)(Correct)
This paper focuses on two key requirements for using redundancy to improve survivability, the development
of appropriate techniques and the availability of suitable system support. We begin by discuss... / of security where the existing operating system already provides br system failure when considering fault-tolerance attributes. This problem is
Intrusion Tolerant - Systems Partha Pal (2000)(Correct)
this paper
have outlined our approach, and presented several key
problems that we are currently investigating unknown Intrusion Tolerant Systems
Partha P. Pal, Franklin Webber,
Richard E. Schantz an... / such as routers. At the operating system level we may see br failures if we follow fault-tolerance terminology caused by the
Designing Efficient Fault-Tolerant Systems on Wireless Networks - Guohong Cao Department (2000)(Correct)
Introduction
The falling cost of both communication and mobile computing devices (laptop computers, hand-held computers,
etc.) is making mobile computing affordable to both business users and private... / the network level and the operating system level. At the network level br on wireless links. Traditional fault-tolerance schemes cannot be directly
Evaluating the Scalability of Distributed Systems - Jogalekar, Woodside (2000)(Correct)
Many distributed systems must be scalable, meaning that they must be economically deployable in a wide range of sizes and configurations. This paper presents a scalability metric based on cost-effectiveness, where effec... / Microsoft's Windows operating system is discussed using in-memory br acceptable cost-benefit ratio the system operator. value the threshold is
XML based interfaces to TelORB - Johannesson, Tajallaei (2000)(Correct)
The requirements on future telecommunication platforms include high performance,
fault tolerant, open architectures and scalability. TelORB, which is a distributed
operating system for large-scale, em... / which is a distributed operating system for large-scale embedded br include high performance fault tolerance open architectures and
Problem Formulations for QoS Management in Automatic Control - By Martin Sanfridson (2000)(Correct)
Quality-of-service management is a method where applications negotiate with a broker for
resources. How can the notion of QoS, which comes from multimedia an... / by a broker is located to the operating system. The broker cooperates with br Can QoS improve the fault tolerance or does it make fault
Performance Availability for Networks of Workstations - Arpaci-Dusseau (1999)(Correct)
Performance Availability
for Networks of Workstations
by
Remzi H. Arpaci-Dusseau
Software systems for large-scale distributed and parallel machines are difficult to build.
When run in dynamic, pro... / . . . Operating System . br are unaware of the specifics of system operation. The problem of attaining
Local Anonymity In The Internet - Martin, Jr. (1999)(Correct)
Packet-switched computer networks of all sizes are widely used for personal, professional, and governmental
communication. However, the speed, versatility, and largely unregulated nature of computer n... / . . Operating System . br . . . Fault Tolerance .
Efficient Implementations of Software Architectures via Partial.. - Marlet, Thibault, Consel (1999)(Correct)
The notion of flexibility (that is, the ability to adapt to changing requirements or
execution contexts) is recognized as a key concern in structuring software, and many architectures
have been desi... / available platforms hardware operating systems etc.and features the br as well as safety fault tolerance and quality of service.
Chameleon: A Software Infrastructure for Adaptive Fault Tolerance - Kalbarczyk Bagchi (1999)(Correct)
This paper presents Chameleon, an adaptive infrastructure, which allows different levels of
availability requirements to be simultaneously supported in a networked environment. Chameleon
provides depe... / ARMORs the hardware and the operating system. Keywords adaptive fault br Infrastructure for Adaptive Fault Tolerance Z. Kalbarczyk S. Bagchi
Highly Reliable Upgrading of Components - Cook, Dage (1999)(Correct)
After a system is deployed, fixes, enhancements, and modifications all occur that change
the components that make up the system. Unfortunately, new versions of components
can introduce new errors and ... / have not addressed software fault tolerance many of the issues
Mobility and Extensibility in the StratOSphere Framework - Wu, Agrawal, Abbadi (1999)(Correct)
We describe the design and implementation of our StratOSphere project, a framework which unifies distributed objects and mobile code applications. We begin by first examining different mobile code par... / and packages distributed operating systems Gos and distributed br processes for load-balancing fault-tolerance and resilience. Systems such
A Web-Based Distributed Programming Environment - Aoki (1999)(Correct)
A Web-Based Distributed Programming
Environment
Kiyoko F. Aoki
A Java-based system, the GeoJAVA System, that allows a user to remotely
compile his/her own C/C++ programs and execute them for visualiz... / space without specialized operating systems to handle such a procedure br which performs some checks for fault tolerance. For example in the case
Supporting Customized Failure Models for Distributed Software - Hiltunen, Immanuel, Schlichting (1999)(Correct)
The cost of employing software fault-tolerance techniques in distributed systems is
strongly related to the type of failures to be tolerated. For example, in terms of the
amount of redundancy required... / On The Osf ri Mk . Mach Operating System And Cords tmr A Variant br The cost of employing software fault-tolerance techniques in distributed