Protecting Routing Infrastructures from Denial of Service Using Cooperative Intrusion Detection
- In New Security Paradigms Workshop, 1997
Cited by 47 (3 self)
We present a solution to the denial of service problem for routing infrastructures. When a network suffers from denial of service, packets cannot reach their destinations. Existing routing protocols are not well-equipped to deal with denial of service; a misbehaving router -- which may be caused by software/hardware faults, misconfiguration, or malicious attacks -- may be able to disable entire networks. To protect network infrastructures from routers that incorrectly drop packets and misroute packets, we hypothesize failure models for routers and present protocols that detect and respond to those misbehaving routers. Based on realistic assumptions, we prove that our protocols have the following properties: (1) A well-behaved router never incorrectly claims another router as a misbehaving router; (2) If a network has misbehaving routers, one or more of them can be located; (3) Misbehaving routers will eventually be removed.
On a New Class of Codes for Identifying Vertices in Graphs
- IEEE Transactions on Information Theory, 1998
Cited by 38 (3 self)
We investigate a new class of codes for the optimal covering of vertices in an undirected graph G such that any vertex in G can be uniquely identified by examining the vertices that cover it. We define a ball of radius t centered on a vertex v to be the set of vertices in G that are at distance at most t from v. The vertex v is then said to cover itself and every other vertex in the ball with center v. Our formal problem statement is as follows: Given an undirected graph G and an integer t ≥ 1, find a (minimal) set C of vertices such that every vertex in G belongs to a unique set of balls of radius t centered at the vertices in C. The set of vertices thus obtained constitutes a code for vertex identification. We first develop topology-independent bounds on the size of C. We then develop methods for constructing C for several specific topologies such as binary cubes, nonbinary cubes, and trees. We also describe the identification of sets of vertices using covering codes that uniquely identify single vertices. We develop methods for constructing optimal topologies that yield identifying codes with a minimum number of codewords. Finally, we describe an application of the theory developed in this paper to fault diagnosis of multiprocessor systems.
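The problem statement in this abstract can be made concrete with a small checker. The sketch below is an illustration only, not the paper's construction: the adjacency-dictionary representation and the names `ball` and `is_identifying_code` are assumptions introduced here. It tests whether a candidate set C gives every vertex a distinct set of covering codewords, and it additionally requires that set to be non-empty, which is how "covering" is read here.

```python
from collections import deque


def ball(adj, center, t):
    """Return the set of vertices at distance at most t from center (BFS)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue  # do not expand beyond radius t
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def is_identifying_code(adj, code, t=1):
    """Check that each vertex is covered by a distinct, non-empty set of codewords.

    adj  : dict mapping each vertex to an iterable of neighbours (undirected graph)
    code : candidate set C of codeword vertices
    t    : ball radius, t >= 1
    """
    balls = {c: ball(adj, c, t) for c in code}
    seen = set()
    for v in adj:
        # Identifying set of v: the codewords whose balls contain v.
        ident = frozenset(c for c, b in balls.items() if v in b)
        if not ident or ident in seen:
            return False
        seen.add(ident)
    return True


# Hypothetical example: a path on four vertices, 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_identifying_code(path, {0, 1, 2}))  # all four identifying sets differ
print(is_identifying_code(path, {1, 2}))     # vertices 1 and 2 share the set {1, 2}
```

The checker only verifies a given C; finding a minimal C, which is the optimization the abstract describes, is a separate and harder task.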
Scheduling multiprocessor tasks -- An overview
- European Journal of Operational Research, 1996
Cited by 33 (3 self)
Multiprocessor tasks require more than one processor at the same moment of time. This relatively new concept in scheduling theory emerged with the advent of parallel computing systems. In this work we present the state of the art for multiprocessor task scheduling. We show the rationale behind the concept of multiprocessor tasks. The standard three-field notation is extended to accommodate multiprocessor tasks. The main part of the work is a presentation of results in multiprocessor task scheduling, both for parallel and for dedicated processors.
Formally Verified On-Line Diagnosis
- IEEE Transactions on Software Engineering, 1997
Cited by 31 (9 self)
A reconfigurable fault tolerant system achieves the attributes of dependability of operations through fault detection, fault isolation and reconfiguration, typically referred to as the FDIR paradigm. Fault diagnosis is a key component of this approach, requiring an accurate determination of the health and state of the system. An imprecise state assessment can lead to catastrophic failure due to an optimistic diagnosis, or conversely, result in underutilization of resources because of a pessimistic diagnosis. Differing from classical testing and other off-line diagnostic approaches, we develop procedures for maximal utilization of the system state information to provide for continual, on-line diagnosis and reconfiguration capabilities as an integral part of the system operations. Our diagnosis approach, unlike existing techniques, does not require administered testing to gather syndrome information but is based on monitoring the system message traffic among redundant system fu...
Selected problems of scheduling tasks in multiprocessor computing systems
- PhD thesis, Instytut Informatyki Politechnika Poznanska, 1997
Continual On-Line Diagnosis of Hybrid Faults
- Dependable Computing for Critical Applications 4, 1995
Cited by 15 (3 self)
An accurate system-state determination is essential in ensuring system dependability. An imprecise state assessment can lead to catastrophic failure through optimistic diagnosis, or underutilization of resources due to pessimistic diagnosis. Dependability is usually achieved through a fault detection, isolation and reconfiguration (FDIR) paradigm, of which the diagnosis procedure is a primary component. Fault resolution in on-line diagnosis is key to providing an accurate system-state assessment. Most diagnostic strategies are based on limited fault models that adopt either an optimistic (all faults s-a-X) or pessimistic (all faults Byzantine) bias. Our Hybrid Fault-Effects Model (HFM) handles a continuum of fault types that are distinguished by their impact on system operations. While this approach has been shown to enhance system functionality and dependability, on-line FDIR is required to make the HFM practical. In this paper, we develop a methodology for utilization of the system-st...
Distributed On-Line Diagnosis in the Presence of Arbitrary Faults
- In Proceedings of the 23rd International Symposium on Fault-Tolerant Computing, 1993
Cited by 15 (1 self)
This paper introduces a new fault model for system-level diagnosis and a class of on-line distributed diagnosis algorithms that operate correctly under the model. The algorithms are guaranteed to operate correctly in the presence of faulty nodes that disseminate arbitrarily corrupted diagnostic information. The fault model addresses the practical issue of designing an inter-node test to cover diagnosis algorithm operation. Since an explicit test to detect arbitrary failures is not practical, evidence of a node's faulty behavior is provided by examining diagnostic messages exchanged by the node. In many practical systems, algorithm overhead using the new fault model is only twice that required for algorithms using the PMC fault model. The key results of this paper include a description of the new fault model, the specification of a class of on-line distributed diagnosis algorithms that use this fault model, and proofs of their correctness.
Gossip-style failure detection and distributed consensus for scalable heterogeneous clusters
- Cluster Computing, 2001
Cited by 13 (6 self)
Gossip protocols provide a means by which failures can be detected in large, distributed systems in an asynchronous manner without the limits associated with reliable multicasting for group communications. However, in order to be effective with application recovery and reconfiguration, these protocols require mechanisms by which failures can be detected with system-wide consensus in a scalable fashion. This paper presents three new gossip-style protocols supported by a novel algorithm to achieve consensus in scalable, heterogeneous clusters. The round-robin protocol improves on basic randomized gossiping by distributing gossip messages in a deterministic order that optimizes bandwidth consumption. Redundant gossiping is completely eliminated in the binary round-robin protocol, and the round-robin with sequence check protocol is a useful extension that yields efficient detection times without the need for system-specific optimization. The distributed consensus algorithm works with these gossip protocols to achieve agreement among the operable nodes in the cluster on the state of the system featuring either a flat or a layered design. The various protocols are simulated and evaluated in terms of consensus time and scalability using a high-fidelity, fault-injection model for distributed systems comprised of clusters of workstations connected by high-performance networks. Index Terms: Cluster computing, consensus, failure detection, fault tolerance, gossip protocol, Myrinet.
Automatic Model-Driven Recovery in Distributed Systems
Cited by 13 (3 self)
Automatic system monitoring and recovery has the potential to provide a low-cost solution for high availability. However, automating recovery is difficult in practice because of the challenge of accurate fault diagnosis in the presence of low coverage, poor localization ability, and false positives that are inherent in many widely used monitoring techniques. In this paper, we present a holistic model-based approach that overcomes these challenges and enables automatic recovery in distributed systems. To do so, it uses theoretically sound techniques including Bayesian estimation and Markov decision theory to provide controllers that choose good, if not optimal, recovery actions according to user-defined optimization criteria. By combining monitoring and recovery, the approach realizes benefits that could not have been obtained by using them in isolation. In this paper, we present two recovery algorithms with complementary properties and trade-offs, and validate our algorithms (through simulation) by fault injection on a realistic e-commerce system.
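The role of Bayesian estimation when monitors have limited coverage and produce false positives can be illustrated with a single Bayes-rule update. This is a generic sketch, not the paper's algorithm: the hypothesis names, the monitor model, and all probabilities below are hypothetical values chosen for illustration.

```python
def posterior(prior, likelihood, observation):
    """Apply Bayes' rule to a discrete set of fault hypotheses.

    prior       : dict hypothesis -> P(hypothesis)
    likelihood  : dict hypothesis -> dict observation -> P(observation | hypothesis)
    observation : the monitor output that was actually seen
    """
    unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical numbers: the monitor raises an alarm for 60% of database
# faults (its coverage) and for 5% of healthy states (false positives).
prior = {"healthy": 0.9, "db_fault": 0.1}
likelihood = {
    "healthy": {"alarm": 0.05, "quiet": 0.95},
    "db_fault": {"alarm": 0.60, "quiet": 0.40},
}

belief = posterior(prior, likelihood, "alarm")
print(belief)
```

With these assumed numbers a single alarm leaves the fault hypothesis far from certain, which is the kind of ambiguity that a recovery controller has to weigh before acting.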
Membership and System Diagnosis
- In Proceedings of the 14th IEEE Symposium on Reliable Distributed Systems, 1995
Cited by 11 (3 self)
A membership service is a service in a distributed system that maintains and provides information about which sites are functioning and which have failed at any given time. System diagnosis, on the other hand, is a method for detecting faulty processing elements and distributing this information to non-faulty elements. In spite of the apparent similarity of goals, these two fields have been considered separately from their beginnings. In this paper, we attempt to compare these fields and show the fundamental differences and the similarities. We demonstrate that the problems are closely related, with the major differences being the assumptions made about the failure model, the testing methods, and the type of service guarantees provided to the application. Furthermore, we demonstrate that the fields are closely enough related that some algorithms utilized in one field can easily be transformed into algorithms in the other. As examples, we derive new membership algorithms from a distribut...

