A Survey of Rollback-Recovery Protocols in Message-Passing Systems
1996
Cited by 716 (22 self)
... this paper, we use the terms event logging and message logging interchangeably ...
Consistent global states of distributed systems: Fundamental concepts and mechanisms
DISTRIBUTED SYSTEMS, 1993
Detection of Weak Unstable Predicates in Distributed Programs
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 96 (34 self)
This paper discusses detection of global predicates in a distributed program. Earlier algorithms for detection of global predicates proposed by Chandy and Lamport work only for stable predicates. A predicate is stable if it does not turn false once it becomes true. Our algorithms detect even unstable predicates without excessive overhead. In the past, such predicates have been regarded as too difficult to detect. The predicates are specified using a logic described formally in this paper. We discuss detection of weak conjunctive predicates which are formed by conjunction of predicates local to processes in the system. Our detection methods will detect if such a predicate is true for any interleaving of events in the system, whether the predicate is stable or not. Also, any predicate which can be reduced to a set of weak conjunctive predicates is detectable. This class of predicates captures many global predicates that are of interest to a programmer. The message complexity of our algor...
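The idea of a weak conjunctive predicate can be illustrated with a small sketch. This is not the paper's algorithm, whose details are not given in the excerpt above. It is a simplified, centralized check under these assumptions: every local state in which a process's local predicate holds has been tagged with a vector timestamp, and the conjunction counts as detected when one such state per process can be chosen so that the chosen states are pairwise concurrent. The names `happened_before` and `detect_weak_conjunctive` are hypothetical.

```python
from collections import deque


def happened_before(u, v):
    """True if vector timestamp u causally precedes v."""
    return all(a <= b for a, b in zip(u, v)) and u != v


def detect_weak_conjunctive(candidates):
    """candidates[i] lists, in local order, the vector timestamps of the
    states of process i in which its local predicate holds (an assumed
    input format). Returns one timestamp per process such that the chosen
    states are pairwise concurrent, or None if no such choice exists."""
    queues = [deque(c) for c in candidates]
    while all(queues):  # stop as soon as some process has no candidate left
        heads = [q[0] for q in queues]
        # A head that precedes another head also precedes every later
        # candidate of that other process, so it can never be part of a
        # pairwise-concurrent choice and is safe to discard.
        stale = {
            i
            for i, u in enumerate(heads)
            for j, v in enumerate(heads)
            if i != j and happened_before(u, v)
        }
        if not stale:
            return heads
        for i in stale:
            queues[i].popleft()
    return None


# Process 0 satisfies its predicate at (1, 0) and (3, 0); process 1 at (2, 2),
# after receiving a message that process 0 sent at (2, 0).
print(detect_weak_conjunctive([[(1, 0), (3, 0)], [(2, 2)]]))  # [(3, 0), (2, 2)]
```

In the usage example, (1, 0) is discarded because it precedes (2, 2), while (3, 0) and (2, 2) are concurrent, so the conjunction holds in some interleaving even though it may never be observed as true at a single instant.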
Fundamentals of Fault-Tolerant Distributed Computing in Asynchronous Environments
ACM Computing Surveys, 1999
Cited by 94 (9 self)
Fault tolerance in distributed computing is a wide area with a significant body of literature that is vastly diverse in methodology and terminology. This paper aims at structuring the area and thus guiding readers into this interesting field. We use a formal approach to define important terms like fault, fault tolerance, and redundancy. This leads to four distinct forms of fault tolerance and to two main phases in achieving them: detection and correction. We show that this can help to reveal inherently fundamental structures that contribute to understanding and unifying methods and terminology. By doing this, we survey many existing methodologies and discuss their relations. The underlying system model is the close-to-reality asynchronous message-passing model of distributed computing.
Event Composition in Time-dependent Distributed Systems
In COOPIS, 1999
Cited by 55 (11 self)
Many interesting application systems, ranging from workflow management and CSCW to air traffic control, are event-driven and time-dependent and must interact with heterogeneous components in the real world. Event services are used to glue together distributed components. They assume a virtual global time base to trigger actions and to order events.
Debugging Multi-Agent Systems Using Design Artifacts: The Case of Interaction Protocols
In Proceedings of AAMAS-02, 2002
Cited by 49 (16 self)
Debugging multi-agent systems (which are concurrent, distributed, and consist of complex components) is difficult, yet crucial. We propose that the debugging process can be improved by following an agent-oriented design methodology, and then using the design artifacts in the debugging phase. We present an example of this scheme which uses interaction protocols to debug agent interaction. Interaction protocols are specified using AUML and are translated to Petri nets. The debugger uses the Petri nets to monitor conversations and to provide precise and informative error messages when protocols aren't correctly followed by the agents.
Detection of global predicates: Techniques and their limitations
Distributed Computing, 1998
Cited by 47 (7 self)
We show that the problem of predicate detection in distributed systems is NP-complete. In the past, efficient algorithms have been developed for special classes of predicates such as stable predicates, observer-independent predicates, and conjunctive predicates. We introduce a class of predicates, semi-linear predicates, which properly contains all of the above classes. We first discuss stable, observer-independent and semi-linear classes of predicates and their relationships with each other. We also study closure properties of these classes with respect to conjunction and disjunction. Finally, we discuss algorithms for detection of predicates in these classes. We provide a non-deterministic detection algorithm for each class of predicate. We show that each class can be equivalently characterized by the degree of non-determinism present in the algorithm. Stable predicates are defined as those that can be detected by an algorithm with the most non-determinism. All other classes can be derived by appropriately constraining the non-determinism in this algorithm.
Detection of Strong Unstable Predicates in Distributed Programs
IEEE Transactions on Parallel and Distributed Systems, 1996
Cited by 44 (9 self)
This paper discusses detection of global predicates in a distributed program. A run of a distributed program results in a set of sequential traces, one for each process. These traces may be combined to form many global sequences consistent with the single run of the program. A strong global predicate is true in a run if it is true for all global sequences consistent with the run. We present algorithms which detect if the given strong global predicate became true in a run of a distributed program.
1 Introduction
Detection of global predicates is a fundamental problem in distributed computing. It arises in the designing, debugging and testing of distributed programs. Global predicates can be classified into two types - stable and unstable. A stable predicate is one which never turns false once it becomes true. An unstable predicate is one without such a property. Its value may alternate between true and false. Detection of stable predicates has been addressed in the literature by means ...
Plausible Clocks: Constant Size Logical Clocks for Distributed Systems
1996
Cited by 43 (3 self)
In a Distributed System with N sites, the precise detection of causal relationships between events can only be done with vector clocks of size N. This gives rise to scalability and efficiency problems for accurate logical clocks. In this paper we propose a class of logical clocks called plausible clocks that can be implemented with a number of components not affected by the size of the system and yet they provide good ordering accuracy. We develop rules to combine plausible clocks to produce more accurate clocks. Several examples of plausible clocks and their combination are presented. Using a simulation model, we evaluate the performance of these clocks. We also present examples of applications where constant size clocks can be used.
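For context on the baseline this abstract compares against, the following is a minimal sketch of a conventional vector clock for N sites. It does not implement the plausible clocks the paper proposes. The class `VectorClock` and the function `compare` are hypothetical names, and the update rule is the common one in which a site increments its own component on every event and takes the component-wise maximum on receipt.

```python
class VectorClock:
    """Vector clock for one site in a system of n sites (sketch)."""

    def __init__(self, n, pid):
        self.pid = pid
        self.v = [0] * n  # one component per site: size grows with n

    def tick(self):
        """Record a local event and return its timestamp."""
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, stamp):
        """Merge the sender's timestamp, then record the receive event."""
        self.v = [max(a, b) for a, b in zip(self.v, stamp)]
        return self.tick()


def compare(u, v):
    """Classify the causal relation between two vector timestamps."""
    if u == v:
        return "equal"
    if all(a <= b for a, b in zip(u, v)):
        return "before"
    if all(a >= b for a, b in zip(u, v)):
        return "after"
    return "concurrent"


p0, p1, p2 = (VectorClock(3, i) for i in range(3))
a = p0.tick()            # (1, 0, 0)
b = p1.receive(p0.send())  # (2, 1, 0)
c = p2.tick()            # (0, 0, 1)
print(compare(a, b), compare(b, c))  # before concurrent
```

Every timestamp carries one entry per site, which is the scalability cost the abstract refers to. A constant-size clock gives up some of this precision in exchange for timestamps whose size does not depend on N.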