Grid Information Services for Distributed Resource Sharing
2001
Cited by 712 (52 self)
Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or institutions: what are sometimes called virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations are challenging problems due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Consequently, information services are a vital part of any Grid software infrastructure, providing fundamental mechanisms for discovery and monitoring, and hence for planning and adapting application behavior. We present here an information services architecture that addresses performance, security, scalability, and robustness requirements. Our architecture defines simple low-level enquiry and registration protocols that make it easy to incorporate individual entities into various information structures, such as aggregate directories that support a variety of different query languages and discovery strategies. These protocols can also be combined with other Grid protocols to construct additional higher-level services and capabilities such as brokering, monitoring, fault detection, and troubleshooting. Our architecture has been implemented as MDS-2, which forms part of the Globus Grid toolkit and has been widely deployed and applied.
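Enquiry and registration can be illustrated with a small in-memory aggregate directory. This is a minimal sketch, not MDS-2 code. It assumes soft-state registration (each entry carries a time-to-live and lapses unless the entity registers again) and predicate-based enquiry. The names `AggregateDirectory`, `register`, and `enquire` are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Registration:
    entity: str                    # name of the registered resource or service
    attributes: Dict[str, object]  # description published by the entity
    expires_at: float              # absolute time after which the entry lapses


class AggregateDirectory:
    """Hypothetical aggregate directory holding soft-state registrations."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Registration] = {}

    def register(self, entity: str, attributes: Dict[str, object], ttl: float) -> None:
        # Registration: a new or repeated registration (re)sets the entry's lifetime.
        self._entries[entity] = Registration(entity, dict(attributes), self._clock() + ttl)

    def enquire(self, predicate: Callable[[Dict[str, object]], bool]) -> List[str]:
        # Enquiry: discard lapsed entries, then return live entities matching the predicate.
        now = self._clock()
        self._entries = {k: r for k, r in self._entries.items() if r.expires_at > now}
        return sorted(k for k, r in self._entries.items() if predicate(r.attributes))


now = [0.0]  # manual clock so the example is deterministic
directory = AggregateDirectory(clock=lambda: now[0])
directory.register("hostA", {"cpus": 64}, ttl=30)
directory.register("hostB", {"cpus": 8}, ttl=10)
print(directory.enquire(lambda a: a["cpus"] >= 8))  # ['hostA', 'hostB']
now[0] = 20.0
print(directory.enquire(lambda a: a["cpus"] >= 8))  # ['hostA'] (hostB never refreshed)
```

With soft state, an entity that crashes simply drops out of the directory once its time-to-live expires, so no explicit deregistration message is needed.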
Group Communication Specifications: A Comprehensive Study
ACM Computing Surveys, 1999
Cited by 370 (15 self)
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...
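Specifications of this kind are stated as properties of executions, so they can be checked mechanically against a recorded trace. As an illustration only, the sketch below checks a simplified virtual-synchrony-style condition: processes that move from view v to the same next view v' must have delivered the same set of messages while in v. The trace encoding and the function names are assumptions, not the survey's formalism.

```python
from typing import Dict, FrozenSet, List, Tuple

Event = Tuple[str, str]  # ("view", view_id) or ("deliver", message_id)


def transitions(events: List[Event]) -> Dict[Tuple[str, str], FrozenSet[str]]:
    """Map each (view, next_view) transition of one process to messages delivered in view."""
    result: Dict[Tuple[str, str], FrozenSet[str]] = {}
    current, delivered = None, set()
    for kind, value in events:
        if kind == "view":
            if current is not None:
                result[(current, value)] = frozenset(delivered)
            current, delivered = value, set()
        elif kind == "deliver" and current is not None:
            delivered.add(value)
    return result


def virtually_synchronous(trace: Dict[str, List[Event]]) -> bool:
    """True if all processes making the same view transition delivered the same messages."""
    seen: Dict[Tuple[str, str], FrozenSet[str]] = {}
    for events in trace.values():
        for transition, msgs in transitions(events).items():
            if seen.setdefault(transition, msgs) != msgs:
                return False
    return True


ok = {
    "p": [("view", "v1"), ("deliver", "m1"), ("view", "v2")],
    "q": [("view", "v1"), ("deliver", "m1"), ("view", "v2")],
}
bad = {
    "p": [("view", "v1"), ("deliver", "m1"), ("view", "v2")],
    "q": [("view", "v1"), ("view", "v2")],  # q moved to v2 without delivering m1
}
print(virtually_synchronous(ok), virtually_synchronous(bad))  # True False
```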
Total order broadcast and multicast algorithms: Taxonomy and survey
ACM Computing Surveys, 2004
Sinfonia: a new paradigm for building scalable distributed systems
In SOSP, 2007
Cited by 153 (12 self)
We propose a new paradigm for building scalable distributed systems. Our approach does not require dealing with message-passing protocols—a major complication in existing distributed systems. Instead, developers just design and manipulate data structures within our service called Sinfonia. Sinfonia keeps data for applications on a set of memory nodes, each exporting a linear address space. At the core of Sinfonia is a novel minitransaction primitive that enables efficient and consistent access to data, while hiding the complexities that arise from concurrency and failures. Using Sinfonia, we implemented two very different and complex applications in a few months: a cluster file system and a group communication service. Our implementations perform well and scale to hundreds of machines.
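In the full Sinfonia paper (not this abstract), a minitransaction bundles compare items, read items, and write items. If every compare item matches the stored data, the reads are returned and the writes are applied, all atomically. Otherwise nothing happens. The single-process sketch below keeps those semantics but replaces Sinfonia's distributed commit protocol across memory nodes with one lock. The class and method names are hypothetical, and addresses are assumed to be in range.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Item = Tuple[int, int, bytes]  # (memory node id, address, data)


@dataclass
class Minitransaction:
    compare: List[Item] = field(default_factory=list)               # must all match
    read: List[Tuple[int, int, int]] = field(default_factory=list)  # (node, address, length)
    write: List[Item] = field(default_factory=list)                 # applied only on success


class MemoryNodes:
    """Single-process stand-in for memory nodes, each exporting a linear address space."""

    def __init__(self, num_nodes: int, size: int):
        self._nodes = [bytearray(size) for _ in range(num_nodes)]
        self._lock = threading.Lock()  # one lock instead of a distributed commit

    def execute(self, mt: Minitransaction) -> Optional[List[bytes]]:
        """Atomically run mt; return read results, or None if any compare item fails."""
        with self._lock:
            for node, addr, expected in mt.compare:
                if bytes(self._nodes[node][addr:addr + len(expected)]) != expected:
                    return None  # abort: nothing read, nothing written
            # This sketch returns data as it was before this minitransaction's writes.
            results = [bytes(self._nodes[n][a:a + length]) for n, a, length in mt.read]
            for node, addr, data in mt.write:
                self._nodes[node][addr:addr + len(data)] = data
            return results


mem = MemoryNodes(num_nodes=2, size=16)
cas = Minitransaction(
    compare=[(0, 0, b"\x00")],              # proceed only if node 0, byte 0 is still zero
    read=[(1, 0, 2)],                       # fetch two bytes from node 1
    write=[(0, 0, b"\x01"), (1, 0, b"hi")],
)
print(mem.execute(cas))  # [b'\x00\x00']
print(mem.execute(cas))  # None: the compare item no longer matches
```

A compare-and-swap spanning two memory nodes, as in the usage lines, is the kind of operation that would otherwise require a hand-written message-passing protocol.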
PeerReview: Practical accountability for distributed systems
Cited by 144 (18 self)
We describe PeerReview, a system that provides accountability in distributed systems. PeerReview ensures that Byzantine faults whose effects are observed by a correct node are eventually detected and irrefutably linked to a faulty node. At the same time, PeerReview ensures that a correct node can always defend itself against false accusations. These guarantees are particularly important for systems that span multiple administrative domains, which may not trust each other. PeerReview works by maintaining a secure record of the messages sent and received by each node. The record is used to automatically detect when a node’s behavior deviates from that of a given reference implementation, thus exposing faulty nodes. PeerReview is widely applicable: it only requires that a correct node’s actions are deterministic, that nodes can sign messages, and that each node is periodically checked by a correct node. We demonstrate that PeerReview is practical by applying it to three different types of distributed systems: a network filesystem, a peer-to-peer system, and an overlay multicast system.
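PeerReview's secure record is a tamper-evident log: each entry is chained to its predecessor by a hash, so a node cannot rewrite earlier history without detection by anyone who holds a later hash. The sketch below shows only that hash chain. The entry layout, the use of SHA-256, and all names are assumptions. Signed authenticators, witness audits, and replay against the reference implementation are omitted.

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[int, str, bytes, bytes]  # (sequence number, entry type, content, chained hash)

GENESIS = b"\x00" * 32  # hash value "before" the first entry


def chain_hash(prev: bytes, seq: int, kind: str, content: bytes) -> bytes:
    """Hash linking an entry to its predecessor, so history cannot be edited silently."""
    h = hashlib.sha256()
    h.update(prev)
    h.update(seq.to_bytes(8, "big"))
    h.update(kind.encode())
    h.update(hashlib.sha256(content).digest())
    return h.digest()


class TamperEvidentLog:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def append(self, kind: str, content: bytes) -> bytes:
        prev = self.entries[-1][3] if self.entries else GENESIS
        seq = len(self.entries)
        digest = chain_hash(prev, seq, kind, content)
        self.entries.append((seq, kind, content, digest))
        return digest  # in a full system the node would sign this and give it to peers


def verify(entries: List[Entry], expected_head: bytes) -> bool:
    """Recompute the chain and compare it with a head hash the auditor already holds."""
    prev = GENESIS
    for seq, kind, content, digest in entries:
        if chain_hash(prev, seq, kind, content) != digest:
            return False
        prev = digest
    return prev == expected_head


log = TamperEvidentLog()
log.append("send", b"m1")
head = log.append("recv", b"m2")
forged = list(log.entries)
forged[0] = (0, "send", b"forged", forged[0][3])  # rewrite history, keep the old hash
print(verify(log.entries, head), verify(forged, head))  # True False
```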
On the quality of service of failure detectors
IEEE Trans. Computers, 2002
Failure Detection and Consensus in the Crash-Recovery Model
1999
Cited by 123 (9 self)
We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly suitable to the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice (those with no failures or failure detector mistakes). In such runs, consensus is achieved within 3δ time and with 4n messages, where δ is the maximum message delay and n is the number of processes in the system.
1 Introduction
The problem of solving consensus in asynchronous systems with unreliable failure detectors (i.e., failure detectors that make mistakes) was first ...
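In the crash-recovery model a detector must distinguish a process that stays up from one that keeps crashing and recovering. One simple way to illustrate this is a heartbeat detector that, besides a trust list, counts how often each peer reappeared after being suspected; a peer whose count keeps growing looks unstable. This is a hypothetical sketch: the class name, the timeout rule, and the counting rule are all assumptions, and it is not one of the failure detectors defined in the paper.

```python
from typing import Dict, Set


class HeartbeatDetector:
    """Timeout-based detector that also counts each peer's returns after suspicion."""

    def __init__(self, peers: Set[str], timeout: float, start: float = 0.0):
        self.timeout = timeout
        self.last_heard: Dict[str, float] = {p: start for p in peers}
        self.epoch: Dict[str, int] = {p: 0 for p in peers}  # times a peer came back
        self._suspected: Set[str] = set()

    def _update_suspicions(self, now: float) -> None:
        # Suspect every peer silent for longer than the timeout (this may be a mistake).
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout:
                self._suspected.add(peer)

    def on_heartbeat(self, peer: str, now: float) -> None:
        self._update_suspicions(now)
        if peer in self._suspected:
            # A suspected peer is alive: it crashed and recovered, or the suspicion was wrong.
            self._suspected.discard(peer)
            self.epoch[peer] += 1
        self.last_heard[peer] = now

    def trusted(self, now: float) -> Set[str]:
        self._update_suspicions(now)
        return set(self.last_heard) - self._suspected


fd = HeartbeatDetector({"p", "q"}, timeout=5.0)
fd.on_heartbeat("p", 1.0)
fd.on_heartbeat("q", 1.0)
print(sorted(fd.trusted(4.0)))       # ['p', 'q']
fd.on_heartbeat("p", 6.5)            # p was silent for 5.5 > 5, so its epoch advances
print(sorted(fd.trusted(7.0)))       # ['p']
print(fd.epoch["p"], fd.epoch["q"])  # 1 0
```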
The SecureRing Protocols for Securing Group Communication
In Hawaii International Conference on System Sciences, 1998
Cited by 119 (2 self)
The SecureRing group communication protocols provide reliable ordered message delivery and group membership services despite Byzantine faults such as might be caused by modifications to the programs of a group member following illicit access to, or capture of, a group member. The protocols multicast messages to groups of processors within an asynchronous distributed system and deliver messages in a consistent total order to all members of the group. They ensure that correct members agree on changes to the membership, that correct processors are eventually included in the membership, and that processors that exhibit detectable Byzantine faults are eventually excluded from the membership. To provide these message delivery and group membership services, the protocols make use of an unreliable Byzantine fault detector.
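Consistent total order can be illustrated with the token-ring approach of the Totem protocols, on which SecureRing builds. Only the holder of a circulating token may multicast, and it stamps each message with the next sequence number carried by the token, so every member delivers in the same order regardless of arrival order. The sketch assumes that scheme and omits everything that makes SecureRing secure (signed tokens and messages, the Byzantine fault detector, membership changes). All names are hypothetical.

```python
from typing import Dict, List, Tuple


class Token:
    """Circulating token carrying the next global sequence number."""

    def __init__(self) -> None:
        self.next_seq = 0


class Member:
    def __init__(self, name: str):
        self.name = name
        self.outbox: List[str] = []       # messages waiting for the token
        self.buffer: Dict[int, str] = {}  # received but not yet deliverable
        self.delivered: List[str] = []
        self._next_to_deliver = 0

    def on_token(self, token: Token) -> List[Tuple[int, str]]:
        # Only the token holder broadcasts; it stamps each message with a sequence number.
        stamped = []
        for payload in self.outbox:
            stamped.append((token.next_seq, payload))
            token.next_seq += 1
        self.outbox.clear()
        return stamped

    def receive(self, seq: int, payload: str) -> None:
        # Deliver strictly in sequence-number order, holding back anything after a gap.
        self.buffer[seq] = payload
        while self._next_to_deliver in self.buffer:
            self.delivered.append(self.buffer.pop(self._next_to_deliver))
            self._next_to_deliver += 1


members = [Member("a"), Member("b"), Member("c")]
members[0].outbox.append("x")
members[1].outbox.extend(["y", "z"])
token = Token()
broadcasts: List[Tuple[int, str]] = []
for m in members:  # one rotation of the token around the logical ring
    broadcasts.extend(m.on_token(token))
for m in members:
    for seq, payload in reversed(broadcasts):  # arrival order differs from send order
        m.receive(seq, payload)
print([m.delivered for m in members])  # [['x', 'y', 'z'], ['x', 'y', 'z'], ['x', 'y', 'z']]
```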
An Asynchronous Model of Locality, Failure, and Process Mobility
Theoretical Computer Science, 1997
Cited by 116 (4 self)
We present a model of distributed computation which is based on a fragment of the pi-calculus relying on asynchronous communication. We enrich the model with the following features: the explicit distribution of processes to locations, the failure of locations and their detection, and the mobility of processes. Our contributions are twofold. At the specification level, we give a synthetic and flexible formalization of the features mentioned above. At the verification level, we provide original methods to reason about the bisimilarity of processes in the presence of failures.
Coyote: A System for Constructing Fine-Grain Configurable Communication Services
ACM Transactions on Computer Systems, 1998
Cited by 107 (15 self)
Communication-oriented abstractions such as atomic multicast, group RPC, and protocols for location-independent mobile computing can simplify the development of complex applications built on distributed systems. This paper describes Coyote, a system that supports the construction of highly modular and configurable versions of such abstractions. Coyote extends the notion of protocol objects and hierarchical composition found in existing systems with support for finer-grain objects called micro-protocols that implement individual semantic properties of the target service. A customized service is constructed by selecting micro-protocols based on their semantic guarantees and configuring them together with a standard runtime system to form a composite protocol implementing the service. Micro-protocols within a composite protocol can share data and are executed using an event-driven paradigm that enhances configurability. The overall approach is described and illustrated with exampl...
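The composition idea (micro-protocols that share data and run as handlers for events raised by a common runtime) can be sketched in a few lines. This is a hypothetical illustration, not Coyote's API: `CompositeProtocol`, `bind`, `raise_event`, and the two micro-protocols are invented names. Each micro-protocol implements one property of the service.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[["CompositeProtocol", Any], None]


class CompositeProtocol:
    """Hypothetical runtime: micro-protocols bind handlers to named events and share data."""

    def __init__(self) -> None:
        self.shared: Dict[str, Any] = {}  # state visible to every micro-protocol
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def bind(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def raise_event(self, event: str, arg: Any = None) -> None:
        # Event-driven execution: run each handler bound to the event, in binding order.
        for handler in list(self._handlers[event]):
            handler(self, arg)


def sequencing(proto: CompositeProtocol) -> None:
    """Micro-protocol 1: stamps outgoing messages with consecutive sequence numbers."""
    proto.shared["seq"] = 0

    def stamp(p: CompositeProtocol, msg: Dict[str, Any]) -> None:
        msg["seq"] = p.shared["seq"]
        p.shared["seq"] += 1

    proto.bind("msg_from_user", stamp)


def send_history(proto: CompositeProtocol) -> None:
    """Micro-protocol 2: records a copy of every outgoing message."""
    proto.shared["sent"] = []
    proto.bind("msg_from_user", lambda p, msg: p.shared["sent"].append(dict(msg)))


# Configure a composite protocol from the selected micro-protocols.
proto = CompositeProtocol()
for micro_protocol in (sequencing, send_history):
    micro_protocol(proto)
proto.raise_event("msg_from_user", {"body": "hello"})
proto.raise_event("msg_from_user", {"body": "world"})
print([m["seq"] for m in proto.shared["sent"]])  # [0, 1]
```

In this sketch binding order matters: because the history micro-protocol is configured after sequencing, it records already-stamped messages. Configuring them the other way round would record unstamped copies.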