Results 1 - 10
of
20
Wait-Free Synchronization
- ACM Transactions on Programming Languages and Systems
, 1993
"... A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of the other processes. The problem of constructing a wait-free implementation of one data object from another lie ..."
Abstract
-
Cited by 873 (28 self)
- Add to MetaCart
(Show Context)
A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of the other processes. The problem of constructing a wait-free implementation of one data object from another lies at the heart of much recent work in concurrent algorithms, concurrent data structures, and multiprocessor architectures. In the first part of this paper, we introduce a simple and general technique, based on reduction to a consensus protocol, for proving statements of the form "there is no wait-free implementation of X by Y ." We derive a hierarchy of objects such that no object at one level has a wait-free implementation in terms of objects at lower levels. In particular, we show that atomic read/write registers, which have been the focus of much recent attention, are at the bottom of the hierarchy: they cannot be used to construct wait-free implementations of many simple and familiar da...
A Methodology for Implementing Highly Concurrent Data Objects
, 1993
"... A concurrent object is a data structure shared by concurrent processes. Conventional techniques for implementing concurrent objects typically rely on critical sections: ensuring that only one process at a time can operate on the object. Nevertheless, critical sections are poorly suited for asynchro ..."
Abstract
-
Cited by 357 (11 self)
- Add to MetaCart
(Show Context)
A concurrent object is a data structure shared by concurrent processes. Conventional techniques for implementing concurrent objects typically rely on critical sections: ensuring that only one process at a time can operate on the object. Nevertheless, critical sections are poorly suited for asynchronous systems: if one process is halted or delayed in a critical section, other, nonfaulty processes will be unable to progress. By contrast, a concurrent object implementation is lock free if it always guarantees that some process will complete an operation in a finite number of steps, and it is wait free if it guarantees that each process will complete an operation in a finite number of steps. This paper proposes a new methodology for constructing lock-free and wait-free implementations of concurrent objects. The object’s representation and operations are written as stylized sequential programs, with no explicit synchronization. Each sequential operation is automatically transformed into a lock-free or wait-free operation using novel synchronization and memory management algorithms. These algorithms are presented for a multiple instruction/multiple data (MIMD) architecture in which n processes communicate by applying atomic read, wrzte, load_linked, and store_conditional operations to a shared memory.
The Topological Structure of Asynchronous Computability
- JOURNAL OF THE ACM
, 1996
"... We give necessary and sufficient combinatorial conditions characterizing the tasks that can be solved by asynchronous processes, of which all but one can fail, that communicate by reading and writing a shared memory. We introduce a new formalism for tasks, based on notions from classical algebra ..."
Abstract
-
Cited by 157 (12 self)
- Add to MetaCart
We give necessary and sufficient combinatorial conditions characterizing the tasks that can be solved by asynchronous processes, of which all but one can fail, that communicate by reading and writing a shared memory. We introduce a new formalism for tasks, based on notions from classical algebraic and combinatorial topology, in which a task's possible input and output values are each associated with high-dimensional geometric structures called simplicial complexes. We characterize computability in terms of the topological properties of these complexes. This characterization has a surprising geometric interpretation: a task is solvable if and only if the complex representing the task's allowable inputs can be mapped to the complex representing the task's allowable outputs by a function satisfying certain simple regularity properties. Our formalism thus replaces the "operational" notion of a wait-free decision task, expressed in terms of interleaved computations unfolding ...
Efficient Synchronization on Multiprocessors with Shared Memory
- ACM Transactions on Programming Languages and Systems
, 1986
"... A new formalism is given for read-modify-write (RMW) synchronization operations. This formalism is used to extend the memory reference combining mechanism, introduced in the NYU Ultracomputer, to arbitrary RMW operations. A formal correctness proof of this combining mechanism is given. General requi ..."
Abstract
-
Cited by 94 (2 self)
- Add to MetaCart
(Show Context)
A new formalism is given for read-modify-write (RMW) synchronization operations. This formalism is used to extend the memory reference combining mechanism, introduced in the NYU Ultracomputer, to arbitrary RMW operations. A formal correctness proof of this combining mechanism is given. General requirements for the practicality of combining are discussed. Combining is shown to be practical for many useful memory access operations. This includes memory updates of the form mem_val := mem_val op val, where op need not be associative, and a variety of synchronization primitives. The computation involved is shown to be closely related to parallel prefix evaluation. 1. INTRODUCTION Shared memory provides convenient communication between processes in a tightly coupled multiprocessing system. Shared variables can be used for data sharing, information transfer between processes, and, in particular, for coordination and synchronization. Constructs such as the semaphore introduced by Dijkstra in ...
Contention in Shared Memory Algorithms
, 1993
"... Most complexitymeasures for concurrent algorithms for asynchronous sharedmemory architectures focus on process steps and memory consumption. In practice, however, performance of multiprocessor algorithms is heavily influenced by contention, the extent to which processes access the same location at t ..."
Abstract
-
Cited by 64 (1 self)
- Add to MetaCart
(Show Context)
Most complexitymeasures for concurrent algorithms for asynchronous sharedmemory architectures focus on process steps and memory consumption. In practice, however, performance of multiprocessor algorithms is heavily influenced by contention, the extent to which processes access the same location at the same time. Nevertheless, even though contention is one of the principal considerations affecting the performance of real algorithms on real multiprocessors, there are no formal tools for analyzing the contention of asynchronous shared-memory algorithms. This paper introduces the first formal complexity model for contention in multiprocessors. We focus on the standard multiprocessor architecture in which n asynchronous processes communicate by applying read, write, and read-modify-write operations to a shared memory. We use our model to derive two kinds of results: (1) lower bounds on contention for well known basic problems such as agreement and mutual exclusion, and (2) trade-offs betwe...
Diffracting trees
- In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures. ACM
, 1994
"... Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent-data-structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a dis ..."
Abstract
-
Cited by 62 (13 self)
- Add to MetaCart
Shared counters are among the most basic coordination structures in multiprocessor computation, with applications ranging from barrier synchronization to concurrent-data-structure design. This article introduces diffracting trees, novel data structures for shared counting and load balancing in a distributed/parallel environment. Empirical evidence, collected on a simulated distributed shared-memory machine and several simulated message-passing architectures, shows that diffracting trees scale better and are more robust than both combining trees and counting networks, currently the most effective known methods for implementing concurrent counters in software. The use of a randomized coordination method together with a combinatorial data structure overcomes the resiliency drawbacks of combining trees. Our simulations show that to handle the same load, diffracting trees and counting networks should have a similar width w, yet the depth of a diffracting tree is O(log w), whereas counting networks have depth O(log 2 w). Diffracting trees have already been used to implement highly efficient producer/consumer queues, and we believe diffraction will prove to be an effective alternative paradigm to combining and queue-locking in the design of many concurrent data structures.
Counting Networks and Multi-Processor Coordination (Extended Abstract)
- In Proceedings of the 23rd Annual Symposium on Theory of Computing
, 1991
"... ) James Aspnes Maurice Herlihy y Nir Shavit z Digital Equipment Corporation Cambridge Research Lab CRL 90/11 September 18, 1991 Abstract Many fundamental multi-processor coordination problems can be expressed as counting problems: processes must cooperate to assign successive values from a g ..."
Abstract
-
Cited by 52 (8 self)
- Add to MetaCart
) James Aspnes Maurice Herlihy y Nir Shavit z Digital Equipment Corporation Cambridge Research Lab CRL 90/11 September 18, 1991 Abstract Many fundamental multi-processor coordination problems can be expressed as counting problems: processes must cooperate to assign successive values from a given range, such as addresses in memory or destinations on an interconnection network. Conventional solutions to these problems perform poorly because of synchronization bottlenecks and high memory contention. Motivated by observations on the behavior of sorting networks, we offer a completely new approach to solving such problems. We introduce a new class of networks called counting networks, i.e., networks that can be used to count. We give a counting network construction of depth log 2 n using n log 2 n "gates," avoiding the sequential bottlenecks inherent to former solutions, and having a provably lower contention factor on its gates. Finally, to show that counting networks are not ...
Scalable Concurrent Counting
- ACM Transactions on Computer Systems
, 1995
"... The notion of counting is central to a number of basic multiprocessor coordination problems, such as dynamic load balancing, barrier synchronization, and concurrent data structure design. In this paper, we investigate the scalability of a variety of counting techniques for large-scale multiprocessor ..."
Abstract
-
Cited by 28 (12 self)
- Add to MetaCart
(Show Context)
The notion of counting is central to a number of basic multiprocessor coordination problems, such as dynamic load balancing, barrier synchronization, and concurrent data structure design. In this paper, we investigate the scalability of a variety of counting techniques for large-scale multiprocessors. We compare counting techniques based on: (1) spin locks, (2) message passing, (3) distributed queues, (4) software combining trees, and (5) counting networks. Our comparison is based on a series of simple benchmarks on a simulated 64-processor Alewife machine, a distributed-memory multiprocessor currently under development at MIT. Although locking techniques are known to perform well on small-scale, bus-based multiprocessors, serialization limits performance and contention can degrade performance. Both counting networks and combining trees substantially outperform the other methods by avoiding serialization and alleviating contention, although combining tree throughput is more sensitive t...
Scalable Concurrent Priority Queue Algorithms
- In Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
, 1999
"... This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms -- from the application level to lowest levels of the operating sy ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
This paper addresses the problem of designing bounded range priority queues, that is, queues that support a fixed range of priorities. Bounded range priority queues are fundamental in the design of modern multiprocessor algorithms -- from the application level to lowest levels of the operating system kernel. While most of the available priority queue literature is directed at existing small-scale machines, we chose to evaluate algorithms on a broader concurrency scale using a simulated 256 node shared memory multiprocessor architecture similar to the MIT Alewife. Our empirical evidence suggests that the priority queue algorithms currently available in the literature do not scale. Based on these findings, we present two simple new algorithms, LinearFunnels and FunnelTree, that provide true scalability throughout the concurrency range. 1 Introduction Priority queues are a fundamental class of data structures used in the design of modern multiprocessor algorithms. Their uses r...
The Decidability of Distributed Decision Tasks
- In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
, 1997
"... ) Maurice Herlihy Computer Science Department Brown University, Providence RI 02912 herlihy@cs.brown.edu Sergio Rajsbaum y Instituto de Matem'aticas U.N.A.M., D.F. 04510, M'exico rajsbaum@servidor.unam.mx Abstract A task is a distributed coordination problem in which each proces ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
) Maurice Herlihy Computer Science Department Brown University, Providence RI 02912 herlihy@cs.brown.edu Sergio Rajsbaum y Instituto de Matem'aticas U.N.A.M., D.F. 04510, M'exico rajsbaum@servidor.unam.mx Abstract A task is a distributed coordination problem in which each process starts with a private input value taken from a finite set, communicates with the other processes by applying operations to shared objects, and eventually halts with a private output value, also taken from a finite set. A protocol is a distributed program that solves a task. A protocol is t-resilient if it tolerates failures by t or fewer processes. A task is solvable in a given model of computation if it has a t-resilient protocol in that model. A set of tasks is decidable in a given model of computation if there exists an effective procedure for deciding whether any task in that set has a t-resilient protocol. This paper gives the first necessary and sufficient conditions for task decidability in ...