Results 1–10 of 19
Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated
Abstract

Cited by 31 (4 self)
Building correct and efficient concurrent algorithms is known to be a difficult problem of fundamental importance. To achieve efficiency, designers try to remove unnecessary and costly synchronization. However, not only is this manual trial-and-error process ad hoc, time-consuming and error-prone, but it often leaves designers pondering the question: is it inherently impossible to eliminate certain synchronization, or is it that I was unable to eliminate it on this attempt and I should keep trying? In this paper we respond to this question. We prove that it is impossible to build concurrent implementations of classic and ubiquitous specifications such as sets, queues, stacks, mutual exclusion and read-modify-write operations, that completely eliminate the use of expensive synchronization. We prove that one cannot avoid the use of either: i) read-after-write …
Approximate Shared-Memory Counting Despite a Strong Adversary
Abstract

Cited by 17 (12 self)
A new randomized asynchronous shared-memory data structure is given for implementing an approximate counter that can be incremented up to n times. For any fixed ε, the counter achieves a relative error of δ with high probability, at the cost of O(((1/δ) log n)^O(1/ε)) register operations per increment and O(n^(4/5+ε) ((1/δ) log n)^O(1/ε)) register operations per read. The counter combines randomized sampling for estimating large values with an expander for estimating small values. This is the first sublinear solution to this problem that works despite a strong adversary scheduler that can observe internal states of processes. An application of the improved counter is an improved protocol for solving randomized shared-memory consensus, which reduces the best previously known individual work complexity from O(n log n) to an optimal O(n), resolving one of the last remaining open problems concerning consensus in this model.
The Complexity of Renaming
Abstract

Cited by 15 (10 self)
We study the complexity of renaming, a fundamental problem in distributed computing in which a set of processes need to pick distinct names from a given namespace. We prove an individual lower bound of Ω(k) process steps for deterministic renaming into any namespace of size subexponential in k, where k is the number of participants. This bound is tight: it draws an exponential separation between deterministic and randomized solutions, and implies new tight bounds for deterministic fetch-and-increment registers, queues and stacks. The proof of the bound is interesting in its own right, for it relies on the first reduction from renaming to another fundamental problem in distributed computing: mutual exclusion. We complement our individual bound with a global lower bound of Ω(k log(k/c)) on the total step complexity of renaming into a namespace of size ck, for any c ≥ 1. This applies to randomized algorithms against a strong adversary, and helps derive new global lower bounds for randomized approximate counter and fetch-and-increment implementations, all tight within logarithmic factors.
The Complexity of Obstruction-Free Implementations
, 2009
Abstract

Cited by 10 (5 self)
Obstruction-free implementations of concurrent objects are optimized for the common case where there is no step contention, and were recently advocated as a solution to the costs associated with synchronization without locks. In this paper, we study this claim, which first requires precisely defining the notions of obstruction-freedom and step contention. We consider several classes of obstruction-free implementations, present corresponding generic object implementations, and prove lower bounds on their complexity. Viewed collectively, our results establish that the worst-case operation time complexity of obstruction-free implementations is high, even in the absence of step contention. We also show that lock-based implementations are not subject to some of the time-complexity lower bounds we present.
Polylogarithmic Concurrent Data Structures from Monotone Circuits
, 2010
Abstract

Cited by 9 (7 self)
A method is given for constructing a max register, a linearizable, wait-free concurrent data structure that supports a write operation and a read operation that returns the largest value previously written. For fixed m, an m-valued max register is constructed from one-bit multi-writer multi-reader registers at a cost of at most ⌈lg m⌉ atomic register operations per write or read. An unbounded max register is constructed with cost O(min(log v, n)) to read or write a value v, where n is the number of processes. It is also shown how a max register can be used to transform any monotone circuit into a wait-free concurrent data structure that provides write operations setting the inputs to the circuit and a read operation that returns the value of the circuit on the largest input values previously supplied. The cost of a write is bounded by O(Sd min(⌈lg m⌉, n)), where m is the size of the alphabet for the circuit, S is the number of gates whose value changes as the result of the write, and d is the number of inputs to each gate; the cost of a read is min(⌈lg m⌉, O(n)). While the resulting data structure is not linearizable in general, it satisfies a weaker but natural consistency condition. As an application, we obtain a simple, linearizable, wait-free counter implementation with a cost of O(min(log n log v, n)) to perform an increment and O(min(log v, n)) to perform a read, where v is the current value of the counter. For polynomially-many …
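The counter application described above can be illustrated with a sequential sketch: per-process counts sit at the leaves of a binary tree, every internal node is a max register holding the sum of its children, and an increment propagates fresh sums up to the root. This is only a single-threaded illustration under our own naming (`MaxCell`, `TreeCounter` are illustrative; `MaxCell` is a plain stand-in for a max register, and n is assumed to be a power of two); the wait-free register-level construction is omitted.

```python
class MaxCell:
    """Stand-in for a max register: write keeps the maximum value seen."""
    def __init__(self):
        self.v = 0

    def write(self, x):
        self.v = max(self.v, x)

    def read(self):
        return self.v


class TreeCounter:
    """Counter built from max-register stand-ins (sequential sketch)."""
    def __init__(self, n):
        # Complete binary tree in an array: nodes[1] is the root,
        # nodes[n + i] is the leaf holding process i's private count.
        self.n = n
        self.nodes = [MaxCell() for _ in range(2 * n)]

    def increment(self, i):
        idx = self.n + i
        # Only process i writes its own leaf, so read-then-write is safe here.
        self.nodes[idx].write(self.nodes[idx].read() + 1)
        idx //= 2
        while idx >= 1:
            # Each internal node stores the (monotone) sum of its children.
            left, right = self.nodes[2 * idx], self.nodes[2 * idx + 1]
            self.nodes[idx].write(left.read() + right.read())
            idx //= 2

    def read(self):
        return self.nodes[1].read()
```

Because each node's value only grows, every write along the path is a legitimate max-register write, which is what makes the real concurrent version linearizable.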
Synchronizing without locks is inherently expensive
 In Proceedings of the ACM Symposium on Principles of Distributed Computing
, 2006
Abstract

Cited by 6 (4 self)
It has been politically correct to blame locks for their fragility, especially since researchers identified obstruction-freedom: a progress condition that precludes locking while being weak enough to raise the hope for good performance. This paper attenuates this hope by establishing lower bounds on the complexity of obstruction-free implementations in contention-free executions: those where obstruction-freedom was precisely claimed to be effective. Through our lower bounds, we argue for an inherent cost of concurrent computing without locks. We first prove that obstruction-free implementations of a large class of objects, using only overwriting or trivial primitives in contention-free executions, have Ω(n) space complexity and Ω(log₂ n) (obstruction-free) step complexity. These bounds apply to implementations of many popular objects, including variants of fetch&add, counter, compare&swap, and LL/SC. When arbitrary primitives can be applied in contention-free executions, we show that, in any implementation of binary consensus, or any perturbable object, the number of distinct base objects accessed and memory stalls incurred by some process in a contention-free execution is Ω(√n). All these results hold regardless of the behavior of processes after they become aware of contention. We also prove that, in any obstruction-free implementation of a perturbable object in which processes are not allowed to fail their operations, the number of memory stalls incurred by some process that is unaware of contention is Ω(n).
Max Registers, Counters, and Monotone Circuits
, 2009
Abstract

Cited by 3 (2 self)
A method is given for constructing a max register, a linearizable, wait-free concurrent data structure that supports a write operation and a read operation that returns the largest value previously written. For fixed m, an m-valued max register can be constructed from one-bit multi-writer multi-reader registers at a cost of at most ⌈lg m⌉ atomic register operations per write or read. The construction takes the form of a binary search tree: applying classic techniques for building unbalanced search trees gives an unbounded max register with cost O(min(log v, n)) to read or write a value v, where n is the number of processes. It is also shown how a max register can be used to transform any monotone circuit into a wait-free concurrent data structure that provides write operations setting the inputs to the circuit and a read operation that returns the value of the circuit on the largest input values previously supplied. The cost of a write is bounded by O(Sd min(⌈lg m⌉, n)), where m is the size of the alphabet for the circuit, S is the number of gates whose value changes as the result of the write, and d is the number of inputs to each gate; the cost of a read is min(⌈lg m⌉, O(n)). While the resulting data structure is not linearizable in general, it satisfies a weaker but natural consistency condition.
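The binary-search-tree construction can be sketched sequentially: an m-valued max register splits into a one-bit switch, a left subtree for values below the midpoint, and a right subtree for the rest; writing a large value sets the switch, after which smaller writes are skipped. The class and field names below are ours, and the one-bit registers are plain Python attributes; this shows the recursion and the ⌈lg m⌉-depth traversal, not the actual shared-memory protocol.

```python
class MaxRegister:
    """Sequential sketch of an m-valued max register built as a search tree."""
    def __init__(self, m):
        self.m = m
        if m > 1:
            self.half = (m + 1) // 2         # midpoint of the value range
            self.switch = 0                  # one-bit register: "a value >= half exists"
            self.left = MaxRegister(self.half)       # holds values in [0, half)
            self.right = MaxRegister(m - self.half)  # holds (value - half)

    def write(self, v):
        if self.m <= 1:
            return                           # a 1-valued register only holds 0
        if v < self.half:
            if self.switch == 0:             # skip: a larger value was already written
                self.left.write(v)
        else:
            self.right.write(v - self.half)  # record the offset first,
            self.switch = 1                  # then flip the switch

    def read(self):
        if self.m <= 1:
            return 0
        if self.switch == 1:
            return self.half + self.right.read()
        return self.left.read()
```

Each write or read descends one root-to-leaf path, so its cost is the tree depth, at most ⌈lg m⌉ one-bit register operations, matching the bound quoted in the abstract.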
Bounded-wait combining: Constructing robust and high-throughput shared objects
 In Proceedings of the 20th International Symposium on Distributed Computing (DISC’06)
, 2006
Abstract

Cited by 2 (2 self)
Shared counters are among the most basic coordination structures in distributed computing. Known implementations of shared counters are either blocking, non-linearizable, or have a sequential bottleneck. We present the first counter algorithm that is linearizable and non-blocking, and can provably achieve high throughput in k-synchronous executions – executions in which process speeds vary by at most a constant factor k. The algorithm is based on a novel variation of the software combining paradigm that we call bounded-wait combining. It can thus be used to obtain implementations, possessing the same properties, of any object that supports combinable operations, such as a stack or a queue. Unlike previous combining algorithms, where processes may have to wait for each other indefinitely, in the bounded-wait combining algorithm a process only waits for other processes for a bounded period of time and then ‘takes destiny in its own hands’. In order to reason rigorously about the parallelism attainable by our algorithm, we define a novel metric for measuring the throughput of shared objects, which we believe is interesting in its own right. We use this metric to prove that our algorithm achieves throughput of Ω(N / log N) in k-synchronous executions, where N is the number of processes that can participate in the algorithm. Our algorithm uses two tools that we believe may prove useful for obtaining highly parallel non-blocking implementations of additional objects. The first are “synchronous locks”: locks that are respected by processes only in k-synchronous executions and are disregarded otherwise; the second are “pseudo-transactions”, a weakening of regular transactions that allows higher parallelism.
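The "wait a bounded time, then take destiny in your own hands" idea can be shown with a toy Python counter. This is our own drastic simplification in the spirit of software combining, not the paper's algorithm, and every name and the `bound` parameter are illustrative: a thread publishes its increment, waits a bounded time for some combiner to apply it, and otherwise grabs the lock and combines all pending requests itself.

```python
import threading

class CombiningCounter:
    """Toy bounded-wait combining counter (illustrative simplification)."""
    def __init__(self, nthreads):
        self.value = 0
        self.pending = [0] * nthreads        # one published-request slot per thread
        self.served = [threading.Event() for _ in range(nthreads)]
        self.lock = threading.Lock()         # held only by the current combiner

    def increment(self, tid, bound=0.005):
        self.served[tid].clear()
        self.pending[tid] = 1                # publish the request
        if self.served[tid].wait(bound):     # wait a *bounded* time for a combiner
            return
        with self.lock:                      # then take destiny in our own hands:
            for t, p in enumerate(self.pending):
                if p:                        # apply every pending request in one batch
                    self.value += 1
                    self.pending[t] = 0
                    self.served[t].set()

    def read(self):
        return self.value
```

Every published request is applied exactly once (slots are cleared under the lock before the requester is signalled), so the count stays exact even though most threads never touch the lock when a combiner is active.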
Tight Bounds for Asynchronous Renaming
, 2011
Abstract

Cited by 1 (0 self)
This paper presents the first tight bounds on the complexity of shared-memory renaming, a fundamental problem in distributed computing in which a set of processes need to pick distinct identifiers from a small namespace. We first prove an individual lower bound of Ω(k) process steps for deterministic renaming into any namespace of size subexponential in k, where k is the number of participants. The bound is tight: it draws an exponential separation between deterministic and randomized solutions, and implies new tight bounds for deterministic concurrent fetch-and-increment counters, queues and stacks. The proof is based on a new reduction from renaming to another fundamental problem in distributed computing: mutual exclusion. We complement this individual bound with a global lower bound of Ω(k log(k/c)) on the total step complexity of renaming into a namespace of size ck, for any c ≥ 1. This result applies to randomized algorithms against a strong adversary, and helps derive new global lower bounds for randomized approximate counter implementations that are tight within logarithmic factors. On the algorithmic side, we give a protocol that transforms any sorting network into a strong adaptive renaming algorithm, with expected cost equal to the depth of the sorting network. This gives a tight adaptive renaming algorithm with expected step complexity O(log k), where k is the contention in the current execution. This algorithm is the first to achieve sublinear time, and it is time-optimal as per our randomized lower bound. Finally, we use this renaming protocol to build monotone-consistent counters with logarithmic step complexity and linearizable fetch-and-increment registers with polylogarithmic cost.
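The sorting-network transformation can be sketched as a toy: replace each comparator with a two-process test-and-set, let a process enter on the wire matching its input index, send comparator winners to the top output wire and losers to the bottom, and take the exit wire as the acquired name. The names below are our own, the test-and-set is simulated with a Python lock, and the traversal is shown sequentially rather than under a real adversarial schedule.

```python
import threading

class TestAndSet:
    """One-shot test-and-set bit standing in for a comparator gate."""
    def __init__(self):
        self._set = False
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            won = not self._set
            self._set = True
            return won                # True for the first caller only

# Odd-even transposition sorting network on 4 wires, listed gate by gate.
NETWORK = [(0, 1), (2, 3), (1, 2), (0, 1), (2, 3), (1, 2)]

def rename(start_wire, network, gates):
    """Traverse the network from start_wire; the exit wire is the new name."""
    wire = start_wire
    for gate, (i, j) in zip(gates, network):
        if wire in (i, j):
            # Winner exits on the top wire, loser on the bottom wire.
            wire = min(i, j) if gate.test_and_set() else max(i, j)
    return wire
```

For example, three participants entering on wires 2, 1 and 3 of a fresh network acquire three distinct names, and because the network sorts, the names occupy the smallest wires; the expected step complexity in the paper's randomized construction is the network depth.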