Software Transactional Memory
1995
Cited by 414 (9 self)
As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load Linked/Store Conditional operation on a single word. Building on the hardware-based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a Load Linked/Store Conditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to lock-free ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures shows that our method always outperforms all the lock-free translation methods in ...
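The k-word compare&swap mentioned in the abstract above can be pictured with a small sketch. This shows only the intended semantics (all k words are updated, or none are); it is not the paper's non-blocking implementation. Python has no hardware compare-and-swap, so a lock stands in for atomicity here, and the names `k_word_cas` and `memory` are hypothetical.

```python
import threading

# Assumption: a single lock emulates atomicity. The STM described in the
# abstract is non-blocking and does not work this way internally.
_atomic = threading.Lock()


def k_word_cas(memory, addresses, expected, new_values):
    """Update every address only if all of them hold their expected values."""
    with _atomic:
        if any(memory[a] != e for a, e in zip(addresses, expected)):
            return False  # at least one word differs, so nothing is written
        for a, v in zip(addresses, new_values):
            memory[a] = v
        return True


memory = [10, 20, 30, 40]
ok = k_word_cas(memory, [0, 2], [10, 30], [11, 31])     # both words match
stale = k_word_cas(memory, [1, 3], [20, 99], [21, 41])  # word 3 holds 40, not 99
print(ok, stale, memory)
```

The second call leaves word 1 untouched even though its expected value matched, which is the all-or-nothing behaviour a transaction over k words is meant to provide.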
Lock-Free Linked Lists Using Compare-and-Swap
In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing, 1995
Cited by 84 (1 self)
Lock-free data structures implement concurrent objects without the use of mutual exclusion. This approach can avoid performance problems due to unpredictable delays while processes are within critical sections. Although universal methods are known that give lock-free data structures for any abstract data type, the overhead of these methods makes them inefficient when compared to conventional techniques using mutual exclusion, such as spin locks. We give lock-free data structures and algorithms for implementing a shared singly-linked list, allowing concurrent traversal, insertion, and deletion by any number of processes. We also show how the basic data structure can be used as a building block for other lock-free data structures. Our algorithms use the single-word Compare-and-Swap synchronization primitive to implement the linked list directly, avoiding the overhead of universal methods, and are thus a practical alternative to using spin locks.
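As a rough illustration of how a single-word Compare-and-Swap can drive a list update, the sketch below inserts into a sorted singly-linked list by retrying a CAS on the predecessor's `next` field. It is an assumption-laden simplification, not the algorithm from the paper: it covers insertion only (concurrent deletion, the harder case, is not handled), and because Python has no CAS instruction, `cas_next` emulates one with a per-node lock. All names are hypothetical.

```python
import threading


class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node
        self._guard = threading.Lock()  # emulation only: real CAS is one instruction

    def cas_next(self, expected, new):
        """Set next to new only if it is still expected; report success."""
        with self._guard:
            if self.next is expected:
                self.next = new
                return True
            return False


def insert_sorted(head, value):
    """Insert value after the sentinel head, keeping ascending order."""
    while True:
        prev, cur = head, head.next
        while cur is not None and cur.value < value:
            prev, cur = cur, cur.next
        node = Node(value, cur)
        if prev.cas_next(cur, node):  # fails if another thread changed prev.next
            return node
        # Lost the race: rescan from the head and try again.


def to_list(head):
    out, cur = [], head.next
    while cur is not None:
        out.append(cur.value)
        cur = cur.next
    return out


head = Node(None)  # sentinel node; its value is never read
threads = [threading.Thread(target=insert_sorted, args=(head, v)) for v in (5, 1, 4, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
values = to_list(head)
print(values)
```

A failed CAS means some other insertion landed at the same position first, so the operation simply retries; no thread ever waits while holding the list as a whole.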
Non-blocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors
Journal of Parallel and Distributed Computing, 1998
Cited by 65 (1 self)
Most multiprocessors are multiprogrammed in order to achieve acceptable response time and to increase their utilization. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this problem, researchers have developed two principal strategies for concurrent, atomic update of shared data structures: (1) preemption-safe locking and (2) non-blocking (lock-free) algorithms. Preemption-safe locking requires kernel support. Non-blocking algorithms generally require a universal atomic primitive such as compare-and-swap or load-linked/store-conditional, and are widely regarded as inefficient. We evaluate the performance of preemption-safe lock-based and non-blocking implementations of important data structures (queues, stacks, heaps, and counters), including non-blocking and lock-based queue algorithms of our own, in micro-benchmarks and real applications on a 12-processor SGI Challenge multiprocessor. Our results indicate that our non-blocking queue consistently outperforms the best known alternatives, and that data-structure-specific non-blocking algorithms, which exist for queues, stacks, and counters, can work extremely well. Not only do they outperform preemption-safe lock-based algorithms on multiprogrammed machines, they also outperform ordinary locks on dedicated machines. At the same time, since general-purpose non-blocking techniques do not yet appear to be practical, preemption-safe locks remain the preferred alternative for complex data structures: they outperform ...
Real-Time Computing with Lock-Free Shared Objects
ACM Transactions on Computer Systems, 1995
Cited by 50 (7 self)
This paper considers the use of lock-free shared objects within hard real-time systems. As the name suggests, lock-free shared objects are distinguished by the fact that they are not locked. As such, they do not give rise to priority inversions, a key advantage over conventional, lock-based object-sharing approaches. Despite this advantage, it is not immediately apparent that lock-free shared objects can be employed if tasks must adhere to strict timing constraints. In particular, lock-free object implementations permit concurrent operations to interfere with each other, and repeated interferences can cause a given operation to take an arbitrarily long time to complete. The main contribution of this paper is to show that such interferences can be bounded by judicious scheduling. This work pertains to periodic, hard real-time tasks that share lock-free objects on a uniprocessor. In the first part of the paper, scheduling conditions are derived for such tasks, for both static and dynamic pri...
On the Space Complexity of Randomized Synchronization
Journal of the ACM, 1993
Cited by 36 (8 self)
The "wait-free hierarchy" provides a classification of multiprocessor synchronization primitives based on the values of n for which there are deterministic wait-free implementations of n-process consensus using instances of these objects and read-write registers. In a randomized wait-free setting, this classification is degenerate, since n-process consensus can be solved using only O(n) read-write registers. In this paper, we propose a classification of synchronization primitives based on the space complexity of randomized solutions to n-process consensus. A historyless object, such as a read-write register, a swap register, or a test&set register, is an object whose state depends only on the last nontrivial operation that was applied to it. We show that, using historyless objects, $\Omega(\sqrt{n})$ object instances are necessary to solve n-process consensus. This lower bound holds even if the objects have unbounded size and the termination requirement is non-deterministi...
Universal Operations: Unary Versus Binary
1996
Cited by 26 (2 self)
1 Introduction
2 Related Work
3 Preliminaries
  3.1 The Asynchronous Shared-Memory Model
  3.2 Sensitivity
4 The Left/Right Algorithm
  4.1 The General Scheme
  4.2 The Left/Right Algorithm
    4.2.1 Overview
    4.2.2 The code
    4.2.3 Correctness of the Algorithm
    4.2.4 Analysis of the Algorithm
  4.3 Inherently Asymmetric Data Structures
5 The Decision Algorithm
  5.1 Monotone Paths
    5.1.1 One Phase ...
A performance evaluation of lock-free synchronization protocols
In Proceedings of the 13th Annual ACM Symposium on Principles of Distributed Computing (PODC), 1994
Cited by 22 (1 self)
In this paper, we investigate the practical performance of lock-free techniques that provide synchronization on shared-memory multiprocessors. Our goal is to provide a technique to allow designers of new protocols to quickly determine an algorithm’s performance characteristics. We develop a simple analytical performance model based on the architectural observations that memory accesses are expensive, synchronization instructions are more expensive, and that optimistic synchronization policies result in wasted communication bandwidth which can slow the system as a whole. Using our model, we evaluate the performance of five existing lock-free synchronization protocols. We validate our analysis by comparing our results with simulations of a parallel machine. Given this analysis, we identify those protocols which show promise of good performance in practice. In addition, we note that no existing protocols provide insensitivity to common delays while still offering performance equivalent to locks. Accordingly, we introduce a protocol, based on a combination of existing lock-free techniques, which satisfies these criteria.
The Performance of Work Stealing in Multiprogrammed Environments
In Proceedings of the 1998 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Poster Session, 1997
Cited by 19 (4 self)
We study the performance of user-level thread schedulers in multiprogrammed environments. Our goal is a user-level thread scheduler that delivers efficient performance under multiprogramming without any need for kernel-level resource management, such as coscheduling or process control. We show that a non-blocking implementation of the work-stealing algorithm achieves this goal. With this implementation, the execution time of a computation running with arbitrarily many processes on arbitrarily many processors can be modeled as a simple function of work and critical-path length. This model holds even when the processes run on a set of processors that arbitrarily grows and shrinks over time. We observe linear speedup whenever the number of processes is small relative to the average parallelism.
Relative Performance of Preemption-Safe Locking and Non-Blocking Synchronization on Multiprogrammed Shared Memory Multiprocessors
In Proceedings of the 11th International Parallel Processing Symposium (IPPS), 1997
Cited by 14 (1 self)
Most multiprocessors are multiprogrammed to achieve acceptable response time. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this problem, researchers have developed two principal strategies for concurrent, atomic update of shared data structures: (1) preemption-safe locking and (2) non-blocking (lock-free) algorithms. Preemption-safe locking requires kernel support. Non-blocking algorithms generally require a universal atomic primitive, and are widely regarded as inefficient. We present a comparison of the two alternative strategies, focusing on four simple but important concurrent data structures (stacks, FIFO queues, priority queues and counters) in microbenchmarks and real applications on a 12-processor SGI Challenge multiprocessor. Our results indicate that data-structure-specific non-blocking algorithms, which exist for stacks, FIFO queues and counters, can work extremely well: not only do they ...
A Fully Asynchronous Reader/Writer Mechanism for Multiprocessor Real-Time Systems
1997
Cited by 13 (1 self)
Data sharing among tasks within multiprocessor real-time systems is a crucial issue. This report presents a fully asynchronous mechanism of sharing data between a single writer and multiple readers. The writer and all the readers are allowed to access the shared data asynchronously in a loop-free and wait-free manner because neither locking operations nor repeated actions of read-and-check are involved. Its implementation uses only (n + 2) buffer slots for n readers, and employs an atomic 'Store-IfZero' operation which can be easily simulated with the Compare-and-Swap instruction. Since neither writing nor reading the shared data imposes any effect upon other tasks in the system, this mechanism introduces no impact upon the timing behaviour of tasks. When employed by real-time applications, it helps to reduce blocking and priority inversion problems incurred by the commonly used lock-based synchronization mechanisms.
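The remark that the store-if-zero operation can be simulated with Compare-and-Swap can be made concrete with a short sketch. It assumes the operation means "write the new value only if the word currently holds zero"; the report's exact definition is not given in the abstract. The `Word` class and `store_if_zero` function are hypothetical names, and the CAS itself is emulated with a lock because Python exposes no such instruction.

```python
import threading


class Word:
    """One shared memory word with an emulated compare-and-swap."""

    def __init__(self, value=0):
        self.value = value
        self._guard = threading.Lock()  # emulation only; hardware CAS needs no lock

    def compare_and_swap(self, expected, new):
        """Write new only if the word still holds expected; report success."""
        with self._guard:
            if self.value == expected:
                self.value = new
                return True
            return False


def store_if_zero(word, new):
    """Assumed semantics: store new only if the word currently holds zero."""
    return word.compare_and_swap(0, new)


slot = Word()
first = store_if_zero(slot, 7)   # word was 0, so the store happens
second = store_if_zero(slot, 9)  # word now holds 7, so nothing changes
print(first, second, slot.value)
```

Under this reading the simulation is a single CAS with the expected value fixed at zero, so it needs no retry loop, which is consistent with the loop-free claim in the abstract.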

