Results 1 - 10
of
49
Software transactional memory for dynamic-sized data structures
- IN PROCEEDINGS OF THE 22ND ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING
, 2003
"... We propose a new form of software transactional memory (STM) designed to support dynamic-sized data structures, and we describe a novel non-blocking implementation. The non-blocking property we consider is obstruction-freedom. Obstruction-freedom is weaker than lock-freedom; as a result, it admits s ..."
Abstract
-
Cited by 274 (21 self)
- Add to MetaCart
We propose a new form of software transactional memory (STM) designed to support dynamic-sized data structures, and we describe a novel non-blocking implementation. The non-blocking property we consider is obstruction-freedom. Obstruction-freedom is weaker than lock-freedom; as a result, it admits substantially simpler and more efficient implementations. A novel feature of our obstruction-free STM implementation is its use of modular contention managers to ensure progress in practice. We illustrate the utility of our dynamic STM with a straightforward implementation of an obstruction-free red-black tree, thereby demonstrating a sophisticated non-blocking dynamic data structure that would be difficult to implement by other means. We also present the results of simple preliminary performance experiments that demonstrate that an "early release " feature of our STM is useful for reducing contention, and that our STM lends itself to the effective use of modular contention managers.
Transactional Lock-Free Execution of Lock-Based Programs
- In Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems
, 2002
"... This paper is motivated by the difficulty in writing correct high-performance programs. Writing shared-memory multithreaded programs imposes a complex trade-off between programming ease and performance, largely due to subtleties in coordinating access to shared data. To ensure correctness programmer ..."
Abstract
-
Cited by 148 (9 self)
- Add to MetaCart
This paper is motivated by the difficulty in writing correct high-performance programs. Writing shared-memory multithreaded programs imposes a complex trade-off between programming ease and performance, largely due to subtleties in coordinating access to shared data. To ensure correctness programmers often rely on conservative locking at the expense of performance. The resulting serialization of threads is a performance bottleneck. Locks also interact poorly with thread scheduling and faults, resulting in poor system performance.
The repeat offender problem: A mechanism for supporting dynamic-sized, lock-free data structures
- In Proceedings of the 16th International Symposium on Distributed Computing
, 2002
"... We define the Repeat Offender Problem (ROP). Elsewhere, we have presented the first dynamic-sized lock-free data structures that can free memory to any standard memory allocator—even after thread failures—without requiring special support from the operating system, the memory allocator, or the hardw ..."
Abstract
-
Cited by 44 (10 self)
- Add to MetaCart
We define the Repeat Offender Problem (ROP). Elsewhere, we have presented the first dynamic-sized lock-free data structures that can free memory to any standard memory allocator—even after thread failures—without requiring special support from the operating system, the memory allocator, or the hardware. These results depend on a solution to the ROP problem. Here we present the first solution to the ROP problem and its correctness proof. Our solution is implementable in most modern shared memory multiprocessors. M/S MTV29-01
The JX Operating System
- In Proceedings of the Usenix Annual Technical Conference
, 2002
"... This paper describes the architecture and performance of the JX operating system. JX is both an operating system completely written in Java and a runtime system for Java applications. ..."
Abstract
-
Cited by 35 (4 self)
- Add to MetaCart
This paper describes the architecture and performance of the JX operating system. JX is both an operating system completely written in Java and a runtime system for Java applications.
A scalable lock-free stack algorithm
- In SPAA’04: Symposium on Parallelism in Algorithms and Architectures
, 2004
"... The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well o ..."
Abstract
-
Cited by 30 (5 self)
- Add to MetaCart
The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is non-blocking, linearizable, and scales well throughout the concurrency range, has thus remained open. This paper presents such a concurrent stack algorithm. It is based on the following simple observation: that a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting eliminationbackoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and non-blocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
Proving correctness of highlyconcurrent linearisable objects
- In PPoPP
, 2006
"... We study a family of implementations for linked lists using finegrain synchronisation. This approach enables greater concurrency, but correctness is a greater challenge than for classical, coarse-grain synchronisation. Our examples are demonstrative of common design patterns such as lock coupling, o ..."
Abstract
-
Cited by 27 (7 self)
- Add to MetaCart
We study a family of implementations for linked lists using finegrain synchronisation. This approach enables greater concurrency, but correctness is a greater challenge than for classical, coarse-grain synchronisation. Our examples are demonstrative of common design patterns such as lock coupling, optimistic, and lazy synchronisation. Although they are are highly concurrent, we prove that they are linearisable, safe, and they correctly implement a highlevel abstraction. Our proofs illustrate the power and applicability of rely-guarantee reasoning, as well of some of its limitations. The examples of the paper establish a benchmark challenge for other reasoning techniques.
A Simple, Fast and Scalable Non-Blocking Concurrent FIFO Queue for Shared Memory Multiprocessor Systems
- in Proceedings of the 13th ACM Symposium on Parallel Algorithms and Architectures
, 2001
"... A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper. The algorithm is very simple, fast and scales very well in both symmetric and non-symmetric multiprocessor shared memory systems. Experiments on a 64-node SUN Enterprise 10000 -- a symmetric mult ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper. The algorithm is very simple, fast and scales very well in both symmetric and non-symmetric multiprocessor shared memory systems. Experiments on a 64-node SUN Enterprise 10000 -- a symmetric multiprocessor system -- and on a 64-node SGI Origin 2000 -- a cache coherent non uniform memory access multiprocessor system -- indicate that our algorithm considerably outperforms the best of the known alternatives in both multiprocessors in any level of multiprogramming. This work introduces two new, simple algorithmic mechanisms. The first lowers the contention to key variables used by the concurrent enqueue and/or dequeue operations which consequently results in the good performance of the algorithm, the second deals with the pointer recycling problem, an inconsistency problem that all non-blocking algorithms based on the compare-and-swap synchronisation primitive have to address. In our construction we selected to use compare-and-swap since compare-and-swap is an atomic primitive that scales well under contention and either is supported by modern multiprocessors or can be implemented efficiently on them.
Fastforward for efficient pipeline parallelism: A cache-optimized concurrent lock-free queue
- In PPoPP ’08: Proceedings of the The 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
, 2008
"... A Cache-Optimized Concurrent Lock-Free Queue Low overhead core-to-core communication is critical for efficient pipeline-parallel software applications. This paper presents Fast-Forward, a cache-optimized single-producer/single-consumer concurrent lock-free queue for pipeline parallelism on multicore ..."
Abstract
-
Cited by 21 (2 self)
- Add to MetaCart
A Cache-Optimized Concurrent Lock-Free Queue Low overhead core-to-core communication is critical for efficient pipeline-parallel software applications. This paper presents Fast-Forward, a cache-optimized single-producer/single-consumer concurrent lock-free queue for pipeline parallelism on multicore architectures, with weak to strongly ordered consistency models. Enqueue and dequeue times on a 2.66 GHz Opteron 2218 based system are as low as 28.5 ns, up to 5x faster than the next best solution. FastForward’s effectiveness is demonstrated for real applications by applying it to line-rate soft network processing on Gigabit Ethernet with general purpose commodity hardware.
Formal verification of a practical lock-free queue algorithm
- In FORTE
, 2004
"... Abstract. We describe a semi-automated verification of a slightly optimised version of Michael and Scott’s lock-free FIFO queue implementation. We verify the algorithm with a simulation proof consisting of two stages: a forward simulation from an automaton modelling the algorithm to an intermediate ..."
Abstract
-
Cited by 20 (3 self)
- Add to MetaCart
Abstract. We describe a semi-automated verification of a slightly optimised version of Michael and Scott’s lock-free FIFO queue implementation. We verify the algorithm with a simulation proof consisting of two stages: a forward simulation from an automaton modelling the algorithm to an intermediate automaton, and a backward simulation from the intermediate automaton to an automaton that models the behaviour of a FIFO queue. These automata are encoded in the input language of the PVS proof system, and the properties needed to show that the algorithm implements the specification are proved using PVS’s theorem prover. 1
Integrating Non-blocking Synchronisation in Parallel Applications: Performance Advantages and Methodologies
- In Proceedings of the 3rd ACM Workshop on Software and Performance (WOSP'02
, 2002
"... In this paper we investigate how performance and speedup of applications would be aoeected by using non-blocking rather than blocking synchronisation. The results obtained show that for many applications, non-blocking synchronisation lead to significant speedups for a fairly large number of processo ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
In this paper we investigate how performance and speedup of applications would be aoeected by using non-blocking rather than blocking synchronisation. The results obtained show that for many applications, non-blocking synchronisation lead to significant speedups for a fairly large number of processors, while it never slows the applications down. As part of this investigation this paper also provides a set of efficient and simple translations that show how typical blocking operations found in parallel applications, such as simple locks, queues and lock trees can be translated into non-blocking equivalents that use hardware primitives common in modern multiprocessor systems. With these translations this paper clearly demonstrates that it is easy for the application designer/programmer to replace the blocking operations commonly found on with nonblocking equivalents ones. For the empirical results a set of representative applications running on a large-scale ccNUMA machine were used.

