Results 1 - 10 of 31
Early Experience with a Commercial Hardware Transactional Memory Implementation
2009
Cited by 95 (13 self)
We report on our experience with the hardware transactional memory (HTM) feature of two revisions of a prototype multicore processor. Our experience includes a number of promising results using HTM to improve performance in a variety of contexts, and also identifies some ways in which the feature could be improved to make it even better. We give detailed accounts of our experiences, sharing techniques we used to achieve the results we have, as well as describing challenges we faced in doing so. This technical report expands on our ASPLOS paper [9], providing more detail and reporting on additional work conducted since that paper was written.
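To make the programming model concrete, here is a minimal lock-elision sketch of the kind this line of work targets. It uses Intel TSX-style RTM intrinsics purely as a stand-in (an assumption on my part; the prototype processor studied in the paper exposes its own interface), and it requires an RTM-capable CPU and the -mrtm compiler flag:

```cpp
#include <immintrin.h>   // _xbegin/_xend/_xabort (compile with -mrtm)
#include <atomic>

std::atomic<bool> fallback_lock{false};  // software lock used when transactions fail
long counter_a = 0, counter_b = 0;       // two counters that must be updated together

// Update both counters atomically: first attempt a hardware transaction,
// falling back to a simple test-and-set lock if the transaction aborts.
void transfer(long delta) {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        // Reading the lock adds it to the transaction's read set, so a
        // concurrent fallback-path locker will abort this transaction.
        if (fallback_lock.load(std::memory_order_relaxed)) _xabort(0xff);
        counter_a += delta;
        counter_b -= delta;
        _xend();
        return;
    }
    // Fallback path: plain spinlock around the same critical section.
    while (fallback_lock.exchange(true, std::memory_order_acquire)) { /* spin */ }
    counter_a += delta;
    counter_b -= delta;
    fallback_lock.store(false, std::memory_order_release);
}
```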
The repeat offender problem: A mechanism for supporting dynamic-sized, lock-free data structures
In Proceedings of the 16th International Symposium on Distributed Computing, 2002
"... We define the Repeat Offender Problem (ROP). Elsewhere, we have presented the first dynamic-sized lock-free data structures that can free memory to any standard memory allocator—even after thread failures—without requiring special support from the operating system, the memory allocator, or the hardw ..."
Abstract
-
Cited by 61 (12 self)
- Add to MetaCart
(Show Context)
We define the Repeat Offender Problem (ROP). Elsewhere, we have presented the first dynamic-sized lock-free data structures that can free memory to any standard memory allocator—even after thread failures—without requiring special support from the operating system, the memory allocator, or the hardware. These results depend on a solution to ROP. Here we present the first solution to ROP and its correctness proof. Our solution is implementable in most modern shared-memory multiprocessors.
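The client-side discipline that a ROP solution supports is the familiar guard-posting (hazard-pointer-style) pattern. The sketch below is illustrative only, with simplified names and a single static guard rather than the paper's own guard-management operations:

```cpp
#include <atomic>

struct Node { int value; std::atomic<Node*> next; };

// One published guard slot (a real implementation manages a dynamic set of
// guards, one or more per thread).
std::atomic<Node*> guard{nullptr};

// Post a guard on the node we are about to dereference, then re-check that it
// is still reachable from the shared pointer. Once the guard is visibly posted
// and the value is still reachable, a reclaimer must defer freeing it, so the
// caller may safely dereference the returned pointer.
Node* protect_head(std::atomic<Node*>& head) {
    Node* p;
    do {
        p = head.load(std::memory_order_seq_cst);
        guard.store(p, std::memory_order_seq_cst);        // announce the value
    } while (p != head.load(std::memory_order_seq_cst));  // re-validate
    return p;  // caller clears the guard when finished with the node
}
```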
Nonblocking memory management support for dynamic-sized data structures
ACM Transactions on Computer Systems, 2005
"... Conventional dynamic memory management methods interact poorly with lock-free synchronization. In this article, we introduce novel techniques that allow lock-free data structures to allocate and free memory dynamically using any thread-safe memory management library. Our mechanisms are lock-free in ..."
Abstract
-
Cited by 28 (2 self)
- Add to MetaCart
Conventional dynamic memory management methods interact poorly with lock-free synchronization. In this article, we introduce novel techniques that allow lock-free data structures to allocate and free memory dynamically using any thread-safe memory management library. Our mechanisms are lock-free in the sense that they do not allow a thread to be prevented from allocating or freeing memory by the failure or delay of other threads. We demonstrate the utility of these techniques by showing how to modify the lock-free FIFO queue implementation of Michael and Scott to free unneeded memory. We give experimental results that show that the overhead introduced by such modifications is moderate, and is negligible under low contention.
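The hazard the article addresses shows up in any lock-free structure, not just the FIFO queue. The fragment below (a simplified Treiber-style stack pop, chosen here only because it is shorter than the queue code) marks the point where calling a standard free would be unsafe:

```cpp
#include <atomic>

struct Node { int value; Node* next; };
std::atomic<Node*> top{nullptr};

// Simplified lock-free pop. Other threads may be paused anywhere in this
// function while still holding a pointer to the node we are about to remove.
bool pop(int& out) {
    Node* old_top = top.load(std::memory_order_acquire);
    while (old_top != nullptr) {
        Node* next = old_top->next;   // a concurrent pop may already have
                                      // removed and freed old_top: this read
                                      // is the use-after-free window
        if (top.compare_exchange_weak(old_top, next,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire))
            break;                    // on failure, old_top was reloaded; retry
    }
    if (old_top == nullptr) return false;
    out = old_top->value;
    // delete old_top;   // unsafe without a reclamation scheme such as the
                         // one described above: another thread may still be
                         // dereferencing old_top
    return true;
}
```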
DCAS is not a Silver Bullet for Nonblocking Algorithm Design
In SPAA ’04: Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2004
"... Despite years of research, the design of efficient nonblocking algorithms remains difficult. A key reason is that current shared-memory multiprocessor architectures support only single-location synchronisation primitives such as compareand-swap (CAS) and load-linked/store-conditional (LL/SC). Recent ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
Despite years of research, the design of efficient nonblocking algorithms remains difficult. A key reason is that current shared-memory multiprocessor architectures support only single-location synchronisation primitives such as compare-and-swap (CAS) and load-linked/store-conditional (LL/SC). Recently, researchers have investigated the utility of double-compare-and-swap (DCAS)—a generalisation of CAS that supports atomic access to two memory locations—in overcoming these problems. We summarise recent research in this direction and present a detailed case study concerning a previously published nonblocking DCAS-based double-ended queue implementation. Our summary and case study clearly show that DCAS does not provide a silver bullet for nonblocking synchronisation. That is, it does not make the design and verification of even mundane nonblocking data structures with desirable properties easy. Therefore, our position is that while slightly more powerful synchronisation primitives can have a profound effect on ease of algorithm design and verification, DCAS does not provide sufficient additional power over CAS to justify supporting it in hardware.
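For reference, the primitive under discussion is easy to pin down. The sketch below is only a lock-based specification of DCAS semantics (the paper assumes the primitive would be provided in hardware), not a usable lock-free implementation:

```cpp
#include <mutex>

// Specification of double-compare-and-swap (DCAS): atomically compare two
// independent words against expected values and, only if both match, install
// two new values. The global lock here merely pins down the semantics; real
// proposals assume hardware support.
static std::mutex dcas_spec_lock;

template <typename T>
bool dcas(T* addr1, T* addr2,
          T expected1, T expected2,
          T new1, T new2) {
    std::lock_guard<std::mutex> lk(dcas_spec_lock);
    if (*addr1 == expected1 && *addr2 == expected2) {
        *addr1 = new1;
        *addr2 = new2;
        return true;
    }
    return false;
}
```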
DCAS-Based Concurrent Deques
2000
Cited by 22 (6 self)
The computer industry is currently examining the use of strong synchronization operations such as double compare-and-swap (DCAS) as a means of supporting non-blocking synchronization on tomorrow's multiprocessor machines. However, before such a strong primitive is incorporated into hardware designs, its utility needs to be proven by developing a body of effective non-blocking data structures that use DCAS. As part of this effort, we present two new linearizable non-blocking implementations of concurrent deques using the DCAS operation. The first uses an array representation and improves on former algorithms by allowing uninterrupted concurrent access to both ends of the deque while correctly handling the difficult boundary cases when the deque is empty or full. The second uses a linked-list representation and is the first non-blocking unbounded-memory deque implementation. It, too, allows uninterrupted concurrent access to both ends of the deque.
Dynamic-sized lockfree data structures
2002
Cited by 12 (4 self)
We address the problem of integrating lockfree shared data structures with standard dynamic allocation mechanisms (such as malloc and free). We have two main contributions. The first is the design and experimental analysis of two dynamic-sized lockfree FIFO queue implementations, which extend Michael and Scott’s previous implementation by allowing unused memory to be freed. We compare our dynamic-sized implementations to the original on 16-processor and 64-processor multiprocessors. Our experimental results indicate that the performance penalty for making the queue dynamic-sized is modest, and is negligible when contention is not too high. These results were achieved by applying a solution to the Repeat Offender Problem (ROP), which we recently posed and solved. Our second contribution is another application of ROP solutions. Specifically, we show how to use any ROP solution to achieve a general methodology for transforming lockfree data structures that rely on garbage collection into ones that use explicit storage reclamation.
Efficient and Reliable Lock-Free Memory Reclamation Based on . . .
IEEE Transactions on Parallel and Distributed Systems
"... We present an efficient and practical lock-free method for semi-automatic (application-guided) memory reclamation based on reference counting, aimed for use with arbitrary lock-free dynamic data structures. The method guarantees the safety of local as well as global references, supports arbitrary me ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
We present an efficient and practical lock-free method for semi-automatic (application-guided) memory reclamation based on reference counting, intended for use with arbitrary lock-free dynamic data structures. The method guarantees the safety of local as well as global references, supports arbitrary memory reuse, uses atomic primitives that are available in modern computer systems, and provides an upper bound on the amount of memory waiting to be reclaimed. To the best of our knowledge, this is the first lock-free method that provides all of these properties. We provide an analytical and experimental study of the method. The experiments conducted show that the method can also provide significant performance improvements for lock-free algorithms on dynamic data structures that require strong memory management.
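As a rough illustration of the reference-counting half of such a scheme (this is not the paper's algorithm, and it deliberately omits the hard part the paper solves):

```cpp
#include <atomic>

struct Node {
    std::atomic<long> refs{1};        // one reference held by the structure itself
    std::atomic<Node*> next{nullptr};
    int value{0};
};

// Dropping a reference is the straightforward half: the last releaser frees
// the node, so no node is freed while its count shows outstanding references.
void release(Node* n) {
    if (n->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
        delete n;
}

// The hard half, which the paper's method addresses, is acquiring a reference
// safely: between loading a pointer from a shared link and incrementing that
// node's counter, a concurrent release may drop the count to zero and free
// the node, so the increment alone is not enough.
```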
Making lockless synchronization fast: Performance implications of memory reclamation
In 2006 International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
"... Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dyn ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
(Show Context)
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace locks with non-blocking synchronization to protect against deadlock, priority inversion, and convoying. In both cases, dynamic data structures that avoid locking require a memory reclamation scheme that reclaims nodes once they are no longer in use. The performance of existing memory reclamation schemes has not been thoroughly evaluated. We conduct the first fair and comprehensive comparison of three recent schemes—quiescent-state-based reclamation, epoch-based reclamation, and hazard-pointer-based reclamation—using a flexible microbenchmark. Our results show that there is no globally optimal scheme. When evaluating lockless synchronization, programmers and algorithm designers should thus carefully consider the data structure, the workload, and the execution environment, each of which can dramatically affect memory reclamation performance.
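Of the three schemes compared, quiescent-state-based reclamation is perhaps the easiest to sketch. The compressed example below (sequentially consistent atomics throughout, a lock-protected retire list, and a fixed set of registered threads are all simplifications) shows its structure rather than a tuned implementation:

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <functional>
#include <mutex>
#include <vector>

// One record per registered reader thread.
struct ThreadRecord {
    std::atomic<uint64_t> last_quiescent{0};
};

class QsbrDomain {
public:
    explicit QsbrDomain(std::size_t nthreads) : records_(nthreads) {}

    // A reader calls this whenever it holds no references into the shared
    // structure (for example, between operations).
    void quiescent_state(std::size_t tid) {
        records_[tid].last_quiescent.store(global_epoch_.load());
    }

    // Called after a node has been unlinked: defer freeing it until every
    // reader has announced a quiescent state at or after this epoch.
    void retire(std::function<void()> reclaim) {
        uint64_t epoch = global_epoch_.fetch_add(1) + 1;
        std::lock_guard<std::mutex> lk(mutex_);   // simplification: not lock-free
        retired_.push_back({epoch, std::move(reclaim)});
    }

    // Free everything whose retire epoch all registered threads have passed.
    void collect() {
        uint64_t safe = UINT64_MAX;
        for (auto& r : records_)
            safe = std::min<uint64_t>(safe, r.last_quiescent.load());
        std::lock_guard<std::mutex> lk(mutex_);
        std::vector<Retired> still_pending;
        for (auto& item : retired_) {
            if (item.epoch <= safe) item.reclaim();   // no reader can still see it
            else still_pending.push_back(std::move(item));
        }
        retired_.swap(still_pending);
    }

private:
    struct Retired { uint64_t epoch; std::function<void()> reclaim; };
    std::atomic<uint64_t> global_epoch_{0};
    std::vector<ThreadRecord> records_;
    std::mutex mutex_;
    std::vector<Retired> retired_;
};
```

A writer unlinks a node and then calls retire with a deleter for it; readers call quiescent_state between operations; any thread may call collect to free whatever has become safe.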
Lock-free and practical deques using single-word compare-and-swap.
In 8th International Conference on Principles of Distributed Systems, 2004
"... ..."
(Show Context)
Progress Guarantee for Parallel Programs via Bounded Lock-Freedom
In Conference on Programming Language Design and Implementation (PLDI)
"... Parallel platforms are becoming ubiquitous with modern comput-ing systems. Many parallel applications attempt to avoid locks in order to achieve high responsiveness, aid scalability, and avoid deadlocks and livelocks. However, avoiding the use of system locks does not guarantee that no locks are act ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Parallel platforms are becoming ubiquitous in modern computing systems. Many parallel applications attempt to avoid locks in order to achieve high responsiveness, aid scalability, and avoid deadlocks and livelocks. However, avoiding the use of system locks does not guarantee that no locks are actually used, because progress inhibitors may occur in subtle ways through various program structures. Notions of progress guarantee such as lock-freedom, wait-freedom, and obstruction-freedom have been proposed in the literature to provide various levels of progress guarantees. In this paper we formalize the notions of progress guarantees using linear temporal logic (LTL). We concentrate on lock-freedom and propose a variant of it, denoted bounded lock-freedom, which is more suitable for guaranteeing progress in practical systems. We use this formal definition to build a tool that checks whether a concurrent program is bounded lock-free for a given bound. We then study the interaction between programs with progress guarantees and the underlying system (e.g., compilers, runtimes, operating systems, and hardware platforms). We propose a means to argue that an underlying system supports lock-freedom. A composition theorem asserts that bounded lock-free algorithms running on systems that support bounded lock-freedom retain bounded lock-freedom for the composed execution.
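As a rough indication of the style of definition involved (the exact predicates and the bounded variant are the paper's contribution; the formula below is only an illustrative reading of plain lock-freedom, with pending and completes as assumed notation):

```latex
% Illustrative LTL reading of lock-freedom, not the paper's exact definition:
% whenever some operation is pending, some operation (not necessarily the
% same one) eventually completes.
\square \Bigl( \bigl(\exists t.\ \mathit{pending}_t\bigr)
        \rightarrow \lozenge \bigl(\exists t'.\ \mathit{completes}_{t'}\bigr) \Bigr)
```

Bounded lock-freedom, as described in the abstract, strengthens the eventuality to completion within a fixed bound of scheduled steps.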