Efficient synchronization under global EDF scheduling on multiprocessors
- In proceedings of the Euromicro conference on Real-Time Systems
, 2006
Cited by 31 (10 self)
We consider coordinating accesses to shared data structures in multiprocessor real-time systems scheduled under preemptive global EDF. To our knowledge, prior work on global EDF has focused only on systems of independent tasks. We take an initial step here towards a generic resource-sharing framework by considering simple shared objects, such as queues, stacks, and linked lists. In many applications, the predominant use of synchronization constructs is for sharing such simple objects. We analyze two synchronization methods for such objects, one based on queue-based spin locks and a second based on lock-free algorithms.
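As context for the abstract above, preemptive global EDF can be sketched as follows: at any instant, the ready jobs with the earliest absolute deadlines occupy the available processors. This is only an illustrative sketch, not code from the paper; the `Job` class, the `global_edf_pick` function, and the tie-break on job name are all hypothetical choices made for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A ready job (hypothetical type for this sketch)."""
    name: str
    deadline: float  # absolute deadline


def global_edf_pick(ready_jobs, num_processors):
    """Return the jobs that should run now under global EDF.

    The num_processors ready jobs with the earliest deadlines are chosen,
    ordered by deadline. Ties are broken by name (an assumption).
    """
    return heapq.nsmallest(
        num_processors, ready_jobs, key=lambda job: (job.deadline, job.name)
    )


ready = [Job("A", 12.0), Job("B", 5.0), Job("C", 9.0), Job("D", 7.0)]
print([job.name for job in global_edf_pick(ready, 2)])  # ['B', 'D']
```

Any job not returned by `global_edf_pick` waits, and is preempted in or out whenever a job arrival or completion changes the set of earliest deadlines.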
Feather-trace: A light-weight event tracing toolkit
- In Proceedings of the Third International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT'07)
, 2007
Cited by 27 (18 self)
We present a light-weight event tracing toolkit for real-time operating systems on the Intel x86 platform. Our approach is wait-free, multiprocessor-safe, and introduces very low overhead. Only a single unconditional jump instruction is required to distinguish between enabled and disabled events. As a case study, we traced the locking behavior of the Linux kernel and several soft real-time multimedia applications. Our results provide strong support for the widespread assumption that short non-nested critical sections are the common case in practice. 1 Introduction When developing operating systems and embedded systems, event tracing facilities are an essential tool. Such facilities allow developers to trace the behavior of the system being developed by collecting performance and state data while the system in question executes for later offline analysis. The ability to better understand observed behaviors and to obtain high-resolution timing information greatly helps to both debug failures and improve performance. Thus, it is not surprising that there has been considerable recent interest in tracing frameworks [5, 7, 11, 19, 20]. Prior work. For general-purpose operating systems, powerful and flexible solutions have been developed and
Allocating memory in a lock-free manner
, 2005
Cited by 8 (2 self)
The potential of multiprocessor systems is often not fully realized by their system services. Certain synchronization methods, such as lock-based ones, may limit the parallelism. It is important to examine the impact of wait-free and lock-free synchronization design on key services for multiprocessor systems, such as the memory allocation service. Efficient, scalable memory allocators for multithreaded applications on multiprocessors are a significant goal of recent research projects. We propose a lock-free memory allocator to enhance the parallelism in the system. Its architecture is inspired by Hoard, a successful concurrent memory allocator, with a modular design that preserves scalability and helps avoid false sharing and heap blowup. As part of our effort to design appropriate lock-free algorithms to construct this system, we propose a new non-blocking data structure called flat-sets, supporting conventional “internal” operations as well as “inter-object” operations for moving items between flat-sets. We implemented the memory allocator on a set of multiprocessor systems (UMA Sun Enterprise 450 and ccNUMA Origin 3800) and studied its behaviour. The results show that the good properties of Hoard with respect to false sharing and heap blowup are preserved, while the scalability properties are enhanced even further with the help of lock-free synchronization.
Wait-free Programming for General Purpose Computations on Graphics Processors
Cited by 5 (1 self)
The fact that graphics processors (GPUs) are today’s most powerful computational hardware for the dollar has motivated researchers to utilize the ubiquitous and powerful GPUs for general-purpose computing. Recent GPUs feature the single-program multiple-data (SPMD) multicore architecture instead of the single-instruction multiple-data (SIMD) architecture. However, unlike CPUs, GPUs devote their transistors mainly to data processing rather than data caching and flow control, and consequently most of the powerful GPUs with many cores do not support any synchronization mechanisms between their cores. This prevents GPUs from being deployed more widely for general-purpose computing. This paper aims at bridging the gap between the lack of synchronization mechanisms in recent GPU architectures and the need for synchronization mechanisms in parallel applications. Based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects, such as wait-free and t-resilient read-modify-write objects, for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need for strong synchronization primitives in hardware and that wait-free programming is possible for GPUs.
Lock-free concurrent data structures
- Programming Multi-core and Many-core Computing Systems, Sabri Pllana et al.
A Study of the Behavior of Synchronization Methods in Commonly Used Languages and Systems
Cited by 1 (1 self)
Synchronization is a central issue in concurrency and plays an important role in the behavior and performance of modern programmes. Programming language and hardware designers are trying to provide synchronization constructs and primitives that can handle concurrency and synchronization issues efficiently. Programmers have to find a way to select the most appropriate constructs and primitives in order to gain the desired behavior and performance under concurrency. Several parameters and factors affect the choice, through complex interactions among (i) the language and the language constructs that it supports, (ii) the system architecture, (iii) possible run-time environments, virtual machine options and memory management support and (iv) applications. We present a systematic study of synchronization strategies, focusing on concurrent data structures. We have chosen concurrent data structures with different numbers of contention spots. We consider both coarse-grain and fine-grain locking strategies, as well as lock-free methods. We have investigated synchronization-aware implementations in C++, C# (.NET and Mono) and Java. Regarding machine architectures, we have studied the behavior of the implementations on both Intel’s Nehalem and AMD’s Bulldozer. The properties that we study are throughput and fairness under different workloads and multiprogramming execution environments. For NUMA architectures, fairness is becoming as important as the typically considered throughput property. To the best of our knowledge, this is the first systematic and comprehensive study of synchronization-aware implementations. This paper takes steps towards capturing a number of guiding principles and concerns for the selection of the programming environment and synchronization methods in connection to the application and the system characteristics.
The Non-blocking Programming Paradigm in Large Scale Scientific Computations
Cited by 1 (0 self)
Non-blocking implementation of shared data objects is a new alternative approach to the problem of designing scalable shared data objects for multiprocessor systems. Non-blocking implementations allow multiple tasks to access a shared object at the same time, but without enforcing mutual exclusion to accomplish this. Since, in non-blocking implementations of shared data objects, one process is not allowed to block another process, non-blocking shared data objects have the following significant advantages over lock-based ones: 1) they avoid lock convoys and contention points (locks); 2) they provide high fault tolerance (processor failures will never corrupt shared data objects) and eliminate deadlock scenarios, where two or more tasks are waiting for locks held by the other; 3) they do not give rise to priority inversion scenarios. As shown in [1, 2], non-blocking synchronisation has better performance in certain applications than blocking synchronisation. In this paper, we try to provide an in-depth understanding of the performance benefits of integrating non-blocking synchronisation in scientific computing applications.
www.hurray.isep.ipp.pt 1
Consider the problem of scheduling a set of tasks on a single processor such that deadlines are met. Assume that tasks may share data and that linearizability, the most common correctness condition for data sharing, must be satisfied. We find that linearizability can severely penalize schedulability. We identify, however, two special cases where linearizability causes no penalty, or only a modest one, on schedulability.
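To make the correctness condition named in the abstract above concrete: a history of operations is linearizable if the operations can be placed in some sequential order that (a) never reorders two operations that did not overlap in time and (b) is a legal run of the sequential object. The brute-force check below is a sketch of that definition for a single read/write register. It is not taken from the paper, all names (`Op`, `is_linearizable`, and the helpers) are hypothetical, and it is only practical for tiny histories because it tries every permutation.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    """One completed operation on a shared register (hypothetical type)."""
    start: float  # invocation time
    end: float    # response time
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned


def respects_real_time(order):
    """False if an operation that finished before another began is placed after it."""
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            if later.end < earlier.start:
                return False
    return True


def is_legal_register_run(order, initial):
    """True if every read returns the most recently written value."""
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history, initial=0):
    """Brute-force search for a valid linearization of a small history."""
    return any(
        respects_real_time(order) and is_legal_register_run(order, initial)
        for order in permutations(history)
    )


# The read overlaps the write, so it may take effect after it.
overlapping = [Op(0, 3, "write", 1), Op(1, 2, "read", 1)]
# The write completed before the read began, yet the read saw the old value.
stale_read = [Op(0, 1, "write", 1), Op(2, 3, "read", 0)]

print(is_linearizable(overlapping))  # True
print(is_linearizable(stale_read))   # False
```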
Efficient & Lock-Free Modified Skip List in Concurrent Environment
, 2015
As software demands continue to grow, the traditional approach of relying on ever-faster processors is coming to an end, forcing major processor manufacturers to turn to multi-threading and multi-core architectures, in what is called the concurrency revolution. At the heart of many concurrent applications lie concurrent data structures. Concurrent data structures coordinate access to shared resources; implementing them is hard. The main goal of this paper is to provide an efficient and practical lock-free implementation of the modified skip list (MSL) data structure that is suitable for both fully concurrent (large multi-processor) systems and pre-emptive (multi-process) systems. Algorithms for concurrent MSL based on mutual exclusion cause blocking, which has several drawbacks and degrades the system's overall performance. Non-blocking algorithms avoid blocking, and are either lock-free or wait-free.
Effective Use of Non-blocking Data Structures in a Deduplication Application
Efficient multicore programming demands fundamental data structures that support a high degree of concurrency. Existing research on non-blocking data structures promises to satisfy such demands by providing progress guarantees that allow a significant increase in parallelism while avoiding the safety hazards of lock-based synchronization. It is well acknowledged that the use of non-blocking containers can bring significant performance benefits to applications where the shared data experience heavy contention. However, the practical implications of integrating these data structures in real-world applications are not well understood. In this paper, we study the effective use of non-blocking data structures in a data deduplication application which performs a large number of concurrent compression operations on a data stream using the pipeline parallel processing model. We present our experience of manually refactoring the application from using conventional lock-based synchronization mechanisms to using a wait-free hash map and a set of lock-free queues to boost the degree of concurrency of the application. Our experimental study explores the performance trade-offs of parallelization mechanisms that rely on a) traditional blocking techniques, b) fine-grained mutual exclusion, and c) lock-free and wait-free synchronization.