Results 1 - 10 of 447

Inter-Block GPU Communication via Fast Barrier Synchronization

by Shucai Xiao, Wu-chun Feng
"... While GPGPU stands for general-purpose computation on graphics processing units, the lack of explicit support for inter-block communication on the GPU arguably hampers its broader adoption as a general-purpose computing device. Inter-block communication on the GPU occurs via global memory an ..."
Cited by 39 (2 self)
synchronization: GPU lock-based synchronization and GPU lock-free synchronization. We then evaluate the efficacy of each approach via a micro-benchmark as well as three well-known algorithms: Fast Fourier Transform (FFT), dynamic programming, and bitonic sort. For the micro-benchmark, the experimental results

Transactional Memory: Architectural Support for Lock-Free Data Structures

by Maurice Herlihy, J. Eliot B. Moss
"... A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object. In highly concurrent systems, lock-free data structures avoid common problems asso ..."
Cited by 1031 (27 self)
associated with conventional locking techniques, including priority inversion, convoying, and difficulty of avoiding deadlock. This paper introduces transactional memory, a new multiprocessor architecture intended to make lock-free synchronization as efficient (and easy to use) as conventional techniques

Synchronize between blocks on CUDA GPU

by unknown authors
"... that emphasizes using many relatively slow threads concurrently, rather than executing a single thread very fast. This approach is more suitable for general-purpose parallel computation. CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA that allows de ..."
... that include (1) Simple Synchronization; (2) Tree-Based Synchronization; (3) GPU Lock-Free Synchronization. These three algorithms were tested using simple n-body particle simulations.

Software Transactional Memory

by Nir Shavit, Dan Touitou, 1995
"... As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load Linked/Store Conditional operation on a single word. Building ..."
Cited by 695 (14 self)
Load Linked/Store Conditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to lock-free ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures

Implementing Lock-Free Queues

by John D. Valois - In Proceedings of the Seventh International Conference on Parallel and Distributed Computing Systems, Las Vegas, NV, 1994
"... We study practical techniques for implementing the FIFO queue abstract data type using lock-free data structures, which synchronize the operations of concurrent processes without the use of mutual exclusion. Two new algorithms based on linked lists and arrays are presented. We also propose a new sol ..."
Cited by 67 (1 self)

A Lock-Free Multiprocessor OS Kernel

by Henry Massalin, Calton Pu, 1991
"... Typical shared-memory multiprocessor OS kernels use interlocking, implemented as spinlocks or waiting semaphores. We have implemented a complete multiprocessor OS kernel (including threads, virtual memory, and I/O including a window system and a file system) using only lock-free synchronization meth ..."
Cited by 107 (2 self)

Hogwild!: A lock-free approach to parallelizing stochastic gradient descent

by Feng Niu, Benjamin Recht, Stephen J. Wright, 2011
"... Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work a ..."
Cited by 161 (9 self)

A Methodology for Implementing Highly Concurrent Data Objects

by Maurice Herlihy, 1993
"... A concurrent object is a data structure shared by concurrent processes. Conventional techniques for implementing concurrent objects typically rely on critical sections: ensuring that only one process at a time can operate on the object. Nevertheless, critical sections are poorly suited for asynchro ..."
Cited by 350 (10 self)
... with no explicit synchronization. Each sequential operation is automatically transformed into a lock-free or wait-free operation using novel synchronization and memory management algorithms. These algorithms are presented for a multiple instruction/multiple data (MIMD) architecture in which n processes communicate

Lock-free cuckoo hashing

by Nhan Nguyen, Philippas Tsigas, 2014
"... This paper presents a lock-free cuckoo hashing algorithm; to the best of our knowledge this is the first lock-free cuckoo hashing in the literature. The algorithm allows mutating operations to operate concurrently with query ones and requires only single-word compare-and-swap primitives. Qu ..."
Cited by 2 (0 self)

A performance evaluation of lock-free synchronization protocols

by Anthony LaMarca - In Proceedings of the 13th Annual ACM Symposium on Principles of Distributed Computing (PODC), 1994
"... In this paper, we investigate the practical performance of lock-free techniques that provide synchronization on shared-memory multiprocessors. Our goal is to provide a technique to allow designers of new protocols to quickly determine an algorithm’s performance characteristics. We develop a simple a ..."
Cited by 25 (1 self)