by Nir Shavit, Asaph Zemach
In Proceedings of the 17th Annual ACM Symposium on Principles of Distributed Computing (PODC), Santa Barbara
http://www.math.tau.ac.il/~asaph/export/cf.ps.gz
Abstract:
We enhance the well-established software combining synchronization technique to create combining funnels. Previous software combining methods used a statically assigned tree whose depth was logarithmic in the total number of processors in the system. On shared memory multiprocessors, the new method dynamically builds combining trees whose depth is logarithmic in the actual number of processors accessing the data structure concurrently. The structure is composed of a series of combining layers through which processors' requests are funneled. These layers use randomization instead of a rigid tree structure to allow processors to find partners for combining. By using an adaptive scheme, the funnel can change width and depth to accommodate different access frequencies without requiring global agreement as to its size. Rather, processors choose parameters of the protocol privately, making this scheme very simple to implement and tune. When we add an "elimination" mechanism to the funnel structure, the randomly constructed "tree" is transformed into a "forest" of disjoint (and on average shallower) trees of requests, thus enhancing the level of parallelism and decreasing latency. We present two new linearizable combining funnel based data structures: a fetch-and-add object and a stack. We study the performance of these structures by benchmarking them against the most efficient software implementations of fetch-and-add and stacks known to date, combining trees and elimination trees, on a simulated shared memory multiprocessor using Proteus. Our empirical data show that combining funnel based fetch-and-add outperforms combining trees of fixed height by as much as 70%. In fact, even compared to combining trees optimized for a given load, funnel performance is the same or better. Elimination trees, which are not linearizable, are 10% faster than funnels under highest load, but as load drops combining funnels adapt their size, giving them a 34% lead in latency.
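To make the combining idea concrete, the following is a minimal sequential sketch in Python of how randomly paired requests might form trees and share a single update to a fetch-and-add counter. It is an illustration only, not the authors' protocol: the names `Request`, `distribute`, and `funnel_fetch_and_add` are hypothetical, the layer count and width are fixed rather than adaptive, there is no real concurrency, and no elimination step is modeled.

```python
import random


class Request:
    """A fetch-and-add request (hypothetical type, for illustration only)."""

    def __init__(self, pid, amount):
        self.pid = pid
        self.amount = amount   # this processor's own increment
        self.total = amount    # sum over this request and all requests combined into it
        self.children = []     # (child, offset) pairs combined into this request
        self.result = None     # value the fetch-and-add returns to this processor


def distribute(req, base):
    """Hand results back down a combined tree, starting from the root's base value."""
    req.result = base
    for child, offset in req.children:
        distribute(child, base + offset)


def funnel_fetch_and_add(counter, requests, layers, width, rng):
    """Simulate one pass through `layers` combining layers of `width` slots each.

    Assumption: a request picks a random slot; if another request is already
    waiting there, the two combine and the waiting request becomes the parent.
    """
    active = list(requests)
    for _ in range(layers):
        slots = {}
        combined = []
        for req in active:
            slot = rng.randrange(width)
            partner = slots.pop(slot, None)
            if partner is None:
                slots[slot] = req  # wait here for a possible partner
            else:
                # The child is served after everything already in the parent's tree.
                partner.children.append((req, partner.total))
                partner.total += req.total
                combined.append(partner)
        # Parents and uncombined requests move on to the next layer.
        active = combined + list(slots.values())
    # Each surviving root touches the shared counter exactly once.
    for root in active:
        distribute(root, counter)
        counter += root.total
    return counter


if __name__ == "__main__":
    rng = random.Random(0)
    reqs = [Request(pid, 1) for pid in range(16)]
    final = funnel_fetch_and_add(0, reqs, layers=3, width=4, rng=rng)
    print("final counter:", final)
    print("returned values:", sorted(r.result for r in reqs))
```

In this sketch the number of accesses to the central counter equals the number of roots that survive the last layer, which is the contention-reducing effect the abstract describes. The returned values correspond to some sequential ordering of the increments.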