Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors (1991)
Venue: ACM Transactions on Computer Systems
Citations: 562 (32 self)
Citations
728 | Memory consistency and event ordering in scalable shared-memory multiprocessors
- Gharachorloo, Lenoski, et al.
- 1990
Citation Context ...e, increasing the performance of busy-wait locks and barriers is not the only possible rationale for implementing synchronization mechanisms in hardware. Recent work on weakly consistent shared memory [1, 13, 28] has suggested the need for synchronization "fences" that provide clean points for memory semantics. Combining networks, likewise, may improve the performance of memory with bursty access patterns (ca...
347 | A methodology for implementing highly concurrent objects
- Herlihy
- 1993
Citation Context ...e instructions have uses other than the construction of busy-wait locks. Fetch_and_store and compare_and_swap, for example, are essential for manipulating pointers to build concurrent data structures [20, 32]. Because of their general utility, fetch_and_Φ instructions are substantially more attractive than special-purpose synchronization primitives. Future designs for shared memory machines should include a...
342 | Solution of a problem in concurrent programming control
- Dijkstra
- 1965
Citation Context ...ded to be subtle, and costly in time and space, requiring both a large number of shared variables and a large number of operations to coordinate concurrent invocations of synchronization primitives [12, 25, 26, 36, 40]. Modern multiprocessors generally include more sophisticated atomic operations, permitting simpler and faster coordination strategies. Particularly common are various fetch_and_Φ operations [22], which...
258 | A new solution of Dijkstra's concurrent programming problem
- Lamport
- 1974
Citation Context ...d and Kanodia [41], a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer. It can also be thought of as an optimization of Lamport's bakery lock [24], which was designed for fault-tolerance rather than performance. Instead of spinning on the release counter, processors using a bakery lock repeatedly examine the tickets of their peers. Though it pr...
256 | Weak ordering - a new definition
- Adve, Hill
- 1990
Citation Context ...is allocated in shared memory // locally accessible to processor vpid processor private sense : Boolean := true processor private vpid : integer // a unique virtual processor index // initially // rounds[i][k].flag = false for all i,k // rounds[i][k].role = // winner if k > 0, i mod 2^k = 0, i + 2^(k-1) < P, and 2^k < P // bye if k > 0, i mod 2^k = 0, and i + 2^(k-1) >= P // loser if k > 0 and i mod...
251 | A √N Algorithm for Mutual Exclusion in Decentralized Systems
- Maekawa
- 1985
Citation Context ...complexity of the resulting solutions is the principal motivation for the development of fetch_and_Φ primitives. Other researchers have considered mutual exclusion in the context of distributed systems [31, 39, 43, 45, 46], but the characteristics of message passing are different enough from shared memory operations that solutions do not transfer from one environment to the other. Our pseudo-code notation is meant to be ...
250 | An Evaluation of Directory Schemes for Cache Coherence
- Agarwal, Simoni, et al.
- 1988
Citation Context ...algorithms includes the BBN Butterfly [8], the IBM RP3 [37], Cedar [50], the BBN Monarch [42], the NYU Ultracomputer [17], and proposed large-scale multiprocessors with directory-based cache coherence [3]. Our tree-based barrier will also perform well on these machines. It induces less network load, and requires total space proportional to P, rather than P log P, but its critical path is longer by a...
246 | The performance of spin lock alternatives for shared-memory multiprocessors
- Anderson
- 1990
Citation Context ...roducing delay on each processor between consecutive probes of the lock. The simplest approach employs a constant delay; more elaborate schemes use some sort of backoff on unsuccessful probes. Anderson [5] reports the best performance with exponential backoff; our experiments confirm this result. Pseudo-code for a test_and_set lock with exponential backoff appears in algorithm 1. Test_and_set suffices when u...
241 | A fast mutual exclusion algorithm
- Lamport
- 1987
Citation Context ...ded to be subtle, and costly in time and space, requiring both a large number of shared variables and a large number of operations to coordinate concurrent invocations of synchronization primitives [12, 25, 26, 36, 40]. Modern multiprocessors generally include more sophisticated atomic operations, permitting simpler and faster coordination strategies. Particularly common are various fetch_and_Φ operations [22], which...
239 | "Hot spot" contention and combining in multistage interconnection networks
- Pfister, Norton
- 1985
Citation Context ... 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that the presence of hot spots can severely degrade performance for all traffic in multistage interconnection networks, not just traffic due to synchronizing processors. As part of a larger st...
184 | An Optimal Algorithm For Mutual Exclusion in Computer Networks
- Ricart, Agrawala
- 1981
Citation Context ...complexity of the resulting solutions is the principal motivation for the development of fetch_and_Φ primitives. Other researchers have considered mutual exclusion in the context of distributed systems [31, 39, 43, 45, 46], but the characteristics of message passing are different enough from shared memory operations that solutions do not transfer from one environment to the other. Our pseudo-code notation is meant to be ...
159 | Efficient Synchronization Primitives for Large-Scale Cache-Coherent Multiprocessors
- Goodman, Vernon, et al.
- 1989
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that th...
140 | A tree-based algorithm for distributed mutual exclusion
- Raymond
- 1989
Citation Context ...complexity of the resulting solutions is the principal motivation for the development of fetch_and_Φ primitives. Other researchers have considered mutual exclusion in the context of distributed systems [31, 39, 43, 45, 46], but the characteristics of message passing are different enough from shared memory operations that solutions do not transfer from one environment to the other. Our pseudo-code notation is meant to be ...
139 | Algorithms for Mutual Exclusion
- Raynal, Beeson
- 1986
Citation Context ...ended to be subtle, and costly in time and space, requiring both a large number of shared variables and a large number of operations to coordinate concurrent invocations of synchronization primitives [12, 25, 26, 36, 40]. Modern multiprocessors generally include more sophisticated atomic operations, permitting simpler and faster coordination strategies. Particularly common are various fetch_and_Φ operations [22], ...
134 | The performance implications of thread management alternatives for shared-memory multiprocessors
- Anderson, Lazowska, et al.
- 1989
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that th...
125 | The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer - Gottlieb, Grishman, et al. - 1983
119 | Weak ordering - a new definition
- Adve, Hill
- 1990
Citation Context ...P; otherwise false // parentpointer = &nodes[floor((i-1)/4)].childnotready[(i-1) mod 4], // or &dummy if i = 0 // childpointers[0] = &nodes[2*i+1].parentsense, or &dummy if 2*i+1 >= P // childpointers[1] = &nodes[2*i+2].parentsense, or &dummy if 2*i+2 >= P // initially childnotready = havechild and parentsense = false procedure tree_barrier with nodes[vpid] do repeat until childnotready = {false, fal...
114 | Distributing hot-spot addressing in large-scale multiprocessors
- Yew, Tzeng, et al.
- 1987
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that th...
111 | Highly parallel computing - Almasi, Gottlieb - 1989
106 | The Wisconsin Multicube: A new large-scale cache-coherent multiprocessor - Goodman, Woest - 1988
106 | Synchronization algorithms for shared-memory multiprocessors
- Graunke, Thakkar
- 1990
Citation Context ...koff, it is not possible to obtain a lock with an expected constant number of network transactions, due to the unpredictability of the length of critical sections. Anderson [5] and Graunke and Thakkar [18] have proposed locking algorithms that achieve the constant bound on cache-coherent multiprocessors that support atomic fetch_and_increment or fetch_and_store, respectively. The trick is for each proce...
101 | Synchronization with Eventcounts and Sequencers
- Reed, Kanodia
- 1979
Citation Context ...equest counter and waiting until the result (its ticket) is equal to the value of the release counter. It releases the lock by incrementing the release counter. In the terminology of Reed and Kanodia [41], a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer. It can also be thought of as an optimization of Lamport's bakery lock [24], which was de...
93 | Efficient synchronization on multiprocessors with shared memory
- Kruskal, Rudolph, et al.
- 1988
Citation Context ...5, 26, 36, 40]. Modern multiprocessors generally include more sophisticated atomic operations, permitting simpler and faster coordination strategies. Particularly common are various fetch_and_Φ operations [22], which atomically read, modify, and write a memory location. Fetch_and_Φ operations include test_and_set, fetch_and_store (swap), fetch_and_add, and compare_and_swap. More recently, there have been...
85 | Dynamic Decentralized Cache Schemes for MIMD Parallel Processors
- Rudolph, Segall
- 1984
Citation Context ...rhead, the test_and_set lock can be modified to use a test_and_set instruction only when a previous read indicates that the test_and_set might succeed. This so-called test-and-test_and_set technique [44] ensures that waiting processors poll with read requests during the time that a lock is held. Once the lock becomes available, some fraction of the waiting processors detect that the lock is free and ...
83 | Two algorithms for barrier synchronization
- Hensgen, Finkel, et al.
- 1988
Citation Context ...e shared state variables, and simultaneously eliminate one of the two spinning episodes, by "reversing the sense" of the variables (and leaving them with different values) between consecutive barriers [19]. The resulting code is shown in algorithm 7. Arriving processors decrement count and then wait until sense has a different value than it did in the previous barrier. The last arriving processor rese...
63 | Adaptive backoff synchronization techniques
- Agarwal, Cherian
- 1989
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that...
62 | An Economical Solution to the Cache Coherence Problem
- Archibald, Baer
- 1985
Citation Context ...p flag should be the fastest algorithm on large-scale multiprocessors that use broadcast to maintain cache coherence (either in snoopy cache protocols [15] or in directory-based protocols with broadcast [7]). It requires only O(P) updates to shared variables in order to tally arrivals, compared to O(P log P) for the dissemination barrier. Its updates are simple writes, which are cheaper than the read-...
59 | Broadcasts: A paradigm for distributed programs
- Schneider
- 1980
Citation Context ...complexity of the resulting solutions is the principal motivation for the development of fetch_and_Φ primitives. Other researchers have considered mutual exclusion in the context of distributed systems [31, 39, 43, 45, 46], but the characteristics of message passing are different enough from shared memory operations that solutions do not transfer from one environment to the other. Our pseudo-code notation is meant to be ...
58 | The Effect of Scheduling Discipline on Spin Overhead in Shared Memory Parallel Systems
- Zahorjan, Lazowska, et al.
- 1991
Citation Context ...lier are not running. In this situation the test_and_set lock may be preferred to the FIFO alternatives. Additional mechanisms can ensure that a process is not preempted while actually holding a lock [47, 52]. All of the spin lock algorithms we have considered require some sort of fetch_and_Φ instructions. The test_and_set lock of course requires test_and_set. The ticket lock requires fetch_and_increment. ...
57 | Highly Parallel Computing. Benjamin/Cummings - Almasi, Gottlieb - 1994
49 | The information structure of distributed mutual exclusion algorithms
- Sanders
- 1987
Citation Context ...complexity of the resulting solutions is the principal motivation for the development of fetch_and_Φ primitives. Other researchers have considered mutual exclusion in the context of distributed systems [31, 39, 43, 45, 46], but the characteristics of message passing are different enough from shared memory operations that solutions do not transfer from one environment to the other. Our pseudo-code notation is meant to be ...
45 | Medusa: An Experiment in Distributed Operating System Structure
- Ousterhout, Scelza, et al.
- 1980
Citation Context ...e CPU resources if spinning processes may be preempted. (Busy-wait barriers may also waste cycles in the presence of preemption.) We can avoid this problem by co-scheduling processes that share locks [34]. Alternatively (for mutual exclusion), a test_and_set lock with exponential backoff will allow latecomers to acquire the lock when processes that arrived earlier are not running. In this situation the ...
42 | The butterfly barrier
- Brooks
- 1986
Citation Context ... algorithm uses no atomic instructions other than read and write, and performs the minimum possible number of operations across the processor-memory interconnect. 3.3 The Dissemination Barrier Brooks [9] has proposed a symmetric "butterfly barrier," in which processors participate as equals, performing the same operations at each step. Each processor in a butterfly barrier participates in a sequence of...
42 | The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer
- Gottlieb, Grishman, et al.
- 1983
Citation Context ...and_store (swap), fetch_and_add, and compare_and_swap. More recently, there have been proposals for multistage interconnection networks that combine concurrent accesses to the same memory location [17, 37, 42], multistage networks that have special synchronization variables embedded in each stage of the network [21], and special-purpose cache hardware to maintain a queue of processors waiting for the same ...
41 | The Monarch Parallel Processor Hardware Design
- Rettberg, Crowther, et al.
- 1990
Citation Context ...and_store (swap), fetch_and_add, and compare_and_swap. More recently, there have been proposals for multistage interconnection networks that combine concurrent accesses to the same memory location [17, 37, 42], multistage networks that have special synchronization variables embedded in each stage of the network [21], and special-purpose cache hardware to maintain a queue of processors waiting for the same ...
39 | Multi-model parallel programming in Psyche
- Scott, LeBlanc, et al.
- 1990
Citation Context ...lier are not running. In this situation the test_and_set lock may be preferred to the FIFO alternatives. Additional mechanisms can ensure that a process is not preempted while actually holding a lock [47, 52]. All of the spin lock algorithms we have considered require some sort of fetch_and_Φ instructions. The test_and_set lock of course requires test_and_set. The ticket lock requires fetch_and_increment. ...
34 | An approach to automating the verification of compact parallel coordination programs I
- Lubachevsky
- 1984
Citation Context ...sor resets count and reverses sense. Consecutive barriers cannot interfere with each other because all operations on count occur before sense is toggled to release the waiting processors. Lubachevsky [29] presents a similar barrier algorithm that uses two shared counters and a processor private two-state flag. The private flag selects which counter to use; consecutive barriers use alternate counters. ...
33 | Processor Self Scheduling for Multiple-Nested Parallel Loops
- Tang, Yew
- 1986
Citation Context ...well to large numbers of processors, even using adaptive backoff strategies. Our experiments (see section 4.4) confirm this conclusion. 6 Commenting on Tang and Yew's barrier algorithm (algorithm 3.1 in [48]), Agarwal and Cherian [2] show that on a machine in which contention causes memory accesses to be aborted and retried, the expected number of memory accesses initiated by each processor to achieve a ...
32 | Concurrent Programming: Principles and Practice. The Benjamin/Cummings Publishing Company - Andrews - 1991
29 | The Performance Implications of Spin-Waiting Alternatives for Shared-Memory Multiprocessors - Anderson - 1989
25 | Synchronization with Multiprocessor Caches
- Lee, Ramachandran
- 1990
Citation Context ...tistage networks that have special synchronization variables embedded in each stage of the network [21], and special-purpose cache hardware to maintain a queue of processors waiting for the same lock [14, 28, 35]. The principal purpose of these hardware primitives is to reduce the impact of busy waiting. Before adopting them, it is worth considering the extent to which software techniques can achieve a simila...
25 | The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture
- Pfister, et al.
Citation Context ...and_store (swap), fetch_and_add, and compare_and_swap. More recently, there have been proposals for multistage interconnection networks that combine concurrent accesses to the same memory location [17, 37, 42], multistage networks that have special synchronization variables embedded in each stage of the network [21], and special-purpose cache hardware to maintain a queue of processors waiting for the same ...
24 | The mutual exclusion problem: part I - a theory of interprocess communication
- Lamport
- 1986
Citation Context ...ded to be subtle, and costly in time and space, requiring both a large number of shared variables and a large number of operations to coordinate concurrent invocations of synchronization primitives [12, 25, 26, 36, 40]. Modern multiprocessors generally include more sophisticated atomic operations, permitting simpler and faster coordination strategies. Particularly common are various fetch_and_Φ operations [22], which...
23 | A New Solution to Lamport's Concurrent Programming Problem using Small Shared Variables
- Peterson, Fischer
- 1983
Citation Context ...ded to be subtle, and costly in time and space, requiring both a large number of shared variables and a large number of operations to coordinate concurrent invocations of synchronization primitives [12, 25, 26, 36, 40]. Modern multiprocessors generally include more sophisticated atomic operations, permitting simpler and faster coordination strategies. Particularly common are various fetch_and_Φ operations [22], which...
23 | "Hot spot" contention and combining in multistage interconnection networks
- Pfister, Norton
- 1985
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that th...
21 | Synchronization Barrier and Related Tools for Shared Memory Parallel Programming
- Lubachevsky
- 1989
Citation Context ...nodes array would be scattered statically across the memory banks of the machine, or replaced by a scattered set of variables. 3.4 Tournament Barriers Hensgen, Finkel, and Manber [19] and Lubachevsky [30] have also devised tree-style "tournament" barriers. The processors involved in a tournament barrier begin at the leaves of a binary tree, much as they would in a combining tree of fan-in two. One pro...
18 | The onset of hot spot contention
- Kumar, Pfister
- 1986
Citation Context ...t of an idle machine) on the Butterfly caused by 60 processor barriers using local and network polling strategies. ... independent of the network topology. A study by Kumar and Pfister [23] shows the onset of hot-spot contention to be rapid. Pfister and Norton argue for hardware message combining in interconnection networks to reduce the impact of hot spots. They base their argument prim...
18 | Butterfly parallel processor overview
- BBN Laboratories
- 1986
Citation Context ...ut using the interconnection network. On a machine with coherent caches, processors spin only on locations in their caches. On a machine in which shared memory is distributed (e.g., the BBN Butterfly [8], the IBM RP3 [37], or a shared-memory hypercube [10]), processors spin only on locations in the local portion of shared memory. The implication of our work is that efficient synchronization algorithm...
16 | Architecture of the Cedar parallel supercomputer
- Yew
- 1986
Citation Context ...by a constant factor), and is therefore faster. The class of machines for which the dissemination barrier should outperform all other algorithms includes the BBN Butterfly [8], the IBM RP3 [37], Cedar [50], the BBN Monarch [42], the NYU Ultracomputer [17], and proposed large-scale multiprocessors with directory-based cache coherence [3]. Our tree-based barrier will also perform well on these machines. ...
16 | Efficient Synchronization Primitives for Large-Scale Cache-Coherent Multiprocessors
- Goodman, Vernon, et al.
- 1989
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that th...
10 | Characterizing the synchronization behavior of parallel programs, in: PPEALS 88
- Davis, Hennessy
- 1988
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that th...
9 | Distributed synchronizers
- Jayasimha
- 1988
Citation Context ...terconnection networks that combine concurrent accesses to the same memory location [17, 37, 42], multistage networks that have special synchronization variables embedded in each stage of the network [21], and special-purpose cache hardware to maintain a queue of processors waiting for the same lock [14, 28, 35]. The principal purpose of these hardware primitives is to reduce the impact of busy waiting...
9 | Concurrent Queues: Practical Fetch-and-Φ Algorithms
- Mellor-Crummey
- 1987
Citation Context ...e instructions have uses other than the construction of busy-wait locks. Fetch_and_store and compare_and_swap, for example, are essential for manipulating pointers to build concurrent data structures [20, 32]. Because of their general utility, fetch_and_Φ instructions are substantially more attractive than special-purpose synchronization primitives. Future designs for shared memory machines should incl...
8 | Adaptive Backoff Synchronization Techniques
- Agarwal, Cherian
- 1989
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that th...
6 | The Shared Memory Hypercube
- Brooks
- 1988
Citation Context ...with coherent caches, processors spin only on locations in their caches. On a machine in which shared memory is distributed (e.g., the BBN Butterfly [8], the IBM RP3 [37], or a shared-memory hypercube [10]), processors spin only on locations in the local portion of shared memory. The implication of our work is that efficient synchronization algorithms can be constructed in software for shared-memory mult...
6 | The NYU Ultracomputer - Designing an MIMD Shared-Memory Parallel Computer
- Rudolph, Snir
- 1983
Citation Context ...and_store (swap), fetch_and_add, and compare_and_swap. More recently, there have been proposals for multistage interconnection networks that combine concurrent accesses to the same memory location [17, 37, 42], multistage networks that have special synchronization variables embedded in each stage of the network [21], and special-purpose cache hardware to maintain a queue of processors waiting for the same ...
5 | Barrier Synchronization over Multistage Interconnection Networks
- Lee
- 1990
Citation Context ...kel, and Manber's algorithm to use sense reversal to avoid re-initializing flag variables in each round. These same modifications have been discovered independently by Craig Lee of Aerospace Corporation [27]. Hensgen, Finkel, and Manber provide performance figures for the Sequent Balance (a bus-based, cache-coherent multiprocessor), comparing their tournament algorithm against the dissemination barrier, as...
4 | Weak Ordering - A New Definition
- Adve, Hill
- 1990
Citation Context ...P; otherwise false // parentpointer = &nodes[floor((i-1)/4)].childnotready[(i-1) mod 4], // or &dummy if i = 0 // childpointers[0] = &nodes[2*i+1].parentsense, or &dummy if 2*i+1 >= P // childpointers[1] = &nodes[2*i+2].parentsense, or &dummy if 2*i+2 >= P // initially childnotready = havechild and parentsense = false procedure tree_barrier with nodes[vpid] do repeat until childnotready = {false, fal...
4 | The IBM Research Parallel Processor Prototype (RP3)
- Pfister, et al.
- 1985
Citation Context ...and_store (swap), fetch_and_add, and compare_and_swap. More recently, there have been proposals for multistage interconnection networks that combine concurrent accesses to the same memory location [17, 37, 42], multistage networks that have special synchronization variables embedded in each stage of the network [21], and special-purpose cache hardware to maintain a queue of processors waiting for the same ...
4 | Algorithms for distributing hot-spot addressing
- Tang, Yew
- 1987
Citation Context ...e bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that th...
4 | Concurrent queues: Practical fetch-and-Φ algorithms - Mellor-Crummey - 1987
3 | Butterfly parallel processor overview
- BBN Laboratories
- 1986
Citation Context ...out using the interconnection network. On a machine with coherent caches, processors spin only on locations in their caches. On a machine in which shared memory is distributed (e.g., the BBN Butterfly [8], the IBM RP3 [37], or a shared-memory hypercube [10]), processors spin only on locations in the local portion of shared memory. The implication of our work is that efficient synchronization algorithms ...
3 | The butterfly barrier
- Brooks
- 1986
Citation Context ... algorithm uses no atomic instructions other than read and write, and performs the minimum possible number of operations across the processor-memory interconnect. 3.3. The Dissemination Barrier Brooks [9] has proposed a symmetric "butterfly barrier," in which processors participate as equals, performing the same operations at each step. Each processor in a butterfly barrier participates in a sequence ...
3 | The IBM research parallel processor prototype (RP3): Introduction and architecture
- Kleinfelder, Melton, et al.
- 1985
Citation Context ...and_store (swap), fetch_and_add, and compare_and_swap. More recently, there have been proposals for multistage interconnection networks that combine concurrent accesses to the same memory location [17, 37, 42], multistage networks that have special synchronization variables embedded in each stage of the network [21], and special-purpose cache hardware to maintain a queue of processors waiting for the same ...
2 |
Scalability, Combining and the NYU Ultracomputer. Ohio State University Parallel Computing Workshop
- Gottlieb
- 1990
(Show Context)
Citation Context ...e contention due to synchronization, is invalid. Pfister and Norton estimate that message combining will increase the size and possibly the cost of an interconnection network 6- to 32-fold. Gottlieb =-=[16]-=- indicates that combining networks are difficult to bit-slice. 5 Summary of Recommendations We have presented a detailed comparison of new and existing algorithms for busy-wait synchronization on sha... |
2 |
Algorithms for Mutual Exclusion. MIT Press Series in Scientific Computation
- Raynal
- 1986
(Show Context)
Citation Context ...ded to be subtle, and costly in time and space, requiring both a large number of shared variables and a large number of operations to coordinate concurrent invocations of synchronization primitives =-=[12, 25, 26, 36, 40]-=-. Modern multiprocessors generally include more sophisticated atomic operations, permitting simpler and faster coordination strategies. Particularly common are various fetch_and_Φ operations [22], which... |
2 | Scalability, Combining and the NYU Ultracomputer - Gottlieb - 1990 |
2 | An approach to automating the verification of compact parallel coordination programs - Lubachevsky - 1984 |
2 |
Synchronization with EventCounts and Sequencers, Comm. ACM, Vol
- Reed, Kanodia
- 1979
(Show Context)
Citation Context ...equest counter and waiting until the result (its ticket) is equal to the value of the release counter. It releases the lock by incrementing the release counter. In the terminology of Reed and Kanodia =-=[41]-=-, a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer. It can also be thought of as an optimization of Lamport’s bakery lock [24], which was ... |
2 | A New Solution of Dijkstra's Concurrent Programming Problem - Lamport - 1974 |
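The ticket lock described in the excerpt above can be sketched in C11 roughly as follows. This is a minimal illustration, not the paper's code: the names `ticket_lock`, `next_ticket`, and `now_serving` are hypothetical stand-ins for the request and release counters, `atomic_fetch_add_explicit` plays the role of fetch_and_increment, and the backoff that a tuned implementation would likely add while spinning is omitted.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names: next_ticket is the "request counter" and
 * now_serving is the "release counter" from the description above. */
typedef struct {
    atomic_uint next_ticket;
    atomic_uint now_serving;
} ticket_lock;

void ticket_lock_init(ticket_lock *l) {
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

/* Take a ticket with an atomic fetch-and-increment, then spin until the
 * release counter equals that ticket. Returns the ticket for inspection. */
unsigned ticket_lock_acquire(ticket_lock *l) {
    unsigned my_ticket =
        atomic_fetch_add_explicit(&l->next_ticket, 1, memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire)
           != my_ticket) {
        /* busy-wait; no backoff in this sketch */
    }
    return my_ticket;
}

/* Only the holder writes now_serving, so a plain load followed by a
 * release store is enough to hand the lock to the next ticket. */
void ticket_lock_release(ticket_lock *l) {
    unsigned cur = atomic_load_explicit(&l->now_serving, memory_order_relaxed);
    /* Debug check: the lock must currently be held by someone. */
    assert(cur != atomic_load_explicit(&l->next_ticket, memory_order_relaxed));
    atomic_store_explicit(&l->now_serving, cur + 1, memory_order_release);
}
```

Because tickets are handed out in increasing order, waiters are served first-come, first-served. Unsigned wraparound of both counters is harmless as long as fewer than 2^32 threads wait at once.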
1 |
An approach to automating the verification of compact parallel coordination programs
- Lubachevsky
- 1984
(Show Context)
Citation Context ...sor resets count and reverses sense. Consecutive barriers cannot interfere with each other because all operations on count occur before sense is toggled to release the waiting processors. Lubachevsky =-=[29]-=- presents a similar barrier algorithm that uses two shared counters and a processor private two-state flag. The private flag selects which counter to use; consecutive barriers use alternate counters. Anot... |
1 | The IBM research parallel processor prototype (RP3): Introduction and architecture - unknown authors - 1985 |
1 | The performance implications of thread management alternatives for shared-memory multiprocessors - Anderson, Lazowska, et al. - 1989 |
1 | The shared memory hypercube. Parallel Comput. 6 - Brooks - 1988 |
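The sense-reversing centralized barrier that the excerpt above refers to can be sketched in C11 as below. This is an illustrative reconstruction under stated assumptions, not the paper's listing: the type and function names are hypothetical, `atomic_fetch_sub_explicit` stands in for fetch_and_decrement, and each thread is assumed to keep its own `local_sense` flag, initialized to `false` to match the shared `sense`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type: one shared counter plus one shared sense flag. */
typedef struct {
    atomic_int  count;   /* processors that have not yet arrived */
    atomic_bool sense;   /* toggled once per barrier episode */
    int         nprocs;  /* number of participating processors */
} central_barrier;

void central_barrier_init(central_barrier *b, int nprocs) {
    assert(nprocs >= 1);
    atomic_init(&b->count, nprocs);
    atomic_init(&b->sense, false);
    b->nprocs = nprocs;
}

/* local_sense is private to the calling thread and must start as false. */
void central_barrier_wait(central_barrier *b, bool *local_sense) {
    *local_sense = !*local_sense;  /* sense expected for this episode */
    if (atomic_fetch_sub_explicit(&b->count, 1, memory_order_acq_rel) == 1) {
        /* Last arriver: reset count first, then toggle sense, so every
         * operation on count precedes the release of the waiters. */
        atomic_store_explicit(&b->count, b->nprocs, memory_order_relaxed);
        atomic_store_explicit(&b->sense, *local_sense, memory_order_release);
    } else {
        while (atomic_load_explicit(&b->sense, memory_order_acquire)
               != *local_sense) {
            /* busy-wait on the shared sense flag */
        }
    }
}
```

Resetting `count` before the release store to `sense` is what keeps consecutive barrier episodes from interfering: a released thread cannot decrement `count` for the next episode until it has observed the toggled flag.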
1 |
The Wisconsin Multicube: A new large-scale cache coherent multiprocessor
- J
- 1988
(Show Context)
Citation Context ...o be a problem. Our tree-based barrier with wakeup flag should be the fastest algorithm on large-scale multiprocessors that use broadcast to maintain cache coherence (either in snoopy cache protocols =-=[15]-=- or in directory-based protocols with broadcast [7]). It requires only O(P) updates to shared variables in order to tally arrivals compared to O(P log P) for the dissemination barrier. Its updates ar... |
1 | Synchronization algorithms for shared-memory multiprocessors - Graunke, Thakkar - 1990 |
1 |
A methodology for implementing highly concurrent data structures
- Herlihy
- 1990
(Show Context)
Citation Context ...instructions have uses other than the construction of busy-wait locks. Fetch_and_store and compare_and_swap, for example, are essential for manipulating pointers to build concurrent data structures =-=[20, 32]-=-. Because of their general utility, fetch_and_Φ instructions are substantially more attractive than special-purpose synchronization primitives. Future designs for shared memory machines should inclu... |
1 | A fast mutual exclusion algorithm - Lamport - 1987 |
1 |
Barrier synchronization over multistage interconnection networks
- Lee
- 1990
(Show Context)
Citation Context ...l, and Manber’s algorithm to use sense reversal to avoid reinitializing flag variables in each round. These same modifications have been discovered independently by Craig Lee of Aerospace Corporation =-=[27]-=-. Hensgen, Finkel, and Manber provide performance figures for the Sequent Balance (a bus-based, cache-coherent multiprocessor), comparing their tournament algorithm against the dissemination barrier a... |
1 |
A √N algorithm for mutual exclusion in decentralized systems
- Maekawa
- 1985
(Show Context)
Citation Context ...plexity of the resulting solutions is the principal motivation for the development of fetch_and_Φ primitives. Other researchers have considered mutual exclusion in the context of distributed systems =-=[31, 39, 43, 45, 46]-=- but the ... Hardware combining can reduce the time to achieve a barrier from O(log P) to O(1) steps if processors happen to arrive at the barrier simultaneously. ACM Transactions on Computer Systems, Vol ... |
1 | Algorithms for Mutual Exclusion - Raynal - 1986 |
1 |
An optimal algorithm for mutual exclusion in computer networks
- Ricart, Agrawala
- 1981
(Show Context)
Citation Context ...plexity of the resulting solutions is the principal motivation for the development of fetch_and_Φ primitives. Other researchers have considered mutual exclusion in the context of distributed systems =-=[31, 39, 43, 45, 46]-=- but the ... Hardware combining can reduce the time to achieve a barrier from O(log P) to O(1) steps if processors happen to arrive at the barrier simultaneously. ACM Transactions on Computer Systems, Vol ... |
1 |
Synchronization in distributed programs, ACM Trans. Program Lang. Syst
- Schneider
- 1982
(Show Context)
Citation Context ...plexity of the resulting solutions is the principal motivation for the development of fetch_and_Φ primitives. Other researchers have considered mutual exclusion in the context of distributed systems =-=[31, 39, 43, 45, 46]-=- but the ... Hardware combining can reduce the time to achieve a barrier from O(log P) to O(1) steps if processors happen to arrive at the barrier simultaneously. ACM Transactions on Computer Systems, Vol ... |
1 | The shared memory hypercube. Parallel Computing - Brooks - 1988 |
1 | The mutual exclusion problem: Part I--A theory of interprocess communication; Part II--Statement and solutions - Lamport - 1986 |
1 | A tree-based algorithm for distributed mutual exclusion - Raymond - 1989 |
1 | Translated from the French by - Beeson - 1986 |
1 | Architecture of the Cedar parallel supercomputer - Yew - 1986 |