B.H. Lim and A. Agarwal. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. In ACM Transactions on Computer Systems, 11(3):253-294, August 1993.


This paper is cited in the following contexts:
Combining Funnels: A new twist on an old tale... - Shavit, Zemach (1998)   (Correct)

....a highly decentralized manner through convergence of the independent choices made by individual processors. Adaptive algorithms, allowing the data structure to change behavior to accommodate different access frequencies, have been used both in locking (see Karlin et al. [14] and Lim and Agarwal [17]) and for more general fetch and Phi operations [16]. The work of Lim and Agarwal [16] showed the performance benefit of dynamically switching between locking an object and using (static) combining trees, based on whether the overhead of the latter justifies the added potential for parallelism. ....

....by certain parameters which must be chosen and tuned for each application. The number of layers, the width of each layer and the delay at each level can all be optimized based on the expected load on the object and the specifics of the machine being used. As noticed by previous researchers [14, 17, 16] using an adaptive data structure we can provide a solution that dynamically adjusts its parameters based on actual conditions encountered. The number of collisions a processor is involved in at each access to the object can serve as an indicator of the load. Few collisions serve as evidence to ....

B.H. Lim and A. Agarwal. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. In ACM Transactions on Computer Systems, 11(3):253-294, August 1993.


An Analysis of Software Interface Issues for SMT Processors - Redstone (2002)   (1 citation)  (Correct)

....testing if the event has occurred. On both a superscalar and multiprocessors, spinning can waste processor resources due to the opportunity cost of not context switching to another thread. Researchers have measured and developed techniques addressing the costs of spinning on these processors [37, 4, 40, 42, 5, 50, 10, 36, 90]; we do not investigate them here. Spinning can exact a larger performance cost on SMT, because all threads share pipeline resources. We term pipeline resources as all resources shared between contexts that are necessary to execute instructions, that is, all shared resources except the caches ....

....while a processor holds the lock; only one occurs when a processor writes the flag of another processor. In addition to increased bus contention, spinning also wastes resources due to the opportunity cost of not context switching to a thread that can perform useful work. A few papers such as [40, 42, 90, 89] investigate how to best choose when to context switch to another thread and how to schedule threads to reduce spinning cost. These techniques can work synergistically with the techniques to remove spinning evaluated in this chapter. Several studies examine inter thread interactions on ....

LIM, B.-H., AND AGARWAL, A. Waiting algorithms for synchronization in large-scale multiprocessors. ACM Transactions on Computer Systems 11, 3 (August 1993).


Barrier Synchronization on a Loaded SMP using Two-Phase.. - Tsafrir, Feitelson (2002)   (Correct)

....can be found. Specifically, for locks (exponential distribution) spinning for $\ln(e-1) \approx 0.54$ of CS leads to a competitive factor of $e/(e-1)$, and for barriers (uniform distribution) spinning for $\frac{1}{2}(\sqrt{5}-1) \approx 0.62$ of that overhead results in a competitive ratio of $\frac{1}{2}(\sqrt{5}+1) \approx 1.62$ [8]. However, when the number of processes exceeds the number of processors, spinning was concluded not to be useful for barriers, and immediate blocking was preferred. An important factor that is lacking in previous work is taking a global view of the system when it is overloaded. For example, the ....

....and executing on larger systems. The x axis shows the number of threads relative to the machine size, rather than absolute numbers. the common assumption that the occurrence of synchronization events obeys some time invariant canonical probability distribution (e.g. the Poisson arrivals of [8]). The importance of alternate synchronization is evident when considering the effect of granularity on spin success. All the positive results regarding spinning are for fine grain, or sometimes medium grain jobs; we would like coarse grain jobs not to spin. But in general the granularity of a job ....

B.-H. Lim and A. Agarwal. Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. Computer Systems, 11(3):253-294, August 1993.
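
The constants quoted in the excerpt above can be checked numerically for the lock (exponential) case. The following is a minimal Python sketch, not taken from any of the cited papers, and it assumes a simplified model: waiting times are exponentially distributed with rate lam, blocking costs B, a two-phase waiter spins for at most alpha*B and then blocks, and the competitive factor is the worst case over x = lam*B of the expected two-phase cost divided by the expected offline-optimal cost. The function names and the grid search are illustrative assumptions.

```python
import math


def expected_cost_ratio(alpha, x):
    """Expected two-phase cost over expected offline-optimal cost.

    alpha: spin limit as a fraction of the blocking cost B (assumed model).
    x:     lam * B, the blocking cost measured in mean-wait units.
    Both expectations are scaled by lam, which cancels in the ratio.
    """
    # E[min(t, alpha*B)] + B * P(t > alpha*B), multiplied by lam
    two_phase = (1.0 - math.exp(-alpha * x)) + x * math.exp(-alpha * x)
    # E[min(t, B)], multiplied by lam
    optimal = 1.0 - math.exp(-x)
    return two_phase / optimal


def worst_case_ratio(alpha):
    """Largest ratio over a grid of x values from 0.001 to 20 (hypothetical grid)."""
    xs = (i / 1000.0 for i in range(1, 20001))
    return max(expected_cost_ratio(alpha, x) for x in xs)


# Spin fraction quoted in the excerpt for exponentially distributed waits.
ALPHA_STAR = math.log(math.e - 1.0)

if __name__ == "__main__":
    print(round(ALPHA_STAR, 2))                    # spin fraction
    print(round(worst_case_ratio(ALPHA_STAR), 2))  # worst-case ratio
    print(round(worst_case_ratio(1.0), 2))         # spinning for a full B
```

Under these assumptions the worst case for alpha = ln(e-1) occurs at x = 1 and equals e/(e-1), while spinning for the full blocking cost (alpha = 1) approaches a ratio of 2 as x becomes small.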


Barrier Synchronization on a Loaded SMP using Two-Phase.. - Tsafrir, Feitelson (2001)   (Correct)

.... cally, for locks (exponential distribution) spinning for $\ln(e-1) \approx 0.54$ of the context switch overhead leads to a competitive factor of $e/(e-1)$, and for barriers (uniform distribution) spinning for $\frac{1}{2}(\sqrt{5}-1) \approx 0.62$ of that overhead results in a competitive ratio of $\frac{1}{2}(\sqrt{5}+1) \approx 1.62$ [9]. However, when the number of processes exceeds the number of processors, spinning was concluded not to be useful for barriers, and immediate blocking was preferred. Interactions of the synchronization algorithm with the operating system kernel have also been proposed. For example, this can be ....

....are released again into the ready queue. Almost all our findings are related to and can be explained based on this phenomenon. This refutes the common assumption that the occurrence of synchronization events obeys some time invariant canonical probability distribution (e.g. the Poisson arrivals of [9]). The importance of alternate synchronization is evident when considering the effect of granularity on spin success. All the positive results regarding spinning are for fine grain, or maybe medium grain jobs; we would like coarse grain jobs not to spin. But in general the granularity of a job is ....

B-H. Lim and A. Agarwal, "Waiting algorithms for synchronization in large-scale multiprocessors". ACM Trans. Computer Systems 11(3), pp. 253-294, August 1993.


Informing Algorithms for Efficient Scheduling of.. - Antonopoulos..   (Correct)

....conclusion that adaptive competitive spinning algorithms generally outperform static competitive spinning ones. The latter are considered, in their turn, better than actively spinning and immediate blocking algorithms. The evaluation has been carried out on the Firefly Multiprocessor. However, in [7] B. Lim and A. Agarwal reach a different conclusion. Their experiments on the Alewife multiprocessor show that immediately blocking always sustains performance close to the best among all strategies compared. The contradiction of results is attributed to the fact that the context switch on Alewife ....

....higher user level priority. Competitive spinning strategies have not been evaluated. The context switch overhead for the system we have experimented on ranges from 3 to 22 sec, so we expect the performance differentiation among competitive spinning and immediate blocking algorithms to be marginal [7]. It must be noted that informing algorithms preserve, in contradiction with the scheduler conscious algorithms of L. Kontothanassis et al. [6], the main characteristics of their simple counterparts (time complexity, memory and network overhead, FCFS service of requests etc.). Their memory ....

B.-H. Lim and A. Agarwal. Waiting algorithms for synchronization in large-scale multiprocessors. ACM Transactions on Computer Systems, 11(3):253--294, August 1993.


Combining Funnels: A Dynamic Approach To Software Combining - Shavit, Zemach (2000)   (1 citation)  (Correct)

....the central object. The combining layer structure provides the basis for an adaptive combining structure. Adaptive algorithms, allowing the data structure to change behavior to accommodate different access frequencies, have been used both in locking (see Karlin et al. [17] and Lim and Agarwal [20]) and for more general fetch and Phi operations [19]. The work of Lim and Agarwal [19] showed the performance benefit of dynamically switching between locking an object and using (static) combining trees, based on whether the overhead of the latter justifies the added potential for parallelism. ....

....becomes clear that tuning the structure, that is, optimizing its parameters for each application and load, is not a feasible solution. The solution is to use an adaptive strategy for automatically tuning the parameters to the current load on the data structure. As noticed by previous researchers [17, 19, 20], using an adaptive data structure one can provide a solution that dynamically adjusts its parameters based on actual conditions encountered and can be approximately as good as the best existing method for each specific access pattern. Combining funnels allow the user to devise a general adaptive ....

B.H. Lim and A. Agarwal. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. In ACM Transactions on Computer Systems, 11(3):253--294, August 1993.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)  (Correct)

.... perform a context switch [438]. This policy is known to be competitive, meaning that it is guaranteed to limit the overhead to twice the overhead that would be experienced by an optimal off line policy [301]. The competitive factor can be reduced if the distribution of waiting times is known [369]. Some systems go even farther and add interfaces to allow synchronization considerations to affect the scheduling decisions. The main idea is to enhance the service given to threads that have other threads waiting for them. Three mechanisms have been proposed in the literature: handoff ....

B-H. Lim and A. Agarwal, "Waiting algorithms for synchronization in large scale multiprocessors ". ACM Trans. Comput. Syst. 11(3), pp. 253--294, Aug 1993.
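
The factor-of-two guarantee mentioned in the excerpt above can be illustrated with a small sketch. It assumes the usual spin-then-block model, which the excerpt does not spell out: a waiter spins for up to spin_limit time units and then pays a fixed block_cost to context switch, while an offline-optimal policy that knows the wait time in advance pays min(wait_time, block_cost). All names here are hypothetical.

```python
def two_phase_cost(wait_time, block_cost, spin_limit):
    """Cost of spinning up to spin_limit, then blocking (assumed cost model)."""
    if wait_time <= spin_limit:
        return wait_time              # the event arrived while spinning
    return spin_limit + block_cost    # wasted spin plus the context switch


def optimal_cost(wait_time, block_cost):
    """Cost of an offline policy that knows wait_time in advance."""
    return min(wait_time, block_cost)


def competitive_ratio(block_cost, spin_limit, wait_times):
    """Worst observed ratio of two-phase cost to offline-optimal cost."""
    return max(
        two_phase_cost(t, block_cost, spin_limit) / optimal_cost(t, block_cost)
        for t in wait_times
    )


if __name__ == "__main__":
    waits = [k / 10.0 for k in range(1, 101)]    # 0.1 .. 10.0, never zero
    print(competitive_ratio(1.0, 1.0, waits))    # spin for exactly one block_cost
    print(competitive_ratio(1.0, 0.0, waits))    # immediate blocking
    print(competitive_ratio(1.0, 100.0, waits))  # effectively always spinning
```

With spin_limit equal to block_cost the ratio never exceeds 2 on any input, whereas immediate blocking and unbounded spinning can each be made arbitrarily bad by very short or very long waits respectively.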


Scalable Concurrent Priority Queue Algorithms - Shavit, Zemach (1999)   (Correct)

....Though other structures like diffracting trees [32] and counting networks [3] provide efficient implementations of fetch and increment, their operations cannot be readily transformed into the new bounded fetch and increment required for our priority queues. As the research of Lim and Agarwal [22, 23], Della Libera and Shavit [12] and Karlin et al. [19] has shown, the key to delivering good performance over a wide range of concurrency levels, is the ability of a data structure to adapt to the load actually encountered. The adaption techniques of Lim and Agarwal [22] use a centralized form of ....

B.H. Lim and A. Agarwal. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. In ACM Transactions on Computer Systems, 11(3):253--294, August 1993.


Combining Funnels: A new twist on an old tale. . . - Shavit, Zemach (1998)   (Correct)

....a highly decentralized manner through convergence of the independent choices made by individual processors. Adaptive algorithms, allowing the data structure to change behavior to accommodate different access frequencies, have been used both in locking (see Karlin et al. [14] and Lim and Agarwal [17]) and for more general fetch and Phi operations [16]. The work of Lim and Agarwal [16] showed the performance benefit of dynamically switching between locking an object and using (static) combining trees, based on whether the overhead of the latter justifies the added potential for parallelism. ....

.... this implementation works extremely poorly for frequently accessed counters [3, 11, 19]. Other implementations have the opposite behavior, they perform well when accessed frequently but their overhead is prohibitively high for rarely accessed objects [3, 19]. As noticed by previous researchers [14, 16, 17], using an adaptive data structure we can provide a solution that dynamically adjusts its parameters based on actual conditions encountered and can be approximately as good as the best existing method for each specific access pattern. Our adaptive strategy allows us to optimize for only a few ....

B.H. Lim and A. Agarwal. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. In ACM Transactions on Computer Systems, 11(3):253--294, August 1993.


Scalable Concurrent Priority Queue Algorithms - Nir Shavit (1999)   (Correct)

....more efficient implementations of fetch and increment and decrement. However, as far as we can see their operations cannot be readily transformed to provide the bounded fetch and decrement operation which is at the core of our priority queue implementation. As the research of Lim and Agarwal [22, 23], Della Libera and Shavit [12] and Karlin et al. [19] has shown, the key to delivering good performance over a wide range of concurrency levels, is the ability of a data structure to adapt to the load actually encountered. The adaption techniques of Lim and Agarwal [22] use a centralized form of ....

B.H. Lim and A. Agarwal. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. In ACM Transactions on Computer Systems, 11(3):253--294, August 1993.


A Quantitative Architectural Evaluation of.. - Nikolopoulos..   (2 citations)  (Correct)

....preemptions and the inconsistencies between synchronization algorithms and the operating system scheduling strategy. There are four general solution frameworks: 1) Each process makes an independent decision between busy waiting and blocking to relinquish the processor and avoid underutilization [9, 15]. This solution is applicable to locks and barriers; 2) a non blocking synchronization algorithm is used (Section 3.2); 3) the kernel guarantees that a process is not preempted while executing in a critical section [17]; 4) the lock holder polls the status of its peer processes and avoids ....

B. Lim and A. Agarwal, Waiting Algorithms for Synchronization in Large-Scale Multiprocessors, ACM Trans. on Computer Systems, 11(3), pp. 253--294, 1993.


Time/Contention Trade-offs for Multiprocessor Synchronization - Anderson, Yang (1996)   (Correct)

....towards programs that busy wait, they also have implications regarding mutual exclusion mechanisms that are based on blocking. In particular, while blocking can be used to synchronize multiple processes on a single processor, busy waiting is still fundamental for synchronization across processors [13]. Our bounds imply that tradeoffs exist between contention and time complexity and between atomicity and time complexity in any multiprocessor setting, even if blocking is used for synchronization within a processor. For wait free algorithms, Herlihy has characterized synchronization primitives by ....

B.-H. Lim and A. Agarwal, "Waiting Algorithms for Synchronization in Large-Scale Multiprocessors", ACM Transactions on Computer Systems, Vol. 11, No. 3, August, 1993, pp. 253-294.


A Quantitative Architectural Evaluation of.. - Nikolopoulos..   (2 citations)  (Correct)

....preemptions and the inconsistencies between synchronization algorithms and the operating system scheduling strategy. There are four general solution frameworks: 1) Each process makes an independent decision between busy waiting and blocking to relinquish the processor and avoid underutilization [7, 12]. This solution is applicable to locks and barriers; 2) a non blocking synchronization algorithm is used (Section 3.2); 3) the kernel guarantees that a process is not preempted while executing in a critical section [14]; 4) the lock holder polls the status of its peer processes and avoids ....

B. Lim and A. Agarwal, Waiting Algorithms for Synchronization in Large-Scale Multiprocessors, ACM Trans. on Computer Systems, 11(3), pp. 253--294, 1993.


A Cooperative Approach to Two-Phase Waiting - Arpaci-Dusseau, Culler   (Correct)

....analysis is an attractive technique for bounding the worst case performance because it requires no knowledge of the input distribution. Researchers have examined competitive algorithms for waiting algorithms in a number of contexts: processes competing for shared locks in a multiprocessor [17, 20, 25], processes synchronizing with barriers in a space shared environment [22], and processes communicating in a distributed time shared setting [15]. In each of these cases, competitive analysis is not appropriate because the input stream of waiting times is part of the system and not adversarial: ....

B.-H. Lim and A. Agarwal. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. ACM Transactions on Computer Systems, 11(3):253--294, Aug. 1993.


Scheduler-Conscious Synchronization - Kontothanassis, Wisniewski, Scott (1994)   (19 citations)  (Correct)

.... to avoid preempting a process that holds a test and set lock [9, 29] or to recover from this preemption if it occurs [3, 6]. Others have developed heuristics that allow a process to guess whether it would be better to relinquish the processor, rather than spin, while waiting for a lock or barrier [14, 22, 33]. Our work builds on these previous efforts in three specific ways: 1) We demonstrate that interactions between scheduling and synchronization are a much more serious problem for scalable synchronization algorithms than they are for small-scale, centralized algorithms. Moreover, existing proposals ....

....it made the wrong decision, it biases its decision in favor of the other alternative the next time around. Whether this sort of adaptation works better than a simple static choice appears to depend on the relative costs of bookkeeping and context switching; Karlin et al. [14] and Lim and Agarwal [22] reach different conclusions. Interaction with peer processes can provide better information about the peers' status, provided that they respond promptly to enquiries (when running). In section 4.1 we use a handshaking technique in two of our mutual exclusion algorithms. To hand a peer a lock, a ....

[Article contains additional citation context not shown here]

LIM, B.-H. and AGARWAL, A. Waiting Algorithms for Synchronization in Large-Scale Multiprocessors. ACM Transactions on Computer Systems, 11(3):253--294, August 1993.


Characterizing the Performance of Algorithms for Lock-free Objects - Johnson   (Correct)

....synchronization techniques against test and set spin locks, and the Mellor Crummey and Scott lock. Graunke and Thakkar [18] present performance measurements of test and set and ticket based locks. Other authors have examined particular aspects of synchronization performance. Lim and Agarwal [36] examine the performance tradeoffs between spinning and blocking. They present analytical models to derive the best point for a blocked process to switch from spinning to blocking. Glenn, Pryor, Conroy, and Johnson [16] present analytical models which show that a thrashing phenomenon can occur due ....

B.H. Lim and A. Agarwal. Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. on Computer Systems, 11(3):253--294, 1993.


Register Relocation: Flexible Contexts for Multithreading - Waldspurger, Weihl (1993)   (31 citations)  (Correct)

....and the average synchronization fault latency (L) is exponentially distributed. Thus, there is a fixed probability of a synchronization fault on each execution cycle, and wait times for synchronization are exponentially distributed, which is reasonable for producer consumer synchronization [14]. The context switch cost is set to S = 8 cycles, which is 2 cycles more than the cost used in Section 3.2. This allows for simple bookkeeping and test operations (e.g. an add and conditional branch) which can be used to implement a thread unloading policy. The thread unloading policy used in ....

....is 2 cycles more than the cost used in Section 3.2. This allows for simple bookkeeping and test operations (e.g. an add and conditional branch) which can be used to implement a thread unloading policy. The thread unloading policy used in these experiments is a competitive, two phase algorithm [14]. A context is unloaded when the cost of repeated, unsuccessful attempts to continue execution equals the cost of unloading and blocking the context. Note that the cost assessed for loading and unloading a context is based on C, the number of registers required by the context (see Section 2.5) ....

B. H. Lim and A. Agarwal. "Waiting Algorithms for Synchronization in Large-Scale Multiprocessors", MIT VLSI Memo #91-632, July 1991.
