| G. Pfister and V. A. Norton. "hot spot" contention and combining in multistage interconnection networks. IEEE Trans. on Computers, 34(10), Oct 1985. |
....processor unblocking its siblings as soon as it awakes. Simulations by Yew, Tzeng, and Lawrie [38] show that a software combining tree can significantly decrease memory contention and prevent tree saturation (a form of network congestion that delays the response of the network to all references [31]) in multistage interconnection networks, by distributing accesses across the memory modules of the machine. A New Tree-Based Barrier Algorithm The principal shortcoming of the combining tree barrier, from our point of view, is that it requires processors to spin on memory locations that cannot ....
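The combining tree barrier discussed in this passage can be sketched in Python. This is a minimal illustration under stated assumptions, not the cited authors' code. A `threading.Lock` per node stands in for an atomic fetch-and-increment, and the fan-in `k` is a free parameter. As a simplification, released threads spin on a single global sense flag instead of unblocking their siblings down the tree. The class names are hypothetical.

```python
import threading
import time


class _Node:
    """One combining-tree node: counts arrivals from its children."""

    def __init__(self, fan_in):
        self.fan_in = fan_in          # arrivals expected per barrier episode
        self.count = 0
        self.parent = None
        self.lock = threading.Lock()  # stands in for atomic fetch-and-increment


class CombiningTreeBarrier:
    def __init__(self, n, k=2):
        # Leaves: each leaf node is shared by at most k threads.
        level = [_Node(min(k, n - i)) for i in range(0, n, k)]
        self.leaf_of = [level[t // k] for t in range(n)]
        # Group up to k nodes under a parent until a single root remains.
        while len(level) > 1:
            parents = []
            for i in range(0, len(level), k):
                group = level[i:i + k]
                parent = _Node(len(group))
                for child in group:
                    child.parent = parent
                parents.append(parent)
            level = parents
        self.global_sense = False
        self.local_sense = [False] * n

    def wait(self, tid):
        sense = not self.local_sense[tid]   # sense reversal between episodes
        self.local_sense[tid] = sense
        node = self.leaf_of[tid]
        while node is not None:
            with node.lock:
                node.count += 1
                last = node.count == node.fan_in
                if last:
                    node.count = 0          # reset for the next episode
            if not last:
                break                       # the last arriver carries it upward
            node = node.parent              # last arriver climbs to the parent
        else:
            # Completed the root: every thread has arrived, release them all.
            self.global_sense = sense
            return
        while self.global_sense != sense:
            time.sleep(0)                   # yield while spinning
```

Only the last arriver at each node climbs to its parent, so no single counter receives more than `k` updates per episode. Spreading accesses over many locations in this way is the purpose of the combining tree.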
....network latency at rest with latency during a barrier. We find that network latency is virtually unaffected when processors are able to spin on shared locations without going through the interconnect. With only remote access to shared memory, latency more than doubles. Studies by Pfister and Norton [31] show that hot-spot contention can lead to tree saturation in multistage interconnection networks with blocking nodes and distributed routing control, inde.... [Figure: Time versus Processors]
G. F. Pfister and V. A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, C-34(10):943-948, Oct. 1985.
....contention, introducing performance bottlenecks that become markedly more pronounced in larger machines and applications. When many processors busy-wait on a single synchronization variable, they create a hot spot that gets a disproportionate share of the network traffic. Pfister and Norton [22] showed that the presence of hot spots can severely degrade performance for all traffic in multistage interconnection networks, not just traffic due to synchronizing processors. Agarwal and Cherian [2] found that references to synchronization variables on cache-coherent multiprocessors cause cache ....
....than non-synchronization references. They also observed that synchronization accounted for as much as 49% of network traffic in simulations of a 64-processor dance-hall machine, in which each access to a shared variable traverses the processor-memory interconnection network. Pfister and Norton [22] argue for message combining in multistage interconnection networks. They base their argument primarily on anticipated hot-spot contention for locks, noting that they know of no quantitative evidence to support or deny the value of combining for general memory traffic. Hardware combining appears ....
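Message combining can be illustrated with fetch-and-add requests. The sketch below is modelled on fetch-and-add combining in general and is not code from the cited papers; `FAARequest`, `Memory`, `combine`, and `decombine` are hypothetical names. A switch that sees `faa(X, a)` and `faa(X, b)` forwards a single `faa(X, a + b)` and remembers `a`. When memory returns the old value `v`, the switch answers `v` to the first requester and `v + a` to the second.

```python
from dataclasses import dataclass


@dataclass
class FAARequest:
    addr: int
    inc: int


class Memory:
    """A memory module that serves one fetch-and-add at a time."""

    def __init__(self):
        self.cells = {}

    def fetch_and_add(self, req):
        old = self.cells.get(req.addr, 0)
        self.cells[req.addr] = old + req.inc
        return old


def combine(r1, r2):
    """Merge two fetch-and-add requests to the same address into one."""
    assert r1.addr == r2.addr, "only requests to the same location combine"
    # The switch keeps r1.inc in a wait buffer to split the reply later.
    return FAARequest(r1.addr, r1.inc + r2.inc), r1.inc


def decombine(reply, saved_inc):
    """Split one reply into two answers, as if r1 executed before r2."""
    return reply, reply + saved_inc


mem = Memory()
mem.cells[7] = 100
merged, saved = combine(FAARequest(7, 3), FAARequest(7, 5))
r1_value, r2_value = decombine(mem.fetch_and_add(merged), saved)
print(r1_value, r2_value, mem.cells[7])  # 100 103 108
```

The memory module sees one request instead of two, and the results match the serial order "r1 then r2". Applied at every stage (and assuming the switches have room to buffer the saved increments), many requests to one hot location can in principle collapse into a single memory access.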
G. F. Pfister and V. A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, C-34(10):943--948, Oct. 1985.
....access to additional buffering, this scheme is able to greatly improve loss performance under low to moderate load conditions. One big disadvantage of Scheme BP is that under high uniform loading and under certain non-uniform traffic scenarios, it can lead to a phenomenon known as tree saturation [18]. Once an SE gets congested, it sends a backpressure signal to the b upstream SEs that feed it, causing them to also fill up. Then the same thing happens to the b^2 SEs that are two stages upstream of the congested SE. If the overload situation persists long enough, a tree of SEs rooted at the ....
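The growth of the saturation tree (b SEs one stage upstream, b^2 two stages upstream, and so on) can be tabulated with a few lines of Python. This is a sketch under explicit assumptions. The network is a unique-path (delta/banyan-style) network of b x b switch elements with N = b^n ports, each stage has N / b SEs, and the congested SE sits in the last stage. The function name is hypothetical.

```python
def saturation_tree(b, n_stages, depth):
    """Number of blocked SEs per stage as backpressure spreads upstream.

    Assumes a unique-path network of b x b SEs with N = b**n_stages ports,
    n_stages stages of N // b SEs each, and the congested SE in the last
    stage. Entry i is the number of blocked SEs i stages upstream.
    """
    depth = min(depth, n_stages - 1)      # cannot spread past the input stage
    return [b ** i for i in range(depth + 1)]


b, n = 2, 6                               # 64 ports, 2x2 switch elements
per_stage = saturation_tree(b, n, depth=n - 1)
print(per_stage)                          # [1, 2, 4, 8, 16, 32]
print(sum(per_stage))                     # 63
print(per_stage[-1] == b ** n // b)       # True: every input-stage SE is in the tree
```

Once the tree reaches the input stage it contains every first-stage SE. This is why, under these assumptions, traffic to unrelated memory modules is also blocked.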
G. F. Pfister and V. A. Norton, "'Hot Spot' Contention and Combining in Multistage Interconnection Networks," IEEE Trans. Comput., vol. 34, pp. 943--948, Oct. 1985.
....is found. Rather than employing static scheduling at compile time, self-scheduling is a means of smoothing out the perturbations. An advantage of self-scheduling is that the operating system may be bypassed, provided the problem of hot-spot access to the iteration counter is solved [41]. In [35], the claims of various statistical models for the performance of self-scheduling of loop iterations have been tested on a BBN Butterfly TC2000. The models tested turned out to be broadly accurate. On traditional NUMA machines it is also possible to partition the available ....
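Self-scheduling with a shared iteration counter can be sketched as follows. All names are hypothetical. A `threading.Lock` simulates an atomic fetch-and-add, `chunk` is the number of iterations a worker claims per access, and the function also counts how often the shared counter is touched, since that counter is the potential hot spot.

```python
import threading


class FetchAndAdd:
    """Shared counter; the lock stands in for a hardware fetch-and-add."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def faa(self, inc):
        with self._lock:
            old = self._value
            self._value += inc
            return old


def self_schedule(n_iters, n_workers, body, chunk=1):
    """Run body(i) for every i in range(n_iters); return counter accesses."""
    counter = FetchAndAdd()
    accesses = FetchAndAdd()          # how often the shared counter is hit

    def worker():
        while True:
            accesses.faa(1)
            start = counter.faa(chunk)    # claim the next chunk of iterations
            if start >= n_iters:
                return                    # loop exhausted
            for i in range(start, min(start + chunk, n_iters)):
                body(i)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return accesses.faa(0)
```

Each worker makes one final, unsuccessful claim, so the counter is accessed ceil(n_iters / chunk) + n_workers times. With 100 iterations and 4 workers, `chunk=1` touches the counter 104 times and `chunk=10` only 14 times. The saving comes at the cost of coarser load balancing.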
G. F. Pfister and V. A. Norton. "hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, 34(10):943--948, 1985.
....is the number of processors attempting to get the lock. Test-and-test-and-set still has the potential to generate a lot of unnecessary traffic. Even worse, the destination of all that traffic may be a single node (the hot spot), leading to congestion that can degrade the performance of the entire system [PN85]. To mitigate the impact of hot spots, Rudolph and Gottlieb first proposed combining requests in the interconnection network [Rud81]: when two requests to the same memory location meet in a switch competing for the same output port, the switch sends a single combined request in place ....
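A test-and-test-and-set lock can be sketched as below. The sketch assumes Python threads and uses a `threading.Lock` purely as the atomic flag. It reproduces the control flow (spin on reads, then attempt the atomic operation) but cannot reproduce the cache and network effects the passage describes. On a real machine, each release still lets all waiters fall through to a test-and-set at once, producing the burst of traffic toward a single node described above.

```python
import threading
import time


class TTASLock:
    """Test-and-test-and-set spin lock (illustrative sketch).

    A threading.Lock serves only as the shared flag: locked() plays the
    plain read ("test") and acquire(blocking=False) plays the atomic
    test-and-set.
    """

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        while True:
            # "Test": spin on reads while the lock looks busy.
            while self._flag.locked():
                time.sleep(0)          # yield so other Python threads run
            # "Test-and-set": may still fail if another waiter won the race.
            if self._flag.acquire(blocking=False):
                return

    def release(self):
        self._flag.release()
```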
Gregory F. Pfister and V. Alan Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, C-34(10):943--948, October 1985.
....will become much more congested than others. However, since oblivious routers are unaware of such hot spots, congested nodes will still be chosen for routing paths even when equal-length paths with no congestion exist as alternatives. Since many software applications create hot spots or other non-uniform traffic [Pfister Norton 85], this is a serious drawback to oblivious routing. The performance of oblivious routing is analyzed in more detail in Chapter 3 and explored through simulations in Chapter 4. 14 Buffering Buffering can be applied to oblivious routing in many ways. Two basic types of buffering are available: ....
G. F. Pfister and V. A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Trans. on Computers, C-34(10), October 1985.
....of fine-grained combining in hardware. In both of these projects, P processors were connected to P memory units through a two-way shuffle-exchange network, through which all read and write requests were passed. In 1985, Pfister et al. discovered an effect which they called tree saturation [PN85], which made the need for combining even more pressing. Suppose that some fraction h, 0 ≤ h ≤ 1, of the memory requests made by each processor are directed to a single memory unit (the hot spot) while the remaining 1 − h are made randomly. As h increases, the buffers in the ....
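The hot-spot model above leads to a simple saturation bound. The assumptions are as follows:

- N processors and N memory modules (the P of the passage).
- Each processor issues r requests per cycle.
- Each module serves at most one request per cycle.
- The non-hot share 1 − h is spread uniformly over all N modules.

Under these assumptions the hot module receives r·h·N + r·(1 − h) = r·(1 + h(N − 1)) requests per cycle. The sustainable rate per processor therefore cannot exceed 1 / (1 + h(N − 1)). The function name is hypothetical.

```python
def hot_spot_bound(n, h):
    """Maximum sustainable request rate per processor with a hot spot.

    Hot-module load = r*h*n (hot share) + r*(1 - h) (its uniform share)
                    = r*(1 + h*(n - 1)); setting it to 1 request/cycle
    gives the bound returned here.
    """
    if not 0 <= h <= 1:
        raise ValueError("h must lie in [0, 1]")
    return 1.0 / (1.0 + h * (n - 1))


for n in (64, 1000):
    r = hot_spot_bound(n, 0.01)
    print(n, round(r, 3), round(n * r, 1))   # processors, per-PE rate, total
```

Under these assumptions a hot spot of only 1% caps a 1000-processor system at about 91 processors' worth of memory bandwidth. This holds regardless of how many memory modules are otherwise idle.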
G. F. Pfister and V. A. Norton. "Hot Spot" Contention and Combining in Multistage Interconnection Networks. In Proc. IEEE Intl. Conf. on Parallel Processing, 1985.
....the performance of such a centralised implementation would not scale with the size of the multicomputer, as the finite throughput of that single processor would at some point become saturated. This paper describes a way of implementing queues on such machines in which hot-spot contention [PN85] does not arise. Section 2 describes the properties which a queue shared by many asynchronous processes must have; Section 3 then describes how [Figure 1: Implementing a Queue in a Single Memory Computer] [Figure 2: Conceptual ....]
G. F. Pfister and V. A. Norton. "Hot Spot" Contention and Combining in Multistage Interconnection Networks. In Proc. IEEE Intl. Conf. on Parallel Processing, 1985.
....become much more congested than others; since oblivious routers do not respond to hot spots, paths will be chosen through congestion even when alternative paths exist. This is a serious drawback to oblivious routing since many software applications create hot spots or other non-uniform traffic [8]. B. Minimal Adaptive Routing Since most networks provide multiple shortest paths between most source-destination pairs, it is intuitive to design a router that allows flexibility in choosing among these paths. Routers which allow some choice among minimal-length paths based on local or temporal ....
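The difference between oblivious and minimal adaptive routing shows up on a 2D mesh. In this sketch, XY dimension-order routing stands in for a generic oblivious router. The adaptive router is assumed to see a hypothetical per-node congestion estimate `load`, such as output-queue occupancy. Neither router is taken from the cited text, and all function names are illustrative.

```python
def xy_next_hop(cur, dst):
    """Dimension-order (XY) routing: oblivious; fix X first, then Y."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return None                                  # arrived


def adaptive_next_hop(cur, dst, load):
    """Minimal adaptive: among productive neighbours, pick the least loaded.

    `load` maps a neighbouring node to a congestion estimate (hypothetical
    local information). Ties go to the X direction.
    """
    (x, y), (dx, dy) = cur, dst
    candidates = []
    if x != dx:
        candidates.append((x + (1 if dx > x else -1), y))
    if y != dy:
        candidates.append((x, y + (1 if dy > y else -1)))
    if not candidates:
        return None
    return min(candidates, key=lambda node: load.get(node, 0))


def route(src, dst, step, **kwargs):
    """Follow `step` from src until it reports arrival; return the path."""
    path = [src]
    while (nxt := step(path[-1], dst, **kwargs)) is not None:
        path.append(nxt)
    return path


hot = {(1, 0): 10}                               # node (1, 0) is congested
print(route((0, 0), (2, 1), xy_next_hop))
print(route((0, 0), (2, 1), adaptive_next_hop, load=hot))
```

Both paths have the same minimal length. The XY router goes through the congested node (1, 0) because it never consults `load`, while the adaptive router steps around it via (0, 1).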
G. F. Pfister and V. A. Norton, "'Hot spot' contention and combining in multistage interconnection networks," IEEE Trans. on Computers, vol. C-34, no. 10, Oct. 1985.
....fetch-and-adds take the same time to complete as a single read or write memory access. In particular, the accesses are not serialized, and the queue does not become a bottleneck. The combining of independent requests is supposed to prevent the shared index variable from becoming a memory hot spot [459, 353]. shared struct { int data; int turn; int in_count, out_count; } q[N]; shared int count, head, tail; insert(data): if (count < N) if (faa(count,1) < N) index := faa(tail,1) mod N; myturn := 2 * faa(q[index].in_count,1); wait until q[index].turn = myturn; q[index].data := data ....
....independently using a shared global queue, so threads belonging to distinct jobs may execute simultaneously. The architecture calls for a shared memory that is accessed via a multistage network. If one of the applications has a memory hot spot, the whole network may suffer from tree saturation [459]. This may block memory accesses from threads in other applications. The severity of this problem and possible solutions are the subject of considerable controversy [14, sect. 10.3.8]. Sharing does not only affect performance; if proper care is not taken, sharing may compromise user data ....
G. F. Pfister and V. A. Norton, "'Hot-spot' contention and combining in multistage interconnection networks." IEEE Trans. Comput. C-34(10), pp. 943--948, Oct 1985.
....independently using a shared global queue, so threads belonging to distinct jobs may execute simultaneously. The architecture calls for a shared memory that is accessed via a multistage network. If one of the applications has a memory hot spot, the whole network may suffer from tree saturation [280]. This may block memory accesses from threads in other applications. The severity of this problem and possible solutions are the subject of considerable controversy [8, sect. 10.3.8]. Sharing does not only affect performance; if proper care is not taken, sharing may compromise user data ....
G. F. Pfister and V. A. Norton, "'Hot-spot' contention and combining in multistage interconnection networks." IEEE Trans. Comput. C-34(10), pp. 943--948, Oct 1985.
....is made between low-contention and high-contention algorithms. On parallel machines with non-combining networks, high-contention read steps or write steps can be quite slow, as each of the requests for a highly contended location is serviced one by one, creating a serial bottleneck or hot spot [55]. Moreover, intermediate nodes on the path to the contended destination become congested as well, so a single hot spot can even delay requests destined for other nodes in the network. If all p processors request the same location, a common occurrence in CRCW PRAM algorithms, a direct ....
G. F. Pfister and V. A. Norton, "Hot spot" contention and combining in multistage interconnection networks, IEEE Trans. Comput., C-34 (1985), pp. 943--948.
....coming out of that wire are consecutively assigned numbers i, i + 8, i + (8 · 2), as in Figure 3. However, under high loads, the balancer toggle bits, especially the one at the root balancer of the tree, will be accessed by many processes concurrently, forming contention hot spots [26] and sequential bottlenecks that are as bad as that of a centralized spin-lock-protected counter implementation. Diffracting trees overcome the problem by having a prism mechanism in front of the toggle bit of every balancer, allowing independent pairs of tokens to be diffracted in separate ....
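The counting tree described here consists of toggle-bit balancers feeding per-leaf counters that hand out i, i + 8, i + 16, and so on. The sketch below simulates it sequentially, assuming width w = 8 as in the passage and no concurrency. The prism (diffraction) layer is not modelled, and the class names are hypothetical. The visit counts show why the root toggle bit becomes the hot spot: every token passes through it.

```python
class Balancer:
    """Toggle bit: sends tokens alternately to output 0 and output 1."""

    def __init__(self):
        self.toggle = 0
        self.visits = 0

    def traverse(self):
        self.visits += 1
        out, self.toggle = self.toggle, 1 - self.toggle
        return out


class CountingTree:
    def __init__(self, width):
        assert width >= 2 and width & (width - 1) == 0, "width must be a power of 2"
        self.width = width
        self.depth = width.bit_length() - 1
        # Heap-ordered balancers: node k has children 2k + 1 and 2k + 2.
        self.balancers = [Balancer() for _ in range(width - 1)]
        self.leaf_count = [0] * width

    def get_and_increment(self):
        node, leaf = 0, 0
        for level in range(self.depth):
            bit = self.balancers[node].traverse()
            leaf |= bit << level          # the root decides the lowest bit
            node = 2 * node + 1 + bit
        # Leaf i hands out i, i + width, i + 2 * width, ...
        value = leaf + self.width * self.leaf_count[leaf]
        self.leaf_count[leaf] += 1
        return value


tree = CountingTree(8)
print([tree.get_and_increment() for _ in range(10)])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(tree.balancers[0].visits)                        # 10: every token hits the root
```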
G. Pfister and V. Norton. "hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, C-34(10):943--948, 1985.
....Clearly, if we have a hot spot, i.e., a memory word for which contention is high, then in any implementation, some operations trying to access this word will be delayed for a long time. One can even argue that in this case, operations will be delayed even when they are supported in hardware [5, 24]. However, such a hot spot should not delay far-away operations. This paper proposes to evaluate implementations by their sensitivity, measuring to what distance a hot spot influences the performance of other operations. Roughly stated, the sensitivity is the longest distance from one operation ....
G. Pfister and A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Trans. Comput., C-34(10):943--948, 1985.
....the global memory. The number of stages in the network (or the size of the bus) increases with the number of processors. Thus, even if sufficient memory bandwidth can be provided, the minimum network latency increases with the number of processors. Even though techniques such as message combining [80] and hierarchical shared memory [46] have been proposed for reducing the message traffic and message latency, most existing large-scale multiprocessors use distributed memory. Distributed-memory multiprocessors are considered to be more scalable. The message latency on a distributed-memory ....
G. F. Pfister and V. A. Norton. "Hot-Spot" Contention and Combining in Multistage Interconnection Networks. IEEE Transactions on Computers, C-34(10):943--948, October 1985.
....such as the Test-and-(Test-and-Set) lock [SR84] is that they required O(N^2) network transactions to service N processors attempting to enter a critical section simultaneously. In some cases these locks were a major contributor to network contention, including the problem of so-called hot spots [PN85]. To remedy this, Goodman suggested the queue-on-sync-bits mechanism [GVW89], which involves local spinning on a synchronization flag, thereby eliminating the O(N^2) network operations. This solution was meant to be implemented in hardware, but Graunke and Thakkar [GT90] and Anderson [And90] implemented ....
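The local-spinning idea can be illustrated with an array-based queue lock in the style of Anderson [And90]. This is a minimal sketch, not the published code. It assumes at most `n_slots` concurrent contenders and uses a `threading.Lock` to simulate the atomic fetch-and-increment. Each waiter spins only on its own slot. On cached hardware this means a release disturbs one waiter's flag instead of sending every waiter to the same location.

```python
import threading
import time


class ArrayQueueLock:
    """Array-based queue lock: each waiter spins on its own flag slot."""

    def __init__(self, n_slots):
        self.n = n_slots
        self.flags = [True] + [False] * (n_slots - 1)  # True means "has lock"
        self._next = 0
        self._faa_lock = threading.Lock()  # simulates fetch-and-increment

    def _fetch_and_increment(self):
        with self._faa_lock:
            old = self._next
            self._next += 1
            return old

    def acquire(self):
        slot = self._fetch_and_increment() % self.n   # take a place in line
        while not self.flags[slot]:
            time.sleep(0)              # spin on this thread's own flag only
        self.flags[slot] = False       # reset the slot for its next use
        return slot

    def release(self, slot):
        self.flags[(slot + 1) % self.n] = True        # hand off to successor
```

`acquire` returns the slot index that the caller must pass to `release`. The lock is FIFO by construction, since tickets are handed out in fetch-and-increment order.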
G. F. Pfister and A. Norton. "Hot Spot" Contention and Combining in Multistage Interconnection Networks. IEEE Transactions on Computers, October 1985.
....system size, for different levels of communication locality. Another question that we address using our model is the effect of communication hot spots in mesh networks. It has been shown that hot spots can seriously degrade the performance of indirect (e.g., multistage) interconnection networks [PfN85]. We investigate to what extent this is true for mesh networks. Performance imbalance caused by the deadlock avoidance algorithm. In section 2.4 we pointed out that the deadlock-free routing algorithm of Dally and Seitz generates asymmetric loads on the virtual channels in the network. ....
....impact system performance. Hot spots arise when a number of processors make a significant fraction of their requests to a single memory module, or to a single node, in a multiprocessor. The issue has been studied using open queueing models in the context of multistage interconnection networks [LKK86, PfN85, YTL87]. We examine the effect of hot spots in mesh networks by assigning some fraction, F_hot, of requests from each processor to a particular node in the system, while the remaining fraction, 1 − F_hot, is distributed uniformly across all processors. In Figure 4.9 we plot the mean response time (sum of ....
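The traffic model in this passage (a fraction F_hot of requests to one node, the rest uniform over all nodes) can be sketched as a seeded destination generator. The function and parameter names are hypothetical. Because the uniform part also covers the hot node, that node receives F_hot + (1 − F_hot)/n of all requests, not just F_hot.

```python
import random


def hot_spot_destinations(n_nodes, f_hot, hot_node, count, seed=0):
    """Draw request destinations under the hot-spot traffic model.

    With probability f_hot a request goes to hot_node; otherwise its
    destination is uniform over all n_nodes, so the hot node also gets
    its ordinary uniform share.
    """
    rng = random.Random(seed)               # seeded for reproducible runs
    out = []
    for _ in range(count):
        if rng.random() < f_hot:
            out.append(hot_node)
        else:
            out.append(rng.randrange(n_nodes))
    return out


dests = hot_spot_destinations(64, 0.05, hot_node=0, count=10_000)
# Expected to be close to 0.05 + 0.95 / 64 (about 0.065).
print(dests.count(0) / len(dests))
```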
G. F. Pfister and V. A. Norton, "Hot Spot" Contention and Combining in Multistage Interconnection Networks, IEEE Transactions on Computers C-34, 10 (October 1985).
....for applications that have been optimized for NUMA systems and migrate and replicate data objects to improve locality. A major type of non-uniform traffic is a hot spot; that is, a single memory that has an unusually high probability of being accessed by all or many of the processors. Early papers [23, 30] identified hot spots as a major cause of performance degradation in shared-memory interconnection networks. The degradation is exacerbated by tree saturation [23], which even obstructs memory traffic to non-hot-spot locations. Significant progress has been made in reducing hot-spot traffic, ....
....that is, a single memory that has an unusually high probability of being accessed by all or many of the processors. Early papers [23, 30] identified hot spots as a major cause of performance degradation in shared-memory interconnection networks. The degradation is exacerbated by tree saturation [23], which even obstructs memory traffic to non-hot-spot locations. Significant progress has been made in reducing hot-spot traffic, especially hot-spot traffic due to synchronization. Techniques include separate synchronization networks (possibly with combining) [18] and hot-spot-free software ....
G. F. Pfister and V. A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, 34(10):943--948, October 1985.
....gang scheduling of SCore-D realizes a multi-user, multi-parallel-process environment. To implement efficient and practical gang scheduling, we developed network preemption. It can be very difficult to estimate and guarantee the maximum time in a large network, considering the effect of hot spots [12]. This situation becomes a severe problem in implementing real-time scheduling. We have already proposed an architectural support for gang scheduling, called Drain [5]. The Drain mechanism can guarantee the maximum time to reach the steady state. We found that the signal delivery of the SunOS can ....
Gregory F. Pfister and V. Alan Norton. "Hot Spot" Contention and Combining in Multistage Interconnection Networks. IEEE Transactions on Computers, pages 943--948, October 1985.
....performs a study to determine when it is better to simply wait and when it is better to try to use split-phase transactions. The CM-5 does not include any message-combining mechanisms in the data network. G. Pfister and V. Norton showed that combining can help avoid hot spots in a data network [PN85]. A. Chien found that combining does not work very well for many real programs, but that it is possible to dynamically detect hot spots [Chi86]. P. C. Yew et al. explored how to use software combining to avoid hot spots [YTL87]. W. Dally analyzed the hot-spot behavior of meshes [Dal87]. The ....
G. F. Pfister and V. A. Norton. "hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, C-34, pages 943--948, October 1985.
....since oblivious routers do not respond to hot spots, paths will be chosen through congestion even when equal-length paths with no congestion exist as alternatives. This is a serious drawback to oblivious routing since many software applications create hot spots or other non-uniform traffic [Pfister Norton 85]. 4.1.4 Buffering Buffering can be applied to oblivious routing in many ways. Two basic types of buffering are available: blocking and non-blocking. Blocking buffering is simply in-place buffering of messages. In blocking buffering, messages do not give up the resources they hold when they ....
G. F. Pfister and V. A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Trans. on Computers, C-34(10), October 1985.
....to produce large amounts of memory and interconnection network contention, which causes performance bottlenecks that become markedly more pronounced in larger machines and applications. As a consequence, the overhead of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that the presence of hot spots can severely degrade performance for all traffic in multistage ....
....of busy-wait synchronization is widely regarded as a serious performance problem [2, 6, 11, 14, 38, 49, 51]. When many processors busy-wait on a single synchronization variable, they create a hot spot that is the target of a disproportionate share of the network traffic. Pfister and Norton [38] showed that the presence of hot spots can severely degrade performance for all traffic in multistage interconnection networks, not just traffic due to synchronizing processors. As part of a larger study, Agarwal and Cherian [2] investigated the impact of synchronization on overall program ....
G. F. Pfister and V. A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, C-34(10):943--948, Oct. 1985.
....is made between low-contention and high-contention algorithms. On parallel machines with non-combining networks, high-contention read steps or write steps can be quite slow, as each of the requests for a highly contended location is serviced one by one, creating a serial bottleneck or hot spot [PN85]. Moreover, intermediate nodes on the path to the contended destination become congested as well, so a single hot spot can even delay requests destined for other nodes in the network. If all p processors request the same location, a common occurrence in CRCW PRAM algorithms, a direct ....
G. F. Pfister and V. A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Trans. on Computers, C-34(10):943--948, 1985.
....hardware and software dynamic page (or cache block) placement) is of interest, but outside of the scope of this paper. A major type of non-uniform traffic is a hot spot; that is, a single memory that has an unusually high probability of being accessed by all or many of the processors. Early papers [23, 30] identified hot-spot memory modules as a major cause of performance degradation in shared-memory interconnection networks. The degradation is exacerbated by tree saturation [23], which even obstructs memory traffic to non-hot-spot locations. Significant progress has been made in reducing hot ....
....memory that has an unusually high probability of being accessed by all or many of the processors. Early papers [23, 30] identified hot-spot memory modules as a major cause of performance degradation in shared-memory interconnection networks. The degradation is exacerbated by tree saturation [23], which even obstructs memory traffic to non-hot-spot locations. Significant progress has been made in reducing hot-spot traffic, especially hot-spot traffic due to synchronization. Techniques include separate synchronization networks (possibly with combining) [18] and hot-spot-free software ....
G. F. Pfister and V. A. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, 34(10):943--948, October 1985.
....2D mesh it seems to be even worse in terms of routing steps (see Figure 13 on page 39). Overloading (Section 5.2), however, considerably improves the efficiency of R vln (see Figure 15 on page 44). 5.1.3 Combining queues So far, the combining queues have mainly been used in avoiding hot spots [55, 42, 70], and in connection with the Ultracomputer project [23, 8] to accomplish fetch-and-add primitives [43, 13, 14]. The results in those papers mainly deal with the case where processors constantly generate packets at certain probability (assumptions on the distribution of the generated packets vary a ....
G.F. Pfister and V.A. Norton. "Hot Spot" Contention and Combining in Multistage Interconnection Networks. IEEE Transactions on Computers, C-34(10):943 -- 948, 1985.
No context found.
G. Pfister and V. A. Norton. "hot spot" contention and combining in multistage interconnection networks. IEEE Trans. on Computers, 34(10), Oct 1985.
No context found.
Pfister, G. F. and Norton, V. A. "Hot spot" contention and combining in multistage interconnection networks. IEEE Trans. Comput., C-34, 10 (Oct. 1985), 943-948.
No context found.
G. F. Pfister and V. A. Norton. "Hot Spot" contention and combining in multistage interconnection networks, IEEE Trans. Comput., Vol. C-34, No. 10, Oct. 1985, 943-948.
No context found.
G. Pfister and V. A. Norton. "hot spot" contention and combining in multistage interconnection networks. IEEE Trans. on Computers, 34(10), Oct 1985.
No context found.
G. Pfister and V. A. Norton. "hot spot" contention and combining in multistage interconnection networks. IEEE Trans. on Computers, 34(10), Oct 1985.
No context found.
G. F. Pfister and V. A. Norton. "Hot Spot" Contention and Combining in Multistage Interconnection Networks. IEEE Transactions on Computers, C-34(10):943--948, Oct. 1985.
No context found.
G. Pfister and V. Norton. "Hot spot" contention and combining in multistage interconnection networks. IEEE Transactions on Computers, C-34(10):943-948, October 1985.
CiteSeer.IST - Copyright Penn State and NEC