E. Markatos and T. LeBlanc, Load Balancing vs. Locality Management in Shared-Memory Multiprocessors, Technical Report 399, Computer Science Department, University of Rochester, Rochester, New York (October 1991).
....thread packages more efficient. Anderson et al. [2] discuss the gain from using local ready queues, and [1] shows how to do user-level scheduling. Schoenberg and Hummel [31] explain how to avoid allocating one stack per thread and switching contexts in nested parallel for loops. Markatos et al. [42] present a thorough study of the tradeoffs between load balancing and locality in shared-memory machines. Keppel [34] describes a portable threads package that supports efficient barrier synchronization and non-preemptive threads. Some threads packages support fine-grain parallelism. The Uniform ....
Evangelos P. Markatos and Thomas J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. In Proceedings of the 1992 International Conference on Parallel Processing, volume I, Architecture, pages I:258--267, Boca Raton, Florida, August 1992. CRC Press.
....as the one described elsewhere [34] would be required to ensure further scalability. The effectiveness of our scheduler has been demonstrated on one SMP; future work involves studying its applicability to a scalable, NUMA multiprocessor by combining it with locality-based scheduling techniques [6, 30, 40]. For example, to schedule threads on a hardware-coherent cluster of SMPs, our scheduling algorithm could be used to maintain one shared queue on each SMP, and threads would be moved between SMPs only when required. We have shown that our space-efficient scheduler is well suited for programs with ....
Evangelos P. Markatos and Thomas J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. In Proceedings of the 1992 International Conference on Parallel Processing, volume I, Architecture, pages I:258--267, Boca Raton, Florida, August 1992.
....by measurements on different architectures. He concludes that new programming models will have to be used to efficiently exploit modern machines and locality information will have to be used for scheduling. Even in cases of an imbalanced load, this locality scheduling can lead to better results [ML91]. Squillante and Lazowska used a queuing model to examine the influence of locality and affinity scheduling on the performance of parallel systems. Their results show that only little information is necessary to improve performance [SL89]. 1.3 Overview In contrast to the papers mentioned ....
E.P. Markatos and T.J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. Technical Report 399, University of Rochester, Computer Science Department, Rochester, New York 14627, October 1991.
....(Section 4.1.5 will describe the insertion and retrieval routines in more detail.) Our setup pays more attention to data locality than load balancing, because with non-uniform memory accesses, it is generally important to reduce remote communication between processors in different clusters [170]. A complication with a parallel workbag is termination detection. We want to terminate the program when the bag queue becomes empty permanently. In the sequential case (Figure 4.1), because there is only one thread of control, when the queue becomes empty, it stays empty. The execution can ....
Evangelos P. Markatos and Thomas J. LeBlanc. Load Balancing vs. Locality Management in Shared-Memory Multiprocessors. Technical Report 399, The University of Rochester, Computer Science Department, Rochester, New York 14627, October 1991.
....use the diagonal assignment strategy because of our assumptions about the data skew (see Section 3). [Figure 3: Processor assignment using magic squares assignment for a 5x5 grid; rows: 1 3 5 2 4 / 2 4 1 3 5 / 3 5 2 4 1 / 4 1 3 5 2 / 5 2 4 1 3] 2.3 Scheduling. Following the classification of scheduling given in [ML92], the scheduling strategies in SN and SE systems can be classified as data affinity and load balancing scheduling, respectively. In SN systems, queries are decomposed into sub-queries based on the degree of declustering of the data being accessed. The sub-queries are scheduled to be ....
....fails to recognize the non-uniform costs of data access in our architecture. Our approach to scheduling decomposes queries into sub-queries and is processor initiated. However, our approach takes data affinity into consideration and thus is similar to the scheduling approach advocated in [ML92]. Each node has a private work queue, similar to a SN scheduling strategy. However, any node can access any other node's work queue through DSM. When a query is submitted to the DBMS, the declustering directory is consulted to determine the fragments that need to be accessed. Based on this ....
E. P. Markatos and T. J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. In Proceedings of the International Conference on Parallel Processing, August 1992.
....and latency will be reduced, resulting in better execution time. Furthermore, if multiple iterations access the same data, communication requirements can be reduced by executing them on the same processor. Locality is a difficult issue to address, as it must be weighed against load balancing [22]. If the data is striped across the processors, a block scheduling of tasks to processors can minimize communication. However, the schedule produced by GSS will ignore this structure of the data and require more communication. GSS does produce a schedule in which it is extremely likely that ....
Markatos, E. P., and LeBlanc, T. J. Load balancing vs. locality management in shared-memory multiprocessors. Tech. Rep. 399, Computer Science Department, University of Rochester, October 1991.
....queue, and no explicit checks must be done to determine which type of thread is running. 4.2. Queue Structures. Local ready queues are used for each of the three types of threads (RTC, iterative, and fork-join). This both reduces contention [Ande88] and allows scheduling to be locality-driven [Mark92]. Three different queues are used, because each is managed somewhat differently, as discussed in Sections 4.3-4.5. This also allows using multiple types of threads in an application. Access to the RTC and iterative queues is non-locking in order to avoid locking overhead. It is up to an ....
....the gain from using local ready queues, and [Ande91] shows how to do user-level scheduling [Ande91]. Schoenberg and Hummel [Humm91] explain how to avoid allocating a stack per thread and context switching in nested parallel for loops, which are a form of run-to-completion threads. Markatos et al. [Mark92] present a thorough study of the tradeoffs between load balancing and locality in shared-memory machines. Keppel [Kepp93] describes a portable threads package that supports efficient barrier synchronization and non-preemptive threads. Threads packages that support finer-grain parallelism include ....
Markatos, E. P. and T. J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. Proc. 1992 International Conference on Parallel Processing, August 1992, pp. I:258--267.
....example, let us assume that two tasks executing on adjacent processing units take 20 and 10 cycles, respectively. On the average, 50% of the time the second processing unit is idle, resulting in overall utilization of 75%. Load imbalance causes performance loss for large-scale parallel machines [68]. Since distributed processor organizations also partition their physical resources among tasks similar to parallel machines, they are confronted with load imbalance problems. Large variations in the amount of computation of adjacent tasks cause load imbalance, resulting in successor tasks waiting ....
E. P. Markatos and T. J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. Technical Report TR 399, Oct. 1991.
.... above (which now also comprised the parallel functional IDA implementation). The scheduling algorithms we implemented share one property: one of the children created at each fork is kept at home, since common sense dictates this as the minimal locality requirement for schedulers (see also [Markatos92], who shows that releasing this requirement is punished heavily). We implemented the following scheduling algorithms: Shared task list: Newly created tasks are put into one shared task list (one newly created task is kept at home, as indicated above). This algorithm does nothing explicitly to ....
E. P. Markatos and T. J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. 1992 Int. Conf. on Parallel Processing, pages 258--267, 1992.
....may cause the ARB to overflow, causing the task to stall until speculation is resolved. 3) Large tasks may result in a loss of opportunity because narrow PUs cannot put large amounts of intra-task parallelism to use. Load imbalance causes performance loss similar to large-scale parallel machines [10]. Large variations in the amount of computation of adjacent tasks cause load imbalance, resulting in successor tasks waiting for predecessor tasks to complete and retire. There are two kinds of overhead associated with tasks: (1) task start overhead, and (2) task end overhead. Task start overhead ....
E. P. Markatos and T. J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. Technical Report URCSD-TR 399, University of Rochester, Oct. 1991.
....of tuples has no logical implications, the diagonal and Magic Squares strategies may be equivalent. For our experiments we use the diagonal assignment strategy because of our assumptions about the data skew (see Section 3). 2.3 Scheduling. Following the classification of scheduling given in [18], the scheduling strategies in SN and SE systems can be classified as data affinity and load balancing scheduling, respectively. In SN systems, queries are decomposed into sub-queries based on the degree of declustering of the data being accessed. The sub-queries are scheduled to be processed ....
....fails to recognize the non-uniform costs of data access in our architecture. Our approach to scheduling decomposes queries into sub-queries and is processor initiated. However, our approach takes data affinity into consideration and thus is similar to the scheduling approach advocated in [18]. Each node has a private work queue, similar to a SN scheduling strategy. However, any node can access any other node's work queue through DSM. When a query is submitted to the DBMS, the declustering directory is consulted to determine the fragments that need to be accessed. Based on this ....
E. P. Markatos and T. J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. In Proceedings of the International Conference on Parallel Processing, August 1992.
....concepts of pruning and of ordering queues to favor larger threads, concepts borrowed by Filaments. Cilk-5 [FLR98] compiles a fast clone and slow clone for each parallel function, and then executes the slow clone when all processors are busy, which is a similar idea to pruning. Markatos et al. [MB92] present a thorough study of the tradeoffs between load balancing and locality in shared-memory machines with respect to thread scheduling. The most related threads packages that provide efficient fine-grain parallelism are the Uniform System [TC88], Chores [EZ93], and TAM [CDG+93]. The first ....
Evangelos P. Markatos and Thomas J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. In Proceedings of the 1992 International Conference on Parallel Processing, volume I, Architecture, pages I:258--267, Boca Raton, Florida, August 1992. CRC Press.
....packages more efficient. Anderson et al. [ALL89] discuss the gain from using local ready queues, and [ABLL92] shows how to do user-level scheduling. Schoenberg and Hummel [HS91] explain how to avoid allocating a stack per thread and switching contexts in nested parallel for loops. Markatos et al. [MB92] present a thorough study of the tradeoffs between load balancing and locality in shared-memory machines. Keppel [Kep93] describes a portable threads package that supports efficient barrier synchronization and non-preemptive threads. Many threads packages support more fine-grain parallelism. The ....
Evangelos P. Markatos and Thomas J. LeBlanc. Load balancing vs. locality management in shared-memory multiprocessors. In Proceedings of the 1992 International Conference on Parallel Processing, volume I, Architecture, pages I:258--267, Boca Raton, Florida, August 1992. CRC Press.