H. Li, S. Tandri, M. Stumm, and K. C. Sevcik, "Locality and Loop Scheduling on NUMA Multiprocessors," International Conference on Parallel Processing, Vol. II, pp. 140--147, 1993.
....memory and the use of deeper memory hierarchies, locality management is an even bigger issue in today's processors. In order to preserve locality within an SMP, each processor maintains task affinity: it must finish its own task assignment before stealing a task from another processor (similar to [16], except using fixed-size tasks). This is done by using a per-processor task queue and having a processor retrieve tasks from the head of its own queue but steal from the tail of another processor's queue. Once a task is stolen from another processor's task queue, it is moved to, and owned by, the stealing ....
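The queue discipline described in this excerpt (pop from the head of one's own queue, steal from the tail of another's) can be sketched as follows. This is a minimal Python illustration, not the citing paper's implementation: the class name `AffinityTaskQueues`, the contiguous initial placement of fixed-size tasks, and the rule of stealing from the longest queue are all hypothetical choices, since the excerpt does not say how a victim is picked.

```python
from collections import deque


class AffinityTaskQueues:
    """Hypothetical per-processor task queues with tail stealing (illustrative only)."""

    def __init__(self, num_procs, tasks):
        self.queues = [deque() for _ in range(num_procs)]
        # Assumed block placement: processor q starts with a contiguous run of
        # tasks (on the assumption that neighbouring tasks share data).
        per_proc = -(-len(tasks) // num_procs)  # ceiling division
        for i, task in enumerate(tasks):
            self.queues[i // per_proc].append(task)

    def next_task(self, proc):
        """Return the next task for `proc`, or None when every queue is empty."""
        own = self.queues[proc]
        if own:
            # Task affinity: finish the local assignment first, taking from the head.
            return own.popleft()
        # Assumed victim policy: the processor with the most remaining tasks.
        victim = max(range(len(self.queues)), key=lambda q: len(self.queues[q]))
        if self.queues[victim]:
            # Steal from the tail; popping moves the task, so `proc` now owns it.
            return self.queues[victim].pop()
        return None


qs = AffinityTaskQueues(2, list(range(6)))   # queues: [0, 1, 2] and [3, 4, 5]
print([qs.next_task(0) for _ in range(4)])   # [0, 1, 2, 5]
print(qs.next_task(1))                       # 3
```

Stealing from the tail while the owner works from the head keeps the two ends of a queue apart, so the owner continues on the tasks adjacent to the ones it just ran.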
H. Li, S. Tandri, M. Stumm, and K. C. Sevcik. Locality and loop scheduling on NUMA multiprocessors. In 1993 International Conference on Parallel Processing, pages 140-147, August 1993.
....static scheduling [1], self-scheduling [1], guided self-scheduling [11], and factoring [6], to name a few. However, these strategies are not applicable when loops carry dependences, such as in wavefront parallelism. Markatos and LeBlanc [10] propose Affinity-based Scheduling (AFS), and Li et al. [7] propose Locality-based Dynamic Scheduling (LDS), to schedule parallel DOALL loop iterations for locality as well as load balance. Both techniques address locality by initially assigning each processor a local set of independent iterations from a DOALL loop in a manner which corresponds to a ....
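To make the differences between the strategies listed in this excerpt concrete, here is a hedged sketch of the chunk sizes each one hands out for an n-iteration DOALL loop on p processors. The function `chunk_sizes` is hypothetical; guided self-scheduling is modeled as ceil(R/p) of the R remaining iterations, and factoring as the commonly used variant that issues batches of p equal chunks of ceil(R/(2p)). Other published parameterizations exist.

```python
import math


def chunk_sizes(n, p, rule):
    """Chunk sizes handed out for an n-iteration DOALL loop on p processors (illustrative)."""
    sizes, remaining = [], n
    while remaining > 0:
        if rule == "static":
            batch = [math.ceil(n / p)] * p                # one block per processor, fixed up front
        elif rule == "self":
            batch = [1]                                   # self-scheduling: one iteration per request
        elif rule == "gss":
            batch = [math.ceil(remaining / p)]            # guided self-scheduling
        elif rule == "factoring":
            batch = [math.ceil(remaining / (2 * p))] * p  # p equal chunks per batch
        else:
            raise ValueError(f"unknown rule: {rule}")
        for k in batch:
            if remaining == 0:
                break
            k = min(k, remaining)                         # never hand out more than is left
            sizes.append(k)
            remaining -= k
    return sizes


print(chunk_sizes(100, 4, "gss"))
# [25, 19, 14, 11, 8, 6, 5, 3, 3, 2, 1, 1, 1, 1]
print(chunk_sizes(100, 4, "factoring"))
# [13, 13, 13, 13, 6, 6, 6, 6, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1]
```

Static scheduling issues p requests in total and self-scheduling issues n, while the decreasing-size schemes sit in between: large early chunks keep queue accesses rare, and small late chunks smooth out the finish.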
H. Li, S. Tandri, M. Stumm, and K. C. Sevcik. Locality and loop scheduling on NUMA multiprocessors. In Proc. 1993 Intl. Conf. on Parallel Processing, pages II-140--II-147, August 1993.
....has been shown to work well in practice. Factoring is the only scheme supported by a comprehensive mathematical model. In addition, two decreasing-block-size algorithms that take data locality into account have been presented: Affinity Scheduling [12] (AFS) and Locality-based Dynamic Scheduling [13] (LDS). In both schemes, a subset of the tasks is placed in a local queue on each processor, and the data needed by those tasks is stored on the same processor. The processor executes its local tasks using a decreasing-block-size scheme until they are exhausted. Then, if other processors are still ....
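A rough sketch of the "local queue, then decreasing blocks" idea follows. It is a hedged Python illustration in the spirit of Affinity Scheduling, not either paper's exact algorithm: it assumes each grab takes 1/p of the queue it draws from, and, because the excerpt is cut off before describing what happens when a local queue empties, it assumes the idle processor steals 1/p of the most loaded queue from that queue's far end. The names `afs_init` and `afs_grab` are hypothetical.

```python
import math


def afs_init(n, p):
    """Block-assign iterations 0..n-1: queue q holds the half-open range [lo, hi)."""
    size = math.ceil(n / p)
    return [[min(n, q * size), min(n, (q + 1) * size)] for q in range(p)]


def afs_grab(queues, proc):
    """Next block of iterations (start, stop) for `proc`, or None when no work is left."""
    p = len(queues)
    lo, hi = queues[proc]
    if hi > lo:
        k = math.ceil((hi - lo) / p)        # take 1/p of the local queue, so blocks shrink
        queues[proc] = [lo + k, hi]
        return (lo, lo + k)
    # Local queue exhausted: assume we steal 1/p of the most loaded queue.
    victim = max(range(p), key=lambda q: queues[q][1] - queues[q][0])
    lo, hi = queues[victim]
    if hi <= lo:
        return None
    k = math.ceil((hi - lo) / p)
    queues[victim] = [lo, hi - k]           # assumption: steal from the far end
    return (hi - k, hi)


qs = afs_init(16, 2)                        # [[0, 8], [8, 16]]
print([afs_grab(qs, 0) for _ in range(5)])
# [(0, 4), (4, 6), (6, 7), (7, 8), (12, 16)]
```

Representing each queue as a half-open index range keeps the sketch small; a real runtime would also need locking around each grab.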
H. Li, S. Tandri, M. Stumm, and K. Sevcik, "Locality and Loop Scheduling on NUMA Multiprocessors," in Proceedings of the 1993 International Conference on Parallel Processing, pp. II-140--II-147, Aug. 1993.
....but it only works when the same data is reused several times. That happens, for example, when the parallel loop is surrounded by a sequential one, so that the same chunk is executed repeatedly. A locality-based dynamic scheduling scheme (LDS), designed for NUMA machines, is described in [LTSS93]. LDS assumes that the data space is partitioned among the processors. It computes the chunk sizes using a GSS-like scheme. Each chunk is assigned to a processor on demand and includes those iterations whose data is stored in the local memory of the assigned processor. In the case of ....
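The on-demand, locality-aware assignment described in this excerpt can be sketched as below. This is a hedged Python illustration, not the LDS implementation: the helpers `block_partition` and `next_local_chunk` are hypothetical, the data is assumed to be block-distributed with iteration i touching element i, and the chunk size uses the plain GSS rule ceil(R/p) over the total remaining work because the excerpt only calls LDS "GSS-like". The excerpt is cut off before describing what happens when a processor's partition runs dry, so the sketch simply returns a short or empty chunk in that case.

```python
import math


def block_partition(n, p):
    """Split iterations 0..n-1 into p contiguous lists, mirroring a block data distribution."""
    size = math.ceil(n / p)
    return [list(range(q * size, min(n, (q + 1) * size))) for q in range(p)]


def next_local_chunk(local, proc):
    """Serve one on-demand request from processor `proc`.

    local[q] holds the unexecuted iterations whose data sits in q's memory.
    Chunk size uses the plain GSS rule ceil(R / p) over the total remaining R
    (an assumption), but the iterations come from the requester's own
    partition, so its memory accesses stay local.
    """
    p = len(local)
    remaining = sum(len(part) for part in local)
    if remaining == 0:
        return []
    k = math.ceil(remaining / p)
    chunk = local[proc][:k]      # may be shorter than k, or empty, if the partition runs dry
    del local[proc][:k]
    return chunk


local = block_partition(12, 3)     # iterations 0-3, 4-7, 8-11
print(next_local_chunk(local, 1))  # [4, 5, 6, 7]
print(next_local_chunk(local, 0))  # [0, 1, 2]
```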
Li, H., Tandri, S., Stumm, M. and Sevcik, K., "Locality and Loop Scheduling on NUMA Multiprocessors," in IEEE Int. Conf. on Parallel Processing, Aug. 1993, pp. 140--147.
....regarding interactions between the iterations is often lost due to the desire to have a single list of tasks (e.g., [14], [27]), and many of the approaches assume that iterations are independent, requiring no communication, and/or target a shared-memory architecture. Recent research [20], [16] has added consideration for processor affinity to the task queue models so that locality and data reuse are taken into account: iterations that use the same data are assigned to the same processor unless they need to be moved to balance load. In Affinity Scheduling [20], data is moved to the local cache when first accessed, and the scheduling algorithm assigns iterations in blocks. In Locality-based Dynamic Scheduling [16], data is initially distributed in a block, cyclic, or similar fashion, and each processor first executes the iterations that access local data. Both of these approaches still assume a shared-memory environment. Numerous other approaches have been proposed for scheduling loop iterations, especially if ....
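The phrase "block, cyclic, or similar fashion" refers to standard ways of mapping data elements to processors. A small hedged sketch of those mappings, and of the "local iterations first" set each processor would start with, follows; it assumes iteration i touches only element i, and the function names `owner` and `local_iterations` are hypothetical.

```python
def owner(i, n, p, dist, b=1):
    """Processor whose memory holds element i under a few common distributions (illustrative)."""
    if dist == "block":
        return i // -(-n // p)          # contiguous blocks of ceil(n/p) elements
    if dist == "cyclic":
        return i % p                    # element i goes to processor i mod p
    if dist == "block-cyclic":
        return (i // b) % p             # blocks of b elements dealt out round-robin
    raise ValueError(f"unknown distribution: {dist}")


def local_iterations(n, p, proc, dist, b=1):
    """Iterations `proc` would run first: those whose data it owns (assuming iteration i touches element i)."""
    return [i for i in range(n) if owner(i, n, p, dist, b) == proc]


print(local_iterations(10, 3, 1, "block"))            # [4, 5, 6, 7]
print(local_iterations(10, 3, 1, "cyclic"))           # [1, 4, 7]
print(local_iterations(10, 2, 0, "block-cyclic", 2))  # [0, 1, 4, 5, 8, 9]
```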
Hui Li, Sudarsan Tandri, Michael Stumm, and Kenneth C. Sevcik. Locality and Loop Scheduling on NUMA Multiprocessors. In Proceedings of the 1993 International Conference on Parallel Processing, pages II-140--II-147. CRC Press, Inc., August 1993.
....been explored [4, 7, 15, 16]. Hou, Ansari, and Ren proposed a genetic operator based on the precedence relations between tasks in a task graph to perform scheduling in multiprocessor systems [17]. Li, Tandri, et al. consider locality in an attempt to dynamically balance the load across processors [18]. Liu studied allocating tasks to non-contiguous processors in an attempt to increase processor efficiency [19]. What these scheduling algorithms and techniques, as well as others [3-6], lack is consideration of the communication costs that arise from the allocation schemes. This is not unusual ....
H. Li, S. Tandri, M. Stumm, and K. C. Sevcik, "Locality and loop scheduling on NUMA multiprocessors," in Proceedings of the 1993 International Conference on Parallel Processing, pp. 140--147, 1993.
....in the scheduling of multi-dimensional applications, such as the affine-by-statement technique [3] and the index shift method [17]. However, these methods do not consider resource-constrained designs. Other methods focus on multiprocessor scheduling and are not applicable to the target problem [10, 11, 16, 18, 19]. In a previous study, we extended the concept of multi-dimensional retiming, introducing the idea of schedule-based multi-dimensional retiming without resource constraints [20]. In this method, a feasible linear schedule allows us to restructure the loop body represented by a general form of ....
H. Li, S. Tandri, M. Stumm and K. C. Sevcik, "Locality and Loop Scheduling on NUMA Multiprocessors," Proc. of the 1993 International Conference on Parallel Processing, Vol. II, pp. 140-147, 1993.
....machine. Dynamic load balancing is defined as the allocation of the work of a single application to processors at run time so that the execution time of the application is minimized. Many researchers have explored the problem of dynamic scheduling of parallel loop iterations [7], [8], [9], [10], [11], [12], [13], but most have not considered the problems of a distributed-memory environment. A few researchers [11], [12] have suggested approaches that take locality issues into account for machines with non-uniform memory access times, but their main target is still shared-memory machines. Researchers [13] have also looked at dynamic scheduling of loop iterations on a network of workstations using Dataparallel C, ....
[Article contains additional citation context not shown here]
H. Li, S. Tandri, M. Stumm, and K. C. Sevcik, "Locality and Loop Scheduling on NUMA Multiprocessors," in Proc. of the 1993 Int'l Conference on Parallel Processing, pp. II-140--II-147, CRC Press, Inc., Aug. 1993.
....a cyclic iteration assignment scheme is used. Thus each processor would execute iterations essentially at a distance $p$ apart (where $p$ is the number of processors). The essence of AFS scheduling is still maintained. 2.3.2 Locality-based Dynamic Scheduling. Locality-based Dynamic Scheduling (LDS) [27] is similar to Affinity Scheduling in that it tries to schedule tasks according to data locality. Chunk sizes are determined in a manner similar to GSS: LDS allocates a chunk $K_i = \lceil R_i / (2p) \rceil$, such that $R_{i+1} = R_i - K_i$. But unlike GSS, where the chunks are consecutive iterations of ....
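The chunk-size recurrence quoted in this excerpt, $K_i = \lceil R_i / (2p) \rceil$ with $R_{i+1} = R_i - K_i$, can be evaluated directly. The sketch below assumes $R_0 = n$, the loop's total iteration count; the function name `lds_chunk_sizes` is hypothetical, and the sketch covers only the sizes, not which iterations fill each chunk.

```python
import math


def lds_chunk_sizes(n, p):
    """Chunk sizes K_i = ceil(R_i / (2p)), with R_{i+1} = R_i - K_i and R_0 = n."""
    sizes, remaining = [], n
    while remaining > 0:
        k = math.ceil(remaining / (2 * p))   # half of an even per-processor share of what is left
        sizes.append(k)
        remaining -= k
    return sizes


sizes = lds_chunk_sizes(100, 4)
print(sizes[:8])      # [13, 11, 10, 9, 8, 7, 6, 5]
print(len(sizes))     # 24
```

For the same 100 iterations on 4 processors, the GSS rule $\lceil R_i / p \rceil$ would start at 25, so this recurrence hands out smaller chunks from the start, in exchange for more scheduling requests.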
H. Li, S. Tandri, M. Stumm and K.C. Sevcik, "Locality and Loop Scheduling on NUMA Multiprocessors," Int. Conf. on Parallel Processing, pp. II-140-II-147, Aug. 1993.
....time. Moreover, dynamic scheduling is known to incur a heavier scheduling overhead because it must access and modify a global iteration queue. Dynamic scheduling, on the other hand, can potentially achieve a more balanced workload among the processors [KW85, HSF92, PK87, FYTZ87, TN91, ML92, LTSS93]. In applications where the workload is irregularly distributed among the parallel loop iterations, it is quite possible that a program's total execution time is shorter under dynamic scheduling. However, many parallel loops in well-known benchmarking programs are found to have little ....
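The trade-off in this excerpt (queue-access overhead versus better balance) can be illustrated with a toy model. The Python sketch below compares static block scheduling with one-iteration-at-a-time self-scheduling from a global queue, charging a fixed `overhead` per queue access. The cost lists and overhead value are hypothetical numbers chosen for illustration, not measurements, and the model ignores memory locality entirely.

```python
import heapq


def static_makespan(costs, p):
    """Static block scheduling: processor q runs a contiguous block; makespan = busiest block."""
    size = -(-len(costs) // p)  # ceil(len / p)
    return max(sum(costs[q * size:(q + 1) * size]) for q in range(p))


def self_sched_makespan(costs, p, overhead):
    """Dynamic self-scheduling: an idle processor takes the next iteration from a global
    queue, paying `overhead` time units per queue access."""
    free_at = [0] * p            # min-heap of the times at which each processor becomes idle
    for c in costs:
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + overhead + c)
    return max(free_at)


irregular = list(range(1, 9))   # hypothetical: iteration i costs i time units
uniform = [1] * 8
print(static_makespan(irregular, 2), self_sched_makespan(irregular, 2, 1))  # 26 24
print(static_makespan(uniform, 2), self_sched_makespan(uniform, 2, 1))      # 4 8
```

Under these assumptions the dynamic scheme wins on the irregular loop despite its overhead, while on the uniform loop the per-access overhead doubles the finish time, which matches the qualitative argument above.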
H. Li, S. Tandri, M. Stumm, and K. C. Sevcik. Locality and loop scheduling on NUMA multiprocessors. In Proc. International Conference on Parallel Processing, volume II: Software, pages 140--147, August 1993.
....we select a compromise. We assume that the penalty for the compromise distribution is not high and that a compromise distribution is better than choosing between load balancing and locality. If load imbalances exist, run-time dynamic scheduling algorithms such as locality-based dynamic scheduling [26] can be used. During the iterative update process, $A_1$ takes on CyclicRCyclic, the distribution attribute of $L1_i$. $L1_i$ is forced to be BlockCyclicRBlockCyclic (BCRBC), resolving its distribution attribute CyclicRCyclic and the attribute of node $B_1$ connected to it. $L2_j$ takes the Block ....
Hui Li, Sudarsan Tandri, Michael Stumm, and Kenneth C. Sevcik. Locality and Loop Scheduling on NUMA Multiprocessors. In Proc. of the International Conference on Parallel Processing, volume II, pages 140--147, August 1993.
....than the local memory. Memory allocation may have a significant effect on access time on NUMA machines, and it is a difficult problem when dealing with real programs such as the Perfect benchmarks. Nonetheless, the authors of a recent dynamic scheduling algorithm, LDS [6], have good news for static scheduling after studying the performance of a few numerical kernels. They suggest that, on NUMA architectures, static scheduling is superior to all known dynamic scheduling algorithms when the processors have roughly an equal workload (which apparently is measured in ....
....such alignment will most likely result in cache hits in the second parallel region when a and b are referenced. Currently, there is no satisfactory mechanism to allow dynamic scheduling to take advantage of the two kinds of opportunities mentioned above. New possibilities are explored in [8], [6]. Nonetheless, it is commonly believed that dynamic scheduling tends to produce a more balanced workload than static scheduling when the operation counts of individual iterations vary significantly. Dynamic scheduling has several variants whose main difference is in the number of iterations each ....
[Article contains additional citation context not shown here]
H. Li, S. Tandri, M. Stumm, and K. C. Sevcik. Locality and loop scheduling on NUMA multiprocessors. In Proc. 1993 International Conference on Parallel Processing, pages II:140--147, August 1993.
No context found.
CiteSeer.IST - Copyright Penn State and NEC