| Markatos, E. Scheduling for Locality in Shared Memory Multiprocessors. Ph.D. thesis, University of Rochester, 1993. |
....to balance load. To name just a few, the Topaz operating system [72] for the Firefly multiprocessor workstation [92] migrates inactive threads. Markatos [70] explored a scheduling policy that favors locality over load balance: threads are initially scheduled based on expected accesses, and idle threads are migrated to balance load. Anderson et al. [3] studied, among other things, the performance implications of using local queues for waiting threads; ....
[Figure 2-3: Message pattern under computation migration.]
E. Markatos. Scheduling for Locality in Shared-Memory Multiprocessors. PhD thesis, University of Rochester, 1993.
....percent on small UMA machines. Although the improvements are small, he concludes that it is worth including affinity considerations because of their increasing importance in fine-grained applications. The importance of thread placement and locality considerations has been shown by Markatos, e.g. in [Mar93] or [ML93], through measurements on different architectures. He concludes that new programming models will have to be used to exploit modern machines efficiently and that locality information will have to be used for scheduling. Even in cases of an imbalanced load, such locality-based scheduling can lead to ....
....onto another central list. These structures and strategies worked well for smaller machines with few processors and a UMA architecture. On larger machines, however, operating systems and user-level libraries using these structures experienced a severe loss in performance. For example, Markatos [Mar93] shows that neglecting data locality on current machines can lead to very low efficiency, especially on modern NUMA architectures. To understand why locality information matters for the structures and mechanisms of user-level thread libraries, ....
Evangelos Markatos. Scheduling for Locality in Shared-Memory Multiprocessors. PhD thesis, University of Rochester, Rochester, New York, 1993.
.... automatic (as opposed to user-specified) data distribution is discussed in [76, 77, 78, 79, 80]. Decomposition and scheduling algorithms that attempt to co-locate processes and data in shared-memory multiprocessors, in order to reduce the cost of remote memory accesses, are discussed in [81]. The compilation of Fortran codes for a data flow machine that supports only data dependences is discussed in [82]. The alternative formulation proposed in this thesis for execution tags would be well suited for implementation on a machine that supports only one type of dependence. Some issues on ....
....machine with nonuniform memory access, access to remote memory can be orders of magnitude slower than access to local memory. Therefore, it is important to distribute data and computations in such a manner that the computations performed by a processor involve mostly local memory accesses [81]. From a processor's perspective, the higher the locality of access, the greater the performance. However, enforcing locality can degrade load balance, since the predefined distribution of work leaves less room for dynamic adjustments of load. Implementation of autoscheduling on distributed ....
E. Markatos, Scheduling for Locality in Shared-Memory Multiprocessors. Ph.D. thesis, Department of Computer Science, University of Rochester, 1993.
....drive code for thread management, thus achieving a dynamic scheduling mechanism with zero operating system overhead. The subject of scheduling for locality in shared-memory machines, i.e. assigning processes to processors based on the location of the data they will access, is approached in [Markatos, 1993]. Theoretical models that have been developed for task scheduling usually oversimplify the problem because of its complexity, or consider only some subproblems. An interesting example is [Haddad, 1989], which models the partitioned load allocation problem as an integer programming min-max ....
Markatos, E. (1993). Scheduling for Locality in Shared-Memory Multiprocessors. Ph.D. dissertation, Department of Computer Science, University of Rochester, New York.
....on a distributed-memory machine. Some experimental results on a CM-5 comparing our heuristics with some other approaches are presented.
1 Introduction
The problem of scheduling a set of applications on a multiprocessor system has been investigated from a number of different points of view [Sev93] [Mar93] [TSG91]. In addition, there are a number of distinguishing characteristics which the scheduling problem may have. In this paper we classify its various aspects into two different levels. The lower level (kernel-level scheduling policies) is developed to support multiprogramming in the ....
Evangelos Markatos. Scheduling for Locality in Shared-Memory Multiprocessors. Ph.D. thesis, Department of Computer Science, College of Arts and Sciences, University of Rochester, Rochester, NY, 1993.
....data tend to be near each other. Markatos and LeBlanc assumed various workload models and used synthetic programs to show that scheduling processes without regard for their data-sharing patterns could lead to significant performance degradation on a wide range of multiprocessor architectures [21]. Chandra et al. have investigated the effect of OS scheduling and page migration policies on the cache behavior and execution time of applications running on the DASH [9]. In contrast to previous work, which has assumed that a workload's data-sharing patterns are constant while system policies ....
....and locality management policies. They are particularly interested in finding efficient ways to share multiprocessor resources among a wide range of application types, including combinations of sequential and parallel applications, long-running scientific workloads, and I/O-bound applications. Both [21] and [9] assume a CC-NUMA environment in which pages have fixed home nodes and are never replicated. [9] allows pages to migrate between compute nodes, while the architectural model in [21] requires pages to remain on a fixed node for the duration of the application. The DASH project manages cache ....
Evangelos Markatos. Scheduling for locality in shared-memory multiprocessors. Technical Report 457, University of Rochester, May 1993.
....parallelism during a simulation (Konas and Yew 1991). The performance of a parallel simulation depends on the efficient utilization of the processors in a multiprocessor system. Two significant factors that determine the effective use of a multiprocessor are load balancing and locality management (Markatos 1993). Load balancing refers to the dynamic redistribution of the workload among the participating processors so that the load is continuously balanced across the multiprocessor. Locality management, on the other hand, refers to the execution of a computation on the processor closer to the storage ....
.... self-scheduling of parallel programs has shown that, in the presence of nonuniform memory access (NUMA) characteristics in the host multiprocessor, we need to address load balancing and locality management simultaneously if we are to achieve the most efficient parallel execution (Markatos 1993). It has also been shown that scheduling decisions should be fast, introducing minimal overhead into the computation (Anderson 1991). This means that affinity-related decisions should use minimal information and should not introduce significant overhead into the scheduling process. Otherwise, the ....
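One way to read "affinity-related decisions should use minimal information" is a scheduler whose only affinity state is the processor each thread last ran on. The sketch below is hypothetical (the class names `Thread` and `AffinityReadyQueue` and the field `last_cpu` are illustrative, not taken from Markatos or Anderson): a processor first looks for a ready thread that last ran on it, and otherwise falls back to plain FIFO order so that locality never leaves a processor idle.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    tid: int
    last_cpu: Optional[int] = None  # hypothetical: the only affinity state kept per thread


class AffinityReadyQueue:
    """Hypothetical shared ready queue with a last-processor affinity hint."""

    def __init__(self) -> None:
        self.ready: "deque[Thread]" = deque()

    def make_ready(self, t: Thread) -> None:
        self.ready.append(t)

    def pick(self, cpu: int) -> Optional[Thread]:
        # Prefer a thread that last ran on this processor: its data may still
        # be in this processor's cache or local memory.
        for i, t in enumerate(self.ready):
            if t.last_cpu == cpu:
                del self.ready[i]  # safe: we return immediately after deleting
                t.last_cpu = cpu
                return t
        # No affine thread: take the oldest ready thread (load balance wins).
        if self.ready:
            t = self.ready.popleft()
            t.last_cpu = cpu
            return t
        return None
```

The linear scan keeps the per-thread state to a single integer; a real scheduler would likely bound the scan or keep per-processor queues to keep the decision cheap.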
Markatos, E. 1993. Scheduling for Locality in Shared-Memory Multiprocessors. PhD thesis, Department of Computer Science, University of Rochester, Rochester, New York.
.... schemes described so far is that, especially in computers with a memory hierarchy, it may be preferable to execute certain iterations on certain processors in order to reuse data or to avoid false sharing; this need for locality exploitation was the motivation for developing affinity scheduling [152, 153]. This algorithm combines static and dynamic approaches. It first assigns ⌈n/p⌉ loop iterations to each processor on the basis of block partitioning. At run time, whenever a processor is idle, 1/p of the remaining iterations are removed from the most heavily loaded processor and transferred to ....
....is chosen as a classical example of a loop nest where each iteration of the outer loop performs a different amount of work; the corresponding code is shown in Figure 5.1.a.
• Adjoint Convolution: This benchmark has been used to evaluate the effectiveness of dynamic loop mapping strategies [76, 153]; the code is shown in Figure 5.1.b. Although the code consists of a loop nest which has a form similar to that of upper triangular matrix multiplication, the size of the arrays involved is considerably smaller.
• Upper Triangular Matrix Multiplication: The multiplication of two upper ....
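The affinity scheduling scheme summarized in the loop-scheduling snippet above can be simulated in a few lines. This is a minimal sketch under stated assumptions: the function name `affinity_schedule` is hypothetical; processors take turns in round-robin order instead of running concurrently; a busy processor executes 1/k of its own remaining iterations per step (k defaults to p), a detail assumed from Markatos and LeBlanc's description rather than from the truncated snippet; and stolen iterations are taken from the tail of the victim's queue, which is also an assumption.

```python
import math
from collections import deque
from typing import List, Optional, Tuple


def affinity_schedule(n: int, p: int, k: Optional[int] = None) -> List[Tuple[int, List[int]]]:
    """Simulate affinity scheduling of n loop iterations on p processors.

    Returns the executed chunks in order as (processor, iterations) pairs.
    Round-robin turns are a simplification of truly parallel execution.
    """
    k = p if k is None else k  # assumption: local chunk is 1/k of the local queue
    size = math.ceil(n / p)
    # Static phase: block partitioning gives processor i iterations [i*size, (i+1)*size).
    queues = [deque(range(i * size, min((i + 1) * size, n))) for i in range(p)]
    trace: List[Tuple[int, List[int]]] = []
    while any(queues):
        for cpu in range(p):
            q = queues[cpu]
            if not q:
                # Idle: transfer 1/p (rounded up) of the remaining iterations of
                # the most heavily loaded processor into this processor's queue.
                victim = max(range(p), key=lambda j: len(queues[j]))
                vq = queues[victim]
                if not vq:
                    continue  # no work left anywhere
                stolen = [vq.pop() for _ in range(math.ceil(len(vq) / p))]
                q.extend(reversed(stolen))
            # Dynamic phase: execute 1/k of the local queue (at least one iteration).
            chunk = [q.popleft() for _ in range(math.ceil(len(q) / k))]
            trace.append((cpu, chunk))
    return trace
```

For example, `affinity_schedule(9, 4)` starts processor 3 with an empty block (⌈9/4⌉ = 3 iterations per block leaves it nothing), so it steals from processor 0 on its first turn. Because a processor executes a chunk in the same turn in which it steals, every turn with work makes progress and the loop terminates.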
E. Markatos, Scheduling for Locality in Shared-Memory Multiprocessors, PhD Thesis, Department of Computer Science, University of Rochester, 1993.
.... several mechanisms: the OS can migrate or replicate an application's data pages [4, 5, 11, 12]; the OS or user-level thread scheduler may attempt to schedule threads on processors where they have previously executed and built up a certain amount of memory or cache state (e.g. affinity scheduling [15, 18, 21]); or programmer hints for task scheduling may be embedded in an object's specification in an object-oriented, task-queue-based parallel language [6]. The task queue model is widely used for parallel programming and is well suited for dynamically changing environments [20]. In contrast to NUMA ....
.... benefit very little from sharing-based placement policies [19, 7]. However, previous work has also shown that applications based on a task queue model can show appreciable performance gains from policies that place tasks near the data they reference or near other tasks that reference the same data [6, 15]. We have therefore restricted our work to applications conforming to a task queue model. Using a task queue model in a DSM or COMA environment raises several questions that previous work has not addressed: 1. Do task scheduling policies affect an application's data-sharing patterns? Do some task ....
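A policy that places a task near the data it references can be sketched under the fixed-home-node model described earlier (pages have a fixed home node and are never replicated). The heuristic below is a hypothetical illustration, not the policy of [6] or [15]: the names `place_near_data` and `remote_fraction` are made up, and it simply runs the task on the node that is home to the largest number of its referenced pages.

```python
from collections import Counter
from typing import Dict, List


def place_near_data(pages: List[int], home: Dict[int, int], default_node: int = 0) -> int:
    """Hypothetical heuristic: pick the node that is home to most of a task's pages."""
    counts = Counter(home[pg] for pg in pages)
    if not counts:
        return default_node
    # Ties go to the node encountered first in `pages` (Counter.most_common ordering).
    return counts.most_common(1)[0][0]


def remote_fraction(pages: List[int], home: Dict[int, int], node: int) -> float:
    """Fraction of the task's page references that are remote if it runs on `node`."""
    if not pages:
        return 0.0
    return sum(home[pg] != node for pg in pages) / len(pages)


# Example: pages 0-1 live on node 0, pages 2-4 on node 1.
home = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}
task_pages = [2, 3, 4, 0]
best = place_near_data(task_pages, home)
print(best, remote_fraction(task_pages, home, best))  # 1 0.25
```

Counting page references ignores how often each page is touched and how costly each remote access is; a weighted count would be a natural refinement if such information were available.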
Evangelos Markatos. Scheduling for locality in shared-memory multiprocessors. Technical Report 457, University of Rochester, May 1993.
.... than one application on the same processor (across applications), related work at the University of Rochester concentrates on the benefits of reusing cached data when executing lightweight threads of the same application on the same processor (within applications) [Markatos1992] [Markatos1992a] [Markatos1993]. The Rochester work also extends the notion of locality management to include one more level in the memory hierarchy by considering systems that may have local cache, local memory, and remote memory, such as the BBN TC2000. Markatos [Markatos1993] first demonstrates that fine-grain parallel programs, because of the overhead required to load data into the local cache or memory, typically perform much worse than coarse-grain implementations even though the cost of thread management is negligible. This motivates the need for techniques that ....
E. P. Markatos, Scheduling for Locality in Shared-Memory Multiprocessors, Ph.D. Thesis, Department of Computer Science, University of Rochester, Rochester, New York, May, 1993.
....of the same application on the same processor (within applications). The Rochester work also extends the notion of locality management to include one more level in the memory hierarchy by considering systems that may have local cache, local memory, and remote memory, such as the BBN TC2000. Markatos [7] first demonstrates that fine-grain parallel programs, because of the overhead required to load data into the local cache or memory, typically perform much worse than coarse-grain implementations even though the cost of thread management is negligible. This motivates the need for techniques that ....
E. P. Markatos, Scheduling for Locality in Shared-Memory Multiprocessors, Ph.D. Thesis, Department of Computer Science, University of Rochester, Rochester, New York, May, 1993.
No context found.
Markatos, E. Scheduling for Locality in Shared Memory Multiprocessors. Ph.D. thesis, University of Rochester, 1993.
No context found.
Evangelos Markatos. Scheduling for Locality in Shared-Memory Multiprocessors. PhD thesis, Department of Computer Science, University of Rochester, 1993.
No context found.
Evangelos Markatos. Scheduling for Locality in Shared-Memory Multiprocessors. PhD thesis, Department of Computer Science, University of Rochester, 1993.
CiteSeer.IST - Copyright Penn State and NEC