A Historical Application Profiler for Use by Parallel Schedulers
- In Job Scheduling Strategies for Parallel Processing
, 1997
Cited by 82 (0 self)
Scheduling algorithms that use application and system knowledge have been shown to be more effective at scheduling parallel jobs on a multiprocessor than algorithms that do not. This paper focuses on obtaining such information for use by a scheduler in a network of workstations environment. The log files from three parallel systems are examined to determine both how to categorize parallel jobs for storage in a job database and what job information would be useful to a scheduler. A Historical Profiler is proposed that stores information about programs and users, and manipulates this information to provide schedulers with execution time predictions. Several preemptive and non-preemptive versions of the FCFS, EASY and Least Work First scheduling algorithms are compared to evaluate the utility of the profiler. It is found that both preemption and the use of application execution time predictions obtained from the Historical Profiler lead to improved performance.
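As a rough illustration of the idea in this abstract, the following Python sketch stores past run times per (user, program) pair and uses their mean as an execution time prediction, which a Least Work First style ordering can then consume. The class name, the keying scheme, the use of a mean, and the definition of work as predicted run time multiplied by processor count are all assumptions made for illustration; the abstract does not say how the paper's Historical Profiler computes its predictions.

```python
from collections import defaultdict
from statistics import mean


class HistoricalProfiler:
    """Hypothetical sketch: keeps past run times per (user, program) pair."""

    def __init__(self):
        self._history = defaultdict(list)

    def record(self, user, program, runtime_seconds):
        """Store one observed run time for this user and program."""
        self._history[(user, program)].append(runtime_seconds)

    def predict(self, user, program, default=None):
        """Return the mean of past run times, or `default` if none exist."""
        runs = self._history.get((user, program))
        return mean(runs) if runs else default


def least_work_first(jobs, profiler):
    """Order (user, program, processors) jobs by predicted work, smallest first.

    Work is assumed to be predicted run time times processor count.
    Jobs with no recorded history are placed last.
    """
    def predicted_work(job):
        user, program, processors = job
        runtime = profiler.predict(user, program)
        return float("inf") if runtime is None else runtime * processors

    return sorted(jobs, key=predicted_work)
```

A scheduler could call `record` when a job finishes and `predict` when a new job from the same user and program arrives; jobs without history fall back to whatever default the caller chooses.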
Preemptive scheduling of parallel jobs on multiprocessors
- In SODA
, 1996
Cited by 44 (3 self)
We study the problem of processor scheduling for n parallel jobs, applying the method of competitive analysis. We prove that for jobs with a single phase of parallelism, a preemptive scheduling algorithm without information about job execution time can achieve a mean completion time within 2 − 2/(n+1) times the optimum. In other words, we prove a competitive ratio of 2 − 2/(n+1). The result is extended to jobs with multiple phases of parallelism (which can be used to model jobs with sublinear speedup) and to interactive jobs (with phases during which the job has no CPU requirements) to derive solutions guaranteed to be within 4 − 4/(n+1) times the optimum. In comparison with previous work, our assumption that job execution times are unknown prior to their completion is more realistic, our multiphased job model is more general, and our approximation ratio (for jobs with a single phase of parallelism) is tighter and cannot be improved. While this work presents theoretical results obtained using competitive analysis, we believe that the results provide insight into the performance of practical multiprocessor scheduling algorithms that operate in the absence of complete information.
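To give a feel for the bounds quoted in this abstract, the sketch below evaluates them exactly for a given number of jobs n, assuming the single-phase ratio is 2 − 2/(n+1) and the multi-phase ratio is 4 − 4/(n+1). The function names are hypothetical and only the closed-form expressions are computed; no scheduling algorithm is implemented here.

```python
from fractions import Fraction


def single_phase_ratio(n):
    """Assumed bound 2 - 2/(n+1) for n jobs with a single phase of parallelism."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2 - Fraction(2, n + 1)


def multi_phase_ratio(n):
    """Assumed bound 4 - 4/(n+1) for n multi-phase or interactive jobs."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 4 - Fraction(4, n + 1)
```

With n = 1 the single-phase bound equals 1, and as n grows the two bounds approach 2 and 4 respectively without reaching them.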
Adaptive Work Stealing with Parallelism Feedback
Cited by 38 (8 self)
We present an adaptive work-stealing thread scheduler, A-STEAL, for fork-join multithreaded jobs, like those written using the Cilk multithreaded language or the Hood work-stealing library. The A-STEAL algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. A-STEAL provides continual parallelism feedback to a job scheduler in the form of processor requests, and the job must adapt its execution to the processors allotted to it. Assuming that the job scheduler never allots any job more processors than requested by the job's thread scheduler, A-STEAL guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors. Our analysis models the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the system environment and the job scheduler's administrative policies. We analyze the performance of A-STEAL using "trim analysis," which allows us to prove that our thread scheduler performs poorly on at most a small number of time steps, while exhibiting near-optimal behavior on the vast majority. To be precise, suppose that a job has work T1 and critical-path length T∞. On a machine with P processors, A-STEAL completes the job in expected O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all but ...
Using Parallel Program Characteristics in Dynamic Processor Allocation Policies
- Performance Evaluation
, 1996
Cited by 28 (2 self)
In multiprocessors, a parallel program's execution time is directly influenced by the number of processors it is allocated. The problem of scheduling parallel programs in a multiprogrammed environment becomes one of determining how to best allocate processors to the different simultaneously executing programs in order to minimize mean response time. In this paper we address the problem of how many processors to allocate to each of the executing parallel jobs by examining the following questions: 1. Is allocating processors equally among all jobs (equipartitioning) a desirable property of a scheduling algorithm? 2. Does using information about the service demand of parallel jobs significantly reduce mean response time? 3. Does using information about the efficiency with which parallel jobs execute significantly reduce mean response time? 4. Does allocating each job a number of processors corresponding to the knee of the execution time-efficiency curve significantly reduc...
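Equipartitioning, the baseline raised in the first question of this abstract, can be sketched in a few lines of Python. The sketch assumes processors are indivisible and that any leftover processors go to the jobs listed first; both choices, and the function name, are illustrative assumptions rather than details taken from the paper.

```python
def equipartition(total_processors, job_ids):
    """Split processors as evenly as possible among the given jobs.

    Returns a dict mapping each job id to its processor count. When the
    division is not exact, the first `extra` jobs receive one more processor.
    """
    if not job_ids:
        return {}
    base, extra = divmod(total_processors, len(job_ids))
    return {
        job: base + (1 if index < extra else 0)
        for index, job in enumerate(job_ids)
    }
```

For example, `equipartition(10, ["a", "b", "c"])` gives job "a" four processors and jobs "b" and "c" three each. A dynamic policy would recompute this allocation whenever a job arrives or departs.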
Performance-Driven Processor Allocation
, 2000
Cited by 27 (5 self)
This work focuses on processor allocation in shared-memory multiprocessor systems, where no knowledge of the application is available when applications are submitted. We perform the processor allocation taking into account the characteristics of the application measured at run-time. We want to demonstrate the importance of an accurate performance analysis and the criteria used to distribute the processors. With this aim, we present the SelfAnalyzer, an approach to dynamically analyze the performance of applications (speedup and execution time), and the Performance-Driven Processor Allocation (PDPA), a new scheduling policy which distributes processors considering both the global conditions of the system and the particular characteristics of running applications. This work also defends the importance of the interaction between the medium-term and the long-term scheduler to control the multiprogramming level in the case of the clairvoyant scheduling policies. We have implemented our proposal on an SGI Origin2000 with 64 processors and compared its performance with that of some scheduling policies proposed so far and with the native IRIX scheduling policy. Results show that the combination of the SelfAnalyzer+PDPA with the medium/long-term scheduling interaction outperforms the rest of the scheduling policies evaluated. The evaluation shows that in workloads where a simple equipartition performs well, the PDPA also performs well, and in extreme workloads where all the applications perform poorly, our proposal can achieve a speedup of 3.9 with respect to an equipartition and 11.8 with respect to the native IRIX scheduling policy.
Implementing Multiprocessor Scheduling Disciplines
- In Proceedings of IPPS/SPDP ’97 Workshop. Lecture Notes in Computer Science
Cited by 24 (0 self)
An important issue in multiprogrammed multiprocessor systems is the scheduling of parallel jobs. Consequently, there has been a considerable amount of analytic research in this area recently. A frequent criticism, however, is that proposed disciplines that are studied analytically are rarely ever implemented and even more rarely incorporated into commercial scheduling software. In this paper, we seek to bridge this gap by describing how at least one commercial scheduling system, namely Platform Computing's Load Sharing Facility, can be extended to support a wide variety of new scheduling disciplines.
Dynamic vs. Static Quantum-Based Parallel Processor Allocation
- In JSSPP
, 1996
Cited by 22 (0 self)
This paper improves upon previous synthetic workload models and compares the performance of dynamic spatial equipartitioning (EQS) and the semi-static quantum-based FB-PWS processor allocation defined in [23], under synthetic workloads that have not previously been considered.
Parallel Application Scheduling on Networks of Workstations
- Journal of Parallel and Distributed Computing
, 1997
Cited by 18 (1 self)
Parallel applications can be executed using the idle computing capacity of workstation clusters. However, it remains unclear how to most effectively schedule the processors among different applications. Processor scheduling algorithms that were successful for shared-memory machines have proven to be inadequate for distributed memory environments due to the high costs of remote memory accesses and redistributing data. We investigate how knowledge of system load and application characteristics can be used in scheduling decisions. We propose the new algorithm AEP(2) which, by properly exploiting both types of information, performs better than other non-preemptive scheduling rules, and nearly as well as idealized versions of preemptive rules (with free preemption). We conclude that AEP(2) is suitable for use in scheduling parallel applications on networks of workstations.
Benefits of Speedup Knowledge in Memory-Constrained Multiprocessor Scheduling
- Performance Evaluation
, 1996
Parallel Application Characterization for Multiprocessor Scheduling Policy Design
- Lecture Notes in Computer Science
, 1996
Cited by 14 (1 self)
Much of the recent work on multiprocessor scheduling disciplines has used abstract workload models to explore the fundamental, high-level properties of the various alternatives. As continuing work on these policies increases their level of sophistication, however, it is clear that the choice of appropriate policies must be guided at least in part by the typical behavior of actual parallel applications. Our goal in this paper is to examine a variety of such applications, providing measurements of properties relevant to scheduling policy design. We give measurements for both hand-coded parallel programs (from the SPLASH benchmark suites) and compiler-parallelized programs (from the PERFECT Club suite) running on a KSR-2 shared-memory multiprocessor. The measurements we present are intended primarily to address two aspects of multiprocessor scheduling policy design: -- In the spectrum between aggressively dynamic and static allocation policies, what is an appropriate choice for the rat...