Michael Wan, Reagan Moore, George Kremenek, and Ken Steube. A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm. In Proceedings of the 10th International Parallel Processing Symposium, pages 29--40, 1996.


This paper is cited in the following contexts:
Theory and Practice in Parallel Job Scheduling - Feitelson, Rudolph.. (1994)   (60 citations)

....satisfaction was greatly increased since smaller jobs tended to get through faster, because they could bypass the very big ones. Henderson [35] describes the Portable Batch System (PBS), another system in which performance gains are achieved by moving away from strict FCFS scheduling. Wan et al. [92] also implement a non-FCFS scheduler that uses a variation of a 2-D buddy system to do processor allocation for the Intel Paragon. Thread-oriented scheduling. Nelson, Towsley, and Tantawi [57] compare four cases in which parallel jobs are scheduled in either a centralized or decentralized fashion, ....

....and reflects directly on the degree to which large investments in parallel hardware are used efficiently. Throughput figures are hardly ever used. Reported utilization figures vary from 50% for the NASA Ames iPSC/860 hypercube [21], through around 70% for the CTC SP2 [37], 74% for the SDSC Paragon [92], and 80% for the Touchstone Delta [54], up to more than 90% for the LLNL Cray T3D [20]. Utilization figures in the 80-90% range are now becoming more common, due to the use of more elaborate batch queueing mechanisms [49,79,92] and gang scheduling [20]. These figures seem to leave only little ....

[Article contains additional citation context not shown here]

M. Wan, R. Moore, G. Kremenek, and K. Steube, "A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm". In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 48--64, Springer-Verlag, 1996. Lecture Notes in Computer Science Vol. 1162.


Implementing Multiprocessor Scheduling Disciplines - Parsons, Sevcik   (16 citations)

....and low system utilizations. On the other hand, most research results support the need for both preemption and mechanisms for adjusting processor allocations of jobs. Given that a number of high-performance computing centers have begun to develop their own scheduling software [Hen95,Lif95,SCZL96,WMKS96], it is clear that existing commercial scheduling software is often inadequate. To support these centers, however, mechanisms to extend existing systems with external (customer-provided) policies are starting to become available in commercial software [SCZL96]. This allows new scheduling policies ....

Michael Wan, Reagan Moore, George Kremenek, and Ken Steube. A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm. In Dror G. Feitelson and Larry Rudolph, editors, Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science Vol. 1162, pages 48--64. Springer-Verlag, 1996.


Parallel Job Scheduling on Heterogeneous Networks of.. - Lynch

....Hotovy [Hot96] reports a two-phase distribution of job duration. Jobs using from 2 to 16 processors had decreasing runtimes associated with more processors, but jobs using more than 16 processors tended to use more resources as their parallelism increased. Many studies [FN95, SSG95, Hot96, SGS96, WMKS96] have also noted the large numbers of small, short jobs, often over half the total number of jobs. As well, they have all found that these small jobs consume a small fraction of the total resources, e.g. node-seconds or CPU cycles. Downey [Dow97] proposes a model where the sequential lifetime ....

Michael Wan, Reagan Moore, George Kremenek, and Ken Steube. A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm. In Dror G. Feitelson and Larry Rudolph, editors, Job Scheduling Strategies for Parallel Processing, pages 48--64. Springer-Verlag, 1996. Lect. Notes Comput. Sci. vol. 1162.


The Workload on Parallel Supercomputers: Modeling the.. - Lublin, Feitelson (2001)   (7 citations)

....Archive [19]. The first log is that from the San Diego Supercomputer Center Intel Paragon machine. This machine has 416 nodes, of which 352 constitute the batch partition and 48 are in the interactive partition. Scheduling is based on NQS, with special handling of partitions and node configurations [26]. The log spans all of 1995 and 1996, with about two thirds of the jobs in the first year; the total is 113515 successful jobs. This log was originally available as two separate logs for the two years, and therefore sometimes appears so in our analysis (as SDSC95 and SDSC96). The second log is ....

....possibility that the workload evolves over time. 8.1 Distinction between Interactive and Batch Jobs. The SDSC95 and SDSC96 logs include a distinction between interactive jobs, which are submitted directly to the machine's scheduler, and batch jobs that are handled by the NQS batch queueing system [26]. This enables us to repeat all the analysis of the previous sections on each subset of jobs separately. [Figure: SDSC95 pdfs of log consume time and log runtime; series: all, batch, interactive] ....

M. Wan, R. Moore, G. Kremenek, and K. Steube, "A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm". In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 48-64, Springer-Verlag, 1996. Lect. Notes Comput. Sci. vol. 1162.


Predicting Queue Times on Space-Sharing Parallel Computers - Downey (1997)   (28 citations)

....have reported lifetime distributions for parallel scientific applications, but none discuss the shape of the lifetime distribution or use it to develop a workload model. Hotovy et al. [10], [11] describe the workload on the IBM SP2 at the Cornell Theory Center. Steube and Moore [14] and Wan et al. [15] describe the workload and scheduling policy on the Intel Paragon at the San Diego Supercomputer Center; in [14] the distribution of lifetimes for batch jobs appears to fit the uniform-log model proposed here (for jobs less than six hours in duration). Feitelson and Nitzberg [6] describe the ....

M. Wan, R. Moore, G. Kremenek, and K. Steube. A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm. In D. G. Feitelson and L. Rudolph, editors, Job Scheduling Strategies for Parallel Processing, Springer-Verlag LNCS Vol 1162, pages 48--64, April 1996.


Modeling the Effects of Contention on Application Performance in .. - Figueira (1997)   (5 citations)

....used executes O(p^2) steps, each consisting of five memory accesses and seven arithmetic operations. [Footnote 5: The number of applications executing on a computer is generally limited by the computer's resources and is typically small (e.g. 7.7 on average in the Intel Paragon at SDSC [50]).] [Garbled equation residue involving delay_comm, pp_i, and pm_i omitted] Figure 3 5 shows the modeled and measured times for executing the SOR benchmark on the Sun in non-dedicated mode, parameterized by problem size (which is NN). In this experiment, two more applications are executing on the ....

M. Wan, R. Moore, G. Kremenek, K. Steube, "A Batch Scheduler for the Intel Paragon with a Non-Contiguous Node Allocation Algorithm", in Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, pp. 29-40, April 1996.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

....use. It is safe to speculate that this will also be the primary use of parallel workstations. Other systems attempt to combine interactive and batch usage modes. For example, it is possible to partition the machine statically into a large batch partition and a smaller interactive partition [400, 422, 606]. Alternatively, it is possible to initially allocate all the resources to batch processing, and preempt PEs and memory in favor of interactive jobs as needed. This leads to the notion of adaptive resource allocation for the batch jobs, using the same techniques as for interactive jobs (e.g. time ....

....does not become a bottleneck. Another approach is to coordinate the PE allocation using a hierarchical structure. For example, a buddy system can be used where a partition can be composed of a number of blocks from different levels, such that the sum of the allocated PEs is the requested number [375, 606]. Alternatively, one can use a virtual hierarchy of control on the PEs themselves. Such a structure forms the basis of the so-called wave scheduling mechanism developed for the MICROS distributed operating system [595, 596]. It guarantees that the allocated PEs are close to each other in the ....
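
As an illustration of the idea quoted in the excerpt above (a partition composed of blocks from different buddy levels whose sizes sum to the requested number of PEs), the following minimal Python sketch splits a request into power-of-two block sizes. It is not taken from Wan et al. or from the citing paper. The function name `split_into_buddy_blocks` is hypothetical, and the sketch assumes at most one block per level and ignores which blocks are actually free on the machine.

```python
def split_into_buddy_blocks(requested: int) -> list[int]:
    """Return power-of-two block sizes, largest first, that sum to `requested`.

    Illustrative assumption: each buddy level contributes at most one block,
    so the result is simply the binary decomposition of the request.
    """
    if requested <= 0:
        raise ValueError("requested must be a positive number of PEs")

    blocks = []
    size = 1  # block size at the current buddy level (1, 2, 4, ...)
    while requested:
        if requested & 1:      # this level contributes one block
            blocks.append(size)
        requested >>= 1        # move up one buddy level
        size <<= 1
    return blocks[::-1]        # largest block first


if __name__ == "__main__":
    # A request for 13 PEs is covered by blocks of 8, 4, and 1 PEs.
    print(split_into_buddy_blocks(13))
```

A real allocator would also have to consult per-level free lists and split larger free blocks when a level is empty; that bookkeeping is left out here to keep the decomposition step visible.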

[Article contains additional citation context not shown here]

M. Wan, R. Moore, G. Kremenek, and K. Steube, "A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm". In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 48--64, Springer-Verlag, 1996. Lecture Notes in Computer Science Vol. 1162.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

.... allocations are also squares with sides that are powers of two, leading to significant internal fragmentation [219, 218]. These problems are solved by using the buddy system only to identify free submeshes, and allocating a number of free submeshes of different sizes to satisfy each request [224, 352]. Such a scheme is used by NQS to pack batch jobs on the Intel Paragon. The price is that the allocation is not necessarily a rectangle, and may even be non-contiguous. Another interesting modification is to use a precise buddy system, in which buddy sizes are not predefined powers of two, but ....

....of service demands is wide. In this case preemption should be used, because there is a good chance that new jobs will be shorter than those currently in the system [292, 276]. As measurements of actual workloads have repeatedly shown that job service demands have a large coefficient of variation [118, 115, 359, 171, 352], the conclusion is that time slicing will lead to reduced response times. 55 Fairness and accounting. A quest for fairness can be interpreted as implying that all jobs receive equal service. For example, if one job has more threads than another, each of its threads will run for less time. But ....

[Article contains additional citation context not shown here]

M. Wan, R. Moore, G. Kremenek, and K. Steube, "A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm". In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 48--64, Springer-Verlag, 1996. Lecture Notes in Computer Science Vol. 1162.


Improved Utilization and Responsiveness with Gang Scheduling - Feitelson, Jette (1997)   (54 citations)

.... has a distinct set of resource limits associated with it). The idea is that the user would choose the queue that best represents the application's needs, and the system would then be able to select jobs from the different queues to create a job mix that uses the system's resources effectively [31]. However, experience indicates that this information is unreliable, as shown by the distributions of queue time utilization in Fig. 3. The graphs show that users tend to be extremely sloppy in selecting the queue, thus undermining the whole scheme. The graphs show the distributions in buckets of ....

....while small jobs are allowed up to 4 hours provided at least 32 nodes are available. Thus, if only a few nodes are available, all jobs are restricted to 10 minutes, and responsiveness for short jobs is improved. This achieves a similar effect to setting aside a pool of nodes for interactive jobs [31]. During non-prime time these restrictions are removed. Again, we assume the scheduler knows the runtimes of all jobs. Gang: gang scheduling with no information regarding runtimes. The jobs are packed into slots using the buddy scheme, including alternate scheduling [4]. Two versions with ....

M. Wan, R. Moore, G. Kremenek, and K. Steube, "A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm". In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 48--64, Springer-Verlag, 1996. Lecture Notes in Computer Science Vol. 1162.


Theory and Practice in Parallel Job Scheduling - Feitelson, Rudolph.. (1997)   (60 citations)

....will complete. Henderson describes the Portable Batch System (PBS), another system in which performance gains are achieved by moving away from strict FCFS scheduling [36]. Wan et al. developed a scheduler that uses a variation of a 2-D buddy system to do processor allocation for the Intel Paragon [91]. 2.2.3 Thread-oriented scheduling. Nelson, Towsley, and Tantawi compared four cases in which parallel jobs were scheduled in either a centralized or decentralized fashion, and the threads of a job were either spread across all processors or were all executed on one processor [58]. They found ....

....and reflects directly on the degree to which large investments in parallel hardware are used efficiently. Throughput figures are hardly ever used. Reported utilization figures vary from 50% for the NASA Ames iPSC/860 hypercube [22], through around 70% for the CTC SP2 [38], 74% for the SDSC Paragon [91], and 80% for the Touchstone Delta [55], up to more than 90% for the LLNL Cray T3D [21]. Utilization figures in the 80-90% range are now becoming more common, due to the use of more elaborate batch queueing mechanisms [50, 80, 91] and gang scheduling [21]. These figures seem to leave only little ....

[Article contains additional citation context not shown here]

M. Wan, R. Moore, G. Kremenek, and K. Steube, "A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm". In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 48--64, Springer-Verlag, 1996. Lecture Notes in Computer Science Vol. 1162.


Modeling the Effects of Contention on the Performance of.. - Figueira, al. (1996)   (10 citations)

....are two additional applications executing on the Sun which alternate computation and communication cycles. The contending applications communicate 25% and 76% of the time, .... [Footnote 1: The number of applications executing on a computer is generally limited by the computer's resources and typically small, e.g. see [18].] [Figure and page-header residue (pcomp_i, pcomm_i, delay_comp_i, delay_comm_i values; HPDC 96) omitted]

M. Wan, R. Moore, G. Kremenek, and K. Steube, "A Batch Scheduler for the Intel Paragon with a Non-contiguous Node Allocation Algorithm", in Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, pp. 29-40, April 1996.


Improved Utilization and Responsiveness with Gang Scheduling - Feitelson, Jette (1997)   (54 citations)

.... has a distinct set of resource limits associated with it). The idea is that the user would choose the queue that best represents the application's needs, and the system would then be able to select jobs from the different queues to create a job mix that uses the system's resources effectively [31]. However, experience indicates that this information is unreliable, as shown by the distributions of queue time utilization in Fig. 3. The graphs show that users tend to be extremely sloppy in selecting the queue, thus undermining the whole scheme. The graphs show the distributions in buckets of ....

....while small jobs are allowed up to 4 hours provided at least 32 nodes are available. Thus, if only a few nodes are available, all jobs are restricted to 10 minutes, and responsiveness for short jobs is improved. This achieves a similar effect to setting aside a pool of nodes for interactive jobs [31]. During non-prime time these restrictions .... [Footnote 1: Our workload model does not include a daily cycle of job submittals; it is a continuous stream of jobs with the same statistical properties. Thus in our simulations the distinction is only in the scheduling policy, which is switched every 12 hours.]

M. Wan, R. Moore, G. Kremenek, and K. Steube, "A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm". In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 48--64, Springer-Verlag, 1996. Lecture Notes in Computer Science Vol. 1162.


Knowledge Of Characteristics In Multiprogrammed Multiprocessor.. - Parsons (1997)

.... response times measured in hours (despite median job lengths of only minutes) are considered to be common [Hot96a]. Given the few available choices, high-performance computing centers have turned to implementing their own scheduling software to meet the needs of their users [Hen95, Lif95, SCZL96, WMKS96]. Commercial scheduling software companies have responded to this need by providing mechanisms allowing external (customer-provided) policies to be implemented on top of the existing software base [SCZL96]. In this section, the implementation of a variety of fully functional scheduling ....

Michael Wan, Reagan Moore, George Kremenek, and Ken Steube. A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm. In Dror G. Feitelson and Larry Rudolph, editors, Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science Vol. 1162, pages 48--64. Springer-Verlag, 1996.


Predicting Queue Times On Space-Sharing Parallel Computers - Downey (1996)

No context found.

Michael Wan, Reagan Moore, George Kremenek, and Ken Steube. A batch scheduler for the Intel Paragon with a non-contiguous node allocation algorithm. In Proceedings of the 10th International Parallel Processing Symposium, pages 29--40, 1996.
