B. Gorda and R. Wolski. Time Sharing Massively Parallel Machines. In International Conference on Parallel Processing, volume II, pages 214--217, August 1995.
....evidence exists. Theory suggests that preemption be used to ensure good response times for small jobs [64], especially since workloads have a high variability in computational requirements [21]. This comes close on the heels of actual systems that implement gang scheduling for just this reason [46,32,27,20]. Actually, two metrics may be used to gauge the responsiveness of a system: the actual response time (or turnaround time, i.e. the time from submittal to termination) or the slowdown (the ratio of the response time on a loaded system to the response time on a dedicated system). Using actual ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, vol. II, pp. 214--217, Aug 1995.
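The two responsiveness metrics defined in the excerpt above (response time and slowdown) can be illustrated with a minimal sketch. The `Job` record and its field names are hypothetical and chosen only for this illustration; they do not come from any of the cited papers.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions for illustration."""
    submit: float          # time the job was submitted
    finish: float          # time the job terminated
    dedicated_time: float  # response time the job would have on a dedicated system


def response_time(job: Job) -> float:
    """Turnaround time: time from submittal to termination."""
    return job.finish - job.submit


def slowdown(job: Job) -> float:
    """Ratio of response time on the loaded system to that on a dedicated system."""
    if job.dedicated_time <= 0:
        raise ValueError("dedicated_time must be positive")
    return response_time(job) / job.dedicated_time


# Both jobs wait 90 time units in a queue before running without interruption.
small = Job(submit=0.0, finish=100.0, dedicated_time=10.0)
large = Job(submit=0.0, finish=1090.0, dedicated_time=1000.0)

for name, job in (("small", small), ("large", large)):
    print(name, response_time(job), slowdown(job))
```

With identical queueing delay, the small job's slowdown is far larger than the large job's, which is why slowdown is the more sensitive metric for short jobs.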
....commercial machines such as the CM-5 from Thinking Machines [29], the Intel Paragon [25], the SGI multiprocessors running IRIX [1], the Meiko CS-2 [14], the Alliant FX/8 [41], and the MasPar and DAP SIMD arrays. Gang scheduling has also been used in a production system on a BBN Butterfly at LLNL [20], which .... Parts of this research have been presented at conferences [18, 19]. Part of this work was done while at the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598. Supported in part by grants from the Alexander Silberman Hebrew University Foundation for Applied Science and ....
Gorda, B., and Wolski, R. Time sharing massively parallel machines. Intl. Conf. Parallel Processing, Aug. 1995.
....1) Table 1. Gang schedulers on distributed memory parallel machines

Scheduler                      Platform             Comm. Level  Preemptive  Hardware Support
(anonymous) [FR92]             Makbilan             OS           Yes         Yes
CMOST [Thi92]                  TMC CM-5             User         Yes         Yes
Medusa [OSS80,Ous82]           Cm*                  OS           Yes         Yes
Meiko CS-2                     Meiko CS-2           OS           Yes         N/A
MPCI GangScheduler [GW95]      BBN TC2000           N/A          No          Yes
OSF/1 AD [ZRB+93]              Intel Paragon        OS           Yes         No
PScheD [LG97]                  Cray T3E             User         Yes         N/A
SCore-D [HTI+96,HTI97b]        Workstation Cluster  User         Yes         No
SHARE [FPR96]                  IBM SP-2             User         No          No

In this paper, parallel process is defined as a set of UNIX processes that are execution ....
Brent Gorda and Rich Wolski. Time Sharing Massively Parallel Machines. In
....and greater variation in results, apparently due to difficulties faced by the class scheduler in managing far more runnable threads than processors. 3 Gang Scheduler Design The gang scheduler developed by LLNL for Digital clusters is an evolution of earlier ones developed for the BBN TC2000 [7, 8] and Cray T3D [6, 10, 11] systems. Both implementations were very successful at adding a time sharing capability ....

Table 1. Speedup achieved with gang scheduling on a busy computer

Thread Count  Speedup  Percent Efficiency
1             1.000    100.0
2             1.983    99.1
3             2.998    99.9
4             3.989    99.7
5             4.992    99.8
6             ....
Gorda, B. and Wolski, R.: Timesharing massively parallel machines. International Conference on Parallel Processing, volume II, (Aug 1995) 214--217.
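The "Percent Efficiency" column in the speedup table quoted in the excerpt above appears to follow the usual definition of parallel efficiency, speedup divided by thread count, expressed as a percentage. The excerpt does not state the formula, so that definition is an assumption here; the sketch below only checks that the quoted rows are consistent with it to within 0.1 percentage points.

```python
# Rows quoted in the excerpt: (thread count, speedup, percent efficiency).
QUOTED_ROWS = [
    (1, 1.000, 100.0),
    (2, 1.983, 99.1),
    (3, 2.998, 99.9),
    (4, 3.989, 99.7),
    (5, 4.992, 99.8),
]


def percent_efficiency(threads: int, speedup: float) -> float:
    """Assumed definition: efficiency = speedup / threads, as a percentage."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return 100.0 * speedup / threads


def consistent(rows, tolerance: float = 0.1) -> bool:
    """True if every quoted efficiency is within `tolerance` of the computed one."""
    return all(
        abs(percent_efficiency(threads, speedup) - quoted) <= tolerance
        for threads, speedup, quoted in rows
    )


print(consistent(QUOTED_ROWS))
```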
....commercial machines such as the CM-5 from Thinking Machines [29], the Intel Paragon [25], the SGI multiprocessors running IRIX [1], the Meiko CS-2 [14], the Alliant FX/8 [41], and the MasPar and DAP SIMD arrays. Gang scheduling has also been used in a production system on a BBN Butterfly at LLNL [20], which is now being ported to a new Cray T3D machine, and several other experimental systems [13, 39, 4, 17, 6]. At first blush, it might appear that gang scheduling is a luxury that may not be worth the price. An optimal packing of gangs that gives minimal wasted processors is an NP-complete ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, Aug 1995.
....same time. Gang scheduling is a prominent feature of the Connection Machine CM-5 system [28] and is available on the Intel Paragon [17], the Meiko CS-2, and multiprocessor SGI workstations [2]. It has also been used extensively in a home-grown system on a BBN Butterfly at Lawrence Livermore Labs [13], which has recently been ported to their new Cray T3D system. The main drawback of using gang scheduling is the problem of fragmentation. Specifically, it may happen that a number of jobs are scheduled to run, and a few PEs are left over, but they are insufficient for any of the other queued ....
.... Argonne Natl Lab; 15654 were parallel; submit trace, not run trace
400-node Paragon; 32500 jobs, 12/94-4/95; SDSC Intel scheduler [29, 17]; San Diego SC; 25867 were parallel
126-node Butterfly; 35848 jobs, 1991-1992; home-grown gang scheduler [14]; LLNL; 30000 were parallel; no direct access to trace [13]
512-node IBM SP2; 17947 jobs, 9/95-11/95; Scheduling by IBM LoadLeveler; Cornell Theory Ctr; 8598 were parallel; no direct access to trace [15]
96-node Paragon; 1723 jobs; Intel scheduler; ETH Zurich; no direct access to trace [27]
Table 1. Summary of systems and traces used in workload analysis. ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, Aug 1995.
.... algorithms developed by Ousterhout fall into this category [438]. The simplest is the matrix algorithm, which was implemented in the Medusa operating system on Cm* [439], in the Meiko CS-2 operating system, in the gang scheduler used for the BBN Butterfly at Lawrence Livermore National Lab [241, 240], in the experimental gang scheduling runtime library of MAXI, the Makbilan operating system [198], in the SHARE scheduler for the IBM SP2 [217], and in the DQT scheme designed for the RWC1 machine .... 21 This terminology is motivated by the common graphical rendering of a CM-5, where the PEs are ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, vol. II, pp. 214--217, Aug 1995.
.... algorithms developed by Ousterhout fall into this category [268]. The simplest is the matrix algorithm, which was implemented in the Medusa operating system on Cm* [269], in the Meiko CS-2 operating system, in the gang scheduler used for the BBN Butterfly at Lawrence Livermore National Lab [145, 144], in the experimental gang scheduling runtime library of MAXI, the Makbilan operating system [122], in the SHARE scheduler for the IBM SP2 [130], and in the DQT scheme designed for the RWC1 machine [169, 168]. The idea of this algorithm is to view scheduling space as a matrix, where columns ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, vol. II, pp. 214--217, Aug 1995.
....without assuming knowledge about the workload. It has therefore enjoyed considerable popularity among vendors, at least in the form of hype (all vendors claim to support some form of gang scheduling), but good implementations also exist. There have been a number of experimental implementations [38,15,12,54,22,25] that demonstrate its usefulness. Academically speaking, gang scheduling has repeatedly been shown to be inferior to dynamic partitioning (see below), but only by a small margin [23,8]. The main drawbacks cited are interference with cache state, and possible loss of resources to fragmentation. As ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, Aug 1995.
....Indeed, gang scheduling can actually give better service (reduced response time) and improved utilization, so using it leads to a win-win situation relative to variable partitioning. The results agree with actual experience on the LLNL Cray T3D, which employs a home-grown gang scheduler [12,17] (the original system software uses variable partitioning). When this scheduler was ported to the new Cray machine, utilization nearly doubled from 33.4% to 60.9% on average. Additional tuning has led to weekly utilizations that top 96%. 2 Approaches to Scheduling Jobs of Given Size The ....
....original version of this Gang Scheduler was developed for the BBN TC2000 computer. The BBN computer permitted programs to be assigned processors without locality constraints. Its timesharing through shared memory and paging was successful at providing both excellent interactivity and utilization [13,12]. 4.2 Policy Overview The T3D Gang Scheduler allocates processors and barrier circuits for all programs. In order to satisfy the diverse computational requirements of our clients, the programs are classified by access requirements: • Interactive class jobs require responsive service • Debug ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, vol. II, pp. 214--217, Aug 1995.
....parallel computing at LLNL, a BBN TC2000 with 126 processors was acquired in 1989. The TC2000 has a shared memory architecture and originally supported space sharing only. The gang scheduler for the TC2000 reserves all resources at system startup and controls all resource scheduling from that time [6, 7]. User programs require no code changes to communicate with the gang scheduler. However, the program must load with a modified version of the mandatory parallel program initiation library. Rather than securing resources directly from the operating system, this library secures resources from the ....
B. Gorda and R. Wolski, Timesharing massively parallel machines. International Conference on Parallel Processing, volume II, Aug 1995, pp. 214-217.
....evidence exists. Theory suggests that preemption be used to ensure good response times for small jobs [65], especially since workloads have a high variability in computational requirements [22]. This comes close on the heels of actual systems that implement gang scheduling for just this reason [47, 33, 28, 21]. Actually, two metrics may be used to gauge the responsiveness of a system: the actual response time (or turnaround time, i.e. the time from submittal to termination) or the slowdown (the ratio of the response time on a loaded system to the response time on a dedicated system). Using actual ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, vol. II, pp. 214--217, Aug 1995.
....Indeed, gang scheduling can actually give better service (reduced response time) and improved utilization, so using it leads to a win-win situation relative to variable partitioning. The results agree with actual experience on the LLNL Cray T3D, which employs a home-grown gang scheduler [12, 17] (the original system software uses variable partitioning). When this scheduler was ported to the new Cray machine, utilization nearly doubled from 33.4% to 60.9% on average. Additional tuning has led to weekly utilizations that top 96%. 2 Approaches to Scheduling Jobs of Given Size The ....
....original version of this Gang Scheduler was developed for the BBN TC2000 computer. The BBN computer permitted programs to be assigned processors without locality constraints. Its timesharing through shared memory and paging was successful at providing both excellent interactivity and utilization [13, 12]. 4.2 Policy Overview The T3D Gang Scheduler allocates processors and barrier circuits for all programs. In order to satisfy the diverse computational requirements of our clients, the programs are classified by access requirements: • Interactive class jobs require responsive service • ....
B. Gorda and R. Wolski, "Time sharing massively parallel machines". In Intl. Conf. Parallel Processing, vol. II, pp. 214--217, Aug 1995.
....as the number of nodes in each program is prohibitively large. In Figure 10a we depict predicted execution time (measured in processor clock cycles) as a function of machine size in processors. The estimates depicted in the graph are consistent with other experiments conducted using the TC2000 [3, 14]. Note that most of the exploitable parallelism is exhausted by 50 processors. The difference between the improvement gained between 50 processors and 100 processors is small compared to the gain achieved in the range between 1 and 50. In [14] the performance of GJ is measured on the machine ....
....show similar scheduling results for PIC. The scheduler elects to exploit no more than two parallel threads in each loop body due to the granularity of the TC2000. Given the communication-to-computation ratio of the TC2000, neither code can effectively use more than approximately 50 processors. In [3], a study is made of the parallel programs executed on the BBN TC2000 during three years of production parallel computing. Over the course of that time, greater than 70% of the parallel jobs used 30 processors or less out of a possible 128. If GJ and PIC can be thought of as average ....
B. Gorda and R. Wolski, Timesharing Massively Parallel Machines, submitted to Proc. of 1995 International Conference on Parallel Processing, August 1995.
No context found.
B. Gorda and R. Wolski. Time Sharing Massively Parallel Machines. In International Conference on Parallel Processing, volume II, pages 214--217, August 1995.
No context found.
B. Gorda and R. Wolski. Time Sharing Massively Parallel Machines. In International Conference on Parallel Processing, volume II, pages 214--217, August 1995.