| Larry Dowdy. On the Partitioning of Multiprocessor Systems. Technical Report 88-06, Department of Computer Science, Vanderbilt University, July 1988. |
....line parameter of the application or setting an environment variable tions with high efficiency, and the algorithm continues allocating processors. In order to avoid a great number of reallocations, which will imply a great overhead, the equal efficiency extrapolates the efficiency curve [Dowdy88], from the most recently measured efficiency (calculated by the SelfAnalyzer). Once extrapolated, the equal efficiency works in the following way: it initially assigns a single processor to each application, and then it assigns the remaining processors one by one to the application with the ....
L. Dowdy, "On the Partitioning of Multiprocessor Systems", Tech. Report, Vanderbilt University, June 1988.
....about application's efficiency, it moves processors from applications with low efficiency to applications with high efficiency, and repeats the algorithm. In order to avoid a great number of reallocations, that will imply a great overhead, the equal efficiency extrapolates the efficiency curve [Dowdy88], see Figure 4, from the most recently measured efficiency (calculated by the SelfAnalyzer). Once extrapolated, the equal efficiency works in the following way: it initially assigns a single processor to each application, and then it assigns the remaining processors one by one to the application ....
L. Dowdy, "On the Partitioning of Multiprocessor Systems", Technical Report, Vanderbilt University, June 1988.
....equal to c, it is straightforward to show from equation (5) that $(N_j - 1)\phi_0 + N_j (N_j - 1)\beta_0 = \frac{1}{c} - 1$. (7) We will use the above equation to define particular overhead characteristics in the experiments in section 4.3. Finally, we consider an alternate speedup model [5, 16], that has been used widely in studies of scheduling policy performance [13, 17, 4, 20]: $S_j(n) = \frac{(\delta + 1) n}{\delta + n}$. (8) We note that this function is a special case of equation (5) in which $\beta_0 = 0$ and $\phi_0 = \frac{1}{1 + \delta}$. (9) Thus, the curves in Figure 2(a) are also examples of the ....
L. W. Dowdy, On the Partitioning of Multiprocessor Systems. Technical Report, Vanderbilt University, July 1988.
....compiled with the NanosCompiler and linked with the NthLib. The CpuManager applies the equal eff proposed in [23]. The goal of the equal eff is to maximize the system efficiency. It uses the dynamically calculated efficiency of the applications, obtained through the SelfAnalyzer, to extrapolate [7] the complete efficiency curve. Once extrapolated, the equal eff works in the following way: it initially assigns a single processor to each application, and then it assigns the remaining processors one by one to the application with the currently highest (extrapolated) efficiency. SGI MP: ....
L. Dowdy. "On the Partitioning of Multiprocessor Systems". Technical Report, Vanderbilt University, June 1988.
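The allocation rule described in the excerpt above can be sketched as follows. This is a minimal illustration, not the CpuManager's code: the curve Efficiency(p) = (1 + beta) / (p + beta) is the form quoted in other excerpts on this page, the function names are hypothetical, and ranking applications by the efficiency they would have after receiving one more processor is one reading of "currently highest (extrapolated) efficiency".

```python
import math


def extrapolated_efficiency(p, beta):
    """Extrapolated curve Efficiency(p) = (1 + beta) / (p + beta)."""
    if math.isinf(beta):
        return 1.0  # limiting case: perfectly efficient application
    return (1.0 + beta) / (p + beta)


def fit_beta(measured_eff, procs):
    """Choose beta so the curve passes through (procs, measured_eff).

    Solving (1 + beta) / (procs + beta) = measured_eff gives
    beta = (measured_eff * procs - 1) / (1 - measured_eff).
    """
    if measured_eff >= 1.0:
        return math.inf  # no efficiency loss was observed
    if measured_eff * procs <= 1.0:
        return 0.0  # no gain from extra processors was observed
    return (measured_eff * procs - 1.0) / (1.0 - measured_eff)


def equal_efficiency_allocate(betas, total_procs):
    """Give each application one processor, then hand out the rest greedily."""
    n = len(betas)
    if total_procs < n:
        raise ValueError("need at least one processor per application")
    alloc = [1] * n
    for _ in range(total_procs - n):
        # Assumption: rank by the efficiency each application would have
        # after receiving the next processor; ties go to the lowest index.
        best = max(
            range(n),
            key=lambda i: (extrapolated_efficiency(alloc[i] + 1, betas[i]), -i),
        )
        alloc[best] += 1
    return alloc
```

With two applications whose fitted parameters are 26 and 2, for example, `equal_efficiency_allocate([26.0, 2.0], 8)` keeps giving processors to the first application, because its extrapolated efficiency stays above that of the second.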
....First, to address the local irregularities in the efficiency curve, we use an artificial curve extrapolated from the most recently measured efficiency. In particular, having just measured the efficiency of an application on P processors to be $\epsilon$, we use the function $\mathrm{Efficiency}(p) = \frac{1 + \beta}{p + \beta}$ [5], choosing $\beta$ so that the function evaluates to $\epsilon$ at p = P. Next, we determine allocations by following an equal efficiency rule; that is, we allocate processors in a way that causes all applications to have about equal efficiencies according to our extrapolated curves 7 . In particular, we ....
L. Dowdy. On the Partitioning of Multiprocessor Systems. Technical report, Vanderbilt University, June 1988.
....and in scalability analysis. [Carlson94] proposes an algorithm for off-line detection of phases, and [Carlson92] describes three families of methods for smoothing execution profile data. They are useful in automatically detecting phases in execution profiles. The execution signature [Dowdy88; Leuze89; Dowdy94] is defined as the execution time of the parallel algorithm, T(p), as a function of the number of processors. It can further be divided into two components: $T(p) = T_{comp}(p) + T_{comm}(p)$, where $T_{comp}(p)$ is the computation part, called the computation signature, and T ....
Dowdy, L. W., "On the partitioning of multiprocessor systems," Technical Report, Department of Computer Science, Vanderbilt University, Nashville, TN, 1988.
....full speedup functions for the applications. Rather, we have the presumably accurate value measured for the most recent allocation, as well as potentially out-of-date information obtained for previous allocations. To overcome this problem, we employ a simple analytic speedup function, taken from [5], as a substitute for the jobs' actual speedups. We parameterize the speedup function for job j 12 so that it intersects the speedup estimate obtained in the most recent measurement interval. Our allocation scheme then walks along these speedup curves. This procedure uses only the most recent ....
L. Dowdy. On the Partitioning of Multiprocessor Systems. Technical report, Vanderbilt University, June 1988.
....case in current software. An application is dynamic if it can perform this repartitioning function. This distinction between static and dynamic applications is captured by the speedup functions used for them. Following [2], for dynamic applications we use a synthetic speedup function (taken from [3]): $S_{dynamic}(n) = \frac{(1 + \beta) n}{\beta + n}$. The jobs in our workloads choose speedup parameter $\beta$ uniformly from 30 to 300, the range considered in [2]. For static partitioning jobs, we assume that the jobs have been partitioned into N threads, so that they can make use of the full machine should it ....
L. Dowdy. On the partitioning of multiprocessor systems. Technical report, Vanderbilt University, June 1988.
....rate function and is used to characterize the rate at which the work, $W_i$, is executed when allocated a specified number of processors, $p_i$. This models the efficiency of the parallel job. A number of models of parallel system and parallel program performance have been proposed and studied [10, 35, 7, 32]. We use the following execution rate function, used in a number of previous studies [5, 23, 30], which has been derived from an execution rate function (also called an execution signature) proposed by Dowdy [7]: $F = S(p_i) = \frac{(1 + \beta_i) p_i}{\beta_i + p_i}$. In this equation $S(p_i)$ is the ....
.... system and parallel program performance have been proposed and studied [10, 35, 7, 32]. We use the following execution rate function, used in a number of previous studies [5, 23, 30], which has been derived from an execution rate function (also called an execution signature) proposed by Dowdy [7]: $F = S(p_i) = \frac{(1 + \beta_i) p_i}{\beta_i + p_i}$. In this equation $S(p_i)$ is the speedup obtained when the job is executed on $p_i$ processors and $\beta_i$ is the parameter that is used to determine the efficiency of the job. If the number of processors allocated to a job, $p_i$, is fixed for the ....
L. Dowdy. On the partitioning of multiprocessor systems. In Performance 1990: An International Conference on the Performance of Computers and Computer Networks, pages 99--129, March 1990.
....to address the local irregularities in the efficiency curve, we use an artificial curve extrapolated from the most recently measured efficiency alone. In particular, having just measured the efficiency of an application on p processors, we use the function $\frac{1 + \beta}{p + \beta}$, which is taken from [5], choosing $\beta$ so that the function interpolates the most recent efficiency measurement. Next, we determine allocations by following an equal efficiency rule; that is, we allocate processors in a way that causes all applications to have about equal efficiencies according to our extrapolated curves ....
L. Dowdy. On the Partitioning of Multiprocessor Systems. Technical report, Vanderbilt University, June 1988.
....case in current software. An application is dynamic if it can perform this repartitioning function. This distinction between static and dynamic applications is captured by the speedup functions used for them. Following [3], for dynamic applications we use a synthetic speedup function (taken from [4]): $S_{dynamic}(n) = \frac{(1 + \beta) n}{\beta + n}$. The jobs in our workloads choose speedup parameter $\beta$ uniformly from 30 to 300, the range considered in [3]. For static partitioning jobs, we assume that the jobs have been partitioned into N threads, so that they can make use of the full machine should it ....
L. Dowdy. On the partitioning of multiprocessor systems. Technical report, Vanderbilt University, June 1988.
....N (i.e. zero) the 2 point distribution has highest $C_N$ among all distributions of N, and the spread distribution has $C_N$ in between. • E is derived from a deterministic function g: $E(k) = g(k)$ for $k = 1, 2, \ldots, N$, and $E(k) = g(N)$ for $k = N+1, \ldots, P$. We use the following functional form derived from [Do88] for g: $g(k) = \frac{k}{b + (1 - b) k}$, $k = 1, 2, \ldots, P$. Figure 2.2 plots this nondecreasing ERF for several values of b. • As in [MEB88] two extremes of correlation between job demand and parallelism are considered: no correlation (r = 0) in which case D and N are independent, and full ....
L. Dowdy. On the Partitioning of Multiprocessor Systems. Technical Report, Vanderbilt University, July 1988.
....Symbol Parallelism Pmax p N CN CDF of N H High 0.9 1.0 90.10 0.33 M Moderate 0.1 1 (0.4P) 43.14 0.80 L Low 0.1 0.9 11.00 2.70 [Figure: CDF of N, F(k) versus k, for workloads L, M, and H] • To evaluate the impact of sublinear ERFs, the following parametric ERD will be considered, which is derived from an execution signature in [6]: $\gamma(k) = \frac{(1 + \beta) k}{k + \beta}$, $k = 1, 2, \ldots$ (5) At $\beta = 0$ we get the flat ERD $\gamma(k) \equiv 1$. By increasing $\beta$ we obtain ERDs that are closer to linear as shown in Figure 2 until we obtain the linear ERD when $\beta = \infty$. We note that $\beta = 100$ represents a considerably sublinear ERD, whereas $\beta = 500$ is ....
L. Dowdy. On the Partitioning of Multiprocessor Systems. Technical Report, Vanderbilt University, Nashville, TN, July 1988.
....of workstations to act as a general purpose compute server, a scheduling approach that can effectively handle a mixed workload is required. To most efficiently allocate the resources in the cluster, parallel applications should dynamically share workstations with interactive, sequential processes [45, 128]. However, few studies have examined the performance of explicit coscheduling with interactive applications or with applications performing I/O [111, 144]. Interactive jobs typically perform a small amount of computation after waiting a longer interval for input from the user. Similarly, I/O bound ....
.... sequential world; compromising between response time and throughput while maintaining fairness is still an active area of research [60, 69, 74, 83, 87, 131, 155, 158, 170]. Since sharing the cluster dynamically between parallel and interactive applications is likely to give the best utilization [45, 128], it is important to solve this problem in the presence of parallel jobs, even though this further complicates the issues. From the perspective of the interactive job, the ideal approach is to promptly interrupt the communicating job whenever the interactive job has work to perform. From the ....
Larry Dowdy. On the Partitioning of Multiprocessor Systems. Technical Report 88-06, Department of Computer Science, Vanderbilt University, July 1988.
....Ideally, each phase of a computation is treated separately, but it is more often the case that a single signature is constructed for an entire computation. Dowdy uses the simple function $T(p) = C_{1,j} + C_{2,j}/p$, where $C_{1,j}$ and $C_{2,j}$ are constants obtained from empirical measurements [Dow88]. Although widely used, this characterization suffers from two problems. First, it is unrealistic because an increased number of processors always leads to a reduction in execution time (assuming $C_{2,j} > 0$). In practice, parallelism related overheads can often exceed the gains obtained from ....
L. W. Dowdy. On the partitioning of multiprocessor systems. Technical Report 88-06, Dept. of Computer Science, Vanderbilt University, 1988.
....time of n parallel tasks on p (p n) processors, with one of them being faster than the remaining ones. 2 Multiprocessor Scenarios and Scheduling Issues A key dimension of scheduling policies concerns the frequency with which processor allocation and assignments are made: static vs. dynamic [11, 18, 24, 27, 41, 44]. Either one or both of the two phases of a scheduling policy can be static or dynamic. With static scheduling, the decision concerning processor allocation and task assignment is made at the onset of the job execution. Static policies have low run time overhead and the scheduling costs are paid ....
....environments. With variable partitioning, the size of the partition allocated to each job varies, according to the job's characteristics and scheduling policy. Variable partitioning is implemented by the allocation phase of scheduling disciplines, which can be either static or dynamic. Dowdy [11] proposes a static allocation scheme, where the processor partitioning is based on the execution signature of the jobs. The characteristic speedup function which relates the job's execution time to the number of processors allocated to it is called execution signature [11]. Tripathi et al. [18] ....
[Article contains additional citation context not shown here]
Dowdy, L. W. On the partitioning of multiprocessor systems. Technical Report 88-06, Department of Computer Science, Vanderbilt University, 1988.
....user, and degree of parallelism is more effective. Thus, the latter classification will be used. 3 The Historical Profiler The first step in the design of the Historical Profiler is defining the information that the Historical Profiler should provide to the scheduler. In the literature [MEB90, Dow88, PD89, GST91, Wu93, Sev94, PS96] scheduling algorithms that use the execution time of jobs have been frequently examined. Therefore, the Historical Profiler will provide a method of obtaining an estimate of the time a job will take to execute, with an indication of the uncertainty in the estimate. 3.1 Environment The ....
L. W. Dowdy. On the partitioning of multiprocessor systems. Technical Report 88-06, Vanderbilt University, March 1988.
....for virtual memory. Our primary conclusion is that any performance benefits resulting from the easing of minimum processor constraints imposed by the memory requirements of jobs will be negated by the overhead due to paging. 1 Introduction In recent years, several adaptive partitioning strategies [6, 19, 5, 18, 17, 3, 14] have been proposed for scheduling parallel jobs on message passing multicomputers. A key characteristic of these policies is that they reduce the number of processors allocated to individual jobs as the load on the system increases. The motivation behind this policy is to take advantage of the ....
....These assumptions result in an average job demand (considering both classes) of 1125 seconds, with the coefficient of variation of job demand ($C_D$) of 2.05. The speedup function used in our simulations is given by $S(p) = \frac{(1 + \beta) p}{\beta + p}$. This speedup function has been used by several studies [13, 3, 5] and is shown in Figure 1. For a given number of processors (p) and given job demand (on one processor) the speedup function is used to compute the processing requirement of the job on p processors. In our simulations, we assume that $\beta$ is uniformly distributed between 30 and 300, the range ....
L. Dowdy. On the partitioning of multiprocessor systems. Technical report, Vanderbilt Univ., Nashville,TN, July 1988.
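As a rough sketch of how the speedup function in the excerpt above might be used in a simulation: the function names here are hypothetical, and computing the time on p processors as single-processor demand divided by S(p) is an assumption about what "processing requirement" means, not something the excerpt states.

```python
import random


def speedup(p, beta):
    """Speedup function S(p) = (1 + beta) * p / (beta + p)."""
    return (1.0 + beta) * p / (beta + p)


def service_time(demand, p, beta):
    """Time to finish a job on p processors.

    Assumption: time = (demand on one processor) / S(p).
    """
    return demand / speedup(p, beta)


def sample_beta(rng, low=30.0, high=300.0):
    """Draw beta uniformly from the range quoted in the excerpt."""
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draw
    beta = sample_beta(rng)
    for p in (1, 4, 16, 64):
        print(p, round(service_time(1125.0, p, beta), 2))
```

Because S(1) = 1 for any beta, a job run on a single processor takes exactly its stated demand; larger beta values bring the curve closer to linear speedup.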
....the parameters of real programs. Our goal here is to show that this model captures the behavior of real programs running on diverse parallel architectures. This technique is also useful for summarizing the speedup curve of a job and interpolating between speedup measurements. 1.1 Related work In [4], Dowdy proposed a speedup model based on a program with a sequential component of length $c_1$ and a perfectly parallel component of length $c_2$. The execution time, T(n), of such a program is $T(n) = c_1 + c_2/n$, where n is the number of processors. Chiang et al. [3] derive from this a model of ....
Lawrence W. Dowdy. On the partitioning of multiprocessor systems. Technical Report 88-06, Vanderbilt University, March 1988.
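As a worked step connecting the execution signature in the excerpt above to the synthetic speedup function quoted in other excerpts on this page: the identification of beta with c_2 / c_1 is derived here under the assumption c_1 > 0, and is not quoted from any of the citing papers.

```latex
% Speedup implied by the signature T(n) = c_1 + c_2 / n, assuming c_1 > 0.
\begin{align*}
S(n) &= \frac{T(1)}{T(n)}
      = \frac{c_1 + c_2}{c_1 + c_2 / n}
      = \frac{(c_1 + c_2)\, n}{c_1 n + c_2} \\
     &= \frac{(1 + c_2 / c_1)\, n}{c_2 / c_1 + n}
      = \frac{(1 + \beta)\, n}{\beta + n},
      \qquad \beta = \frac{c_2}{c_1}.
\end{align*}
```

Under this reading, beta is the ratio of perfectly parallel work to sequential work: beta = 0 gives S(n) = 1 for all n, and S(n) approaches n as beta grows without bound.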
....reallocation of processors among jobs outperforms a more static, equipartition policy. Tucker and Gupta [249] propose a different solution for reducing the frequency of context switches and for reducing cache corruption, which is explained next. Dynamic Partitioning: The dynamic partitioning [187, 73] (also known as Process control with processor partitioning) policy proposed by Tucker and Gupta [249] has the goal of minimizing context switches, so that less time is spent rebuilding a processor's cache. Their approach is based on the hypothesis that an application performs best when the ....
L. Dowdy. On the partitioning of multiprocessor systems. Technical Report 88-06, Department of Computer Science, Vanderbilt University, July 1988.
....based on results from an analytic model [40]. 1.2.2 Distributed Memory Systems The first parallel systems used very simple scheduling policies that allocated the entire parallel machine to each waiting job in turn for some time quantum or until the job completed. Early modelling studies [14, 25] expose the potential performance gain of multiprogramming over uniprogramming. Early work on multiprogramming distributed memory systems extends policies developed for the shared memory environment to the distributed memory environment. Crovella et al. [12] report on an implementation of a ....
L. Dowdy. On the partitioning of multiprocessor systems. Technical Report, Vanderbilt University, Nashville, TN, July 1988.
....with a two program workload. In section 3, the model is adapted to an Intel iPSC/2 hypercube multiprocessor. Section 4 contains results from numerical experiments performed on the Intel iPSC/2 and compares throughputs obtained empirically to those predicted by the model. 2. The Model In [2], a high-level model of a simple multiprocessor system whose processing elements (PEs) can be partitioned into two sets is described. The model assumes that program scheduling is done via a separate host processor, and that the parallel workload consists of two programs. The model description uses ....
.... both programs are executing on the multiprocessor, the processing rate is the sum of the individual processing rates times the potential factor $SM(P_1, P_2)$. Execution signatures of the form $M(P_1, H) = \frac{p_1}{D_{11} p_1 + D_{12}}$ and $M(H, P_2) = \frac{p_2}{D_{21} p_2 + D_{22}}$ (1) have been suggested [2], where $D_{i1}$ and $D_{i2}$ are parameters dependent on characteristics of the specific architecture and of the specific program i. If $D_{i1} = 0$, program i experiences a linear speedup; if $D_{i2} = 0$, the execution rate of program i is constant, independent of the number of processors used. The model is ....
Lawrence W. Dowdy, On the partitioning of multiprocessor systems, Technical Report, Department of Computer Science, Vanderbilt University, Nashville, TN, 37235, March 1988.
No context found.
Larry Dowdy. On the Partitioning of Multiprocessor Systems. Technical Report 88-06, Department of Computer Science, Vanderbilt University, July 1988.
No context found.
L. Dowdy, "On the Partitioning of Multiprocessor Systems", Tech. Report, Vanderbilt University, June 1988.
No context found.
L. Dowdy, "On the Partitioning of Multiprocessor Systems", Tech. Report, Vanderbilt University, June 1988.