J. Singh, J. Hennessy and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples", IEEE Computer, June 1993, pp. 42-50.
....was performed to derive application-specific expressions for the computation and communication time of each phase of the algorithm. Based on these expressions, the speedup and the scaling behavior were modeled and compared with measurements on a transputer-based system and on a SUPRENUM cluster. In [Sing 93], several application-specific parameters are used to model the scaling behavior of an application, and the influence of these parameters on various scaling models (time-constrained, memory-constrained) is investigated. 2.1.2 Signatures, Profiles and Shapes A signature is a (graphical) representation of ....
J. P. Singh, J. L. Hennessy, and A. Gupta. "Scaling Parallel Programs for Multiprocessors: Methodology and Examples". IEEE Computer, pp. 42--50, July 1993.
....size may unintentionally also lead to scheduling by duration, if there is some statistical correlation between these two job attributes. As it turns out, the question of whether such a correlation exists is not easy to settle. Three application scaling models have been proposed in the literature [30, 23]: Fixed work. This assumes that the work done by a job is fixed, and parallelism is used to solve the same problems faster. Therefore the runtime is assumed to be inversely proportional to the degree of parallelism (negative correlation). This model is the basis for Amdahl's law. Fixed time. ....
J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples". Computer 26(7), pp. 42-50, Jul 1993.
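The fixed-work and fixed-time models described above correspond to the textbook forms of Amdahl's and Gustafson's laws. The following is a minimal illustration. It assumes a single serial fraction `s` and no communication or other parallel overhead. The function names are hypothetical, and the formulas are the standard textbook ones rather than anything taken from the cited papers. In the fixed-work case with `s = 0`, speedup equals `p`, which matches the inverse proportionality between runtime and degree of parallelism mentioned in the excerpt.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Fixed-work (Amdahl) speedup on p processors.

    s is the serial fraction of the single-processor runtime; the remaining
    (1 - s) is assumed to divide perfectly among the p processors.
    """
    if not 0.0 <= s <= 1.0 or p < 1:
        raise ValueError("need 0 <= s <= 1 and p >= 1")
    return 1.0 / (s + (1.0 - s) / p)


def gustafson_speedup(s: float, p: int) -> float:
    """Fixed-time (Gustafson) scaled speedup on p processors.

    Here s is the serial fraction of the p-processor runtime; the parallel
    part of the problem is assumed to grow with p so the runtime stays fixed.
    """
    if not 0.0 <= s <= 1.0 or p < 1:
        raise ValueError("need 0 <= s <= 1 and p >= 1")
    return s + (1.0 - s) * p


if __name__ == "__main__":
    # Compare the two models for a hypothetical 5% serial fraction.
    for p in (1, 16, 256):
        print(p, round(amdahl_speedup(0.05, p), 2), round(gustafson_speedup(0.05, p), 2))
```

With a 5% serial fraction, the fixed-work speedup can never exceed 1/s = 20. The fixed-time speedup, by contrast, keeps growing almost linearly in p.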
....size are scaled (holding the number of processors fixed). We assume that the more powerful systems of the future will be used to run larger problems, or to obtain results of improved accuracy, rather than simply to solve the same problems in less time. Thus, a time-constrained scaling approach [14] is adopted, in which application parameters are scaled together with the system parameters so that the execution times before and after scaling the application and the machine are equal. Our focus is on asymptotic results; specifically, as the machine and the application are ....
....the applications that are considered in this work, and the manner in which the parameters of each are scaled. Section 4 presents our scaling analysis and its results. Finally, Section 5 presents conclusions. 2 Scaling There has been much previous work on scalability [3-8, 11-14, 17-21], although all of this work focuses on large-scale shared-memory multiprocessors and message-passing machines, and all but [4] is concerned with scaling the system size (number of processors). In contrast, we consider small-scale machines, and keep the number of processors fixed while scaling the ....
[Article contains additional citation context not shown here]
Singh, J.P., Hennessy, J.L., and Gupta, A., "Scaling Parallel Programs for Multiprocessors: Methodology and Examples", IEEE Computer, Vol.26, No.7 (July 1993), pp. 42-50.
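Time-constrained scaling, as described in the excerpt, chooses the scaled problem so that the execution time on the larger machine equals the original one. The following is a minimal sketch of that idea, with the processor count `p` as the scaled system parameter. It assumes a hypothetical per-step cost model for an n x n grid: computation `c*n^2/p` plus a nearest-neighbour exchange `d*n/sqrt(p)`, with made-up constants `c` and `d`. The scaled size is found by bisection. The cost model and all names are illustrative assumptions, not the model used in the cited work.

```python
import math


def step_time(n: float, p: int, c: float = 1e-8, d: float = 1e-6) -> float:
    """Hypothetical per-step time for an n x n grid on p processors.

    Computation is c * n^2 / p; each processor also exchanges the border of
    its square subgrid, d * n / sqrt(p). One processor does no communication.
    """
    compute = c * n * n / p
    communicate = d * n / math.sqrt(p) if p > 1 else 0.0
    return compute + communicate


def time_constrained_size(n0: float, p: int, c: float = 1e-8, d: float = 1e-6,
                          tol: float = 1e-6) -> float:
    """Largest n whose time on p processors does not exceed that of n0 on 1 processor."""
    if n0 <= 0 or p < 1 or c <= 0 or d < 0:
        raise ValueError("need n0 > 0, p >= 1, c > 0, d >= 0")
    target = step_time(n0, 1, c, d)
    lo, hi = 0.0, n0
    # Widen the bracket until hi is too large for the time budget.
    while step_time(hi, p, c, d) <= target:
        hi *= 2.0
    # step_time is strictly increasing in n (c > 0), so bisection converges.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if step_time(mid, p, c, d) <= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for p in (4, 16, 64):
        print(p, round(time_constrained_size(1000.0, p), 1))
```

With `d = 0`, the grid side grows exactly as `sqrt(p)`. Any communication term makes the time-constrained problem smaller than that, which is one way parallel overheads show up under this model.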
....3.4
Degree of parallelism | Mean runtime | Std. dev. | CV
4 | 1116.7 | 4171.5 | 3.7
8 | 705.2 | 2344.3 | 3.3
16 | 569.3 | 1970.9 | 3.5
32 | 1305.3 | 3311.6 | 2.5
64 | 2350.8 | 4155.8 | 1.8
128 | 3280.1 | 4408.1 | 1.3
system | 44.1 | 438.0 | 9.9
Table 2. Runtime statistics for jobs with different degrees of parallelism.
....increased available parallelism. Three models have been proposed [27, 25]: Fixed work. This assumes that the work done by a job is fixed, and parallelism is used to solve the same problems faster. Therefore the runtime is assumed to be inversely proportional to the degree of parallelism. This model is the basis for Amdahl's law [2]. Fixed time [10, 11]. Here it ....
J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples". Computer 26(7), pp. 42--50, Jul 1993.
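In the Table 2 excerpt, the last number in each row equals the third divided by the second, rounded to one decimal place. That relationship is consistent with a coefficient of variation (standard deviation over mean). This column interpretation is an inference from the arithmetic and is not stated in the excerpt. The short check below encodes the rows as printed and confirms the relationship. The row data come from the excerpt; the function names are hypothetical.

```python
# Rows of Table 2 as printed: (degree of parallelism, column 2, column 3, column 4).
# The truncated leading "3.4" from a preceding row is omitted.
ROWS = [
    ("4", 1116.7, 4171.5, 3.7),
    ("8", 705.2, 2344.3, 3.3),
    ("16", 569.3, 1970.9, 3.5),
    ("32", 1305.3, 3311.6, 2.5),
    ("64", 2350.8, 4155.8, 1.8),
    ("128", 3280.1, 4408.1, 1.3),
    ("system", 44.1, 438.0, 9.9),
]


def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """CV = standard deviation / mean."""
    return std_dev / mean


def inconsistent_rows(rows, half_step: float = 0.05):
    """Labels of rows whose fourth column is not column3/column2 rounded to 0.1."""
    bad = []
    for label, mean, std_dev, printed_cv in rows:
        if abs(coefficient_of_variation(mean, std_dev) - printed_cv) > half_step + 1e-9:
            bad.append(label)
    return bad


if __name__ == "__main__":
    print(inconsistent_rows(ROWS))
```

Read this way, every row has a CV above 1, which is the CV of an exponential distribution. Runtimes at each degree of parallelism are therefore highly variable.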
....enough processor and node to make application performance reasonable, and our goal is not to compare with the absolute performance of other processors. As the number of processors changes, we need a scaling model under which to scale the problem size [81, 71]. We use the three major models, each of which may be applicable in different circumstances: problem-constrained (PC) or constant-problem-size scaling, time-constrained (TC) scaling, and memory-constrained (MC) scaling. Most of the chapter focuses on PC scaling, under which speedup is defined simply ....
....($\mathrm{Time}(p) = \mathrm{Time}(1)$). From (2.1), speedup can be measured as the increase in the useful work done during that fixed execution time: $\mathrm{Speedup}(p) = \frac{\mathrm{Work}(p)}{\mathrm{Work}(1)}$. Memory-constrained scaling (MC): Here neither work nor time remains fixed (in fact, both can increase dramatically, as discussed in [71]). We must therefore resort to the full expression in (2.1), so $\mathrm{Speedup}(p) = \frac{\text{Increase in Work Done}}{\text{Increase in Time Taken}}$, where each increase is a ratio. For the memory-constrained scaling expression, if the increase in execution time were only due to the increase in work and not due to ....
[Article contains additional citation context not shown here]
J. P. Singh, A. Gupta, and J. Hennessy. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 1994.
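The three speedup definitions in these excerpts can be written as one ratio of work rates, with PC and TC as special cases. The excerpt does not show expression (2.1) itself. The sketch below assumes it is $\mathrm{Speedup}(p) = \frac{\mathrm{Work}(p)/\mathrm{Work}(1)}{\mathrm{Time}(p)/\mathrm{Time}(1)}$. That form reduces to $\mathrm{Time}(1)/\mathrm{Time}(p)$ when work is fixed and to $\mathrm{Work}(p)/\mathrm{Work}(1)$ when time is fixed. The `Run` record, the tolerances used to decide that work or time is "fixed", and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    processors: int
    time: float  # execution time, any consistent unit
    work: float  # useful work done, any consistent unit


def speedup(base: Run, scaled: Run) -> float:
    """Assumed general form: (Work(p)/Work(1)) / (Time(p)/Time(1))."""
    return (scaled.work / base.work) / (scaled.time / base.time)


def pc_speedup(base: Run, scaled: Run, rel_tol: float = 1e-9) -> float:
    """Problem-constrained: same work, Speedup(p) = Time(1) / Time(p)."""
    if abs(scaled.work - base.work) > rel_tol * base.work:
        raise ValueError("PC scaling keeps the problem (work) fixed")
    return base.time / scaled.time


def tc_speedup(base: Run, scaled: Run, rel_tol: float = 0.05) -> float:
    """Time-constrained: Time(p) ~= Time(1), Speedup(p) = Work(p) / Work(1)."""
    if abs(scaled.time - base.time) > rel_tol * base.time:
        raise ValueError("TC scaling keeps execution time (approximately) fixed")
    return scaled.work / base.work


def mc_speedup(base: Run, scaled: Run) -> float:
    """Memory-constrained: increase in work divided by increase in time."""
    return speedup(base, scaled)


if __name__ == "__main__":
    base = Run(processors=1, time=100.0, work=1000.0)
    print(pc_speedup(base, Run(16, 10.0, 1000.0)))
    print(tc_speedup(base, Run(16, 100.0, 12000.0)))
    print(mc_speedup(base, Run(16, 200.0, 24000.0)))
```

Keeping one general function makes it easy to see that PC and TC only add a constraint (fixed work or fixed time) on top of the same ratio.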
....become increasingly complex and irregular as we try to solve more realistic problems, and it has also been shown to deliver very good performance when supported in hardware in tightly coupled multiprocessors, at least up to the 64- to 128-processor scale where experiments have been performed [58, 57, 48, 87, 88, 96, 89]. It is also the programming model of choice for small-scale multiprocessors, so it provides a graceful migration path. The last of the programming model possibilities (message passing everywhere) not only forces programmers to use explicit message passing but also does ....
J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.
....become increasingly complex and irregular as we try to solve more realistic problems, and it has also been shown to deliver very good performance when supported in hardware in tightly coupled multiprocessors, at least up to the 64- to 128-processor scale where experiments have been performed [58, 57, 48, 87, 88, 96, 89]. It is also the programming model of choice for small-scale multiprocessors, so it provides a graceful migration path. The last of the programming model possibilities (message passing everywhere) not only forces programmers to use explicit message passing but also does ....
J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. Computer, 26(7):42--50, July 1993.
....the increase in system size [10]. Finally, in time-constrained scaling, the application parameters are scaled so that the running time does not change while the system size changes. All of these models have some deficiencies, and none of them is universal for all applications, as pointed out by [19]. There is a wide range of optimizations designed to improve the performance of problems on multiprocessor architectures. Any given optimization targets a certain class of problems, and the effectiveness of the optimization varies with problem characteristics. We believe that it is also important ....
J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, pages 42--50, July 1993.
....origin or not) globally addressable memory provides a seamless extension of the memory model viewed by threads assigned to the same processing node. For applications with irregular and unpredictable data needs, the lack of global naming capability can hurt the programmability and efficiency [96, 104]. The ability to name memory objects globally will also facilitate the support of dynamic load balancing of threads. As a result, shared address space is adopted by most multithreaded execution models with dataflow origin. Consistency and Replication A memory consistency model represents a ....
J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 6(26):42--50, 1993.
....[9]. The original reference data set provided with wave5 is sized inappropriately for the caches on today's machines: the data set processed by each call to PARMVR is less than 300KB. Larger problem sizes provided with the benchmark grow along the time dimension but not in the space dimension [16]. Since the original data set was too small to be representative of problems likely to be run on today's parallel machines, we enlarged the problem by increasing the amount of data accessed in each loop. In the enlarged problem, the amount of data accessed by each loop ranges from 256KB to 17MB. ....
J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. Computer, 26(7):42--50, 1993.
....is $N$. Extensive data with regard to timing and parallel performance (on an SP2) are available for linearly scaled problems, i.e. $N = N_0 Q$ (with $N_0 = 3200$) and $Q = 4, 9, 16$. As soon as $N$ varies, either in the analysis or in experiments, it is important to take application parameters into account [SHG93]. From the theory of hyperbolic difference schemes, it is known that the Courant number is the vital similarity parameter. Therefore, we fix the Courant number. This means that both the spatial grid size $h$ (with $N = O(1/h^2)$) and the time step vary, with the ratio of the time step to $h$ held constant. For linearly scaled problems this ....
Singh J., Hennessy J., and Gupta A. (1993) Scaling parallel programs for multiprocessors: methodology and examples. IEEE Computer July: 42--50.
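Holding the Courant number fixed ties the time step to the grid spacing. The following is a minimal sketch under several assumptions: a two-dimensional grid with $N \propto 1/h^2$, a fixed simulated time `t_end`, and work proportional to grid points times time steps. The values `h0`, `dt0` and `t_end` are hypothetical; only $N_0 = 3200$ and $Q = 4, 9, 16$ come from the excerpt. Under these assumptions, scaling $N = N_0 Q$ makes the total work grow as $Q^{3/2}$ rather than $Q$.

```python
import math


def courant_scaled(n0: int, q: int, h0: float, dt0: float, t_end: float) -> dict:
    """Scale N = N0 * Q grid points while keeping dt / h fixed.

    Assumes a 2-D grid (N ~ 1/h^2), a fixed simulated time t_end, and work
    proportional to (grid points) x (time steps).
    """
    if q < 1:
        raise ValueError("q must be >= 1")
    factor = math.sqrt(q)       # N ~ 1/h^2, so h shrinks by sqrt(Q)
    h = h0 / factor
    dt = dt0 / factor           # the same factor keeps dt / h unchanged
    steps = t_end / dt          # more, smaller steps to cover t_end
    return {"N": n0 * q, "h": h, "dt": dt, "steps": steps, "work": n0 * q * steps}


if __name__ == "__main__":
    base = courant_scaled(3200, 1, h0=1.0, dt0=0.5, t_end=100.0)
    for q in (4, 9, 16):
        s = courant_scaled(3200, q, h0=1.0, dt0=0.5, t_end=100.0)
        print(q, s["N"], round(s["work"] / base["work"], 3))
```

For Q = 4, 9 and 16, the work grows by factors of 8, 27 and 64. This is why linearly scaled problems become disproportionately expensive once the time step is tied to the grid.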
....Analysis, IEEE Int'l Conf. on Application-Specific Systems, Architectures, and Processors, 1997, pp. 304-315. [6] D. Royo, M. Valero-García, and A. González, A Jacobi-based Algorithm for Computing Symmetric Eigenvalues and Eigenvectors in a Two-dimensional Mesh, Research Report UPC-DAC-1997-54. [7] J.P. Singh, J.L. Hennessy and A. Gupta, Scaling Parallel Programs for Multiprocessors: Methodology and Examples, IEEE Computer, pp. 42-50, July 1993. [8] J.H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965. ....the observation that A and U require $2n^2$ data ....
....into account in the Min operator to compute $n_L(p)$. 2.3.3 Memory-constrained scalability analysis. In this case, when increasing the number of nodes, the user increases the problem size as much as possible without exceeding the amount of memory of the system (memory-constrained scaling model [7]). Therefore, the figure of merit F(p) is simply the size of the greatest problem that can be solved on a system of p nodes: Since the problem size must be greater than or equal to G 1 (p), the constraint that limits the scalability of the system is, in this case: 3 Example of application ....
[Article contains additional citation context not shown here]
J.P. Singh, J.L. Hennessy and A. Gupta, "Scaling Parallel Programs for Multiprocessors: Methodology and Examples," IEEE Computer, pp. 42-50, July 1993.
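Under the memory-constrained model, F(p) is the largest problem that fits in the memory of p nodes. For the Jacobi eigenvalue example, the excerpt notes that A and U together require $2n^2$ data. The sketch below makes three hypothetical assumptions:

* These two matrices are the only storage.
* Memory is counted in matrix elements per node (`words_per_node`).
* Data can be split across nodes without granularity losses.

Under those assumptions it returns the largest n with $2n^2 \le p \cdot M$, where M is the per-node memory. The function name is illustrative.

```python
import math


def max_jacobi_order(p: int, words_per_node: int) -> int:
    """Largest matrix order n with 2 * n^2 <= p * words_per_node.

    Assumes, hypothetically, that A and U (n^2 elements each) are the only
    data stored and that they can be spread evenly over p nodes.
    """
    if p < 1 or words_per_node < 0:
        raise ValueError("need p >= 1 and words_per_node >= 0")
    # For integer n: 2n^2 <= p*M  <=>  n^2 <= floor(p*M / 2)
    return math.isqrt(p * words_per_node // 2)


if __name__ == "__main__":
    for p in (1, 4, 16, 64):
        print(p, max_jacobi_order(p, 8_000_000))
```

Because n grows only as the square root of p, quadrupling the node count roughly doubles the largest solvable order.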
....While memory availability is indeed growing at an amazing rate, so are processor speeds and user requirements. Therefore, requiring jobs to fit into available memory forces users to take memory availability into account when designing programs, and might limit the computations that can be performed [536]. Providing virtual memory is increasingly recognized as a necessity [532]. Systems that use preemption, on the other hand, cannot afford not to support memory management as well. When PEs are shared among a number of jobs, so is the memory. The more jobs there are, the harder it is to ....
J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples". Computer 26(7), pp. 42--50, Jul 1993.
....run longer. However, it seems that all these models are oversimplified to the point where it is hard to correlate them with measured results. In particular, users configure their applications according to their needs rather than according to the way resources happen to be packaged in the machine [18]. Thus users rarely use all the memory available, on any size partition. It is true, however, that they tend to use more on larger partitions. Finally, we note that modeling the memory usage distribution itself is not easy, because it does not seem to be similar to commonly used analytical ....
J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples". Computer 26(7), pp. 42--50, Jul 1993.
....from development of parallel scientific applications in at least two ways: (1) the reason why parallelism is exploited, and (2) how it is exploited. Ad 1: Reasons for exploiting parallelism. An important reason for exploiting parallelism in scientific applications is scaling [12]: we simply want to do more in the same time. This means that the volume of data to be processed is enlarged, or that the time step is decreased to reduce the computational error. However, the reason for exploiting parallelism in real-time applications originates from the demand to meet ....
J.P. Singh, J.L. Hennessy, and A. Gupta. "Scaling Parallel Programs for Multiprocessors: Methodology and Examples." Computer, 26(7):42--50, July 1993.
....number of processors varies. Some plausible invariants are: total problem size (Amdahl 1967); work per processor (Gustafson 1988); total execution time (Worley 1990); memory per processor (Sun 1993); efficiency (Grama et al. 1993); computational error, e.g. discretization error (Singh et al. 1993). While a fixed problem size is generally too restrictive, keeping the amount of memory used per processor constant as the number of processors grows often allows the problem to grow at an impractically high rate, since the amount of .... [Figure: number of processors (1-32)] ....
Singh, J.P., Hennessy, J.L., and Gupta, A. 1993. Scaling parallel programs for multiprocessors: methodology and examples. IEEE Computer, 26(7):42--50.
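The excerpt warns that keeping memory per processor constant can let the problem grow at an impractically high rate. As a swapped-in illustration, the sketch uses a dense factorization rather than the sparse solver discussed in the excerpt. It assumes memory proportional to $n^2$ and work proportional to $n^3$ (constants dropped), perfect load balance and no overheads. Holding memory per processor at m gives $n = \sqrt{pm}$. The ideal parallel time $n^3/p = m^{3/2}\sqrt{p}$ then still grows with p. The function name is hypothetical.

```python
import math
from typing import Tuple


def mc_dense_factorization(m_per_proc: float, p: int) -> Tuple[float, float]:
    """Memory-constrained scaling of a dense n x n factorization.

    Assumes memory ~ n^2 elements and work ~ n^3 operations (constants
    dropped), perfect load balance and no parallel overhead. Returns the
    scaled order n and the ideal parallel time in operation units.
    """
    if m_per_proc <= 0 or p < 1:
        raise ValueError("need m_per_proc > 0 and p >= 1")
    n = math.sqrt(p * m_per_proc)   # total memory p * m holds the n x n matrix
    time = n ** 3 / p               # work per processor
    return n, time


if __name__ == "__main__":
    _, t1 = mc_dense_factorization(1e6, 1)
    for p in (1, 4, 16, 64):
        n, t = mc_dense_factorization(1e6, p)
        print(p, round(n), round(t / t1, 2))
```

Even with no overhead at all, the parallel time on 64 processors is 8 times the single-processor time. This is the kind of growth that makes constant memory per processor an impractical invariant for some applications.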
....systems. Algorithm researchers have concentrated on the design of highly efficient parallel volume rendering algorithms [SS91, MPS92, MPHK94, Hsu93, CC93, Neu93, NL92, WS93]. Their work has been concerned primarily with optimizing performance with respect to constant problem size (CPS) scaling [SHG93] and has largely ignored systems issues. Lately, some work has been done in developing distributed rendering systems that can make use of available parallel resources [ABSS94, AMD94, RLGB94]. Unfortunately, this latter work has focused on developing machine-dependent, self-contained applications ....
J. P. Singh, J. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, 1993.
....versions of the HIRLAM model with respect to the pure forecast calculations. As explained in Section 2, each method has its own characteristics, resulting in different spatial and temporal resolutions. This makes a comparison nontrivial. A general discussion of this topic can be found in [9]. Our strategy consists of comparing the execution times of the different methods for performing calculations on the same physical area during the same simulated time span with the same accuracy. A first comparison between the gridpoint and spectral models can be based on Tables 1 and 2. In ....
J.P. Singh, J.L. Hennessy, and A. Gupta, Scaling Parallel Programs for Multiprocessors: Methodology and Examples, IEEE Computer, Vol. 26, No. 7, July 1993, 42--50.
....today's memory hierarchies, as well as those of tomorrow's low-cost processors, such as multimedia co-processors, and (2) it provides a more appropriate ratio between data set and cache size, modeling programs with larger data sets or data sets with less data locality than those in our benchmarks [19]. We also examined a variety of register file sizes, ranging between 264 and 352, to gauge the sensitivity of the register file management techniques to register file size. With more than 352 registers, other processor resources, such as the instruction queues, become performance bottlenecks. At the ....
J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 27(7):42--50, July 1993.
....operation. In this work, we assume that W depends exclusively on n and, therefore, increasing the problem size is considered equivalent to increasing the input data size. Other authors have pointed out that the input data size is not always the only parameter which determines the problem size [6]. Examples of other parameters which influence the problem size are the numerical accuracy or the interval between timesteps. Anyway, we believe that the proposals of this paper can be easily extended to a more general view of the problem size. We denote by R the size of the main memory per node ....
J.P. Singh, J.L. Hennessy and A. Gupta, "Scaling Parallel Programs for Multiprocessors: Methodology and Examples," IEEE Computer, pp. 42-50, July 1993.
....usually measured as speedup over the best uniprocessor execution. Origin has a fast enough processor and node, and our goal is not to compare with the absolute performance of other processors. As the number of processors changes, we need a scaling model under which to scale the problem size [17, 14]. We use the three major models, each of which may be applicable in different circumstances: problem-constrained (PC) or constant-problem-size scaling, time-constrained (TC) scaling, and memory-constrained (MC) scaling. Most of the paper focuses on PC scaling, under which speedup is defined simply ....
....($\mathrm{Time}(p) = \mathrm{Time}(1)$). From (1), speedup can be measured as the increase in the useful work done during that fixed execution time: $\mathrm{Speedup}(p) = \frac{\mathrm{Work}(p)}{\mathrm{Work}(1)}$. Memory-constrained scaling (MC): Here neither work nor time remains fixed (in fact, both can increase dramatically, as discussed in [14]). We must therefore resort to the full expression in (1), so $\mathrm{Speedup}(p) = \frac{\text{Increase in Work Done}}{\text{Increase in Time Taken}}$. For the memory-constrained scaling expression, if the increase in execution time were only due to the increase in work and not due to overheads of parallelism, and if there ....
[Article contains additional citation context not shown here]
J. Singh, A. Gupta, and J. Hennessy. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 1994.
....usually measured as speedup over the best uniprocessor execution. Origin has a fast enough processor and node, and our goal is not to compare with the absolute performance of other processors. As the number of processors changes, we need a scaling model under which to scale the problem size [17, 14]. We use the three major models, which may each be applicable in different circumstances: problem constrained (PC) or constant problem size scaling, time constrained (TC) scaling, and memory constrained (MC) scaling. Most of the paper focuses on PC scaling, under which speedup is defined simply ....
....($\mathrm{Time}(p) = \mathrm{Time}(1)$). From (1), speedup can be measured as the increase in the useful work done during that fixed execution time: $\mathrm{Speedup}(p) = \frac{\mathrm{Work}(p)}{\mathrm{Work}(1)}$. Memory constrained scaling (MC): Here neither work nor time remains fixed (in fact, both can increase dramatically, as discussed in [14]). We must therefore resort to the full expression in (1), so $\mathrm{Speedup}(p) = \frac{\text{Increase in Work Done}}{\text{Increase in Time Taken}}$. For the memory constrained scaling expression, if the increase in execution time were only due to the increase in work and not due to overheads of parallelism, and if there ....
[Article contains additional citation context not shown here]
J. Singh, A. Gupta, and J. Hennessy. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 1994.
No context found.
J. Singh, J. Hennessy and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples", IEEE Computer, June 1993, pp. 42-50.
No context found.
J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 27(7):42--50, July 1993.
No context found.
J.P. Singh, J.L. Hennessy, and A. Gupta. "Scaling Parallel Programs for Multiprocessors: Methodology and Examples." Computer, 26(7):42--50, July 1993.
CiteSeer.IST - Copyright Penn State and NEC