G. Abandah and E. Davidson, "Modeling the Communication Performance of the IBM SP2," Proc. 10th IEEE Int'l Parallel Processing Symp., Honolulu, Hawaii, pp. 249-257, April 15-19, 1996.
....widely applicable for solving real problems. The complexity of designing efficient parallel applications and algorithms requires that models be used at various levels of abstraction. Several approaches to modeling the communication performance of a multicomputer have been proposed in the literature [1, 3, 10]. LogP is a simple parallel machine model that reflects the most important parameters required to estimate the real performance of parallel computers [10]. LogGP [3] is an extension of LogP capturing the increased network bandwidth for long messages. The message passing interface (MPI) standard ....
G. Abandah and E. Davidson, "Modeling the Communication Performance of the IBM SP2," Proc. 10th IEEE Int'l Parallel Processing Symp., Honolulu, Hawaii, pp. 249-257, April 15-19, 1996.
....manager. Setting up the partition, acquiring the required resources, and loading the executable may take a significant amount of time. For example, measurements on an IBM SP2 showed that this process can take from a few seconds to hundreds of seconds as the number of PEs grows from 1 to 32 [1]. In the Mach system these overheads were reduced by using hierarchical structure within each partition [401]. Tearing down a partition and terminating jobs cleanly is just as important as setting things up. In particular, it is crucial that no stray threads be left behind when the job terminates ....
....each cluster. The allocation is done by Xylem, the Cedar operating system, from a global queue of tasks, where each task specifies the state of a whole cluster (i.e. 8 Alliant PEs). A server process executes periodically on each cluster, .... [Figure residue: iterations i = 1 to 4 on PE 1 to PE 4, each reading elements of a and computing f(x)] ....
[Article contains additional citation context not shown here]
G. A. Abandah and E. S. Davidson, "Modeling the communication performance of the IBM SP2". In 10th Intl. Parallel Processing Symp., pp. 249--257, Apr 1996.
....manager. Setting up the partition, acquiring the required resources, and loading the executable may take a significant amount of time. For example, measurements on an IBM SP2 showed that this process can take from a few seconds to hundreds of seconds as the number of PEs grows from 1 to 32 [1]. In the Mach system these overheads were reduced by using hierarchical structure within each partition [244]. Tearing down a partition and terminating jobs cleanly is just as important as setting things up. In particular, it is crucial that no stray threads be left behind when the job terminates ....
G. A. Abandah and E. S. Davidson, "Modeling the communication performance of the IBM SP2". In 10th Intl. Parallel Processing Symp., pp. 249--257, Apr 1996.
....separate TCP/IP messages in a loop to all destinations, at least not for small messages and small clusters. However, when used to transfer large executable files, large benefits are possible. Thus the high startup costs for parallel applications that have been observed for several systems [1] can be avoided or at least reduced. We start with a short overview of the ParPar system, and the need for multicasts. We then describe the implementation of our multicast facility in detail, and present experimental evidence of its impact on application execution when used to transfer ....
....{t(f')} + 1/n(f), where t(f) is the time since file f was last used, and n(f) is the number of times file f was used. The first term therefore captures the relative age of the executable in comparison with other files, while the second captures its importance. Both terms are in the range [0, 1], so the score is in the range [0, 2]. Executables with the highest scores are evicted until the desired disk space is cleared. Thus jobs that were run repeatedly are allowed to stay longer, in anticipation that they will be used again. 6 Conclusions Our first conclusion is that building ....
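The eviction rule in this excerpt can be sketched in a few lines. The start of the formula is elided in the excerpt, so the sketch below assumes the score is t(f) divided by the largest t over all cached files, plus 1/n(f); that reading matches the stated ranges ([0, 1] per term, [0, 2] in total) but is not confirmed by the source. The function names, the input layout, and the assumption that every file has been used at least once are hypothetical.

```python
def eviction_scores(files):
    """Score cached executables; a higher score means evict sooner.

    files maps a name to (age, uses): age is the time since last use,
    uses is the number of times the file was used (assumed >= 1).
    Assumed formula: age / max_age + 1 / uses.
    """
    max_age = max(age for age, _ in files.values())
    scores = {}
    for name, (age, uses) in files.items():
        age_term = age / max_age if max_age > 0 else 0.0  # relative age, in [0, 1]
        importance_term = 1.0 / uses                      # rarely used -> near 1
        scores[name] = age_term + importance_term
    return scores


def choose_evictions(files, sizes, space_needed):
    """Pick files in descending score order until enough space is freed."""
    scores = eviction_scores(files)
    evicted, freed = [], 0
    for name in sorted(files, key=lambda n: scores[n], reverse=True):
        if freed >= space_needed:
            break
        evicted.append(name)
        freed += sizes[name]
    return evicted


if __name__ == "__main__":
    cache = {"old_once": (100.0, 1), "recent_popular": (50.0, 10)}
    print(eviction_scores(cache))
    print(choose_evictions(cache, {"old_once": 10, "recent_popular": 20}, 5))
```

Under this reading, an executable that is both old and rarely used scores close to 2 and goes first, while a frequently run job keeps a low score even as it ages.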
G. A. Abandah and E. S. Davidson, "Modeling the communication performance of the IBM SP2". In 10th Intl. Parallel Processing Symp., pp. 249--257, Apr 1996.
....by: SSO(p) ≈ 1.2 + 0.22p [Figure 14: Static scheduling overhead versus number of processors, showing measured data and the fit 1.2 + 0.22p.] Compared with multicomputers, the SPP1000 has a relatively short SSO, e.g. SSO on the IBM SP2 is about one order of magnitude higher [14]. The SPP1000 advantage stems from having one operating system image, with central control that swiftly allocates and starts parallel tasks. Moreover, multiple processors can share the same executable code, thus reducing the overhead of distributing the executable code. The Convex compilers also ....
....shared-memory multiprocessors and that the resulting characterization is useful for developing and tuning shared-memory applications and compilers. We have shown that the corresponding characterization of message-passing multicomputer communication performance can also be systematically carried out [14]. Acknowledgments The University of Michigan Center for Parallel Computing, site of the SPP1000, is partially funded by NSF grant CDA-92-14296. ....
G. Abandah and E. Davidson, "Modeling the Communication Performance of the IBM SP2," in 10th Int'l Parallel Processing Symp. (IPPS'96), pp. 249--257, April 1996.
....where each processor prints its task id. Measuring the execution wall time is a good approximation for the SSO. Figure 4 shows the range and average of the SSO for 10 runs. Using curve fitting, the SSO in seconds can be roughly approximated by: SSO(p) ≈ 1.2 + 0.22p. Compared with multicomputers [10], the SPP 1000 has relatively short SSO. The SPP 1000 advantage stems from having one operating system image with central control that swiftly allocates and starts parallel tasks. Moreover, multiple processors can share the same executable binaries. The Convex Fortran and C parallelizing ....
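As a rough illustration of the curve fit mentioned in this excerpt, the sketch below evaluates a linear model SSO(p) = 1.2 + 0.22p (seconds, with the plus sign assumed from the garbled formula) and recovers its coefficients with an ordinary least-squares line fit. The citing paper does not say which fitting method was used, and the sample points here are synthetic values generated from the model itself, not the measured data; `sso_model` and `fit_line` are hypothetical names.

```python
def sso_model(p, intercept=1.2, slope=0.22):
    """Approximate static scheduling overhead in seconds for p processors."""
    return intercept + slope * p


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Synthetic points from the model, standing in for averaged wall times.
    procs = [1, 2, 4, 8, 16]
    times = [sso_model(p) for p in procs]
    intercept, slope = fit_line(procs, times)
    print(f"SSO(p) ~ {intercept:.2f} + {slope:.2f} p")
```

With real measurements, `times` would hold the averaged wall-clock time of the task-id program for each processor count.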
....in this paper can be applied to other shared-memory multiprocessors and that the resulting characterization is useful for developing and tuning shared-memory applications and compilers. We have shown that the corresponding characterization of message-passing multicomputer communication performance [10, 15] can also be systematically carried out. ....
G. Abandah and E. Davidson, "Modeling the communication performance of the IBM SP2," in 10th International Parallel Processing Symposium (IPPS'96), (Honolulu, Hawaii), April 1996.