| R. W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, 1994. |
....supported by a truly real-time operating system, our measurements are subject to some runtime conditions. The aggregated bandwidth introduced by Xu and Hwang [31] offers a better metric to quantify the data transfer rate in a collective message-passing operation. The asymptotic bandwidth by Hockney [13] is only effective in characterizing point-to-point communications. We did not apply active messages (Culler et al. [25]) or MPI-FM (Chien et al. [19]) in our benchmark experiments, because they have not been widely ported to the target machines we tested. We suggest extended ....
Roger W. Hockney; "The Communication Challenge for MPP: Intel Paragon and Meiko CS-2," Parallel Computing, 1994, Vol. 20, pp. 389-398.
....delay, d, is constant most of the time but may occasionally be unbounded due to hot spots in applications. 3.1.2.1 Communication Modeling: An important concern is to model the communication time T required to send a message from one node to another. I use the communication model of Hockney [66]. Hockney's model characterizes the communication time for a point-to-point communication operation as: where is the start-up time, which is equal to the time needed to send a zero-byte message and includes the time required to prepare the message, such as adding a header and a trailer. is the ....
....a new algorithm for the total exchange operation, to be called the combined total exchange algorithm, in Section 5.7. Finally, I summarize this chapter in Section 5.8. 5.2 Communication Modeling for Broadcasting/Multi-broadcasting: As discussed in Chapter 3, I use a modified version of Hockney's communication model [66]. I modify Hockney's model into two models. In this section, I define the first model, used for hiding the reconfiguration delays in broadcasting and multi-broadcasting algorithms. In Section 5.4, I define the second model for other collective communication algorithms. The second model ....
R. W. Hockney, "The Communication Challenge for MPP: Intel Paragon and Meiko CS-2", Parallel Computing, Volume 20, No. 3, March 1994, pp. 389-398.
....In the case of end-to-end latency/bandwidth characterization, the latency $L$ and bandwidth $B$ parameters are supposed to correctly characterize the end-to-end delay $D(S)$ of the messaging system for any message size $S$. This is because the following linear relation (derived from the one reported in [Hoc94]) is assumed as an analytical model of the communication performance: $D(S) = L + (S - S_m)/B$ (2.1), where $S_m$ is the minimal message size allowed by the system. Equation 2.1 assumes that the per-byte delivery cost $1/B$ does not depend on the actual message size. However, this does not always ....
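Equation 2.1 is simple enough to evaluate directly. The following Python sketch is illustrative only. The function name, the default $S_m = 0$, and the unit convention are assumptions rather than details from the cited work. That convention is latency in microseconds and bandwidth in MB/s with 1 MB = 10^6 bytes, so 1 MB/s is one byte per microsecond.

```python
def end_to_end_delay(size_bytes: int, latency_us: float, bandwidth_mb_s: float,
                     min_size_bytes: int = 0) -> float:
    """Linear delay model D(S) = L + (S - S_m) / B, returned in microseconds.

    Assumed units (hypothetical helper): latency in microseconds, bandwidth in
    MB/s with 1 MB = 10**6 bytes, i.e. exactly 1 byte per microsecond.
    """
    if size_bytes < min_size_bytes:
        raise ValueError("message smaller than the minimal size S_m")
    # The per-byte cost 1/B is constant, independent of message size.
    return latency_us + (size_bytes - min_size_bytes) / bandwidth_mb_s
```

For instance, `end_to_end_delay(1000, 10.0, 10.0)` evaluates to 110.0 µs under these assumed units.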
R.W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, March 1994.
....In the case of end-to-end latency/bandwidth characterization, the latency $L$ and bandwidth $B$ parameters are supposed to correctly characterize the end-to-end delay $D(S)$ of the messaging system for any message size $S$. This is because the following linear relation (derived from the one reported in [43]) is assumed as an analytical model of the communication performance: $D(S) = L + (S - S_m)/B$ (2.1), where $S_m$ is the minimal message size allowed by the system. Equation 2.1 assumes that the per-byte delivery cost $1/B$ does not depend on the actual message size. However, this does not always ....
R.W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, March 1994.
....computer is to measure the minimum time required to send a message between two processes located on different nodes. For this reason, many performance evaluation studies concentrate on point-to-point communication in order to establish an initial comparison point among different platforms [Cul96, DD95, Hoc94, XH96]. Table 2 and Figure 2 show the results obtained by running the point-to-point test described in the previous section. As can be observed in the table, the introduction of the new version of MPI reduces the software overhead. As a consequence, start-up times are lower and the latency ....
....reduction of execution time after the upgrade. In fact, the message size required to obtain reasonable performance has increased after the change. Hockney defines $L_{1/2}$ as the message length that achieves one half of the maximum channel bandwidth [Hoc94]. With the old subsystem, $L_{1/2}$ was approx. 3000 bytes (see Table 2); now it has increased to around 32 KB (see Figure 3). Therefore, the minimum message size required to run parallel applications efficiently has increased by an order of magnitude. If the goal is to increase ....
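Under Hockney's linear model $t(n) = t_0 + n/r_\infty$, the half-bandwidth length follows from solving $n/t(n) = r_\infty/2$, which gives $L_{1/2} = t_0 r_\infty$. Below is a minimal Python sketch. The function names are hypothetical, and it assumes microseconds and MB/s (1 MB = 10^6 bytes), so their product is in bytes.

```python
def half_performance_length(t0_us: float, r_inf_mb_s: float) -> float:
    """Message length (bytes) at which throughput reaches r_inf / 2.

    Solving n / (t0 + n / r_inf) = r_inf / 2 for n gives n = t0 * r_inf.
    """
    return t0_us * r_inf_mb_s


def throughput_mb_s(n_bytes: float, t0_us: float, r_inf_mb_s: float) -> float:
    """Achieved rate n / t(n) in MB/s (bytes per microsecond)."""
    return n_bytes / (t0_us + n_bytes / r_inf_mb_s)
```

Take hypothetical values $t_0 = 100$ µs and $r_\infty = 30$ MB/s, which are not measurements from the study above. Then `half_performance_length(100.0, 30.0)` gives 3000.0 bytes, and `throughput_mb_s(3000.0, 100.0, 30.0)` gives 15.0 MB/s, half of $r_\infty$. In this model $L_{1/2} = t_0 r_\infty$. A lower start-up time can therefore still coincide with a larger $L_{1/2}$ if the asymptotic bandwidth rises proportionally more.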
R.W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing 20 (1994), pp. 389-398.
....the physical medium. We define the throughput $T(s)$ as the transfer rate perceived by the application when sending a message of size $s$: $T(s) = s/D(s)$ (1). A linear approximation of the performance characteristics of a messaging system is usually given in terms of latency and asymptotic bandwidth [4]. Asymptotic bandwidth is usually evaluated by measuring the throughput $T(s)$ with very large messages. According to our average estimates, the latency and asymptotic bandwidth of GAMMA are 12.7 µs and 12.22 MByte/s, respectively. Thanks to its carefully optimized communication path, GAMMA achieves the ....
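The throughput definition $T(s) = s/D(s)$ can be combined with the linear approximation $D(s) = \text{latency} + s/\text{bandwidth}$, plugging in the GAMMA averages quoted above. The Python sketch below does this. The printed values are model predictions, not measurements, and reading MByte/s as 10^6 bytes/s is an assumption.

```python
# GAMMA's average estimates quoted above; treating 1 MByte/s as 10**6 bytes/s
# (i.e. 1 byte per microsecond) is an assumption of this sketch.
LATENCY_US = 12.7
ASYMPTOTIC_BW_MB_S = 12.22


def delay_us(s_bytes: float) -> float:
    """Linear approximation D(s) = latency + s / asymptotic bandwidth."""
    return LATENCY_US + s_bytes / ASYMPTOTIC_BW_MB_S


def throughput_mb_s(s_bytes: float) -> float:
    """T(s) = s / D(s), in MB/s (bytes per microsecond)."""
    return s_bytes / delay_us(s_bytes)


if __name__ == "__main__":
    # Model-predicted throughput curve for a few message sizes.
    for s in (16, 256, 4096, 65536, 1_048_576):
        print(f"{s:>8} bytes: {throughput_mb_s(s):6.2f} MB/s")
```

Under this model $T(s)$ increases monotonically toward 12.22 MB/s. It reaches half of that value at $s = 12.7 \times 12.22 \approx 155$ bytes. Measured curves may deviate from the linear approximation.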
R.W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, March 1994.
.... occomm cs2 or edit the makefile to include a new target; 5) run occomm according to the target system, e.g. occomm contig.dat. 2 Introduction to OCCOMM: Several groups have reported the message-passing performance of parallel computers using a ping-pong or message exchange test (Addison et al., 1993; Hockney, 1994; Dongarra and Dunigan, 1994; Cameron et al., 1995), but it is always assumed that the data to be communicated is contiguous in memory. In general this is not the case, and in the context of boundary exchange for grid-based applications it becomes pertinent to ask what is the best way of exchanging ....
Hockney, R.W., "The communication challenge for MPP: Intel Paragon and Meiko CS-2", 1994, Parallel Computing, 20, 389-398.
....from the past and present are shown in Figure 2-1. The round-trip cost, further explained in Table 2.1, is roughly based on a two-way null remote procedure call (RPC) or a ping-pong operation. It is obtained by doubling the reported value when only the one-way cost is provided in the literature [19, 20, 21, 15, 22, 23, 24, 25, 26, 27]. Since an actual implementation for [18] does not exist, its round-trip cost is extrapolated from the specified overhead of assembling, sending and receiving a remote read message¹. On the horizontal axis, Figure 2-1 also shows that the systems employ a variety of mechanisms for robustness. ....
....100-byte message (Communication Co-Processor); CM-5, 32 MHz SPARC: 143 µs swap [9], CMMD library; 3.4 µs swap [9], CMNF library; J-Machine, 12.5 MHz MDP: 43 cycles [10], streaming injection, 1024 max, round-trip null RPC; CS-2, 40 MHz SPARC: 20 µs [39], channel; 24.6 µs [23], DMA with active message, hardware table lookup; 174 µs [21], PARMACS macros, ping-pong; 206 µs [20], mpsc library, message exchange; T3D, 150 MHz 21064: 600 ns [26], shared memory, 2048 max, remote read; 2.76 µs [40], Fast Messages; F&I-specific 16-byte fetch-and-increment hardware support; 120 µs [26], interrupt-driven user-level message handler; T 88100MP dispatch ....
Roger W. Hockney, "The Communication Challenge for MPP: Intel Paragon and Meiko CS-2", Parallel Computing 1994, pp. 389--398.
....alternative approach to modeling communication, which implicitly accounts for such hardware and software features, is to develop general formulas for performance. By setting the values of the constituent parameters, these formulas can be fit to data produced by empirical studies. For example, Hockney [15] proposed a communication cost model that identifies two metrics as the most important performance indicators for modeling point-to-point communication on MPPs. The startup time, $t_0$, is the delay of a zero-length message, and $r_\infty$ is the asymptotic bandwidth of the network. Thus, the average ....
.... latency formula, we define the parameter to be the ratio $T_n(a_0, a_1, a_2, \ldots, a_{n-1}) / T_s(a_0, a_1, a_2, \ldots, a_{n-1})$ (2). The ratio is the reciprocal of the metric $m_{1/2}$, which in Hockney's model is the message length required to achieve half the asymptotic bandwidth [15]. Plotting the performance of a communication algorithm according to its coordinates (this ratio, $T_s$) results in a plot for that algorithm. The use of the plot is illustrated in the following analysis. Figure 1(a) shows three lines, marked 1, 2 and 3. Let us suppose that they represent ....
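The claim that the ratio is the reciprocal of $m_{1/2}$ can be checked with a short derivation. The snippet does not define the two terms, so we assume that $T_s$ is the start-up term and $T_n$ the per-byte term of a latency formula $T(m) = T_s + m\,T_n$. The arguments $(a_0, \ldots, a_{n-1})$ are omitted:

```latex
T(m) = T_s + m\,T_n, \qquad
r(m) = \frac{m}{T(m)}, \qquad
r_\infty = \lim_{m \to \infty} r(m) = \frac{1}{T_n}

r(m_{1/2}) = \tfrac{1}{2} r_\infty
\iff \frac{m_{1/2}}{T_s + m_{1/2} T_n} = \frac{1}{2 T_n}
\iff 2\, m_{1/2} T_n = T_s + m_{1/2} T_n
\iff m_{1/2} = \frac{T_s}{T_n}
\quad\Longrightarrow\quad \frac{T_n}{T_s} = \frac{1}{m_{1/2}}
```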
R. W. Hockney, "The communication challenge for MPP: Intel Paragon and Meiko CS-2," Parallel Computing, vol. 20, pp. 389--398, 1994.
....machine. It is based on the Intel i860 XP rated at 50 MHz with a peak performance of 75 MFLOPS. The two-dimensional mesh network provides a peak bandwidth of 200 MB/s. The MPI version used was Intel's, based on MPICH 1.0.12. 4 Results: As a basic model to evaluate the results, Hockney's model [9] was chosen: $t = t_0 + M/r_\infty$ (1), where $t$ is the overall time in seconds, $t_0$ is the time measured for a zero-size message, $M$ is the message size in bytes and $r_\infty$ is an asymptotic bandwidth measured in MB/s. Measurements on all machines showed that this is in general a good model to describe the behaviour ....
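The snippet measures $t_0$ directly from a zero-size message. An alternative is to fit both $t_0$ and $r_\infty$ to all measured (size, time) pairs by ordinary least squares, since the model is linear in $M$. This approach is our assumption, not taken from the cited work. The minimal Python sketch below uses a hypothetical `fit_hockney` helper, with seconds and MB/s (1 MB = 10^6 bytes) as in Eq. (1).

```python
from typing import Sequence, Tuple


def fit_hockney(sizes_bytes: Sequence[float],
                times_s: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of t = t0 + M / r_inf.

    Returns (t0 in seconds, r_inf in MB/s), taking 1 MB = 10**6 bytes.
    """
    n = len(sizes_bytes)
    if n != len(times_s) or n < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_m = sum(sizes_bytes) / n
    mean_t = sum(times_s) / n
    sxx = sum((m - mean_m) ** 2 for m in sizes_bytes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes_bytes, times_s))
    slope = sxy / sxx               # seconds per byte, i.e. 1 / r_inf
    if slope <= 0:
        raise ValueError("non-positive slope: data do not fit the model")
    t0 = mean_t - slope * mean_m    # intercept: predicted zero-size message time
    r_inf_mb_s = 1.0 / (slope * 1e6)  # bytes/s -> MB/s
    return t0, r_inf_mb_s
```

Consider synthetic data generated from $t_0 = 50$ µs and $r_\infty = 100$ MB/s. On such data the fit recovers those values up to floating-point rounding. On real measurements, the residuals indicate how well the linear model holds.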
R.W. Hockney, "The Communication Challenge for MPP: Intel Paragon and Meiko CS-2", Parallel Computing 20 (1994), 389--398.
....messages. The results obtained are outlined in Table 2. Our results can be compared to those obtained by PARMA 2 as well as on other parallel platforms. To make this comparison, we assume the following linear dependency of delay on message length [11] for messages ranging from 0 to 40 bytes: $\mathrm{Delay} = L + N_{\mathrm{bytes}}/B_s$. For messages ranging from 40 to 1516 bytes, we instead use the following linear formula: $\mathrm{Delay} = L + 40/B_s + (N_{\mathrm{bytes}} - 40)/B_m$, in order to take into account the different bandwidth resulting from the variable length ....
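The two-segment delay model above translates directly into code. In this hypothetical Python sketch, the caller passes $L$, $B_s$ and $B_m$ in any consistent units. The snippet states the model only for 0 to 1516 bytes, so rejecting sizes outside that range is our choice.

```python
def piecewise_delay(n_bytes: int, latency: float,
                    bw_small: float, bw_medium: float) -> float:
    """Two-segment linear delay model (hypothetical helper).

    0 <= n <= 40:      L + n / B_s
    40 < n <= 1516:    L + 40 / B_s + (n - 40) / B_m
    Units are whatever the caller uses consistently (e.g. microseconds and
    bytes per microsecond).
    """
    if not 0 <= n_bytes <= 1516:
        raise ValueError("model only stated for 0..1516-byte messages")
    if n_bytes <= 40:
        return latency + n_bytes / bw_small
    # The first 40 bytes are charged at B_s, the remainder at B_m.
    return latency + 40 / bw_small + (n_bytes - 40) / bw_medium
```

Both branches give $L + 40/B_s$ at 40 bytes. The model is therefore continuous at the breakpoint, and the shared endpoint of the two ranges is harmless.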
R.W. Hockney. The communication challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, March 1994.
.... $T(s)$ as the transfer rate perceived by the application when sending a message of size $s$: $T(s) = s/D(s)$ (1). The performance of message-passing communications on a parallel platform is often characterized in terms of two synthetic parameters, namely communication latency and maximum bandwidth [2]. Maximum bandwidth is usually introduced as a constant value according to the following linear approximation for $D(s)$: $D(s) = D(0) + s/\mathrm{Bandwidth}$ (2), where $s$ is the message size in bytes. However, such a linear approximation does not always hold accurately. Actually, the bandwidth is a function of the ....
R.W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, March 1994.
....by the Ethernet standard) along the wire. We define the throughput $T(s)$ as the end-to-end transfer rate for a message of size $s$: $T(s) = s/D(s)$ (1). A linear approximation of the performance characteristics of a messaging system is usually given in terms of latency and asymptotic bandwidth [9]. Asymptotic bandwidth is usually evaluated by measuring the throughput $T(s)$ with very large messages. According to our average estimates, the latency and asymptotic bandwidth of GAMMA are 12.7 µs and 12.22 MByte/s, respectively. Figure 1 depicts the throughput curve of GAMMA. The comparison against ....
R.W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, March 1994.
....previously presented. They represent an effective way to accurately estimate the costs of the MPI primitives on the CS-2. Point-to-point communications: Hockney's model is one of the most meaningful models to describe point-to-point communication on parallel machines with distributed memory [3]. According to this model, the communication latency can be expressed as $t = t_0 + m/r$, where $t_0$ is the startup time (in microseconds), i.e. the time needed to send a 0-byte message, $m$ is the message length (bytes) and $r$ is the asymptotic bandwidth in Mbytes per second, which is the maximal bandwidth ....
Hockney, R.W., "The Communication Challenge for MPP: Intel Paragon and Meiko CS-2", Parallel Computing, North-Holland, vol. 20, pp. 389-398, 1994.
....2.2 SUPRENUM results and comparison with iPSC/860: We report below results for the COMMS1 benchmark on the SUPRENUM and iPSC/860 computers. Results comparing COMMS1 on the iPSC/860 and Touchstone Delta have been reported in [18], and a comparison between the Intel Paragon and Meiko CS-2 in [19]. The basic performance of the Intel iPSC/860 has also been measured and evaluated by Berrendorf and Helin [3]. Table 1 gives the values obtained for the communication parameters, in the version of the benchmark using the native SUPRENUM extensions to the Fortran90 language. These include a SEND ....
R. W. Hockney, The communication challenge for MPP: Intel Paragon and Meiko CS-2, Parallel Computing 20 389-398 (1994).
No context found.
R. W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, 1994.
No context found.
R.W. Hockney, "The Communication Challenge for MPP: Intel Paragon and Meiko CS-2", Parallel Computing 20 (1994), 389-398.
No context found.
Hockney, R.W., 1994, "The communication challenge for MPP: Intel Paragon and Meiko CS-2", Parallel Computing, 20, 389-398.
No context found.
R.W. Hockney. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3):389--398, March 1994.
CiteSeer.IST - Copyright Penn State and NEC