| D. K. Bradley. First and Second Generation Hypercube Performance. Technical Report UIUCDCS-R-88-1455, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Sep. 1988. |
....specific aspects of data network performance: cooperative message passing, virtual channels, active messages, circular shifts, and spatial communication locality. The cooperative message passing benchmark and spatial communication locality benchmark were derived from work by Grunwald and Bradley [13, 8]. The original purpose of these two benchmarks was to measure the communication performance of distributed memory parallel systems, specifically the Intel iPSC family of hypercubes, for a variety of regular and irregular communication patterns. We ported these benchmarks to the CM-5 by replacing ....
....canonical layout after the reordering process. In the New Mexico Order, the data array is striped across the four computation nodes in units of 14 data elements .... [Figure 8.3: Data Layout on the Disk Array. Array blocks A[1 2] through A[127 128] and four parity blocks are laid out across Disk 0 to Disk 7.]
Bradley, D. K. First and Second Generation Hypercube Performance. Master's thesis, University of Illinois at Urbana-Champaign, Department of Computer Science, Sept 1988.
....the potentially nonlocal assignment to the force array (statement ( in Figure 4). Compiling this naively, without message blocking, would result in M 1 Pairs_ave (short) messages per processor. For systems whose communication time to send m units of data is well modeled by α + βm with α ≫ β [3], this would increase the communication cost by a factor of α/β. Another issue besides raw message blocking is how much we can gain by combining nonlocal reductions. For example, if processor p does not own Atom I, but has to make several contributions to F(I), it might be profitable to combine ....
D. K. Bradley. First and second generation hypercube performance. Technical Report UIUCDCS-R-88-1455, Dept. of Computer Science, University of Illinois at Urbana-Champaign, 1988.
....performance of short messages. 2.1.1 OS-Level DMA-Based Interfaces. The first category of network interfaces which we consider consists of parallel processors that relegate message handling to the DMA interface under the operating system's control. Examples include the NCUBE [Pal88], the iPSC/2 [Bra88], and the SP-2 [FHP+94]. At the hardware level, these machines send and receive messages by initiating a DMA transfer between main memory and the node's network channel. At the software level, the sending of a message is accomplished by writing the message into the memory and executing a ....
....requires the program on the receiving node to explicitly perform a receive operation. Since these machines involve the operating system to handle messages, the latency of sending messages can be quite high. A simple send with small messages takes 267 µs on the iPSC/2 and 437 µs on an NCUBE four [Bra88]. Much of this time is due to high-overhead operating system routines [Nug88]. One group of users rewrote parts of the nCUBE/2 operating system and reduced the overhead by nearly an order of magnitude [vCGS92]. However, even the remaining time was still quite large (11-15 µs) due to the expense of ....
D. K. Bradley. First and second generation hypercube performance. Technical Report UIUCDCS-R-88-1455, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA, September 1988.
....than local shared data. NORMA, or No Remote Memory Access: multiprocessors that do not share data and cannot access remote data directly. Processors communicate over an internal or an external network. This corresponds to the conventional message-passing (DADM) machines such as the iPSC Hypercube [34] and Meiko Computing Surface [130]. Young's classification does not cover some of the more recent multiprocessor architectures. In particular, many NUMA architectures employ coherent caching and many message-passing systems (NORMA) provide the ability to share address spaces via OS support. To ....
D. K. Bradley. First and Second Generation Hypercube Performance. Technical Report UIUCDCS-R-88-1455, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Sep. 1988.
....each category separately. 1.1 OS-Level DMA-Based Interfaces. The first category of network interfaces which we consider consists of parallel processors that relegate message handling to the DMA interface under the operating system's control. Examples include the NCUBE [Pal88] and the iPSC/2 [Bra88]. At the hardware level, both machines send and receive messages by initiating a DMA transfer between main memory and the node's network channel. At the software level, the sending of a message is accomplished by writing the message into the memory and executing a send system call which ....
....the program on the receiving node to explicitly perform a receive operation. Since these machines involve the operating system to handle messages, the latency of sending messages can be quite high. A simple send with small messages takes 267 µs on the iPSC/2 and 437 µs on an NCUBE four [Bra88]. Much of this time is due to high-overhead operating system routines [Nug88]. One group of users rewrote parts of the nCUBE/2 operating system and reduced the overhead by nearly an order of magnitude [vECGS92]. However, even the remaining time was still quite large (11-15 µs) due to the expense ....
D. K. Bradley. First and Second Generation Hypercube Performance. Technical Report UIUCDCS-R-88-1455, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA, September 1988.
....of the curves indicates that this factor will be about 15 for a 1024-processor system. 9 Related Work and Discussion. We discussed the direct predecessor to the T3E in Section 2. This section discusses some other related work. Early message-passing machines, such as the NCUBE [37] or iPSC/2 [5], required operating system calls to perform any communication between processors. More recent systems have provided user-level messaging facilities. The Connection Machine CM-5, for example, provides a user-level network interface via memory mapping [46]. Message receipt, however, requires ....
Bradley, D. K., "First and Second Generation Hypercube Performance," Technical Report UIUCDCS-R-88-1455, University of Illinois at Urbana-Champaign, September 1988.
....potentially nonlocal assignment to the force array (statement ( in Figure 4). Compiling this naively, without message blocking, would result in roughly Pairs_ave (short) messages per processor. For systems whose communication time to send m units of data is well modeled by α + βm with α ≫ β [3], this would increase the communication cost by a factor of α/β. This would make it unacceptable to send individual messages for each nonlocal access, instead of combining them at the end of the loop. Another issue besides raw message blocking is how much we can gain by combining nonlocal ....
D. K. Bradley. First and second generation hypercube performance. Technical Report UIUCDCS-R-88-1455, Dept. of Computer Science, University of Illinois at Urbana-Champaign, 1988.
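The message-blocking argument in the excerpt above can be checked with a few lines of arithmetic. The sketch below assumes the garbled cost model reads "α + βm with α ≫ β", where α is a per-message startup cost and β a per-unit transfer cost. The function names and the values α = 300, β = 1 are hypothetical, chosen only for illustration, and are not taken from the cited report.

```python
def transfer_time(m, alpha, beta):
    """Time to send one message of m data units under the assumed model alpha + beta*m."""
    return alpha + beta * m


def unblocked_cost(n, alpha, beta):
    """Cost of sending n data units as n separate one-unit messages."""
    return n * transfer_time(1, alpha, beta)


def blocked_cost(n, alpha, beta):
    """Cost of sending the same n data units combined into a single message."""
    return transfer_time(n, alpha, beta)


def slowdown(n, alpha, beta):
    """Ratio of unblocked to blocked cost: n*(alpha + beta) / (alpha + beta*n)."""
    return unblocked_cost(n, alpha, beta) / blocked_cost(n, alpha, beta)


if __name__ == "__main__":
    alpha, beta = 300.0, 1.0  # hypothetical startup and per-unit costs
    for n in (1, 10, 100, 10_000):
        print(n, slowdown(n, alpha, beta))
```

For large n the ratio tends to (α + β)/β, which is close to α/β when α ≫ β. This matches the factor quoted in the excerpt, under the stated assumption about the model.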
....to the captured performance data to determine communication latency and bandwidth. 3.2.1 Data Network Benchmarks. The benchmarks used to measure the performance of the fat-tree data network, except those for active messages and virtual channels, were derived from work by Grunwald and Bradley [4, 2] on Intel systems. We ported the benchmarks to the CM-5 by replacing the Intel message passing calls with equivalent CMMD calls. Unlike on the Intel iPSC/860 or Paragon XP/S, the CM-5 data network is hierarchical. Hence, measuring the effects of spatial and temporal communication locality ....
Bradley, D. K. First and Second Generation Hypercube Performance. Master's thesis, University of Illinois at Urbana-Champaign, Department of Computer Science, Sept 1988.
No context found.
D. K. Bradley. First and Second Generation Hypercube Performance. Technical Report UIUCDCS-R-88-1455, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Sep. 1988.