13 citations found.
Stunkel, C. B., Shea, D. G., Grice, D. G., Hochschild, P. H., Tsao, M. "The SP1 High-Performance Switch", Scalable High Performance Computing Conference, May 1994, pp. 150--157.


This paper is cited in the following contexts:
An Evaluation of Architectural Platforms for Parallel.. - Jayasimha, al. (1996)   (Correct)

....in each node is a RS6K 370 with a 50 MHz clock, 32KB data and instruction caches) The original system has been software upgraded to make it function like a SP2. We will refer to this system as the IBM SP in the paper. The nodes of the SP are interconnected through a variant of the Omega network [17]. This network, similar in topology to ALLNODE, permits multiple contentionless paths between nodes. We parallelized the application using MPL (Message Passing Library), IBM's native message passing library, and PVMe, a customized version of PVM (version 3.2) developed by IBM for the SP. The Cray ....

.... direct mapped cache of 8KB size (both the 560 and 590 have 4 way set associative data caches of sizes 64KB and 256 KB respectively; in addition they have 2 way set associative instruction caches of sizes 8KB and 32KB). Poor single processor performance on the T3D has also been reported elsewhere [17]. These results stress the importance of superior cache design to the overall performance. A reasonably fast CPU with a large, set associative cache and a high bandwidth .... [Figure 8: Execution time of ....; x-axis: Number of Processors; series: Cray Y MP, IBM SP (RS6K 370), ALLNODE S, Cray T3D, ALLNODE F]

Stunkel, C. B., Shea, D. G., Grice, D. G., Hochschild, P. H., Tsao, M. "The SP1 High-Performance Switch", Scalable High Performance Computing Conference, May 1994, pp. 150--157.


Parallelizing Navier-Stokes Computations on a Variety of .. - Jayasimha, Hayder.. (1995)   (Correct)

....node is a RS6K 370 the CPU has a 50 MHz clock, 32KB data and instruction caches) The original system has been software upgraded to make it function like a SP2. We will refer to this system as the IBM SP in the paper. The nodes of the SP are interconnected through a variant of the Omega network [14]. This network, similar in topology to ALLNODE, permits multiple contentionless paths between nodes. We parallelized the application using MPL (Message Passing Library), IBM's native message passing library, and PVMe, a customized version of PVM (version 3.2) developed by IBM for the SP. The Cray ....

Stunkel, C. B., Shea, D. G., Grice, D. G., Hochschild, P.H., Tsao, M. "The SP1 High-Performance Switch", Scalable High Performance Computing Conference, May 1994, pp. 150--157.


Performance Evaluation of Switch-Based Wormhole Networks - Lionel Ni Fellow (1995)   (8 citations)  (Correct)

....number of routing paths. For ease of discussion, we assume the existence of the rightmost stage. [Fig. 12. Topological equivalence by removing the rightmost stage.] 3.3 Analogy to Fat Tree. As shown in Fig. 13, a butterfly BMIN with turnaround routing can be viewed as a fat tree [30]. In a fat tree, processors are located at leaves, and internal vertices are switches. When a message is routed from one processor to another, it is sent up (in forward direction) the tree to the least common ancestor of the two processors, and then sent down (in backward direction) to the ....

C.B. Stunkel, D.G. Shea, D.G. Grice, P.H. Hochschild, and M. Tsao, "The SP1 High-Performance Switch," Proc. 1994 Scalable High Performance Computing Conf., pp. 150-157, May 1994.


Performance Evaluation of Switch-Based Wormhole Networks - Ni, Gui, Moore (1995)   (8 citations)  (Correct)

....Available ports on the right hand side of the network are used to configure larger networks, which are not shown in the figure. There are many commercial SPCs using BMINs with wormhole switching and turnaround routing including the TMC CM 5 [24], Meiko CS 2 (k = 4) [25], and IBM SP 1 2 (k = 4) [26, 27]. In the CM 5, the first two level stages use 4 × 2 switches, yielding a dual port communication architecture. Although BMINs have been used in many commercial machines, to the best of our .... [Figure: Turnaround Butterfly; node addresses 000 to 111; processor/memory nodes; stage labels C0, G0, C1, G1, C2, G2] ....

C. B. Stunkel, D. G. Shea, D. G. Grice, P. H. Hochschild, and M. Tsao, "The SP1 high-performance switch," in Proc. of the 1994 Scalable High Performance Computing Conference (SHPCC 94), pp. 150--157, May 1994.


Performance of Multistage Bus Networks for a.. - Bhuyan, Iyer.. (1997)   (Correct)

....static or dynamic. Dynamic networks can connect any input to any output by enabling some switches. They are applicable to both shared memory and message passing multiprocessors. Among such dynamic INs, the hierarchical buses or rings [1], [2] and Multistage Interconnection Networks (MINs) [3], [4] have been commercially employed. In a strictly hierarchical bus architecture [1], there are a number of buses connected in the form of a tree between the processors and the memories. The use of multiple buses makes the hierarchical bus based systems more scalable compared to the popular single ....

....of a corresponding Bidirectional MIN (BMIN) in this paper. The BMIN allows U turns and a packet can be routed based on the same techniques presented in this paper for the MBN. Recently, Xu and Ni [9] have discussed a U turn strategy for bidirectional MINs as applicable to the IBM SP architecture [4]. However, the MIN employed in SP architectures is cluster based and works differently than the proposed MBN or BMIN. In this paper, we analyze the performance of an MBN for distributed shared memory multiprocessors based on different self routing techniques. Unlike the previous analysis [8] the ....

C. B. Stunkel, D. G. Shea, D. G. Grice, P. H. Hochschild and M. Tsao, "The SP1 high-performance switch," Proc. 1994 Scalable High-Performance Computing Conference, pp. 150-157, May 1994.


Interconnection Networks And Data Prefetching For Large-Scale.. - Kim (1995)   (Correct)

....physical limitations but also costs make such a network infeasible for use in large scale multiprocessor systems. Many different network architectures have been employed in recent commercial multiprocessor systems. For example, the IBM SP 2 uses a bidirectional multistage shuffle exchange network [8, 9], the Cray T3D uses a 3 D bidirectional torus network [10], the TMC CM 5 uses a fat tree network [11], and the Intel Paragon uses a 2 D mesh network [12]. As yet there is no consensus on the best network organization for large scale multiprocessor systems. This lack of agreement shows the need for ....

....effective and have been used widely. However, for a large scale multiprocessor system, there is no consensus on the best network organization. Consequently, many different network architectures have been employed. For example, the IBM SP 2 uses a bidirectional multistage shuffle exchange network [8, 9], the Cray T3D uses a 3 D bidirectional torus network [10], the TMC CM 5 uses a fat tree network [11], and the Intel Paragon uses a 2 D mesh network [12]. An appropriate network selection for a given system requires an in depth study of various aspects of network design and trade offs, not only for ....

C. B. Stunkel, D. G. Shea, D. G. Grice, P. H. Hochschild, and M. Tsao, "The SP1 high-performance switch," in Proceedings of the Scalable High Performance Computing Conference, pp. 150--157, May 1994.


Modeling Computation and Communication Performance of.. - Boyd, Abandah, Lee..   (2 citations)  (Correct)

....access (DMA) operations as well as error detection correction and bit steering for all data sent to and received from memory. The DCU has a one cycle access time. The miss penalty to memory is determined experimentally to be between 16 and 21 processor clock cycles. 2.2. Interconnect Architecture [16], [17], [18]. The SP2 interconnect, termed the High Performance Switch (HPS), is designed to minimize the average latency of message transmissions while allowing the aggregate bandwidth to scale linearly with the number of nodes. The HPS is a bidirectional 3 multistage interconnect (MIN) ....

C. B. Stunkel, et al., "The SP1 High-Performance Switch," Proceedings of the Scalable High Performance Computing Conference, May 1994, pp. 150-157.


Message Passing Performance on SP Systems - Georgitsis, Sobolewski (1996)   (Correct)

.... 2 Communication Model and Performance Metrics The IBM SP system is a distributed memory scalable parallel machine in which individual nodes are interconnected by means of a High Performance Switch (HiPS) which is a multistage packet switching Omega network with buffered worm hole routing [2, 3, 4]. This switch allows any node to communicate directly with any other node with (almost) constant hardware latency of less than one microsecond, even for very large systems with hundreds of nodes. If p is the maximum packet size, then the time required to send an m byte message, where m p, between ....

C. B. Stunkel; D. G. Shea; D. G. Grice; P. H. Hochschild; M. Tsao; P. R. Varker, "The SP1 High-Performance Switch," available at http://www.tc.cornell.edu/ibm/pps/doc.


Flow control considerations in network-based architectures .. - Konstantinidou, Ngai   (Correct)

....When the same problems are considered in network based architectures, the solutions are inherently much more complicated. Existing network based architectures have made a wide variety of choices with respect to their interconnect medium. This includes multistage networks (IBM SP and TMC CM5) [24, 18], two dimensional grids (Intel Paragon) [21], three-dimensional tori (Cray T3D) [6], a hierarchy of fiber rings (Convex Exemplar) [11], and networks constructed of ATM switching fabric [20, 5]. As varied as the characteristics of these interconnects may be, they have certain similarities. ....

C.B. Stunkel, D.G. Shea, D.G. Grice, P. Hochschild, and M. Tsao. "The SP1 High-Performance Switch." In Proc. Scalable High Performance Computing Conference, 1994.


Bandwidth And Latency Guarantees In Low-Cost, High-Performance.. - Kim (1997)   (2 citations)  (Correct)

....of Fixed Arbitration Switches. Unfortunately, existing switch designs cannot support both bursty data and real time communications at the same time. For instance, most existing multicomputer network switches such as the Thinking Machine's CM5 [23], Intel Paragon [24], Cray T3D [25], and IBM SP2 [26] adopt simple arbitration strategies such as round robin (RR) or first-come-first-serve (FCFS) which distribute resources uniformly to local traffic. While permitting high speed switching, simple arbitration mechanisms have limited ability to provide high performance to contemporary multicomputer ....

....is being utilized than reserved. However, it is applied to switch ports, not network connections. To our knowledge, the ServerNet is the first integration of network resource control in multicomputer network routers. 4 Usually, multicomputer network switches support links shorter than 100m [23, 24, 25, 26, 28]; whereas, LAN and WAN span from a few kilometers to a few thousand kilometers. In the remainder of the dissertation, we explore service disciplines for high speed, low-cost multicomputer networks. We first evaluate the advantages and limitations of the ALU-biasing arbitration mechanism ....

[Article contains additional citation context not shown here]

C.B. Stunkel, D.G. Shea, D.G. Grice, P.H. Hochschild, and M. Tsao. "The SP1 high-performance switch," in Proceedings of the Scalable High Performance Computing Conference, Knoxville, TN, May 1994, pp. 150--157. Available from http:// ibm.tc.cornell.edu/ibm/pps/doc/hps.ps.


Finding Bottlenecks In Large Scale Parallel Programs - Hollingsworth (1994)   (6 citations)  (Correct)

....of the CM 5 operating system provides a version of ptrace that runs on the nodes, and this permits us to build a broadcast based version of ptrace as a CM 5 application that runs interleaved with the measured application. On machines that do not provide hardware broadcast, such as the IBM SP 2 [91], we can construct a software message spanning tree. The spanning tree technique achieves logarithmic time, instead of unit time cost. However, most new machines come with some form of broadcast facility, including the Intel Paragon (though it is currently not accessible to application software) ....

C. B. Stunkel, D. G. Shea, D. G. Grice, P. H. Hochschild and M. Tsao, "The SP1 High-Performance Switch", 1994 Scalable High-Performance Computing Conference, May 1994, pp. 150-157.


Models and Resource Metrics for Parallel and Distributed.. - Li, Mills, Reif (1989)   (12 citations)  (Correct)

No context found.

C. Stunkel, D. Shea, D. Grice, P. Hochschild, and M. Tsao, "The SP1 high-performance switch," in Proc. of the Scalable High Performance Computing Conference, (Knoxville, TN), pp. 150--157, May 1994.


The High Performance Switch and Programming Interfaces on IBM.. - Cheng, Podgorny   (Correct)

No context found.

C. B. Stunkel, D. G. Shea, et al., "The SP1 High Performance Switch," in Proc. 1994 Scalable High-Performance Computing Conference, pp. 150-157, May 1994.
