S. L. Scott and G. M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Proc. Symp. High Performance Interconnects (Hot Interconnects 4), pages 147--156, August 1996.
....a 3-D torus. Finally we investigate the fault tolerance properties of these networks and show that they degrade more gracefully in the presence of faults than alternative topologies. 1 Introduction Interconnection networks are widely used to connect processors and memories in multiprocessors [20], as switching fabrics for high-end routers and switches [9], and for connecting I/O devices [18]. Large scientific computers, e.g. ASCI White [1], have thousands of processors, and large internet routers, e.g. the Avici TSR, are scalable to thousands of ports. These applications, and many ....
S. Scott and G. Thorson. The Cray T3E network: adaptive routing in a high performance 3D torus. In Proceedings of Hot Interconnects Symposium IV, 1996.
....8 packets are routed along any minimal path between source and destination. The remaining two channels are escape channels where packets are routed deterministically when the adaptive choice is limited by network contention [9]. A similar algorithm has been recently adopted by the Cray T3E [22]. A central point of our adaptive algorithm is the interface between the processor and the router. We assume that packets can enter the network using only a subset of the adaptive channels [19]. This limitation, known as source throttling, makes the network throughput stable when the network ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In HOT Interconnects IV, Stanford University, August 1996.
....caused by the coherency protocols. Other important academic prototypes that use low-dimensional meshes are Alewife and the M-Machine [DKC 94]. The list also includes many of the most popular commercial machines. The Cray T3D and T3E adopt a bidirectional three-dimensional toroidal network [ST96], and the topology of both the Intel Delta and Paragon is a toroidal mesh. A fair comparison of the communication performance of these machines is not an easy task because they have widely different technological characteristics. On the other hand, theoretical models of the interconnection ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In HOT Interconnects IV, Stanford University, August 1996.
....Though there is a huge variety of interconnection networks in the literature, only a few of them proved successful in practical applications. Many new generation parallel computers, such as the Intel Touchstone Delta, Intel Paragon, MIT J-Machine, Stanford Flash and the Cray T3D and T3E [5] [6], belong to the family of k-ary n-cubes [7]. In particular, low-dimensional cubes are attractive because they can be easily mapped in the three-dimensional space [8]. Fat trees have been adopted by many research prototypes and commercial machines. The data network of the Connection Machine CM-5 ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In HOT Interconnects IV, Stanford University, August 1996.
....together with interconnection networks to exhibit huge computing power. As the enabling technology of the multicomputer, interconnection networks affect its overall system performance much. Many top parallel machines adopt the multicomputer architecture, such as Intel Paragon [15] and Cray T3E [22]. Currently, multicomputer is going out of the scientific area and becoming more popular. For example, the well known high speed network Myrinet [3] is based directly on technologies from the first multicomputer Cosmic Cube. Recently, much interest has been shown in supporting real time and ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus, Proceedings of Hot Interconnects IV, August, 1996.
....channels, packets are routed along any minimal path between source and destination. The remaining two channels are escape channels where packets are routed deterministically when the adaptive choice is limited by network contention. A similar algorithm has been recently adopted by the Cray T3E [ST96]. 3 Total exchange algorithms The algorithms that can be used to implement the total exchange on a given network can be roughly classified into two classes: direct algorithms, in which data are sent directly from source to destination and indirect algorithms, in which data are sent from source ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In HOT Interconnects IV, Stanford University, August 1996.
....the paths will not block forever. In [50] it has been shown that this approach is also a necessary condition, and in [51] it has been extended to include store-and-forward and cut-through switching. Duato's methodology has been recently utilized in the design of the routing chip of the Cray T3E [141] [139]. Chapter 4 Abstract Machine Models A programming model, to be effective, should reflect the cost of executing its basic mechanisms at a level visible to the programmer or to the compiler. Intelligent optimizations during the software development rely on the ability to decide than an ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In HOT Interconnects IV, Stanford University, August 1996.
....technology, the increase in the dimensionality of a network will decrease the number of wires and thus the bandwidth of a single physical channel. Routers designed for low-dimensional k-ary n-cube networks have physical channels that typically are 8-bit data to 16-bit data wide [2] [3] [8] [9] [10]. In this study, I am proposing a new network topology, called k-ary m-way network, which is based on the concept of m-way channels. The idea of an m-way channel is that a maximum number, m, of routers and processors can link directly to it, and hence share the same physical channel. k-ary m-way ....
Scott S. L. and Thorson G., The Cray T3E network: adaptive routing in a high performance 3D torus, Proceedings of Hot Interconnects Symposium IV, August 1996.
....channels. k-ary n-cubes are strictly orthogonal direct topologies with n dimensions and k routers (nodes) along each dimension. Low-dimensional k-ary n-cube networks have been implemented in many parallel architectures, which include the Intel Teraflops [3], MIT J-Machine [10], and Cray T3E [11]. A physical bidirectional link connecting two routers in a k-ary n-cube network can be implemented either as one set of bidirectional wires, called half-duplex organization, or as two sets of unidirectional wires, called full-duplex organization. With a full-duplex organization, a router ....
S. L. Scott and G. Thorson, The Cray T3E network: adaptive routing in a high performance 3D torus, Proceedings of Hot Interconnects Symposium IV, August 1996.
....to all nodes in the system), we will consider multicast for the remainder of this paper. However, it must be noted that all the developed algorithms and theories in this paper apply to broadcast as well. Current generation parallel systems like IBM SP2 [41], Intel Paragon [16], Cray T3E [35], nCube 3 [12], J-Machine [28], and Stanford FLASH use the cut-through switching technique due to its inherent advantages like low-latency communication and reduced communication hardware overhead [27]. These systems provide very small buffer space at each hop, which results in links getting held ....
....its inherent advantages like low-latency communication and reduced communication hardware overhead [27]. These systems provide very small buffer space at each hop, which results in links getting held up by blocked worms. Also, these systems use regular network topologies (such as meshes [16], tori [35], hypercubes [3, 8], multistage interconnection networks [41], etc.) with various deadlock-free routing schemes. Such regular topologies have important mathematical properties that make message communication easier by making message routing simpler, lowering the average distance per communication, ....
S. L. Scott and G. M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Proceedings of the Symposium on High Performance Interconnects (Hot Interconnects 4), pages 147--156, August 1996.
....different benchmark versions. 5.1 Architecture of Cray T3E The T3E 900 512 used for our measurements consists of up to 2048 DEC Alpha 21164 processors running at 450 MHz. They are connected with a 3-dimensional torus network. The net is decoupled from the processors at a speed of 75 MHz [ST96] with overlapped communication. Each link has a bandwidth of approximately 500 MB/s, resulting in a 3 GB/s transfer rate for a single node. The network interface consists of 512 user and 128 system E-registers, memory mapped into the I/O space of each processor. E-registers provide the only means ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E network: Adaptive routing in a high performance 3D torus. HOT Interconnects IV, August 15--16 1996.
....channels, packets are routed along any minimal path between source and destination. The remaining two channels are escape channels where packets are routed deterministically when the adaptive choice is limited by network contention [9]. A similar algorithm has been recently adopted by the Cray T3E [22]. A central point of our adaptive algorithm is the interface between the processor and the router. We assume that packets can enter the network using only a subset of the adaptive channels [19]. This limitation, known as source throttling, makes the network throughput stable when the ....
Scott, S. L. and Thorson, G. M.: The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In HOT Interconnects IV, Stanford University, August 1996.
....and its classification according to the above mentioned classes. 3 The Cray T3E 3.1 Architectural Overview The T3E consists of up to 2048 DEC Alpha EV5 21164 processors running at 300 MHz. They are connected with a 3D torus network. The net is decoupled from the processors at a speed of 75 MHz [15] with overlapped communication. Each link has a bandwidth of approximately 500 MB/s, resulting in a 3 GB/s transfer rate for a single node. The network interface consists of 512 user and 128 system E-registers, memory mapped into the address space of each processor. E-registers provide the only ....
....Δt_n = 13.3 ns, C_p = 128 entries, C_n = 56 entries. We derived the first two from measurements. The given values are lower bounds because we measured only the time spent for issuing a prefetch instruction and reading an E-register without any address calculation. Δt_n and C_n were taken from literature [15]. 128 E-registers provide maximum network bandwidth [14] and therefore suffice for our tests. C_n is the network capacity for one network link. Each additional link adds 34 entries. With these parameters, T_Latency = 1489.6 ns between two adjacent nodes within the network, which differs only 0.8 ....
Scott, S. L., and Thorson, G. M. The Cray T3E network: Adaptive routing in a high performance 3D torus. HOT Interconnects IV (August 15-16 1996).
....and covers its classification in the above mentioned classes. 3 The Cray T3E 3.1 Architectural Overview The T3E consists of up to 2048 DEC Alpha EV5 21164 processors running at 300 MHz. They are connected with a 3D torus network. The net is decoupled from the processors at a speed of 75 MHz [8] with overlapped communication. Each link has a bandwidth of approximately 500 MB/s, resulting in a 3 GB/s transfer rate for a single node. The network interface consists of 512 user and 128 system E-registers, memory mapped in the address space of each processor. They are the only way to perform ....
....space with locally consistent memory. 3.2 Characteristic Parameters The model parameters of the T3E from table 1 are given below. Δt_p = 160 ns, Δt_c = 107 ns, Δt_n = 13.3 ns, C_p = 480 entries, C_n = 56 entries. We derived the first two from measurements. The last three were taken from literature [8]. With these parameters, we got T_Latency = 1489.6 ns, which differs only 0.5 from measurement. For the classification of the T3E, there is Δt_n Δt_p for all applications. Δt_c Δt_p in contrast to the model adds only some waiting times but does not affect the classification. Consequently, the ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E network: Adaptive routing in a high performance 3D torus. HOT Interconnects IV, August 15-16 1996.
....channels, packets are routed along any minimal path between source and destination. The remaining two channels are escape channels where packets are routed deterministically when the adaptive choice is limited by network contention. A similar algorithm has been recently adopted by the Cray T3E [7]. A central point of this adaptive algorithm is the interface between the processor and the router. We assume that packets can enter the network using only a subset of the adaptive channels [5]. This limitation, known as source throttling, makes the network throughput stable when the network ....
S. L. Scott and G. M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In HOT Interconnects IV, Stanford University, August 1996.
....networks, each allocating a 16 PN system, using a 6 port router. Black squares indicate PNs; white squares denote unused ports. cated interface for PNs at one or more ports, is less flexible. For example, while the SGI Origin2000's SPIDER router is symmetric [7], the Cray T3E's router is not [14]. An additional advantage of bristled networks is that they have a lower average node distance, which may reduce the average latency. In the example above, the 1-way 16-node hypercube has an average distance of 2 hops, whereas the 2-way 8-node and the 4-way 4-node bristled networks have 1.5 and 1 ....
....VCs and adaptive routing have been extensively studied in the past [2, 6, 8, 13]. Most of the research has been based on simulations using synthetic workloads, or at most traces of real applications. In addition, adaptive routing has seen a few actual implementations, like in the Cray T3E network [14]. Many of the evaluations using synthetic workloads concluded that both VCs and adaptivity were beneficial. It was frequently possible to push the network to the limit by increasing the message injection rate and by using very long messages. On the other hand, CC-NUMA traffic is mainly characterised ....
S. L. Scott and G. M. Thorson. "The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus". Proc. Symp. High-Performance Interconnects (Hot Interconnects 4), August 1996.
....Supercomputing Center (SDSC). Each Cray T3E processing element at SDSC has a clock rate of 300 MHz, an 8 Kbytes internal cache, 96 Kbytes second level cache, and 128 Mbytes memory. The peak bandwidth between nodes is reported as 500 Mbytes/s and the peak round trip communication latency is about 0. 5 2s [28]. We have observed that when the block size is 25, double precision GEMM achieves 388 MFLOPS while double precision GEMV reaches 255 MFLOPS. We have used a block size 25 in our experiments. We also obtained access to a Cray T3E at the NERSC division of the Lawrence Berkeley Lab. Each node in this ....
S. L. Scott and G. M. Thorson, The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus, in Proceedings of HOT Interconnects IV, Stanford University, Aug. 1996.
....that is currently flowing out of the switch. When this message has drained out completely, the simple Round Robin policy is resumed. We will refer to this scheme as Round Robin Keep Flow (RR-KF). A slightly modified variation of this scheme is employed in the routers for the Cray T3E network [15]. First Come First Served. In FCFS the oldest worm has priority. A possible implementation is the following: an age counter is associated with each VC. When a new worm enters a VC, the age counter is set to 0. Afterwards, the age is incremented at a programmable rate. The arbiter will prefer ....
S. Scott and G. Thorson. The Cray T3E network: Adaptive routing in a high performance 3D torus. In Proc. Symp. High Performance Interconnects (Hot Interconnects 4), pages 147--156, Aug. 1996.
....for the head of a packet to move from the input of one network switch to the input of the next, in the absence of contention. The default link bandwidth is four bytes per processor cycle. These parameters are consistent with the highest performance multiprocessor interconnects currently available [12, 30, 86]. The two system busses connect the memory and the second level (L2) cache to each other and to the two network interfaces. The busses are able to transfer one word (eight bytes) per processor cycle. Bus arbitration and contention are modeled on a cycle by cycle basis. 1 Preliminary simulations ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E network: Adaptive routing in a high performance 3D torus. In Hot Interconnects IV, pages 147--156, August 1996.
....with interconnection networks to exhibit huge computing power. As the enabling technology of the multicomputer, interconnection networks can largely affect its overall system performance. Many top parallel machines adopt the multicomputer architecture, such as Intel Paragon [12] and Cray T3E [18]. Currently, multicomputer has found its use far beyond the scientific computing area. For example, the wellknown high speed network Myrinet [1] is based directly on technologies from the first multicomputer Cosmic Cube. Recently, much interest has been shown in supporting realtime and multimedia ....
Steven L. Scott and Gregory M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus, Proceedings of Hot Interconnects IV, August, 1996.
....are many network interfaces that have been designed for scalable parallel computers. Some examples used in commercial systems are the network interfaces in the Thinking Machines CM-1 and CM-5 [2, 14], the Paragon and Teraflops multicomputers by Intel [17, 18], and the Cray Research T3D and T3E [19, 20, 21]. In all of these systems, the network interface is implemented using a separate chip, and the computing node is based on a standard processor architecture such as the SPARC, the DEC Alpha or the Intel Pentium. Some examples from academic research projects are the MIT Message Driven Processor ....
....is located at the memory bus. Connecting through the memory bus provides greater flexibility than connecting to the cache bus, and greater performance than connecting through an I/O bus. Network interfaces in this category include the Thinking Machines CM-5 [14, 34], the Cray Research T3D and T3E [19, 20, 21], the Intel Paragon [17], and University of Washington Meerkat 1 [35]. I/O bus connected network interfaces. In parallel systems that use a local area network as its communication backbone, the preferred location of the network interface is at the I/O bus. I/O bus cards are relatively simple and ....
Steve Scott and Greg Thorson. The Cray T3E network: adaptive routing in a high performance 3-d torus. Proc. of Hot Interconnects IV, Stanford University, Palo Alto CA, August 1996, pp. 147-156.
S. L. Scott and G. M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Proc. Symp. High Performance Interconnects (Hot Interconnects 4), pages 147--156, August 1996.
S. L. Scott, G. M. Thorson, The Cray T3E network: adaptive routing in a high performance 3d torus, in: Proceedings of Hot Interconnects IV, 1996.
S. L. Scott and G. M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Proc. of Hot Interconnects, pages 147--156, August 1996.
S. L. Scott and G. M. Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus. In Proceedings of HOT Interconnects IV, August 1996.