| A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, 1998. |
....becomes congested. Components are allowed to fail arbitrarily, and may even be repaired online so long as at least one routing path always exists between each pair of nodes. Simpler control logic allows the network to be clocked at a higher speed than would otherwise be possible ([DeHon94], [Chien98]). The cost, of course, is a more complicated messaging protocol, which requires additional logic and storage at each node and reduces the performance of the system. Thus, with few nodes (hundreds or thousands) it is likely a good tradeoff to place extra design effort into the network and reap ....
....two networks are the same. In practice this would likely not be the case for two reasons. First, the control logic of the discarding network is much simpler than that of the non-discarding network; as a result, it will be possible to clock the discarding network nodes at a higher speed ([DeHon94], [Chien98]). Second, with a fault-tolerant messaging protocol it is possible to boost the clock speed even further, since one does not need to worry about introducing the occasional signaling error so long as it can be detected. ....
Andrew A. Chien, "A Cost and Speed Model for k-ary n-Cube Wormhole Routers", IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 2, 1998, pp. 150-162.
....the network becomes congested. Components are allowed to fail arbitrarily, and may even be repaired online so long as at least one routing path always exists between each pair of nodes. Simpler control logic allows the network to be clocked at a much higher speed than would otherwise be possible [3]. The cost, of course, is a more complicated messaging protocol which requires additional logic and storage at each node, and reduces the performance of the system. Thus, with few nodes (hundreds or thousands) it is likely a good tradeoff to place extra design effort into the network and reap ....
....flit size and cycle times of the two networks are the same. In practice this would likely not be the case for three reasons. First, the control logic of the lossy network is much simpler than that of the lossless network; as a result, it will be possible to clock the lossy network at a higher speed [3]. Second, with a fault-tolerant messaging protocol and enough checksum bits it is possible to boost the clock speed even further, since one does not need to worry about introducing the occasional signaling error so long as it can be detected. Finally, in the lossless network a number of bits would ....
Andrew A. Chien, "A Cost and Speed Model for k-ary n-Cube Wormhole Routers", IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 2, 1998, pp. 150-162.
....able to support a large number of concurrent connections, as needed in a multimedia environment, the buffers associated with each link must be organized into a large number of virtual channels. Virtual channels have traditionally been organized as a set of queues linked by a multiplexer. As indicated in [33], router delays can increase substantially when a large number of virtual channels are multiplexed onto physical links. This is due in part to the multiplexer and virtual channel controller delays. Moreover, fully demultiplexed crossbars [34] (i.e., one virtual channel per crossbar port) become ....
A. A. Chien, "A cost and speed model for k-ary n-cube wormhole routers," in Hot Interconnects'93, August 1993.
....channels are the Cray T3E router [20], the SGI SPIDER [12], and the Intel Cavallino [2]. These router designs also use adaptive routing techniques that alleviate the link contention problems. Nevertheless, the use of both virtual channels and adaptive routing algorithms increases hardware costs [5]. Some works on this topic [22] show that simpler router designs lead to lower execution times for all the applications tested. Thus, our work focuses on the study of a low-cost router design that avoids the need for virtual channels. The main contribution of this paper is the presentation of a ....
....although it does not carry any information. Ghost packets do not have any destination node. Rather, they remain in the network forever and are only visible from outside the dimensional ring. 3. Data packets are routed using a Dimensional Order Routing (DOR) scheme [8]. DOR has low hardware cost [5], and its minimal number of turns reduces the possible collision points for each data packet. 4. The header phit of the data packet carries the routing information to relay packets from their source to their destination node. Each time a data packet changes dimension, a header phit is stripped off. ....
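The dimension-order routing step above can be illustrated with a minimal Python sketch. It assumes a k-ary n-cube with wraparound links, corrects dimension 0 first, then dimension 1, and so on, and takes the shorter way around each ring (ties broken toward the positive direction, an arbitrary choice of ours). `dor_route` is a hypothetical helper; it does not model the ghost packets or header phits of the cited design.

```python
def dor_route(src, dst, k):
    """Return the nodes visited from src to dst under dimension-order routing.

    Nodes are tuples of coordinates in [0, k). Each dimension is fully
    corrected before the next one starts, so the path turns at most n - 1 times.
    """
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same number of dimensions")
    cur = list(src)
    path = [tuple(cur)]
    for dim in range(len(cur)):
        forward = (dst[dim] - cur[dim]) % k  # hops needed in the +1 direction
        step = 1 if forward <= k - forward else -1  # shorter way around the ring
        while cur[dim] != dst[dim]:
            cur[dim] = (cur[dim] + step) % k
            path.append(tuple(cur))
    return path


print(dor_route((0, 0), (3, 1), k=4))  # [(0, 0), (3, 0), (3, 1)]
```

Because a packet leaves a dimension only after finishing it, there is exactly one dimension change per dimension boundary, which is the "minimal number of turns" property the text relies on.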
A. Chien, A cost and speed model for k-ary n-cube wormhole routers, Proceedings of Hot Interconnections'93, August 1993.
....whereas the IBM SP2 with MPI has a startup time of 35 µs [21], where the startup time includes the software overheads for allocating buffers, copying messages, and initializing the router and DMA. Chien analyzed the router delay for various routing algorithms using a 0.8 micron gate array technology [22]. Based on that study and contemporary VLSI technology, the following default performance parameters are assumed: communication startup time ($t_s$) of 2 µs, link propagation delay ($t_p$) of 20 ns, and switch (router) delay ($t_r$) of 300 ns. In addition, the network interface delay is assumed to be ....
A. A. Chien, "A Cost and Speed Model for k-ary n-Cube Wormhole Routers," IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 2, pp. 150-162, Feb. 1998.
....also increase the complexity of the routing algorithm and require additional router control logic. The multiplexing and scheduling of the virtual channels on the physical channel is also more complicated. In addition, router latency and cycle time increase with the number of virtual channels [1], so fewer virtual channels are preferable. Decreasing the number of virtual channels needed for a given adaptiveness is accomplished by using a less restrictive routing algorithm. Conversely, a less restrictive routing algorithm has better performance relative to other routing algorithms when the ....
A. A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. Technical report, University of Illinois at Urbana-Champaign, 1993.
....are distributed uniformly or according to a static communication pattern, as explained in more detail in the following section. The simulator collects performance data only after 2000 cycles, to allow the network to reach steady state, and each simulation is halted after 20000 cycles. Chien [1] has proposed a cost model to make fair comparisons between routing algorithms. It assumes a 0.8 micron CMOS gate array technology for the implementation of the routing chip. Using this model, as done in [4] for a 256-node hypercube, we can see that fat trees are wire limited when we use up to ....
A. A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Hot Interconnects '93, Palo Alto, California, August 1993.
....network latency of local packets, by limiting the degree of multiplexing of the virtual channels. Virtual channels can be expensive because they complicate routing decisions and channel control, increasing router complexity and delay significantly, with a consequent cost-performance tradeoff [3]. To reduce the number of virtual channels, non-monotonic allocation strategies have been proposed. Duato has summarized these strategies in a necessary and sufficient condition for deadlock-free adaptive routing [8]. Virtual channels are divided into two classes: some are specifically dedicated ....
....Also, in our algorithm we use a single injection channel (inj = 1) and two source throttling channels (thr = 2). Adaptive algorithms have more degrees of freedom but require larger crossbars and more complex arbitration, so these advantages are often offset by longer clock cycles. Chien [3] has proposed a cost model to make fair comparisons between routing algorithms. It assumes a 0.8 micron CMOS gate array technology for the implementation of the routing chip. The three delays $T_{routing}$, $T_{crossbar}$ and $T_{link}$ are computed as follows. Routing a message involves address decoding, ....
A. A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Hot Interconnects '93, Palo Alto, California, August 1993.
....arrives at a router [7]. Data follow immediately after the header. This strategy splits messages into small units of information, or flits [6], performing flow control at the flit level. As a consequence of this unique feature, flit buffers can be very small, leading to compact and fast routers [4]. These benefits led to the implementation of wormhole switching in most commercial routers [2, 13, 19]. Moreover, this pipelined message transmission makes latency less sensitive to the distance in the network, provided that messages are long enough, facilitating the search for optimal ....
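Why pipelining makes latency less sensitive to distance can be seen from a first-order, contention-free comparison. The following Python sketch is our own simplification (no link delay, no blocking, times in cycles, one flit per cycle per channel); the function names are illustrative and not taken from the cited work.

```python
def store_and_forward_latency(hops, router_delay, msg_flits, flits_per_cycle=1.0):
    """Every hop buffers the whole message before forwarding it."""
    serialization = msg_flits / flits_per_cycle
    return hops * (router_delay + serialization)


def wormhole_latency(hops, router_delay, msg_flits, flits_per_cycle=1.0):
    """Only the header pays the per-hop routing delay; the body flits stream behind it."""
    serialization = msg_flits / flits_per_cycle
    return hops * router_delay + serialization


for hops in (4, 8):
    print(hops, store_and_forward_latency(hops, 1, 32), wormhole_latency(hops, 1, 32))
# 4 132.0 36.0
# 8 264.0 40.0
```

With a 32-flit message, doubling the distance doubles the store-and-forward latency but adds only four cycles under wormhole switching, because the serialization term is paid once rather than once per hop.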
....ical link, with the latter implementing fully adaptive routing as well. Both features improve throughput significantly [9, 10] and may reduce the execution time of bandwidth-limited parallel applications. However, virtual channels and adaptive routing have been shown to increase router delay [4], thus increasing the execution time of latency-sensitive parallel applications. For those applications, it has been suggested that routers should implement neither virtual channels nor adaptive routing [23]. Therefore, including virtual channels and adaptive routing in a wormhole router is a ....
A.A. Chien, "A cost and speed model for k-ary n-cube wormhole routers," Proceedings of Hot Interconnects'93, Palo Alto, California, August 1993.
....channel and by multiplexing physical channel bandwidth. The use of virtual channels can increase throughput considerably by dynamically sharing the physical bandwidth among several messages [11]. However, it has been shown that virtual channels are expensive, increasing node delay considerably [9]. So, the number of virtual channels per physical channel should be kept small. An alternative approach consists of using adaptive routing [14]. However, deadlocks may appear if the routing algorithms are not carefully designed. A deadlock occurs in an interconnection network when no message is ....
....when the network reaches the saturation point. One solution to this problem could be to increase the number of virtual channels [12], so that messages have a lower probability of being involved in cyclic dependencies. However, a high number of virtual channels could lead to a lower clock frequency [9]. [Fig. 1. Implementation of the message injection limitation mechanism (labels: reserve, release, busy output channels counter, threshold, comparator, injection permitted).] Another solution is to control network traffic, in order to guarantee that it is always below the performance degradation point. ....
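The blocks named in Fig. 1 (reserve/release inputs, a busy output channels counter, a threshold and a comparator producing an "injection permitted" signal) suggest a simple counter-and-compare structure. The Python sketch below models that structure under our own assumptions: the counter tracks busy output channels at the local node, and injection is allowed only while the count is strictly below the threshold. The class and method names are hypothetical, and the direction of the comparison is assumed, not taken from the cited paper.

```python
class InjectionLimiter:
    """Hypothetical model of a busy-output-channel counter that gates message injection."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # assumed: injection stops once this many channels are busy
        self.busy = 0               # number of output channels currently reserved

    def reserve(self) -> None:
        # A message header has acquired an output channel.
        self.busy += 1

    def release(self) -> None:
        # A message tail has freed an output channel.
        if self.busy == 0:
            raise RuntimeError("release() without a matching reserve()")
        self.busy -= 1

    def injection_permitted(self) -> bool:
        # Comparator: permit a new local message only while few channels are busy.
        return self.busy < self.threshold


limiter = InjectionLimiter(threshold=2)
limiter.reserve()
print(limiter.injection_permitted())  # True: 1 busy channel < 2
limiter.reserve()
print(limiter.injection_permitted())  # False: 2 busy channels
```

Keeping the decision local (one counter and one comparison per node) is what makes this kind of throttling cheap compared with adding virtual channels.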
A. A. Chien, "A cost and speed model for k-ary n-cube wormhole routers," in Proceedings of Hot Interconnects'93, August 1993.
....to design partially adaptive non-minimal routing algorithms for the class of k-ary n-cubes. Virtual channels can be expensive because they complicate routing decisions and channel control, increasing router complexity and delay significantly, with a consequent cost-performance trade-off [5]. To reduce the number of virtual channels, non-monotonic allocation strategies have been proposed [12], [2]. Duato has summarized these strategies in a necessary and sufficient condition for deadlock-free adaptive routing [13]. Virtual channels are divided into two classes: some are specifically ....
Andrew A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Hot Interconnects '93, Palo Alto, California, August 1993.
....the link delay, and a raw bandwidth of 1.6 Gbit/s in a single direction. In all four networks we will consider flow control strategies with two and four virtual channels. Though the introduction of virtual channels caused an increase in the clock cycle in early router designs [24], pipelined look-ahead routers can tolerate a limited number of virtual channels without increasing the clock cycle [25]. These characteristics (delays and number of virtual channels) represent the state of the art and can be found in existing routers such as the SGI SPIDER [2]. 4.2 Network interface ....
Andrew A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Hot Interconnects '93, Palo Alto, California, August 1993.
....router cycle times for each algorithm. As each routing algorithm presented in this paper differs in complexity, router delay will also differ for each algorithm. In this section, we compute delay times for the various router subcomponents by applying the delay model developed in [3], which assumes 0.8 µm CMOS gate array technology. These times are then used in Section 4.4 to scale the results accordingly. All node components are assumed to be synchronized by a clock signal. According to the router model presented in the previous section, two clock cycles are required to ....
....router model presented in the previous section, two clock cycles are required to forward a flit from one router to another router: one to traverse the router internals and one to traverse the physical channel between two routers. Results would be different for asynchronous routers. As described in [3], the router model is subdivided into dimensions. Also, the head flit of a message traverses a path through the router that is somewhat different from the path traversed by trailing flits. However, all flits of a message traverse the same external physical path. In general, the delay in ....
Andrew A. Chien. A cost and speed model for k-ary n-cube wormhole routers. In Proceedings of Hot Interconnects '93, August 1993.
....in the communication processor is 40 cycles, for both incoming and outgoing packets. The flit size is 16 bits, and the link delay to transmit a flit across a physical link is 4 cycles. Adaptive algorithms have more degrees of freedom but require larger crossbars and more complex arbitration [2]. For this reason, the routing delay is normalized to the router complexity. The routing delay is 4 cycles for the deterministic algorithm, 8 cycles for the Duato algorithm and 12 cycles for Chaos routing. 4 Experimental results. In our experiments we mapped a 65536-input butterfly on ....
Andrew A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Hot Interconnects '93, Palo Alto, California, August 1993.
....are distributed uniformly or according to a static communication pattern, as explained in more detail in the following section. The simulator collects performance data only after 2000 cycles, to allow the network to reach steady state, and each simulation is halted after 20000 cycles. Chien in Ref. [4] has proposed a cost model to make fair comparisons between routing algorithms. This model has been adopted in several performance studies. It assumes a 0.8 micron CMOS gate array technology for the implementation of the routing chip. The three delays $T_{routing}$, $T_{crossbar}$ and $T_{link}$ ....
....performance studies. It assumes a 0.8 micron CMOS gate array technology for the implementation of the routing chip. The three delays $T_{routing}$, $T_{crossbar}$ and $T_{link}$ are computed as follows. Routing a message involves address decoding, routing decision and header selection. According to Ref. [4], the routing decision has a delay that grows logarithmically with the number of alternatives, or degree of freedom, offered by the routing algorithm. Denoting by $F$ the degree of freedom, the model estimates the routing delay as $T_{routing} = 4.7 + 1.2 \log F$ ns (7). The time required to transfer a ....
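The routing-delay term of the model, $T_{routing} = 4.7 + 1.2 \log F$ ns, can be evaluated directly to see how routing freedom translates into delay. This Python sketch takes the logarithm base 2, which is our assumption; the function name is illustrative.

```python
import math


def chien_routing_delay_ns(freedom: int) -> float:
    """Routing-stage delay T_routing = 4.7 + 1.2 * log(F) ns.

    F is the degree of freedom (number of alternative output channels) offered
    by the routing algorithm. The base-2 logarithm is an assumption of this sketch.
    """
    if freedom < 1:
        raise ValueError("degree of freedom F must be at least 1")
    return 4.7 + 1.2 * math.log2(freedom)


for f in (1, 2, 4):
    print(f"F = {f}: T_routing = {chien_routing_delay_ns(f):.1f} ns")
```

Under this reading, a deterministic router (F = 1) pays 4.7 ns, and each doubling of routing freedom adds 1.2 ns (about 5.9 ns for F = 2 and 7.1 ns for F = 4). This is why adaptive algorithms have to recover the longer cycle through better throughput.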
A. A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers," in Hot Interconnects '93, Palo Alto, California, August 1993.
....4-tree and a 16-ary 2-cube satisfy these conditions, so we will consider these two networks in the experimental evaluation. A fair comparison of interconnection networks should also take into account physical constraints such as the pin count, wire delay, bisection width [18] and the router complexity [30]. In our experiments we normalize the communication performance by setting the flit and the data path size at two bytes on the fat tree and at four bytes on the cube. If we consider a 4-ary 4-tree and a 16-ary 2-cube, this normalization can be interpreted in the following ways. Technological ....
....processing nodes to the network switches. Other important parameters are the router complexity and the wire delays. Adaptive algorithms have more degrees of freedom but require larger crossbars and more complex arbitration, so these advantages are often offset by longer clock cycles. Chien [30] has proposed a cost model to make fair comparisons between routing algorithms. It can be applied to evaluate the router .... (Footnote 1: The network capacity can be determined by considering that 50% of the uniform random traffic crosses the bisection of the network. Thus if a cube has bisection bandwidth B, ....)
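The footnote's bisection argument can be completed in one standard way; this is our own sketch, not necessarily the cited paper's wording. Assume $N$ nodes each inject $\lambda$ bytes per cycle of uniform random traffic, half of which crosses the bisection, and let $B$ be the total bisection bandwidth counting both directions. Then $N\lambda/2 \le B$, so $\lambda \le 2B/N$. The Python helpers below are hypothetical. They assume a torus with even radix $k$ and a channel width $W$ bytes per cycle in each direction.

```python
def torus_bisection_bw(k: int, n: int, channel_width: int) -> int:
    """Total bisection bandwidth (both directions) of a k-ary n-cube torus, k even."""
    if k % 2:
        raise ValueError("this simple cut assumes an even radix k")
    # Halving one dimension cuts each of the k**(n-1) rings in two places,
    # and every cut channel carries channel_width in each of its two directions.
    return 2 * 2 * k ** (n - 1) * channel_width


def uniform_capacity_per_node(bisection_bw: float, num_nodes: int) -> float:
    # Half of uniform random traffic crosses the bisection: N * lam / 2 <= B.
    return 2 * bisection_bw / num_nodes


B = torus_bisection_bw(16, 2, 4)                 # 16-ary 2-cube, 4-byte channels
print(B, uniform_capacity_per_node(B, 16 ** 2))  # 256 2.0
```

For the 16-ary 2-cube with four-byte channels, this gives 2 bytes per node per cycle. That equals $8W/k$, the commonly quoted uniform-traffic bound for a torus, which is a useful consistency check on the assumptions.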
A. A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers," in Hot Interconnects '93, (Palo Alto, California), August 1993.
....switch (router) delay ($t_r$) of 300 to 500 ns. The startup time includes the software overheads for allocating buffers, copying messages, and initializing the router and DMA [15]. The router delay includes several steps of complicated operations and varies across routing algorithms, as Chien [16] analyzed. We also assume that the network interface delay is almost the same as the switch delay for our evaluation. Since synchronization messages do not need any data flits, the communication latency of a message transfer can be approximated by $t_s + d \times t_p + (d + 1) \times t_r$, where $d$ is ....
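A small calculator makes the latency approximation concrete. This Python sketch assumes the approximation has the form $t_s + d\,t_p + (d+1)\,t_r$ for a header-only message crossing $d$ links. We read the $d + 1$ router terms as counting the network interfaces like switches. The defaults (2 µs, 20 ns, 300 ns) are the default parameters quoted in one of the contexts above and are illustrative only; the function name is hypothetical.

```python
def sync_message_latency_ns(d: int, t_s: float = 2000.0, t_p: float = 20.0,
                            t_r: float = 300.0) -> float:
    """Approximate latency (ns) of a header-only message traversing d links.

    Assumed form: t_s + d * t_p + (d + 1) * t_r, i.e. each of the d links adds
    a propagation delay and each of the d + 1 switching points adds a router delay.
    """
    if d < 1:
        raise ValueError("a message must traverse at least one link")
    return t_s + d * t_p + (d + 1) * t_r


for d in (1, 4, 8):
    print(d, sync_message_latency_ns(d))
# 1 2620.0
# 4 3580.0
# 8 4860.0
```

With these parameters the software startup time dominates at short distances, and each extra hop adds only $t_p + t_r = 320$ ns.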
A. A. Chien, "A Cost and Speed Model for k-ary n-Cube Wormhole Routers," IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 2, pp. 150162, Feb. 1998.
....number of flits in a synchronization message. (Footnote 6: With the store-and-forward or the virtual cut-through strategy, the rest of the message is moved to the switch where the head is stopped. Thus, deadlock can easily be avoided at the cost of a large buffer in each switch for holding the entire message [45].) (Footnote 14: The basic technique for proving that a network is deadlock-free is to articulate the dependences that can arise between channels as a result of message movement, and to demonstrate that there exists no cycle in the resulting channel dependence graph [33]. This implies that no traffic ....)
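The proof technique in footnote 14 amounts to a cycle check on the channel dependence graph. The Python sketch below runs a depth-first search for a back edge on two small illustrative graphs, both our own: a unidirectional 4-node ring where channel ci links node i to node i+1 (mod 4), and the same ring with two virtual channels per link. In the second graph, a message travels on VC 0 until it has crossed the wraparound channel c3 and on VC 1 afterwards. This "dateline" assignment is a textbook way to break the ring cycle, chosen for illustration, not the scheme of the cited paper.

```python
def has_cycle(deps):
    """Return True if the channel dependence graph contains a cycle.

    deps maps each channel to the channels a message holding it may wait for next.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}

    def visit(ch):
        color[ch] = GRAY
        for nxt in deps.get(ch, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: a cyclic wait is possible
                return True
            if state == WHITE and visit(nxt):
                return True
        color[ch] = BLACK
        return False

    channels = set(deps) | {c for succ in deps.values() for c in succ}
    return any(color.get(ch, WHITE) == WHITE and visit(ch) for ch in sorted(channels))


# Unidirectional 4-node ring, one channel per link: c0 -> c1 -> c2 -> c3 -> c0.
ring = {f"c{i}": [f"c{(i + 1) % 4}"] for i in range(4)}

# Same ring, minimal routes, switching from VC 0 to VC 1 after crossing c3.
dateline = {
    "c0_0": ["c1_0"], "c1_0": ["c2_0"], "c2_0": ["c3_0"], "c3_0": ["c0_1"],
    "c0_1": ["c1_1"], "c1_1": [],
}

print(has_cycle(ring), has_cycle(dateline))  # True False
```

An acyclic graph is sufficient for deadlock freedom with deterministic routing. For adaptive routing, Duato's condition (mentioned in the contexts above) is less restrictive and allows cycles among adaptive channels as long as an acyclic escape subnetwork exists.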
....switch (router) delay ($t_r$) of 300 to 500 ns. The startup time includes the software overheads for allocating buffers, copying messages, and initializing the router and DMA [46]. The router delay includes several steps of complicated operations and varies across routing algorithms, as Chien [45] analyzed. We also assume that the network interface delay is almost the same as the switch delay for our evaluation. Since synchronization messages do not need any data flits, the communication latency of a message transfer can be approximated by $t_s + d \times t_p + (d + 1) \times t_r$, where $d$ is ....
A. A. Chien, "A Cost and Speed Model for k-ary n-Cube Wormhole Routers," IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 2, pp. 150-162, Feb. 1998.
....k-ary n-cubes, k-ary n-flies and k-ary n-trees, and a node architecture with processing capabilities and a memory hierarchy. A fair analysis of interconnection networks should take into account physical constraints such as the pin count, wire delay, bisection width [39] and the router complexity [40]. In our experiments we normalize the communication performance by setting the flit and the data path size at two bytes on the fat tree and at four bytes on the toroidal cube. If we consider two representative networks with 256 nodes, a 4-ary 4-tree and a 16-ary 2-cube, this normalization can be ....
....this value is not sensitive to the characteristics of the h-relation. It is worth noting that virtual channels can be expensive because they complicate routing decisions and channel control, increasing router complexity and delay significantly, with a consequent cost-performance trade-off [40]. So, the number of virtual channels per physical channel should be kept small. When we use four virtual channels and $h_{rel} = 2$, the ratio between the execution time and the lower bound is only 1.38. When $h_{rel} = 4$ and $h_{rel} = 8$, the ratio is 1.21 and 1.13, respectively. From these results we can ....
A. A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers," in Hot Interconnects '93, (Palo Alto, California), August 1993.
....done [3, 4]. For example, the Cray T3D with PVM is quoted as having a startup time of 3 µs, whereas an IBM SP2 with MPI has a startup time of 35 µs [13]. The startup time, $t_s$, includes the software overheads for allocating buffers, copying messages, and initializing the router and DMA. Chien [12] analyzed the router delay for various routing algorithms using a 0.8 micron gate array technology. Based on that study and current VLSI technology, the router delay at a nonmember node, $t_{rn}$, is assumed to be 5 to 15 ns. The router delay at a member node, $t_{rm}$, which includes several steps of ....
A. A. Chien, "A Cost and Speed Model for k-ary n-Cube Wormhole Routers," IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 2, pp. 150-162, Feb. 1998.
....we will consider two networks with 256 nodes, a toroidal 16-ary 2-cube and a 4-ary 4-tree. 6.1 Performance Normalization. A fair comparison of interconnection networks should take into account physical constraints such as the pin count, wire delay, bisection width [3] and the router complexity [26]. In our experiments we normalize the communication performance by setting the flit and the data path size at two bytes on the fat tree and at four bytes on the toroidal cube. If we consider a 4-ary 4-tree and a 16-ary 2-cube, this normalization can be interpreted in the following ways. 1. ....
....nodes to the network switches. Other important parameters are the router complexity and the wire delays. Also, adaptive algorithms have more degrees of freedom but require larger crossbars and more complex arbitration, so these advantages are often offset by longer clock cycles. Chien [26] has proposed a cost model to make fair comparisons between routing algorithms. It can be applied to evaluate the router delays of the deterministic algorithm and the two variants of minimal adaptive algorithms for the cubes, and of the adaptive algorithms for the fat trees shown in Section 5.2. ....
Andrew A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Hot Interconnects '93, Palo Alto, California, August 1993.
....ary n-cubes. Two non-adaptive escape VCs per physical channel are necessary to break the cycles formed by wraparound channels. In the case of a three-VC configuration per network port, only one third of the VCs are assigned to adaptive routing. As VCs are expensive, a minimal number of VCs is desirable [5]. Although recent transistor technology allows large buffer space when a router is implemented as an independent chip [4, 21, 23], the cost of VCs for tightly coupled component chips in parallel and scalable systems is still critical [14, 18]. DISHA has been proposed to minimize the number of VCs in ....
....one of them based on the status of the physical channel and the VCs in the adjacent node. The complexity of the logic and the size of the multiplexer increase as the number of connected ports and VCs increases. We match the flit and phit (physical transfer unit) sizes to simplify the flow control [5]. Virtual Channel output Controller (VCC): The VCC arbitrates the acknowledgment signals from the OCA for the multiple VCs. In order to reduce the interconnection area and the size of the OCA, the output data links from the VCs in the same port are multiplexed in the VCC [7]. Input Arbiter (IA): ....
A.A. Chien: "A Cost and Speed Model for k-ary n-cube Wormhole Routers", Proc. Hot Interconnects, (1993).
....one of them based on the status of the physical channel and the adjacent receiving buffers. The complexity of the logic and the size of the multiplexer increase when the number of connected ports and VCs increases. We equate the flit and phit (physical transfer unit) sizes to simplify the flow control [3]. Virtual Channel output Controller (VCC): In order to reduce the interconnection area and the size of the OCA, an output data path from the port is multiplexed as shown in Fig. 1(c) [7]. The VCC arbitrates the acknowledge signals from the OCA for the multiple VCs. 4. A Cost and Speed Comparison ....
....in this paper, such as the double-x and double-xy, belong to this group. One contribution of our research is a concrete cost and performance comparison for the typical deadlock avoidance routers. Chien discussed the same issue by modeling the costs and speeds of the internals of router chips [3]. Duato and López used Chien's model with their own assumption for the channel delay to evaluate routing algorithms [9]. We demonstrated an alternative method of comparison by using practical delays based on HDL designs. Another difference is that we showed the effect of the ....
A.A. Chien: "A Cost and Speed Model for k-ary n-cube Wormhole Routers", Proc. Hot Interconnects, (August 1993).
....wormhole routers, adaptive routing, deadlock recovery, torus network, hardware cost. 1. Introduction. Adaptive routing has been considered for its ability to improve interconnection network performance. Chien proposed a cost and speed model for a fair comparison among various router designs [3]. He argued that a balance between routing flexibility and design complexity was important. Several studies showed the advantage of adaptive routing based on his model [5, 8]. More recently, López discussed an optimal router design using Duato's routing algorithm and Chien's model [10]. ....
A.A. Chien: "A Cost and Speed Model for k-ary n-cube Wormhole Routers", Proc. Hot Interconnects (1993).
....section 2.4 consider the throughput of wormhole-switched k-ary n-cube networks. Throughput and latency are the most important performance measures. Finally, section 2.5 addresses routing, which is a critical design factor for wormhole-switched networks. 2.1 Topology. A k-ary n-cube network [23, 31, 32, 104, 139, 143] has k nodes in each of n dimensions, giving a total of $N = k^n$ nodes. Figure 2.1 illustrates a 3-ary 3-cube with a total of 27 nodes. The topology is direct and regular [44, 162], and the number k is called the radix. As suggested by the expanded view, each node has both a computing unit and a ....
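The node count $N = k^n$ and the regular neighbor structure can be checked with a short Python sketch. It assumes wraparound (torus) links in every dimension and labels nodes by coordinate tuples. Both helper names are hypothetical.

```python
from itertools import product


def kary_ncube_nodes(k: int, n: int) -> list:
    """All nodes of a k-ary n-cube, labelled by n coordinates in [0, k)."""
    return list(product(range(k), repeat=n))


def torus_neighbors(node: tuple, k: int) -> set:
    """Nodes one hop away, using wraparound links in every dimension."""
    nbrs = set()
    for dim in range(len(node)):
        for step in (-1, 1):
            nb = list(node)
            nb[dim] = (nb[dim] + step) % k
            nbrs.add(tuple(nb))
    return nbrs


nodes = kary_ncube_nodes(3, 3)
print(len(nodes))                               # 27, as in the 3-ary 3-cube example
print(len(torus_neighbors((0, 0, 0), k=3)))     # 6 = 2n neighbors when k > 2
```

Using a set means that for k = 2 the +1 and -1 neighbors in a dimension coincide and are counted once, so a 2-ary n-cube degenerates to a hypercube with degree n.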
....on VLSI technology, and the application domain is large-scale interconnection networks [144, 145]. Compactness and high performance are largely due to the fact that the switching devices are relieved from buffer management. The price to pay is the implementation cost of flit-level flow control [23]. The WH switching principle originates from research on high-performance interconnection networks for multiprocessor systems [7, 31, 36]. Commercial machines using WH switching include [74, 75, 100, 107, 113, 124, 146]. Today, the same technique is also finding its way into LAN [12, 25, 43] and ....
CHIEN, A. A cost and speed model for k-ary n-cube wormhole routers. In Proc. of the Hot Interconnects Workshop '93, 1st Symposium on High-Performance Interconnects (Aug. 1993).
....Delay Model for Router Microarchitectures. Li-Shiuan Peh, William J. Dally (lspeh@cs.stanford.edu, billd@csl.stanford.edu), Computer Systems Laboratory, Stanford University, Stanford, CA 94305. Abstract. Current router models [2, 3, 5, 6] assume that clock cycle time depends solely on router latency. However, in practice, routers are heavily pipelined, making cycle time largely independent of router latency. In this paper, we describe a router delay model that accurately accounts for pipelining based on technology-independent ....
....implementation complexity and the impact on router delay, simply assuming unit router delay. This can lead to inaccurate and skewed comparisons. A router delay model which enables designers and researchers to factor in implementation-specific delay estimates will thus be invaluable. Chien [2, 3] proposed a router model for wormhole and virtual channel routers to address this need. In his model, he presented a canonical router architecture, as depicted in Figure 1, which can be applied to all routers, regardless of the flow control or routing technique governing the router. The ....
Andrew A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers", IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 2, February 1998.
Andrew A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers", In Proceedings of Hot Interconnects, Palo Alto, August 1993.
....explore the effect of varying numbers of physical and virtual channels on the latency of a pipelined router. Simulation results comparing wormhole and virtual channel routers using pipelines proposed by the model are presented in Section 5, and Section 6 concludes the paper. 2. Related Work. Chien [2, 3] first noted the need for router delay models which consider implementation complexity, and proposed a router model for wormhole and virtual channel routers. Chien's model uses the router architecture of Figure 1, which was employed in the Torus Routing Chip [6], for all routers regardless of the ....
....time. Using this accurate model we compare the performance of wormhole, virtual-channel, and speculative virtual-channel flow control. Our results show that both virtual-channel routers give a substantial throughput gain over a straight wormhole router, contrary to previously reported results [3]. We compare simulations using our accurate pipelined model with simulations based on a single-cycle router model and find considerable differences between the two models. The single-cycle model greatly underestimates latency by ignoring pipeline delays. It also overestimates throughput by not ....
Andrew A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers", IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 2, February 1998.
Andrew A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers", In Proceedings of Hot Interconnects, Palo Alto, August 1993.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, 1998.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. In Proceedings of Hot Interconnects, 1993.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. In Proceedings of Hot Interconnects, 1993.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, Feb. 1998.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, Feb. 1998.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, 1998.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, 1998.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, 1998.
No context found.
Andrew A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers", IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 2, pp. 150-162, February 1998.
No context found.
Andrew A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers", In Proceedings of Hot Interconnects, Stanford, August 1993.
No context found.
A. A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers." Proc. Hot Interconnects '93, Aug. 1993.
No context found.
A. Chien. A cost and speed model for k-ary n-cube wormhole routers. In Proc. of Hot Interconnects '93, Aug. 1993.
No context found.
A.A. Chien, "A Cost and Speed Model for k-ary n-Cube Wormhole Routers," IEEE Trans. Paral. Distr. Syst. 9(2), 1998, 150-162.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, 1998.
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. IEEE Transactions on Parallel and Distributed Systems, 9(2):150--162, 1998.
No context found.
A. A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Proceedings of the Hot Interconnects Workshop, August 1993.
No context found.
A.A. Chien, A cost and speed model for k-ary n-cube wormhole routers, IEEE Trans. Parallel and Distributed Sys. 9 (2) (1998).
No context found.
A. A. Chien. A cost and speed model for k-ary n-cube wormhole routers. In Proc. Hot Interconnects '93, August 1993.
No context found.
A.A. Chien, "A cost and speed model for k-ary n-cube wormhole routers," Proceedings of Hot Interconnects '93, Palo Alto, California, August 1993.
No context found.
A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers", In Proc. of Hot Interconnects, August 1993.
CiteSeer.IST - Copyright Penn State and NEC