K. Aoyama and A. A. Chien. The cost of adaptivity and virtual lanes in a wormhole router. submitted to Journal of VLSI Design, 1993.
....done in the past, however, has been done using synthetic workloads, or at most trace-driven simulation. Even the few studies that have used execution-driven simulation and real applications have generally based their conclusions on relatively outdated router and system models. Aoyama and Chien [1] proposed a parametric delay model of a router chip. Such a model takes into account the complexity introduced by the number of ports, virtual channels, and adaptive routing. Although it was used by researchers to time their designs, it is not a suitable model for pipelined routers and links, ....
K. Aoyama and A. A. Chien. "The Cost of Adaptivity and Virtual Lanes in a Wormhole Router". Journal of VLSI Design, Vol. 2, No. 4, 1995.
....done in the past, however, has been done using synthetic workloads, or at most trace-driven simulation. Even the few studies that have used execution-driven simulation and real applications have generally based their conclusions on relatively outdated router and system models. Aoyama and Chien [1] proposed a parametric delay model of a router chip. Such a model takes into account the complexity introduced by the number of ports, virtual channels, and adaptive routing. Although it was used by researchers to time their designs, it is not a suitable model for pipelined routers and links, ....
K. Aoyama and A. A. Chien. "The Cost of Adaptivity and Virtual Lanes in a Wormhole Router". Journal of VLSI Design, Vol. 2, No. 4, 1995.
....of their flit buffer and virtual channel requirements. Intuitively, algorithms that use more virtual channels should give better results, but detailed analysis has shown that the overhead associated with the virtual channels is usually very high and that the actual performance may even degrade [8]. Most performance studies on wormhole routing have resorted to simulation and measurements. Development of analytical models for performance evaluation is difficult because of the multiple and simultaneous resource possessions as well as the chained blockings during pipelined routing. However, ....
K. Aoyama and A. A. Chien, "The Cost of Adaptivity and Virtual Lanes in Wormhole Router," to appear in the Journal of VLSI Design.
....This allows scheduled routing to perform flow control within a single inter-node transfer time. Larger buffers and asynchronous flow control signals can be used to avoid round-trip handshakes for dynamic routers as well, but this is costly in terms of pin resources and/or buffer management. In [2], a comparison of dynamic routers is given, breaking down all the components of the cost. A simple deterministic router's cycle time is quoted at 9.8 ns, with a simple planar-adaptive router taking 11.4 ns. In [56], Shoemaker uses these component estimates to derive an estimate for the cycle time ....
Kazuhiro Aoyama and Andrew A. Chien. The cost of adaptivity and virtual lanes in a wormhole router. Journal of VLSI Design, 2(4):315--333, May 1993.
....However, the router designs used in this study have very similar network cycle times (within 0.14 nanoseconds, according to [2]). Therefore, instead of measuring latencies in terms of an absolute time (e.g. nanoseconds), we measure in terms of network clock cycles. If the routing design in [1] is modified such that a double buffering scheme is used to pipeline a demand-multiplexing router, the arbitration between multiple virtual channels that can transfer data across a physical channel can be overlapped with other router functions. Using this technique, the critical path of a ....
K. Aoyama and A. Chien. The Cost of Adaptivity and Virtual Lanes in a Wormhole Router. Technical report, University of Illinois at Urbana-Champaign, May 1993.
....[4] and the SGI Spider [12]. A network that uses VCT is the Chaos router [3]. Innumerable RTL simulators have been created for these and other studies; however, comparatively few studies have accounted for hardware implementation. Of particular note are those by Chien [5] and by Aoyama and Chien [1]. 6 Conclusions and Work in Progress. In this work we have endeavored to explore exhaustively the design space of low-cost multicomputer networks including issues in switching, lane selection, buffer size, topology, and hardware implementation. We made some basic assumptions restricting the ....
Aoyama, K., and Chien, A. A. The cost of adaptivity and virtual lanes in a wormhole router. Journal of VLSI Design (1994).
....shown in Figure 3(b) the offsets reach zero when the packet has arrived at its destination node. The router could improve best-effort performance by implementing adaptive wormhole routing, with additional virtual channels to avoid deadlock, at the expense of increased implementation complexity [19, 20]. In particular, non-minimal adaptive routing would enable best-effort packets to circumvent links with a heavy load of time-constrained traffic. 3.4 Buffer Architecture. The real-time router includes a packet memory for storing time-constrained traffic awaiting access to the outgoing links; in ....
.... can support a finer grain of packet priorities by increasing the number of virtual channels, at the expense of implementation complexity; extra virtual channels incur the cost of additional flit buffers and larger virtual channel identifiers, as well as more complex switching and arbitration logic [20]. Instead of dedicating virtual channels and flit buffers to each priority level, a router can increase priority resolution by adopting a packet-switched design. The priority forwarding router chip [5] follows this approach by employing a 32-bit priority field in small, 8-packet priority queues at ....
K. Aoyama and A. Chien, "Cost of adaptivity and virtual lanes in a wormhole router," Journal of VLSI Design, vol. 2, no. 4, pp. 315--333, 1995.
....4(a) the offsets reach zero when the packet has arrived at its destination node. To improve the performance of best-effort traffic, an enhanced version of the router could support adaptive wormhole routing and additional virtual channels, at the expense of increased implementation complexity [31, 32]. In particular, non-minimal adaptive routing would enable best-effort packets to circumvent links with a heavy load of time-constrained traffic. Although routing is closely tied with deadlock avoidance for best-effort packets, the real-time router need not dictate a particular routing scheme for ....
.... fine-grain packet priorities by increasing the number of virtual channels, at the expense of additional implementation complexity; these virtual channels incur the cost of additional flit buffers and larger virtual channel identifiers, as well as more complex switching and arbitration logic [32]. Instead of dedicating virtual channels and flit buffers to each priority level, a router can increase priority resolution by adopting a packet-switched design. The priority forwarding router chip [6] follows this approach by employing a 32-bit priority field in small, 8-packet priority queues at ....
K. Aoyama and A. Chien, "Cost of adaptivity and virtual lanes in a wormhole router," Journal of VLSI Design, vol. 2, no. 4, pp. 315--333, 1995.
....by virtual threads to keep any one particular communication thread from blocking a resource. In addition, a more complicated communication controller remembers recent network activity in its area and attempts to route subsequent traffic to parts of the network that are less utilized. Much research [Dall91, Aoya93] has gone into trying to evaluate the trade-off between adaptive router complexity vs network congestion improvement. [Figure 1.3: Deterministic Router Suffers Congestion] Even the most complex adaptive routers are often unable to minimize network ....
....since a wide variety of technologies and implementations are possible and can confuse the comparison. The main advantage of the NuMesh CFSM is the lack of data-dependent run-time decisions and the ability to perform flow control operations with a single internode transfer. A paper out of Illinois [Aoya93] made a comparison to dynamic routers, all implemented in the same technology. In the paper, each contributor to the clock period was identified. Since the contributors to the delay for the [Figure 6.2: JTAG Boundary Scan (Input Pin)] ....
K. Aoyama and A. Chien. "The Cost of Adaptivity and Virtual Lanes in a Wormhole Router", Journal of VLSI Design. Vol. 2, No. 4, pages 315-333, May 1993.
....buffers, and separate or shared buffers. While switches consist of many components, we focus here on the costs of the two parts, scheduling and queueing logic, which are crucial components in supporting integrated service networks. Because costs for other basic components have been studied before [64, 65], our study examines and quantifies the additional costs of switches for integrated service networks. With the cost analysis, we can also compare relative implementation costs of different scheduling algorithms. Although switches for different scheduling algorithms can be implemented best in ....
....(queueing and scheduling) because implementations of different algorithms will all include similar, basic switch logic such as a crossbar, routing table, input and output port logic, flow control and fault tolerance logic. Because costs for these basic components have been studied before [64, 65], our study reveals the additional costs of switches for integrated-service networks. Although switches for different scheduling algorithms can be implemented best in slightly different ways, their fundamental differences in implementation complexity come from scheduling and buffering support. As a ....
K. Aoyama and A. A. Chien. "The cost of adaptivity and virtual lanes in a wormhole router," Journal of VLSI Design, Special Issue on Interconnection Networks, vol. 2, no. 4, pp. 315--333, Apr. 1995.
....data links, and six control lines for parity and control each. To evaluate memory interface performance, we compare cache refill times over a range of line sizes. The performance numbers assumed for the calculation are shown in Table III and they are derived from our hardware design studies [4] including SPICE simulations of multi-tap bus lines. We further assume .... [Footnote 12: The only difference lies in that the destination addresses for reply packets become remote memory nodes instead of the local processor.] [Figure labels: Intel i860XP; DI micro; Address (29); Data (64); Control (46); Memory] ....
....the routing delay. The higher network clock rate for the DI microprocessor memory interface is due to the electrical advantages of point-to-point interconnects over multi-tap bus lines [31, 18]. Router delay is based on a number of published implementation studies [17, 32] and our own designs [4, 8]. The actions required to complete a cache line reload in each system are illustrated in Figure 15. Table III: Memory and interconnect performance numbers assumed for the evaluation. (Columns: Architecture component; Characteristics; Performance number assumed. First row: Memory Module; Access Time; 20 ns.) Network ....
[Article contains additional citation context not shown here]
Aoyama, K. The cost of adaptivity and virtual lanes in a wormhole router. In Journal of VLSI Design, 1994.
....decision. By considering multiple outgoing links, adaptive algorithms can balance network load and increase a packet's chance of cutting through intermediate nodes, at the expense of out-of-order packet arrivals at the destination node [27] and an increase in router implementation complexity [28]. The cut-through probability also depends on the selection function [29] which determines which order the router considers the candidate outgoing links. This paper presents analytical models that compare the cut-through performance of a collection of oblivious and adaptive routing algorithms, ....
K. Aoyama and A. Chien, "Cost of adaptivity and virtual lanes in a wormhole router," Journal of VLSI Design, vol. 2, no. 4, pp. 315--333, 1995.
....Such an approach prevents the directory controller of the destination node from processing these messages in a timely fashion. This results in an increase in effective message latency and nullifies the latency reduction gained by the virtual channel mechanism. • A performance degradation model [2] for implementing the virtual channel mechanism has been used in the evaluation. However, new techniques [17] are available today to design routers with a moderate number of virtual channels (up to 4 or 5) without increasing the routing decision delay for the header flit. Using these techniques, ....
....at the knee points in Fig. 11. 5.2.4 Impact of Routing Delay. Next, we studied the impact of routing decision delay on the benefit of virtual channels. Some researchers believe that the network speed cannot remain unchanged as more virtual channels are supported. In an earlier study [2], a performance model of 30% slowdown in network cycle time for adding each virtual channel has been proposed. However, a more careful analysis of the problem reveals that the slowdown is mainly caused by the routing delay of the header flit at a router. As the network technology advances and ....
K. Aoyama and A. A. Chien. The Cost of Adaptivity and Virtual Lanes in a Wormhole Router. Journal of VLSI Design, 2(4):315--333, 1995.
....or even superior performance to fully adaptive routers. We are currently pursuing construction of hardware prototypes to evaluate the cost of adaptive routers. Based on these designs, we are pursuing a careful characterization of the cost of a variety of router extensions with great interest [8, 2]. Fundamental to the evaluation of limited adaptivity routers lies a deeper question. How much adaptivity do routing networks need? This question will only be answered as application programs and software systems for massively parallel machines mature. An unanswered question is how to best make ....
K. Aoyama and A. A. Chien. The cost of adaptivity and virtual lanes in a wormhole router. submitted to Journal of VLSI Design, 1993.
....simpler in both setup and flow control than any of the other adaptive routers, giving faster operation. However, this advantage is tenuous, as adding even a few virtual lanes (and the required virtual channel controllers) to dimension-order routing can effectively eliminate the speed advantage [5, 4]. 5.2 Planar Adaptive Router. [Figure labels: AD; FC; VC; Routing Arbitration (RA); Network Inputs; Network Outputs; To Next Adaptive Plane] ....
....than that of the planar-adaptive router even beyond ten dimensions, due to the absence of virtual channel controllers. Avoiding the requirement of virtual channel controllers simplifies routers, giving significant performance benefits. A more detailed analysis of these issues can be found in [5, 4]. Figure 8: The architecture of a two dimensional channels router. It requires both ....
K. Aoyama and A. A. Chien. The cost of adaptivity and virtual lanes in a wormhole router. submitted for publication, 1993.
CiteSeer.IST - Copyright Penn State and NEC