Token flow control
Cited by 635 (35 self)
As companies move towards many-core chips, an efficient on-chip communication fabric to connect these cores assumes critical importance. To address limitations to wire delay scalability and increasing bandwidth demands, state-of-the-art on-chip networks use a modular packet-switched design with routers at every hop which allow sharing of network channels over multiple packet flows. This, however, leads to packets going through a complex router pipeline at every hop, resulting in the overall communication energy/delay being dominated by the router overhead, as opposed to just wire energy/delay. In this work, we propose token flow control (TFC), a flow control mechanism in which nodes in the network send out tokens in their local neighborhood to communicate information about their available resources. These tokens are then used in both routing and flow control: to choose less congested paths in the network and to bypass the router pipeline along those paths. These bypass paths are formed dynamically, can be arbitrarily long, and are highly flexible, with the ability to match to a packet's exact route. Hence, this allows packets to potentially skip all routers along their path from source to destination, approaching the communication energy-delay-throughput of dedicated wires. Our detailed implementation analysis shows TFC to be highly scalable and realizable at an aggressive target clock cycle delay of 21 FO4 for large networks while requiring low hardware complexity. Evaluations of TFC using both synthetic traffic and traces from the SPLASH-2 benchmark suite show reduction in packet latency by up to 77.1% with up to 39.6% reduction in average router energy consumption as compared to a state-of-the-art baseline packet-switched design. For the same saturation throughput as the baseline network, TFC is able to reduce the amount of buffering by 65%, leading to a 48.8% reduction in leakage energy and a 55.4% lower total router energy.
Optical Burst Switching (OBS) -- A New Paradigm for an Optical Internet
- Journal of High Speed Networks, 1999
Cited by 381 (14 self)
To support bursty traffic on the Internet (and especially WWW) efficiently, optical burst switching (OBS) is proposed as a way to streamline both protocol and hardware in building the future generation Optical Internet. By leveraging the attractive properties of optical communications and, at the same time, taking into account its limitations, OBS combines the best of optical circuit switching and packet/cell switching. In this paper, the general concept of OBS protocols and, in particular, those based on Just-Enough-Time (JET), is described, along with the applicability of OBS protocols to IP over WDM. Specific issues such as the use of fiber delay-lines (FDL) for accommodating processing delay and/or resolving conflicts are also discussed. In addition, the performance of JET-based OBS protocols, which use an offset time along with delayed reservation to achieve efficient utilization of both bandwidth and FDLs as well as to support priority-based routing, is evaluated.
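The offset time and delayed reservation mentioned in this abstract can be illustrated with a toy timeline. The sketch below is not taken from the paper: the function name `jet_schedule`, the `Reservation` record, and all parameters are hypothetical, and it assumes the offset is simply one control-packet processing delay per switching node, with identical link delays for the control packet and the burst.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    hop: int                # index of the switching node along the path (1-based)
    control_arrival: int    # time the control packet reaches this node
    start: int              # start of the reserved window (burst arrival)
    end: int                # end of the reserved window


def jet_schedule(send_time, hops, proc_delay, burst_len, link_delay=0):
    """Toy JET-style timeline; every parameter is a hypothetical input
    expressed in the same time unit."""
    # Assumed offset: one control-packet processing delay per node.
    offset = hops * proc_delay
    burst_send = send_time + offset
    reservations = []
    for h in range(1, hops + 1):
        # The control packet has crossed h links and was processed at h - 1 nodes.
        control_arrival = send_time + h * link_delay + (h - 1) * proc_delay
        # The burst is not processed per node; it only pays the link delays.
        burst_arrival = burst_send + h * link_delay
        # Delayed reservation: the window opens when the burst arrives,
        # not when the control packet does.
        reservations.append(
            Reservation(h, control_arrival, burst_arrival, burst_arrival + burst_len)
        )
    return offset, reservations
```

Under these assumptions, processing of the control packet at each node finishes no later than the burst arrives there, and the interval between `control_arrival` and `start` stays free for other bursts.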
Performance analysis of k-ary n-cube interconnection networks
- IEEE Transactions on Computers, 1990
Cited by 357 (18 self)
VLSI communication networks are wire-limited. The cost of a network is not a function of the number of switches required, but rather a function of the wiring density required to construct the network. This paper analyzes communication networks of varying dimension under the assumption of constant wire bisection. Expressions for the latency, average case throughput, and hot-spot throughput of k-ary n-cube networks with constant bisection are derived that agree closely with experimental measurements. It is shown that low-dimensional networks (e.g., tori) have lower latency and higher hot-spot throughput than high-dimensional networks (e.g., binary n-cubes) with the same bisection width. Index Terms: communication networks, concurrent computing, interconnection networks, message-passing multiprocessors, parallel processing, VLSI.
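To make the k-ary n-cube family in this abstract concrete, the following sketch counts hops by brute force. It assumes bidirectional wraparound channels and minimal routing, and the function name `avg_hops` is hypothetical. It covers hop count only: the paper's latency and throughput expressions, which depend on the constant-bisection assumption, are not reproduced here.

```python
from itertools import product


def avg_hops(k, n):
    """Average minimal hop count from node 0 to every node (itself included)
    in a k-ary n-cube, assuming bidirectional wraparound channels."""
    total = 0
    for coords in product(range(k), repeat=n):
        # In each dimension, go whichever way around the ring is shorter.
        total += sum(min(c, k - c) for c in coords)
    return total / k ** n


# Two ways to connect 256 nodes:
torus_2d = avg_hops(16, 2)   # 16-ary 2-cube (a 2-D torus)
hypercube = avg_hops(2, 8)   # binary 8-cube
```

Because the topology is node-symmetric, measuring from node 0 gives the network-wide average. On hop count alone the high-dimensional network comes out ahead, so the abstract's conclusion in favor of low-dimensional networks has to rest on the wiring constraint rather than on distance.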
A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks
- IEEE Transactions on Parallel and Distributed Systems, 1993
Cited by 261 (28 self)
Second-generation multicomputers use wormhole routing, allowing a very low channel setup time and drastically reducing the dependency between network latency and internode distance. Deadlock-free routing strategies have been developed, allowing the implementation of fast hardware routers that reduce the communication bottleneck. Also, adaptive routing algorithms with deadlock-avoidance or deadlock-recovery techniques have been proposed for some topologies, being very effective and outperforming static strategies. This paper develops the theoretical background for the design of deadlock-free adaptive routing algorithms for wormhole networks. Some basic definitions and two theorems are proposed, developing conditions to verify that an adaptive algorithm is deadlock-free, even when there are cycles in the channel dependency graph. Also, two design methodologies are proposed.
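The channel dependency graph mentioned in this abstract can be checked for cycles with a depth-first search. The sketch below is only that check, not the paper's theorems: an acyclic graph is the classic sufficient condition for deadlock freedom, whereas the abstract states that an adaptive algorithm can remain deadlock-free even when cycles exist. The function name `has_cycle` and the channel names are hypothetical.

```python
def has_cycle(dep):
    """dep maps each channel to the channels a packet holding it may request next.
    Returns True if the dependency graph contains a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    color = {c: WHITE for c in dep}

    def visit(c):
        color[c] = GREY
        for nxt in dep.get(c, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True            # back edge: nxt is on the current path
            if state == WHITE and visit(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and visit(c) for c in dep)


# Unidirectional 4-node ring: each channel depends on the next one around.
ring = {f"c{i}": [f"c{(i + 1) % 4}"] for i in range(4)}
# The same channels without the wraparound dependency.
line = {"c0": ["c1"], "c1": ["c2"], "c2": []}
```

Here `has_cycle(ring)` is true and `has_cycle(line)` is false. A true result means only that the acyclicity argument does not apply, so a finer condition such as the one developed in the paper would be needed to decide.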
High Speed Switch Scheduling for Local Area Networks
- ACM Transactions on Computer Systems, 1993
Cited by 246 (3 self)
Current technology trends make it possible to build communication networks that can support high performance distributed computing. This paper describes issues in the design of a prototype switch for an arbitrary topology point-to-point network with link speeds of up to one gigabit per second. The switch deals in fixed-length ATM-style cells, which it can process at a rate of 37 million cells per second. It provides high bandwidth and low latency for datagram traffic. In addition, it supports real-time traffic by providing bandwidth reservations with guaranteed latency bounds. The key to the switch's operation is a technique called parallel iterative matching, which can quickly identify a set of conflict-free cells for transmission in a time slot. Bandwidth reservations are accommodated in the switch by building a fixed schedule for transporting cells from reserved flows across the switch; parallel iterative matching can fill unused slots with datagram traffic. Finally, we note that pa...
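Parallel iterative matching is usually described as rounds of request, grant, and accept with random tie-breaking. The sketch below follows that common description rather than the prototype switch's hardware; the function name `pim`, its parameters, and the sequential simulation of what a switch would do in parallel are all assumptions.

```python
import random


def pim(requests, n_outputs, iterations=4, rng=None):
    """requests[i] is the set of outputs for which input i has a queued cell.
    Returns a dict input -> output in which no input or output appears twice."""
    rng = rng or random.Random()
    match = {}       # input -> output
    taken = set()    # outputs already matched
    for _ in range(iterations):
        grants = {}  # input -> outputs that granted to it in this round
        for out in range(n_outputs):
            if out in taken:
                continue
            # Request: unmatched inputs ask every unmatched output they have cells for.
            askers = [i for i, want in enumerate(requests)
                      if i not in match and out in want]
            if askers:
                # Grant: the output picks one requesting input at random.
                grants.setdefault(rng.choice(askers), []).append(out)
        if not grants:
            break    # no remaining request can be served: the matching is maximal
        # Accept: each input that received grants picks one at random.
        for i, outs in grants.items():
            out = rng.choice(outs)
            match[i] = out
            taken.add(out)
    return match
```

Each round adds at least one pair while any request can still be served, so running as many rounds as there are ports yields a maximal, conflict-free matching. For example, `pim([{0, 1}, {0}, {2, 3}, {3}], 4)` returns a mapping in which every output is used at most once.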
Planar-adaptive routing: Low-cost adaptive networks for multiprocessors.
- Proceedings of the 19th International Symposium on Computer Architecture, 1992
Limits on Interconnection Network Performance
- IEEE Transactions on Parallel and Distributed Systems, 1991
Cited by 194 (4 self)
As the performance of interconnection networks becomes increasingly limited by physical constraints in high-speed multiprocessor systems, the parameters of high-performance network design must be reevaluated, starting with a close examination of assumptions and requirements. This paper models network latency, taking both switch and wire delays into account. A simple closed form expression for contention in buffered, direct networks is derived and is found to agree closely with simulations. The model includes the effects of packet size and communication locality. Network analysis under various constraints (such as fixed bisection width, fixed channel width, and fixed node size) and under different workload parameters (such as packet size, degree of communication locality, and network request rate) reveals that performance is highly sensitive to these constraints and workloads. A two-dimensional network has the lowest latency only when switch delays and network contention are ignored, but...
A Necessary and Sufficient Condition for Deadlock-Free Routing in Cut-Through and Store-and-Forward Networks
- 1995
Cited by 159 (19 self)
This paper develops the theoretical background for the design of deadlock-free adaptive routing algorithms for virtual cut-through and store-and-forward switching. This theory is valid for networks using either central buffers or edge buffers. Some basic definitions and three theorems are proposed, developing conditions to verify that an adaptive algorithm is deadlock-free, even when there are cyclic dependencies between routing resources. Moreover, we propose a necessary and sufficient condition for deadlock-free routing. Also, a design methodology is proposed. It supplies fully adaptive, minimal and non-minimal routing algorithms, guaranteeing that they are deadlock-free. The theory proposed in this paper extends the necessary and sufficient condition for wormhole switching previously proposed by us. The resulting routing algorithms are more flexible than the ones for wormhole switching. Also, the design methodology is much easier to apply because it automatically supplies deadlock-fr...
Performance and Stability of Communication Networks via Robust Exponential Bounds
- IEEE/ACM Transactions on Networking, 1993
Cited by 150 (3 self)
We propose a new way for evaluating the performance of packet switching communication networks under a fixed (session based) routing strategy. Our approach is based on properly bounding the probability distribution functions of the system input processes. The bounds we suggest, which are decaying exponentials, possess three convenient properties: (1) when the inputs to an isolated network element are all bounded, they result in bounded outputs, and assure that the delays and queues in this element have exponentially decaying distributions; (2) in some network settings bounded inputs result in bounded outputs; (3) natural traffic processes can be shown to satisfy such bounds. Consequently, our method enables the analysis of various previously intractable setups. We provide sufficient conditions for the stability of such networks, and derive upper bounds for the interesting parameters of network performance. 1 Introduction In this paper we consider data communication networks, and the problem of ev...
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor
- In Proceedings of Workshop on Scalable Shared Memory Multiprocessors, 1991
Cited by 148 (25 self)
The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low-dimensional direct interconnection network to provide scalable communication bandwidth, while allowing the exploitation of locality. Despite its distributed-memory architecture, Alewife allows efficient shared-memory programming through a multilayered approach to locality management. A new scalable cache-coherence scheme called LimitLESS directories allows the use of caches for reducing communication latency and network bandwidth requirements. Alewife also employs run-time and compile-time methods for partitioning and placement of data and processes to enhance communication locality. While the above methods attempt to minimize communication latency, communication with distant processors cannot be completely avoided. Alewife's processor, Sparcle, is designed to tolerate these latencies by rapidly switching between threads of computation. This paper describe...