Towards Scalable, Energy-Efficient, Bus-Based On-Chip Networks
2010
Cited by 12 (1 self)
It is expected that future on-chip networks for many-core processors will impose huge overheads in terms of energy, delay, complexity, verification effort, and area. There is a common belief that the bandwidth necessary for future applications can only be provided by employing packet-switched networks with complex routers and a scalable directory-based coherence protocol. We posit that such a scheme might likely be overkill in a well designed system in addition to being expensive in terms of power because of a large number of power-hungry routers. We show that bus-based networks with snooping protocols can significantly lower energy consumption and simplify network/protocol design and verification, with no loss in performance. We achieve these characteristics by dividing the chip into multiple segments, each having its own broadcast bus, with these buses further connected by a central bus. This helps eliminate expensive routers, but suffers from the energy overhead of long wires. We propose the use of multiple Bloom filters to effectively track data presence in the cache and restrict bus broadcasts to a subset of segments, significantly reducing energy consumption. We further show that the use of OS page coloring helps maximize locality and improves the effectiveness of the Bloom filters. We also employ low-swing wiring to further reduce the energy overheads of the links. Performance can also be improved at relatively low costs by utilizing more of the abundant metal budgets on-chip and employing multiple address-interleaved buses rather than multiple routers. Thus, with the combination of all the above innovations, we extend the scalability of buses and believe that buses can be a viable and attractive option for future on-chip networks. We show energy reductions of up to 31X on average compared to many state-of-the-art packet-switched networks.
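As a rough illustration of the filtering idea in this abstract, the sketch below keeps one Bloom filter per bus segment and forwards a request only to segments whose filter reports a possible hit. The class and function names, the filter size, the salted SHA-256 hashing, and the four-segment layout are all assumptions made for illustration; the abstract does not describe the actual hardware design.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, address):
        # Derive num_hashes bit positions by salting a SHA-256 hash
        # (an illustrative software choice, not a hardware hash).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, address):
        for pos in self._positions(address):
            self.bits[pos] = True

    def might_contain(self, address):
        return all(self.bits[pos] for pos in self._positions(address))


def segments_to_snoop(filters, address):
    """Indices of segments whose filter says the line may be cached there."""
    return [i for i, f in enumerate(filters) if f.might_contain(address)]


# Hypothetical 4-segment chip: address 0x1000 is cached only in segment 2.
filters = [BloomFilter() for _ in range(4)]
filters[2].add(0x1000)
targets = segments_to_snoop(filters, 0x1000)
```

A plain Bloom filter cannot delete entries, so a real design would also need some way to account for cache evictions; that part is left out of this sketch.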
Design and Implementation of Backtracking Wave-Pipeline Switch to Support Guaranteed Throughput in Network-on-Chip
Cited by 8 (3 self)
Abstract—It is a challenging task in a network-on-chip to design an on-chip switch/router to dynamically support (hard) guaranteed throughput under very tight on-chip constraints of power, timing, area, and time-to-market. This paper presents the design and implementation of a novel pipeline circuit-switched switch to support guaranteed throughput. The proposed circuit-switched switch, based on a backtracking probing path setup, operates with a source-synchronous wave-pipeline approach. The switch can support a dead- and live-lock free dynamic path-setup scheme and can achieve high bandwidth and high area and energy efficiency. A silicon-proven prototype of a 16-bit-data 5-bidirectional-port switch in a four-metal-layer 0.18-µm CMOS standard-cell technology can yield an aggregate data bandwidth of up to 73.84 Gb/s, while occupying only a modest area of 0.0315 mm². The synthesizable implementation of the proposed switch also results in a cost-effective design, fast development time, and portability. Index Terms—Backtracking, circuit-switched, dynamic path-setup, guaranteed throughput, network-on-chip (NoC),
In-network Monitoring and Control Policy for DVFS of CMP Networks-on-Chip and Last Level Caches
Cited by 7 (0 self)
Abstract—In chip design today and for a foreseeable future, on-chip communication is not only a performance bottleneck but also a substantial power consumer. This work focuses on employing dynamic voltage and frequency scaling (DVFS) policies for networks-on-chip (NoC) and shared, distributed last-level caches (LLC). In particular, we consider a practical system architecture where the distributed LLC and the NoC share a voltage/frequency domain which is separate from the core domain. This architecture enables controlling the relative speed between the cores and memory hierarchy without introducing synchronization delays within the NoC. DVFS for this architecture is more difficult than individual link/core-based DVFS since it involves spatially distributed monitoring and control. We propose an average memory access time (AMAT)-based monitoring technique and integrate it with DVFS based on PID control theory. Simulations on PARSEC benchmarks yield a 33% dynamic energy savings with a negligible impact on system performance. Index Terms—Multicore, NoC, dynamic power, memory system
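To make the AMAT-plus-PID idea concrete, here is a minimal sketch of a textbook discrete PID loop that nudges a shared NoC/LLC frequency toward a target AMAT. The gains, the frequency limits, the target value, and the function names are hypothetical; the abstract does not give the authors' controller parameters or their monitoring mechanism.

```python
class PIDController:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        self.integral += error * dt
        # No derivative term on the first sample, since there is no history yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_frequency(freq_ghz, measured_amat_ns, target_amat_ns, pid,
                   f_min=0.5, f_max=2.0):
    """Raise the frequency when AMAT exceeds the target, lower it otherwise."""
    error = measured_amat_ns - target_amat_ns
    new_freq = freq_ghz + pid.update(error)
    # Clamp to the (assumed) supported frequency range.
    return min(f_max, max(f_min, new_freq))


# Hypothetical gains and operating point.
pid = PIDController(kp=0.05, ki=0.01, kd=0.0)
freq_after_slow = next_frequency(1.0, measured_amat_ns=12.0,
                                 target_amat_ns=10.0, pid=pid)
```

In this example the measured AMAT is above the target, so the controller requests a higher frequency; a measurement below the target would push the frequency down and save energy.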
Comparing Energy and Latency of Asynchronous and Synchronous NoCs for Embedded SoCs
Cited by 5 (3 self)
Abstract—Power consumption of on-chip interconnects is a primary concern for many embedded system-on-chip (SoC) applications. In this paper, we compare energy and performance characteristics of asynchronous (clockless) and synchronous network-on-chip implementations, optimized for a number of SoC designs. We adapted the COSI-2.0 framework with ORION 2.0 router and wire models for synchronous network generation. Our own tool, ANetGen, specifies the asynchronous network by determining the topology with simulated-annealing and router locations with force-directed placement. It uses energy and delay models from our 65nm bundled-data router design. SystemC simulations varied traffic burstiness using the self-similar b-model. Results show that the asynchronous network provided lower median and maximum message latency, especially under bursty traffic, and used far less router energy with a slight overhead for the inter-router wires.
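The self-similar b-model mentioned in this abstract is commonly described as a recursive split: a traffic volume over an interval is divided between the two halves of the interval, with a fraction b going to one half chosen at random. The sketch below follows that description; the function name, parameter values, and seeded generator are assumptions, and it is not taken from the authors' SystemC setup.

```python
import random


def b_model(total_volume, levels, b=0.7, rng=None):
    """Spread total_volume over 2**levels time slots.

    At each level every interval is halved, and a fraction b of its volume
    goes to one half picked at random. b = 0.5 gives uniform traffic;
    values closer to 1.0 give burstier traffic.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    slots = [float(total_volume)]
    for _ in range(levels):
        next_slots = []
        for volume in slots:
            first, second = volume * b, volume * (1 - b)
            if rng.random() < 0.5:
                first, second = second, first
            next_slots.extend([first, second])
        slots = next_slots
    return slots


# Hypothetical trace: 1024 flits spread over 32 slots with bias 0.8.
trace = b_model(total_volume=1024, levels=5, b=0.8)
```

The total volume is preserved at every level, so only the distribution over time changes as b varies.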
Exploration of Temperature Constraints for Thermal Aware Mapping of 3D Networks on Chip
Cited by 3 (0 self)
This paper proposes three ILP-based static thermal-aware mapping algorithms for 3D Networks on Chip (NoC) to explore the thermal constraints and their effects on temperature and performance. Through complexity analysis, we show that the first algorithm, an optimal one, is not suitable for 3D NoC. Therefore, we develop two approximation algorithms and analyze their algorithmic complexities to show their proficiency. As the simulation results show, the mapping algorithms that employ direct thermal calculation to minimize the temperature reduce the peak temperature by up to 24% and 22%, for the benchmarks that have the highest communication rate and largest number of tasks, respectively. This comes at the price of a higher power-delay product. This exploration shows that considering power balancing early in the mapping algorithms does not affect the chip temperature. Moreover, it shows that considering the explicit performance constraint in the thermal mapping has no major effect on performance.
Generic Low-Latency NoC Router Architecture for FPGA Computing Systems
Proc. FPL, 2011
Cited by 3 (0 self)
Abstract—A novel cost-effective and low-latency wormhole router for packet-switched NoC designs, tailored for FPGA, is presented. This has been designed to be scalable at system level to fully exploit the characteristics and constraints of FPGA-based systems, rather than custom ASIC technology. A key feature is that it achieves a low packet propagation latency of only two cycles per hop, including both router pipeline delay and link traversal delay (a significant enhancement over existing FPGA designs), whilst being very competitive in terms of performance and hardware complexity. It can also be configured in various network topologies including 1-D, 2-D, and 3-D. Detailed design-space exploration has been carried out for a range of scaling parameters, with the results of various design trade-offs being presented and discussed. By taking advantage of abundant built-in reconfigurable logic and routing resources, we have been able to create a new scalable on-chip FPGA-based router that exhibits high dimensionality and connectivity. The architecture proposed can be easily migrated across many FPGA families to provide flexible, robust and cost-effective NoC solutions suitable for the implementation of high-performance FPGA computing systems.
A Flexible Analytic Model for the Design Space Exploration of Many-Core Network-on-Chips Based on Queueing Theory
In Proceedings of the Fourth International Conference on Advances in System Simulation, ser. SIMUL
Cited by 3 (3 self)
Abstract—Continuing technology scaling and the increasing requirements of modern embedded applications are most likely forcing current multi-processor system-on-chip designs to scale to many-core systems-on-chip with thousands of cores on a single chip. Network-on-chip emerged as a flexible and high-performance solution for the interconnection problem. There will be an urgent need for fast, flexible and accurate simulation models to guide the design process of many-core systems-on-chip. In this paper, we introduce a novel analytic approach for modeling on-chip networks to fulfill these requirements. The model is based on queueing theory and is very flexible in terms of supported topology, routing scheme and traffic pattern. The approach overcomes the limitations of the mean value analysis introduced in the existing work. Instead, it provides information about a steady-state distribution of the network routers. This makes it possible to dimension network resources such as buffers and links. We show the high accuracy of the model by comparison with a cycle-accurate simulation. The model is able to estimate the mean network latency with an accuracy of about 3%. Keywords: network-on-chip; noc; queueing theory; analytic model.
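The abstract does not give the model's equations, so the sketch below is not the authors' method. It only illustrates, on the simplest textbook case (a single M/M/1 queue standing in for one router port), why a steady-state distribution is more useful for sizing buffers than a mean value alone. The function names and the example rates are assumptions.

```python
def mm1_state_probabilities(arrival_rate, service_rate, max_n):
    """P(n packets in the system) for n = 0..max_n in a stable M/M/1 queue."""
    rho = arrival_rate / service_rate
    if not 0 <= rho < 1:
        raise ValueError("stable only when 0 <= arrival_rate < service_rate")
    return [(1 - rho) * rho ** n for n in range(max_n + 1)]


def mm1_mean_latency(arrival_rate, service_rate):
    """Mean time a packet spends waiting plus in service."""
    if arrival_rate >= service_rate:
        raise ValueError("stable only when arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


def smallest_buffer(arrival_rate, service_rate, overflow_prob):
    """Smallest n such that P(more than n packets) <= overflow_prob.

    Uses the infinite-buffer tail P(N > n) = rho**(n + 1) as an approximation.
    """
    rho = arrival_rate / service_rate
    if not 0 <= rho < 1:
        raise ValueError("stable only when 0 <= arrival_rate < service_rate")
    n = 0
    while rho ** (n + 1) > overflow_prob:
        n += 1
    return n


# Hypothetical port: 0.5 packets/cycle arriving, 1.0 packets/cycle served.
probs = mm1_state_probabilities(0.5, 1.0, max_n=2)
latency = mm1_mean_latency(0.5, 1.0)
depth = smallest_buffer(0.5, 1.0, overflow_prob=0.01)
```

The mean latency alone says nothing about how deep a buffer must be; the tail of the distribution does, which is the kind of information the abstract says the proposed model exposes.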
Bandwidth-Aware Application Mapping for NoC-Based MPSoCs
2011
Cited by 3 (0 self)
Network-on-Chip (NoC) has been introduced to meet the communication challenges of on-chip multi-processors, and the bandwidth of the NoC plays a significant role in the area and power consumption of the overall system. In order to minimize the bandwidth requirement of the NoC, a mapping method is proposed to schedule the tasks of an application onto the NoC architecture. More precisely, given the task graph of a specific application, an ACO-based (Ant Colony Optimization) algorithm is used to map the tasks onto the NoC such that the bandwidth requirement of the NoC is minimized. The benefit of our method is evaluated by simulation, and the simulation results show that our method can achieve about a 48% reduction in the bandwidth requirement of the NoC compared to the state-of-the-art method.
An Outlook on Design Technologies for Future Integrated Systems
2009
Cited by 2 (0 self)
The economic and social demand for ubiquitous and multifaceted electronic systems—in combination with the unprecedented opportunities provided by the integration of various manufacturing technologies—is paving the way to a new class of heterogeneous integrated systems, with increased performance and connectedness and providing us with gateways to the living world. This paper surveys design requirements and solutions for heterogeneous systems and addresses design technologies for realizing them.
Design of an Energy-Efficient Asynchronous NoC and Its Optimization Tools for Heterogeneous SoCs
Cited by 2 (1 self)
Abstract—The energy usage of on-chip interconnects is a concern for many system-on-chips targeting portable battery-powered devices. We have designed and evaluated a network-on-chip (NoC) for such an application, including tools to optimize for power and communication latency. Our asynchronous (clockless) network operates with efficient two-phase bundled-data links and four-phase routers. The topology and router floorplan are determined by our tool, ANetGen, which optimizes the network for energy and latency using simulated annealing and force-directed placement methods. We compare our solutions against a traditional synchronous NoC as specified by the COSI-2.0 framework and ORION 2.0 router and wire energy models. Traffic is simulated with SystemC functional models, and messages are generated with a “bursty” self-similar b-model. Results indicate our asynchronous network was more energy-efficient, lower in area, and provided comparable or superior message latency. Index Terms—Application-specific, asynchronous design, embedded, GALS, low-power, network-on-chip, system-on-chip.