A Case for Heterogeneous On-Chip Interconnects for CMPs
- In ISCA, 2011
Cited by 13 (1 self)
Network-on-chip (NoC) has become a critical shared resource in the emerging chip multiprocessor (CMP) era. Most prior NoC designs have used the same type of router across the entire network. While this homogeneous network design eases the burden on a network designer, partitioning the resources equally among all routers across the network does not lead to optimal resource usage, and hence affects the performance-power envelope. In this work, we propose to apportion the resources in an NoC to leverage the non-uniformity in network resource demand. Our proposal includes partitioning the network resources, specifically buffers and links, in an optimal manner. This approach redistributes resources such that routers that require more resources are allocated more buffers and wider links than routers demanding fewer resources. The result is a novel heterogeneous network, called HeteroNoC, which is composed of two types of routers: small power-efficient routers and big high-performance routers. We evaluate a number of heterogeneous network configurations, composed of big and small routers, and show that giving more resources to routers along the diagonals in a mesh network provides maximum benefits in terms of performance and power. We also show the potential benefits of the HeteroNoC design by co-evaluating it with memory controllers and configuring it with an asymmetric CMP consisting of heterogeneous cores.
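The diagonal placement described in this abstract can be pictured with a small sketch. It only marks which positions of an n x n mesh lie on the two diagonals; the function names and the "B"/"s" labels are illustrative assumptions, and the actual buffer and link sizing of HeteroNoC is not modeled here.

```python
def diagonal_routers(n):
    """Return the (row, col) positions on both diagonals of an n x n mesh."""
    main = {(i, i) for i in range(n)}
    anti = {(i, n - 1 - i) for i in range(n)}
    return main | anti


def router_map(n):
    """Label each mesh position 'B' (big router) or 's' (small router)."""
    big = diagonal_routers(n)
    return [["B" if (r, c) in big else "s" for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    # Hypothetical 8 x 8 mesh; the mesh size is an assumption for illustration.
    for row in router_map(8):
        print(" ".join(row))
```

For an even mesh size the two diagonals do not intersect, so an 8 x 8 mesh gets 16 big routers out of 64; for an odd size the centre router is shared by both diagonals.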
Delay Analysis of Wormhole Based Heterogeneous NoC
Cited by 5 (2 self)
We introduce a novel evaluation methodology to analyze the delay of a wormhole-routing-based NoC with variable link capacities and a variable number of virtual channels per link. This methodology can be used to analyze heterogeneous NoC architectures and traffic scenarios for which no analysis framework has been developed before. In particular, it can replace computationally intensive simulations in the inner loop of the link-capacity and virtual-channel allocation steps of the NoC topology optimization process. Our analysis introduces a set of implicit equations that can be solved efficiently by iteration. We demonstrate the accuracy of our approximation by comparing the analysis results to a simulation model for several use cases and synthetic examples. In addition, we compare the analysis with simulation results for a chip multiprocessor (CMP) using SPLASH-2 and PARSEC traces for both homogeneous and heterogeneous NoC configurations.
Comparing Energy and Latency of Asynchronous and Synchronous NoCs for Embedded SoCs
Cited by 5 (3 self)
Power consumption of on-chip interconnects is a primary concern for many embedded system-on-chip (SoC) applications. In this paper, we compare energy and performance characteristics of asynchronous (clockless) and synchronous network-on-chip implementations, optimized for a number of SoC designs. We adapted the COSI-2.0 framework with ORION 2.0 router and wire models for synchronous network generation. Our own tool, ANetGen, specifies the asynchronous network by determining the topology with simulated annealing and router locations with force-directed placement. It uses energy and delay models from our 65nm bundled-data router design. SystemC simulations varied traffic burstiness using the self-similar b-model. Results show that the asynchronous network provided lower median and maximum message latency, especially under bursty traffic, and used far less router energy, with a slight overhead for the inter-router wires.
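The abstract does not give details of how the b-model was applied, so the following is only a minimal sketch of the b-model as it is commonly described: a traffic volume is split recursively into two halves, with a fraction `bias` going to one randomly chosen half and the rest to the other. The function name, parameters, and the example values are assumptions for illustration, not taken from the paper.

```python
import random


def b_model(total, levels, bias, rng):
    """Spread `total` messages over 2**levels time slots.

    At every level each slot is split in two; one half (chosen at random)
    receives the fraction `bias` of the slot's volume, the other 1 - bias.
    bias = 0.5 gives uniform traffic; values closer to 1.0 are burstier.
    """
    slots = [float(total)]
    for _ in range(levels):
        next_slots = []
        for volume in slots:
            hi, lo = volume * bias, volume * (1.0 - bias)
            # Randomly choose which half receives the larger share.
            next_slots.extend((hi, lo) if rng.random() < 0.5 else (lo, hi))
        slots = next_slots
    return slots


if __name__ == "__main__":
    # Hypothetical parameters: 1000 messages, 16 slots, bias 0.7.
    trace = b_model(1000, 4, 0.7, random.Random(0))
    print([round(v, 1) for v in trace])
```

The total volume is preserved at every level, so only the distribution over time changes as the bias is varied.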
An Analytical Latency Model for Networks-on-Chip
Cited by 4 (0 self)
We propose an analytical model based on queueing theory for delay analysis in a wormhole-switched network-on-chip (NoC). The proposed model takes as input an application communication graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency and router blocking time. It works for arbitrary network topologies with deterministic routing under arbitrary traffic patterns. The model estimates per-flow average latency accurately and quickly, enabling fast design space exploration of various parameters in NoC designs. Experimental results show that the proposed analytical model can predict the average packet latency more than four orders of magnitude faster than an accurate simulation, while the computation error is less than 10% in non-saturated networks for different system-on-chip platforms. Index Terms: modeling and prediction, network-on-chip (NoC), performance analysis and design aids, queueing theory.
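To give a feel for what a queueing-based per-flow latency estimate looks like, here is a deliberately simplified sketch. It is not the model proposed in the paper: it treats every link as an independent M/M/1 queue and ignores wormhole blocking entirely. The flow table layout, the link names, and the single shared `service_rate` are all assumptions made for illustration.

```python
def link_loads(flows):
    """Sum the arrival rates of all flows routed over each link."""
    loads = {}
    for rate, path in flows.values():
        for link in path:
            loads[link] = loads.get(link, 0.0) + rate
    return loads


def flow_latency(flows, service_rate):
    """Estimate per-flow latency as the sum of M/M/1 sojourn times per hop.

    `flows` maps a flow name to (arrival_rate, [links on its route]).
    The M/M/1 sojourn time on a link with load L is 1 / (service_rate - L).
    """
    loads = link_loads(flows)
    latency = {}
    for name, (_rate, path) in flows.items():
        total = 0.0
        for link in path:
            if loads[link] >= service_rate:
                raise ValueError(f"link {link} is saturated")
            total += 1.0 / (service_rate - loads[link])
        latency[name] = total
    return latency


if __name__ == "__main__":
    # Hypothetical two-flow example sharing link "l2".
    example = {"a": (0.2, ["l1", "l2"]), "b": (0.3, ["l2"])}
    print(flow_latency(example, service_rate=1.0))
```

Like the model in the abstract, this kind of estimate is only meaningful below saturation, which is why the sketch raises an error once a link's load reaches its service rate.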
Statistical Approach to Networks-on-Chip
Cited by 4 (1 self)
Chip multiprocessors (CMPs) combine increasingly many general-purpose processor cores on a single chip. These cores run several tasks with unpredictable communication needs, resulting in uncertain and often-changing traffic patterns. This unpredictability leads network-on-chip (NoC) designers to plan for the worst-case traffic patterns, and significantly over-provision link capacities. In this paper, we provide NoC designers with an alternative statistical approach. We first present the traffic-load distribution plots (T-Plots), illustrating how much capacity over-provisioning is needed to service 90%, 99%, or 100% of all traffic patterns. We prove that in the general case, plotting T-Plots is #P-complete, and therefore extremely complex. We then show how to determine the exact mean and variance of the traffic load on any edge, and use these to provide Gaussian-based models for the T-Plots, as well as guaranteed performance bounds. We also explain how to practically approximate T-Plots using random-walk-based methods. Finally, we use T-Plots to reduce the network power consumption by providing an efficient capacity allocation algorithm with predictable performance guarantees.
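The Gaussian-based idea can be illustrated with a short sketch: if the load on an edge is approximated as normally distributed with a known mean and variance, the capacity needed to cover a given fraction of traffic patterns is the corresponding quantile. This is an assumed reading of the abstract, not the paper's algorithm; the function name and the example numbers are hypothetical.

```python
from statistics import NormalDist


def capacity_for_coverage(mean_load, variance, coverage):
    """Capacity that covers `coverage` of traffic patterns on one edge,
    assuming the edge load is Gaussian with the given mean and variance."""
    if not 0.0 < coverage < 1.0:
        # A Gaussian has unbounded support, so 100% coverage needs the
        # true worst case rather than a quantile.
        raise ValueError("coverage must be strictly between 0 and 1")
    if variance == 0:
        return float(mean_load)
    return NormalDist(mu=mean_load, sigma=variance ** 0.5).inv_cdf(coverage)


if __name__ == "__main__":
    # Hypothetical edge with mean load 10 and variance 4.
    for target in (0.90, 0.99):
        print(target, round(capacity_for_coverage(10.0, 4.0, target), 2))
```

The gap between the 99% quantile and the worst-case load is the over-provisioning that a statistical design could give up in exchange for a small, quantified risk.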
Design of an Energy-Efficient Asynchronous NoC and Its Optimization Tools for Heterogeneous SoCs
Cited by 2 (1 self)
The energy usage of on-chip interconnects is a concern for many systems-on-chip targeting portable battery-powered devices. We have designed and evaluated a network-on-chip (NoC) for such an application, including tools to optimize for power and communication latency. Our asynchronous (clockless) network operates with efficient two-phase bundled-data links and four-phase routers. The topology and router floorplan are determined by our tool, ANetGen, which optimizes the network for energy and latency using simulated annealing and force-directed placement methods. We compare our solutions against a traditional synchronous NoC as specified by the COSI-2.0 framework and ORION 2.0 router and wire energy models. Traffic is simulated with SystemC functional models, and messages are generated with a "bursty" self-similar b-model. Results indicate our asynchronous network was more energy-efficient, lower in area, and provided comparable or superior message latency. Index Terms: application-specific, asynchronous design, embedded, GALS, low-power, network-on-chip, system-on-chip.
Statistical Approach to NoC Design
- ACM/IEEE NoCS, 2008
Cited by 2 (1 self)
Chip multiprocessors (CMPs) combine increasingly many general-purpose processor cores on a single chip. These cores run several tasks with unpredictable communication needs, resulting in uncertain and often-changing traffic patterns. This unpredictability leads network-on-chip (NoC) designers to plan for the worst-case traffic patterns, and significantly over-provision link capacities. In this paper, we provide NoC designers with an alternative statistical approach. We first present the traffic-load distribution plots (T-Plots), illustrating how much capacity over-provisioning is needed to service 90%, 99%, or 100% of all traffic patterns. We prove that in the general case, plotting T-Plots is #P-complete, and therefore extremely complex. We then show how to determine the exact mean and variance of the traffic load on any edge, and use these to provide Gaussian-based models for the T-Plots, as well as guaranteed performance bounds. Finally, we use T-Plots to reduce the network power consumption by providing an efficient capacity allocation algorithm with predictable performance guarantees.
Mathematical Formalisms for Performance Evaluation of Networks-on-Chip
Cited by 1 (0 self)
This article reviews four popular mathematical formalisms (queueing theory, network calculus, schedulability analysis, and dataflow analysis) and how they have been applied to the analysis of on-chip communication performance in Systems-on-Chip. The article discusses the basic concepts and results of each formalism and provides examples of how they have been used in Networks-on-Chip (NoCs) performance analysis. Also, the respective strengths and weaknesses of each technique and its suitability for a specific purpose are investigated. An open research issue is a unified analytical model for a comprehensive performance evaluation of NoCs. To this end, this article reviews the attempts that have been made to bridge these formalisms.
Energy-efficient design of a . . .
- 2011
Portable electronic devices will be limited to the available energy of existing battery chemistries for the foreseeable future. However, the systems-on-chip (SoCs) used in these devices are under demand to offer more functionality and increased battery life. A difficult problem in SoC design is providing energy-efficient communication between components while maintaining the required performance. This dissertation introduces a novel energy-efficient network-on-chip (NoC) communication architecture. A NoC is used within complex SoCs due to its superior performance, energy usage, modularity, and scalability over traditional bus and point-to-point methods of connecting SoC components. This is the first academic research that combines asynchronous NoC circuits, a focus on energy-efficient design, and a software framework to customize a NoC for a particular SoC. Its key contribution is demonstrating that a simple, asynchronous NoC concept is a good match for low-power devices, and is a fruitful area for additional investigation. The proposed NoC is energy-efficient in several ways: simple switch
Optimizing Heterogeneous NoC Design
We develop a novel design methodology that optimizes the capacity of each link in a NoC and the number of virtual channels (VCs) at each router port for a given set of flows and latency constraints. To lower the computation costs associated with a simulated annealing search of the design space, we use an approximate analysis of the NoC performance, which replaces the need for a NoC simulation. Computation time and resources are therefore dramatically reduced. The area saving achieved by our heterogeneous NoC design is demonstrated by several use cases. The heterogeneous NoC design process is applied to SoCs running multimedia benchmarks, and to a chip multiprocessor (CMP) running PARSEC benchmark programs.
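The general shape of such a search, simulated annealing with an analytic evaluation in the inner loop instead of a simulation, can be sketched as follows. This is an illustration only: the M/M/1 delay bound stands in for the paper's approximate analysis, virtual channels are not modeled, and the function name, penalty value, and cooling schedule are assumptions.

```python
import math
import random


def anneal_capacities(loads, max_delay, choices, rng,
                      steps=5000, temp=1.0, cooling=0.999):
    """Pick one capacity per link from `choices`, minimising total capacity
    subject to an M/M/1 delay bound 1 / (capacity - load) <= max_delay."""

    def cost(caps):
        penalty = 0.0
        for link, cap in caps.items():
            # Analytic check replaces a simulation run (assumed stand-in).
            if cap <= loads[link] or 1.0 / (cap - loads[link]) > max_delay:
                penalty += 1000.0
        return sum(caps.values()) + penalty

    links = sorted(loads)
    current = {link: max(choices) for link in links}
    best = dict(current)
    for _ in range(steps):
        candidate = dict(current)
        candidate[rng.choice(links)] = rng.choice(choices)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = dict(current)
        temp *= cooling
    return best


if __name__ == "__main__":
    # Hypothetical two-link problem with three allowed capacities.
    print(anneal_capacities({"a": 1.0, "b": 3.0}, 1.0, [2, 4, 8],
                            random.Random(0)))
```

Because each candidate is scored by a closed-form expression, thousands of annealing steps cost very little, which is the point the abstract makes about avoiding simulation in the search loop.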