Results 1 - 10 of 12
Low Contention Mapping of Real-Time Tasks onto TilePro 64 Core Processors
2009 15th IEEE Real-Time and Embedded Technology and Applications Symposium
"... Predictability of task execution is paramount for real-time systems so that upper bounds of execution times can be determined via static timing analysis. Static timing analysis on network-on-chip (NoC) processors may result in unsafe underestimations when the underlying communication paths are not c ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
(Show Context)
Abstract: Predictability of task execution is paramount for real-time systems so that upper bounds of execution times can be determined via static timing analysis. Static timing analysis on network-on-chip (NoC) processors may result in unsafe underestimations when the underlying communication paths are not considered. This stems from contention on the underlying network when data from multiple sources share parts of a routing path in the NoC. Contention analysis must be performed to provide safe and reliable bounds. In addition, the overhead incurred by contention due to interprocess communication (IPC) can be reduced by mapping tasks to cores in such a way that contention is minimized. This paper makes several contributions to increase predictability of real-time tasks on NoC architectures. First, we contribute a constraint solver that exhaustively maps real-time tasks onto cores to minimize contention and improve predictability. Second, we develop a novel TDMA-like approach to map communication traces into time frames to ensure separation of analysis for temporally disjoint communication. Third, we contribute a novel multi-heuristic approximation, HSolver, for rapid discovery of low contention solutions. HSolver reduces contention by up to 70% when compared with naïve and constrained exhaustive solutions. We evaluate our approach using a micro-benchmark of task system IPC on the TilePro64, a real, physical NoC processor with 64 cores. To the best of our knowledge, this is the first work to consider IPC for worst-case time frames to simplify analysis and to measure the impact on actual hardware for NoC-based real-time multicore systems.
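To make the contention-minimization idea concrete, here is a minimal sketch, not the paper's constraint solver or HSolver: it assumes XY (dimension-ordered) routing on a small mesh, scores a candidate task-to-core placement by how many directed links its IPC flows share, and exhaustively searches for the lowest-scoring placement. The mesh size and the toy task graph are illustrative assumptions.

```python
# Sketch only: contention-aware task placement on a mesh NoC under assumed
# XY routing; not the solver or heuristics from the paper above.
from itertools import permutations, combinations

MESH_W = 4  # assumed 4x4 sub-mesh for illustration, not the full TilePro64 grid

def coord(core):
    """Core index -> (x, y) position on the mesh."""
    return core % MESH_W, core // MESH_W

def xy_route_links(src, dst):
    """Directed links visited by an XY (X-first, then Y) route from src to dst."""
    (x, y), (dx, dy) = coord(src), coord(dst)
    links, cx, cy = [], x, y
    while cx != dx:                                  # travel along X first
        nx = cx + (1 if dx > cx else -1)
        links.append(((cx, cy), (nx, cy)))
        cx = nx
    while cy != dy:                                  # then along Y
        ny = cy + (1 if dy > cy else -1)
        links.append(((cx, cy), (cx, ny)))
        cy = ny
    return links

def contention(mapping, flows):
    """Score a placement by how many links each pair of flows shares."""
    routes = [set(xy_route_links(mapping[s], mapping[d])) for s, d in flows]
    return sum(len(a & b) for a, b in combinations(routes, 2))

flows = [(0, 1), (1, 2), (2, 3), (0, 3)]             # toy IPC graph over 4 tasks
best = min(permutations(range(MESH_W * MESH_W), 4),
           key=lambda m: contention(m, flows))
print("lowest-contention placement of tasks 0-3:", best)
```

An exhaustive search like this only scales to a handful of tasks, which is precisely the motivation the abstract gives for the HSolver heuristics.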
Application-to-Core Mapping Policies to Reduce Interference in On-Chip Networks
2011
"... As the industry moves toward many-core processors, Network-on-Chips (NoCs) will likely become the communication backbone of future microprocessor designs. The NoC is a critical shared resource and its effective utilization is essential for improving overall system performance and fairness. In this p ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
(Show Context)
Abstract: As the industry moves toward many-core processors, Network-on-Chips (NoCs) will likely become the communication backbone of future microprocessor designs. The NoC is a critical shared resource, and its effective utilization is essential for improving overall system performance and fairness. In this paper, we propose application-to-core mapping policies to reduce contention in the network-on-chip and memory controller resources and hence improve overall system performance. First, we introduce the notion of clusters: cores are grouped into clusters, and a memory controller is assigned to each cluster. The memory controller assigned to a cluster is primarily responsible for servicing the data requested by the applications assigned to that cluster. We propose and evaluate page allocation and page replacement policies that ensure that the network traffic of a core is restricted to its cluster with high probability. Second, we develop algorithms that distribute applications between clusters. Our inter-cluster mapping algorithm separates interference-sensitive applications from aggressive ones by mapping them to different clusters to improve system performance, while maintaining a reasonable network load balance among the clusters. Contrary to the conventional wisdom of balancing network/memory load across clusters, we observe that it is also important to ensure that applications that are more sensitive to network latency experience little interference from applications that are network-intensive. Finally, we develop algorithms to map applications to cores within a cluster. The key idea of intra-cluster mapping is to map the applications that benefit more from being close to the memory controller closer to the controller. We evaluate the proposed application-to-core mapping policies on a 60-core CMP with an 8x8 mesh NoC using a suite of 35 diverse applications. Averaged over 128 randomly generated multiprogrammed workloads, the final proposed policy improves system throughput by 16.7% in terms of weighted speedup over a baseline many-core processor, while also reducing system unfairness by 22.4% and interconnect power consumption by 52.3%.
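A greatly simplified sketch of the inter-cluster half of such a policy (not the paper's algorithm): applications are split into network-sensitive and network-intensive groups by an assumed intensity metric (e.g., a cache-miss rate), and each group is then spread greedily over its own set of clusters so that load stays roughly balanced. The cutoff, the metric, and the application names below are hypothetical.

```python
# Sketch of separating sensitive from intensive applications across clusters;
# the intensity metric, cutoff, and balancing rule are assumptions.
def map_to_clusters(apps, n_clusters, cores_per_cluster, intensity_cutoff=10.0):
    """apps: list of (name, network_intensity). Returns {cluster_id: [names]}."""
    sensitive = [a for a in apps if a[1] < intensity_cutoff]
    intensive = [a for a in apps if a[1] >= intensity_cutoff]
    # Reserve a roughly proportional share of the clusters for each group.
    n_sens = max(1, round(n_clusters * len(sensitive) / max(1, len(apps))))
    groups = [(sensitive, list(range(0, n_sens))),
              (intensive, list(range(n_sens, n_clusters)))]
    clusters = {c: [] for c in range(n_clusters)}
    load = {c: 0.0 for c in range(n_clusters)}
    for members, ids in groups:
        ids = ids or list(range(n_clusters))      # fall back if a group got no clusters
        for name, intensity in sorted(members, key=lambda a: -a[1]):
            # heaviest remaining app goes to the least-loaded non-full cluster
            target = min((c for c in ids if len(clusters[c]) < cores_per_cluster),
                         key=lambda c: load[c])
            clusters[target].append(name)
            load[target] += intensity
    return clusters

print(map_to_clusters([("mcf", 35.0), ("lbm", 28.0), ("gcc", 3.1),
                       ("povray", 0.4), ("milc", 22.0), ("namd", 0.8)],
                      n_clusters=2, cores_per_cluster=4))
```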
Clustered Caching for Improving Performance and Energy Requirements in NoC-Based Multiprocessors
"... allowing to run larger applications on chip multiprocessors. Parallelism is achieved by running different threads of applications on separate processors. This leads to coherence issues of shared data. As wire delays are dominating in current SoCs, added communication over the interconnect also adds ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract: ... allowing larger applications to run on chip multiprocessors. Parallelism is achieved by running different threads of applications on separate processors, which leads to coherence issues for shared data. As wire delays dominate in current SoCs, the added communication over the interconnect also increases latency and power requirements. In this paper we propose forming small clusters of cores that share the same high-level cache, instead of one global, large banked cache. Experimental evaluation shows that clustering improves both performance and power requirements. Research on application mapping on NoCs has shown that assigning nearby cores to an application improves performance. We performed experiments localising an application within a cluster and obtained improvements in performance as well as power.
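As a rough illustration of the clustered-cache idea only, assuming a banked shared L2 with a trivial address-to-bank hash (both assumptions, not the paper's organization): a globally banked cache stripes lines across every tile, so a request may cross the whole chip, while cluster-local sharing keeps a requester's lines inside its own cluster's banks.

```python
# Illustration only: global banked L2 vs. cluster-local L2 bank selection.
N_CORES, CLUSTER_SIZE = 16, 4          # assumed 16 cores in 4-core clusters

def global_bank(addr):
    """Global banked L2: lines striped over all tiles, chip-wide."""
    return addr % N_CORES

def clustered_bank(addr, core):
    """Clustered L2: the line lives in one of the requester's local banks."""
    cluster_base = (core // CLUSTER_SIZE) * CLUSTER_SIZE
    return cluster_base + addr % CLUSTER_SIZE

addr, core = 0x7F3C40, 9               # core 9 sits in cluster 2 (cores 8-11)
print("global bank:", global_bank(addr), "| cluster-local bank:", clustered_bank(addr, core))
```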
CoNA: Dynamic Application Mapping for Congestion Reduction in Many-Core Systems
"... Abstract—Increasing the number of processors in a single chip toward network-based many-core systems requires a run-time task allocation algorithm. We propose an efficient mapping algorithm that assigns communicating tasks of incoming applications onto resources of a many-core system utilizing Netwo ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Abstract: Increasing the number of processors in a single chip toward network-based many-core systems requires a run-time task allocation algorithm. We propose an efficient mapping algorithm that assigns communicating tasks of incoming applications onto resources of a many-core system utilizing the Network-on-Chip paradigm. In our contiguous neighborhood allocation (CoNA) algorithm, we target the reduction of both internal and external congestion, given the detrimental impact of congestion on network performance. We approach this goal by keeping the mapped region contiguous and placing communicating tasks in a close neighborhood. A completely synthesizable simulation environment, in which none of the system objects is assumed to be ideal, is provided. Experiments show at least a 40% gain in different mapping cost functions, as well as a 16% reduction in average network latency compared to existing algorithms.
Keywords: Network-on-Chip; MPSoC; run-time; dynamic; task mapping; processor allocation; congestion; contiguous; latency; performance
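A simplified neighborhood-growing allocator in the spirit of this description, but not the published CoNA heuristic: starting from a chosen first node, it claims free nodes in breadth-first (nearest-first) order so that the allocated region stays contiguous. The 8x8 mesh, the occupied nodes, and the first-node choice are all assumptions.

```python
# Sketch only: contiguous, neighborhood-first placement of one application's
# tasks on a mesh; the real CoNA first-node selection is not modeled here.
from collections import deque

def neighbors(node, w, h):
    x, y = node
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < w and 0 <= ny < h:
            yield (nx, ny)

def map_application(n_tasks, free_nodes, first_node, w=8, h=8):
    """Grow a contiguous region of free nodes outward from first_node,
    claiming one node per task in breadth-first (nearest-first) order."""
    if first_node not in free_nodes:
        return None
    order, seen, queue = [], {first_node}, deque([first_node])
    while queue and len(order) < n_tasks:
        node = queue.popleft()
        order.append(node)
        for nb in neighbors(node, w, h):
            if nb in free_nodes and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order if len(order) == n_tasks else None   # None: not enough room

free = {(x, y) for x in range(8) for y in range(8)} - {(1, 1), (2, 0)}
print(map_application(4, free, first_node=(1, 0)))
```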
On-Chip Network-Enabled Many-Core Architectures for Computational Biology Applications
2013
"... ..."
Online System Reconfiguration Strategy for Real-Time Embedded Applications on NoC Manycore Platforms
"... ..."
(Show Context)
Zimmer, Christopher J. Bringing Efficiency and Predictability to Massive Multi-core ...
"... Massive multi-core network-on-chip (NoC) processors represent the next stage in both embedded and general purpose computing. These novel architecture designs with abundant processing resources and increased scalability address the frequency limits of modern processors, power/leakage constraints, and ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: Massive multi-core network-on-chip (NoC) processors represent the next stage in both embedded and general-purpose computing. These novel architecture designs with abundant processing resources and increased scalability address the frequency limits of modern processors, power/leakage constraints, and the scalability limits of system bus interconnects. NoC architectures are particularly interesting in both the real-time embedded and high-performance computing domains. Abundant processing resources have the potential to simplify scheduling and represent a shift away from single-core utilization concerns, e.g., within the model of the “dark silicon” abstraction that promotes a 1-to-1 task-to-core mapping with frequent core activations/deactivations. Additionally, due to silicon constraints, massive multi-core processors often contain simplified processor pipelines, which improves the predictability of analysis and benefits real-time systems. Also, simplified processor pipelines coupled with high-performance interconnects often result in low power utilization, which is beneficial in high-performance systems. While suitable in many ways, these architectures are not without their own challenges. Reliance on shared memory and the strain that massive multi-core processors can put on memory controllers represent a significant challenge to predictability and performance. Resilience is ...
HEFT: A Hybrid System-Level Framework for Enabling Energy-Efficient Fault-Tolerance in NoC-Based MPSoCs
"... chip (NoC) fabrics are increasingly becoming susceptible to transient faults. Fault-tolerance mechanisms that are typically employed in NoCs usually entail significant energy overheads that are expected to become prohibitive as fault rates increase in future CMOS technologies. We propose a system-le ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: Network-on-chip (NoC) fabrics are increasingly becoming susceptible to transient faults. Fault-tolerance mechanisms that are typically employed in NoCs usually entail significant energy overheads that are expected to become prohibitive as fault rates increase in future CMOS technologies. We propose a system-level framework called HEFT to trade off energy consumption and fault tolerance in the NoC fabric. Our hybrid framework tackles the challenge of enabling energy-efficient resilience in NoCs in two phases: at design time and at runtime. At design time, we implement an algorithm to guide the robust mapping of cores onto a die while satisfying application bandwidth and latency constraints. At runtime, we devise a prediction algorithm that monitors and detects changes in the fault susceptibility of NoC components to intelligently balance energy consumption and reliability. Experimental results show that HEFT improves the energy/reliability ratio of synthesized solutions by 8-20%, while meeting application performance goals, when compared to multiple prior works on reliable system-level NoC design.
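As an illustration of the runtime half of such a scheme only (this is not HEFT's prediction algorithm): a per-link monitor folds each observation window into an exponential moving average of the error rate and switches the link between an assumed low-power mode and an assumed protected mode when the estimate crosses hysteresis thresholds. All thresholds, window sizes, and mode names are hypothetical.

```python
# Sketch only: runtime monitoring of link fault susceptibility with a simple
# EMA and hysteresis; the modes and thresholds are illustrative assumptions.
LOW_POWER, PROTECTED = "low-power", "protected"

class LinkMonitor:
    def __init__(self, raise_at=1e-3, lower_at=5e-4, alpha=0.3):
        self.rate, self.mode = 0.0, LOW_POWER
        self.raise_at, self.lower_at, self.alpha = raise_at, lower_at, alpha

    def observe(self, errors, flits):
        """Fold one monitoring window (errors seen / flits sent) into the EMA."""
        sample = errors / max(1, flits)
        self.rate = (1 - self.alpha) * self.rate + self.alpha * sample
        if self.mode == LOW_POWER and self.rate > self.raise_at:
            self.mode = PROTECTED        # spend energy to regain reliability
        elif self.mode == PROTECTED and self.rate < self.lower_at:
            self.mode = LOW_POWER        # faults subsided: save energy again
        return self.mode

mon = LinkMonitor()
for errors in (0, 0, 20, 30, 10) + (0,) * 9:         # a burst of faults, then quiet
    print(mon.observe(errors, flits=1000), round(mon.rate, 5))
```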
Mixed-Criticality Run-Time Task Mapping for NoC-Based Many-Core Systems
"... Abstract—Contiguous processor allocation improves both the network and the application performance, by decreasing the congestion probability among communication of different applications. Consequently, the average, standard deviation and worst-case latency of the network is decreased signifi-cantly. ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: Contiguous processor allocation improves both network and application performance by decreasing the probability of congestion among the communications of different applications. Consequently, the average, standard deviation, and worst-case latency of the network are decreased significantly. This makes contiguous allocation a good solution for time-critical applications with bounded deadlines. On the other hand, non-contiguous allocation increases system throughput significantly: isolated nodes are utilized, and more applications can complete in a given time unit. However, this leads to poor network metrics, unsuitable for real-time applications. In this work, we combine these two approaches in order to manage workloads with mixed-criticality characteristics. Real-time applications are mapped contiguously, while non-critical applications are allowed to be dispersed over the available system nodes. Results show over 50% improvement in worst-case latency and a 100-fold improvement in deadline misses.
Keywords: Processor allocation; Application Mapping; Dynamic Many-Core Systems; Contiguous Task Mapping
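A sketch of the combined policy described above, under simplifying assumptions: real-time applications only accept a contiguous block of nodes (approximated here by a square-region search, a stand-in for whatever contiguous allocator the system actually uses), while best-effort applications take whatever free nodes remain, including isolated ones. The mesh size and application sizes are illustrative.

```python
# Sketch only: mixed-criticality placement on a mesh; the contiguous-region
# search and rejection policy are assumptions, not the paper's algorithm.
from itertools import product
from math import ceil, sqrt

def contiguous_region(n_tasks, free, w=8, h=8):
    """Find a near-square block of free nodes large enough for n_tasks."""
    side = ceil(sqrt(n_tasks))
    for x0, y0 in product(range(w - side + 1), range(h - side + 1)):
        block = [(x0 + dx, y0 + dy) for dx, dy in product(range(side), repeat=2)]
        if all(node in free for node in block):
            return block[:n_tasks]
    return None                                    # no contiguous block left

def map_mixed(app_sizes_rt, app_sizes_be, w=8, h=8):
    free = {(x, y) for x in range(w) for y in range(h)}
    placement = {}
    for i, n in enumerate(app_sizes_rt):           # critical: contiguous only
        region = contiguous_region(n, free, w, h)
        placement[f"rt{i}"] = region               # None means reject, not disperse
        if region is not None:
            free -= set(region)
    for i, n in enumerate(app_sizes_be):           # best-effort: any free nodes
        nodes = sorted(free)[:n]
        placement[f"be{i}"] = nodes if len(nodes) == n else None
        free -= set(nodes)
    return placement

print(map_mixed(app_sizes_rt=[4, 6], app_sizes_be=[5]))
```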