An adaptive system-on-chip for network applications
In IPDPS 2006
Cited by 11 (2 self)
This paper presents the hardware architecture of DynaCORE, a dynamically reconfigurable system-on-chip for network applications. DynaCORE is an application-specific coprocessor for offloading computationally intensive tasks from a network processor. The system-on-chip architecture is based on an adaptable network-on-chip which allows the dynamic replacement of hardware modules as well as the adaptation of the on-chip communication structure. The coprocessor leverages the active partial reconfiguration feature of modern FPGAs in order to adapt to shifting demand patterns. An embedded general-purpose processor core within the coprocessor runs software which manages the configurations of the device. With reference to a prototypical implementation targeting a Xilinx Virtex-II Pro FPGA, this paper focuses on on-chip communication issues. Topics include the integration of PowerPC processor cores into the configurable logic as well as the mode of operation of the network-on-chip.
Parallel Merge Sort on a Binary Tree On-Chip Network
Cited by 1 (1 self)
Abstract. Recent advances in microelectronics technology make it possible to integrate increasingly more processor cores on a single chip. As chip density continues to grow, communication latency, relative to computation time, is becoming a dominant factor in modern single-chip multiprocessor systems. Consequently, networks on chip (NoCs) have been introduced as an alternative to shared buses in order to provide higher scalability, higher abstraction levels, and increased modularity. However, many challenges remain in the design of an NoC, e.g., maintaining a certain quality of service (QoS) with limited bandwidth, incorporating deadlock-free routing, and implementing congestion-free flow control. Network topologies play an important role in meeting these challenges in multiprocessor systems. In this paper, we present a case study encompassing the implementation of an NoC based on the binary tree in a field-programmable gate array (FPGA), with the parallel merge sort as an application. A hardware design for a packet-switched source router with wormhole flow control and round-robin arbitration is described. The implemented platform provides a homogeneous, deadlock-free, congestion-free, cost-efficient communication mechanism and application-specific synchronization. In addition, the distributed shared memory organization allows for easier application programming. The experiments indicate that a single 32-bit sorting-specific processing element consumes less than 1% of the chip area. In addition, each binary tree router requires less than 3% of the area of the FPGA, while providing up to 2.9 Gbps at 180 MHz.
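The abstract does not give the sorting procedure itself, so the following is only a minimal software sketch of how a merge sort maps onto a binary tree: leaf nodes sort their own chunks, and each internal node merges the sorted runs of its two children. The function name `tree_merge_sort` and the `num_leaves` parameter are hypothetical, and the sketch models neither the routers nor the flow control of the paper's hardware.

```python
from heapq import merge


def tree_merge_sort(data, num_leaves=4):
    """Simulate merge sort on a binary tree of processing elements.

    Hypothetical illustration: leaves sort equal-sized chunks, and each
    internal node merges the sorted runs of its two children.
    num_leaves must be a power of two.
    """
    if num_leaves < 1 or num_leaves & (num_leaves - 1):
        raise ValueError("num_leaves must be a power of two")
    # One chunk per leaf; ceiling division so no element is dropped.
    size = -(-len(data) // num_leaves) if data else 0
    runs = [sorted(data[i * size:(i + 1) * size]) for i in range(num_leaves)]
    # Each iteration models one tree level. The merges within a level are
    # independent of each other, so hardware could run them in parallel.
    while len(runs) > 1:
        runs = [list(merge(runs[i], runs[i + 1]))
                for i in range(0, len(runs), 2)]
    return runs[0]


print(tree_merge_sort([5, 3, 8, 1, 9, 2, 7]))
```

With `num_leaves` leaves the loop runs log2(`num_leaves`) times, one pass per level of the tree.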
Partially Reconfigurable Point-to-Point Interconnects in Virtex-II Pro FPGAs
Cited by 1 (1 self)
Abstract. Conventional rigid router-based networks on chip incur overheads due to the large amount of logic resources they occupy and to topology embedding, i.e., the mapping of a logical network topology to a physical one. In this paper, we present an implementation of partially reconfigurable point-to-point (ρ-P2P) interconnects in an FPGA to overcome these overheads. In the presented implementation, arbitrary topologies are realized by changing the ρ-P2P interconnects. In our experiments, we used parallel merge sort and Cannon’s matrix multiplication to generate network traffic for evaluating our implementation. Furthermore, we implemented a 2D-mesh packet-switched network to serve as a reference for comparison. Our experiments show that the utilization of on-demand ρ-P2P interconnects performs 2× better and occupies 70% less area compared to the reference mesh network. Furthermore, the reconfiguration latency is significantly reduced using the Xilinx module-based partial reconfiguration technique. Finally, our experiments suggest that higher performance gains can be achieved as the problem size increases.
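For readers unfamiliar with Cannon’s matrix multiplication, which the abstract names as one of the traffic generators, the sketch below simulates the textbook algorithm on an n × n grid with one matrix element per node. It is not taken from the paper: the function name `cannon_matmul` is hypothetical, and the cyclic shifts are plain list operations standing in for the nearest-neighbour messages that would produce the network traffic.

```python
def cannon_matmul(A, B):
    """Multiply two n x n matrices with Cannon's algorithm.

    Hypothetical illustration: node (i, j) of an n x n grid holds one
    element of A and one of B; every cyclic shift below corresponds to
    a message between neighbouring nodes.
    """
    n = len(A)
    # Initial skew: row i of A is shifted left by i positions,
    # column j of B is shifted up by j positions.
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every node multiplies its local pair and accumulates.
        for i in range(n):
            for j in range(n):
                C[i][j] += a[i][j] * b[i][j]
        # Shift A one step left and B one step up (cyclically).
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C


print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

After n multiply-and-shift rounds, each node has met every matching pair of elements from its row of A and its column of B, so `C` holds the full product.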