doi:10.1155/2008/562326 Research Article DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency (2007)
Citations
105 | RaPiD - reconfigurable pipelined datapath
- Ebeling, Cronquist, et al.
- 1996
Citation Context ...econfiguration overhead. This latter is mainly obtained using reconfigurable operators instead of LUT-based configurable logic blocks. Precursors of this class of architectures were KressArray [7], RaPid =-=[8]-=-, and RaW machines [9] which were specifically designed for streaming algorithms. These works have led to numerous academic and commercial architectures. The first industrial product was the Chameleon... |
95 | Spreading codes for direct sequence CDMA and wide band CDMA cellular networks
- Dinan, Jabbari
- 1998
Citation Context ...l users by scrambling communications [27]. This is done by multiplying the information by private codes dedicated to users. Since these codes have good autocorrelation and intercorrelation properties =-=[28]-=-, there is virtually no interference between users, and consequently they may be multiplexed on the same carrier frequency. Within a WCDMA receiver, real and imaginary parts of data received on the an... |
73 | Operating systems for reconfigurable embedded platforms: online scheduling of real-time tasks
- Steiger, Walder, et al.
- 2004
Citation Context ...(depending on application) can allow for some system-level optimization of the energy consumption. The allocation of tasks can permit the putting of some part of architecture into idle or sleep modes =-=[3]-=- or the use of other mechanisms like clock gating to save energy [4]. 2.2. Reducing the configuration distribution cost Control and configuration distribution has a significant impact on the energy co... |
71 | SUIF: An infrastructure for research on parallelizing and optimizing compilers
- Wilson, French, et al.
- 1994
Citation Context ... description of the application into configuration instructions—thanks to compilation and architectural synthesis. 6.1. Front end The front end of this development flow is based on the SUIF framework =-=[23]-=- developed at Stanford. It aims to generate an internal representation of the program from which other modules can operate. Moreover, this module has to extract the loop kernels inside the C code and ... |
61 | An asynchronous NOC architecture providing low latency service and its multi-level design framework
- Beigne, F, et al.
- 2005
Citation Context ...project. The fresh architecture is an NoC-based system-on-chip for application prototyping designed by the CEA-LETI [30]. This architecture contains 23 IPs connected to a 20-node network (called Faust) =-=[32]-=- for a total complexity of 8-million gates (including RAMs). The circuit has been realized using 0.13 μm CMOS technology from STMicroelectronics. IPs from different partners were implemented: (i) an A... |
60 | Baring It All to Software: Raw Machines
- Waingold
- 1997
Citation Context ...d. This latter is mainly obtained using reconfigurable operators instead of LUT-based configurable logic blocks. Precursors of this class of architectures were KressArray [7], RaPid [8], and RaW machines =-=[9]-=- which were specifically designed for streaming algorithms. These works have led to numerous academic and commercial architectures. The first industrial product was the Chameleon Systems CS2000 family... |
50 | Interconnect Architecture Exploration for Low-Energy Reconfigurable Single-Chip DSP
- Zhang, Wan, et al.
- 1999
Citation Context ... of reconfiguration targets. Especially, the interconnection network must support a good tradeoff between flexibility and configuration data volume. Hierarchical networks are perfect for this purpose =-=[5]-=-. If there are some redundancies in the datapath structure, it is possible to reduce the configuration data volume, by distributing simultaneously the same configuration data to several targets. This ... |
35 | A VLIW Processor With Reconfigurable Instruction Set for Embedded Applications
- Lodi, Toma, et al.
- 2003
Citation Context ...a 16-deep instruction memory in each PE. This approach permits the reconfiguration of the processor in one cycle, but at the price of a very high cost in configuration memory. The XiRisc architecture =-=[13]-=- is a reconfigurable processor based on a VLIW RISC core with a five-stage pipeline, enhanced with an additional run-time configurable datapath, called pipelined configurable gate array (PiCoGA). PiCo... |
31 | Loop alignment for memory accesses optimization
- Fraboulet, Huard, et al.
- 1999
Citation Context ... [Figure 7: Software reconfiguration example.] ... developed to decrease the amount of data memory accesses and hence the energy consumption =-=[24, 25]-=-. 6.2. cDART compiler In order to generate the software reconfiguration instructions, we have integrated a compiler, cDART, into our development flow. This tool was generated thanks to the CALIFE tool... |
21 | Re-configurable computing in wireless
- Salefski, Caglar
- 2001
Citation Context ...which were specifically designed for streaming algorithms. These works have led to numerous academic and commercial architectures. The first industrial product was the Chameleon Systems CS2000 family =-=[10]-=-, designed for application in telecommunication facilities. This architecture comprises a GPP and a reconfigurable processing fabric. The fabric is built around identical processing tiles including re... |
20 | PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators
- Schreiber
Citation Context ...lready been successfully used in the PICO (program in, chip out) project developed at HP labs to implement regular codes into a systolic structure, and to compile irregular ones for a VLIW processor =-=[20]-=-. Other projects such as Pleiades [21] or GARP [22] are also using this approach. The proposed development flow is depicted in Figure 8. It allows the user to describe its applications in C. These high-... |
16 | A 1-V heterogeneous reconfigurable DSP IC for wireless baseband digital signal processing
- Zhang, Prabhu, et al.
- 2000
Citation Context ...ations for each RLC. The reconfiguration overhead can be optimized by exploiting partial run-time reconfiguration, which gives the opportunity for reprogramming only a portion of the PiCoGA. Pleiades =-=[14]-=- was the first reconfigurable platform taking into account the energy efficiency as a design constraint. It is a heterogeneous coarse-grained platform built around satellite processors which communica... |
16 | Loop Fusion for Memory Space Optimization
- Fraboulet, Godary, et al.
- 2001
Citation Context ... [Figure 7: Software reconfiguration example.] ... developed to decrease the amount of data memory accesses and hence the energy consumption =-=[24, 25]-=-. 6.2. cDART compiler In order to generate the software reconfiguration instructions, we have integrated a compiler, cDART, into our development flow. This tool was generated thanks to the CALIFE tool... |
15 | Using the kressarray for reconfigurable computing
- Hartenstein, Herz, et al.
- 1998
Citation Context ...ucing the reconfiguration overhead. This latter is mainly obtained using reconfigurable operators instead of LUT-based configurable logic blocks. Precursors of this class of architectures were KressArray =-=[7]-=-, RaPid [8], and RaW machines [9] which were specifically designed for streaming algorithms. These works have led to numerous academic and commercial architectures. The first industrial product was th... |
13 | Improving software performance with configurable logic, Design Automation for Embedded Systems
- Villarreal, Suresh, et al.
- 2002
Citation Context ... in parallel with the same configuration bits were implemented using wildcarding bits to augment the cell address/position to select several cells at the same time for reconfiguration. The 80/20 rule =-=[6]-=- asserts that 80% of the execution time are consumed by 20% of the program code, and only 20% are consumed by the remaining source code. The time-consuming portions of the code are described as being S... |
12 | Augmenting a microprocessor with reconfigurable hardware
- Hauser
- 2000
Citation Context ...m in, chip out) project developed at HP labs to implement regular codes into a systolic structure, and to compile irregular ones for a VLIW processor [20]. Other projects such as Pleiades [21] or GARP =-=[22]-=- are also using this approach. The proposed development flow is depicted in Figure 8. It allows the user to describe its applications in C. These high-level descriptions are first translated into cont... |
11 | PACT XPP - A self-reconfigurable data processing architecture
- Baumgarte, May, et al.
- 2001
Citation Context ... topology of interconnection network, this architecture is mainly designed to provide high speeds in the telecommunication domain regardless of other constraints. The extreme processor platform (XPP) =-=[11]-=- from PACT is based on a mesh array of coarse-grained processing array elements (PAEs). PAEs are specialized for algorithms of a particular domain on a specific XPP processor core. The XPP processor is... |
11 | FAUST: On-chip distributed architecture for a 4G baseband modem SoC
- Durand, Bernard, et al.
- 2005
Citation Context ...r multiple users, it is necessary to implement one DSP per user. Consuming 600 mW, its energy efficiency is 0.3 MOPS/mW. The same design has been implemented and optimized into the TMS320C64 VLIW DSP =-=[30]-=-. This processor is a high-performance DSP from Texas Instruments that can run at a clock frequency up to 720 MHz and consumes 1.36 W. It includes 8 independent operators and can reach a peak performa... |
10 | DART: a dynamically reconfigurable architecture dealing with next-generation telecommunication constraints, Int. Reconfigurable Architecture Workshop
- David, Chillet, et al.
- 2002
Citation Context ... amount of instruction memory reading and decoding leads to significant energy savings. The association of the principles presented in Section 3 leads to the first definition of the DART architecture =-=[16]-=-. Two visions of the system level of this architecture can be explored. The first one consists in a set of autonomous clusters which have access to a shared memory space, managed by a task controller.... |
9 | Low Energy Field-Programmable Gate Array.
- George
- 2000
Citation Context ...cation domain to be very efficient. The algorithms in the domain are carefully profiled in order to find the kernels that will eventually be implemented as a satellite processor. Finally, the work in =-=[15]-=- proposes some architectural improvements to define a low-energy FPGA. However, for complex applications, this architecture is limited in terms of attainable performance and development time... |
8 | Analysis and Design of Low Power Digital Multipliers
- Meier
- 1999
Citation Context ...the most common data format (16 bits) but which support SWP processing for 8-bit data. The first type of FU implements a multiplier/adder. Designing a low-power multiplier is difficult but well known =-=[17]-=-. One of the most efficient architectures is the Booth-Wallace multiplier for word lengths of at least 16 bits. The designed FU includes the saturation of signed results in the same cycle as the operat... |
8 | A Flexible Code Generation Framework for the Design of Application Specific
- Charot, Messe
- 1999
Citation Context ...grated a compiler, cDART, into our development flow. This tool was generated thanks to the CALIFE tool suite which is a retargetable compiler framework based on the ARMOR language, developed at INRIA =-=[26]-=-. DART was first described in the ARMOR language. This implementation description arises from the inherent needs of the three main compiling activities which are the code selection, the allocation, and... |
8 | Wideband CDMA for Third Generation
- Ojanpera, Prasad
- 1998
Citation Context ...ommunication systems. Its principle is to adapt signals to the communication support by spreading its spectrum and sharing the communication support between several users by scrambling communications =-=[27]-=-. This is done by multiplying the information by private codes dedicated to users. Since these codes have good autocorrelation and intercorrelation properties [28], there is virtually no interference ... |
6 | Sub-word parallelism in digital signal processing
- Fridman
- 2000
Citation Context ...s. Alternatively, functional units can be optimized for only a subset of these data sizes. Optimizing functional units for 8- and 16-bit data sizes allows to design subword processing (SWP) operators =-=[1]-=-. Thanks to these operators, the computational power of the architecture can be increased during processing with data-level parallelism, without reducing overall performances at other times. Operation... |
4 | A multithreaded architecture approach to parallel DSPs for high-performance image processing applications
- Wittenburg, Pirsch, et al.
- 1999
Citation Context ...lication running on the architecture. Consequently, to support TLP while guaranteeing a good computational density, the architecture must be able to alter the organization of its processing resources =-=[2]-=-. Finally, application parallelism can be considered as an extension of thread parallelism. The goal is to identify the applications that may run concurrently on the architecture. Contrary to threads,... |
2 | FPGA clock management for low power applications
- Brynjolfson, Zilic
- 2000
Citation Context ...ion of the energy consumption. The allocation of tasks can permit the putting of some part of architecture into idle or sleep modes [3] or the use of other mechanisms like clock gating to save energy =-=[4]-=-. 2.2. Reducing the configuration distribution cost Control and configuration distribution has a significant impact on the energy consumption. Therefore, the configuration data volume as well as the c... |
2 | A VLIW processor with reconfigurable instruction set for embedded applications
- Guerrieri
- 2003
Citation Context ...a 16-deep instruction memory in each PE. This approach permits the reconfiguration of the processor in one cycle, but at the price of a very high cost in configuration memory. The XiRisc architecture =-=[13]-=- is a reconfigurable processor based on a VLIW RISC core with a five-stage pipeline, enhanced with an additional run-time configurable datapath, called pipelined configurable gate array (PiCoGA). PiCo... |
2 | A multithreaded architecture approach to parallel DSPs for high-performance image processing applications
- Wittenburg, Pirsch, et al.
- 1999
Citation Context ...lication running on the architecture. Consequently, to support TLP while guaranteeing a good computational density, the architecture must be able to alter the organization of its processing resources =-=[2]-=-. Finally, application parallelism can be considered as an extension of thread parallelism. The goal is to identify the applications that may run concurrently on the architecture. Contrary to threads,... |
1 | Stream applications on the dynamically reconfigurable processor
- Suzuki, Hasegawa, et al.
- 2004
Citation Context ...icast communications. PAEs have input and output registers, and the data streams need to be highly pipelined to use the XPP resources efficiently. The NEC dynamically reconfigurable processor (DRP-1) =-=[12]-=- is an array of tiles constituted by an 8 × 8 matrix of processing elements (PEs). Each PE has an 8-bit ALU, an 8-bit data management unit, and some registers. These units are connected by programmable w... |
1 | A compilation framework for a dynamically reconfigurable architecture
- David, Chillet, et al.
Citation Context ...n and the optimization of C code, a retargetable compiler to handle compilation of the software configurations, and high-level synthesis techniques to generate the hardware reconfiguration of the RDP =-=[19]-=-. As in most of development methodologies for reconfigurable hardware, the key issue is to identify the different kinds of processing. Based on the two reconfiguration modes of the DART architecture, ... |
1 | Wireless base-station signal processing with a platform FPGA
- Nicklin
- 2002
Citation Context ...ICO (program in, chip out) project developed at HP labs to implement regular codes into a systolic structure, and to compile irregular ones for a VLIW processor [20]. Other projects such as Pleiades =-=[21]-=- or GARP [22] are also using this approach. The proposed development flow is depicted in Figure 8. It allows the user to describe its applications in C. These high-level descriptions are first translate... |
1 | TMS320C64x technical overview, Texas Instruments
- Texas Instruments
- 2000
Citation Context ...ints of the application, the FIR filter cannot be implemented on a DSP processor. The TMS320C62 VLIW DSP running at 100 MHz can support a 6-finger rake receiver for a bandwidth of 16 KBps per channel =-=[29]-=-. This solution supports UMTS requirements, but for multiple users, it is necessary to implement one DSP per user. Consuming 600 mW, its energy efficiency is 0.3 MOPS/mW. The same design has been impl... |
1 | XPP white paper: a technical perspective, Release 2.1
- PACT
- 2002
Citation Context ....13 μm technology at 1.5 V, it can run at 200 MHz. The use of 48 PAEs processing 8-bit data in SWP mode consumes 600 mW and enables the implementation of 400 rake fingers at the chip rate of 3.84 MHz =-=[31]-=-. While achieving twice the peak performance of DART, its energy efficiency is 50% less and achieves 20 MOPS/mW. 8. VLSI INTEGRATION AND FIGURES The VLSI integration of DART has been made in a collaborative pro... |