Reconfigurable Computing: A Survey of Systems and Software
, 2000
Cited by 259 (5 self)
Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey we explore the hardware aspects of reconfigurable computing machines, from single-chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which reuse the configurable hardware during program execution.
DAG-aware AIG rewriting: A fresh look at combinational logic synthesis
 In DAC ’06: Proceedings of the 43rd annual conference on Design automation
, 2006
Cited by 104 (33 self)
This paper presents a technique for preprocessing combinational logic before technology mapping. The technique is based on the representation of combinational logic using And-Inverter Graphs (AIGs), the networks of two-input ANDs and inverters. The optimization works by alternating DAG-aware AIG rewriting, which reduces area by sharing common logic without increasing delay, and algebraic AIG balancing, which minimizes delay without increasing area. The new technology-independent flow is implemented in a public-domain tool ABC. Experiments on large industrial benchmarks show that the proposed methodology scales to very large designs and is several orders of magnitude faster than SIS and MVSIS while offering comparable or better quality when measured by the quality of the network after mapping.
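To make the AIG representation mentioned in this abstract concrete, here is a minimal sketch of a network of two-input ANDs with inverted edges. It is a toy illustration, not ABC's data structure or the paper's rewriting algorithm; the class name `AIG`, its methods, and the literal encoding (2*node + inversion bit) are assumptions chosen for the example. Structural hashing is included because it is the simplest way such a graph shares common logic.

```python
class AIG:
    """Toy And-Inverter Graph. A literal is 2*node + c, where c = 1 means inverted."""

    FALSE, TRUE = 0, 1  # both polarities of constant node 0

    def __init__(self):
        self.fanins = [None]  # fanins[node] = (lit0, lit1) for AND nodes, None otherwise
        self.strash = {}      # structural hash table: (lit0, lit1) -> AND node id

    def new_input(self):
        """Create a primary input and return its positive literal."""
        self.fanins.append(None)
        return 2 * (len(self.fanins) - 1)

    @staticmethod
    def inv(lit):
        """An inverter is just a flipped low bit on the edge."""
        return lit ^ 1

    def and_(self, a, b):
        """Return a literal for a AND b, reusing an existing node when possible."""
        a, b = min(a, b), max(a, b)
        if a == self.FALSE or a == self.inv(b):
            return self.FALSE          # 0 & x = 0, x & !x = 0
        if a == self.TRUE or a == b:
            return b                   # 1 & x = x, x & x = x
        if (a, b) not in self.strash:
            self.fanins.append((a, b))
            self.strash[(a, b)] = len(self.fanins) - 1
        return 2 * self.strash[(a, b)]

    def xor(self, a, b):
        """XOR built from three ANDs and inverted edges."""
        return self.inv(self.and_(self.inv(self.and_(a, self.inv(b))),
                                  self.inv(self.and_(self.inv(a), b))))

    def num_ands(self):
        return len(self.strash)

    def evaluate(self, lit, values):
        """Evaluate a literal given a dict {input literal: bool}."""
        node, inverted = lit >> 1, bool(lit & 1)
        if node == 0:
            value = False
        elif self.fanins[node] is None:
            value = values[2 * node]
        else:
            lit0, lit1 = self.fanins[node]
            value = self.evaluate(lit0, values) and self.evaluate(lit1, values)
        return value != inverted


g = AIG()
x, y = g.new_input(), g.new_input()
f = g.xor(x, y)
same = g.xor(y, x)  # hashes to the same three AND nodes as f
```

Building the XOR twice with swapped operands adds no new nodes, which is the kind of sharing that area-oriented rewriting relies on.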
Application-specific instruction generation for configurable processor architectures
 in Proc. ACM International Symposium on Field-Programmable Gate Arrays
, 2004
Cited by 67 (7 self)
Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time and code size) of the embedded processors. This paper addresses the problem of generating the application-specific instructions to improve the execution speed for configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).
DAOmap: A depth-optimal area optimization mapping algorithm for FPGA designs
 Proc. ICCAD ’04
, 2004
Cited by 61 (12 self)
In this paper we study the technology mapping problem for FPGA architectures to minimize chip area, or the total number of lookup tables (LUTs) of the mapped design, under the chip performance constraint. This is a well-studied topic and a very difficult task (NP-hard). The contributions of this paper are as follows: (i) we consider the potential node duplications during the cut enumeration/generation procedure so the mapping costs encoded in the cuts drive the area-optimization objective more effectively; (ii) after the timing constraint is determined, we will relax the non-critical paths by searching the solution space considering both local and global optimality information to minimize mapping area; (iii) an iterative cut selection procedure is carried out that further explores and perturbs the solution space to improve solution quality. We guarantee optimal mapping depth under the unit delay model. Experimental results show that our mapping algorithm, named DAOmap, produces significant quality and runtime improvements. Compared to the state-of-the-art depth-optimal, area minimization mapping algorithm CutMap [21], DAOmap is 16.02% better on area and runs 24.2X faster on average when both algorithms are mapping to FPGAs using LUTs of five inputs. LUTs of other inputs are also used for comparisons.
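The two ingredients this abstract names, cut enumeration and depth under the unit delay model, can be sketched in a few lines. This is a textbook-style illustration, not DAOmap itself: it does no area optimization, duplication handling, or cut pruning. The function names `enumerate_cuts` and `optimal_depth`, the dict-based netlist format, and the example circuit are all assumptions made for the sketch. It also assumes that nodes are listed in topological order and that no node has more than `k` fanins.

```python
from itertools import product


def enumerate_cuts(fanins, k):
    """Enumerate all cuts with at most k leaves for every node.

    fanins maps node -> tuple of fanin nodes (empty tuple for a primary input)
    and must be listed in topological order.
    """
    cuts = {}
    for node, ins in fanins.items():
        node_cuts = {frozenset([node])}  # the trivial cut {node}
        if ins:
            # Merge one cut from each fanin; keep the result if it is k-feasible.
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


def optimal_depth(fanins, k):
    """Unit-delay depth labels: each k-input LUT costs one level."""
    cuts = enumerate_cuts(fanins, k)
    depth, best_cut = {}, {}
    for node, ins in fanins.items():
        if not ins:
            depth[node] = 0  # primary inputs arrive at level 0
            continue
        candidates = [c for c in cuts[node] if c != frozenset([node])]
        best = min(candidates, key=lambda c: max(depth[leaf] for leaf in c))
        depth[node] = 1 + max(depth[leaf] for leaf in best)
        best_cut[node] = best
    return depth, best_cut


# Hypothetical example: a chain of three two-input gates over inputs a, b, c, d.
chain = {
    "a": (), "b": (), "c": (), "d": (),
    "n1": ("a", "b"),
    "n2": ("n1", "c"),
    "n3": ("n2", "d"),
}
depth_k2, _ = optimal_depth(chain, 2)
depth_k4, cut_k4 = optimal_depth(chain, 4)
```

With 2-input LUTs the chain needs three levels, while a single 4-input LUT covers it in one. Because every k-feasible cut is considered at every node, the label is the minimum depth achievable for this subject graph under the unit delay model.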
Improvements to Technology Mapping for LUT-based FPGAs
 IEEE TCAD
, 2007
Cited by 35 (12 self)
The paper presents several improvements to the state of the art in FPGA technology mapping, exemplified by a recent advanced technology mapper DAOmap [Chen and Cong, ICCAD ’04]. Improved cut enumeration computes all K-feasible cuts without pruning for up to 7 inputs for the largest MCNC benchmarks. A new technique for on-the-fly cut dropping reduces by orders of magnitude the memory needed to represent cuts for large designs. Improved area recovery leads to mappings with area on average 7% smaller than DAOmap, while preserving delay optimality when starting from the same optimized netlists. Applying mapping with structural choices derived by a synthesis flow on average reduces delay by 7% and area by 14%, compared to DAOmap.
Reducing Structural Bias in Technology Mapping
 Proc. IWLS ’05
, 2005
Cited by 29 (11 self)
Technology mapping based on DAG-covering suffers from the problem of structural bias: the structure of the mapped netlist depends strongly on the subject graph. In this paper we present a new mapper aimed at mitigating structural bias. It is based on a simplified cut-based Boolean matching algorithm, and using the speed afforded by this simplification we explore two ideas to reduce structural bias. The first, called lossless synthesis, leverages recent advances in structure-based combinational equivalence checking to combine the different networks seen during technology-independent synthesis into a single network with choices in a scalable manner. We show how cut-based mapping extends naturally to handle such networks with choices. The second idea is to combine several library gates into a single gate (called a supergate) in order to make the matching process less local. We show how supergates help address the structural bias problem, and how they fit naturally into the cut-based Boolean matching scheme. An implementation based on these ideas significantly outperforms state-of-the-art mappers in terms of delay, area and runtime on academic and industrial benchmarks.
Heuristics for area minimization in LUT-based FPGA technology mapping
 Proc. IWLS ’04
, 2004
Cited by 25 (5 self)
In this paper, an iterative technology mapping tool called IMap is presented. It supports depth-oriented (area is a secondary objective), area-oriented (depth is a secondary objective), and duplication-free mapping modes. The edge delay model, as opposed to the more common unit delay model, is used throughout. Two new heuristics are used to obtain area reductions over previously published methods. The first heuristic predicts the effects of various mapping decisions on the area of the final solution and the second heuristic bounds the depth of the mapping solution at each node. In depth-oriented mode, when targeting 5-LUTs, IMap obtains depth-optimal solutions that are 13.3% and 12.5% smaller than those produced by CutMAP and FlowMAP-r0, respectively. Targeting the same LUT size in area-oriented mode, IMap obtains solutions that are 13.7% smaller than those produced by duplication-free mapping.
Optimality Study of Logic Synthesis for LUT-Based FPGAs
Cited by 21 (0 self)
Field-programmable gate-array (FPGA) logic synthesis and technology mapping have been studied extensively over the past 15 years. However, progress within the last few years has slowed considerably (with some notable exceptions). It seems natural to then question whether the current logic-synthesis and technology-mapping algorithms for FPGA designs are producing near-optimal solutions. Although there are many empirical studies that compare different FPGA synthesis/mapping algorithms, little is known about how far these algorithms are from the optimal (recall that both logic-optimization and technology-mapping problems are NP-hard, if we consider area optimization in addition to delay/depth optimization). In this paper, we present a novel method for constructing arbitrarily large circuits that have known optimal solutions after technology mapping. Using these circuits and their derivatives (called Logic synthesis Examples with Known Optimal (LEKO) and Logic synthesis Examples with Known Upper bounds (LEKU), respectively), we show that although leading FPGA technology-mapping algorithms can produce close to optimal solutions, the results from the entire logic-synthesis flow (logic optimization + mapping) are far from optimal. The LEKU circuits were constructed to show where the logic synthesis flow can be improved, while the LEKO circuits specifically deal with the performance of the technology mapping. The best industrial and academic FPGA synthesis flows are around 70 times larger in terms of area on average and, in some cases, as much as 500 times larger on LEKU examples. These results clearly indicate that there is much room for further research and improvement in FPGA synthesis. Index Terms: Circuit optimization, circuit synthesis, design automation, field-programmable gate arrays (FPGAs), optimization methods.
Boolean Matching for LUT-Based Logic Blocks With Applications to Architecture Evaluation and Technology Mapping
 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 2001
Cited by 19 (3 self)
In this paper, we present new Boolean matching methods for lookup table (LUT)-based programmable logic blocks (PLBs) and their applications to PLB architecture evaluations and field programmable gate array (FPGA) technology mapping. Our Boolean matching methods, which are based on functional decomposition operations, can characterize functions for complex PLBs consisting of multiple LUTs (possibly of different sizes) such as Xilinx XC4K CLBs. With these techniques, we conducted quantitative evaluation of four PLB architectures on their functional capabilities. Architecture evaluation results show that the XC4K CLB can implement 98% of six-input and 88% of seven-input functions extracted from MCNC benchmarks, while a simplified PLB architecture is more cost effective in terms of function implementation per LUT bit. Finally, we proposed new technology mapping algorithms that integrate Boolean matching and functional decomposition operations for depth minimization. Technology mapping results show that our PLB mapping approach achieves 12% smaller depth or 15% smaller area in XC5200 FPGAs and 18% smaller depth in XC4K FPGAs, compared to conventional LUT mapping approaches.
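As background for the functional decomposition operations this abstract refers to, the sketch below computes the classical column multiplicity of a decomposition chart (the Ashenhurst-Curtis test). It is not the paper's Boolean matching algorithm. The helper name `column_multiplicity`, the callable-based function encoding, and both example functions are assumptions for illustration. A function f(B, F) can be written as g(h1(B), ..., ht(B), F) exactly when the number of distinct columns is at most 2^t, so a multiplicity of at most 2 means one LUT over the bound set B can feed a second LUT over its output and the free set F.

```python
from itertools import product


def column_multiplicity(f, n, bound):
    """Count distinct columns of the decomposition chart of f.

    f     : callable taking a tuple of n bits (0/1) and returning 0 or 1
    n     : number of input variables
    bound : list of variable indices forming the bound set
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for bound_bits in product((0, 1), repeat=len(bound)):
        column = []
        for free_bits in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, v in zip(bound, bound_bits):
                x[i] = v
            for i, v in zip(free, free_bits):
                x[i] = v
            column.append(f(tuple(x)))
        columns.add(tuple(column))  # one column per bound-set assignment
    return len(columns)


# Hypothetical examples on four inputs.
def parity_and(x):
    """(x0 xor x1 xor x2) and x3."""
    return (x[0] ^ x[1] ^ x[2]) & x[3]


def and_or(x):
    """(x0 and x2) or (x1 and x3)."""
    return (x[0] & x[2]) | (x[1] & x[3])


mu_parity = column_multiplicity(parity_and, 4, [0, 1, 2])
mu_and_or = column_multiplicity(and_or, 4, [0, 1])
```

Here `parity_and` has multiplicity 2 for the bound set {x0, x1, x2}, so it fits a 3-input LUT feeding a 2-input LUT. `and_or` has multiplicity 4 for the bound set {x0, x1}, so that bound set needs two intermediate signals.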
Placement-Driven Technology Mapping for LUT-Based FPGAs
 In Proceedings of the ACM Int. Symposium on FPGAs
, 2003
Cited by 17 (1 self)
In this paper, we study the problem of placement-driven technology mapping for table-lookup based FPGA architectures to optimize circuit performance. Early work on technology mapping for FPGAs such as Chortle-d [14] and Flowmap [3] aims to optimize the depth of the mapped solution without consideration of interconnect delay. Later works such as Flowmap-d [7], BiasClus [4] and EdgeMap consider interconnect delays during mapping, but do not take into consideration the effects of their mapping solution on the final placement. Our work focuses on the interaction between the mapping and placement stages. First, the interconnect delay information is estimated from the placement, and used during the labeling process. A placement-based mapping solution which considers both global cell congestion and local cell congestion is then developed. Finally, a legalization step and detailed placement are performed to realize the design. We have implemented our algorithm in a LUT-based FPGA technology mapping package named PDM (Placement-Driven Mapping) and tested the implementation on a set of MCNC benchmarks. We use the tool VPR [1][2] for placement and routing of the mapped netlist. Experimental results show the longest path delay on a set of large MCNC benchmarks decreased by 12.3% on average.