The effect of LUT and cluster size on deep-submicron FPGA performance and density
- in Proc. IEEE Field Programmable Gate Arrays (FPGA
, 2000
Cited by 108 (4 self)
In this paper, we revisit the field-programmable gate array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs (Betz et al. 1997) we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits is synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. Across all architectures with LUT sizes in the range of 2 to 7 inputs, and cluster sizes from 1 to 10 LUTs, we have experimentally determined the relationship between the number of inputs required for a cluster as a function of the LUT size and cluster size. Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than what was presented in the past. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than with larger LUTs. Hence, as measured by area-delay product, or by performance, these would be a bad choice. Also, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and a cluster size between 3 and 10 provide the best area-delay product for an FPGA. Index Terms: Architecture, clusters, computer-aided design (CAD), field-programmable gate array (FPGA), look-up table (LUT), very large scale integration (VLSI).
Timing-Driven Placement for FPGAs
, 2000
Cited by 95 (2 self)
In this paper we introduce a new simulated annealing-based timing-driven placement algorithm for FPGAs. This paper has three main contributions. First, our algorithm employs a novel method of determining source-sink connection delays during placement. Second, we introduce a new cost function that trades off between wire-use and critical path delay, resulting in significant reductions in critical path delay without significant increases in wire-use. Finally, we combine connection-based and path-based timing-analysis to obtain an algorithm that has the low time-complexity of connection-based timing-driven placement, while obtaining the quality of path-based timing-driven placement. A comparison of our new algorithm to a well-known non-timing-driven placement algorithm demonstrates that our algorithm is able to increase the post-place-and-route speed (using a full path-based timing-driven router and a realistic routing architecture) of 20 MCNC benchmark circuits by an average of 42%, whil...
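The trade-off described in the abstract above, between wire use and critical path delay, can be illustrated with a weighted cost for a single simulated-annealing move. This is a minimal sketch and not necessarily the paper's cost function: the weight `lam`, the normalisation by the previous costs, and both function names are assumptions made for illustration.

```python
import math
import random


def delta_cost(d_timing, d_wiring, prev_timing, prev_wiring, lam=0.5):
    """Weighted, normalised change in cost for one proposed swap.

    lam = 0 considers only wiring; lam = 1 considers only timing.
    (Hypothetical formulation, for illustration only.)
    """
    return lam * d_timing / prev_timing + (1.0 - lam) * d_wiring / prev_wiring


def accept_move(delta, temperature, rng=random):
    """Standard Metropolis acceptance rule used by simulated annealing."""
    if delta <= 0:
        return True  # improving or neutral moves are always accepted
    # Worsening moves are accepted with probability exp(-delta / T).
    return rng.random() < math.exp(-delta / temperature)
```

Normalising each term by its previous value keeps the two components comparable even though delay and wire length are measured in different units.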
Architecture Evaluation for Power-Efficient FPGAs
- in Proc. ACM Intl. Symp. Field-Programmable Gate Arrays
, 2003
Cited by 76 (24 self)
This paper presents a flexible FPGA architecture evaluation framework, named fpgaEVA-LP, for power efficiency analysis of LUT-based FPGA architectures. Our work has several contributions: (i) We develop a mixed-level FPGA power model that combines switch-level models for interconnects and macromodels for LUTs; (ii) We develop a tool that automatically generates a back-annotated gate-level netlist with post-layout extracted capacitances and delays; (iii) We develop a cycle-accurate power simulator based on our power model. It carries out gate-level simulation under a real delay model and is able to capture glitch power; (iv) Using the framework fpgaEVA-LP, we study the power efficiency of FPGAs, in 0.10 µm technology, under various settings of architecture parameters such as LUT sizes, cluster sizes and wire segmentation schemes and reach several important conclusions. We also present the detailed power consumption distribution among different FPGA components and shed light on the potential opportunities of power optimization for future FPGA designs (e.g., ≤ 0.10 µm technology).
Using cluster-based logic blocks and timing-driven packing to improve FPGA speed and density
- In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays
, 1999
Cited by 75 (8 self)
In this paper, we investigate the speed and area-efficiency of FPGAs employing “logic clusters” containing multiple LUTs and registers as their logic block. We introduce a new, timing-driven tool (T-VPack) to “pack” LUTs and registers into these logic clusters, and we show that this algorithm is superior to an existing packing algorithm. Then, using a realistic routing architecture and sophisticated delay and area models, we empirically evaluate FPGAs composed of clusters ranging in size from one to twenty LUTs, and show that clusters of size seven through ten provide the best area-delay trade-off. Compared to circuits implemented in an FPGA composed of size one clusters, circuits implemented in an FPGA with size seven clusters have 30% less delay (a 43% increase in speed) and require 8% less area, and circuits implemented in an FPGA with size ten clusters have 34% less delay (a 52% increase in speed), and require no additional area.
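The paired figures in the abstract above (30% less delay as a 43% increase in speed, 34% less delay as a 52% increase) follow from speed being the reciprocal of delay. A small sketch of that arithmetic; the function name is an illustrative choice and does not come from the paper.

```python
def speed_increase_percent(delay_reduction_percent):
    """Convert a percentage reduction in delay to a percentage gain in speed.

    Speed is proportional to 1 / delay, so a delay scaled by (1 - r)
    gives a speed scaled by 1 / (1 - r).
    """
    r = delay_reduction_percent / 100.0
    if not 0.0 <= r < 1.0:
        raise ValueError("delay reduction must be in [0, 100) percent")
    return (1.0 / (1.0 - r) - 1.0) * 100.0


print(round(speed_increase_percent(30)))  # 43
print(round(speed_increase_percent(34)))  # 52
```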
Nanowire-based programmable architectures
- ACM Journal on Emerging Technologies in Computing Systems
, 2005
Cited by 59 (6 self)
Chemists can now construct wires which are just a few atoms in diameter; these wires can be selectively field-effect gated, and wire crossings can act as diodes with programmable resistance. These new capabilities present both opportunities and challenges for constructing nanoscale computing systems. The tiny feature sizes offer a path to economically scale down to atomic dimensions. However, the associated bottom-up synthesis techniques only produce highly regular structures and come with high defect rates and minimal control during assembly. To exploit these technologies, we develop nanowire-based architectures which can bridge between lithographic and atomic-scale feature sizes and tolerate defective and stochastic assembly of regular arrays to deliver high density universal computing devices. Using 10nm pitch nanowires, these nanowire-based programmable architectures offer one to two orders of magnitude greater mapped-logic density than defect-free lithographic FPGAs at 22nm.
Efficient Circuit Clustering for Area and Power Reduction in FPGAs
- In Proceedings of ACM/SIGDA international symposium on Field-programmable gate arrays
, 2002
Cited by 57 (0 self)
We present a routability-driven bottom-up clustering technique for area and power reduction in clustered FPGAs. This technique uses a cell connectivity metric to identify seeds for efficient clustering. Effective seed selection, coupled with an interconnect-resource aware clustering and placement, can have a favorable impact on circuit routability. It leads to better device utilization, savings in area, and reduction in power consumption. Routing area reduction of 35% is achieved over previously published results. Power dissipation simulations using a buffered pass-transistor-based FPGA interconnect model are presented. They show that our clustering technique can reduce the overall device power dissipation by an average of 13%.
FPGA Routing Architecture: Segmentation and Buffering to Optimize Speed and Density
, 1999
Cited by 55 (5 self)
In this work we investigate the routing architecture of FPGAs, focusing primarily on determining the best distribution of routing segment lengths and the best mix of pass transistor and tri-state buffer routing switches. While most commercial FPGAs contain many length 1 wires (wires that span only one logic block) we find that wires this short lead to FPGAs that are inferior in terms of both delay and routing area. Our results show instead that it is best for FPGA routing segments to have lengths of 4 to 8 logic blocks. We also show that 50% to 80% of the routing switches in an FPGA should be pass transistors, with the remainder being tri-state buffers. Architectures that employ the best segmentation distributions and the best mixes of pass transistor and tri-state buffer switches found in this paper are not only 11% to 18% faster than a routing architecture very similar to that of the Xilinx XC4000X but also considerably simpler. These results are obtained using an architecture invest...
An Architecture and Compiler for Scalable On-Chip Communication
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
, 2004
Cited by 42 (0 self)
A dramatic increase in single chip capacity has led to a revolution in on-chip integration. Design reuse and ease of implementation have become important aspects of the design process. This paper describes a new scalable single-chip communication architecture for heterogeneous resources, adaptive system-on-a-chip (aSOC), and supporting software for application mapping. This architecture exhibits hardware simplicity and optimized support for compile-time scheduled communication. To illustrate the benefits of the architecture, four high-bandwidth signal processing applications including an MPEG-2 video encoder and a Doppler radar processor have been mapped to a prototype aSOC device using our design mapping technology. Through experimentation it is shown that aSOC communication outperforms a hierarchical bus-based system-on-chip (SoC) approach by up to a factor of five. A VLSI implementation of the communication architecture indicates clock rates of 400 MHz in 0.18-µm technology for sustained on-chip communication. In comparison to previously published results for an MPEG-2 decoder, our on-chip interconnect shows a runtime improvement of over a factor of four. Index Terms: Communications architecture, on-chip interconnect, system-on-chip (SoC).
Reconfigurable computing: architectures and design methods
- Proceedings on IEEE Computers and Digital Techniques
, 2005
Power Modeling and Characteristics of Field Programmable Gate Arrays
, 2005
Cited by 40 (9 self)
This paper studies power modeling for Field Programmable