Optimizing Power Using Transformations
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1995
Cited by 208 (14 self)
The increasing demand for portable computing has elevated power consumption to be one of the most critical design parameters. A high-level synthesis system, HYPER-LP, is presented for minimizing power consumption in application-specific datapath-intensive CMOS circuits using a variety of architectural and computational transformations. The synthesis environment consists of high-level estimation of power consumption, a library of transformation primitives, and heuristic/probabilistic optimization search mechanisms for fast and efficient scanning of the design space. Examples with varying degree of computational complexity and structures are optimized and synthesized using the HYPER-LP system. The results indicate that more than an order of magnitude reduction in power can be achieved over current-day design methodologies while maintaining the system throughput; in some cases this can be accomplished while preserving or reducing the implementation area.
Rotation scheduling: A loop pipelining algorithm
Dept. of Computer Science, Princeton University, 1997
Cited by 114 (53 self)
Abstract — We consider the resource-constrained scheduling of loops with inter-iteration dependencies. A loop is modeled as a data flow graph (DFG), where edges are labeled with the number of iterations between dependencies. We design a novel and flexible technique, called rotation scheduling, for scheduling cyclic DFG’s using loop pipelining. The rotation technique repeatedly transforms a schedule to a more compact schedule. We provide a theoretical basis for the operations based on retiming. We propose two heuristics to perform rotation scheduling and give experimental results showing that they have very good performance. Index Terms — High-level synthesis, loop pipelining, parallel compiler, retiming, scheduling.
Approximating Minimum Feedback Sets and Multicuts in Directed Graphs
ALGORITHMICA, 1998
Cited by 106 (3 self)
This paper deals with approximating feedback sets in directed graphs. We consider two related problems: the weighted feedback vertex set (fvs) problem, and the weighted feedback edge set problem (fes). In the fvs (resp. fes) problem, one is given a directed graph with weights (each of which is at least 1) on the vertices (resp. edges), and is asked to find a subset of vertices (resp. edges) with minimum total weight that intersects every directed cycle in the graph. These problems are among the classical NP-Hard problems and have many applications. We also consider a generalization of these problems: subset-fvs and subset-fes, in which the feedback set has to intersect only a subset of the directed cycles in the graph. This subset consists of all the cycles that go through a distinguished input subset of vertices and edges, denoted by X. This generalization is also NP-Hard even when |X| = 2. We present approximation algorithms for the subset-fvs and subset-fes problems. The first algorithm we present achieves an approximation factor of O(log^2 |X|). The second algorithm achieves an approximation factor of O(min(log τ log log τ, log n log log n)), where τ is the value of the optimum fractional solution of the problem at hand, and n is the number of vertices in the graph. We also define a multicut problem in a special type of directed networks which we call circular networks, and show that the subset-fes and subset-fvs problems are equivalent to this multicut problem. Another contribution of our paper is a combinatorial algorithm that computes a (1 + ε)-approximation to the fractional optimal feedback vertex set. Computing the approximate solution is much simpler and more efficient than general linear programming methods. All of our algorithms use this approximate solution.
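To make the weighted fvs definition concrete, the following is a minimal sketch that checks whether a vertex set intersects every directed cycle and then finds a minimum-weight set by exhaustive search. It is not the approximation algorithm of the paper; the function names and the tiny example graph are assumptions made for illustration, and the brute-force search is only usable on very small graphs.

```python
from itertools import combinations


def is_acyclic(vertices, edges):
    """Kahn's algorithm: True if the directed graph has no directed cycle."""
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in vertices if indeg[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    # Every vertex can be peeled off exactly when there is no cycle.
    return removed == len(vertices)


def is_feedback_vertex_set(vertices, edges, chosen):
    """True if deleting `chosen` leaves no directed cycle."""
    remaining = [v for v in vertices if v not in chosen]
    kept = [(u, v) for u, v in edges if u not in chosen and v not in chosen]
    return is_acyclic(remaining, kept)


def min_weight_fvs(vertices, edges, weight):
    """Exhaustive search over all subsets (exponential; illustration only)."""
    best = None
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            if is_feedback_vertex_set(vertices, edges, set(subset)):
                cost = sum(weight[v] for v in subset)
                if best is None or cost < best[0]:
                    best = (cost, set(subset))
    return best


# Hypothetical example: two 2-cycles sharing vertex b (a <-> b and b <-> c).
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
weight = {"a": 1, "b": 3, "c": 1}  # every weight is at least 1
cost, chosen = min_weight_fvs(vertices, edges, weight)
print(cost, sorted(chosen))  # prints: 2 ['a', 'c']
```

In this example the single vertex b hits both cycles but costs 3, while the pair {a, c} costs 2, which shows why the weighted problem differs from minimizing the number of vertices.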
High-Level Power Modeling, Estimation, and Optimization
IEEE Trans. on Computer-Aided Design, 1998
Cited by 106 (12 self)
Abstract—Silicon area, performance, and testability have been, so far, the major design constraints to be met during the development of digital very-large-scale-integration (VLSI) systems. In recent years, however, things have changed; increasingly, power has been given weight comparable to the other design parameters. This is primarily due to the remarkable success of personal computing devices and wireless communication systems, which demand high-speed computations with low power consumption. In addition, there exists a strong pressure for manufacturers of high-end products to keep power under control, due to the increased costs of packaging and cooling this type of devices. Last, the need of ensuring high circuit reliability has turned out to be more stringent. The availability of tools for the automatic design of low-power VLSI systems has thus become necessary. More specifically, following a natural trend, the interests of the researchers have lately shifted to the investigation of power modeling, estimation, synthesis, and optimization techniques that account for power dissipation during the early stages of the design flow. This paper surveys representative contributions to this area that have appeared in the recent literature. Index Terms — Behavioral and logic synthesis, low power design, power management.
Special Purpose Parallel Computing
Lectures on Parallel Computation, 1993
Cited by 82 (6 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
Time and Area Efficient Pattern Matching on FPGAs
In The Twelfth Annual ACM International Symposium on Field-Programmable Gate Arrays (FPGA ’04), 2004
Cited by 75 (11 self)
Pattern matching for network security and intrusion detection demands exceptionally high performance. Much work has been done in this field, and yet there is still significant room for improvement in efficiency, flexibility, and throughput. We develop a novel linear-array string matching architecture using a buffered, two-comparator variation on the Knuth-Morris-Pratt (KMP) algorithm. For small (16 or fewer characters) patterns, it compares favorably with the state of the art while providing better scalability and reconfiguration, and more efficient hardware utilization. KMP is a well-known, efficient string matching technique using a single comparator and a precomputed transition table. We add a second comparator and an input buffer, allowing the system to accept at least one character in each cycle and terminate after a number of clock cycles at maximum equal to the length of the input string plus the size of the buffer. The system also provides a clean, modular route to reconfiguring the patterns on-the-fly and scaling the system to support more units, using several rows of linear array elements. In this paper, we prove the bound on the buffer size and running time, and provide performance comparisons against other approaches.
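For reference, the classical single-comparator KMP algorithm that the abstract builds on can be sketched in software as follows. This is the textbook algorithm, not the buffered two-comparator hardware architecture of the paper; the function names are chosen here for illustration, and the failure table plays the role of the precomputed transition table mentioned above.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch the same input character is compared again
        # against a shorter prefix; the text pointer never moves back.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return matches


print(failure_table("abab"))           # prints: [0, 0, 1, 2]
print(kmp_search("abababc", "abab"))   # prints: [0, 2]
```

The inner `while` loop is where a single comparator may spend several steps on one input character, which is the behavior a second comparator and an input buffer are described as smoothing out.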
Scheduling dataflow graphs via retiming and unfolding
IEEE Trans. on Parallel and Distributed Systems, 1997
Cited by 62 (26 self)
Abstract—Loop scheduling is an important problem in parallel processing. The retiming technique reorganizes an iteration; the unfolding technique schedules several iterations together. We combine these two techniques to obtain a static schedule with a reduced average computation time per iteration. We first prove that the order of retiming and unfolding is immaterial for scheduling a data-flow graph (DFG). From this nice property, we present a polynomial-time algorithm on the original DFG, before unfolding, to find the minimum-rate static schedule for a given unfolding factor. For the case of a unit-time DFG, efficient checking and retiming algorithms are presented.
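As an illustration of what unfolding does to a DFG, here is a minimal sketch of the standard unfolding construction: with unfolding factor f, each edge u -> v carrying d delays becomes f edges, from copy i of u to copy (i + d) mod f of v, carrying floor((i + d) / f) delays. The abstract does not spell out the construction, so this textbook definition and the edge-list representation are assumptions made here.

```python
def unfold(edges, f):
    """Unfold a DFG by factor f.

    edges: list of (u, v, d) with d = number of delays on edge u -> v.
    Returns edges between (node, copy_index) pairs in the unfolded graph.
    """
    unfolded = []
    for u, v, d in edges:
        for i in range(f):
            # Copy i of u feeds the copy of v that lies d iterations later.
            unfolded.append(((u, i), (v, (i + d) % f), (i + d) // f))
    return unfolded


# Hypothetical two-node loop: A -> B with no delay, B -> A with 3 delays.
dfg = [("A", "B", 0), ("B", "A", 3)]
for edge in unfold(dfg, 2):
    print(edge)
# prints:
# (('A', 0), ('B', 0), 0)
# (('A', 1), ('B', 1), 0)
# (('B', 0), ('A', 1), 1)
# (('B', 1), ('A', 0), 2)
```

Each original edge keeps its total delay count across its f copies (here 1 + 2 = 3 for the edge B -> A), so the unfolded graph describes f consecutive iterations of the same computation.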
Optimizing Two-Phase, Level-Clocked Circuitry (Extended Abstract)
Cited by 58 (16 self)
We investigate two strategies for reducing the clock period of a two-phase, level-clocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster level-clocked one. We model a two-phase circuit as a graph whose vertex set V is a collection of combinational logic blocks, and whose edge set E is a set of interconnections. Each interconnection passes through 0 or more latches, where each latch is clocked by one of two periodic, nonoverlapping waveforms, or phases. We give efficient polynomial-time algorithms for problems involving the timing verification and optimization of two-phase circuitry. Included are algorithms for verifyi...
Efficient Implementation of Retiming
In Proc. Intl. Conf. on Computer-Aided Design, 1994
Cited by 52 (0 self)
Retiming is a technique for optimizing sequential circuits. It repositions the registers in a circuit leaving the combinational cells untouched. The objective of retiming is to find a circuit with the minimum number of registers for a specified clock period. More than ten years have elapsed since Leiserson and Saxe first presented a theoretical formulation to solve this problem for single-clock edge-triggered sequential circuits. Their proposed algorithms have polynomial complexity; however naive implementations of these algorithms exhibit O(n^3) time complexity and O(n^2) space complexity when applied to digital circuits with n combinational cells. This renders retiming ineffective for circuits with more than 500 combinational cells. This paper addresses the implementation issues required to exploit the sparsity of circuit graphs to allow min-period retiming and constrained min-area retiming to be applied to circuits with as many as 10,000 combinational cells. We believe this is...
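The effect of a retiming on register placement and clock period can be sketched on a toy circuit graph. The sketch below assumes the usual Leiserson-Saxe convention, in which an edge u -> v with w registers carries w + r(v) - r(u) registers after retiming by lags r; it is a naive illustration, not the sparse implementation the paper describes, and the example circuit, delays, and function names are hypothetical.

```python
def retime(edges, r):
    """Apply lags r to edge weights: w_r(u, v) = w(u, v) + r[v] - r[u].

    edges: {(u, v): number of registers on the edge}
    Raises ValueError if any edge would get a negative register count.
    """
    retimed = {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}
    if any(w < 0 for w in retimed.values()):
        raise ValueError("illegal retiming: negative register count")
    return retimed


def clock_period(delay, edges):
    """Largest total delay along any register-free path.

    Assumes every cycle contains at least one register, so the subgraph
    of weight-0 edges is acyclic and the recursion terminates.
    """
    preds = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            preds[v].append(u)
    memo = {}

    def arrival(v):
        # Longest register-free path delay ending at (and including) v.
        if v not in memo:
            memo[v] = delay[v] + max((arrival(u) for u in preds[v]), default=0)
        return memo[v]

    return max(arrival(v) for v in delay)


# Hypothetical 3-cell ring with both registers on the edge c -> a.
delay = {"a": 2, "b": 3, "c": 4}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}
print(clock_period(delay, edges))    # prints: 9

# Lag c by one: one register moves from c's output to c's input.
retimed = retime(edges, {"a": 0, "b": 0, "c": 1})
print(retimed)                       # prints: {('a', 'b'): 0, ('b', 'c'): 1, ('c', 'a'): 1}
print(clock_period(delay, retimed))  # prints: 5
```

The number of registers around the cycle stays at 2, but the longest register-free path shrinks from a-b-c (delay 9) to a-b (delay 5), which is the kind of improvement min-period retiming searches for.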
Low Power Synthesis of Dual Threshold Voltage CMOS VLSI Circuits
1999
Cited by 49 (0 self)
The use of dual threshold voltages can significantly reduce the static power dissipated in CMOS VLSI circuits. With the supply voltage at 1V and threshold voltage as low as 0.2V, the subthreshold leakage power of transistors starts dominating the dynamic power. Also, many times a large number of devices spend a long time in a standby mode where the leakage power is the only source of power consumption. We present a near-optimal approach to synthesize low static power CMOS VLSI circuits with two threshold voltages that reduces power consumption compared with a previous approach by up to 29.45%. Also presented is a technique which finds static power optimal configurations for CMOS VLSI circuits when an arbitrary number of threshold voltages is allowed.