Results 1–10 of 48
High-Level Power Modeling, Estimation, and Optimization
IEEE Trans. on Computer-Aided Design, 1998
Cited by 106 (12 self)
Abstract—Silicon area, performance, and testability have so far been the major design constraints to be met during the development of digital very-large-scale-integration (VLSI) systems. In recent years, however, things have changed; increasingly, power has been given weight comparable to the other design parameters. This is primarily due to the remarkable success of personal computing devices and wireless communication systems, which demand high-speed computation with low power consumption. In addition, there is strong pressure on manufacturers of high-end products to keep power under control, due to the increased costs of packaging and cooling these types of devices. Last, the need to ensure high circuit reliability has become more stringent. The availability of tools for the automatic design of low-power VLSI systems has thus become necessary. More specifically, following a natural trend, the interest of researchers has lately shifted to the investigation of power modeling, estimation, synthesis, and optimization techniques that account for power dissipation during the early stages of the design flow. This paper surveys representative contributions to this area that have appeared in the recent literature. Index Terms—Behavioral and logic synthesis, low-power design, power management.
System-Level Exploration for Pareto-Optimal Configurations in Parameterized Systems-on-a-Chip
2002
Cited by 49 (5 self)
In this work, we provide a technique for efficiently exploring the power/performance design space of a parameterized system-on-a-chip (SOC) architecture to find all Pareto-optimal configurations. These Pareto-optimal configurations represent the range of power and performance trade-offs obtainable by adjusting parameter values for a fixed application mapped onto the SOC architecture. Our approach extensively prunes the potentially large configuration space by taking advantage of parameter dependencies. We have successfully applied our technique to explore Pareto-optimal configurations of our SOC architecture for a number of applications.
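The kernel of such an exploration is a Pareto-dominance check over (power, performance) points. The sketch below is illustrative only: the `pareto_front` helper and the sample numbers are invented for this listing, and the paper's dependency-based pruning of the configuration space is not reproduced.

```python
def pareto_front(configs):
    """Keep points not dominated by any other point (both coordinates
    lower or equal, and the points differ)."""
    return sorted(
        c for c in configs
        if not any(o != c and o[0] <= c[0] and o[1] <= c[1] for o in configs)
    )

# (power in mW, execution time in ms) for candidate parameter settings
points = [(10, 50), (20, 30), (15, 40), (25, 35), (12, 60)]
print(pareto_front(points))  # -> [(10, 50), (15, 40), (20, 30)]
```

The two excluded points, (25, 35) and (12, 60), are each dominated by a configuration that is at least as good in both power and time; the survivors form the trade-off curve a designer would choose from.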
High-Level Area and Power Estimation for VLSI Circuits
1997
Cited by 35 (4 self)
This paper addresses the problem of computing the area complexity of a multi-output combinational logic circuit, given only its functional description, i.e., Boolean equations, where area complexity is measured in terms of the number of gates required for an optimal multi-level implementation of the combinational logic. The proposed area model is based on transforming the given multi-output Boolean function description into an equivalent single-output function. The model is empirical, and results demonstrating its feasibility and utility are presented. Also presented is a methodology for converting the gate-count estimates obtained from the area model into capacitance estimates. High-level power estimates based on the total capacitance estimates and average activity estimates are also presented.
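The capacitance-to-power step mentioned above typically rests on the standard CMOS dynamic-power relation P = a · C_total · Vdd² · f. A minimal sketch, with an invented per-gate capacitance and activity figure (the paper's actual conversion methodology is not reproduced here):

```python
def dynamic_power_watts(gate_count, cap_per_gate_farads, activity, vdd, freq_hz):
    """Classic CMOS dynamic-power relation P = a * C_total * Vdd^2 * f,
    with total capacitance scaled up from an estimated gate count."""
    c_total = gate_count * cap_per_gate_farads
    return activity * c_total * vdd ** 2 * freq_hz

# 10,000 gates at an assumed 10 fF each, 20% switching activity, 3.3 V, 50 MHz
p = dynamic_power_watts(10_000, 10e-15, 0.2, 3.3, 50e6)
print(f"{p * 1e3:.2f} mW")  # ~10.89 mW
```

All five inputs here are placeholders; the point is only that a gate-count estimate becomes a power estimate once per-gate capacitance and average activity are attached to it.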
Toward Achieving Energy Efficiency in Presence of Deep Submicron Noise
IEEE Transactions on VLSI Systems, 2000
Cited by 31 (2 self)
Presented in this paper are 1) information-theoretic lower bounds on the energy consumption of noisy digital gates and 2) the concept of noise tolerance via coding for achieving energy efficiency in the presence of noise. In particular, lower bounds on a) circuit speed and supply voltage; b) transition activity in the presence of noise; c) dynamic energy dissipation; and d) total (dynamic and static) energy dissipation are derived. A surprising result is that, in a scenario where the dynamic component of power dissipation dominates, the supply voltage for minimum-energy operation is greater than the minimum supply voltage for reliable operation. We then propose noise tolerance via coding to approach the lower bounds on energy dissipation. We show that the lower bounds on energy for an off-chip I/O signaling example are a factor of 24 below present-day systems. A very simple Hamming code can reduce the energy consumption by a factor of 3, while Reed-Muller (RM) codes give a 4× reduction in energy dissipation.
Information-Theoretic Bounds on Average Signal Transition Activity
1999
Cited by 12 (2 self)
In this paper, we derive lower and upper bounds on the average signal transition activity via an information-theoretic approach in which symbols generated by a process (possibly correlated) with entropy rate H are coded with an average of R bits per symbol. The bounds are asymptotically achievable if the process is stationary and ergodic. We also present a coding algorithm, based on the Lempel-Ziv data compression algorithm, to achieve the bounds. Bounds are also obtained on the expected number of 1's (or 0's). These results are applied to 1) determine the activity-reducing efficiency of different coding algorithms such as entropy coding, transition signaling, and Bus-Invert coding, and 2) determine the lower bound on the power-delay product given H and R. Two examples are provided where transition activity within 4% and 9% of the lower bound is achieved when blocks of 8 symbols and 13 symbols, respectively, are coded at a time.
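Assuming the lower bound takes the commonly quoted form R · h(t) ≥ H, where h is the binary entropy function and t the per-bit transition probability, the minimum achievable t can be evaluated numerically by bisection. The function names and sample numbers below are illustrative, not taken from the paper:

```python
import math

def binary_entropy(t):
    """h(t) = -t*log2(t) - (1-t)*log2(1-t)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def min_transition_activity(H, R):
    """Smallest per-bit transition probability t in [0, 0.5] consistent with
    R * h(t) >= H, found by bisection (h is increasing on [0, 0.5])."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if R * binary_entropy(mid) >= H:
            hi = mid
        else:
            lo = mid
    return hi

# A source of entropy rate 4 bits/symbol coded at 8 bits/symbol:
print(round(min_transition_activity(4.0, 8.0), 3))  # ~0.11
```

Intuitively, each bus wire must carry H/R bits of information per transfer, and a wire that flips with probability t can carry at most h(t) bits, which is what the bisection inverts.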
Signal Coding for Low Power: Fundamental Limits and Practical Realizations
IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, 1999
Instruction-Based System-Level Power Evaluation of System-on-a-Chip Peripheral Cores
 in International Symposium on System Synthesis, Los Alamitos, CA, USA, 2000, p. 163, IEEE Computer Society
Cited by 12 (1 self)
Abstract—Various system-level core-based power evaluation approaches for core types like microprocessors, caches, main memories, and buses have been proposed in the past. Approaches for other types of components have been based on the gate level, register-transfer level, or behavioral level. We propose a new technique, suitable for a variety of cores like peripheral cores, that is the first to combine gate-level power data with a system-level simulation model written in C++ or Java. For that purpose, we investigated peripheral cores and decomposed their functionality into so-called instructions. Our technique addresses a core-based system design paradigm. We show that our technique is sufficiently accurate for making power-related system-level design decisions, and that its computation time is orders of magnitude smaller than that of lower-level simulation approaches.
Keywords: System-on-a-chip, low-power design, intellectual property, caches, cores, estimation, silicon platforms, system parameters.
Introduction
As mobile computing devices have gained a major market share in the computing sector and their functionality (i.e., complexity) has increased rapidly, minimizing power consumption has become one of the most important design goals. Furthermore, the short life cycles of consumer products, in conjunction with increasing product complexity, have led to a core-based design paradigm. As a consequence, there is a strong demand for core-based power evaluation and optimization tools. A core is a predesigned processing-level component, such as a microprocessor, memory, DMA controller, UART, bus interface, or CODEC, residing on a system-on-a-chip (SOC) with typically tens of other cores. A core, also known as Intellectual Property or IP, may come in different forms: a soft core comes as a synthesizable model in a hardware-description language (HDL), a firm core as a structural model in an HDL, and a hard core as a technology-specific layout.
Core providers typically parameterize their cores, whether soft, firm, or hard in form, in order to increase a core's applicability in different applications and to ease its reusability. Example parameters include bit-widths and buffer sizes. Soft and firm cores may be parameterized via HDL generics, while hard cores may be parameterized via layout generators. For example, a UART's buffer size may be varied from 1 to 16 bytes, trading off size and power for performance. An important system-on-a-chip design task is thus configuring all cores' parameters such that the configuration is tuned for the application (i.e., the software running on the SOC's microprocessor core) and for the power, size, and performance constraints of the SOC. Since the whole SOC has to be optimized in terms of power, and the corresponding parameters are interdependent and large in number, fast and accurate evaluation and optimization tools are needed. A core's power consumption may vary greatly depending on the application driving the SOC and on the configuration of the core itself. Thus, the power tables provided in core databooks, representing average power data, may yield inaccurate power numbers for a particular application and configuration, even when extended to account for a subset of common configurations. Therefore, numerous researchers have proposed techniques for rapid, system-level power evaluation of certain types of cores, including microprocessor, cache, memory, and bus cores. In our efforts to develop a system-level power evaluation environment for parameterized SOCs, we found, however, that no techniques existed to evaluate general cores as fast and accurately as a combined gate-level/system-level (i.e., executable specification) approach could provide. Our work uses this approach and applies it to peripheral cores, i.e., those single-purpose processing cores that typically surround a microprocessor core, such as DMA controllers, UARTs, bus interfaces, and CODECs, to explore this promising technology.
The remainder of the paper is organized as follows. In Section 2, we describe related work. In Section 3, we present our power evaluation technique. In Section 4, we give our experimental results. In Section 5, we conclude.
RTL (register-transfer level) power evaluation operates at an even higher level of abstraction, modeling the power consumption of more abstract circuit components, such as adders and multipliers. Simulation is performed at the RT level and power is obtained by using these power models, also known as macromodels. The approaches taken here can be divided into two categories: macromodeling using table-lookup techniques, and analytical models. Power modeling and evaluation in [0] was among the first to show the feasibility of RTL-based approaches, showing very good accuracy and much higher speed than gate-level approaches. Using table lookups, each component is modeled via an N-variable characterization (input density, output density, switching probability, etc.) of its power consumption [6].
Previous work
Previous behavioral-level approaches seek to estimate the power of a behavioral HDL description before a synthesized design is obtained. An abstract notion of physical capacitance and switching activity is used. Switching is estimated using entropy, degraded quadratically or exponentially from circuit input to circuit output [11][12]. While such behavioral approaches can provide fast evaluation of power for custom designs, they will not be nearly as accurate for cores as approaches that take advantage of the fact that cores are predesigned. Work has been done to evaluate the power consumption of microprocessor cores; one such approach is instruction-level power modeling.
Power evaluation for peripheral cores
3.1 Overview
We examined a variety of peripheral cores and found that they all could be viewed as executing a sequence of what we call "instructions", using this term in a relaxed manner.
Typically, an instruction represents an atomic action available to the programmer of a microprocessor. But we use "instruction" more generally, as an action that, collectively with other actions, describes the range of possible behaviors of a core. Furthermore, an "instruction" can be better used for power evaluation since, compared to a classical instruction, our notion of an instruction can denote a smaller or a larger piece of functionality depending on the power characteristics. Thus, we have extended the instruction-level power modeling approach, previously used for microprocessor cores, to peripheral cores. In developing the approach, we noted that cores typically already come with system-level functional models, written in a language like C, C++, or Java, and that in fact the VSIA requires such models in its standard.
Our approach can be broken into a number of steps as follows. The core provider must select a set of appropriate instructions, perform gate-level power analysis to construct power lookup tables for each instruction, and create a system-level core model that utilizes the lookup tables for power evaluation. The core user connects the system-level core models, executes the whole system (which is possible since the system-level model represents an executable specification), and thus obtains power data after a system execution/simulation. We now describe each step in more detail, using a UART core as an example.
Peripheral instructions
The core provider must first break the core's functionality into a set of instructions. Given an RTL model of a core called C, we first determine the system-level instructions i1, i2, i3, …, in of C. These instructions must have the property that they collectively cover the entire functionality of C and that no two instructions cover the same function of C. As with the instructions of an instruction-set processor, each instruction ij operates on some input data and produces some output data.
In our UART example, we selected the following instructions: Reset, Enable_tx, Enable_rx, Send, and Receive. For each instruction, the core provider must determine how dependent the instruction's power consumption is on the instruction's input data. We thus define an instruction's power-dependency characteristic as one of: dependent directly on its input data; dependent on a statistical characterization of its input data (e.g., the density of 1's in a vector of bits); or independent of its input data. This determination can be based on databooks, a core designer's knowledge, experimental results, or statistical analysis. For our UART example, we ran experiments that provided different data to each instruction, and we determined that the power-dependency characteristic for all instructions was "independent." For example, the Send instruction consumes approximately a constant amount of power regardless of the data being sent; likewise for the Receive instruction. Note that this is just a simple example. In general, there is a trade-off in choosing the right instructions for power evaluation: if the granularity is small, we will have many instructions, leading to a longer simulation time; on the other hand, a small granularity tends to produce more accurate results since more subtleties are taken care of. With coarse-grained instructions, the total number of instructions is smaller, leading to faster simulation, but coarser instructions might not be able to capture subtle effects, and as such may lead to less accurate results. Very unlike microprocessors, certain instructions executed on a peripheral core can drastically change the power consumption of succeeding instructions. In particular, certain instructions change the mode of the peripheral core. This concept of mode is very different from that of measuring inter-instruction power dependencies (e.g., a load following a store may consume more power than a load following an add).
To account for a mode, the core provider must determine the set of modes m1, m2, m3, …, mk in C, referred to as power modes, that cause C to consume significantly more or less power per execution of its instructions i1, i2, i3, …, in. In our UART example, we found four power modes: Idle, Tx_enabled, Rx_enabled, and Tx_rx_enabled. Given these modes, we define a power-mode transition function that gives the next power mode, given the current power mode and the most recently executed instruction of C. For the UART example, the power-mode transition function is given in
Gate-level power evaluation
The second task consists of using gate-level simulation to obtain per-instruction power data for the lookup tables. Given an RTL model of a core called C, its instructions i1, i2, i3, …, in, and its modes m1, m2, m3, …, mk, we follow the procedure outlined in
System-level modeling
The next step is to develop a system-level model of each core that enables rapid power evaluation when executed, i.e., an executable specification. Given an RTL model of a core called C, its instructions i1, i2, i3, …, in, and its modes m1, m2, m3, …, mk, we implement a functional model of C in terms of its instructions. If using method-calling objects
System-level power evaluation
The above three steps are performed by the core developer. They may take days to complete, forming part of the months required to develop the core. They must be done for each target technology. But note that those steps only have to be performed once; the resulting data can then be used in any core-based design that uses the particular core. The core user does not perform the above steps (unless retargeting to a new technology, in which case the core provider may supply the necessary testbenches). Rather, the user connects the core models and simulates them. Simulation of a complete SOC, using system-level models, takes on the order of seconds or minutes. Thus, hundreds or thousands of configurations can be evaluated.
The top-level simulation model will be designed to output the value of the totalpower variable, for each core of the system, at the end of each simulation. The sum of these totalpower values represents the system-level estimate of the system's power consumption for a given configuration of its parameters.
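The per-instruction lookup plus power-mode bookkeeping described above can be sketched as follows, using the paper's UART instruction and mode names; the energy values and table entries are invented placeholders, not measured gate-level data.

```python
# (power mode, instruction) -> energy per execution in nJ (made-up numbers)
POWER_TABLE = {
    ("Idle", "Reset"): 1.0,
    ("Idle", "Enable_tx"): 1.2,
    ("Idle", "Enable_rx"): 1.2,
    ("Tx_enabled", "Send"): 5.0,
    ("Tx_enabled", "Enable_rx"): 1.2,
    ("Tx_rx_enabled", "Send"): 5.5,
    ("Tx_rx_enabled", "Receive"): 4.8,
}

# Power-mode transition function: (current mode, instruction) -> next mode
NEXT_MODE = {
    ("Idle", "Reset"): "Idle",
    ("Idle", "Enable_tx"): "Tx_enabled",
    ("Idle", "Enable_rx"): "Rx_enabled",
    ("Tx_enabled", "Send"): "Tx_enabled",
    ("Tx_enabled", "Enable_rx"): "Tx_rx_enabled",
    ("Tx_rx_enabled", "Send"): "Tx_rx_enabled",
    ("Tx_rx_enabled", "Receive"): "Tx_rx_enabled",
}

def evaluate(instructions, mode="Idle"):
    """Sum per-instruction energy from the lookup table, tracking the power mode."""
    total = 0.0
    for instr in instructions:
        total += POWER_TABLE[(mode, instr)]
        mode = NEXT_MODE[(mode, instr)]
    return total, mode

total, mode = evaluate(["Reset", "Enable_tx", "Send", "Send"])
print(total, mode)  # ~12.2 nJ, ending in mode Tx_enabled
```

This mirrors the division of labor in the text: the two tables are what the core provider characterizes once per technology, while the core user merely replays instruction sequences through `evaluate` during system simulation.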
Analytical Estimation of Transition Activity from Word-Level Signal Statistics
Design Automation Conference, 1997
Cited by 12 (3 self)
Presented here is an analytical methodology to determine the average signal activity, T, from high-level signal statistics, a statistical signal generation model, and the signal encoding. Simulation results for 16-bit signals generated via AR(1) and MA(1) models indicate an estimation error in T of less than 2%. The application of the proposed method to the estimation of T in DSP hardware is also explained. I. INTRODUCTION Power dissipation has become a critical design concern in recent years, driven by the emergence of mobile applications. Reliability concerns and packaging costs have made power optimization relevant even for tethered applications. As system designers strive to integrate multiple systems on-chip, power dissipation has become an equally important parameter that needs to be optimized along with area and speed. Therefore, extensive research into various aspects of low-power system design is presently being conducted. These include power reduction techniques [3]–[5]; low-p...
Achievable bounds on signal transition activity
in Proc. ACM/IEEE Int. Conf. Computer-Aided Design, 1997
Cited by 12 (4 self)
Abstract—Transitions on high-capacitance buses in VLSI systems result in considerable system power dissipation. Therefore, various coding schemes have been proposed in the literature to encode the input signal in order to reduce the number of transitions. In this paper we derive achievable lower and upper bounds on the expected signal transition activity. These bounds are derived via an information-theoretic approach in which symbols generated by a source (possibly correlated) with entropy rate H are coded with an average of R bits per symbol. These results are applied to 1) determine the activity-reducing efficiency of different coding algorithms such as entropy coding, transition coding, and Bus-Invert coding; 2) bound the error in entropy-based power estimation schemes; and 3) determine the lower bound on the power-delay product. Two examples are provided where transition activity within 4% and 8% of the lower bound is achieved when blocks of 8 and 13 symbols, respectively, are coded at a time.
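Of the coding schemes named above, Bus-Invert coding is simple enough to sketch: a word is sent complemented whenever that would cause fewer than half the bus lines to flip, with one extra line signaling the inversion. A minimal illustration (the bus width and sample words are arbitrary):

```python
def bus_invert_encode(words, width=8):
    """Bus-Invert coding: transmit each word or its complement, whichever
    flips fewer bus lines, plus one invert line carrying the choice."""
    prev_word, prev_invert = 0, 0
    transitions = 0
    encoded = []
    for w in words:
        if bin(w ^ prev_word).count("1") > width // 2:
            w ^= (1 << width) - 1  # transmit the complement
            invert = 1
        else:
            invert = 0
        # Count flips on the data lines plus the invert line itself
        transitions += bin(w ^ prev_word).count("1") + (invert ^ prev_invert)
        encoded.append((w, invert))
        prev_word, prev_invert = w, invert
    return encoded, transitions

encoded, flips = bus_invert_encode([0x00, 0xFF, 0x0F])
print(encoded, flips)  # -> [(0, 0), (0, 1), (15, 0)] 6  (unencoded: 12 flips)
```

Worst-case flips per transfer drop from `width` to `width // 2 + 1`, at the cost of the extra line, which is why the scheme shows up in the bounds comparison above.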
Trace-Driven System-Level Power Evaluation of System-on-a-Chip Peripheral Cores
in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), 2001
Cited by 11 (0 self)
Our earlier work on fast evaluation of the power consumption of general cores in a system-on-a-chip described techniques that involved isolating high-level instructions of a core, measuring gate-level power consumption per instruction, and then annotating a system-level simulation model with the obtained data. In this work, we describe a method for speeding up the evaluation further through the use of instruction traces and trace simulators for every core, not just microprocessor cores. Our method shows noticeable speedups at an acceptable loss of accuracy. We show that reducing trace sizes can speed up the method even further. The speedups allow for more extensive system-level power exploration and hence better optimization.