Results 1–10 of 123
Approximate Signal Processing
, 1997
Abstract

Cited by 516 (2 self)
It is increasingly important to structure signal processing algorithms and systems to allow for trading off between the accuracy of results and the utilization of resources in their implementation. In any particular context, there are typically a variety of heuristic approaches to managing these tradeoffs. One of the objectives of this paper is to suggest that there is the potential for developing a more formal approach, including utilizing current research in Computer Science on Approximate Processing and one of its central concepts, Incremental Refinement. Toward this end, we first summarize a number of ideas and approaches to approximate processing as currently being formulated in the computer science community. We then present four examples of signal processing algorithms/systems that are structured with these goals in mind. These examples may be viewed as partial inroads toward the ultimate objective of developing, within the context of signal processing design and implementation,...
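The incremental-refinement idea the abstract points to can be illustrated with a small sketch (mine, not the paper's): an inner product is approximated from coarsely quantized inputs and refined by admitting more input bits, so accuracy can be traded against work at any stopping point. The coefficients and samples below are invented for illustration.

```python
def approx_dot(c, x, bits):
    """Dot product with each x[i] quantized to `bits` fractional bits."""
    scale = 1 << bits
    return sum(ci * round(xi * scale) / scale for ci, xi in zip(c, x))

c = [1.0, 2.0, -1.5]     # fixed coefficients (invented)
x = [0.3, -0.7, 0.55]    # input samples (invented)
exact = sum(ci * xi for ci, xi in zip(c, x))

# Each additional stage admits more input bits and shrinks the error.
errors = [abs(approx_dot(c, x, b) - exact) for b in (2, 4, 8, 16)]
```

A real incremental-refinement scheme would update the previous estimate rather than recompute from scratch; the point here is only the accuracy-versus-resources curve.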
A Low-Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity . . .
 IEEE Journal of Solid-State Circuits
, 2000
Abstract

Cited by 38 (4 self)
This work describes the implementation of a discrete cosine transform (DCT) core compression system targeted at low-power video (MPEG-2 MP@ML) and still-image (JPEG) applications. It exhibits two innovative techniques for arithmetic operation reduction in the DCT computation context, along with standard voltage-scaling techniques such as pipelining and parallelism. The first method dynamically minimizes the bitwidth of arithmetic operations in the presence of spatial data correlation. The second method trades off power dissipation against image compression quality (arithmetic precision). The chip dissipates 4.38 mW at 14 MHz and 1.56 V.
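A minimal sketch of the principle behind the first technique, assuming the common formulation that spatially correlated data makes adjacent-sample differences narrow: the datapath then only needs enough bits for the difference, not the full word. The pixel values are invented for illustration, and this models only the bitwidth argument, not the chip's datapath.

```python
def signed_bitwidth(v):
    """Conservative two's-complement width needed for integer v."""
    return v.bit_length() + 1

row = [120, 122, 121, 125, 124, 126, 123, 122]   # correlated 8-bit pixels (invented)
diffs = [b - a for a, b in zip(row, row[1:])]    # adjacent-sample differences

raw_bits = max(signed_bitwidth(v) for v in row)     # width without the trick
diff_bits = max(signed_bitwidth(v) for v in diffs)  # width the datapath needs
```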
A micropower programmable DSP using approximate signal processing based on distributed arithmetic
 IEEE J. Solid-State Circuits
Energy Scalable System Design
, 2002
Abstract

Cited by 18 (2 self)
We introduce the notion of energy-scalable system design. The principal idea is to maximize computational quality for a given energy constraint at all levels of the system hierarchy. The desirable energy-quality (E-Q) characteristics of systems are discussed. E-Q behavior of algorithms is considered, and transforms that significantly improve scalability are analyzed using three distinct categories of commonly used signal-processing algorithms on the StrongARM SA-1100 processor as examples (viz., filtering, frequency-domain transforms, and classification). Scalability hooks in hardware are analyzed using similar examples on the Pentium III processor, and a scalable programming methodology is proposed. Design techniques for truly energy-scalable hardware are also demonstrated using filtering as an example.
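One way to read "E-Q behavior of algorithms": restructure a computation so that the earliest work buys the most quality. A hedged sketch (not the paper's code) for an FIR inner product, ordering the multiply-accumulates by coefficient magnitude so that truncating the computation at any energy budget k leaves the smallest error; all values are invented.

```python
def partial_fir(coeffs, window, k):
    """Accumulate only the k largest-magnitude taps (k MACs of 'energy')."""
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    return sum(coeffs[i] * window[i] for i in order[:k])

coeffs = [0.02, -0.10, 0.45, 0.80, 0.45, -0.10, 0.02]  # low-pass-like taps
window = [1.0, 0.8, 0.9, 1.1, 1.0, 0.7, 0.95]          # current input window
exact = sum(c * w for c, w in zip(coeffs, window))

# Error after k multiply-accumulates: most of the quality comes early.
err = [abs(partial_fir(coeffs, window, k) - exact) for k in range(1, 8)]
```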
Energy Efficient Filtering Using Adaptive Precision and Variable Voltage
, 1999
Abstract

Cited by 15 (5 self)
A Finite Impulse Response (FIR) filter architecture based on a Distributed Arithmetic (DA) approach with two supply voltages and variable bit-precision operation is presented. The filter adapts itself to the minimum bit precision required by the incoming data and can also operate at a lower voltage while still meeting a fixed throughput constraint. As opposed to a worst-case fixed-precision design, our precision-on-demand implementation has an energy requirement that varies linearly with the average bit precision required by the input signal. We also demonstrate that 50% to 60% energy savings can easily be obtained in the case of speech data.
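As background on the DA technique the filter builds on, here is a minimal bit-serial distributed-arithmetic inner product in software: a LUT of coefficient partial sums is addressed by one bit-plane of the inputs per step, so processing fewer bit-planes costs less work at reduced precision (the "precision on demand" idea). The coefficients and inputs are invented; the real design adds voltage scaling on top.

```python
coeffs = [3, -1, 4, 2]   # fixed filter taps (invented)

# LUT[s]: for each combination s of one bit per input, the sum of the
# coefficients whose corresponding bit is set.
LUT = [sum(c for c, bit in zip(coeffs, map(int, f"{s:04b}"[::-1])) if bit)
       for s in range(16)]

def da_dot(xs, bits):
    """Bit-serial DA inner product for nonnegative `bits`-bit inputs."""
    acc = 0
    for j in range(bits):                          # one bit-plane per cycle
        index = sum(((x >> j) & 1) << i for i, x in enumerate(xs))
        acc += LUT[index] << j                     # shift-accumulate
    return acc

xs = [5, 9, 12, 7]   # 4-bit input samples (invented)
```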
Scalable FPGA-based Architecture for DCT Computation Using Dynamic Partial Reconfiguration
Abstract

Cited by 11 (2 self)
In this paper, we propose an FPGA-based scalable architecture for DCT computation using dynamic partial reconfiguration. Our architecture can achieve quality scalability through dynamic partial reconfiguration, which is important for critical applications that need continuous hardware servicing. Our scalable architecture has three features. First, it can perform DCT computations for eight different zones, i.e., from 1×1 DCT to 8×8 DCT. Second, it can change the configuration of processing elements to trade off the precision of DCT coefficients against computational complexity. Third, processing elements (PEs) unused by the DCT can be used for motion-estimation computations. Using dynamic partial reconfiguration with 2.3-MB bitstreams, 80 distinct hardware architectures can be implemented. We show experimental results and comparisons between different configurations using both partial and non-partial reconfiguration processes. The detailed tradeoffs among visual quality, power consumption, processing clock cycles, and reconfiguration overhead are analyzed in the paper.
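The zone-scalable DCT (the first feature) can be sketched in software, assuming the usual zonal scheme of computing only the low-frequency z×z block of coefficients; the hardware reconfiguration itself is of course not modeled, and the test block is invented.

```python
import math

def dct_1d(v):
    """Orthonormal 1-D DCT-II."""
    N = len(v)
    return [sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
            * (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            for k in range(N)]

def zonal_dct_2d(block, z):
    """2-D DCT of an 8x8 block, keeping only coefficients with row, col < z."""
    rows = [dct_1d(r) for r in block]                              # rows first
    cols = [dct_1d([rows[i][j] for i in range(8)]) for j in range(8)]
    return [[cols[j][i] if i < z and j < z else 0.0
             for j in range(8)] for i in range(8)]

flat = [[5.0] * 8 for _ in range(8)]   # a perfectly smooth block
Z = zonal_dct_2d(flat, 1)              # z = 1: only the DC term survives
```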
T. Stouraitis, “A systolic array architecture for the discrete sine transform”
 IEEE Trans. Signal Process
Abstract

Cited by 10 (0 self)
An efficient approach to design very large scale integration (VLSI) architectures, and a scheme for the implementation of the discrete sine transform (DST) based on an appropriate decomposition method that uses circular correlations, is presented. The proposed design restructures the computation of the DST into two circular correlations having similar structures and only one half of the length of the original transform; these can be concurrently computed and mapped onto the same systolic array. A significant improvement in computational speed can be obtained at reduced input-output (I/O) cost and low hardware complexity, while retaining all the other benefits of VLSI implementations of discrete transforms that use circular-correlation or cyclic-convolution structures. These features are demonstrated by comparing the proposed design with some of the recently reported schemes.
Index Terms—Discrete sine transform, systolic arrays, VLSI algorithms.
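The decomposition's exact index mappings are in the paper; as background, the circular-correlation kernel that each systolic array then evaluates is simply the following (with invented example data):

```python
def circular_correlation(h, x):
    """y[k] = sum_n h[n] * x[(n + k) mod N] -- the systolic inner loop."""
    N = len(h)
    return [sum(h[n] * x[(n + k) % N] for n in range(N)) for k in range(N)]

h = [1, 2, 3, 4]   # fixed kernel (invented)
x = [2, 0, 1, 3]   # input sequence (invented)
y = circular_correlation(h, x)
```

The regularity of this loop (identical multiply-accumulate cells, data circulating by one position per step) is what makes the systolic mapping attractive.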
On LUT cascade realizations of FIR filters
 DSD2005, 8th Euromicro Conference on Digital System Design: Architectures, Methods and Tools
, 2005
Abstract

Cited by 7 (4 self)
This paper first defines the n-input q-output WS function as a mathematical model of the combinational part of the distributed arithmetic of a finite impulse response (FIR) filter. It then shows a method to realize the WS function by an LUT cascade with k-input q-output cells. Furthermore, it 1) shows that LUT cascade realizations require much smaller memory than single-ROM realizations; 2) presents a new design method for a WS function by arithmetic decomposition; and 3) shows design results of FIR filters using FPGAs with embedded memories.
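A toy illustration of the cascade idea (the paper's WS function and cell decomposition are more general): realize a weighted sum of single-bit inputs with a chain of small cells instead of one 2^n-entry ROM. Here each cell is modeled as a small LUT plus an adder carrying the partial sum; a pure LUT cascade would fold the partial sum into the cell's address. The weights are invented.

```python
coeffs = [3, 5, 2, 7]   # n = 4 single-bit inputs (invented weights)
g = 2                   # input bits consumed per cell

def build_cells(coeffs, g):
    """One small LUT per group of g inputs, mapping its bits to a partial sum."""
    cells = []
    for start in range(0, len(coeffs), g):
        group = coeffs[start:start + g]
        lut = {bits: sum(c for i, c in enumerate(group) if (bits >> i) & 1)
               for bits in range(1 << len(group))}
        cells.append(lut)
    return cells

def cascade_eval(cells, xbits, g):
    """Chain the cells, carrying the running partial sum between them."""
    acc = 0
    for t, lut in enumerate(cells):
        chunk = (xbits >> (t * g)) & ((1 << g) - 1)
        acc += lut[chunk]
    return acc

cells = build_cells(coeffs, g)   # two 4-entry LUTs vs. one 16-entry ROM
```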
T. Stouraitis, “Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST”
 IEEE Trans. Circuits Syst. I, Reg. Papers
, 2005
Abstract

Cited by 7 (1 self)
In this paper, an efficient design approach for a unified very large-scale integration (VLSI) implementation of the discrete cosine transform (DCT), discrete sine transform (DST), inverse DCT (IDCT), and inverse DST (IDST), based on an appropriate formulation of the four transforms into cyclic convolution structures, is presented. This formulation allows an efficient memory-based systolic array implementation of the unified architecture using dual-port ROMs and appropriate hardware-sharing methods. The performance of the unified design is compared to that of some existing ones. It is found that the proposed design provides superior performance in terms of hardware complexity, speed, and I/O cost, in addition to such features as regularity, modularity, pipelining capability, and local connectivity, which make the unified structure well suited for VLSI implementation.
Index Terms—Forward and inverse cosine and sine transforms, memory-based implementation techniques, systolic arrays, very large-scale integration (VLSI) algorithms.
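As background on why a unified forward/inverse datapath is natural (a generic observation, not the paper's formulation): with orthonormal scaling, the DCT-III inverts the DCT-II, so both directions reuse the same cosine kernel. The test vector is invented.

```python
import math

def dct_ii(x):
    """Forward orthonormal DCT-II."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
            * (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            for k in range(N)]

def dct_iii(X):
    """DCT-III, the inverse of the orthonormal DCT-II (same kernel)."""
    N = len(X)
    return [sum(X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                * (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
                for k in range(N))
            for n in range(N)]

x = [1.0, 4.0, -2.0, 3.0, 0.5, -1.0, 2.0, 6.0]   # invented samples
roundtrip = dct_iii(dct_ii(x))
```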
Scalable and modular memory-based systolic architectures for discrete Hartley transform
 IEEE Trans. Circuits Syst. I, Reg. Papers
, 2006
Abstract

Cited by 7 (3 self)
In this paper, we present a design framework for scalable memory-based implementation of the discrete Hartley transform (DHT) using simple and efficient systolic and systolic-like structures for short and prime transform lengths, as well as for lengths 4 and 8. We have used the proposed short-length structures to construct highly modular architectures for higher transform lengths by a new prime-factor implementation approach. Interestingly, the structures proposed for the prime-factor DHT do not involve any transposition hardware or time. Besides, it is shown here that an N-point DHT can be computed efficiently from the two (N/2)-point DHTs of its even- and odd-indexed input subsequences in a recursive manner using a ROM-based multiplication stage. Apart from flexibility of implementation, the proposed structures offer significantly lower area-time complexity compared with the existing structures. The proposed schemes of computation of the DHT can conveniently be scaled not only to higher transform lengths but also according to the hardware constraints or the throughput requirement of the application.
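The recursive step quoted in the abstract (an N-point DHT from the two (N/2)-point DHTs of its even- and odd-indexed subsequences) can be checked directly: it is the standard radix-2 DHT identity, shown here in plain Python without the paper's ROM-based multiplication stage. The input vector is invented.

```python
import math

def cas(t):
    """cas(t) = cos(t) + sin(t), the Hartley kernel."""
    return math.cos(t) + math.sin(t)

def dht(x):
    """Direct N-point discrete Hartley transform."""
    N = len(x)
    return [sum(x[n] * cas(2 * math.pi * n * k / N) for n in range(N))
            for k in range(N)]

def dht_radix2(x):
    """N-point DHT from the DHTs of the even- and odd-indexed halves."""
    N = len(x)
    M = N // 2
    He, Ho = dht(x[0::2]), dht(x[1::2])
    return [He[k % M]
            + math.cos(2 * math.pi * k / N) * Ho[k % M]
            + math.sin(2 * math.pi * k / N) * Ho[(M - k) % M]
            for k in range(N)]

x = [1.0, 3.0, -2.0, 5.0, 0.0, 4.0, -1.0, 2.0]   # invented samples
```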