Results 1-10 of 89
A pipeline FFT processor
In IEEE Workshop on Signal Processing Systems, 1999
Cited by 12 (1 self)
Abstract: In this paper, we discuss the design and implementation of a high-speed, low-power 1024-point pipeline FFT processor. Key features are flexible internal data length and a novel processing element. The FFT processor, which is implemented in a standard 0.35 μm CMOS process, is efficient in terms of power consumption and chip area.
High-speed recursive filter structures composed of identical allpass subfilters for interpolation, decimation, and QMF banks with perfect magnitude reconstruction
IEEE Trans. Circuits Systems II, 1999
Cited by 10 (6 self)
Abstract: High-speed recursive filter structures for interpolation and decimation with factors of two, and quadrature mirror filter (QMF) banks with perfect magnitude reconstruction, are proposed. The structures are composed of identical allpass subfilters that are interconnected via extra multipliers. For interpolation and decimation filters, the overall transfer function corresponds in the simplest case to several halfband infinite-impulse response (IIR) filters in cascade. To achieve a smaller passband ripple than a cascade design gives, a design procedure previously used for single-rate filters is applied. In this approach, the design is split into designs of a prototype finite-impulse response (FIR) filter and a halfband IIR filter. For QMF banks, the design is again separated into designs of a prototype FIR filter and a halfband IIR filter. One major advantage of the proposed filter structures over the corresponding conventional (halfband filter) structures is that the required coefficient word length for the allpass filters is substantially reduced, implying that the maximal sample frequency can be substantially increased for a given VLSI technology. Further, for interpolation and decimation, the arithmetic complexity may be reduced in comparison with both the conventional structures and straightforward cascade structures. Simple recurrence formulas for computing the interconnecting multipliers, given the overall transfer function, are derived. Several examples are included that compare the proposed structures with the corresponding conventional and straightforward cascade structures.
Design of Transport Triggered Architecture Processor for Discrete Cosine Transform
In 15th Annual IEEE International ASIC/SOC Conference, 2002
Cited by 9 (2 self)
Transport Triggered Architecture (TTA) offers a cost-effective tradeoff between the size and performance of ASICs and the programmability of general-purpose processors. In this paper, TTA processors for the RC4 and AES encryption algorithms of the new IEEE 802.11i WLAN security standard are designed. Special operations efficiently supporting the ciphers are developed. The TTA design flow is utilized for finding configurations with the best performance-size ratios. The size of the configuration supporting both algorithms is 69.4 kgates, and the throughput is 100 Mb/s for RC4 and 68.5 Mb/s for AES at 100 MHz in 0.13 µm CMOS technology. Compared to commercial processors of the same wireless application domain, higher throughputs are achieved at significantly smaller area and lower clock speed, which also results in decreased energy consumption.
Implementation of low-complexity FIR filters using a minimum spanning tree
In Proc. IEEE Mediterranean Electrotechnical Conf., 2004
Cited by 7 (6 self)
In this paper we discuss implementation of low-complexity FIR filters using difference methods. By realizing the differences between coefficients, and from them the actual coefficients, the complexity of the filter implementations can be reduced. Here two different methods are proposed for selecting the differences. Both methods can be implemented with low execution times, making it possible to include them in the search for quantized filter coefficients.
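The idea of realizing coefficients through their differences can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not say how differences are selected, so the sketch makes its own assumptions. It uses the number of nonzero signed digits of a value as a rough proxy for adder cost, adds a hypothetical root node 0 that stands for realizing a coefficient directly, and picks differences with Prim's minimum spanning tree algorithm. The names csd_nonzeros and mst_difference_plan are illustrative only.

```python
def csd_nonzeros(n):
    """Count nonzero digits in the signed-digit (non-adjacent) form of n."""
    n = abs(n)
    count = 0
    while n:
        if n & 1:
            count += 1
            # Digit is +1 when n % 4 == 1 and -1 when n % 4 == 3.
            n -= 2 - (n & 3)
        n >>= 1
    return count


def mst_difference_plan(coeffs):
    """Prim's algorithm over coefficients; edge weight = cost of the difference.

    Node 0 is an assumed root: an edge from 0 to c means "realize c directly".
    Returns a list of (parent_value, coefficient, cost) tuples.
    """
    nodes = [0] + list(coeffs)
    # best[i] = (cheapest known cost to attach node i, index of that parent)
    best = {i: (csd_nonzeros(nodes[i]), 0) for i in range(1, len(nodes))}
    plan = []
    while best:
        i = min(best, key=lambda k: best[k][0])
        cost, parent = best.pop(i)
        plan.append((nodes[parent], nodes[i], cost))
        # Newly attached coefficient may offer cheaper differences to the rest.
        for j in best:
            w = csd_nonzeros(nodes[j] - nodes[i])
            if w < best[j][0]:
                best[j] = (w, i)
    return plan


coeffs = [105, 107, 99]
plan = mst_difference_plan(coeffs)
direct_cost = sum(csd_nonzeros(c) for c in coeffs)
shared_cost = sum(cost for _, _, cost in plan)
print(plan, direct_cost, shared_cost)
```

With these three coefficients, only 105 is built directly; 107 and 99 are each reached through a single-digit difference (2 and 8), so the proxy cost drops from 12 to 6. A real design would also count the adder needed to combine each difference with its parent.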
A digital down converter for a wideband radar receiver
In Proc. National Conf. Radio Science, 2002
Cited by 5 (4 self)
In a conventional radar receiver, errors are introduced by mismatch between the ADCs and by gain and phase mismatch in the I/Q demodulation. Performing the I/Q demodulation in the digital domain alleviates some of these errors. In this paper several different filter structures suitable for digital I/Q demodulation are evaluated. Three different structures for such a digital down converter have been considered and their implementation properties have been examined. A solution combining an FIR filter with a wave digital filter is found to be the most efficient.
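Digital I/Q demodulation in general can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not one of the three structures evaluated in the paper: it assumes the carrier sits at a quarter of the sample rate, so mixing needs only the factors 1, -j, -1, j, and it uses a plain length-4 moving average as the low-pass filter before decimating by 4. The function name digital_iq_downconvert is hypothetical.

```python
import cmath
import math


def digital_iq_downconvert(x, decim=4):
    """Mix a real signal down from fs/4 to baseband, low-pass, and decimate.

    Assumptions (not from the paper): carrier at fs/4, boxcar low-pass filter.
    """
    # exp(-j*pi*n/2) cycles through 1, -j, -1, j, so no multipliers are needed.
    rotation = (1, -1j, -1, 1j)
    mixed = [sample * rotation[n % 4] for n, sample in enumerate(x)]
    # Average each block of `decim` samples and keep one output per block.
    out = []
    for start in range(0, len(mixed) - decim + 1, decim):
        out.append(sum(mixed[start:start + decim]) / decim)
    return out


# A tone exactly at fs/4 with phase 0.3 rad comes out as a constant phasor.
phase = 0.3
tone = [math.cos(math.pi * n / 2 + phase) for n in range(16)]
baseband = digital_iq_downconvert(tone)
print([round(abs(v), 6) for v in baseband])
print([round(cmath.phase(v), 6) for v in baseband])
```

Because I and Q are derived from a single sampled signal, there is no analog gain or phase mismatch between the two branches, which is the benefit the abstract refers to.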
A Complex Multiplier Using "Overturned-Stairs" Adder Tree
To appear at IEEE Inter. Conference on Electronics, Circuits and Systems (ICECS), 1999
Cited by 5 (5 self)
In this paper we describe a new complex multiplier based on Distributed Arithmetic (DA) using an overturned-stairs adder tree (OS-tree). The OS-tree yields the same speed as the optimal Wallace tree, but has the advantage of a regular layout. A 17×13 complex multiplier is implemented in Mietec™ 0.35 µm standard CMOS technology. It can execute 30 Mmult./s and dissipates about 15 mW at 25 Mmult./s while operating at 1.5 V.
GAARP: A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
IEEE Trans. on Computers, 2005
Cited by 5 (0 self)
Abstract: Reducing the energy consumption of a real-time system has emerged as an important design concern. In this paper, we propose GAARP, an adaptive scalable architecture targeted toward algorithm-specific tasks for just-in-time performance using the right amount of power. The architecture consists of Globally Asynchronous and Locally Synchronous (GALS) building blocks, where the processing hardware is realized by a set of smaller slices of similar structure, each running synchronously with independent clocks. We demonstrate that, for different real-time commercial applications with algorithm-specific jobs like online transaction processing, digital filtering, Fourier transform, etc., the proposed architecture allows dynamic load-balancing and adaptive inter-task voltage scaling based on the load in each of the processing units. Compared to a synchronous implementation of the same functionality, we show that the proposed hardware can achieve higher efficiency in terms of power and performance by exploiting the flexibility to balance the load and change the supply voltage. The architecture also lends itself to process tolerance since it can detect process shifts for the individual processing units and determine the appropriate operating voltage/frequency for each unit. Simulation results for two representative applications show that, for a modest system configuration and random job distribution, we obtain up to 67 percent improvement in MOPS/W (millions of operations per second per watt) over a fully synchronous implementation. Index Terms: Asynchronous/synchronous operations, algorithms implemented in hardware, fault tolerance, energy-aware systems.
Low-complexity constant coefficient matrix multiplication using a minimum spanning tree approach
2004
Cited by 4 (3 self)
In this paper a novel approach for realizing constant coefficient matrix multiplication using few additions and subtractions is proposed. This method is applicable in, e.g., FIR filter banks, transforms, and polyphase form FIR filters for sample rate changes. Examples show that the proposed method yields good results compared to realizing the matrix multiplication by utilizing multiple coefficient multiplication techniques for the rows or columns separately.
Digital Hilbert transformers composed of identical allpass subfilters
Proc. IEEE Int. Symposium on Circuits and Systems, 1998
Cited by 4 (1 self)
In this paper we introduce digital Hilbert transformers composed of identical allpass subfilters that are interconnected via extra multipliers. In the simplest case the overall transfer function corresponds to M Hilbert transformers in cascade, where each transformer is derived by frequency shift of a halfband IIR filter. The overall transformer can also be designed to have a smaller passband ripple than a pure cascade design. The values for the extra multipliers are computed using simple recurrence formulas. One advantage of the proposed structures is that the sensitivity to coefficient errors of the recursive parts, i.e. of the allpass subfilters, is lower than for the corresponding conventional structures. One consequence of this is that the maximal sample frequency is higher for the new structures. Since the coefficients of the allpass subfilters in the new structures are shorter, the overall arithmetic complexity can under certain conditions also be reduced. Examples are included that demonstrate this.
Multiple Constant Multiplication for Digit-Serial Implementation of Low Power FIR Filters
Cited by 3 (0 self)
Abstract: Multiple constant multiplication (MCM) is an efficient way of implementing several constant multiplications with the same input data. The coefficients are expressed using shifts, adders, and subtracters. By utilizing redundancy between the coefficients, the number of adders and subtracters is reduced, resulting in a low-complexity implementation. However, for digit-serial arithmetic a shift requires a flip-flop, and hence the number of shifts should be taken into consideration as well. In this work we investigate the area, speed, and power tradeoffs for implementation of FIR filters using MCM and digit-serial arithmetic. We also introduce an algorithm for reducing the number of adders and subtracters as well as the number of shifts.
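A small example shows what sharing between coefficients means in MCM. It is a hand-built sketch for the two assumed constants 5 and 45, not the algorithm introduced in the paper: the partial result 5x is reused to form 45x = 9 * (5x), so two adders serve both products. The function name mcm_5_45 is hypothetical.

```python
def mcm_5_45(x):
    """Multiply x by the constants 5 and 45 using only shifts and additions.

    Illustrative assumption: the constants 5 and 45 are chosen by hand.
    """
    t5 = (x << 2) + x      # 5x  = 4x + x        (1 adder, shift by 2)
    t45 = (t5 << 3) + t5   # 45x = 8*(5x) + 5x   (1 adder, shift by 3)
    return t5, t45


for x in (-3, 0, 7):
    print(x, mcm_5_45(x))
```

Built separately, 45x = 64x - 16x - 4x + x alone would take three adders or subtracters, so the shared form saves two. In a digit-serial realization the two shifts (by 2 and by 3) also cost flip-flops, which is why the abstract argues that shifts should be counted alongside adders.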