### Unknown title

Abstract

A reduced-complexity algorithm is presented for computing the discrete Fourier transform, in which an N-point transform is computed from eight circular-convolution-like operations, each of length nearly N/8. A systolic architecture is also derived for VLSI implementation of the proposed algorithm. The proposed architecture is fully pipelined and contains regular, simple, locally connected processing elements. It requires no complex control structure and is scalable to longer transform lengths. The proposed systolic structure involves less than or nearly the same hardware complexity as the corresponding existing systolic structures, while offering eight times higher throughput and significantly lower latency. Index Terms: discrete Fourier transform, systolic array, …
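The abstract does not spell out how the eight (N/8)-point operations are formed, so the sketch below does not reproduce the authors' decomposition. As a hedged illustration of the underlying idea, recasting a DFT as circular convolution (the operation a systolic array pipelines well), it swaps in Rader's classical mapping for prime N, which turns an N-point DFT into one (N − 1)-point circular convolution. All function names are hypothetical, and the convolution is evaluated directly rather than by a systolic array.

```python
import cmath


def circular_convolution(a, b):
    """Direct O(n^2) cyclic convolution: y[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def primitive_root(p):
    """Smallest generator of the multiplicative group modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("p must be an odd prime")


def dft_direct(x):
    """Reference O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft_rader(x):
    """Prime-length DFT via Rader's mapping onto a (p-1)-point circular convolution."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, p - 2, p)  # g^-1 mod p by Fermat's little theorem
    m = p - 1
    # Inputs reordered by powers of g; twiddle factors reordered by powers of g^-1.
    a = [x[pow(g, q, p)] for q in range(m)]
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, s, p) / p) for s in range(m)]
    c = circular_convolution(a, b)
    X = [0j] * p
    X[0] = sum(x)  # the k = 0 bin is just the sum of the inputs
    for r in range(m):
        X[pow(g_inv, r, p)] = x[0] + c[r]  # output index g^-r mod p
    return X
```

In a hardware realization, `circular_convolution` is the part a pipelined array of multiply-accumulate cells would take over; the index permutations by powers of `g` are data-independent and could be fixed in advance.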

### Research Article Novel VLSI Algorithm and Architecture with Good Quantization Properties for a High-Throughput Area Efficient Systolic Array Implementation of DCT

Abstract

Using a specific input-restructuring sequence, a new VLSI algorithm and architecture have been derived for a high-throughput, memory-based systolic array VLSI implementation of the discrete cosine transform (DCT). The proposed restructuring technique transforms the DCT algorithm into a cycle convolution and a pseudo-cycle convolution structure as its basic computational forms. The proposed solution has been specially designed for good fixed-point error performance, which has been exploited to further reduce hardware complexity and power consumption. It leads to a ROM-based VLSI kernel with good quantization properties. The result is a parallel VLSI algorithm and architecture with a good fixed-point implementation, suited to memory-based realization. The proposed algorithm can be mapped onto two linear systolic arrays of similar length and form, which can be further merged into a single array using an appropriate hardware-sharing technique. A highly efficient VLSI chip can thus be obtained, with appealing features such as good architectural topology, processing speed, hardware complexity, and I/O cost. Moreover, the proposed solution substantially reduces the hardware overhead of the pre-processing stage, which for short-length DCTs consumes a significant percentage of the chip area.
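The abstract mentions a ROM-based, memory-based kernel but not its internal organization. One widely used memory-based technique for inner products with fixed coefficients is distributed arithmetic (DA), sketched below under that assumption; it is not claimed to be the authors' kernel. Each step reads one ROM word addressed by the same bit of every input and accumulates it with a shift, so no multiplier is needed. Integer `coeffs` stand in for scaled fixed-point DCT coefficients, and the function names are hypothetical.

```python
def build_rom(coeffs):
    """ROM word at address `addr` = sum of coeffs[k] for every bit k set in addr (2**K words)."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs, xs, bits):
    """sum(c * x) for `bits`-bit two's-complement integers xs, using ROM reads, shifts, and adds."""
    rom = build_rom(coeffs)  # in hardware this table is precomputed once per coefficient set
    acc = 0
    for j in range(bits):  # bit-serial: one bit-plane of all inputs per step
        addr = 0
        for k, x in enumerate(xs):
            # Python's >> on negative ints is arithmetic, so this is bit j of x's two's complement.
            addr |= ((x >> j) & 1) << k
        word = rom[addr] * (1 << j)  # weight by 2**j (a shift in hardware)
        acc += -word if j == bits - 1 else word  # the sign bit carries weight -2**(bits - 1)
    return acc
```

For a K-input inner product the ROM holds 2^K words, which is why memory-based designs keep K small and, as the abstract notes, benefit from coefficient sets with good fixed-point behavior.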

### Realization of Prime-Length Discrete Sine Transform Using Cyclic Convolution

Abstract

This paper presents a new algorithm for implementing an N-point prime-length discrete sine transform (DST) through cyclic convolution. The proposed algorithm reformulates a prime-length N-point DST into two (N − 1)/2-point cyclic convolutions, which allows the hardware complexity to be reduced. This cyclic-convolution-based algorithm is used to obtain a simple systolic array for pipelined implementation of the DST. It preserves all the benefits of very-large-scale integration (VLSI) algorithms based on cyclic (circular) convolution, such as a regular and simple structure. Convolutions play a significant role in digital signal processing because they are easy to implement.
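The abstract does not give the DST variant or the exact split into two convolutions, so the sketch below does not reproduce them. Under the assumption of a simplified odd sine kernel, S[k] = Σ_{n=1}^{N−1} x[n] sin(2πkn/N), it shows the general mechanism: reindexing by powers of a primitive root g turns the prime-length kernel into a convolution, and because g^((N−1)/2) ≡ −1 (mod N) flips the sign of the sine, the work folds into a single (N − 1)/2-point convolution. For this kernel the folded convolution comes out negacyclic (a pseudo-cyclic form) rather than cyclic. All names are hypothetical.

```python
import math


def primitive_root(p):
    """Smallest generator of the multiplicative group modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("p must be an odd prime")


def sine_kernel_direct(x):
    """Reference: S[k] = sum_{n=1}^{N-1} x[n] * sin(2*pi*k*n/N), k = 0..N-1."""
    N = len(x)
    return [sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(1, N))
            for k in range(N)]


def negacyclic_convolution(d, h):
    """y[r] = sum_q d[q] * h[r - q]; a negative index wraps with a sign flip (h[s] -> -h[s + M])."""
    M = len(d)
    return [sum(d[q] * (h[r - q] if r >= q else -h[r - q + M]) for q in range(M))
            for r in range(M)]


def sine_kernel_prime(x):
    """Same S[k] for prime N, computed with one (N-1)/2-point negacyclic convolution."""
    N = len(x)
    M = (N - 1) // 2
    g = primitive_root(N)
    g_inv = pow(g, N - 2, N)
    a = [x[pow(g, q, N)] for q in range(N - 1)]  # inputs reordered by powers of g
    d = [a[q] - a[q + M] for q in range(M)]      # fold: g^M = -1 mod N negates the sine
    h = [math.sin(2 * math.pi * pow(g_inv, s, N) / N) for s in range(M)]
    c = negacyclic_convolution(d, h)
    S = [0.0] * N  # S[0] is zero for this kernel
    for r in range(M):
        k = pow(g_inv, r, N)
        S[k] = c[r]
        S[N - k] = -c[r]  # odd symmetry: S[N-k] = -S[k]
    return S
```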

### Systolic and Super-Systolic Multipliers for Finite Field GF(2^m) Based on Irreducible Trinomials

Submitted to IEEE Transactions on Circuits and Systems-I: Regular Papers

Abstract

Novel systolic and super-systolic architectures are presented for polynomial-basis multiplication over GF(2^m) based on irreducible trinomials. By suitable cut-set retiming, we derive an efficient bit-level-pipelined, bit-parallel systolic design for binary field multiplication that requires fewer gates and registers and involves nearly half the time complexity of the corresponding existing design. We also suggest a digit-level-pipelined design, which has lower latency and fewer registers than the bit-level-pipelined structure. Moreover, we propose a super-systolic design consisting of a set of systolic arrays in a systolic pipeline, and a pipelined systolic-block design consisting of pipelined blocks of concurrent systolic arrays. The super-systolic designs have the same average computation time and the same critical path as the proposed bit-level-pipelined design, but can reduce the latency by a factor of O(√m) at the cost of a marginally higher number of XOR gates and bit registers. The hardware complexity of the proposed super-systolic designs is nearly three times that of the existing bit-parallel structures, but they offer much higher throughput than the others for large values of m. For field orders m = 233 and m = 409, the proposed structures offer ten and eleven times more throughput, respectively, than the others. Index Terms: finite field, Galois field, finite field multiplication, elliptic curve cryptography, error control coding, irreducible …
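For reference, the sketch below shows the arithmetic these multipliers implement: polynomial-basis multiplication in GF(2^m) reduced modulo a trinomial x^m + x^n + 1, where reduction is cheap because x^m can be replaced by x^n + 1. It uses the plain bit-serial, most-significant-bit-first shift-and-add recurrence, not the paper's retimed bit-parallel or super-systolic structures, and the function names are hypothetical. For m = 233, the trinomial x^233 + x^74 + 1 that NIST specifies for its 233-bit binary curves is one valid choice.

```python
def gf2m_mul(a, b, m, n):
    """a*b in GF(2^m), polynomial basis, modulo the trinomial x^m + x^n + 1 (0 < n < m).

    Field elements are ints whose bit i is the coefficient of x^i; a, b < 2**m.
    """
    mask = (1 << m) - 1
    acc = 0
    for i in reversed(range(m)):  # Horner's rule over the bits of b, MSB first
        carry = (acc >> (m - 1)) & 1  # coefficient that becomes x^m after the shift
        acc = (acc << 1) & mask       # acc = acc * x, dropping the x^m term
        if carry:
            acc ^= (1 << n) | 1       # reduction: x^m = x^n + 1 (addition is XOR)
        if (b >> i) & 1:
            acc ^= a                  # add a when bit i of b is set
    return acc


def gf2m_pow(a, e, m, n):
    """a**e in the same field by square-and-multiply (used here only for checking)."""
    result, base = 1, a
    while e:
        if e & 1:
            result = gf2m_mul(result, base, m, n)
        base = gf2m_mul(base, base, m, n)
        e >>= 1
    return result
```

When the trinomial is irreducible, every nonzero element satisfies a^(2^m − 1) = 1, which makes `gf2m_pow` a convenient sanity check on both the multiplier and the choice of (m, n).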

### Parallel and Pipeline Architectures for High-Throughput Computation of Multilevel 3-D DWT

IEEE Transactions on Circuits and Systems for Video Technology

Abstract

In this paper, we present a throughput-scalable parallel and pipeline architecture for high-throughput computation of the multilevel three-dimensional discrete wavelet transform (3-D DWT). The computation of the 3-D DWT for each level of decomposition is split into three distinct stages, and all three stages are implemented in parallel by a processing unit consisting of an array of processing modules. The processing unit for the first-level decomposition of a video stream of frame size M×N consists of Q/2 processing modules, where Q is the number of input samples available to the structure in each clock cycle. The processing unit for each higher level of decomposition requires one-eighth as many processing modules as the processing unit for the preceding level. For a J-level 3-D DWT of a video stream, each of the proposed structures involves J processing units in a cascaded pipeline. The proposed structures …
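The module counts stated above form a geometric sequence: Q/2 at the first level, then one-eighth of the previous level at each further level. This is consistent with each 3-D decomposition level passing on only the low-pass subband, which has one-eighth as many samples as its input. The helper below (a hypothetical name) tabulates the counts; it assumes Q/2 is divisible by 8^(J−1) so every count is an integer, an assumption not stated in the abstract.

```python
def modules_per_level(Q, J):
    """Processing modules per decomposition level: Q/2 at level 1, then /8 per level."""
    if J < 1:
        raise ValueError("J must be at least 1")
    if Q % 2:
        raise ValueError("Q must be even")
    counts = [Q // 2]
    for _ in range(J - 1):
        if counts[-1] % 8:
            raise ValueError("Q/2 must be divisible by 8**(J-1)")
        counts.append(counts[-1] // 8)  # each level sees 1/8 of the previous level's samples
    return counts
```

For Q = 128 and J = 3 this gives 64, 8, and 1 modules. Because each level needs one-eighth of the previous, the total over any number of levels stays below (8/7)(Q/2) = 4Q/7.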

### An Arbitrary-length and Multiplierless DCT Algorithm and Systolic Implementation

Abstract

The discrete cosine transform (DCT) is an important tool in digital signal processing. In this paper, a novel algorithm is proposed to compute the DCT without multipliers. First, by modular mapping and a truncated Taylor series expansion, the DCT is expressed as a product of constants and discrete moments. Second, by applying appropriate bit and shift operations in the binary number system, these products are transformed into integer additions. Because the discrete moments can themselves be computed using only integer additions, the proposed algorithm involves only integer additions and shifts. An efficient and regular systolic array is designed to implement the proposed algorithm, and a complexity analysis is given. Unlike other fast cosine transforms, our algorithm can handle signals of arbitrary length and achieves high precision. The approach is also applicable to multidimensional DCTs and inverse DCTs. Index Terms: discrete cosine transform, moments, multiplierless, systolic arrays
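The abstract states that the discrete moments need only integer additions but does not give the moment definition. A minimal sketch, assuming the moments are produced by a cascade of accumulators (a common approach, not confirmed to be the authors' exact formulation): K cascaded prefix sums yield the binomially weighted moments B_r = Σ_i C(i + r − 1, r − 1) x[i] for r = 1..K without any multiplication. The function names are hypothetical, and the reference version with multiplications exists only for checking.

```python
from math import comb


def binomial_moments(x, K):
    """B_r = sum_i C(i + r - 1, r - 1) * x[i] for r = 1..K, using additions only.

    Each pass is one accumulator stage (an in-place prefix sum); the last
    accumulator value after pass r is B_r.
    """
    acc = list(reversed(x))  # feed samples last-to-first so the weights grow with i
    moments = []
    for _ in range(K):
        running = 0
        for j in range(len(acc)):
            running += acc[j]
            acc[j] = running
        moments.append(acc[-1])
    return moments


def binomial_moments_reference(x, K):
    """Direct evaluation with binomial-coefficient multiplications, for checking only."""
    return [sum(comb(i + r - 1, r - 1) * v for i, v in enumerate(x))
            for r in range(1, K + 1)]
```

Ordinary power moments Σ_i i^k x[i] can be recovered from these binomial moments by a fixed integer-coefficient linear transform, since each binomial weight is a polynomial in i of degree r − 1.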

### Optimization of memory based multiplication for LUT

Abstract

The multiplier uses LUTs as memory for its computations. The antisymmetric product coding (APC) and odd-multiple-storage (OMS) techniques were proposed for look-up-table (LUT) design and are used for efficient memory-based multiplication. When the APC approach is combined with the OMS technique, the two's-complement operations are greatly simplified, since the input address and the LUT output can always be transformed into odd integers; the combined approach reduces the LUT size to one-fourth of that of a conventional LUT. The proposed LUT design for small input sizes can be used for efficient implementation of high-precision multiplication by input operand decomposition. The LUT-based multiplier uses a 5-bit word size. Area and delay can both be improved; in particular, reducing the number of LUTs in the multiplier reduces its delay. Memory-based computing is well suited for many digital signal processing (DSP) algorithms that involve multiplication with a fixed set of coefficients.
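The one-fourth figure can be illustrated with a simplified model in the spirit of APC plus OMS; it is a sketch under stated assumptions, not the published circuit, whose address encoding and sign handling may differ. Assume a fixed coefficient A and a 5-bit unsigned input X written as X = 16 + d. Since A·X + A·(32 − X) = 32A, the products for X and 32 − X are antisymmetric about 16A, so only |d| ≤ 15 needs a stored product (APC). OMS then stores only the odd multiples A·k for k = 1, 3, …, 15 and recovers even multiples with a left shift, leaving 8 stored words instead of the 32 a conventional 5-bit LUT needs. The function names are hypothetical.

```python
def build_odd_lut(A):
    """OMS table: only A*k for odd k in 1..15 (8 words instead of 32)."""
    return {k: A * k for k in range(1, 16, 2)}


def lut_multiply(A, X, lut):
    """A*X for 5-bit unsigned X using the 8-word odd-multiple table, shifts, and one add/subtract."""
    if not 0 <= X < 32:
        raise ValueError("X must be a 5-bit unsigned integer")
    d = X - 16           # APC: X = 16 + d with -16 <= d <= 15
    base = A << 4        # 16*A via a shift
    mag = abs(d)
    if mag == 0:
        return base
    if mag == 16:        # X == 0 is the one input outside the stored range
        return 0
    shift = (mag & -mag).bit_length() - 1   # trailing zeros: mag = odd * 2**shift
    partial = lut[mag >> shift] << shift    # OMS: odd multiple, shifted back up
    return base + partial if d > 0 else base - partial
```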