Exploiting Inherent Parallelisms for Accelerating Linear Hough Transform
 IEEE Transactions on Image Processing
Cited by 3 (2 self)
Abstract: Accelerating the Hough transform in hardware has been of interest due to its popularity in real-time-capable image processing applications. In most existing linear Hough transform architectures, an edge map is serially read for processing, resulting in a total computation time of at least cycles. In this paper, we propose a novel parallel Hough transform computation method called the Additive Hough transform (AHT), wherein the image is divided using a grid to reduce the total computation time by a factor of . We have also proposed an efficient implementation of the AHT consisting of a lookup table (LUT) and two-operand adder arrays for every angle. Techniques to condense the LUT size have also been proposed to further reduce area utilization by as much as 50%. Our investigations based on employing an 8 × 8 grid show a 1000× speedup compared to existing architectures for a range of image sizes. An area-time trade-off analysis is presented to demonstrate that the area-time product of the proposed AHT-based implementation is at least 43% lower than that of other implementations reported in the literature. We have also included and characterized a hierarchical addition step in order to generate a global accumulation space equivalent to that of the conventional HT. It is shown that the proposed implementation with the hierarchical addition step remains superior to other methods in terms of both performance and area-time product metrics. Finally, we show that the proposed solution is equally efficient when applied to rectangular images.
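The abstract does not spell out how the AHT combines per-block results, so the following is only a sketch of the underlying idea and not the authors' design. It relies on one well-known fact: the line parameter rho = x cos(theta) + y sin(theta) is linear in the pixel coordinates, so the value for any pixel splits into a block-origin term plus an in-block offset term, joined by a single two-operand addition. The function names, the 8-pixel block size, the 64 × 64 test range, and the angle set are illustrative assumptions.

```python
import math

def rho(x, y, theta):
    """Normal-form line parameter for an edge pixel (x, y) at angle theta."""
    return x * math.cos(theta) + y * math.sin(theta)

def rho_additive(x, y, theta, block=8):
    """Same value, split into a block-origin term plus an in-block offset term.

    `block` is an assumed block size, not a value taken from the paper.
    """
    x0, dx = (x // block) * block, x % block   # block origin and local column
    y0, dy = (y // block) * block, y % block   # block origin and local row
    # The offset term depends only on (dx, dy, theta), so it could be
    # precomputed once in a small table and reused by every block.
    return rho(x0, y0, theta) + rho(dx, dy, theta)

# Compare both forms over an assumed 64 x 64 image and a coarse angle set.
thetas = [math.radians(a) for a in range(0, 180, 15)]
max_err = max(
    abs(rho(x, y, t) - rho_additive(x, y, t))
    for x in range(64) for y in range(64) for t in thetas
)
```

Here `max_err` stays at floating-point rounding level, which is why a per-angle table of in-block offsets plus an adder is enough to recover the conventional parameter value in this sketch.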
High throughput memory-based architecture for DHT using a new convolutional formulation
 IEEE Trans. Circuits Syst. II, Exp. Briefs, 2007
Cited by 2 (0 self)
Abstract: A new formulation is presented for the computation of an N-point DHT from two pairs of [(N/2 − 1)/2]-point cyclic convolutions, and is further used to obtain modular structures consisting of simple and regular memory-based systolic arrays for concurrent pipelined realization of the DHT. The proposed structures for direct-memory-based implementation are found to involve nearly the same hardware complexity as the existing structures, but offer two to four times more throughput and two to four times less latency than the others. The distributed-arithmetic (DA)-based implementation is also found to offer much lower memory complexity and considerably lower area-delay complexity than the existing DA-based structures.
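For orientation, the transform being accelerated can be written down directly from its textbook definition, H[k] = sum over n of x[n] · cas(2πnk/N) with cas = cos + sin. The sketch below is a plain O(N^2) software reference, not the cyclic-convolution formulation from the abstract, and the function name and sample signal are assumptions made for illustration.

```python
import math

def dht(x):
    """Direct O(N^2) discrete Hartley transform using the cas kernel."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for i, xi in enumerate(x):
            angle = 2.0 * math.pi * i * k / n
            acc += xi * (math.cos(angle) + math.sin(angle))  # cas = cos + sin
        out.append(acc)
    return out

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]   # arbitrary example input
spectrum = dht(signal)
# The DHT is its own inverse up to a factor of N.
recovered = [v / len(signal) for v in dht(spectrum)]
```

A reference like this is mainly useful as a check against which a faster or hardware-oriented formulation can be compared.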
Systolic and Super-Systolic Multipliers for Finite Field GF(2^m) Based on Irreducible Trinomials
 Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers
Abstract: Novel systolic and super-systolic architectures are presented for polynomial basis multiplication over GF(2^m) based on irreducible trinomials. By suitable cut-set retiming, we have derived an efficient bit-level-pipelined bit-parallel systolic design for binary field multiplication which requires fewer gates and registers, and involves nearly half the time complexity of the corresponding existing design. We have also suggested a digit-level-pipelined design, which involves lower latency and fewer registers than the bit-level-pipelined structure. Moreover, we have proposed a super-systolic design consisting of a set of systolic arrays in a systolic pipeline, and a pipelined systolic-block design consisting of pipelined blocks of concurrent systolic arrays. The super-systolic designs have the same average computation time and the same critical path as the proposed bit-level-pipelined design, but can be used to reduce the latency by a factor O( m) at the cost of a marginally higher number of XOR gates and bit registers. The hardware complexities of the proposed super-systolic designs are nearly three times that of the existing bit-parallel structures, but they offer very high throughput compared with the others for large values of m. For the field orders m = 233 and m = 409, the proposed structures offer, respectively, ten and eleven times more throughput than the others. Index Terms: Finite field, Galois field, finite field multiplication, elliptic curve cryptography, error control coding, irreducible
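As a point of reference for what these multipliers compute, here is a minimal bit-serial software sketch of polynomial-basis multiplication modulo a trinomial x^m + x^k + 1. It is not the systolic or super-systolic design described in the abstract. The function name is hypothetical, and the abstract does not say which trinomial is used for m = 233; the check below assumes x^233 + x^74 + 1, the reduction polynomial of the NIST binary curves of that size.

```python
def gf2m_mul(a, b, m, k):
    """Multiply a and b in GF(2^m) modulo the trinomial x^m + x^k + 1.

    Field elements are Python ints whose bit i is the coefficient of x^i;
    both inputs are expected to be below 2**m.
    """
    poly_low = (1 << k) | 1          # x^k + 1, the reduced form of x^m
    mask = (1 << m) - 1
    result = 0
    for i in reversed(range(m)):     # scan b from its top coefficient down
        # Multiply the running result by x, folding any x^m term back in.
        result <<= 1
        if result >> m:
            result = (result & mask) ^ poly_low
        if (b >> i) & 1:
            result ^= a              # addition in GF(2) is XOR
    return result

# In GF(2^3) with x^3 + x + 1: x * x^2 = x^3 = x + 1.
small = gf2m_mul(0b010, 0b100, m=3, k=1)

# Assumed trinomial for m = 233: x^233 + x^74 + 1.
u = (1 << 232) | 0x1234567
v = (1 << 200) | 0xABCDEF
commutes = gf2m_mul(u, v, 233, 74) == gf2m_mul(v, u, 233, 74)
```

The trinomial form is what keeps the reduction step to two XOR terms, which is the property hardware designs of this kind typically exploit.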
An Analog Architecture for Split-Radix DHT
The fast Hartley transform algorithm for the DHT was introduced by Bracewell. The split-radix decimation-in-frequency algorithm, which requires fewer operations than the radix-2 and radix-4 algorithms, was developed by Sorensen et al. In this paper, an analog architecture for a split-radix decimation-in-time algorithm is proposed. It uses three different structures in the signal flow diagram, exhibits a recursive pattern, and is modular. The analog architecture is validated by simulation in OrCAD PSpice.
unknown title
Abstract: This paper presents a fast split-radix-(2×2)/(8×8) algorithm for computing the two-dimensional (2D) discrete Hartley transform (DHT) of length N×N with N = q·2^m, where q is an odd integer. The proposed algorithm decomposes an N×N DHT into one N/2×N/2 DHT and forty-eight N/8×N/8 DHTs. It reduces the number of arithmetic operations, data transfers, and twiddle factors compared with the split-radix-(2×2)/(4×4) algorithm. Moreover, its expression in simple matrices leads to an easy implementation. When both algorithms are implemented in hardware with a fully parallel structure, the proposed algorithm appears to decrease the area complexity relative to the split-radix-(2×2)/(4×4) algorithm, at the cost of slightly higher time complexity. An application of the proposed algorithm to 2D medical image compression is also provided. Index Terms: Two-dimensional (2D) discrete Hartley transform (DHT), split-radix, fast algorithm
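To make the object of the fast algorithm concrete, the sketch below evaluates an N×N DHT by brute force and cross-checks it against the FFT identity DHT = Re(F) − Im(F) for real input. It assumes the usual non-separable kernel cas(2π(nk + ml)/N), which the abstract does not state explicitly, and it does not implement the split-radix decomposition itself. The function names and the 12×12 random test image are illustrative.

```python
import numpy as np

def dht2_direct(x):
    """Brute-force N x N DHT with the kernel cas(2*pi*(n*k + m*l)/N)."""
    n = x.shape[0]
    idx = np.arange(n)
    out = np.empty((n, n))
    for k in range(n):
        for l in range(n):
            phase = 2.0 * np.pi * (idx[:, None] * k + idx[None, :] * l) / n
            out[k, l] = np.sum(x * (np.cos(phase) + np.sin(phase)))
    return out

def dht2_fft(x):
    """Same transform for real input, obtained from the 2D FFT."""
    f = np.fft.fft2(x)
    return f.real - f.imag

rng = np.random.default_rng(0)
image = rng.standard_normal((12, 12))   # N = 12 = 3 * 2**2, i.e. q = 3, m = 2
agree = np.allclose(dht2_direct(image), dht2_fft(image))
```

The size N = 12 is chosen only because it has the form q·2^m with odd q, matching the lengths the abstract targets.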
Modular delay Commutator for DHT algorithm
Abstract: In this paper, a new DHT algorithm that is well suited for VLSI implementation on a highly parallel and modular architecture is proposed. It can be used for designing a completely novel VLSI architecture for the DHT. The proposed discrete Hartley transform (DHT) algorithm can be efficiently implemented on a highly modular and parallel VLSI architecture with a regular structure. It can be efficiently split into several parallel parts that can be executed concurrently. Moreover, the proposed algorithm is well suited to the subexpression sharing technique, which can be used to significantly reduce the hardware complexity of the highly parallel VLSI implementation. Because multipliers with the same constant can be shared efficiently, the number of multipliers is significantly reduced and is very small compared with that of existing algorithms.