Results 1-10 of 50
Faster Integer Multiplication
STOC'07, 2007
Cited by 89 (0 self)
For more than 35 years, the fastest known method for integer multiplication has been the Schönhage-Strassen algorithm running in time O(n log n log log n). Under certain restrictive conditions there is a corresponding Ω(n log n) lower bound. The prevailing conjecture has always been that the complexity of an optimal algorithm is Θ(n log n). We present a major step towards closing the gap from above by presenting an algorithm running in time n log n · 2^{O(log* n)}. The main result is for boolean circuits as well as for multitape Turing machines, but it has consequences for other models of computation as well.
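As a rough illustration of why fast Fourier transforms matter for integer multiplication, the sketch below multiplies two non-negative integers by convolving their decimal digits with a textbook radix-2 FFT. It is not the algorithm of the paper, nor Schönhage-Strassen itself: it uses floating-point complex arithmetic, so it is only reliable for moderately sized inputs, and the function names `fft` and `multiply` are chosen here for illustration.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def multiply(x, y):
    """Multiply two non-negative integers via FFT-based digit convolution."""
    a = [int(d) for d in str(x)[::-1]]  # least significant digit first
    b = [int(d) for d in str(y)[::-1]]
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    prod = fft([p * q for p, q in zip(fa, fb)], invert=True)
    # The unnormalized inverse transform needs a division by n.
    coeffs = [round(c.real / n) for c in prod]
    # Summing coeff * 10**i performs the carry propagation implicitly.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))
```

The transform-multiply-inverse pattern is what the asymptotically fast methods share; they differ in the ring used for the transform and in how the recursion is organized.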
Algebraic signal processing theory: Cooley-Tukey type algorithms for real DFTs
IEEE Transactions on Signal Processing, 2009
Cited by 13 (3 self)
In this paper we systematically derive a large class of fast general-radix algorithms for various types of real discrete Fourier transforms (real DFTs), including the discrete Hartley transform (DHT), based on the algebraic signal processing theory. This means that instead of manipulating the transform definition, we derive algorithms by manipulating the polynomial algebras underlying the transforms using one general method. The same method yields the well-known Cooley-Tukey fast Fourier transform (FFT) as well as general-radix discrete cosine and sine transform algorithms. The algebraic approach makes the derivation concise, unifies and classifies many existing algorithms, yields new variants, enables structural optimization, and naturally produces a human-readable structural algorithm representation based on the Kronecker product formalism. We show, for the first time, that the general-radix Cooley-Tukey and the lesser known Bruun algorithms are instances of the same generic algorithm. Further, we show that this generic algorithm can be instantiated for all four types of the real DFT and the DHT.
Orthogonal Frequency-Division Multiplexing for Optical Communications
2011
Cited by 11 (0 self)
We discuss the use of orthogonal frequency-division multiplexing (OFDM) for combating group-velocity dispersion (GVD) effects in amplified direct-detection (DD) systems using single-mode fiber. We review known OFDM techniques, including asymmetrically clipped optical OFDM (ACO-OFDM), DC-clipped OFDM (DC-OFDM) and single-sideband OFDM (SSB-OFDM), and derive a linearized channel model for each technique. We present an iterative procedure to achieve optimum power allocation for each OFDM technique, since there is no closed-form solution for amplified DD systems. For each technique, we minimize the optical power required to transmit at a given bit rate and normalized GVD by iteratively adjusting the bias and optimizing the power allocation among the subcarriers. We verify that SSB-OFDM has the best optical power efficiency among the different OFDM techniques. We compare these OFDM techniques to on-off keying (OOK) with maximum-likelihood sequence detection (MLSD) and show that SSB-OFDM can achieve the same optical power efficiency as OOK with MLSD, but at the cost of requiring twice the electrical bandwidth and also a complex quadrature modulator. We compare the computational complexity of the different techniques and show that SSB-OFDM requires fewer operations per bit than OOK with MLSD. Index Terms: communications system performance, direct detection, group-velocity dispersion, intensity modulation, maximum-likelihood sequence detection, maximum-likelihood sequence estimation, multicarrier optical systems, orthogonal frequency-division multiplexing.
Peak-to-average power ratio reduction of OFDM systems using cross entropy method
In 17th Int. Conf. on Wireless Commun., 2005
Cited by 5 (4 self)
How to write fast numerical code: A Small Introduction
In Summer School on Generative and Transformational Techniques in Software Engineering, volume 5235 of LNCS, 2008
Efficient Dealiased Convolutions Without Padding
Cited by 4 (3 self)
Algorithms are developed for calculating dealiased linear convolution sums without the expense of conventional zero-padding or phase-shift techniques. For one-dimensional in-place convolutions, the memory requirements are identical with the zero-padding technique, with the important distinction that the additional work memory need not be contiguous with the input data. This decoupling of data and work arrays dramatically reduces the memory and computation time required to evaluate higher-dimensional in-place convolutions. The technique also allows one to dealias the higher-order convolutions that arise from Fourier transforming cubic and higher powers. Implicitly dealiased convolutions can be built on top of state-of-the-art fast Fourier transform libraries: vectorized multidimensional implementations for the complex and centered Hermitian (pseudospectral) cases have been implemented in the open-source software FFTW++.
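For context, the following minimal sketch shows the conventional explicit zero-padding baseline that the abstract contrasts itself with, not the implicit method of the paper and not the FFTW++ API. A transform-based product of two length-N sequences yields a cyclic convolution whose tail wraps around (aliasing); padding both inputs with N zeros recovers the linear convolution. A naive O(N²) DFT keeps the example self-contained, and all function names are illustrative.

```python
import cmath


def dft(x, sign=-1):
    """Naive O(n^2) discrete Fourier transform (sign=+1 gives the unnormalized inverse)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def cyclic_convolve(f, g):
    """Cyclic convolution via pointwise multiplication in Fourier space."""
    n = len(f)
    F, G = dft(f), dft(g)
    h = dft([a * b for a, b in zip(F, G)], sign=+1)
    return [(v / n).real for v in h]


def padded_convolve(f, g):
    """Linear (dealiased) convolution obtained by explicit zero-padding to length 2N."""
    n = len(f)
    full = cyclic_convolve(list(f) + [0] * n, list(g) + [0] * n)
    return full[: 2 * n - 1]
```

For f = [1, 2, 3] and g = [4, 5, 6], `cyclic_convolve` returns approximately [31, 31, 28] (the aliased result), while `padded_convolve` returns approximately [4, 13, 28, 27, 18], up to floating-point rounding.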
Type-II/III DCT/DST algorithms with reduced number of arithmetic operations
2009
Cited by 4 (1 self)
We present algorithms for the discrete cosine transform (DCT) and discrete sine transform (DST), of types II and III, that achieve a lower count of real multiplications and additions than previously published algorithms, without sacrificing numerical accuracy. Asymptotically, the operation count is reduced from 2N log₂ N + O(N) to (17/9) N log₂ N + O(N) for a power-of-two transform size N. Furthermore, we show that an additional N multiplications may be saved by a certain rescaling of the inputs or outputs, generalizing a well-known technique for N = 8 by Arai et al. These results are derived by considering the DCT to be a special case of a DFT of length 4N, with certain symmetries, and then pruning redundant operations from a recent improved fast Fourier transform algorithm (based on a recursive rescaling of the conjugate-pair split-radix algorithm). The improved algorithms for the DCT-III, DST-II, and DST-III follow immediately from the improved count for the DCT-II.
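The remark that the DCT is a special case of a DFT of length 4N can be checked numerically. The sketch below assumes the unnormalized DCT-II convention X_k = Σ x_n cos(π(2n+1)k/(2N)) and one standard embedding: place the inputs at the odd indices of a length-4N real-even sequence and leave the even indices zero. It uses a naive DFT and performs none of the pruning described in the abstract; the function names are illustrative.

```python
import cmath
import math


def dft(y):
    """Naive O(n^2) forward DFT."""
    n = len(y)
    return [
        sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dct2_direct(x):
    """Unnormalized DCT-II computed straight from its definition."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def dct2_via_dft(x):
    """DCT-II obtained from a length-4N DFT of a real-even, odd-indexed sequence."""
    N = len(x)
    y = [0.0] * (4 * N)
    for n, v in enumerate(x):
        y[2 * n + 1] = v              # inputs at odd indices
        y[4 * N - (2 * n + 1)] = v    # mirrored copy makes y even-symmetric
    Y = dft(y)
    # Each input appears twice, so the DFT output equals 2 * X_k.
    return [Y[k].real / 2 for k in range(N)]
```

Because three quarters of the length-4N input is either zero or a mirror image, a fast algorithm can skip most of the work, which is the redundancy the pruning exploits.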
The tangent FFT
In Boztas and Lu, 2007
Cited by 3 (1 self)
The split-radix FFT computes a size-n complex DFT, when n is a large power of 2, using just 4n lg n − 6n + 8 arithmetic operations on real numbers. This operation count was first announced in 1968, stood unchallenged for more than thirty years, and was widely believed to be best possible. Recently James Van Buskirk posted software demonstrating that the split-radix FFT is not optimal. Van Buskirk’s software computes a size-n complex DFT using only (34/9 + o(1))n lg n arithmetic operations on real numbers. There are now three papers attempting to explain the improvement from 4 to 34/9: Johnson and Frigo, IEEE Transactions on Signal Processing, 2007; Lundy and Van Buskirk, Computing, 2007; and this paper. This paper presents the “tangent FFT,” a straightforward in-place cache-friendly DFT algorithm having exactly the same operation counts as Van Buskirk’s algorithm. This paper expresses the tangent FFT as a sequence of standard polynomial operations, and pinpoints how the tangent FFT saves time compared to the split-radix FFT. This description is helpful not only for understanding and analyzing Van Buskirk’s improvement but also for minimizing the memory-access costs of the FFT.
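To put the two counts in perspective, this small sketch evaluates the split-radix formula 4n lg n − 6n + 8 quoted above and compares only the leading coefficients 4 and 34/9. The abstract does not give the lower-order terms of the improved count, so none are assumed here; the function names are illustrative.

```python
from fractions import Fraction


def split_radix_ops(n):
    """Real-operation count 4n lg n - 6n + 8 for n a power of two, n >= 2."""
    lg = n.bit_length() - 1  # exact base-2 logarithm for powers of two
    return 4 * n * lg - 6 * n + 8


def leading_term_saving():
    """Asymptotic fraction of operations saved when the coefficient drops from 4 to 34/9."""
    return 1 - Fraction(34, 9) / 4


if __name__ == "__main__":
    for k in (3, 6, 10, 20):
        print(f"n = 2^{k}: split-radix ops = {split_radix_ops(2 ** k)}")
    print("asymptotic saving:", leading_term_saving())
```

The asymptotic saving works out to 1/18, or roughly 5.6 percent of the leading term.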
On the fixed-point accuracy analysis of FFT algorithms
IEEE Trans. Signal Process., 2008
Cited by 3 (0 self)
In this paper, we investigate the effect of fixed-point arithmetic with limited precision for different fast Fourier transform (FFT) algorithms. A matrix representation of the error propagation model is proposed to analyze the rounding effect. An analytic expression of the overall quantization loss due to the arithmetic quantization errors is derived to compare the performance of decimation-in-time (DIT) and decimation-in-frequency (DIF) configurations. From the simulation results, the radix-2 DIT FFT algorithm has better accuracy in terms of signal-to-quantization-noise ratio (SQNR). Based on the results, a simple criterion for wordlength optimization is proposed to yield comparable accuracy with a smaller bit budget.
The fast Fourier transform and fast wavelet transform for patterns on the torus
Cited by 3 (1 self)
We introduce a fast Fourier transform on regular d-dimensional lattices. We investigate properties of congruence class representatives, i.e. their ordering, to classify directions and derive a Cooley-Tukey algorithm. Besides the fast Fourier technique itself, this transform has the advantage that it can be parallelized efficiently, yielding faster versions than the one-dimensional Fourier transform. These properties of the lattice can further be used to perform a fast multivariate wavelet decomposition, where the wavelets are given as trigonometric polynomials. Furthermore, the preferred directions of the decomposition itself can be characterised.