16 citations found.
A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.


This paper is cited in the following contexts:
Parallel Algorithm Scalability Issues in Petaflops.. - Grama, Gupta, Han, Kumar (2000)   (1 citation)

....algorithm. As the analysis of Sections 3.3.1 and 3.3.2 will show, each of these formulations minimizes the cost due to one of these constants. 3.3.1 The Parallel Binary exchange Algorithm. In the most commonly used mapping that minimizes communication for the binary exchange algorithm [15, 23, 27, 28, 29], if $(b_0 b_1 \cdots b_{r-1})$ is the binary representation of $i$, then for all $i$, $R[i]$ and $S[i]$ are mapped to processor number $(b_0 \cdots b_{d-1})$. With this mapping, processors need to communicate with each other in the first $d$ iterations of the main loop (starting at line 3) of the algorithm. For ....

A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581-591, 1987.
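
The data mapping quoted in the excerpt above can be illustrated with a small sketch. It assumes that b_0 is the most significant bit of the r-bit index i and that iteration l (counted from 0) pairs elements whose indices differ only in bit b_l; neither convention is spelled out in the excerpt, and the function names are hypothetical. Under those assumptions, only the first d iterations pair elements held by different processors.

```python
def owner(i, r, d):
    """Processor holding element i: the d most significant of its r index bits."""
    return i >> (r - d)


def partner(i, r, l):
    """Index paired with i in iteration l (assumed: flip bit b_l, b_0 most significant)."""
    return i ^ (1 << (r - 1 - l))


def communicating_iterations(r, d):
    """Iterations in which at least one pair of elements lives on two different processors."""
    n = 1 << r
    return [
        l
        for l in range(r)
        if any(owner(i, r, d) != owner(partner(i, r, l), r, d) for i in range(n))
    ]


if __name__ == "__main__":
    # 16 points (r = 4) on 4 processors (d = 2)
    print(communicating_iterations(4, 2))
```

For iterations l >= d the flipped bit lies outside the d bits that select the processor, so both elements of every pair are local.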


Application-Specific Architecture For Fast Transforms Based On.. - Argüello (1994)

.... There are numerous interconnection networks for the PE column which permit an efficient use of this spatial parallelism in each stage: shuffle exchange [75,72,60,89,36], shift and replace [68], hypercube [35,44,92], indirect binary n-cube [62], cube connected cycles [63], mesh [44,52], among others [6,13,28,39,46,73,78,84,42,43,58]. We can combine both approaches by constructing a rectangular array made up of PE column pipelines [72]. Consequently, the appropriate architecture for the algorithms based on the successive doubling method will be a rectangular array made up of $\log_r N$ columns of $N/r$ processors, connected so as ....

A. Norton and A. J. Silberger, "Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures", IEEE Trans. on Computers, Vol. C-36, No. 5, pp. 581-591, 1987.


The Scalability of FFT on Parallel Computers - Gupta, Kumar (1993)   (25 citations)

....applications. Some of the applications of the FFT algorithm include Time Series and Wave Analysis, solving Linear Partial Differential Equations, Convolution, Digital Signal Processing and Image Filtering, etc. Hence, there has been a great interest in implementing FFT on parallel computers [4, 6, 11, 14, 21, 32, 41, 5]. In this paper we analyze the scalability of the parallel FFT algorithm on mesh and hypercube connected multicomputers. We also present experimental performance results on a 1024 processor nCUBE1™ multicomputer to support our analytical results. The scalability of a parallel algorithm on ....

....on a larger number of processors. The scalability analysis of FFT on hypercube provides several important insights. On the hypercube architecture, a commonly used parallel formulation of the FFT algorithm (which we shall refer to as the binary exchange algorithm in the rest of the paper) [3, 4, 6, 11, 21, 32, 41, 36, 31] can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in problem size. This is not surprising in the light of the fact that the FFT computation maps naturally to the hypercube architecture [35]. However, there is a limit on the achievable ....

[Article contains additional citation context not shown here]

A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581--591, 1987.


Analysis and Design of Scalable Parallel Algorithms for Scientific .. - Gupta (1995)   (2 citations)

....applications. Some of the applications of the FFT algorithm include Time Series and Wave Analysis, solving Linear Partial Differential Equations, Convolution, Digital Signal Processing and Image Filtering, etc. Hence, there has been a great interest in implementing FFT on parallel computers [11, 17, 29, 56, 72, 106, 133, 13, 74, 19, 25, 3]. 3.1.1 The FFT Algorithm. Figure 3.1 outlines the serial Cooley-Tukey algorithm for an n point single dimensional unordered radix 2 FFT adapted from [4, 116]. X is the input vector of length $n$ ($n = 2^r$ for some integer $r$) and Y is its Fourier Transform. # k denotes the complex number e j ....

.... $b_0\, 0 \cdots 0)$ $S[(b_0 \cdots b_{l-1}\, 1\, b_{l+1} \cdots b_{r-1})]$ 9. end; 10. end; 11. end. Figure 3.1: The Cooley-Tukey algorithm for single dimensional unordered FFT. The Binary Exchange Algorithm. In the most commonly used mapping that minimizes communication for the binary exchange algorithm [81, 5, 11, 17, 29, 72, 106, 133, 116, 94], if $(b_0 b_1 \cdots b_{r-1})$ is the binary representation of $i$, then for all $i$, $R[i]$ and $S[i]$ are mapped to processor number $(b_0 \cdots b_{d-1})$. With this mapping, processors need to communicate with each other in the first $d$ iterations of the main loop (starting at line 3) of the algorithm. ....

A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581--591, 1987.


Optimum Complexity FFT Algorithms for RISC Processors - Karner, Auer, Ueberhuber (1998)

....Frameworks for the Fast Fourier Transform. In the twenty five years between the publications of Pease [16] and Van Loan [20] only a few authors used this powerful technique: Temperton [18] and Johnson et al. [9] for FFT implementations on classic vector computers and Norton and Silberger [14] on parallel computers with MIMD architecture. Recently, Gupta [6], [7] and Pitsianis [17] used the Kronecker product formalism to synthesize FFT programs. As a consequence, the Kronecker product approach to FFT algorithm design antiquates more conventional techniques like signal flow graphs. ....

A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.


Top Speed FFTs for FMA Architectures - Karner, Auer, Ueberhuber (1998)

....in his 1992 presentation of FFT algorithms. In the twenty five years between the publications of Pease [16] and Van Loan [21] only a few authors used this powerful technique: Temperton [19] and Johnson et al. [8] for FFT implementations on classic vector computers and Norton and Silberger [15] on parallel computers with MIMD architecture. Recently, Gupta [6] and Pitsianis [17] used Kronecker product formulations to synthesize FFT programs. The Kronecker product approach makes it easy to modify FFT algorithms by exploiting the underlying algebraic structure of its matrix representation. ....

A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.


Programming Techniques for Eagersharing Distributed Memory Systems - Li

....Analysis Analysis and prediction of the performance of multiprocessor systems are complex tasks [19, 46] since many factors are involved. There are two different levels at which analysis can be carried out. One is the architectural, or system level [39, 11] and the other is the application level [19, 12, 4, 50, 41, 51]. Analysis at the system level studies system architecture in detail, then evaluates and predicts overall system performance in terms of queuing, clock cycles, and tasks. Analysis at the application level chooses some application problem and predicts its performance on a given system. In this ....

....because it has very exacting requirements for data sharing during execution. Parallel FFT algorithms have been studied in Pease's pioneering paper [43] and a more recent book [27]. The performance of the FFT algorithm on shared memory systems and message passing systems has been studied extensively [41, 19, 9, 17, 52]. The analysis in this subsection will concentrate on implementations for eager DSM systems. In this discussion, for simplicity, a one dimensional FFT algorithm is considered. The one dimensional Fourier transform of a sequence of $M$ ($= 2^L$) complex numbers (x ....

Alan Norton and Allan J. Silberger. Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures. IEEE Transactions on Computers, C-36(5):581--591, May 1987.


Parallel FFT Algorithms with Reduced Communication Overhead - Karner, Ueberhuber (1998)

....Frameworks for the Fast Fourier Transform. In the twenty five years between the publications of Pease [15] and Van Loan [19] only a few authors used this powerful technique: Temperton [18] and Johnson et al. [11] for FFT implementations on classic vector computers and Norton and Silberger [14] on parallel computers with MIMD architecture. Recently, Gupta [9] and Pitsianis [16] used the Kronecker product formalism to synthesize FFT programs. As a consequence, the Kronecker product approach to FFT algorithm design antiquates more conventional techniques like signal flow graphs. Signal ....

A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.


Parallel Computation of 2-D Continuous Wavelet Transforms - Misra

....log is used for $\log_2$. in stage $i$ is between two nodes that are at a distance of $N/2^i$ ($1 \le i \le \log N$) from each other. The sequential complexity of this computation is therefore $O(N \log N)$. However, the regular structure of the computation makes it very amenable to parallel implementation [1, 8, 21, 22, 25]. As described above, the 2D FFT can be computed by first computing 1D FFTs along the rows of the array, followed by 1D FFTs along the columns of the resulting array. This leads to a sequential time complexity of $O(N^2 \log N)$. 4 Implementation on 2D Meshes. The 2D mesh (Figure 3) presents us ....

A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581--591, 1987.
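
The row-column decomposition mentioned in the excerpt above (1D transforms along the rows, then along the columns of the result) can be checked with a minimal sketch. For brevity it uses a direct O(n^2) DFT as a stand-in for the 1D FFT, so it illustrates the decomposition only, not the stated complexity; all function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct 1D DFT (stand-in for a 1D FFT; same result, higher cost)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft2_row_column(a):
    """2D DFT computed as 1D transforms of the rows, then of the columns."""
    rows = [dft(list(row)) for row in a]            # transform every row
    cols = [dft(list(col)) for col in zip(*rows)]   # transform every column of the result
    return [list(row) for row in zip(*cols)]        # transpose back to row-major layout


def dft2_direct(a):
    """2D DFT straight from the definition, used only as a reference."""
    n, m = len(a), len(a[0])
    return [
        [
            sum(
                a[p][q] * cmath.exp(-2j * cmath.pi * (u * p / n + v * q / m))
                for p in range(n)
                for q in range(m)
            )
            for v in range(m)
        ]
        for u in range(n)
    ]


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    fast, ref = dft2_row_column(a), dft2_direct(a)
    print(max(abs(fast[u][v] - ref[u][v]) for u in range(2) for v in range(3)))
```

Because each row transform is independent of the others (and likewise each column transform), the two passes are natural units of parallel work.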


Parallel FFT Algorithms with Reduced Communication Overhead - Karner, Ueberhuber (1998)

No context found.

A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.


Top Speed FFTs for FMA Architectures - Karner, Auer, Ueberhuber (1998)

No context found.

A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.


Low Communication FFTs - Franchetti, Lorenz, Ueberhuber (2002)

No context found.

A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581-591.


Optimum Complexity FFT Algorithms for RISC Processors - Karner, Auer, Ueberhuber (1998)

No context found.

A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.


Parallel 1D-FFT Computation on Constant-valence Multicomputers - Mazzeo, Villano (1995)

No context found.

A. Norton and A. J. Silberger, `Parallelization and performance analysis of the Cooley--Tukey FFT algorithm for shared-memory architectures', IEEE Trans. Comp., C-36, 581--591 (1987).


A New Fast Discrete Fourier Transform - Zhou

No context found.

A. Norton and A. J. Silberger, "Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-memory Architectures", IEEE Trans. Comp., Vol. C-36, No. 5, pp. 581-591, May 1987.


Computation of 2-D Wavelet Transforms on the Connection Machine-2 - Misra, Nichols

No context found.

A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581--591, 1987.


CiteSeer.IST - Copyright Penn State and NEC