A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.
....algorithm. As the analysis of Sections 3.3.1 and 3.3.2 will show, each of these formulations minimizes the cost due to one of these constants. 3.3.1 The Parallel Binary Exchange Algorithm. In the most commonly used mapping that minimizes communication for the binary exchange algorithm [15, 23, 27, 28, 29], if (b_0 b_1 ... b_{r-1}) is the binary representation of i, then for all i, R[i] and S[i] are mapped to processor number (b_0 ... b_{d-1}). With this mapping, processors need to communicate with each other in the first d iterations of the main loop (starting at line 3) of the algorithm. For ....
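The mapping described in this excerpt can be illustrated with a minimal Python sketch. It assumes that b_0 is the most significant bit of the r-bit index i and that iteration l of the main loop pairs each element with the one whose bit b_l is flipped; neither convention is stated in the excerpt. The function names `owner`, `partner`, and `communicating_iterations` are hypothetical and do not come from the cited papers.

```python
def owner(i, r, d):
    """Processor holding element i: the d most significant of its r bits
    (assumes b_0 is the most significant bit)."""
    return i >> (r - d)


def partner(i, r, l):
    """Index paired with i in iteration l: same bits except b_l flipped
    (assumed pairing rule, with b_0 the most significant bit)."""
    return i ^ (1 << (r - 1 - l))


def communicating_iterations(r, d):
    """Iterations in which at least one pair spans two different processors."""
    n = 1 << r
    return [
        l
        for l in range(r)
        if any(owner(i, r, d) != owner(partner(i, r, l), r, d) for i in range(n))
    ]


if __name__ == "__main__":
    # 16 elements (r = 4) on 4 processors (d = 2)
    print(communicating_iterations(4, 2))
```

Under these assumptions, only iterations 0 through d-1 flip a bit that is part of the processor number, which agrees with the excerpt's statement that communication is confined to the first d iterations.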
A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581-591, 1987.
.... There are numerous interconnection networks for the PE column which permit an efficient use of this spatial parallelism in each stage: shuffle exchange [75,72,60,89,36], shift and replace [68], hypercube [35,44,92], indirect binary n-cube [62], cube connected cycles [63], mesh [44,52], among others [6,13,28,39,46,73,78,84,42 43,58]. We can combine both approaches by constructing a rectangular array made up of PE column pipelines [72]. Consequently, the appropriate architecture for the algorithms based on the successive doubling method will be a rectangular array made up of log_r N columns of N/r processors, connected so as ....
A. Norton and A. J. Silberger, "Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures", IEEE Trans. on Computers, Vol. C-36, No. 5, pp. 581-591, 1987.
....applications. Some of the applications of the FFT algorithm include Time Series and Wave Analysis, solving Linear Partial Differential Equations, Convolution, Digital Signal Processing and Image Filtering, etc. Hence, there has been a great interest in implementing FFT on parallel computers [4, 6, 11, 14, 21, 32, 41, 5]. In this paper we analyze the scalability of the parallel FFT algorithm on mesh and hypercube connected multicomputers. We also present experimental performance results on a 1024 processor nCUBE1™ multicomputer to support our analytical results. The scalability of a parallel algorithm on ....
....on a larger number of processors. The scalability analysis of FFT on hypercube provides several important insights. On the hypercube architecture, a commonly used parallel formulation of the FFT algorithm (which we shall refer to as the binary exchange algorithm in the rest of the paper) [3, 4, 6, 11, 21, 32, 41, 36, 31] can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in problem size. This is not surprising in the light of the fact that the FFT computation maps naturally to the hypercube architecture [35]. However, there is a limit on the achievable ....
A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581--591, 1987.
....applications. Some of the applications of the FFT algorithm include Time Series and Wave Analysis, solving Linear Partial Differential Equations, Convolution, Digital Signal Processing and Image Filtering, etc. Hence, there has been a great interest in implementing FFT on parallel computers [11, 17, 29, 56, 72, 106, 133, 13, 74, 19, 25, 3]. 3.1.1 The FFT Algorithm. Figure 3.1 outlines the serial Cooley-Tukey algorithm for an n-point single-dimensional unordered radix-2 FFT adapted from [4, 116]. X is the input vector of length n (n = 2^r for some integer r) and Y is its Fourier Transform. # k denotes the complex number e j ....
.... b 0 0 0) S[ b 0 b l 1 1b l 1 b r 1 ) 9. end; 10. end; 11. end. Figure 3.1: The Cooley-Tukey algorithm for single-dimensional unordered FFT. The Binary Exchange Algorithm. In the most commonly used mapping that minimizes communication for the binary exchange algorithm [81, 5, 11, 17, 29, 72, 106, 133, 116, 94], if (b_0 b_1 ... b_{r-1}) is the binary representation of i, then for all i, R[i] and S[i] are mapped to processor number (b_0 ... b_{d-1}). With this mapping, processors need to communicate with each other in the first d iterations of the main loop (starting at line 3) of the algorithm. ....
A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581--591, 1987.
....Frameworks for the Fast Fourier Transform. In the twenty-five years between the publications of Pease [16] and Van Loan [20] only a few authors used this powerful technique: Temperton [18] and Johnson et al. [9] for FFT implementations on classic vector computers and Norton and Silberger [14] on parallel computers with MIMD architecture. Recently, Gupta [6, 7] and Pitsianis [17] used the Kronecker product formalism to synthesize FFT programs. As a consequence, the Kronecker product approach to FFT algorithm design antiquates more conventional techniques like signal flow graphs. ....
A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.
....in his 1992 presentation of FFT algorithms. In the twenty-five years between the publications of Pease [16] and Van Loan [21] only a few authors used this powerful technique: Temperton [19] and Johnson et al. [8] for FFT implementations on classic vector computers and Norton and Silberger [15] on parallel computers with MIMD architecture. Recently, Gupta [6] and Pitsianis [17] used Kronecker product formulations to synthesize FFT programs. The Kronecker product approach makes it easy to modify FFT algorithms by exploiting the underlying algebraic structure of its matrix representation. ....
A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.
....Analysis. Analysis and prediction of the performance of multiprocessor systems are complex tasks [19, 46] since many factors are involved. There are two different levels at which analysis can be carried out. One is the architectural, or system level [39, 11] and the other is the application level [19, 12, 4, 50, 41, 51]. Analysis at the system level studies system architecture in detail, then evaluates and predicts overall system performance in terms of queuing, clock cycles, and tasks. Analysis at the application level chooses some application problem and predicts its performance on a given system. In this ....
....because it has very exacting requirements for data sharing during execution. Parallel FFT algorithms have been studied in Pease's pioneering paper [43] and a more recent book [27]. The performance of the FFT algorithm on shared memory systems and message passing systems has been studied extensively [41, 19, 9, 17, 52]. The analysis in this subsection will concentrate on implementations for eager DSM systems. In this discussion, for simplicity, a one-dimensional FFT algorithm is considered. The one-dimensional Fourier transform of a sequence of M (= 2^L) complex numbers (x ....
Alan Norton and Allan J. Silberger. Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures. IEEE Transactions on Computers, C-36(5):581--591, May 1987.
....Frameworks for the Fast Fourier Transform. In the twenty-five years between the publications of Pease [15] and Van Loan [19] only a few authors used this powerful technique: Temperton [18] and Johnson et al. [11] for FFT implementations on classic vector computers and Norton and Silberger [14] on parallel computers with MIMD architecture. Recently, Gupta [9] and Pitsianis [16] used the Kronecker product formalism to synthesize FFT programs. As a consequence, the Kronecker product approach to FFT algorithm design antiquates more conventional techniques like signal flow graphs. Signal ....
A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.
....log is used for log_2. in stage i is between two nodes that are at a distance of N/2^i (1 <= i <= log N) from each other. The sequential complexity of this computation is therefore O(N log N). However, the regular structure of the computation makes it very amenable to parallel implementation [1, 8, 21, 22, 25]. As described above, the 2D FFT can be computed by first computing 1D FFTs along the rows of the array, followed by 1D FFTs along the columns of the resulting array. This leads to a sequential time complexity of O(N^2 log N). 4 Implementation on 2D Meshes. The 2D mesh (Figure 3) presents us ....
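The row-column decomposition mentioned in this excerpt can be sketched in a few lines of Python. This is an illustrative, sequential sketch only: it uses a textbook recursive radix-2 FFT with ordered output and assumes power-of-two dimensions. It is not the mesh implementation of the citing paper, and the names `fft` and `fft2d` are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2d(a):
    """2D FFT: 1D FFTs along the rows, then 1D FFTs along the columns."""
    rows = [fft(list(row)) for row in a]
    cols = [fft(list(col)) for col in zip(*rows)]  # transform each column
    return [list(row) for row in zip(*cols)]       # transpose back


if __name__ == "__main__":
    impulse = [[1, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
    for row in fft2d(impulse):
        print(row)
```

For an N x N array this performs 2N one-dimensional transforms of length N, each costing O(N log N), which gives the O(N^2 log N) total quoted in the excerpt.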
A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581--591, 1987.
No context found.
A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.
No context found.
A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.
No context found.
A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581-591.
No context found.
A. Norton, A. J. Silberger, Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-Memory Architectures, IEEE Trans. Comput. 36 (1987), pp. 581--591.
No context found.
A. Norton and A. J. Silberger, "Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared-memory architectures", IEEE Trans. Comp., C-36, 581--591 (1987).
No context found.
A. Norton and A. J. Silberger, "Parallelization and Performance Analysis of the Cooley-Tukey FFT Algorithm for Shared-memory Architecture", IEEE Trans. Comp., Vol. C-36, No. 5, pp. 581-591, May 1987.
No context found.
A. Norton and A. J. Silberger. Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared memory architectures. IEEE Transactions on Computers, C-36(5):581--591, 1987.