Abstract:
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm for clusters of vector symmetric multiprocessor (SMP) nodes. The three-dimensional FFT can be recast as a multirow FFT, which lengthens the innermost loop. We use the multirow FFT algorithm to implement the parallel three-dimensional FFT algorithm. We report performance results for three-dimensional power-of-two FFTs on a cluster of (pseudo) vector SMP nodes, the Hitachi SR8000. We obtained about 40 GFLOPS on a 16-node Hitachi SR8000.
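To illustrate the multirow idea, the following is a minimal single-process sketch, not the authors' implementation. It assumes a radix-2 decimation-in-time FFT and stores an n1 x n2 x n3 array as n1 rows of length n2*n3, so that one butterfly is applied across all n2*n3 independent sequences in the innermost loop. The names `multirow_fft`, `rotate_axes`, and `fft3d` are hypothetical, and the sketch does not model parallelism, node-to-node communication, or the SR8000.

```python
import cmath


def multirow_fft(a, n, m):
    """Radix-2 FFT of m independent length-n sequences.

    a is a list of n rows, each of length m; a[j][r] is element j of
    sequence r. The transform runs over j, the innermost loop over r.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    a = [row[:] for row in a]  # work on a copy
    # Bit-reversal permutation on the transform index (swaps whole rows).
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        half = length // 2
        for start in range(0, n, length):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / length)
                top = a[start + k]
                bot = a[start + k + half]
                # Innermost loop has length m (= n2*n3 in the 3-D case),
                # independent of the butterfly stage.
                for r in range(m):
                    t = w * bot[r]
                    bot[r] = top[r] - t
                    top[r] = top[r] + t
        length *= 2
    return a


def rotate_axes(a, n1, n2, n3):
    """Reorder shape (n1, n2, n3) to (n2, n3, n1), stored as n2 rows."""
    return [[a[i][j * n3 + k] for k in range(n3) for i in range(n1)]
            for j in range(n2)]


def fft3d(a, n1, n2, n3):
    """3-D FFT as three multirow FFTs, each along the leading axis."""
    dims = (n1, n2, n3)
    for _ in range(3):
        p, q, r = dims
        a = multirow_fft(a, p, q * r)
        a = rotate_axes(a, p, q, r)
        dims = (q, r, p)
    return a  # axes are back in (n1, n2, n3) order


if __name__ == "__main__":
    n1, n2, n3 = 2, 4, 2
    x = [[complex(i + c, 0) for c in range(n2 * n3)] for i in range(n1)]
    y = fft3d(x, n1, n2, n3)
    print(y[0][0])  # DC term: the sum of all input elements
```

In this layout the inner loop length is n2*n3 rather than the length of a single butterfly group, which is the property that matters on vector hardware. The axis rotation between passes stands in for the data rearrangement a real implementation would need.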