E. J.-L. Lu and D. I. Okunbor. Massively parallel fast multipole algorithm in three dimensions. In the proceedings of Fifth IEEE International Symposium on High Performance Distributed Computing, August 1996. (to appear).
.... MPIFMA. Since only primitive communication functions such as point-to-point communication and broadcast are implemented in the parallel FMA library, it is easy to port MPIFMA to other communication libraries. For efficiency, we implemented the optimum communication scheme proposed in [13]. The advantages of the optimum communication scheme include (1) a minimum number of messages is communicated among processors, and (2) the computation and communication are overlapped as much as possible. Our preliminary results are remarkable. The force calculation of one million particles took ....
....to processor P_i and vice versa as long as P_i ≠ P_j. Therefore, for each box i, one can build up a list of processors to which box i's MEs will be sent. Since only the required MEs are transmitted, redundancies are eliminated, bringing about minimal transmission. The details can be found in [13]. Experiments were conducted on an Intel iPSC/860 system with 1, 2, 4, 8, and 16 nodes. The benchmark system is a 3D particle system with particle positions O randomly generated between -0.5 and 0.5. The size of the particle .... [Table residue: columns P, Total Run Time, ME2LE, Near Field Force, Comm. Cost, Eff.; first row begins P = 1, 93.291 ....]
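The per-box processor list described in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `owner` mapping (box to processor), the `interaction_list` mapping (box to the boxes whose MEs it needs), and the function name `build_send_lists` are all hypothetical, since the excerpt is truncated and the full scheme is in [13].

```python
from collections import defaultdict


def build_send_lists(owner, interaction_list):
    """Return, for each box, the sorted list of remote processors needing its MEs.

    owner            -- hypothetical dict: box id -> id of the processor owning it
    interaction_list -- hypothetical dict: box id -> boxes whose MEs that box needs
    """
    send_to = defaultdict(set)
    for box, needed_boxes in interaction_list.items():
        for other in needed_boxes:
            # Only cross-processor pairs require a message (P_i != P_j).
            if owner[other] != owner[box]:
                # `box` needs the MEs of `other`, so the owner of `other`
                # must send them to the owner of `box`. Using a set means
                # each processor is listed once, however many of its boxes
                # need the same MEs.
                send_to[other].add(owner[box])
    return {box: sorted(procs) for box, procs in send_to.items()}


# Small usage example with four boxes on two processors.
example_owner = {0: 0, 1: 0, 2: 1, 3: 1}
example_interactions = {0: [1, 2], 2: [0, 3], 3: [0]}
example_lists = build_send_lists(example_owner, example_interactions)
```

In the usage example, boxes 2 and 3 both live on processor 1 and both need box 0's MEs, yet processor 1 appears only once in box 0's list, which is the kind of redundancy the excerpt says is eliminated.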
....to parallelization of the fast multipole algorithm. Greengard and Gropp [11] presented the parallel version of FMA in two dimensions (2D). Board et al. [14] have done a lot of work on the parallelization of FMA in 3D. Lu and Okunbor developed an efficient massively parallel FMA in three dimensions [15]. All parallel implementations perform well if the particles are distributed uniformly. However, the performance of these parallel algorithms degrades significantly when the particles are not distributed uniformly, due to load imbalance. Computer Science Department, University of Missouri ....
....0.90.

Processors   Non Adaptive   Factor = 0.90
2            1.961          1.945
4            1.959          3.042
8            1.957          5.411
16           3.065          9.596

Table 1: The speedups of domain decomposition scheme and weighted subtrees scheme with factor = 0.90.

We implement the weighted subtrees technique on top of our parallel fast multipole algorithm [15]. The benchmark system is a 3D particle system with particle positions O randomly generated between -0.5 and 0.5, with the restriction that all coordinates for a particle are either positive (O between 0 and 0.5) or negative (O between -0.5 and 0). All experiments were conducted on an Intel iPSC/860 system ....
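To put the Table 1 speedups quoted above on a common scale, one can convert them to parallel efficiency using the standard definition, speedup divided by processor count. The sketch below is illustrative only: the speedup values are copied from the excerpt, while the names `SPEEDUPS`, `parallel_efficiency`, and `efficiency_table` are hypothetical, and efficiency itself is not reported in this excerpt.

```python
# Speedups copied from Table 1 of the excerpt:
# processors -> (Non Adaptive, Factor = 0.90)
SPEEDUPS = {
    2: (1.961, 1.945),
    4: (1.959, 3.042),
    8: (1.957, 5.411),
    16: (3.065, 9.596),
}


def parallel_efficiency(speedup, processors):
    """Standard definition: efficiency = speedup / number of processors."""
    return speedup / processors


def efficiency_table(speedups):
    """Map each processor count to the efficiencies of both table columns."""
    return {
        p: tuple(parallel_efficiency(s, p) for s in row)
        for p, row in speedups.items()
    }


if __name__ == "__main__":
    for p, (non_adaptive, weighted) in efficiency_table(SPEEDUPS).items():
        print(f"{p:>2} processors: {non_adaptive:.3f} vs {weighted:.3f}")
```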
E. J.-L. Lu and D. I. Okunbor, A massively parallel fast multipole algorithm in three dimensions, in the proceedings of Fifth IEEE International Symposium on High Performance Distributed Computing, August 1996, pp. 40--48.