16 citations found.
D. W. Walker, "Characterizing the parallel performance of a large scale, particle-in-cell plasma simulation code", Concurrency: Practice and Experience, 2, 257, (1990).


This paper is cited in the following contexts:
Particle-Mesh Techniques - MacNeice (1996)   (Correct)

....array of size N_proc = 128 × 128, we could map cell (i, j, k) into processor (i, j). The major design question which faces us is how to distribute the particle data. There are some obvious choices which focus either on computational load balance or on efficient interprocessor communication [19]. 10.2.1 Uniform Load Balance with communication hotspots The first option is to parcel the particles out evenly amongst the processors, paying no attention to their physical locations. This achieves the best computational load balance during the particle push and during the purely ....

D. W. Walker, "Characterizing the parallel performance of a large scale, particle-in-cell plasma simulation code", Concurrency: Practice and Experience, 2, 257, (1990).


Parallel Remapping Algorithms for Adaptive Problems - Chao-Wei Ou And (1995)   (11 citations)  (Correct)

....used in this paper. For the rest of the paper we will assume that sorting n elements on p processors requires O((n log n)/p) amount of time. This is true when n is O(p^(1+ε)), ε > 0. 4 Remapping for Perturbations In applications such as molecular dynamics [2], particle in a cell methods [18], particle dynamics [14], etc., the interaction between several particles is simulated. These particles are dispersed in a two- or three-dimensional space, and the simulation is performed for a large number of time steps. At each time step, the numerical approximation techniques used for simulation ....

D. Walker. Characterizing the Parallel Performance of a Large-Scale, Particle-in-Cell Plasma Simulation Code. Concurrency: Practice and Experience, 1990.


Particle-in-Cell Simulation Codes in High Performance Fortran - Akarsu, Dincer, Haupt, Fox (1996)   (1 citation)  (Correct)

....on the mesh (gather phase). Finally, the particles are repositioned under the influence of this force in the particle push phase. Parallel PIC methods have been previously studied by a number of researchers on a wide variety of platforms using different parallel programming methodologies [11, 12, 13, 14, 15, 16]. We implemented one and two dimensional PIC simulation codes in HPF. We compiled our codes using PGI's HPF compiler and ran them on an IBM SP 2 with up to 32 nodes. In this paper we talk about our data decomposition and parallelization strategy, and emphasize HPF features used in individual ....

....of the mesh bounded by two grid points. The performance of the PIC simulation codes depends critically on the decomposition (distribution) strategy of the grid points and particles among the nodes of a distributed memory machine or workstation cluster. We used a variant of Eulerian decomposition [14] which assigns each processor a fixed spatial partition of the grid and the particles within the cells are surrounded by those grid points. This prevents the high communication cost in the gather and scatter phase, yet load imbalance among processors may develop after repetitive iterations as a ....

D. W. Walker, Characterizing the Parallel Performance of a Large-Scale, Particle-in-Cell Plasma Simulation Code, Concurrency: Practice and Experience, (2)4:257--288, Dec.1990.


Distributed Scheduling of Unstructured Collective.. - Wang, Lin, Ranka (1993)   (4 citations)  (Correct)

....significant improvement over naive methods. Index Terms: Active Messages, communication latency, distributed scheduling, interrupt handler, node contention, personalized communication, unstructured communication. 1 Introduction Parallelization of many irregular and loosely synchronous problems [1, 3, 7, 9, 14, 16, 17] result in all-to-many personalized communication. An example of all-to-many personalized communication is given in Table 1. A 1 in the (i, j) entry represents the fact that processor P_i needs to communicate to processor P_j. Each message is of different size and each processor may send a ....

D.W. Walker. Characterizing the parallel performance of a large-scale, particle-in-cell plasma simulation code. Concurrency: Practice and Experience, 1990.


SPRINT: Scalable Partitioning, Refinement, and INcremental.. - Ou, Ranka   (Correct)

....vertices correspond to two or three dimensional coordinates, and the interaction between computations is limited to vertices that are physically proximate. Examples of such applications include finite element calculations [21], molecular dynamics [3], particle dynamics [25], particle in a cell [17, 31], region growing [6] and statistical physics [5]. A list of other such applications is given in [4]. For these applications, partitioning can be achieved by exploiting the above property. Essentially proximate points are clustered together and form a partition such that the number of points ....

D. Walker. Characterizing the Parallel Performance of a Large-Scale, Particle-in-Cell Plasma Simulation Code. Concurrency: Practice and Experience, 1990.


Runtime Support for Dynamic Space-Based Applications on.. - Ashok (1994)   (1 citation)  (Correct)

....of Dynamic Space Based Applications 1.2.1 Electro Magnetic Particle In Cell The Electro Magnetic Particle In Cell (EMPIC) application simulates movement of charged particles that interact by exerting electric and magnetic field forces on each other [Birdsall Langdon 85, Hockney Eastwood 88, Walker 90]. The force experienced by a particle depends on the current position and velocity of all the particles, and this changes continuously with time. The goal of the simulation is to understand the behavior of the particles. The standard solution uses a particle mesh method, which discretizes space ....

....Several researchers studied issues in parallelizing specific scientific applications in different fields of science and engineering. Here we mention the work done on dynamic space based applications: particle in cell (plasma physics) [Campbell et al. 90, Ferraro et al. 93, Liewer Decyk 89, Walker 90], rarefied fluid flow (aeronautics) [Fallavollita et al. 92, McDonald 89, Singh et al. 90] and molecular dynamics (materials science and chemistry) [Brug e Fornili 90, Fincham 87, Pinches et al. 91, Raine et al. 89, Rapaport 91, Smith 91]. 3.2 Application Induced Load Balancing 3.2.1 Automatic ....

[Article contains additional citation context not shown here]

D. W. Walker. Characterizing the Parallel Performance of a large-scale Particle-In-Cell Plasma Simulation Code. Concurrency: Practice and Experience, Volume 2, pages 257--288, 1990.


A Comparison of Optimization Heuristics for the Data.. - Chrisochoides, Mansour (1996)   (1 citation)  (Correct)

....method. Examples are nearest neighbor mapping, block partitioning, recursive coordinate bisection, recursive graph bisection, recursive spectral bisection, CM Clustering, and scattered decomposition [1], [4], [11], [12], [13], [14], [15], [20], [21], [37], [41], [49], [50], [42], [52], [57]. Other algorithms are based on deterministic optimization, where local search techniques are used to minimize cost functions that approximate the execution time T_solver; examples are Kernighan-Lin algorithm [35] and geometry graph partitioning [4]. Yet, another class of mapping algorithms are ....

D. Walker. Characterizing the parallel performance of a large-scale, particle-in-cell plasma simulation code. Concurrency Practice and Experience, Dec., 257-288. 1990.


Parallel Remapping Algorithms for Adaptive Problems - Chao-Wei Ou (1995)   (11 citations)  (Correct)

....vertices correspond to two or three dimensional coordinates, and the interaction between computations is limited to vertices that are physically proximate. Examples of such applications include finite element calculations [16], molecular dynamics [4], particle dynamics [23], particle in a cell [13, 27], region growing [7] and statistical physics [6]. A list of other such applications is given in [5]. For these applications, partitioning can be achieved by exploiting the above property. Essentially, proximate points are clustered together and form a partition such that the number of points ....

D.W. Walker. Characterizing the Parallel Performance of a Large-Scale, Particle-in-Cell Plasma Simulation Code. Concurrency: Practice and Experience, 1990.


Software Support for Irregular and Loosely.. - Choudhary, Fox.. (1992)   (4 citations)  (Correct)

....arrays. In this section, we will only consider the case where each individual phase is a static single phase computation as defined above. Examples of these computations include unstructured multigrid (e.g. [26]), parallelized sparse triangular solver (e.g. [4, 1]), particle in cell codes (e.g. [38, 24]) and vortex blob calculations [3]. The key problem in implementation is again partitioning computation and data, but now the task is complicated because the interfaces between phases must be considered in the partitioning. The synchronization and communication requirements are similarly ....

D. W. Walker. Characterizing the Parallel Performance of a Large-Scale, Particle-In-Cell Plasma Simulation Code. Concurrency: Practice and Experience, 1990.


Data Distributions For Sparse Matrix Vector Multiplication - Romero, Zapata (1995)   (15 citations)  (Correct)

....bisected, alternating vertical and horizontal partitions until we have as many submatrices as PEs. Other possibilities for performing these divisions consist in altering the order of the partitions so that horizontal and vertical partitions are not alternated, introducing other arrangements [26]. This distribution method, apart from achieving a good load balance, permits a simple assignment of the submatrices to a PE network with hypercube or binary tree topology. However, as can be noted in figure 1, there are serious problems in communications, as adjacent elements in the matrix may be ....

....using a storage by row of submatrix M. The communications are concentrated in the Collection and Redistribution stages, generating a total of 2 p messages of size m p z r and m p z c . It can be easily seen that the MRD distribution scheme encompasses, as particular cases, the BRD [3], [26] and One Way Strip Partitioning (OSP) [2], [6], [9], [24] methods. When the number of PEs of the mesh is a power of two, the MRD method coincides with the BRD method. The OSP method coincides with the MRD method when it is just applied to the rows (Row OSP method) or columns (Column OSP method) of ....

D.W. Walker, "Characterizing the Parallel Performance of a Large-Scale, Particle-in-Cell Plasma Simulation Code", Concurrency, Practice and Experience, vol. 2, no. 4, pp. 257-288, 1990.


Sparse Block and Cyclic Data Distributions for Matrix.. - Asenjo Romero (1995)   (3 citations)  (Correct)

....bisected, alternating vertical and horizontal partitions until we have as many submatrices as PEs. Other possibilities for performing these divisions consist in altering the order of the partitions so that horizontal and vertical partitions are not alternated, introducing other arrangements [22]. This distribution method, apart from achieving a good load balance, permits a simple assignment of the submatrices to a PE network with hypercube or binary tree topology. However, there are serious problems in communications, as adjacent elements in the matrix may be projected onto PEs that ....

....using a storage by row of submatrix M. The communications are concentrated in the Collection and Redistribution stages, generating a total of 2 p messages of size m p z r and m p z c . It can be easily seen that the MRD distribution scheme encompasses, as particular cases, the BRD [3], [22] and One Way Strip Partitioning (OSP) [2], [5], [8], [20] methods. When the number of PEs of the mesh is a power of two, the MRD method coincides with the BRD method. The OSP method coincides with the MRD method when it is just applied to the rows (Row OSP method) or columns (Column OSP method) ....

D.W. Walker, "Characterizing the Parallel Performance of a Large-Scale, Particle-in-Cell Plasma Simulation Code", Concurrency, Pract. and Exper., vol.2, no.4, pp.257-288, 1990.


Irregular Personalized Communication on Distributed Memory.. - Ranka, Wang, Kumar (1993)   (5 citations)  (Correct)

....of a series of dissimilar loosely synchronous computational phases where each individual phase is a single concurrent computational phase. Examples of these computations include unstructured multigrid (e.g. [16]), parallelized sparse triangular solver (e.g. [1, 4]), particle in cell codes (e.g. [14, 24]) and vortex blob calculations [3]. The key problem in efficiently executing these programs is partitioning the data and computation such that the load on each node is balanced and the communication is minimized. Figure 2 describes a decomposition of such a problem. The x and y arrays in Figure 1 ....

D.W. Walker. Characterizing the parallel performance of a large-scale, particle-in-cell plasma simulation code. Concurrency: Practice and Experience, 1990.


Static and Runtime Algorithms for All-to-Many Personalized.. - Sanjay Ranka (1992)   (2 citations)  (Correct)

....have been developed in [1, 16]. Load balancing and reduction of communication are two important issues for achieving a good mapping. The directives of Fortran D [6] can be used to provide such a mapping for a large class of regular and synchronous problems. For some other classes of problems [3, 19, 20] that are irregular in nature, achieving a good mapping is considerably more difficult [7]. Further, the nature of this irregularity may not be known at the time of compilation and can be ascertained only at runtime. The handling of irregular problems requires the use of runtime information to ....

D.W. Walker. Characterizing the Parallel Performance of a Large-Scale, Particle-in-Cell Plasma Simulation Code. Concurrency: Practice and Experience, pp. 257--288, 1990.


Irregular Personalized Communication on Distributed Memory.. - Ranka, Wang (1995)   (5 citations)  (Correct)

....consists of a series of dissimilar, loosely synchronous computational phases where each individual phase is a single concurrent computational phase. Examples of these computations include unstructured multigrid [17], parallelized sparse triangular solver [1, 4] and particle in cell codes [15, 26]. The key problem in efficiently executing these programs is partitioning the data and computation such that the load on each node is balanced and the communication is minimized. Figure 2 describes a decomposition of such a problem. The x and y arrays in Figure 1 represent the nodes in Figure 2, ....

D.W. Walker. Characterizing the Parallel Performance of a Large-Scale, Particle-in-Cell Plasma Simulation Code. Concurrency: Practice and Experience, pp. 257--288, 1990.


BOS is Boss: A Case for Bulk-Synchronous Object Systems - Goudreau, Lang, Narlikar, Rao   (Correct)

....balancing, and synchronization reduction are done by the application programmer. 6.1 Electro Magnetic Particle In Cell Simulation The electro magnetic particle in cell (EMPIC) application simulates the movement of charged particles that exert electric and magnetic forces on each other [18, 32, 46]. We use a standard particle-grid method which discretizes the space on a grid, and uses a leapfrog integration scheme to solve Maxwell's differential equations across discrete timesteps. Each timestep consists of four phases. In the scatter phase, the current density of each grid point is computed ....

D. W. Walker, "Characterizing the parallel performance of a large-scale, particle-in-cell plasma simulation code," Concurrency, Practice and Experience, vol. 2, no. 4, pp. 257--288, Dec. 1990.


Paradigms And Strategies For Scientific Computing On.. - Foster, Walker (1994)   Self-citation (Walker)   (Correct)

....at the corners of their home cells to evaluate the fields at their positions. • The push phase in which the equation of motion of each particle is advanced one time step. The update for each particle is independent of all others. There are two basic approaches to parallelizing PIC applications (Walker 1990, Walker 1991). In the first, both the computational grid and the particles are spatially decomposed into processes, and only data lying along process boundaries needs to be moved between processes. Good load balance is maintained if particles are distributed sufficiently homogeneously, but for ....

Walker, D. W. 1990. "Characterizing the Parallel Performance of a Large Scale, Particle-In-Cell Plasma Simulation Code." Concurrency: Practice and Experience 2, no. 4 (Dec.): 257-288.


CiteSeer.IST - Copyright Penn State and NEC