L. Oliker and R. Biswas, PLUM: Parallel load balancing for adaptive unstructured meshes, J. Parallel Distrib. Comput. 52 (1998), 150--177, doi:10.1006/jpdc.1998.1469.
....do not like black box solutions. They prefer the clean and very flexible RISC approach from computer architectures (i.e. a simple instruction set) which in this case translates to a simple functionality for implementing building blocks for parallel mesh generation. Contrary, existing systems [3, 4, 5] follow the CISC approach (i.e. complex and efficient instructions) which in the case of problem specific environments for adaptive applications like mesh generation and refinement translates to complex libraries of routines for difficult problems like dynamic load balancing. Our programming ....
L. Oliker and R. Biswas. Plum: Parallel load balancing for adaptive unstructured meshes. Journal of Par. and Dist. Comp., 52(2):150--177, 1998.
....techniques have been extensively investigated for unstructured meshes, such schemes for structured grids are relatively unexplored. Our goal is to adaptively manage dynamic applications on structured grids based on the runtime state using a characterization of the available options. PLUM [4] is a dynamic load balancing strategy for adaptive unstructured grid computations that uses a cost metric model for efficient mesh adaptation. The PLUM model uses computation, communication, and data-remapping weights to implement accurate metrics that estimate and compare the computational gain ....
L. Oliker and R. Biswas. PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes. Journal of Parallel and Distributed Computing, 52(2), pp. 150-177, 1998.
....(1) and (2) constitutes an NP hard problem. Partitioners typically optimize a subset of the components at the expense of others. Our goal in defining the five component metric is to facilitate the determination of trade offs for each partitioning technique. Quantification of metrics is studied in [12]. 4 AN EXPERIMENTAL EVALUATION OF PARTITIONING TECHNIQUES FOR SAMR APPLICATIONS The goal of the presented research is to construct a dynamic meta partitioner, that can adapt to (and optimize for) fully dynamic PACs. In this paper we move towards this goal by characterizing each partitioner (and ....
Leonid Oliker and Rupak Biswas. Plum: Parallel load balancing for adaptive unstructured meshes. Journal of Parallel and Distributed Computing, 52(2):150--177, 1998.
....comp. The inter processor communications time is dependent on the edge cut of the new partitioning. The data redistribution time is dependent on the total amount of data that is required to be moved in order to realize the new partitioning. Recently developed adaptive repartitioning schemes [4, 5, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25] tend to be very fast, especially compared to the time required to perform even a single iteration of a typical scientific simulation. They also tend to balance the new partitioning to within a few percent of optimal. Hence, we can ignore both t repart and t comp 1. However, depending on ....
....adaptive graph partitioning is a multi objective optimization problem. Two approaches have primarily been taken when designing adaptive graph partitioners. The first approach is to focus on minimizing the edge cut and to minimize the data redistribution only as a secondary objective [11, 12, 16, 17, 21, 22, 23, 24, 25]. A good example of such schemes are scratch remap repartitioners [11, 17, 21]. These use some type of state of the art graph partitioner .... (Footnote 1: This is because, in the absence of load imbalance, t comp will be primarily determined by the domain specific computation and cannot be reduced further.)
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas. PLUM: Parallel load balancing for adaptive unstructured meshes. Journal of Parallel and Distributed Computing, 52(2):150--177, 1998.
....processors, it will have a halo copy on each of these. The main advantage of maintaining such a rich set of distributed data structures is that the adaptive code may be used with a variety of different parallel solvers. The disadvantage however is that, compared to more focused algorithms (such as [13] for example) the amount of data that needs to be stored and partitioned between the processors is significantly greater. Both TETRAD [18] and PTETRAD [14 16] use a similar strategy to that outlined in [11] to perform mesh adaptation. Edges are first marked for refinement/derefinement (or ....
....with the other main features of [9]. In Subsection 4.2 some further approaches, based upon recursive bisection, are then described. In addition to these algorithms a number of other dynamic load balancing heuristics have been suggested in recent years (see [2,3,6,7,22] or many of the references in [13] for some typical examples) although none of these are included in this investigation. One approach that has been successfully applied to the parallel load balancing of adaptive unstructured meshes in 3-d is described by Oliker and Biswas in [13]. Their paper provides more than just another ....
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas, "PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes", J. Parallel and Distributed Computing, 52, 150-177, 1998.
....usually the case with solution adaptive refinement). Therefore, the partition has to be altered in order to re-establish a balanced distribution. A number of solutions to this load balancing problem are based on re-partitioning, where (sometimes even sequential) mesh partitioning algorithms are used [30]. We propose a two phase distributed load balancing algorithm which takes the existing mesh partition into account. The first phase determines the amount of load that has to be moved between different subdomains in order to balance the distribution globally. The adjacencies between subdomains defines ....
....algorithms. If the mesh is refined adaptively, the existing partition will usually become unbalanced. Several existing implementations of parallel adaptive grid applications solve this problem by repartitioning the mesh. This is done by the use of any of the aforementioned partitioning methods [30]. The drawback of such an approach is twofold: first, the mesh has to be routed to a single processor if the partitioning tool is sequential. Such an approach is obviously not scalable to large numbers of processors and to large meshes. Second, even if a parallel partitioning tool is available (such ....
[Article contains additional citation context not shown here]
L. Oliker, R. Biswas, Plum: Parallel load balancing for adaptive unstructured meshes, J. Par. Dist. Comput. 52 (2) (1998) 150--177.
....t comp . The inter processor communications time is dependent on the edge cut of the new partitioning. The data redistribution time is dependent on the total amount of data that is required to be moved in order to realize the new partitioning. Recently developed adaptive repartitioning schemes [4, 5, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25] tend to be very fast, especially compared to the time required to perform even a single iteration of a typical scientific simulation. They also tend to balance the new partitioning to within a few percent of optimal. Hence, we can ignore both t comp 1 and t repart . However, depending on the ....
....this way, adaptive graph partitioning is a multi objective optimization problem. Two approaches have primarily been taken when designing adaptive partitioners. The first approach is to attempt to focus on minimizing the edge cut and to minimize the data redistribution only as a secondary objective [11, 12, 16, 17, 21, 22, 23, 24, 25]. A good example of such schemes are scratch remap repartitioners [11, 17, 21]. These use some type of state of the art graph partitioner to compute a new partitioning from scratch and then attempt to intelligently remap the subdomain labels to those of the original partitioning in order to minimize ....
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas. PLUM: Parallel load balancing for adaptive unstructured meshes. Journal of Parallel and Distributed Computing, 52(2):150--177, 1998.
....partitioning but also adds an additional objective. That is, the amount of data that needs to be redistributed among the processors in order to balance the computation should be minimized. In order to accurately measure this cost, we need to consider not only the weight of a vertex, but also its size [62]. Vertex weight is the computational cost of the work represented by the vertex, while size reflects its redistribution cost. Thus, the repartitioner should attempt to balance the partitioning with respect to vertex weight while minimizing vertex migration with respect to vertex size. Depending on the ....
....its redistribution cost. Thus, the repartitioner should attempt to balance the partitioning with respect to vertex weight while minimizing vertex migration with respect to vertex size. Depending on the representation and storage policy of the data, size and weight may not necessarily be equal [62]. Oliker and Biswas studied various metrics for measuring data redistribution costs in [62]. They presented the metrics TotalV and MaxV. TotalV is defined as the sum of the sizes of vertices that change subdomains as the result of repartitioning. TotalV reflects the overall volume of ....
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas. PLUM: Parallel load balancing for adaptive unstructured meshes. Journal of Parallel and Distributed Computing, 52(2):150--177, 1998.
....the case with solution adaptive refinement). Therefore, the partition has to be altered in order to re-establish a balanced distribution. A number of solutions to this load balancing problem are based on re-partitioning, where (sometimes even sequential) mesh partitioning algorithms are used [30]. We propose a two phase distributed load balancing algorithm which takes the existing mesh partition into account. The first phase determines the amount of load that has to be moved between different subdomains in order to balance the distribution globally. The adjacencies between subdomains defines ....
....algorithms. If the mesh is refined adaptively, the existing partition will usually become unbalanced. Several existing implementations of parallel adaptive grid applications solve this problem by repartitioning the mesh. This is done by the use of any of the aforementioned partitioning methods [30]. The drawback of such an approach is twofold: Firstly, the mesh has to be routed to a single processor if the partitioning tool is sequential. Such an approach is obviously not scalable to large numbers of processors and to large meshes. Secondly, even if a parallel partitioning tool is available ....
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas. Plum: Parallel load balancing for adaptive unstructured meshes. J. Par. Dist. Comput., 52(2):150-177, 1998.
....scheme that is able to balance the processor workload, while minimizing both the communications of the application and the amount of data migration necessary to balance the load, is a key component for the successful conduct of these applications. Recently, various graph repartitioning techniques [1, 13, 15, 18] have been developed that can quickly compute high quality repartitionings while minimizing the amount of data that needs to be migrated among processors for large classes of problems. These can be classified as either diffusion based schemes [15, 18] or scratch-remap schemes [1, 13] ....
....[1, 13, 15, 18] have been developed that can quickly compute high quality repartitionings while minimizing the amount of data that needs to be migrated among processors for large classes of problems. These can be classified as either diffusion based schemes [15, 18] or scratch-remap schemes [1, 13]. Diffusion based Repartitioners: Diffusion based repartitioners attempt to minimize the difference between the original (imbalanced) partitioning and the final repartitioning by making incremental changes in the partitioning to restore balance. Domains that are overweight in the original ....
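The idea of restoring balance through incremental changes can be illustrated with a minimal sketch of first-order diffusion on the processor graph. This is not the scheme of any cited repartitioner: the function name `diffuse`, the step size `alpha`, and the fixed iteration count are assumptions made for illustration, and the sketch only computes how much load each domain should end up with, not which vertices migrate.

```python
def diffuse(load, neighbors, alpha=0.25, steps=50):
    """Illustrative first-order diffusion of load between neighbouring domains.

    load:      list of per-domain workloads.
    neighbors: neighbors[i] lists the domains adjacent to domain i
               (assumed symmetric, so total load is conserved).
    alpha:     hypothetical step size; it should stay below
               1 / (max degree + 1) for the iteration to settle.
    """
    load = list(load)
    for _ in range(steps):
        new = list(load)
        for i, nbrs in enumerate(neighbors):
            for j in nbrs:
                # An overweight domain sheds load to a lighter neighbour,
                # and an underweight domain receives load from a heavier one.
                new[i] -= alpha * (load[i] - load[j])
        load = new
    return load


# Three domains in a chain; domain 0 starts overweight.
balanced = diffuse([9.0, 3.0, 0.0], [[1], [0, 2], [1]])
```

Because each step only exchanges load across existing domain boundaries, the result stays close to the original partitioning, which is the property the passage above attributes to diffusion based schemes.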
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas. Plum: Parallel load balancing for adaptive unstructured meshes. Journal of Parallel and Distributed Computing, 52(2):150--177, 1998.
No context found.
L. Oliker and R. Biswas, PLUM: Parallel load balancing for adaptive unstructured meshes, J. Parallel Distrib. Comput. 52 (1998), 150--177, doi:10.1006/jpdc.1998.1469.
....itself must not pose a major overhead. Although several dynamic load balancers have been proposed for multiprocessor platforms [1 6], most of them are inadequate for adaptive unstructured grid applications. This motivates our work. Recently, we have developed a novel method, called PLUM [7], that dynamically balances processor workloads with a global view when performing adaptive numerical calculations in a parallel message passing environment. The computational mesh is globally repartitioned from scratch after each adaptation, but a smart remapping technique is used to reassign ....
....PLUM Load Balancer: PLUM is an automatic and portable load balancing environment, specifically created to handle adaptive unstructured grid applications. It differs from most other similar load balancers in that it dynamically balances processor workloads with a global view. Prior work [7,26,31] has successfully demonstrated the viability and verified the effectiveness of PLUM for various test cases involving adaptive unstructured grids. In this paper, we examine its architecture-independent feature by comparing results on three parallel machines. Figure 3 provides an overview of PLUM. ....
[Article contains additional citation context not shown here]
Oliker, L., Biswas, R., PLUM: Parallel load balancing for adaptive unstructured meshes. J. Parallel Distrib. Comput. 52 (1998) 150-177.
....supercomputers. The unstructured, dynamic nature of many systems worth simulating, however, makes their efficient parallel implementation a daunting task. This is primarily due to the load imbalance created by the dynamically changing nonuniform grids and the irregular data access patterns [15, 16, 22]. These cause significant communication at runtime, leaving many processors idle and adversely affecting the total execution time. Furthermore, modern computer architectures, based on deep memory hierarchies, show acceptable performance only if users care about the proper distribution and ....
....load balance and maintain good cache locality for adaptive applications. Unfortunately, a significant overhead is generally associated with these rebalancing phases [15, 16, 22]. The CC NUMA and MPI OpenMP strategies would thus be comparable to an MPI implementation, requiring similar amounts of programming effort and rebalancing overheads. The major difference would be the use of a shared address space (global on an Origin2000, local within a node on a SP3) instead of .... (Footnote 1: The tuple {x, y, z} denotes {SMP nodes, MPI tasks, OpenMP threads}.)
L. Oliker and R. Biswas, PLUM: Parallel load balancing for adaptive unstructured meshes, J. Parallel and Distributed Computing, 52 (1998), pp. 150--177.
....mesh and remapping the data. After refinement, the matrices for the submeshes are regenerated and passed on to the solver. The entire cycle is then repeated until the computation is done. Extensive details about this mesh adaptation procedure and the dynamic load balancing strategy are given in [7, 9]. 2.2 N Body Problem The N Body problem is a classical one, and arises in many areas of science and engineering such as astrophysics, molecular dynamics, and graphics. Having specified the initial positions and velocities of the interacting bodies, the problem is to find their positions after ....
....the communication volume needed for data remapping after the repartitioning since refinement is performed by the destination processors, and (iii) it increases data locality since the flow solver works on the newly partitioned refined mesh. Detailed explanations of these side effects are given in [7, 9]. In the MPI implementation, each process owns a submesh and maintains the necessary local data structures to represent it. Thus, each mesh object (vertex, edge, element) has a local index. These local data structures and indices provide good data locality for the MPI program. However, in order ....
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas, "PLUM: Parallel load balancing for adaptive unstructured meshes," Journal of Parallel and Distributed Computing, 52 (1998) 150--177.
....since the computational mesh will be frequently adapted for unsteady flows, the runtime load also has to be balanced at each step. In other words, the dynamic load balancing procedure itself must not pose a major overhead. This motivates our work. We have developed a novel method, called PLUM [7], that dynamically balances processor workloads with a global view when performing adaptive numerical calculations in a parallel message passing environment. Examining the performance of PLUM for an actual workload, which simulates an acoustic wind tunnel experiment of a helicopter rotor blade, on ....
....2. Architecture Independent Load Balancer PLUM is an automatic and portable load balancing environment, specifically created to handle adaptive unstructured grid applications. It differs from most other load balancers in that it dynamically balances processor workloads with a global view [1, 7]. In this paper, we examine its architecture independent feature by comparing results for a test case running on an SP2, Origin2000, and T3E. PLUM consists of a partitioner and a remapper that load balance and redistribute the computational mesh when necessary. After an initial partitioning and ....
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas. PLUM: Parallel load balancing for adaptive unstructured meshes. Journal of Parallel and Distributed Computing, 52:150--177, 1998.
....global data structures. A gather operation by a host processor is performed to concatenate the local data structures. The host can then interface the global mesh directly to the appropriate post processing module without having to perform any serial computation. 3 Dynamic Load Balancing: PLUM [20] is a novel method to dynamically balance the processor workloads for unstructured adaptive grid computations with a global view. It has five salient features: • Repeated use of the initial mesh dual graph keeps the connectivity and partitioning complexity constant during the course of an ....
....In general, the number of new partitions is an integer multiple F of the number of processors, and each processor is assigned F unique partitions. Allowing multiple partitions per processor reduces the volume of data movement at the expense of partitioning and processor reassignment times [20]; however, setting F to unity suffices for most practical applications. We first generate a similarity matrix M that indicates how the remapping weights w remap of the new partitions are distributed over the processors. Entry M ij is the sum of the w remap values of all the dual graph vertices in ....
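A minimal sketch of how such a similarity matrix could be assembled is given below. The quoted sentence is cut off, so the reading used here is an assumption: entry M[i][j] is taken to be the sum of the remapping weights of the dual graph vertices that currently reside on processor i and are assigned to new partition j. The function and argument names are hypothetical.

```python
def similarity_matrix(proc_of, new_part_of, w_remap, num_procs, num_parts):
    """Build M where M[i][j] sums w_remap over vertices on processor i
    that fall in new partition j (assumed reading of the truncated text).

    proc_of:     proc_of[v] is the processor currently holding vertex v.
    new_part_of: new_part_of[v] is the new partition assigned to vertex v.
    w_remap:     w_remap[v] is the remapping weight of vertex v.
    """
    M = [[0] * num_parts for _ in range(num_procs)]
    for v, weight in enumerate(w_remap):
        M[proc_of[v]][new_part_of[v]] += weight
    return M


# Four dual graph vertices, two processors, F = 1 (two new partitions).
M = similarity_matrix([0, 0, 1, 1], [0, 1, 1, 1], [5, 3, 2, 4], 2, 2)
```

With F partitions per processor the matrix has F times as many columns as rows; the sketch handles that case by passing `num_parts = F * num_procs`.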
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas, PLUM: Parallel load balancing for adaptive unstructured meshes, J. Parallel Distrib. Comput. 52 (1998) 150--177.
....imbalance created by the dynamically changing nonuniform grid. Nonetheless, it is generally thought that unstructured adaptive grid techniques will constitute a significant fraction of future high performance supercomputing. With this goal in mind, we have developed a novel method, called PLUM [7], that dynamically balances processor workloads with a global view when performing adaptive numerical calculations in a parallel message passing environment. The mesh is first partitioned and mapped among the available processors. Once an acceptable numerical solution is obtained, the mesh ....
....is discarded. The computational mesh is then refined and the numerical calculation is restarted. 2 Dynamic Load Balancing 2.1 Repartitioning the Initial Mesh Dual Graph: Repeatedly using the dual of the initial computational mesh for dynamic load balancing is one of the key features of PLUM [7]. Each dual graph vertex has a computational weight, w comp , and a remapping weight, w remap . These weights model the processing workload and the cost of moving the corresponding element from one processor to another. Every dual graph edge also has a weight, w comm , that models the runtime ....
[Article contains additional citation context not shown here]
Oliker, L., Biswas, R.: PLUM: Parallel load balancing for adaptive unstructured meshes. NASA Ames Research Center Technical Report NAS-97-020 (1997)
....original serial code consists of approximately 1,300 lines of C and requires 6.4 seconds to execute this simulation on a 250 MHz MIPS R10000 processor. 4. Distributed Memory Implementation: The distributed memory version of the mesh adaptation code was implemented in MPI within the PLUM framework [5]. PLUM is an automatic and portable load balancing environment, specifically created to handle adaptive unstructured grid applications. It differs from most other load balancers in that it dynamically balances .... [Figure 1: A close up view of the initial triangular mesh around the airfoil.]
....[Table 3 data rows, column headers not shown: 8 8.31 1.4 10.23 19.9 1.02 30.21 151.75; 16 5.04 1.3 5.57 11.9 1.02 13.57 121.06; 32 2.28 1.7 2.82 6.8 1.05 7.77 118.55; 64 1.41 2.3 1.69 5.4 1.08 4.17 132.34. Table 3: Performance of the MPI code on the Origin2000.] .... remapping as a function of the maximum (not total) communication among processors [5]. The slight difference in the amount of data volume between Tables 2 and 3 for the same value of P is because the T3E has 8 byte integers whereas the Origin2000 has 4 byte integers. This message passing implementation of the adaptation algorithm required a significant amount of programming ....
L. Oliker and R. Biswas, "PLUM: Parallel load balancing for adaptive unstructured meshes," Journal of Parallel and Distributed Computing, 52 (1998) 150--177.
....1,935,619 [Table 1: Progression of grid sizes through five levels of adaptation.] [Figure 2: A close up view of the mesh after the second refinement.] 4. Distributed Memory Implementation: The distributed memory version of the mesh adaptation code was implemented in MPI within the PLUM framework [5]. PLUM is an automatic and portable load balancing environment, specifically created to handle adaptive unstructured grid applications. It differs from most other load balancers in that it dynamically balances processor workloads with a global view. PLUM consists of a partitioner and a remapper ....
....to the load balancer at this time. A repartitioning algorithm, like the ParMETIS [4] parallel multilevel partitioner, is used to divide the new mesh into subgrids. All necessary data is then redistributed, the computational mesh is actually refined, and the numerical calculation restarted. PLUM [5] is a novel method to dynamically balance the processor workloads for unstructured adaptive grid computations with a global view. It has five salient features: Repeated use of the initial mesh dual graph keeps the connectivity and partitioning complexity constant during the course of an ....
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas, "PLUM: Parallel load balancing for adaptive unstructured meshes," Journal of Parallel and Distributed Computing, 52 (1998) 150--177.
....first multithreaded approaches to tackle unstructured mesh adaption, our findings and observations become extremely valuable. More extensive experiments will be done in the future, and the results compared critically with an explicit message-passing implementation [12] and a global load balancer [11]. One must remember however that dynamic mesh adaption comprises a small though significant part of a complete application. Further investigations are needed to determine whether the functionality of such an approach is viable for real applications. ....
Oliker, L., Biswas, R.: PLUM: Parallel load balancing for adaptive unstructured meshes. NAS Technical Report NAS-97-020 (1997)
....do so are described in [1]. Here, the imbalanced graph is partitioned from scratch using one of the multilevel graph partitioning algorithms described in [4, 5]. The resulting partition is then intelligently mapped to the processors in order to reduce the amount of vertex migration required. In [6], a simple greedy remapping algorithm is described and shown to obtain near optimal results on application graphs. Another way to meet the above criteria is through diffusive repartitioning. Multilevel diffusion schemes have been developed that incrementally construct a new partition of the graph ....
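The remapping step of a scratch-remap scheme can be sketched with a generic greedy heuristic. This is an illustration under stated assumptions, not the algorithm of [6]: it assumes one new partition per processor and a square matrix `M` in which `M[i][j]` is the amount of data already on processor i that belongs to new partition j. The name `greedy_remap` is hypothetical.

```python
def greedy_remap(M):
    """Assign each new partition to a processor, largest overlaps first.

    M is an n x n matrix; M[i][j] is the data on processor i that lies in
    new partition j. Returns proc_of_part, where proc_of_part[j] is the
    processor chosen for partition j.
    """
    n = len(M)
    # Visit (overlap, processor, partition) entries in descending overlap.
    entries = sorted(
        ((M[i][j], i, j) for i in range(n) for j in range(n)),
        key=lambda entry: -entry[0],
    )
    proc_of_part = [None] * n
    proc_taken = [False] * n
    for _, i, j in entries:
        # Keep the pairing only if both the partition and processor are free.
        if proc_of_part[j] is None and not proc_taken[i]:
            proc_of_part[j] = i
            proc_taken[i] = True
    return proc_of_part


M = [[1, 9, 0],
     [8, 2, 0],
     [0, 3, 7]]
mapping = greedy_remap(M)
retained = sum(M[mapping[j]][j] for j in range(len(M)))
```

Data that stays in place (`retained`) does not have to migrate, so maximizing it reduces vertex migration. A greedy pass is not guaranteed to find the optimal assignment; an exact answer would need a maximum-weight bipartite matching.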
....The effectiveness of repartitioning algorithms quite often is determined by how successful they are in load balancing the computations while minimizing the edge cut as well as the cost associated in redistributing the load in order to realize the new partitioning. Two metrics that are widely used [6, 7] for measuring this redistribution cost are TotalV which measures the total volume of data moved among all processors, and MaxV which measures maximum flow of data to or from any single processor. Specifically, TotalV is defined as the sum of the sizes of the vertices which change domains as the ....
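The two metrics can be computed directly from the old and new partition vectors, as in the following minimal sketch. The function name and argument layout are assumptions. MaxV is interpreted here as the largest amount of data either sent or received by any single subdomain, which is one reading of "to or from"; the quoted definition is truncated before it settles the point.

```python
from collections import defaultdict


def redistribution_cost(old_part, new_part, size):
    """Return (TotalV, MaxV) for a repartitioning.

    old_part, new_part: subdomain id of each vertex before and after.
    size:               redistribution size of each vertex.
    """
    sent = defaultdict(int)      # data leaving each subdomain
    received = defaultdict(int)  # data arriving at each subdomain
    total_v = 0
    for v, (src, dst) in enumerate(zip(old_part, new_part)):
        if src != dst:           # vertex v changes subdomains
            total_v += size[v]
            sent[src] += size[v]
            received[dst] += size[v]
    # Assumed reading: the larger of the per-subdomain send and receive volumes.
    max_v = max(list(sent.values()) + list(received.values()), default=0)
    return total_v, max_v


# Vertices 1 and 3 move; their sizes are 2 and 4.
total_v, max_v = redistribution_cost([0, 0, 1, 1, 2], [0, 1, 1, 2, 2],
                                     [1, 2, 3, 4, 5])
```

Note that the metric uses vertex size rather than vertex weight, since the two need not be equal.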
[Article contains additional citation context not shown here]
L. Oliker and R. Biswas. Plum: Parallel load balancing for adaptive unstructured meshes. Technical Report NAS-97-020, NASA Ames Research Center, Moffett Field, CA, 1997.