H. W. Hammond, `Mapping unstructured grid computations to massively parallel computers', PhD thesis, Rensselaer Polytechnic Institute, 1992.
....program and is never modified at run time. Static mapping is NP-complete in the general case [10]. Therefore, many studies have been carried out in order to find suboptimal solutions in reasonable time, including the development of specific algorithms for common topologies such as the hypercube [8, 18]. When the target machine is assumed to have a communication network in the shape of a complete graph, the static mapping problem turns into the partitioning problem, which has also been intensely studied [4, 19, 27, 29, 44]. However, when mapping onto parallel machines the communication network ....
.... [4, 19, 27, 29, 44]. However, when mapping onto parallel machines the communication network of which is not a bus, not accounting for the topology of the target machine usually leads to worse running times, because simple cut minimization can induce more expensive long-distance communication [18, 51]. 1.2 Sparse matrix ordering. Many scientific and engineering problems can be modeled by sparse linear systems, which are solved either by iterative or direct methods. To achieve efficiency with direct methods, one must minimize the fill-in induced by factorization. This fill-in is a direct ....
S. W. Hammond. Mapping unstructured grid computations to massively parallel computers. PhD thesis, Rensselaer Polytechnic Institute, Troy, New-York, February 1992.
....then the block distribution will result, with each processor having the same (or nearly the same) number of rows, independently of the number of nonzero elements in those rows. The program for distributing arrays was run on several benchmarks including meshes originally used by Hammond [57] and test cases from the Harwell-Boeing Sparse Matrix Collection [58]. The characteristics of the tests are given in Table 2. The first test case is an unstructured triangular mesh around a three-component airfoil, while the second test is a portion of a larger mesh representing an unstructured ....
S. W. Hammond, Mapping Unstructured Grid Computations to Massively Parallel Computers. PhD thesis, Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 1991.
....where j (e)j is the length of the path in H used to route edge e and w(e) is a weight function indicative of the amount of communication on edge e. A strong positive correlation between values of this function and effective execution times has been experimentally verified by some researchers [36] [38]. Since every hop that a message takes counts towards energy consumed, this function also encapsulates the amount of power spent on communication. Embeddability, as we define it, is a measure of how well a network is able to embed several guest graphs and is given by CH = 1 CG1;H 2 CG2;H ....
S.W. Hammond, Mapping unstructured grid computations to massively parallel computers, Ph.D. thesis, Rensselaer Polytechnic Institute, Troy, NY, 1992.
....This class of tools visualizes mappings of scientific data structures such as meshes and matrices. Data distribution is performed automatically by partitioning algorithms and mainly aims at selected target architectures. Most popular representatives of this class are DecTool [18], GraphTool [19], and DDT [20]. MATRIX [21] also visualizes run-time states. Computational structure. Tools that operate on this level are associated with a concrete programming language. Existing work mainly concentrates on runtime behaviour of data structures. Well-known representatives are VIPS [22] for ....
S. Hammond, Mapping Unstructured Grid Computations to Massively Parallel Computers, Ph.D. thesis, Department of Computer Science, Rensselaer Polytechnic Institute, New York, 1992.
....explicit Runge-Kutta method. Nearly perfect scaled parallel speedup [11] on an nCUBE 2 hypercube degrades substantially when adaptive refinement is incorporated into the local finite element method due to processor load imbalance. Parallel finite element methods often use static load balancing [12, 13] as a precursor to obtaining a finite element solution. Adaptive methods, however, require dynamic load balancing to adjust changing processor loads as the computation proceeds. In our work, we have developed a dynamic, fine-grained, element-based data migration algorithm called tiling that ....
Hammond, S. Mapping Unstructured Grid Computations to Massively Parallel Computers. Ph.D. Dissertation, Rensselaer Polytechnic Institute, Dept. Comp. Sci., Troy, 1992.
.... a) A 16-way partitioning of a discretized semi-annulus domain using the algorithm proposed by Farhat in [Far88]. b) A 4-way partitioning of a discretized L-shape domain obtained by the algorithm introduced in [AlNT90]. Finally, recursive bisection algorithms [Fox86b], [Sim90], [Wil90], [Man92], [S92], and [Chr92], a divide-and-conquer class of algorithms, have been used as data clustering algorithms. These algorithms perform the bisection of a mesh by using either the coordinates of the mesh (or grid) points [Fox86b] or a rooted level structure or the spectral properties of the Laplacian ....
Hammond W. S. Mapping Unstructured Grid Computations to Massively Parallel Computers. PhD thesis, Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 1992.
.... with Exdasy the user loads a mesh and creates a number of partitions using the available packages, among them the popular partitioners Chaco [8] and Metis [9]. Optionally, partitions can be mapped onto a selected target computer by means of an available mapper, for example, the CPE heuristic [6]. All mapped and unmapped partitions can be explored and evaluated through graphical displays, which are chosen from a menu. Evaluation is based on the selected target computer, which can be changed through a pop-up menu. Each distribution in the workspace can either be saved or further optimized ....
Hammond S.: Mapping Unstructured Grid Computations to Massively Parallel Computers, Ph.D. thesis, Department of Computer Science, Rensselaer Polytechnic Institute, 1992.
....This greedy strategy proceeds until no further improvement is possible. Unfortunately, FM is inherently sequential and in a formal sense has been proven difficult to parallelize [10]. Our approach differs from that in standard serial implementations. We build on an idea described by Hammond [4] and apply FM in a pairwise fashion; that is, two processors perform FM on the subpieces of the partition they own. When only two processors are involved, FM can be quite efficient. In our approach many different pairs of processors work simultaneously following each Inertial bisection. The ....
S. Hammond, Mapping unstructured grid computations to massively parallel computers, PhD thesis, Rensselaer Polytechnic Institute, Dept. of Computer Science, Troy, NY, 1992.
....phenomenon. On massively parallel computers, finite difference and finite element methods frequently result in distributed processor load imbalances. To overcome load imbalance, many massively parallel methods use static load balancing as a preprocessor to the finite element calculation [4][6][7][11]. Adaptive finite difference and p finite element methods, which automatically refine or coarsen meshes and vary the order of accuracy of the numerical solution, offer greater robustness and computational efficiency than traditional methods by reducing the amount of computation required ....
Hammond, S. Mapping Unstructured Grid Computations to Massively Parallel Computers. Ph.D. thesis, Rensselaer Polytechnic Institute, Dept. of Computer Science, Troy, NY, 1992.
....the shape of the subdomains to control the volume of interprocessor communication for both two- and three-dimensional problems. The extension of the tiling system to unstructured meshes would require an interface that specifies the geometry and mesh to be used in the problem. Following Hammond [8], a static load balancing strategy [9] could be applied to the initial mesh before it is distributed to the processors, reducing the initial load imbalance and the effort required by the tiling algorithm. Finally, porting the adaptive methods and dynamic load balancing strategies to other ....
S. Hammond. Mapping Unstructured Grid Computations to Massively Parallel Computers. Ph. D. Dissertation, Rensselaer Polytechnic Institute, 1992.
....a multilevel version of RSB that can attain about an order of magnitude improvement in run time on typical examples. After its introduction in [24], RSB has found very rapid acceptance as an effective method for partitioning unstructured problems in a variety of application situations. Hammond [9] considered the implementation of an unstructured grid Euler solver on the Connection Machine 2, and found RSB followed by cyclic pairwise exchange to be the best mapping scheme for the CM-2. Johan [11] uses RSB to partition large three-dimensional finite element problems for the CM-2 and ....
S. Hammond. Mapping Unstructured Grid Computations to Massively Parallel Computers. PhD thesis, RPI, June 1992. RIACS Report 92.14.
....measures the likelihood of neighboring objects migrating to the object's processor, thus reducing the number of collisions. However, thrashing may occur over several iterations; additional stopping criteria are needed to end the iterations when the cost of the partition has not decreased. Hammond [22] performs pairwise exchanges of objects between pairs of processors to improve an existing decomposition. The processor graph is edge-colored to allow parallel computation between pairs of adjacent processors throughout the graph. For each pair of processors, the best object to be moved in each ....
S. Hammond, Mapping unstructured grid computations to massively parallel computers, PhD thesis, Rensselaer Polytechnic Institute, Dept. of Computer Science, Troy, NY, 1992.
....deficient neighbors. Receiver Initiated Diffusion (RID) is the converse of the SID strategy in that underloaded processors request loads from overloaded neighbors. For most cases RID has been shown as being a superior approach to SID. Cyclic Pairwise Exchange is an algorithm presented by Hammond [32] in which processor pairs are defined by the hardware interconnections. Pairwise exchanges of tasks are then performed to iteratively improve an imbalanced load. This method has been shown to improve the mapping time of SA by up to a factor of six. Unfortunately this approach works best for SIMD ....
S. Hammond, Mapping unstructured grid computations to massively parallel computers. PhD thesis, Rensselaer Polytechnic Institute, Troy, New York, 1992.
....of grids so that the computation and communication work load of processors is balanced, or the appropriate placement of subgrids so that network contention is minimized. Although many heuristics for finding good suboptimal mapping solutions have been proposed in the literature (see [6], [1], [7], [11], [8], and [2]), there is no general-purpose software that can be used independently of the specific characteristics and data structures of the application. The MTLB library aims to provide a common software framework for the ....
Hammond, W. S., Mapping Unstructured Grid Computations to Massively Parallel Computers, Ph.D Thesis, Rensselaer Polytechnic Institute, Troy, NY.
.... from the Harwell-Boeing collection [18] (bcspwr09, bcsstk13) and two De Bruijn networks (DEBR12, DEBR18). The numerical grids and the Harwell-Boeing graphs are widely used in the literature to show the performance of different partitioning methods on real-world problems, see for example [46, 28], while the De Bruijn networks are 4-regular Cayley graphs defined by shuffle and shuffle-exchange permutations. All graphs have been obtained from the authors of [43]. The dimensions of these graphs are listed in Table 1. Graph (number of nodes, number of edges): airfoil1 (4253, 12289), big (15606, 45878) ....
S. W. Hammond, "Mapping Unstructured Grid Computations to Massively Parallel Computers," Tech. Rep. 92.14, RIACS, NASA Ames, 1992.
....the class of problems for which it is appropriate. Third, our method does not ignore machine architecture, but rather explicitly accounts for hypercube topology in the communication cost. Recent empirical evidence confirms that this should lead to significantly better partitions in practice [10]. Our method can also be applied to meshes since d-dimensional meshes can be recursively decomposed as d-dimensional hypercubes. Fourth, unlike most other approaches, our method solves the assignment problem simultaneously with the decomposition. An alternate method for using multiple eigenvectors ....
....requires, and the hop weight of a collection of messages to be the sum of their individual hop weights. We will use hop weight as our measure of the communication cost of a mapping. Recent experimental work has indicated that this is the most accurate communication metric for scientific computing [10]. With the intent of making this discussion more formal, we let $M : V \to P$ be an assignment scheme that maps vertices to processors. We denote by $V(q)$ the set of vertices assigned to a processor $q$, so $V(q) = \{v \in V : M(v) = q\}$, and we use $\rho_i$ to indicate the processor to which vertex $v_i$ is ....
S. Hammond, Mapping unstructured grid computations to massively parallel computers, PhD thesis, Rensselaer Polytechnic Institute, Dept. of Computer Science, Troy, NY, 1992.
....Iterative Local Migration Techniques exchange load between neighboring processors to improve the load balance and/or decrease the communication volume. The definition of processor neighborhood can either be the hardware link or the connectivity of the split domains. The Cyclic pairwise exchange [7] pairs processors connected by a hardware link and exchanges the nodes of the mesh to improve the communication. Leiss Reddy on the other hand uses the hardware link as the neighborhood to transfer work from heavily loaded to less loaded processors. The Tiling algorithm [4][20] extends the ....
....local migration scheme. It is based on the Leiss Reddy [9] algorithm and employs selection criteria similar to Wheat et al. [20] in transferring elements. Unlike these approaches, however, the processors are paired during load transfers similar to the pairwise exchange heuristic used by Hammond [7]. Our pairing procedure does not pair processors connected by hardware link in the static processor graph, but rather in the dynamically changing graph representing the partitioned mesh (see Section 5). 3 Mesh Data Structures. Parallel h- and p-refinement and dynamic redistribution algorithms for ....
S. W. Hammond. Mapping Unstructured Grid Computations to Massively Parallel Computers. PhD thesis, Computer Science Dept., Rensselaer Polytechnic Institute, Troy, 1991.
....There are several heuristics for solving this problem. One method is to cluster the graph into N clusters with minimal cutsize and then perform a 1-1 mapping of the clusters onto the N processors of the given network [KERN70, KEYS94]. Other methods try to find an optimal many-to-N mapping directly [BOIL90, HAM92]. 4 Parallel Radiosity. Again most of the parallelizations of the Radiosity approach are performed on MIMD distributed memory architectures [BOUAT90, CHALM90]. In contrast to raytracing all objects of the radiosity approach have to be discretized by fine patches in a highly granular way that ....
S.W. Hammond, "Mapping Unstructured Grid Computations to Massively Parallel Computers", Technical Report No. 92.14, RIACS, NASA Ames Research Center, June 1992.
....to the execution of the program and is never modified at run time. Static mapping is NP-complete in the general case [6]. Therefore, many studies have been carried out in order to find suboptimal solutions in reasonable time. Specific algorithms have been proposed for mesh [18] and hypercube [4, 10] topologies. When the target machine is assumed to have a communication network in the shape of a complete graph, the static mapping problem turns into the partitioning problem, which has also been intensively studied [1, 13, 15, 23]. Scotch is a project carried out at the Laboratoire Bordelais ....
.... The communication cost function $f_C$ that we have chosen is the sum, for all edges, of their dilation multiplied by their weight: $f_C(\tau_{S,T}, \rho_{S,T}) \stackrel{\mathrm{def}}{=} \sum_{e_S \in E(S)} w(e_S)\, |\rho_{S,T}(e_S)|$. This function, which has already been considered by several authors for hypercube target topologies [4, 10, 12], has several interesting properties: it is easy to compute, allows incremental updates performed by iterative algorithms, and its minimization favors the mapping of intensively intercommunicating processes onto nearby processors; regardless of the type of routing implemented on the target machine ....
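The cost function quoted in the excerpt above (sum over source edges of weight times dilation) can be illustrated with a minimal sketch. It assumes a hypercube target in which the dilation of an edge is the Hamming distance between the labels of the two processors its endpoints are mapped to. The function names, the edge-list layout, and the sample graph are hypothetical, chosen for illustration only, and are not taken from Scotch or from Hammond's thesis.

```python
def hypercube_dilation(p: int, q: int) -> int:
    """Hops between hypercube nodes p and q: the Hamming distance of their labels."""
    return bin(p ^ q).count("1")


def communication_cost(edges, mapping, dilation=hypercube_dilation):
    """Sum, over all source edges, of edge weight times edge dilation.

    edges:   iterable of (u, v, weight) tuples for the source graph (assumed layout)
    mapping: dict from source vertex to target processor label
    """
    return sum(w * dilation(mapping[u], mapping[v]) for u, v, w in edges)


# Hypothetical three-vertex process graph mapped onto a 2-dimensional hypercube.
edges = [("a", "b", 3), ("b", "c", 1), ("a", "c", 2)]
mapping = {"a": 0b00, "b": 0b01, "c": 0b11}
cost = communication_cost(edges, mapping)  # 3*1 + 1*1 + 2*2
```

Because the cost is a plain sum over edges, moving one vertex only changes the terms of its incident edges, which is what makes the incremental updates mentioned in the excerpt cheap.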
S. W. Hammond. Mapping unstructured grid computations to massively parallel computers. PhD thesis, Rensselaer Polytechnic Institute, Troy, New-York, February 1992.
....during which there is very significant competition for the network. Hence when network congestion is important, weighting messages by the number of wires they consume should lead to better problem mappings. Empirical evidence supporting this and further discussion of the issue can be found in [6]. The computational kernel of spectral methods is the calculation of a small number of eigenvectors. We have implemented a variety of eigensolvers with different speed robustness tradeoffs. Roughly in order of increasing speed, these are Lanczos with full orthogonalization, Lanczos with full ....
S. Hammond, Mapping unstructured grid computations to massively parallel computers, PhD thesis, Rensselaer Polytechnic Institute, Dept. of Computer Science, Troy, NY, 1992.
....and dynamic when processes are moved between processors at run time. Static mapping is NP-complete in the general case [6]. Therefore, many studies have been carried out in order to find suboptimal solutions in reasonable time. Specific algorithms have been proposed for mesh [19] and hypercube [4, 9] topologies. When the target machine is assumed to have a communication network in the shape of a complete graph, the static mapping problem turns into the partitioning problem, which has also been intensely studied [1, 10, 13, 15, 22, 24]. However, when mapping onto parallel machines the ....
.... [1, 10, 13, 15, 22, 24]. However, when mapping onto parallel machines the communication network of which is not a bus, not accounting for the topology of the target machine usually leads to worse running times, because simple cut minimization can induce more expensive long-distance communication [9]. This document describes the capabilities and operations of Scotch, a software package devoted to graph mapping, based on the Dual Recursive Bipartitioning algorithm. Scotch allows the user to map efficiently any kind of weighted process graph onto any kind of weighted architecture graph, whether ....
S. W. Hammond. Mapping unstructured grid computations to massively parallel computers. PhD thesis, Rensselaer Polytechnic Institute, Troy, New-York, February 1992.
No context found.
H. W. Hammond, `Mapping unstructured grid computations to massively parallel computers', PhD thesis, Rensselaer Polytechnic Institute, 1992.
No context found.
S. W. Hammond. Mapping Unstructured Grid Computations to Massively Parallel Computers. PhD thesis, Computer Science Dept., Rensselaer Polytechnic Institute, Troy, 1991.
No context found.
S. Hammond, Mapping unstructured grid computations to massively parallel computers, PhD thesis, Rensselaer Polytechnic Institute, Troy, NY, 1992. RIACS Report 92-14.
No context found.
S. Hammond. Mapping Unstructured Grid Computations to Massively Parallel Computers. PhD thesis, RPI, 1992.