# W.J. Camp, S.J. Plimpton, B.A. Hendrickson, and R.W. Leland, "Massively Parallel Methods for Engineering and Science Problems," Comm. ACM, vol. 37, no. 4, pp. 31--41, Apr. 1994.


This paper is cited in the following contexts:


Mapping And FPGA Global Routing Using Mean Field Annealing - Haritaoglu (1994)

....in the LocusRoute is to distribute the connections among channels so that channel densities are balanced. In this thesis, we propose a new approach to the solution of the global routing problem in FPGAs by using the Mean Field Annealing technique. The second problem that is solved by MFA is the mapping problem [4, 8, 29]. The mapping problem arises as parallel programs are developed for distributed memory architectures. Various classes of problems can be decomposed into a set of interacting sequential subproblems (tasks) which can be executed in parallel. In these classes of problems, the interaction patterns ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Communications of the ACM, 37(4):31--41, April 1994.


Artificial Intelligence Laboratory - Tr No January (1986)   (2 citations)

....orthogonal grids of figure 3 1 are well-structured grids. By contrast, there are also unstructured non-uniform grids (not shown here) where the grid points are laid out with almost complete freedom in order to match the boundaries and the areas where higher resolution is needed (Camp et al. [6]). Unstructured grids are very popular and very promising. A lot of research is currently being done to find good ways of parallelizing unstructured grids. The above catalogue of numerical grids should put in perspective the uniform grids which are used here. Uniform grids are not the most efficient ....

....apart. This is important when a process migrates from a busy host to a free host, as explained in section 6.5 (also see the appendix). The communication of data between processes is organized by means of a well-known programming technique which is called padding or ghost cells (Fox [19], Camp [6]). Specifically, each subregion is padded with one or more layers of extra nodes on the outside. One layer of nodes is used if the local interaction extends to a distance of one neighbor, and more layers are used if the local interaction extends further. Once the data is copied from one subregion ....

[Article contains additional citation context not shown here]

W.J. Camp, S.J. Plimpton, B.A. Hendrickson, and R.W. Leland. Massively parallel methods for engineering and science problems. Communications of the ACM, 37(4):31--41, April 1994.


Implementing an Interactive Visualization System on a SIMD.. - Erbacher   (1 citation)

....under investigation forbid the generation of real time displays. Consequently, the cost of displaying the resulting images is not of significance. Extensive work has been done to ensure that appropriate load balancing is done on MIMD architectures to maximize the amount of parallelism achieved [CAM94]. This research, as with the other research areas discussed, has not taken any requirements for the resulting data (e.g. its organization) into account. In [SCH94] Schechter et al. describe the use of massively parallel systems for the visualization of ultrasonic pulses in solids. While they are ....

....affected by the results of these data mapping paradigms, compiler technology and dynamic load balancing. Automatic parallelizing compilers are of particular interest and are looked upon as an important future goal to aid in the acceptance of parallel systems by the general scientific communities [CAM94]. Both compiler technology and dynamic load balancing algorithms [MIG91, HIL93] for parallel systems need to take the impact of these results into account. Generally, these technologies rely on the assumption that an algorithm will perform more efficiently if the interprocessor communication is ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland, "Massively parallel methods for engineering and science problems," Communications of the ACM, April 1994, Vol. 37, No. 4, pp. 31-41.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1995)   (57 citations)

.... simulations such as SPICE for circuit simulation, DYNA 3D and PRONTO 3D for structural mechanics modeling, GAUSSIAN and DMOL for quantum mechanical simulation of molecules, CHARMM and DISCOVER for molecular dynamics simulation of organic systems, and FIDAP for modeling complex fluid flows [11]. Thus, in order to realize the full potential of parallel computing it has become clear that static (compile time) analysis must be complemented by new methods capable of automatically extracting parallelism at run time [9, 11, 13]. Run time techniques can succeed where static compilation ....

.... of organic systems, and FIDAP for modeling complex fluid flows [11]. Thus, in order to realize the full potential of parallel computing it has become clear that static (compile time) analysis must be complemented by new methods capable of automatically extracting parallelism at run time [9, 11, 13]. Run time techniques can succeed where static compilation fails because they have access to the input data. For example, input dependent or dynamic data distribution, memory accesses guarded by run time dependent conditions, and subscript expressions can all be analyzed unambiguously at ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Comm. ACM, 37(4):31--41, April 1994.


Run-Time Methods for Parallelizing Partially Parallel Loops - Rauchwerger, Amato, Padua (1995)   (18 citations)

.... Examples include SPICE for circuit simulation, DYNA 3D and PRONTO 3D for structural mechanics modeling, GAUSSIAN and DMOL for quantum mechanical simulation of molecules, CHARMM and DISCOVER for molecular dynamics simulation of organic systems, and FIDAP for modeling complex fluid flows [8]. Due to space limitations, this paper is an extended abstract of [24]. Center for Supercomputing Research and Development, 1308 W. Main St., Urbana, IL 61801, email: rwerger,padua csrd.uiuc.edu. Research supported in part by Intel and NASA Graduate Fellowships, and Army contract ....

....Research supported in part by an AT&T Bell Laboratories Graduate Fellowship, NSF Grant CCR9315696, and the International Computer Science Institute, Berkeley, CA. Thus, since the available parallelism in these types of applications cannot be determined statically by present parallelizing compilers [6, 8], compile time analysis must be complemented by new methods capable of automatically extracting parallelism at run time. Run time techniques are needed because the access pattern of some programs cannot be statically determined, either because of limitations of current analysis algorithms or ....

[Article contains additional citation context not shown here]

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Comm. ACM, 37(4):31--41, April 1994.


Scalable Parallel Algorithms for Solving Sparse Systems of Linear.. - Gupta

....used to order the matrices in Table 2. This choice of the ordering scheme was prompted by two factors. First, there is increasing evidence that spectral orderings offer a good balance between generality of application and the quality of ordering both in terms of load balance and fill reduction [6]. Second, the SND algorithm itself can be parallelized efficiently, whereas most other ordering schemes do not appear to be as well suited for parallelization. Although, at the present time we compute the ordering on a serial computer, SND is our ordering algorithm of choice in a prospective ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Communications of the ACM, 37(4):31--41, April 1994.


Partitioning Similarity Graphs: A Framework for Declustering.. - Liu, Shekhar (1996)   (11 citations)

....have yielded algorithms that can partition graphs with a hundred thousand nodes in a couple of minutes [21] on workstations. Similar run times (e.g. within 50 seconds on a sparc 2 workstation for partitioning a graph with 170,000 nodes and 230 K edges into 64 partitions) are also reported in [2]. An incremental declustering algorithm is presented in Section 3.1 and is experimentally evaluated for a large data set in Section 6.3.4. In future work, we would like to evaluate the suitability of the latest graph partitioning algorithm for declustering. 1 Interestingly, min cut graph ....

W. J. Camp, S. J. Plimpton, et al. "Massively Parallel Methods for Engineering and Science Problems". Communications of the ACM, 37(4):31--41, 1994.


Hypergraph Model for Mapping Repeated Sparse Matrix-Vector.. - Catalyurek, Aykanat

....are usually routed over the shortest paths of links between the communicating pairs of processors. Hence, multihop messages are usually weighted with the distances between the respective pairs of processors in the network, while considering their contribution to the overall communication cost [2, 3, 10]. Here, distance refers to the number of communication links and switching elements along the communication route in static and dynamic interconnection networks, respectively. The mapping methods proposed in the literature employ the graph model of computation [2, 3, 10]. In this work, we show the ....

....the overall communication cost [2, 3, 10]. Here, distance refers to the number of communication links and switching elements along the communication route in static and dynamic interconnection networks, respectively. The mapping methods proposed in the literature employ the graph model of computation [2, 3, 10]. In this work, we show the deficiencies of the graph model for mapping sparse matrix-vector product computations, and propose a hypergraph model which avoids these deficiencies. Kernighan-Lin (KL) [8] based heuristics are fast heuristics widely used for solving the mapping problem [2, 10] ....

Camp, W. J., Plimpton, S. J., Hendrickson, B. A., and Leland, R. W., "Massively parallel methods for engineering and science problems," Communications of the ACM, vol. 37, no. 4, pp. 31--41, April 1994.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1995)   (57 citations)

.... simulations such as SPICE for circuit simulation, DYNA 3D and PRONTO 3D for structural mechanics modeling, GAUSSIAN and DMOL for quantum mechanical simulation of molecules, CHARMM and DISCOVER for molecular dynamics simulation of organic systems, and FIDAP for modeling complex fluid flows [11]. Thus, in order to realize the full potential of parallel computing it has become clear that static (compile time) analysis must be complemented by new methods capable of automatically extracting parallelism at run time [9, 11, 14]. Run time techniques can succeed where static compilation ....

.... of organic systems, and FIDAP for modeling complex fluid flows [11]. Thus, in order to realize the full potential of parallel computing it has become clear that static (compile time) analysis must be complemented by new methods capable of automatically extracting parallelism at run time [9, 11, 14]. Run time techniques can succeed where static compilation fails because they have access to the input data. For example, input dependent or dynamic data distribution, memory accesses guarded by run time dependent conditions, and subscript expressions can all be analyzed unambiguously at ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Comm. ACM, 37(4):31--41, April 1994.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1995)   (57 citations)

..... D. Padua is with the Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61820. E-mail: padua cs.uiuc.edu . tion of molecules, CHARMM and DISCOVER for molecular dynamics simulation of organic systems, and FIDAP for modeling complex fluid flows [12]. Thus, in order to realize the full potential of parallel computing it has become clear that static (compile time) analysis must be complemented by new methods capable of automatically extracting parallelism at run time [10, 12, 15]. Run time techniques can succeed where static ....

.... of organic systems, and FIDAP for modeling complex fluid flows [12]. Thus, in order to realize the full potential of parallel computing it has become clear that static (compile time) analysis must be complemented by new methods capable of automatically extracting parallelism at run time [10, 12, 15]. Run time techniques can succeed where static compilation fails because they have complete information about the access pattern. For example, input dependent or dynamic data distribution, memory accesses guarded by run time dependent conditions, and subscript expressions can all be ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Comm. ACM, 37(4):31--41, April 1994.


Run-Time Parallelization: Its Time Has Come - Rauchwerger (1998)   (3 citations)

....the fraction of statically non-analyzable codes. It is widely assumed that more than 50% of codes [20] are of the irregular type. Thus, in order to realize the full potential of parallel computing it has become clear that static (compile time) analysis must be augmented with new methods [8, 12, 15]. We need techniques that let us access the information necessary to decide if a loop is parallel and perform parallelizing transformations. The only time this data is available is during program execution, at run time. Run time techniques can succeed where static compilation fails because they ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Comm. ACM, 37(4):31--41, April 1994.


Run-Time Methods For Parallelizing Do Loops - Rauchwerger, Padua

....(c) Count the total number of write accesses to A that are marked in this iteration, and store the result in tw_i(A), where i is the iteration number.

S1: DO i = 1, n
S2:   A[R1[i]]
S3:   A[W[i]]
S4:   A[R2[i]]
S5: ENDDO

R1[1:8] = [2 2 2 10 8 8 8 10]
W[1:8]  = [1 3 5 4 7 3 6 12]
R2[1:8] = [1 3 2 10 7 3 8 12]

Shadow arrays by position (1 to 12), with written and counted totals tw(A) = 8 and tm(A) = 7:
Aw[1:12]  = 1 0 1 1 1 1 1 0 0 0 0 1
Ar[1:12]  = 0 1 0 0 0 0 0 1 0 1 0 0
Anp[1:12] = 0 1 0 0 0 0 0 1 0 1 0 0
Aw[1:12] Ar[1:12]: 0 0 0 0 0 0 0 0 0 0 0 0
Aw[1:12] ....

....total number of write accesses to A that are marked in this iteration, and store the result in tw_i(A), where i is the iteration number.

S1: DO i = 1, n
S2:   A[R1[i]]
S3:   A[W[i]]
S4:   A[R2[i]]
S5: ENDDO

R1[1:8] = [2 2 2 10 8 8 8 10]
W[1:8]  = [1 3 5 4 7 3 6 12]
R2[1:8] = [1 3 2 10 7 3 8 12]

Shadow arrays by position (1 to 12), with written and counted totals tw(A) = 8 and tm(A) = 7:
Aw[1:12]  = 1 0 1 1 1 1 1 0 0 0 0 1
Ar[1:12]  = 0 1 0 0 0 0 0 1 0 1 0 0
Anp[1:12] = 0 1 0 0 0 0 0 1 0 1 0 0
Aw[1:12] Ar[1:12]: 0 0 0 0 0 0 0 0 0 0 0 0
Aw[1:12] Anp[1:12]: 0 0 0 0 0 0 0 0 0 ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland, Massively parallel methods for engineering and science problems, Comm. ACM, 37(4):31--41, 1994.


Maxwell's demon, rectifiers, and the second law: Computer.. - Skordos (1992)

....orthogonal grids of figure 3 1 are well-structured grids. By contrast, there are also unstructured non-uniform grids (not shown here) where the grid points are laid out with almost complete freedom in order to match the boundaries and the areas where higher resolution is needed (Camp et al. [6]). Unstructured grids are very popular and very promising. A lot of research is currently being done to find good ways of parallelizing unstructured grids. The above catalogue of numerical grids should put in perspective the uniform grids which are used here. Uniform grids are not the most ....

....apart. This is important when a process migrates from a busy host to a free host, as explained in section 6.5 (also see the appendix). The communication of data between processes is organized by means of a well-known programming technique which is called padding or ghost cells (Fox [19], Camp [6]). Specifically, each subregion is padded with one or more layers of extra nodes on the outside. One layer of nodes is used if the local interaction extends to a distance of one neighbor, and more layers are used if the local interaction extends further. Once the data is copied from one subregion ....
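
The padding technique described in the excerpt above can be sketched roughly as follows. This is a minimal illustration in Python, not code from the cited thesis or from Camp et al. The function names (`split_with_ghosts`, `exchange_ghosts`, `smooth_interior`), the 1-D domain, the 3-point averaging stencil, and the zero value assumed at the physical boundaries are all assumptions made for the example.

```python
def split_with_ghosts(data, parts, width=1):
    """Split a 1-D list into `parts` subregions, each padded with
    `width` ghost cells on both sides (assumes parts divides len(data))."""
    size = len(data) // parts
    regions = []
    for p in range(parts):
        interior = data[p * size:(p + 1) * size]
        # Ghost cells start at 0.0; outer ones act as a fixed boundary value.
        regions.append([0.0] * width + interior + [0.0] * width)
    return regions


def exchange_ghosts(regions, width=1):
    """Copy each subregion's edge interior cells into its neighbors' ghost cells."""
    for p, region in enumerate(regions):
        if p > 0:
            left = regions[p - 1]
            # Left ghost layer <- last `width` interior cells of the left neighbor.
            region[:width] = left[-2 * width:-width]
        if p < len(regions) - 1:
            right = regions[p + 1]
            # Right ghost layer <- first `width` interior cells of the right neighbor.
            region[-width:] = right[width:2 * width]


def smooth_interior(region, width=1):
    """Apply a 3-point average to interior cells only, reading ghost cells at the edges."""
    return [
        (region[i - 1] + region[i] + region[i + 1]) / 3.0
        for i in range(width, len(region) - width)
    ]


if __name__ == "__main__":
    data = [float(x) for x in range(1, 9)]
    regions = split_with_ghosts(data, parts=2)
    exchange_ghosts(regions)
    local = [v for r in regions for v in smooth_interior(r)]
    # The decomposed result should match smoothing the whole padded domain at once.
    print(local == smooth_interior([0.0] + data + [0.0]))
```

Because the stencil here reaches only one neighbor, one ghost layer is enough; a wider interaction would need a larger `width`, as the excerpt notes.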

[Article contains additional citation context not shown here]

W.J. Camp, S.J. Plimpton, B.A. Hendrickson, and R.W. Leland. Massively parallel methods for engineering and science problems. Communications of the ACM, 37(4):31--41, April 1994.


Partitioning Similarity Graphs: A Framework for Declustering.. - Liu, Shekhar (1996)   (11 citations)

....6 have yielded algorithms that can partition graphs with a hundred thousand nodes in a couple of minutes [21] on workstations. Similar run times (e.g. within 50 seconds on a sparc 2 workstation for partitioning a graph with 170,000 nodes and 230 K edges into 64 partitions) are also reported in [2]. An incremental declustering algorithm is presented in Section 3.1 and is experimentally evaluated for a large data set in Section 6.3.4. In future work, we would like to evaluate the suitability of the latest graph partitioning algorithm for declustering. 3 Heuristic Techniques for Max Cut ....

W. J. Camp, S. J. Plimpton, et al. "Massively Parallel Methods for Engineering and Science Problems". Communications of the ACM, 37(4):31--41, 1994.


Hypergraph-Partitioning Based Decomposition for Parallel.. - Catalyurek, Aykanat   (11 citations)

....each row and column of each submatrix has at least one nonzero. 1 The computational graph model is widely used in the representation of computational structures of various scientific applications, including repeated SpMxV computations, to decompose the computational domains for parallelization [5, 6, 20, 21, 27, 28, 31, 36]. In this model, the problem of sparse matrix decomposition for minimizing the communication volume while maintaining the load balance is formulated as the well known K way graph partitioning problem. In this work, we show the deficiencies of the graph model for decomposing sparse matrices for ....

W. Camp, S. J. Plimpton, B. Hendrickson, and R. W. Leland, "Massively parallel methods for engineering and science problems," Communication of ACM, vol. 37, pp. 31--41, April 1994.


Analysis and Design of Scalable Parallel Algorithms for Scientific .. - Gupta (1995)   (2 citations)

....to order the matrices in Table 4.2. This choice of the ordering scheme was prompted by two factors. First, there is increasing evidence that spectral orderings offer a good balance between generality of application and the quality of ordering both in terms of load balance and fill reduction [20]. Second, the SND algorithm itself can be parallelized efficiently, whereas most other ordering schemes do not appear to be as well suited for parallelization. Although, at the present time we compute the ordering on a serial computer, SND is our ordering algorithm of choice in a prospective ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Communications of the ACM, 37(4):31--41, April 1994.


A Scalable Parallel Algorithm for Sparse Matrix Factorization - Gupta, Kumar (1994)   (7 citations)

....used to order the matrices in Table 2. This choice of the ordering scheme was prompted by two factors. First, there is increasing evidence that spectral orderings offer a good balance between generality of application and the quality of ordering both in terms of load balance and fill reduction [6]. Second, the SND algorithm itself can be parallelized efficiently, whereas most other ordering schemes do not appear to be as well suited for parallelization. Although, at the present time we compute the ordering on a serial computer, SND is our ordering algorithm of choice in a prospective ....

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Communications of the ACM, 37(4):31--41, April 1994.


Mapping Tasks To Processors With The Aid Of Kohonen Networks - Heiß, Dormanns

....is only affected by other geometrically nearby discretization nodes. Exploiting these features leads to very effective geometrically oriented mapping strategies that are based on recursively partitioning the solution domain but do not take into account the interconnection topology of the machine [4, 6, 8, 18, 20]. Our proposal based on self-organizing maps tries to combine the benefits of all three types of approach: it can be well parallelized, can run on the parallel computer itself at load time, can even be executed at runtime to deal with dynamic load changes, and it has similar statistical ....

W.J. Camp, S.J. Plimpton, B.A. Hendrickson and R.W. Leland, Massively Parallel Methods for Engineering and Science Problems. CACM 37,4 (April 1994) pp. 31-41


Automatic Identification of Parallel Units and.. - Evans, Goscinski (1997)   (1 citation)

....some order of execution imposed upon them. Sequences have been identified with their passive and active components but some parts of the program may still be independent. Independent components cannot be executed in any fashion. There is normally some time or state dependency [Aho et al. 1988; Camp et al. 1994; Lilja, 1994; Watson, 1989] on the variables used by these independent components. An example of this is the use of system states or timing by a system clock that a procedure needs to read during the procedure's execution. This procedure can be run in parallel but its execution should be ....

Camp, W.J., Plimpton, S.J., Hendrickson, B.A, and Leland, R.W. (1994) Massively Parallel Methods for Engineering and Scientific Problems, Communications of the ACM, Vol. 37, No.


Experimental Evaluation of Efficient Sparse Matrix Distributions - Manuel Ujaldon

....nor MRD BRS precludes a particular alignment choice; the performance of a particular choice depends on the characteristics of the input matrix. 8 Related Work Much research has been done in sparse matrix distribution. Most of these techniques are derived from unstructured graph partitioning. [4] outlines three basic approaches to graph partitioning: spectral bisection, graph bisection and coordinate bisection. All three methods result in good locality and load balance. However, they do not consider the preprocessing costs required to locate distributed data and the memory overheads ....

W.J. Camp, S.J. Plimpton, B.A. Hendrickson and R.W. Leland. Massively Parallel Methods for Engineering and Science Problems, Comms. of the ACM. Vol.37, No.4, pp 31-41. April 1994.


Topology Preserving Dynamic Load Balancing for Parallel.. - Hegarty, Kechadi (1997)   (1 citation)

.... realizations, since the domain can be decomposed either spatially or along the molecular chain, and then a simple extension of the serial algorithm applied (at least for molecular mechanics; Monte Carlo methods need a little more work to ensure the criterion of detailed balance is adhered to) [5, 6, 19]. However, since the problems are completely dynamic in nature, any such decomposition will lead to an inefficient parallelization after some time. Dynamic load balancing must then be applied to increase a simulation's efficiency. To resolve the problem several load balancing techniques have been ....

W.J. Camp, S.J. Plimpton, B.A. Hendrickson, and R.W. Leland. Massively Parallel Methods for Engineering and Science Problems. Communication of the ACM, 37(4):31--41, April 1994.


Vienna-Fortran/HPF Extensions for Sparse and.. - Ujaldon, Zapata.. (1997)   (3 citations)

....decomposing the sparse global domain into as many sparse local domains as required. 3.1 Multiple Recursive Decomposition (MRD) Common approaches for partitioning unstructured meshes while keeping neighborhood properties are based upon coordinate bisection, graph bisection and spectral bisection [8, 19]. Spectral bisection minimizes communication, but requires huge tables to store the boundaries of each local region and an expensive algorithm to compute it. Graph bisection is algorithmically less expensive, but also requires large data structures. Coordinate bisection significantly tends to ....

W.J. Camp, S.J. Plimpton, B.A. Hendrickson and R.W. Leland. Massively Parallel Methods for Engineering and Science Problems, Comms. ACM. Vol.37, No.4, pp 31-41. April 1994.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1999)   (57 citations)

No context found.

W.J. Camp, S.J. Plimpton, B.A. Hendrickson, and R.W. Leland, "Massively Parallel Methods for Engineering and Science Problems," Comm. ACM, vol. 37, no. 4, pp. 31--41, Apr. 1994.


A Hypergraph-Partitioning Approach for Coarse-Grain Decomposition - Umit Ataly

No context found.

W. Camp, S. J. Plimpton, B. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Communication of ACM, 37(4):31--41, April 1994.


Efficient Communication Algorithms for Pipeline Multicomputers - Kok Kin

No context found.

Camp, W. J., Plimpton, S. J., Hendrickson, B. A., and Leland, R. W., "Massively Parallel Methods for Engineering and Science Problems", Communications of the ACM, Vol. 37, No. 4, pp. 31-41, April 1994.


CiteSeer.IST - Copyright Penn State and NEC