44 citations found.
Raja Das, Ravi Ponnusamy, Joel Saltz, and Dimitri Mavriplis. Distributed memory compiler methods for irregular problems -- data copy reuse and runtime partitioning. ICASE Report 91--73, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton, Virginia, September 1991.

This paper is cited in the following contexts:

Self-adapting Numerical Software for Next Generation.. - Dongarra, Eijkhout (2002)   (2 citations)

....of this will also overwhelm the cost of a single matrix-vector multiplication, so only when many are to be performed is optimization worthwhile. Completely run-time optimization: This is the scenario in which just-in-time (JIT) compilers work [15, 35, 1], as well as the inspector-executor model [20] and other dynamic compilation systems [24, 6, 23]. In these cases, one has essentially all information about a problem instance, but the least time available to optimize. A standard example of inspector-executor is to examine the sparsity pattern of a sparse matrix on a parallel machine at run, ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed memory compiler methods for irregular problems -- data copy reuse and runtime partitioning. ICASE Report 91-73, NASA Langley Research Center, Sept 1991.


Adaptive Time-Dependent CFD on Distributed Unstructured Meshes - Walshaw, Berzins (1995)

....it evolves it is desirable to use a dynamic load balancing technique. This approach is in contrast to work on regular grids, which lend themselves to parallelisation via compiler directives (Fortran 90, HPF), and also differs from recent work on a distributed memory compiler for irregular problems [3], which uses a static mesh decomposition executed at runtime. 2. Software Outline and Development Strategy: An outline of the structure of the code is given in Figure 1. The four main modules are mesh generation, evaluation of the residual of the ODE system, time stepping and mesh refinement ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed Memory Compiler Methods for Irregular Problems -- Data Copy Reuse and Runtime Partitioning. ICASE Rep. No. 91-73, Institute for Computer Applications in Science and Engineering, 1991.


Compiler Support for Machine-Independent Parallelization of.. - von Hanxleden (1994)   (7 citations)

.... add of f(j loc(1:j cnt) after the executor (line 81). An optimization that is not implemented yet, but is at least conceptually fairly straightforward, is to use incremental schedules for pruning messages in case at least some of the data covered by a reference are already locally available [DPSM91, HKK 92]. The information about what data are already available is stored for each node n ∈ N in Read.GIVEN(n). 5.3.6 Reduction initialization: An issue specific to reduction communications (such as Add and Mult) is the need for initializing buffer space for non-local data (assigning 0 for ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed memory compiler methods for irregular problems --- Data copy reuse and runtime partitioning. ICASE Report 91-73, Institute for Computer Application in Science and Engineering, Hampton, VA, September 1991.


Compiling for Hierarchical Shared-Memory Multiprocessors - Martens, Jayasimha (1994)

....level of the hierarchy sharing can occur; note that there is no sharing unless there is a loop-carried dependence, so we restrict our graph for these purposes to vertices and edges involved in loop-carried flow dependences. This is similar to Das et al.'s bipartite runtime dependence graph (BRDG) [8], except that, due to the simpler class of loops considered here, our dependence graph is built and (mostly) analyzed at compile time. Figure 25: Producer-consumer sharing. Leaves represent PEs and interior nodes hierarchical memory units. A producer (P) and two consumers (C) share data ....

Raja Das, Ravi Ponnusamy, Joel Saltz, and Dimitri Mavriplis. Distributed Memory Compiler Methods for Irregular Problems---Data Copy Reuse and Runtime Partitioning. In J. Saltz and P. Mehrotra, editors, Languages, Compilers, and Run-Time Environments for Distributed Memory Machines, pages 185--219. Elsevier Science Publishers, 1992.


Non-uniform Irregular Communication Exchange on.. - Jhy-Chun Wang Tseng-Hui

....of this problem requires accessing off-processor elements of y and y old. On distributed memory machines, it is inefficient to fetch individual off-processor data because of high communication startup latency. Several off-processor fetches can be combined together by using runtime inspectors [6]. NICE primitives can then be used to efficiently schedule the communication and to fetch off-processor data. Note that the scheduling tables can be reused as long as the same set of off-processor accesses is used, i.e. the array node is not changed. 6 Experimental Results: We have implemented ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed memory compiler methods for irregular problems---data copy reuse and runtime partitioning. In J. Saltz and P. Mehrotra, editors, Compilers and Runtime Software for Scalable Multiprocessors. Elsevier, Amsterdam, The Netherlands, 1991. To appear.


Distributed Array Management For HPF Compilers - Mahéo, Pazat (1993)   (1 citation)

....of processors. The memory overhead induced by this layout remains small. Proposals have also been made to allow for alignment while keeping memory use at a minimum [6]. To complement the management of local partitions, several ad hoc techniques (temporary scalars [2], buffers [11], hash tables [7], extension of local partition [12]) are used for handling received data. The additional memory space required varies according to the chosen technique but must remain acceptable. 2.2 Global to Local Index Conversion: In order to take into account a large class of ....

....from the optimization techniques used in compilers. It avoids using multiple representations of the same array in different parts of a program. The page-driven array management also seems to be appropriate for irregular computations and could be used together with the inspector-executor technique [7]. We plan to carefully compare our management scheme with shared virtual memory systems and try to find out if HPF compilers can efficiently combine a shared virtual memory with the owner-write rule or if a specific runtime support is more efficient. ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed Memory Compiler methods for irregular problems - data copy reuse and runtime partitioning. In Third Workshop on Compilers for Parallel Computers, pages 185--219, Austrian Center for Parallel Computation, July 1992.


Parallel Implementations of Irregular Problems using.. - Panwar, Kim, Agha (1996)   (1 citation)

....of data decomposition and distribution policies for regular problems to improve execution efficiency on distributed memory multicomputers. For irregular problems, such a priori determination of the necessary data distribution is not feasible. To address this problem in some limited cases, PARTI [5] and Kali [17] transform a user-defined for loop to an inspector-executor pair. In these languages the compiler assumes whole responsibility to uncover concurrency characteristics. In many regular problems using dense matrices, compiler tools may exploit most of the useful parallelism. However, an ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed Memory Compiler Methods for Irregular Problems - Data Copy Reuse and Runtime Partitioning. In J. Saltz and P. Mehrotra, editors, Languages, Compilers and Run-Time Environments for Distributed Memory Machines. Elsevier Science Publishers, 1992.


A Robust Parallel Programming Model for Dynamic Non-Uniform.. - Scott Kohn (1994)   (32 citations)

....substantial cost. Applications incur considerable runtime overhead in the computation of communication schedules, which describe the interprocessor communication necessary to satisfy data dependencies. While the cost of generating a communication schedule may be amortized by reusing the schedule [12], this is often not possible for dynamic applications with rapidly changing data dependencies. LPARX supports a different style of data decomposition: block structured, irregular decompositions. Such data distributions provide more freedom in balancing workload than either block or cyclic ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis, "Distributed memory compiler methods for irregular problems --- data copy reuse and runtime partitioning, " Tech. Rep. 91--73, ICASE, Hampton, VA, September 1991.


Parallel Programming Abstractions for Dynamic Non-Uniform.. - Scott Kohn

....and fine grained, applications incur substantial run time overhead in the computation of communication schedules, which describe the interprocessor communication necessary to satisfy data dependencies. While the cost of generating a communication schedule may be amortized by reusing the schedule [13], schedule reuse is often not possible for dynamic applications with rapidly changing data dependencies. LPARX supports a more general style of data decomposition: block structured, irregular partitioning. Because such decompositions are irregular, they provide more freedom in balancing workload ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis, Distributed memory compiler methods for irregular problems --- data copy reuse and runtime partitioning, Tech. Rep. 91--73, ICASE, Hampton, VA, September 1991.


Handling Irregular Problems with Fortran D - A Preliminary Report - von Hanxleden (1993)   (32 citations)

....and an fscatter add of f(j loc(1:j cnt) after the executor. An optimization that is not implemented yet, but is at least conceptually fairly straightforward, is to use incremental schedules for pruning messages in case at least some of the data covered by a reference are already locally available [DPSM91, HKK 92]. The information about what data are already available is stored for each node n ∈ N in Read.GIVEN(n). 4.4 Reduction initialization: An issue specific to reduction communications (like Add and Mult) is the need for initializing buffer space for non-local data (assigning 0 for Add, 1 ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed memory compiler methods for irregular problems --- data copy reuse and runtime partitioning. ICASE Report 91-73, Institute for Computer Application in Science and Engineering, Hampton, VA, September 1991.


An Implementation of the LPAR Parallel Programming Model for.. - Scott Kohn (1993)   (10 citations)

....significantly impact performance and could be eliminated by an optimizing compiler. After all data requests have been satisfied, LPAR discards its communication schedules. LPAR could amortize the cost of creating communication schedules by saving and reusing them later in the computation. PARTI [9] takes this approach. However, LPAR applications are dynamic and their communication schedules change frequently, providing few opportunities for reuse. Fortunately, collections typically contain relatively few maps since they represent coarse grained decompositions. Therefore, LPAR communication ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis, Distributed memory compiler methods for irregular problems --- data copy reuse and runtime partitioning, Tech. Rep. 91--73, ICASE, Hampton, VA, September 1991.


Contract Compile time and run time analysis for.. - Ford, Nisbet.. (1995)

.... Compiler technology that exploits such research by transforming sequentially consistent programs into programs which dynamically select the appropriate consistency model is very much in its infancy [10]. VSM optimisations which are similar to those described in this report have been presented in [13, 8, 5]. Previous work has not presented detailed application studies, or has provided experimental results for implementations of VSM optimisations that are believed to be unprotected. The work presented in this report describes how protected implementations of local invalidate and local exclusive ....

....Kluwer Press, May 1995. [4] A. Dickinson et al. Implementation and Initial Results from a Parallel Version of the Meteorological Office Atmosphere Prediction Model. In Coming of Age: Proceedings of the Sixth ECMWF Workshop on the use of Parallel Processors in Meteorology. World Scientific, 1994. [5] B. Falsafi et al. Application Specific Protocols for User Level Shared Memory. In Supercomputing 94. IEEE Press. [6] A.J.G. Hey. The GENESIS Distributed Memory Benchmarks. Parallel Computing 17(10-11), 1991. [7] Kuck and Associates Inc., Champaign, Illinois. KAP User's Guide, 1988. [8] A.R. Lebeck ....

[Article contains additional citation context not shown here]

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed Memory Compiler Methods for Irregular Problems - Data Copy Reuse and Runtime Partitioning. Technical Report 91-73, ICASE - NASA Langley RC, September 1991.


Irregular Loop Patterns Compilation On Distributed Shared.. - Hahad, Priol, Erhel (1994)   (1 citation)

.... n and op is a commutative and associative operation. In this study, we suppose that L is duplicated among the processors. In very large problems, L can actually be shared rather than duplicated, since read-only data (such as L) does not disturb the system performance significantly [5]. On DMPCs, PARTI [9, 4, 7] is one of the most advanced projects in resolving such problems. It has been grafted to several HPF compilers such as FORTRAN D [10, 11] and Vienna Fortran [3]. The next section of that paper will go over the context of our study. Section 3 introduces the very heart of our proposal which is the ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed Memory Compiler Methods for Irregular Problems - Data Copy Reuse and Runtime Partitioning. Technical Report 91-73, ICASE - NASA Langley RC, September 1991.


A Branch-and-Bound Algorithm for Array Distributions - Dierstein, Hayer, Rauber

....have been proposed that provide the programmer with a shared address space. These efforts include the development of new or extended languages like Fortran90, Fortran D [11] or Vienna Fortran [2] and parallelization systems like aspar [12], superb [7], [8], mimdizer [16], Kali [14], parti [3], Crystal [15] and paradigm [9]. Earlier systems allow the programmer to write his program using global data references but require the distribution of the program data to be specified. The specified data distribution is used to guide the transformation of the input program into a SPMD (single ....

....the decision tree very accurate. Although the method shows good results, it is limited to the computation of data distribution for arrays. Future research may address the problem of distributing other data structures like lists, trees, graphs and so on. There have been approaches in this direction [3], [14], but as yet the problem has withstood automation. We could also optimize the distribution for arrays in some ways: the existing implementation only allows regular block distributions for arrays. Allowing more general distributions may result in better performance of the parallelized ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed Memory Compiler Methods for Irregular Problems -- Data Copy Reuse and Runtime Partitioning. In Languages, Compilers and Run-- Time Environments for Distributed Memory Machines, pages 185--219. North--Holland, 1992.


Parallel Implementations of Irregular Problems using.. - Panwar, Kim, Agha (1996)   (1 citation)

....of data decomposition and distribution policies for regular problems to improve execution efficiency on distributed memory multicomputers. For irregular problems, such a priori determination of the necessary data distribution is not feasible. To address this problem in some limited cases, PARTI [4] and Kali [10] transform a user-defined for loop to an inspector-executor pair. In these languages the compiler assumes the entire responsibility to uncover concurrency characteristics. In many regular problems using dense matrices, compiler tools may exploit most of the useful parallelism. ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed Memory Compiler Methods for Irregular Problems - Data Copy Reuse and Runtime Partitioning. In J. Saltz and P. Mehrotra, editors, Languages, Compilers and Run-Time Environments for Distributed Memory Machines. Elsevier Science Publishers, 1992.


A Methodology for Programming Scalable Architectures - Panwar, Agha (1994)   (2 citations)

....The ALIGN statement is used to map arrays onto decompositions. The decompositions are mapped to the physical machine by using the DISTRIBUTE statement. Both Vienna Fortran and Fortran D allow certain intrinsic distribution functions such as BLOCK, CYCLIC, and BLOCK CYCLIC. Kali and PARTI [18] support sparse and unstructured computations using distributed arrays accessed using indirection. They transform a sequential loop into two constructs namely, inspector and the executor. During program execution, the inspector loop examines the data references made by a processor and calculates ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed memory compiler methods for irregular problems - data copy reuse and runtime partitioning. In J. Saltz and P. Mehrotra, editors, Languages, Compilers and Run-Time Environments for Distributed Memory Machines. Elsevier Science Publishers, 1992.


Scheduling of Unstructured Communication on the Intel iPSC/860 - Jhy-Chun Wang

....The contents do not necessarily reflect the position or the policy of the United States government and no official endorsement should be inferred. Jhy-Chun Wang's current address is Department of Computer Science, University of Illinois at Urbana-Champaign, email: jcwang@cs.uiuc.edu. PARTI [8, 11] derive the necessary communication information based on the data required for performing the local computations and data partitioning. This tends to result in unstructured communication patterns. Each processor needs to send messages to some number of processors, with no obvious patterns. ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed Memory Compiler Methods for Irregular Problems---Data Copy Reuse and Runtime Partitioning. In J. Saltz and P. Mehrotra, editors, Compilers and Runtime Software for Scalable Multiprocessors. Elsevier, Amsterdam, The Netherlands, 1991.


PYRROS: Static Task Scheduling and Code Generation for.. - Yang, Gerasoulis (1992)   (49 citations)

....shared memory model of computation. The KALI system by Koelbel and Mehrotra [14] addresses parallel code generation and is currently targeted at DOALL parallelism. Kennedy's group [13] is also working on code generation for FORTRAN D on distributed memory machines. The PARTI system by Saltz's group [4] focuses on irregular dependence graphs determined at run time and optimizes the performance by precomputing the data accessing pattern. The current FORTRAN D system, KALI and PARTI do not address program scheduling for optimizing the mapping of program and data and the ordering of code execution. ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis, Distributed Memory Compiler Methods for Irregular Problems: Data Copy Reuse and Runtime Partitioning, ICASE Report No. 91-73, 1991.


Static and Runtime Scheduling of Unstructured Communication - Ranka, Wang (1993)   (3 citations)

....been developed in [1, 16]. For a large class of scientific problems, which are irregular in nature, achieving a good mapping is considerably more difficult [7]. The nature of this irregularity may not be known at the time of compilation, and can be derived only at run time. Packages like PARTI [9, 12, 15] derive the necessary communication information based on the data required for performing the local computations and data partitioning. This tends to result in unstructured communication patterns. Each processor needs to send messages to some number of processors, with no obvious patterns. ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed memory compiler methods for irregular problems - data copy reuse and runtime partitioning. In J. Saltz and P. Mehrotra, editors, Compilers and Runtime Software for Scalable Multiprocessors. Elsevier, Amsterdam, The Netherlands, 1991. to appear.


pC*: Efficient and Portable Runtime Support for Data-Parallel.. - Bigot (1996)

....operation at each position in the array. There is support within Chaos and similar libraries to detect where a communication schedule can be reused between iterations and in some cases between different code sections, and where values from off-node may be reused without being requested again (Das, Ponnusamy, Saltz, Mavriplis, 1992; Agrawal, Sussman, Saltz, 1993; Ponnusamy, Saltz, Choudhary, 1993). This reuse may be aided by compile-time analysis, or determined at runtime by destroying a schedule when an index expression on which it depends is modified. The problem domain addressed by the Chaos library is quite different from ....

....to access local elements directly, by providing a schedule which points to local data in their local positions and remote data in a separate buffer. Buffers for remote data in communication schedule systems are often allocated as ghost cells at the end of the local portion of a parallel value (Das et al., 1992). In the worst case of accessing only remote data, the external buffers must be the same size as the local data, meaning that no space saving is achieved; in fact, more space is required to store the schedule itself. However, the copying cost will be avoided for intra-node references. This is in ....

Das, R., Ponnusamy, R., Saltz, J., & Mavriplis, D. (1992). Distributed memory compiler methods for irregular problems---data copy reuse and runtime partitioning. In Saltz, J., & Mehrotra, P. (Eds.), Languages, Compilers, and Run-Time Environments for Distributed Memory Machines, pp. 185--219. Elsevier Science Publishers. Cited on pp. 122, 125.


Reducing Variations in Parallel Efficiency for.. - Wörner, Geuder..

....due to the NP completeness, only weak criteria can be found such as minimizing the number of edges cut and the number of adjacent partitions, and balancing the computational and communicational load. Much ongoing research is devoted to improving the partitioning of unstructured grids [6][7]. For the further discussion, we assume that a good partitioning for each configuration can be found. That is, each algorithm for the partitioning problem able to calculate a configuration dependent partitioning is already adaptive in the sense defined above. By that, internal and boundary ....

R. Das, R. Ponnusamy, J. Saltz, and D.J. Mavriplis, Distributed memory Compiler Methods for Irregular Problems - data copy reuse and runtime partitioning, in Compilers and Runtime Software for Scalable Multiprocessors, J. Saltz and P. Mehrotra Editors, Amsterdam, The Netherlands, 1992, Elsevier.


Compiler Analysis for Irregular Problems in Fortran D - von Hanxleden, Kennedy.. (1992)   Self-citation (Das Saltz)

....implies that all processors participate in them; this requirement for global coordination can be relaxed when extending the primitives to handle name spaces which are shared only between subsets of processors. Our runtime support makes it possible to track and reuse off-processor data copies [3]. We generate combining communication schedules which combine off-processor data for several indirect references, possibly contained in different loops, and we generate incremental schedules to obtain only those off-processor data which are not already requested by a given set of pre-existing ....

....comprise a non linear system of five differential equations. The calculation consists of a sequence of loops over edges, boundary faces and nodes of an unstructured mesh. The code was originally developed by Dimitri Mavriplis. The program was ported to the Touchstone Delta using Parti primitives [2, 3] and the code was run to simulate a variety of aircraft configurations under a range of test conditions. While the port was carried out by hand, the strategy used to place the PARTI primitives was the same as the strategies that would result from our dataflow framework. We executed a number of ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed memory compiler methods for irregular problems --- data copy reuse and runtime partitioning. ICASE Report 91-73, Institute for Computer Application in Science and Engineering, Hampton, VA, September 1991.


Communication Optimizations for Irregular Scientific.. - Das, Uysal, Saltz, Hwang (1993)   (103 citations)  Self-citation (Das Saltz)

....in the executor code, i.e. the gather and the scatter add calls. The gather on each processor fetches all the necessary y references that reside off-processor. The scatter add calls accumulate the off-processor x values. A detailed description of the functionality of these primitives is given in [6].

    do i = 1, NATOM
      do index = 1, INB(i)
        j = Partners(i, index)
        Calculate dF (x, y and z components)
        Subtract dF from F_j
        Add dF to F_i
      end do
    end do

Figure 4: Non-bonded Force Calculation Loop from CHARMM

2 Real Problem Kernels: We have ported a number of scientific codes to ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis. Distributed memory compiler methods for irregular problems - data copy reuse and runtime partitioning. In Compilers and Runtime Software for Scalable Multiprocessors, J. Saltz and P. Mehrotra Editors, Amsterdam, The Netherlands, 1992. Elsevier.


Parallelizing Molecular Dynamics Codes using Parti Software.. - Das Saltz (1993)   (6 citations)  Self-citation (Das Saltz)

....the sequential code except that it has a number of Parti primitive calls embedded in it. Initially, all parallel irregular loops were transformed into inspector-executor pairs. Then we performed optimizations associated with reusing copies of off-processor data and with vectorizing communications [5]. The entire energy calculation portion of CHARMM has been parallelized. This involves both the internal (bond, angle, etc.) energy calculations and the external (nonbonded) energy calculations. We present some of the results that we have obtained on the Intel iPSC/860. In these tables depicting the ....

....In the larger carboxy myoglobin simulation, pairlist generation proved to be extremely expensive. The reasons for the high cost do not appear to be fundamental; these costs are in part prompting us to redesign the address translation mechanisms in our software. We utilize incremental scheduling [5] to reduce the volume of data to be communicated by reusing live data that has been fetched for previous loops. Since we use incremental scheduling, we can fetch all the required data, i.e. the off-processor atom coordinates that will be referenced. On the other hand, in the current implementation, ....

R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis, Distributed memory compiler methods for irregular problems - data copy reuse and runtime partitioning, in Compilers and Runtime Software for Scalable Multiprocessors, J. Saltz and P. Mehrotra Editors, Amsterdam, The Netherlands, 1992, Elsevier.


Automatic Parallelization of the Conjugate Gradient.. - Kotlyar, Pingali, Stodghill

No context found.

Raja Das, Ravi Ponnusamy, Joel Saltz, and Dimitri Mavriplis. Distributed memory compiler methods for irregular problems -- data copy reuse and runtime partitioning. ICASE Report 91--73, Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton, Virginia, September 1991.


CiteSeer.IST - Copyright Penn State and NEC