S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD Distributed-memory Machines. Commun. ACM, 35(8):66--80, 1992.
....of the machine. A good mapping minimizes program completion time by balancing the opposing needs of parallelism and communication: spreading the data and work over many processors increases available parallelism, but also increases communication time. Most compilation systems (e.g. Fortran D [10] and High Performance Fortran [9]) divide the data mapping problem into two phases: alignment, in which the relative positions of arrays are determined within a Cartesian grid called a template, and distribution, in which the template is partitioned and mapped to a processor grid. We have dealt with ....
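The two-phase mapping described in this context can be sketched as follows. This is only an illustrative model, not the Fortran D or HPF implementation: the function names, the one-dimensional template, the offset-only alignment, and the block distribution are all assumptions made for the example.

```python
def align(i, offset):
    """Phase 1 (alignment): place array element i on a template cell.
    Only a constant offset is modeled here (an assumption)."""
    return i + offset


def distribute(t, template_size, num_procs):
    """Phase 2 (distribution): map template cell t to a processor,
    assuming a simple block partition of the template."""
    block = -(-template_size // num_procs)  # ceiling division
    return t // block


def owner(i, offset, template_size, num_procs):
    """Processor that owns array element i after both phases."""
    return distribute(align(i, offset), template_size, num_procs)


# Hypothetical arrays A (offset 0) and B (offset 1) on an 8-cell template
# spread over 2 processors: A[3] and B[2] share a template cell, so they
# land on the same processor and need no communication.
print(owner(3, 0, 8, 2), owner(2, 1, 8, 2), owner(3, 1, 8, 2))
```

Separating the two phases means the relative placement of arrays can be chosen once, independently of how many processors the template is later partitioned over.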
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66--80, August 1992.
....loops, we look for opportunities to overlap I/O with one or more loop bodies. Finally, while Mowry's approach has been successful for array accesses within inner loops, our high-level approach works with programs containing multiple loop nests. Distributed Memory. Hiranandani, Kennedy and Tseng [11] described a compilation method for data-parallel scientific programs written in Fortran D. Programmer-supplied directives for data alignment, decomposition, and distribution provide a guide for the compiler concerning the structure of communication and computation. The alignment statement ....
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66-80, August 1992.
....communication can perform optimizations across communication patterns at the software level, the protocol level, and the hardware level. It is more aggressive than traditional communication optimization techniques, which are either performed in the library [3, 8, 30, 26] or in the compiler [1, 2, 7, 12, 15, 28]. Compiled communication offers many advantages over the traditional communication method. First, by managing network resources at compile time, some runtime communication overheads such as group management can be eliminated. Second, compiled communication can use long-lived connections for ....
....messaging layer [3, 8, 30, 26]. CC-MPI is different from these systems in that it allows users to select the most effective method based on message sizes and network conditions. Many parallel compiler projects also try to improve communication performance by generating efficient communication code [1, 2, 7, 12, 15, 28]. The communication optimizations performed by these compilers focus on reducing the number and the volume of communications and are architecture independent. While CC-MPI is not directly related to compiler optimization techniques, compiler techniques developed, however, can enable effective ....
S. Hiranandani, K. Kennedy, and C. Tseng. Compiling Fortran D for MIMD Distributed--Memory Machines. Communications of the ACM, 35(8):66--80, August 1992.
....area of scientific computing, where shared arrays have to be distributed among the private memories of different machines. In particular, methods that extend [ISO/IEC 91] by array constructs and means to specify a decomposition of an array have been studied extensively [Callahan 88], [André 90], [Hiranandani 92], [Zima 90]. While these approaches promise immediate advances of the state of practice, it is also acknowledged that many algorithms described in a few pages of mathematics often result in large programs which make it difficult to experiment with simple variations of the algorithm, or ....
S. Hiranandani et al. Compiling Fortran-D for MIMD Distributed Memory Machines, Communications of the ACM, 35(8):66-80, August, 1992.
....review existing approaches to data layout for data-parallel applications and point out differences in our approach. Then we review the existing approaches to data consistency and contrast our approach against them. Data Layout. Several automatic data layout approaches exist for Fortran programs [6, 8, 9, 11, 12, 14, 18 20]. All these approaches use regular partitioning for multi-dimensional arrays. Moreover, most of the mapping strategies use static program analysis or cost estimations to decide how to place data on different processors. Our partitioning strategy is general and it subsumes regular partitioning. ....
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66-80, 1992.
....our approach against them. [Figure 7.9: The speedup of the parallel Poisson problem for the smaller set of data. Axes: number of processors, speedup; series: Tiny.] Data Layout. Several data mapping schemes exist for Fortran programs [18, 36, 52, 57, 58, 68, 104, 110, 115]. All these approaches have two aspects in common. One is that they use Fortran-like languages and thus partition multi-dimensional arrays. The other is the regular partitioning function: block, cyclic and block-cyclic. Our partitioning model is not tied to a particular data representation. ....
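The three regular partitioning functions named in this context (block, cyclic, block-cyclic) can be illustrated with a small sketch. The function names and the 0-based, one-dimensional indexing are assumptions for the example; real compilers apply the same idea per array dimension.

```python
def block_owner(i, n, p):
    """Block: n elements split into p contiguous chunks."""
    chunk = -(-n // p)  # ceiling division: elements per processor
    return i // chunk


def cyclic_owner(i, p):
    """Cyclic: elements dealt out round-robin, one at a time."""
    return i % p


def block_cyclic_owner(i, b, p):
    """Block-cyclic: chunks of b elements dealt out round-robin."""
    return (i // b) % p


n = 8
print([block_owner(i, n, 4) for i in range(n)])         # [0, 0, 1, 1, 2, 2, 3, 3]
print([cyclic_owner(i, 4) for i in range(n)])           # [0, 1, 2, 3, 0, 1, 2, 3]
print([block_cyclic_owner(i, 2, 2) for i in range(n)])  # [0, 0, 1, 1, 0, 0, 1, 1]
```

Block keeps neighbouring elements together, which favours stencil-style access; cyclic spreads them out, which tends to balance load when the work per element varies with the index.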
....for the distributed programming style, while retaining the scalability of the underlying distributed system. The existing approaches can be classified along two main axes: Fully automatic, or guided (through compiler directives) parallelizing (restructuring) compilers for Fortran languages [5,9,57,111]. Along this axis, an existing sequential program can be transformed to run in parallel. The great advantage of this approach is the possibility to parallelize legacy code without involving costly human resources. The main limitation resides in their restricted applicability. Such techniques are ....
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66--80, 1992.
....on Alewife. 2 Related Work The problem of loop and data partitioning for distributed memory multiprocessors with global address spaces has been studied by many researchers. One approach to the problem is to have programmers specify data partitions explicitly in the program, as in Fortran D [7, 12]. Loop partitions are usually determined by the owner computes rule. Though simple to implement, this requires the user to thoroughly understand the access patterns of the program, a task which is not trivial even for small programs. For real medium sized or large programs, the task is a very ....
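The owner-computes rule mentioned in this context can be sketched for a single loop. The loop body `A[i] = B[i-1] + B[i+1]`, the block distribution, and all names below are assumptions chosen for illustration; they are not taken from the cited papers.

```python
def block_owner(i, n, p):
    """Processor owning element i under a block distribution (assumed)."""
    chunk = -(-n // p)  # ceiling division
    return i // chunk


def owner_computes(n, p):
    """Partition the loop 'A[i] = B[i-1] + B[i+1]' for 1 <= i < n-1.

    Each iteration runs on the processor that owns A[i]. Returns the
    iterations assigned to each processor and the indices of B that
    each processor must fetch from another processor.
    """
    iters = {q: [] for q in range(p)}
    remote = {q: set() for q in range(p)}
    for i in range(1, n - 1):
        q = block_owner(i, n, p)          # owner of the left-hand side
        iters[q].append(i)
        for j in (i - 1, i + 1):          # right-hand-side references
            if block_owner(j, n, p) != q:
                remote[q].add(j)
    return iters, remote


iters, remote = owner_computes(8, 2)
print(iters)   # {0: [1, 2, 3], 1: [4, 5, 6]}
print(remote)  # {0: {4}, 1: {3}}
```

Only the block boundaries generate remote references here, which is why the choice of data partition, left to the programmer in this scheme, determines the communication cost.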
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD Distributed Memory Machines. Communications of the ACM, 35(8):66-80, August 1992.
....where each processor has a separate address space, e.g. CM-5 or Intel iPSC. It is usually assumed that the programmer specifies how data is distributed and the compiler tries to optimize communication by grouping references to remote data so the high cost of remote accesses can be amortized [5, 10, 11, 15, 14, 16, 17, 19, 21]. These methods only work well when the granularity of the computation is large and regular. Some recent work has looked at compilation for machines with a shared address space, physically distributed memory and globally coherent caches [3, 6]. In these machines, each processor controls a local ....
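Grouping remote references so that their cost is amortized can be sketched as below. This is a simplified model under stated assumptions: `group_remote_refs` and the `owner` callback are hypothetical names, and a real compiler would derive the index sets symbolically rather than enumerate them.

```python
from collections import defaultdict


def group_remote_refs(needed, owner):
    """Combine individual remote element requests into one message per
    owning processor. 'needed' is an iterable of remote indices and
    'owner' maps an index to the processor that holds it."""
    messages = defaultdict(list)
    for idx in sorted(set(needed)):   # drop duplicate requests
        messages[owner(idx)].append(idx)
    return dict(messages)


# Five element-wise requests collapse into two messages, assuming a
# block distribution with four elements per processor.
requests = [4, 5, 4, 9, 8]
print(group_remote_refs(requests, lambda i: i // 4))  # {1: [4, 5], 2: [8, 9]}
```

The per-message startup cost is paid once per owner instead of once per reference, which only pays off when enough references to the same owner can be collected, consistent with the remark above about large, regular computations.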
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD Distributed Memory Machines. Communications of the ACM, 35(8):66-80, August 1992.
....machines. Manufacturers and research laboratories, led by Digital and Rice University, decided in 1991 to shift part of the burden onto compilers by providing the programmer a uniform address space to allocate objects and a (mainly) implicit way to express parallelism. Numerous research projects [38, 47, 81] and a few commercial products had shown that this goal could be achieved and the High Performance Fortran Forum was set up to select the most useful functionalities and to standardize the syntax. The initial definition of the new language, HPF, was frozen in May 1993, and corrections were added ....
....point of view: The allocated memory is reusable (it may be allocated on the stack) and the copy assignment on local data should be quite fast. This compilation scheme directly generates optimized code which includes techniques such as guard elimination [38], message vectorization and aggregation [47, 81]. [non-independent: A in the rhs may induce RW dependences] FORALL (i=1:n, j=1:m, MASK(i,j)) A(i,j) = f(A, [array TMP declared and mapped as A] [initial copy of A into TMP because of potential RW dependences] INDEPENDENT(j,i) do j=1, m do i=1, n TMP(i,j) = A(i,j) enddo ....
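The role of the temporary copy in this context can be shown with a small sketch: a masked FORALL must read only the old values of the array, so the array is first copied and the right-hand side is evaluated against the copy. The function name `forall_masked` and the callback `f` are hypothetical; the snippet above elides the actual right-hand side.

```python
import copy


def forall_masked(a, mask, f):
    """Apply a[i][j] = f(tmp, i, j) wherever mask[i][j] is true.

    tmp is a snapshot of a taken before any assignment, so no update
    can observe a partially updated array (no read-after-write hazard).
    """
    tmp = copy.deepcopy(a)            # initial copy of A into TMP
    for j in range(len(a[0])):
        for i in range(len(a)):
            if mask[i][j]:
                a[i][j] = f(tmp, i, j)
    return a


def left_neighbour(t, i, j):
    """Example right-hand side: value of the element to the left."""
    return t[i][j - 1] if j > 0 else t[i][j]


print(forall_masked([[1, 2, 3]], [[True, True, True]], left_neighbour))
# [[1, 1, 2]]
```

Updating in place without the snapshot would propagate the first element across the row and give `[[1, 1, 1]]`, which is why the copy is needed whenever the assigned array also appears on the right-hand side.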
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD Distributed-Memory machines. Communications of the ACM, 35(8):66--80, August 1992.
....on Alewife. 2 Related Work The problem of loop and data partitioning for distributed memory multiprocessors with global address spaces has been studied by many researchers. One approach to the problem is to have programmers specify data partitions explicitly in the program, as in Fortran D [10, 15]. Loop partitions are usually determined by the owner computes rule. Though simple to implement, this requires the user to thoroughly understand the access patterns of the program, a task which is not trivial even for small programs. For real medium sized or large programs, the task is a very ....
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD Distributed Memory Machines. Communications of the ACM, 35(8):66-80, August 1992.
....the single data set delivered by the I/O system (1234 in Figure 1) is required in this order by the nodes. In this example, the data set is broken into individual blocks (1, 2, 3, 4). There are other distributions that are commonly accepted by compilers (e.g. cyclic and block-cyclic) [9], and these distributions induce different scatter operations. If internal application external, then the I/O node (or the I/O system) must gather data from multiple external links and forward these data over a single internal link to the requesting node. Figure 2 depicts this situation. ....
S. Hiranandani, K. Kennedy, and C. W. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66-80, August 1992.
....of compilers that can provide performance satisfactory to users. The goal of our research is to identify important compilation issues and explore possible solutions. Previous work has described the design and implementation of a prototype Fortran D compiler for regular dense matrix computations [13, 16, 17]. This paper describes our preliminary experiences with that compiler. Its major contributions include 1) advanced compilation techniques needed for complex loop nests, 2) empirical evaluation of the prototype Fortran D compiler, and 3) identifying necessary improvements for the compiler. In ....
....computation across processors using the owner-computes rule where each processor only computes values of data it owns [6, 12, 25]. It performs a large number of communication and parallelism optimizations based on data dependence. Details of the compilation process are presented elsewhere [13, 16, 17, 27]. 2.2 Prototype Compiler The prototype Fortran D compiler is implemented as a source-to-source Fortran translator in the context of the ParaScope parallel programming environment [9]. It utilizes existing tools for performing dependence analysis, program transformations, and interprocedural ....
S. Hiranandani, K. Kennedy, and C. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66-80, August 1992.
No context found.
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD Distributed-memory Machines. Commun. ACM, 35(8):66--80, 1992.
No context found.
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD Distributed-memory Machines. Commun. ACM, 35(8):66--80, 1992.
No context found.
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling fortran d for mimd distributed-memory machines. Commun. ACM, 35(8):66--80, 1992.
No context found.
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66--80, 1992.
No context found.
S. Hiranandani, K. Kennedy, and C-W. Tseng. Compiling FORTRAN-D for MIMD distributed-memory machines. Communications of the ACM, 35:66--80, August 1992.
No context found.
S. Hiranandani, K. Kennedy, and C. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66--80, August 1992.
No context found.
S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66--80, 1992.
No context found.
Hiranandani, S., Kennedy, K., and Tseng, C.-W. Compiling FORTRAN D for MIMD Distributed Memory Machines. Communications of the ACM 35, 8 (August 1992), 66-80. Available at http://www.cs.umd.edu/projects/cosmic/papers/cacm92.ps
No context found.
S. Hiranandani, K. Kennedy, and C. Tseng. Compiling Fortran D for MIMD Distributed--Memory Machines. Communications of the ACM, 35(8):66--80, August 1992.
No context found.
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66--80, August 1992.
No context found.
Seema Hiranandani, Ken Kennedy, and Chau-Wen Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66--80, August 1992.
No context found.
S. Hiranandani, K. Kennedy, and C. Tseng. Compiling Fortran D for MIMD Distributed Memory Machines. Communications of the ACM, 35(8):66--80, Aug. 1992.
No context found.
S. Hiranandani et al. Compiling Fortran-D for MIMD Distributed Memory Machines, Communications of the ACM, 35(8):66-80, August, 1992.