A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proc. of the 1989.
....offer significant advantages over the shared memory multiprocessors with regard to cost and scalability, however, they are also more difficult to program. Much of that difficulty is due to their lack of a single global address space. Hence, the last few years have seen considerable research effort [10, 15, 12, 20, 4, 17, 18] aimed at providing a shared name space to the programmer, with the task of generating messages relegated to the compiler. Most of these parallelization systems accept a program written in a sequential or shared memory language augmented with annotations specifying distribution of data, and ....
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proc. SIGPLAN '89 Conference on Programming Language Design and Implementation, pages 69--80, June 1989.
....program, but operate on distinct data items, thus enabling the exploitation of data parallelism [28]. These research efforts include the Fortran D compiler [30, 31] and the Superb compiler [81], both accepting Fortran 77 as the base language. The Crystal compiler [15] and the Id Nouveau compiler [62] are targeted for single assignment languages. Numerous other compilers, Dataparallel C [59], C [63], Kali [43, 44], Dino [64, 65], Al [77], Arf [67], Oxygen [66], Pandore [4], also produce parallel code for multicomputers, but require explicit parallelism in the source program. Some of the ....
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proc. SIGPLAN '89 Conference on Programming Language Design and Implementation, pages 69--80, June 1989.
....it has been possible to implement a comprehensive collection of advanced optimizations that are broadly applicable. Second, dHPF supports a more general computation partitioning model than other HPF compilers. With few exceptions, HPF compilers use simple variants of the owner computes rule [16] to partition computation among processors: partitioning of the instances for each statement (or loop iteration [3]) is based on a single (generally affine) reference. Unlike other compilers, dHPF permits independent computation partitionings for each program statement, where the partitioning for a ....
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proceedings of the SIGPLAN '89 Conference on Programming Language Design and Implementation, Portland, OR, June 1989.
....it has been possible to implement a comprehensive collection of advanced optimizations that are broadly applicable. Second, dHPF supports a more general computation partitioning model than other HPF compilers. With few exceptions, HPF compilers use simple variants of the owner computes rule [12] to partition computation among processors: partitioning each statement's instance (or alternatively, each loop iteration [13]) is based on a single (generally affine) reference. Unlike other compilers, dHPF permits independent computation partitionings for each program statement. The partitioning ....
A. Rogers and K. Pingali, "Process decomposition through locality of reference," in Proceedings of the SIGPLAN '89 Conference on Programming Language Design and Implementation, (Portland, OR), June 1989.
....where each processor has a separate address space, e.g. CM 5 or Intel iPSC. It is usually assumed that the programmer specifies how data is distributed and the compiler tries to optimize communication by grouping references to remote data so the high cost of remote accesses can be amortized [5, 10, 11, 15, 14, 16, 17, 19, 21]. These methods only work well when the granularity of the computation is large and regular. Some recent work has looked at compilation for machines with a shared address space, physically distributed memory and globally coherent caches [3, 6]. In these machines, each processor controls a local ....
Anne Rogers and Keshav Pingali. Process Decomposition through Locality of Reference. In SIGPLAN '89, Conference on Programming Language Design and Implementation, June 1989.
.... Techniques to generate distributed code from sequential or parallel code using a uniform memory space have been extensively studied since 1988 [22, 70, 89]. Techniques and prototypes have been developed based on Fortran [38, 39, 47, 18, 69, 88, 19, 20], C [8, 63, 6, 60, 7, 61] or other languages [74, 75, 58, 66, 57]. The most obvious, most general and safest technique is called run time resolution [22, 70, 74]. Each instruction is guarded by a condition which is only true for processors that must execute it. Each memory address is checked before it is referenced to decide whether the address is local and ....
.... been extensively studied since 1988 [22, 70, 89]. Techniques and prototypes have been developed based on Fortran [38, 39, 47, 18, 69, 88, 19, 20], C [8, 63, 6, 60, 7, 61] or other languages [74, 75, 58, 66, 57]. The most obvious, most general and safest technique is called run time resolution [22, 70, 74]. Each instruction is guarded by a condition which is only true for processors that must execute it. Each memory address is checked before it is referenced to decide whether the address is local and the reference is executed, whether it is remote, and a receive is executed, or whether it is ....
Anne Rogers and Keshav Pingali. Process decomposition through locality of reference. In ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 1989.
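The run time resolution scheme described in the excerpts above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the method of any cited compiler: the statement `a[i] = b[i + 1]`, the block distribution, and the names `owner`, `run_time_resolution`, and `mailbox` are all hypothetical choices made for illustration.

```python
from collections import deque


def owner(i, n, nprocs):
    """Hypothetical block distribution: processor owning global index i."""
    block = -(-n // nprocs)  # ceil(n / nprocs)
    return i // block


def run_time_resolution(b, nprocs):
    """Simulate a[i] = b[i + 1] with every statement instance guarded.

    Every simulated processor sees every statement instance and decides
    at run time whether it must send, receive, compute, or do nothing.
    """
    n = len(b)
    # Each processor holds only the elements of b that it owns.
    local_b = [
        {i: b[i] for i in range(n) if owner(i, n, nprocs) == p}
        for p in range(nprocs)
    ]
    local_a = [dict() for _ in range(nprocs)]
    mailbox = [deque() for _ in range(nprocs)]  # one inbox per processor
    messages = 0

    for i in range(n - 1):
        # Send phase: the owner of the operand forwards it if the
        # target element lives on a different processor.
        for p in range(nprocs):
            src, dst = owner(i + 1, n, nprocs), owner(i, n, nprocs)
            if p == src and p != dst:
                mailbox[dst].append(local_b[p][i + 1])
                messages += 1
        # Compute phase: only the owner of a[i] executes the assignment,
        # reading locally or receiving the remote value.
        for p in range(nprocs):
            src, dst = owner(i + 1, n, nprocs), owner(i, n, nprocs)
            if p == dst:
                value = local_b[p][i + 1] if p == src else mailbox[p].popleft()
                local_a[p][i] = value

    return local_a, messages
```

In this sketch the ownership tests run for every statement instance on every processor, which is the overhead that compile time resolution, mentioned in other excerpts on this page, aims to remove.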
....program annotations, incorporating explicit data decompositions. From these data decomposition specifications, SPMD (Single Process Multiple Data) code [Karp87] can be generated automatically. This approach is followed by [Callahan88, Gerndt89, Kennedy89, Koelbel89] in FORTRAN, by [Rogers89] in Id Nouveau, and by [Quinn89] in C . This concept is also followed in Booster [Paalvast90]. 3. Booster Language concepts. Booster is a high level, fourth generation, algorithm description language for sequential and parallel computers. Parallel computers may be either distributed or shared ....
A. Rogers, K. Pingali, "Process decomposition through locality of reference," ACM Sigplan '89 Conference on Programming Language Design and Implementation, June 1989, Portland Oregon.
....Furthermore, the compiler uses a model base of target architectures in order to optimize computation and communication efficiency. The approach of inducing parallelism by explicitly decomposing the data is not new. In [Callahan88, Gerndt89, Kennedy89] applications to Fortran are described, in [Rogers89] to Id Nouveau, in [Koelbel87] to BLAZE, and in [Quinn89] to C . In particular application to Fortran is limited, because of equivalencing, passing of array subsections to subroutine calls, and any form of indirect addressing cannot be translated efficiently. A second limitation is that the ....
A. Rogers, K. Pingali, "Process Decomposition Through Locality of Reference," ACM SIGPLAN '89 Conference on Programming Language Design and Implementation, June 1989, Portland, Oregon.
....CHALLENGES Most traditional HPF compilers use simple partitioning schemes for data and computation. The HPF directives determine the data partition, and the computation partition (CP), the assignment of computations to processors, is driven by some form of the owner computes rule heuristic [13]. Although some serial codes can be parallelized using this approach, others require more sophisticated approaches if HPF is to approach the performance of hand coded parallel implementations. In this section, we describe several aspects of programs that make it difficult for HPF compilers to ....
....NAS SP and BT application benchmarks with that of their hand parallelized counterparts to illustrate the effectiveness of the analysis and code generation techniques described in Sections 3 and 4. 3. COMPUTATION PARTITIONING (CP) OPTIMIZATIONS HPF compilers primarily use the owner computes rule [13] to partition computations among a set of processors. This rule specifies that a computation is executed by the owner of the value being computed. This rule, as well as variants (e.g. in decHPF [19]) or more powerful rules (e.g. in SUIF [20]), can be expressed in terms of which processor(s) own ....
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proceedings of the SIGPLAN '89 Conference on Programming Language Design and Implementation, Portland, OR, June 1989.
....per processor details and explicit communication. Nevertheless, they direct the compiler's parallelization via annotations. HPF lacks annotations to identify wavefront computations, so the compiler is solely responsible for recognizing and optimizing them from their scalar representations [HKT91, RP89]. We consider the Portland Group, Inc. PGHPF and IBM xlHPF compilers separately, below. PGHPF The HPF compiler from Portland Group, Inc. (PGHPF) does not perform pipelining. We determine this by examining the intermediate message passing Fortran code produced by the Mftn compiler flag. The ....
....support for pipelining wavefront computations [Ngo97]. This work validates this insight in the context of a programming language with a well defined performance model. It is well known that wavefront codes admit pipelined parallel implementations. Cytron [Cyt86] and Rogers and Pingali [RP89] describe early experiences doing just this, and many others have further developed the technique [HKT91, HKT92, SY91, CTY94, AWMC 95, BL99], in particular how a compiler can automatically pipeline scalar code (e.g. in the context of HPF). In addition, considerable applied and ....
Anne Rogers and Keshav Pingali. Process decomposition through locality of reference. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 69--80, June 1989.
....processor on which to locate a data item, and managing the communication of the data when accessed by processors other than the one on which it is located. Most compilers for distributed memory machines require the programmer to determine how the data will be assigned to the processors [ZBG88, RP89, KMV90]. The compiler is responsible for adding the message passing to move the data to processors that use it and from processors that define it. The compiler also adds the necessary synchronization to guarantee that data dependences are preserved. Distributed memory compilers typically assume ....
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proceedings of the SIGPLAN 89 Conference on Program Language Design and Implementation, June 1989.
....aspect. Instead, the implementations are compared on the processor overhead of constructing the messages. 3.4 Related work The automatic generation of message passing programs from data distribution specifications has been explored for some time in the context of various data parallel languages [7, 8, 9, 10, 11, 12]. The recent definition of HPF [1] has added some new data alignment and data distribution features for which no efficient solutions existed. As a consequence, new results have been reported in [13, 6, 14, 15, 16, 17, 18, 19, 20, 21] and, more recently and concurrent with this paper, [22, 23, 20, ....
A. Rogers and K. Pingali, "Process decomposition through locality of reference", in Proceedings of ACM SIGPLAN International Conference on Program Language Design and Implementation, June 1989.
....how the work in the program is assigned to processors. Typically, the programmer specifies a mapping of the program's data onto the target machine and the compiler uses this mapping to decompose the program into processes. The simplest compilation strategy, sometimes called runtime resolution [RP89], inserts code to determine at runtime which processor needs to execute a particular statement. Runtime resolution works because arrays are static in nature, that is, names are available for all elements of an array at compile time. To determine the processor responsible for a given array ....
Anne Rogers and Keshav Pingali. Process decomposition through locality of reference. In Proc. of the SIGPLAN '89 Conf. on Programming Language Design and Implementation, pages 69-80, Portland, Ore., Jun. 1989.
....tackle the data distribution problem. These systems do not directly specify alignments between arrays. Instead, they distribute each array individually and implicitly derive the alignment between different arrays based on their relative distributions. Systems such as Dino [RSW90], Id Nouveau [RP89], Mimdizer [SWW91], Oxygen [RA90] and Pandore [APT90] all provide data distribution specifications equivalent to some combination of BLOCK and CYCLIC. Dino also supports special stencil based data distributions with overlaps. The Aspar [IFKF90] compiler performs automatic data decomposition and ....
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proceedings of the SIGPLAN '89 Conference on Program Language Design and Implementation, Portland, OR, June 1989.
....and development, as well as education. It always emphasizes the bigger picture, without getting lost in fine points. Automatic generation of message passing programs from data distribution specifications has been explored for some time in the context of various data parallel languages [10], [11], [12], [13] and [14]. In [13] the support of the run time functions are relatively weak; the compiler needs to generate send and receive primitives to accomplish communication. Though this may have more efficient code generation after extensive program analysis, the compiler may become too ....
A. Rogers and K. Pingali, "Process Decomposition Through Locality of Reference", Proc. ACM SIGPLAN Intl Conf. Program language Design and Implementation, pp. 69-80, June 1989
....expressions, and conditional expressions are affine functions of enclosing loop indices and the number of processor is known beforehand. Under these conditions, a loop nest, an array and a processor grid can all be represented as bounded polyhedra. Our approach uses the owner computes rule [31, 57, 46], which assigns each computation to the processor that owns the data being computed. The technique presented here can be extended to handle cases where this rule is not adopted. Also, although we assume that data distributions across processors are specified using HPF like directives, our ....
A. Rogers, and K. Pingali. Process decomposition through locality of reference. In Proc. SIGPLAN'89 Conference on Programming Language Design and Implementation, pages 69--80, Portland, OR, June 1989.
....communication at run time. In addition, the compiler can analyze the program and perform transformations which will increase parallelism and reduce interprocessor communication [18]. The compiler can also attempt compile time resolution to reduce the overhead due to run time resolution [24]. Previous work relating to the SPMD execution of programs on distributed memory machines has only considered static arrays [3, 24]. However, language and compiler support for the SPMD execution of programs that rely upon the use of pointer based data structures built using pointers has not ....
.... parallelism and reduce interprocessor communication [18]. The compiler can also attempt compile time resolution to reduce the overhead due to run time resolution [24]. Previous work relating to the SPMD execution of programs on distributed memory machines has only considered static arrays [3, 24]. However, language and compiler support for the SPMD execution of programs that rely upon the use of pointer based data structures built using pointers has not yet been developed. Dynamic data structures, such as trees and linked lists, are often used in programs to store information that has ....
Rogers, A., and Pingali, K. Process decomposition through locality of reference. Proc. ACM SIGPLAN Conference on Programming language Design and Implementation. Portland, Oregon, 1989, pp. 69-80.
No context found.
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proc. of the 1989.
No context found.
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proc. of the 1989.
No context found.
A. Rogers and K. Pingali. Process decomposition through locality of reference. In SIGPLAN89 conference on Programming Languages, Design and Implementation, Jun 1989.
No context found.
Anne Rogers and Keshav Pingali. Process decomposition through locality of reference. In Proceedings of the SIGPLAN '89 Conference on Programming Language Design and Implementation, pages 69--80, Portland, Oregon, June 21--23, 1989. Published as ACM SIGPLAN Notices 24(7).
....X(IBLACK(I) X(IRED(J) END DO END DO In HPF, the assignment of computational work to processors is not directly under the control of the programmer. Instead, it relies on a combination of data distribution directives and compiler technology to produce code with good locality, as described in [6, 37]. The two basic distributions are block and cyclic distributions. Block distributing an array gives each processor a set of contiguous elements of that array; if there are p processors and n array elements, each processor gets a contiguous block of n/p elements. In a cyclic distribution, ....
....in which blocks of elements are dealt to processors in a round robin manner. The compiler can exploit data distributions in assigning work by assigning an iteration to a processor if that processor has most of the data required by that iteration. Alternative strategies like the owner computes rule [37] are also popular. An HPF program for computing $\pi$ is shown below. It approximates the definite integral $\int_0^1 \frac{4}{1+x^2}\,dx$ by using the rectangle rule, computing the value of $\frac{1}{n}\sum_{i=1}^{n} \frac{4}{1+\left(\frac{i-0.5}{n}\right)^2}$. In this program, n is chosen to be 1000. SUM is an intrinsic for ....
A. Rogers and K. Pingali. Process decomposition through locality of reference. In Proceedings of the ACM Symposium on Programming Language Design and Implementation, 1989.
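The block and cyclic distributions and the rectangle rule sum mentioned in the excerpts above can be sketched in a few lines. This is an illustrative Python stand-in, not the HPF program the citing paper shows: the helper names `block_owner`, `cyclic_owner`, and `pi_rectangle` are hypothetical, and the per-processor partial sums only mimic how iterations might be assigned to owners.

```python
import math


def block_owner(i, n, p):
    """Block distribution: contiguous chunks of ceil(n / p) elements."""
    return i // math.ceil(n / p)


def cyclic_owner(i, p):
    """Cyclic distribution: elements dealt round robin to p processors."""
    return i % p


def pi_rectangle(n, p, owner):
    """Rectangle (midpoint) rule for the integral of 4 / (1 + x^2) on [0, 1].

    Each term is credited to the simulated processor that owns
    iteration i - 1 (0-based); the partial sums are combined at the end.
    """
    partial = [0.0] * p
    for i in range(1, n + 1):
        x = (i - 0.5) / n
        partial[owner(i - 1)] += 4.0 / (1.0 + x * x)
    return sum(partial) / n


if __name__ == "__main__":
    n, p = 1000, 4  # n = 1000 as in the excerpt; p = 4 is an assumption
    print(pi_rectangle(n, p, lambda i: block_owner(i, n, p)))
    print(pi_rectangle(n, p, lambda i: cyclic_owner(i, p)))
```

Both distributions give the same result up to floating-point rounding; only the assignment of iterations to processors differs.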
No context found.
Rogers, A., and Pingali, K. Process decomposition through locality of reference. In Proceedings of the SIGPLAN '89 conference on Programming Language Design and Implementation (Portland, OR, June 1989), pp. 69-80.
No context found.
A. Rogers and K. Pingali, "Process decomposition through locality of reference," in Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation, Portland, OR, June 1989, pp. 69--80.
No context found.
A. ROGERS AND K. PINGALI, Process decomposition through locality of reference, in Proceedings of the SIGPLAN '89 Conference on Program Language Design and Implementation, Portland, OR, June 1989.