C. Koelbel, P. Mehrotra, and J. V. Rosendale. Supporting Shared Data Structures on Distributed Memory Architectures. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 177--186, March 1990.


This paper is cited in the following contexts:


Memory Assignment for Multiprocessor Caches through.. - Agarwal, Guttag..   (Correct)

....a distributed memory system in a way that maximizes the number of each processor's accesses that are satisfied by the tiles in its local memory is called the data partitioning problem or the domain decomposition problem. This problem is nontrivial and has been the focus of much recent attention [2, 3, 6, 14, 17]. Our work applies to systems that perform data partitioning either at compile time or program creation time and to systems where we know which data tiles are accessed by which processors. An important problem faced by systems that perform data partitioning irrespective of whether they perform ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting Shared Data Structures on Distributed Memory Architectures. In Proceedings of Principles and Practice of Parallel Programming II, ACM, March 1990.


Compilation Techniques for Optimizing Communication on.. - Gong, Gupta, Melhem (1993)   (15 citations)  (Correct)

....to program such an architecture. The programmer must explicitly distribute data and must program the transfer of data among the processors. One approach considered by researchers provides a combination of language extensions and compilation techniques for programming such systems [1] [9] [10] [15]. In this approach, the user specifies the distribution of data by using language extensions and the compiler translates the program for single program multiple data (SPMD) execution. One approach to translation yields a program in which a processor executes a statement only if a data value ....

....under which we can introduce limited amount of extra communication. 6 Related Work. Many approaches have been proposed to optimize the communication in SPMD execution. These include: Code reordering, Loop interchange, Loop reversal, Loop elimination and Communication combining [1] [6] [4] [10]. The purpose of these approaches is to extract as much parallelism as possible. Instead of inserting the communication at the earliest possible point, these approaches usually insert communication immediately before the computation. Moreover, all of these approaches assume that the send and ....

C. Koelbel, P. Mehrotra, and J. V. Rosendale, "Supporting Shared Data Structures on Distributed Memory Architectures," Proceedings of the Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, 1990.


Automatic Parallel Program Generation and Optimization.. - Paalvast, Sips, van.. (1991)   (13 citations)  (Correct)

....in SPMD (Single Program Multiple Data) [Karp87] format, given a data decomposition specification. This approach has recently gained a lot of attention. It has been applied by [Callahan88, Gerndt89, Kennedy89,90] for applications to Fortran, by [Andre90] to C, by [Rogers89] to Id Nouveau, by [Koelbel90] to Kali Fortran, by [Quinn89] to C , and by [Paalvast90] to the fourth generation parallel programming language Booster. In particular application to Fortran shows some limitations, due to equivalencing, passing of array subsections to subroutine calls, etc. A second limitation is that the ....

....and integrated. Several optimizations have been derived for important classes of data decompositions and access functions which are generally applicable. Several optimizations discussed in this paper have also been presented by others, albeit in the context of some programming language [Callahan88, Gerndt89, Koelbel90, Paalvast90] . We have generalized these results and placed them in a general context. Advantage of the methodology lies in application of the view concept, which through its associated calculus, allows for automated reasoning about compile time optimizations and code generation for a broad range of ....

C. Koelbel, P. Mehrotra, J. Van Rosendale, "Supporting Shared Data Structures on Distributed Memory Architectures," Proceedings of the Conference on Principles and Practice of Parallel Programming, March 1990, pp. 177-186.


An Overview of a Compiler for Scalable Parallel Machines - Saman Amarasinghe Jennifer (1993)   (46 citations)  (Correct)

....will trade off extra degrees of parallelism to reduce or eliminate communication. Finally, the compiler generates code to manage the multiple address spaces and to communicate data across processors. 1 Introduction. A number of compiler systems, such as SUPERB[21], AL[17], ID Nouveau[15], Kali[11], Vienna FORTRAN[3], FORTRAN D[8, 16] and HPF[7] have been developed to make effective use of scalable parallel machines. These systems simplify the compilation problem by soliciting the programmer's help in determining This research was supported in part by DARPA contracts N00039 91 C 0138 and ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting shared data structures on distributed memory architectures. In Proceedings of the Second ACM/SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 177--186, March 1990.


Managing Interprocedural Optimization - Hall (1990)   (41 citations)  (Correct)

....on which to locate a data item, and managing the communication of the data when accessed by processors other than the one on which it is located. Most compilers for distributed memory machines require the programmer to determine how the data will be assigned to the processors [ZBG88] [RP89] [KMV90]. The compiler is responsible for adding the message passing to move the data to processors that use it and from processors that define it. The compiler also adds the necessary synchronization to guarantee that data dependences are preserved. Distributed memory compilers typically assume that ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting shared data structures on distributed memory machines. In Proceedings of the Second SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, March 1990.


Global Optimizations for Parallelism and Locality on Scalable.. - Anderson, Lam (1993)   (219 citations)  (Correct)

....granularity tasks if the communication cost overwhelms the benefit of parallelization. A popular approach to this complex optimization problem is to solicit the programmer's help in determining the data decompositions. Projects using this approach include SUPERB[40], AL[34], ID Nouveau[31], Kali[22], Vienna Fortran[7] and Fortran D[14, 33]. The current proposal for a High Performance Fortran extension to Fortran 90 also relies upon user specified data decompositions[13]. While these languages provide significant benefit to the programmer by eliminating the tedious job of managing the ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting shared data structures on distributed memory architectures. In Proceedings of the Second ACM/SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 177--186, March 1990.


SPMD Execution of Programs with Pointer-based Dynamic Data.. - Gupta   (Correct)

....processors, generates code responsible for run time resolution, and also generates code for carrying out interprocessor communication at run time. In addition, the compiler can analyze the program and perform transformations which will increase parallelism and reduce interprocessor communication [18]. The compiler can also attempt compile time resolution to reduce the overhead due to run time resolution [24]. Previous work relating to the SPMD execution of programs on distributed memory machines has only considered static arrays [3, 24]. However, language and compiler support for the SPMD ....

....developed by Callahan and Kennedy [3] is being used for other features. Although the SPMD execution approach exploits implicit parallelism present in sequential programs, it can benefit from information regarding parallelism present in a program. The inclusion of parallel looping constructs [18] in the language and automatic parallelism detection techniques [12, 20, 6, 16] in the compiler can further enhance performance. The information regarding parallelism in a program can be used by the compiler to determine a data distribution appropriate for parallel execution. However, it should be ....

Koelbel, C., Mehrotra, P., and Rosendale, J.V. Supporting shared data structures on distributed memory architectures. Proc. Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Seattle, Washington, 1990, pp. 177-186.


Protocol Compilation: High-Performance Communication for Parallel .. - Felten (1993)   (25 citations)  (Correct)

....or have their nondeterminism carefully encapsulated in a few constructs. Again, this eases debugging. The attractiveness of the data parallel model has led to the development of many data parallel languages. Early data parallel languages include Dino [Rosing et al. 90, Rosing 91], Kali [Koelbel et al. 90] and C* [Rose & Steele 87]. The current generation of data parallel languages is more mature, and offers a richer set of directives to control data distribution [TMC 89, TMC 90, Fox et al. 91, Chapman et al. 92]. A consortium of researchers and vendors is now developing a standardized language ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting shared data structures on distributed memory architectures. In Proceedings of Second SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 177--186, March 1990.


Automatic Localization for Distributed-Memory.. - Vivek Sarkar Lelia   (Correct)

....onto physical processors) and interprocessor communication (messages that need to be sent to resolve accesses to variables that reside in another processor's local memory). The predominant approach in compiling a program for execution on a distributed memory multiprocessor is as follows [33, 22, 5, 17, 18, 24, 15]: 1. Data Alignment: select a data distribution for each global variable onto a decomposition template, i.e. onto an abstract index domain [5, 21, 12, 15]. 2. Data Partitioning: partition the alignment template elements among physical processors. Data Alignment and Data Partitioning ....

C. Koelbel and P. Mehrotra. Supporting shared data structures on distributed memory architectures. In Proceedings of the Second ACM SIGPLAN Symposium on Principles of Parallel Programming, March 1990.


Compiling for Distributed Memory Architectures - Rogers, Pingali (1992)   (11 citations)  (Correct)

.... Figure 21: SIMPLE Speed up. 11 Related Work. Several other groups [5, 6, 14, 21, 25, 26] have concurrently developed compilers that use code generation methods similar to our run time and compile time resolution. We briefly describe the different ways these groups have built upon the basic code generation schemes. Rice: A group led by Ken Kennedy [5] at Rice University is considering ....

....scheme but at the expense of transmitting data that may not be needed. The overlap areas as defined do not extend nicely to wrapped mappings. For instance, the overlap area for an array used in the wrapped version of Gauss Seidel would be the whole array. Kali: Koelbel, Mehrotra and van Rosendale [12, 13, 14] have developed Kali, a system that compiles a functional language with a forall construct into a language that includes constructs for explicit process creation, data storage layout, and interprocessor communication and synchronization. Recently, this group has been focusing on the problem of ....

C. Koelbel, P. Mehrotra, and J. van Rosendale. Supporting shared data structures on distributed memory architectures. In Proceedings of the Second ACM SIGPLAN Symposium on the Principles and Practice of Parallel Programming, 1990.


Prefetching on the Cray-T3E - Müller, Warschko, Tichy (1998)   (Correct)

....were omitted because the shared memory library does not support irregular communication patterns and vector prefetches are not possible. SCAP and HPF behave 4.5 and 1.8 times better than Blocking, respectively. The poor performance of HPF for small virtualizations is due to inspector executor [7] which adds proportionally more runtime to small virtualizations as for larger ones. Approximation of communication time decrease is in a range of 6 to the measurements (apart from virtualizations 64) 5.3.2 Jacobi Figure 17 presents runtime and program speed up of Jacobi. For large problem ....

Koelbel, C., and Mehrotra, P. Supporting shared data structures on distributed memory architectures. In Proc. of the 2nd ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, PPOPP (Mar. 1990), pp. 177--186.


On The Implementation And Effectiveness Of Autoscheduling For.. - Moreira (1995)   (16 citations)  (Correct)

....dependent mapping Implementation Figure 3.4 HPF approach to data partition and distribution. states that iteration i is to be executed by the processor to which A(i) is assigned. Therefore processor p1 executes iterations {1, 2, 3, 4}. The ON clause is a feature borrowed from the language Kali [25]. 3.1.3 HPF. The High Performance Fortran (HPF) [6, 26, 27] language was designed as a set of extensions and modifications to Fortran 90 to support data parallel programming. The ability to achieve top performance on MIMD and SIMD computers with nonuniform memory access was one of the main goals ....

C. Koelbel, P. Mehrotra, and J. V. Rosendale, "Supporting shared data structures on distributed memory machines," in Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (Seattle, WA), March 1990.
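
The ON clause described in the context above places each iteration on the processor that owns a named array element. The following is a minimal Python sketch of that rule, not code from Kali or HPF. It assumes a block distribution of a 16-element array over 4 processors (sizes chosen only so that the first processor receives iterations 1 to 4, as in the quoted text), and the helper names `block_owner` and `iterations_on` are hypothetical.

```python
def block_owner(i, n, nprocs):
    """Processor (1-based) owning element i (1-based) under an assumed block distribution."""
    block = (n + nprocs - 1) // nprocs  # elements per processor, rounded up
    return (i - 1) // block + 1


def iterations_on(p, n, nprocs):
    """Iterations i that a clause of the form 'ON A(i)' would place on processor p."""
    return [i for i in range(1, n + 1) if block_owner(i, n, nprocs) == p]


# Illustrative sizes only; the quoted text does not give the array extent.
n, nprocs = 16, 4
p1_iters = iterations_on(1, n, nprocs)
```

With a different distribution (for example a cyclic one) only `block_owner` would change; the rule that an iteration runs where its named element lives stays the same.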


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)  (Correct)

....PEs is not necessarily known at compilation, but is fixed throughout any given execution [378]. A set of intrinsic functions are provided to inquire about the number of PEs and their arrangement. These intrinsics can also be used directly in array declarations. Similar assumptions are made in Kali [318], which also compiles a global address space program for execution on a distributed memory system, and in many message passing systems, including EUI [39], p4 [94] and PARMACS [95]. While making the above mentioned guarantee is a commendable gesture, some systems do very little in terms of ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale, "Supporting shared data structures on distributed memory architectures". In 2nd Symp. Principles & Practice of Parallel Programming, pp. 177--186, Mar 1990.


The Parallel Implementation of N-body Algorithms - Liu (1994)   (6 citations)  (Correct)

....including Fortran D [19], CM Fortran [48], Vienna Fortran [17] support parallel operations on uniform parallel arrays. However, none of them supports parallel operations on irregular data structures. Some runtime systems do support run time pointer (or index) interpretation for remote data access [15, 18, 30, 36], but these approaches take advantage of static data access and communication patterns, and do not provide efficient solutions for dynamically changing data distribution and communication patterns. This thesis is inspired by the work reported in Salmon's thesis [42] as well as the papers of Warren ....

C. Koelbel, P. Mehrotra, and J. V. Rosendale. Supporting Shared Data Structures On Distributed Memory Architectures. Technical report, ICASE, NASA Langley Research Center, 1990.


Handling Irregular Problems with Fortran D - A Preliminary Report - von Hanxleden (1993)   (32 citations)  (Correct)

....compile time knowledge. First, an inspection phase determines what data have to be communicated within a certain loop and generates a communication schedule. Then, an execution phase (repeatedly) performs the actual computation and uses the communication schedule to exchange data [MSS 88, KMV90]. The Parti communication library was designed to simplify schedule generation and schedule based communication [BS90]. Parti has also been used to implement user defined irregular distributions [MSS 88] and to provide a hashed cache for non local values [MSMB90]. The feasibility of ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting shared data structures on distributed memory machines. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seattle, WA, March 1990.


Compiling Distribution Directives in a Fortran 90D Compiler - Zeki Bozkus (1992)   (3 citations)  (Correct)

....The compilation technique of Fortran 77 for distributed memory systems has been addressed by Callahan and Kennedy [23]. Currently, a Fortran 77D compiler is being developed at Rice [24]. Superb[25] compiles a Fortran 77 program into a semantically equivalent parallel SUPRENUM multiprocessor. Kali[26] implementation puts a great deal of effort on run time analysis for optimizing message passing. Quinn et al. [18] uses a data parallel approach for compiling C to hypercube machines. The ADAPT system[27] compiles Fortran 90 for execution on MIMD distributed memory architectures. The ....

C. Koelbel, P. Mehrotra, and J. V. Rosendale. Supporting Shared Data Structures on Distributed Memory Architectures. PPoPP, March 1990.


Compiler Support for Analysis and Tuning Data Parallel.. - Adve, Koelbel, Crummey (1995)   (2 citations)  Self-citation (Koelbel)   (Correct)

....do not require non local values and also do not have to wait for the boundary columns in the same loop to be computed (there are no loop-carried dependences in each of the red and black loops). This optimization has been implemented for the restricted case of FORALL statements in the Kali compiler [16] and previously suggested in [10]. The reduction in waiting time for message receives in the case of RED BLACK SOR is small but significant since it reduces overhead idle time by nearly half, as shown by the measurements of the send and receive overhead given in Table 6. Exactly the same ....

C. KOELBEL, P. MEHROTRA, AND J. VAN ROSENDALE, Supporting shared data structures on distributed memory machines, in Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seattle, WA, Mar. 1990.


Fortran D Language Specification - Fox, Hiranandani, Kennedy, Koelbel.. (1991)   (61 citations)  Self-citation (Koelbel)   (Correct)

....are passed to the REDUCE statements for to find the index of the minimum or maximum element of the array. If there are multiple elements with the minimum or maximum value, the assignment is performed only for the first such value found. 5.5 On Clause. Fortran D provides a feature from Kali [KMV90], an optional on clause. The on clause is used to specify the processor which will execute each iteration of a loop. This allows user greater control of where the computation is performed for load balancing and reducing communications. n proc = 4 REAL X(1024) Y(1024) Z(1024) DECOMPOSITION ....

....of them have explored the problem of specifying data decompositions, and we have drawn upon their work. In particular, we have been influenced by alignment specifications and reduction functions from CM Fortran [TMC89] and structures to handle irregular distributions from Parti [WSBH91] and Kali [KMV90, MV90]. Here we quickly describe other research in the area. 6.1 Single Array Decomposition. Some researchers concentrate on computations within loops that only involve a single array. These researchers do not need alignment or distribution specifications, since they automatically generate the ....

[Article contains additional citation context not shown here]

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting shared data structures on distributed memory machines. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seattle, WA, March 1990.


Compiler Support for Machine-Independent Parallelization of.. - von Hanxleden (1994)   (7 citations)  Self-citation (Koelbel)   (Correct)

....for conveying information to the compiler at a high level is a part of this dissertation. 1.2.1 The inspector executor paradigm. An important concept associated with communication optimization for applications using irregular array subscripts is the inspector executor paradigm [MSS 88, KMV90, KMSB90, WSBH91]. A loop that contains indirect accesses to a distributed array is processed in four steps: 1. The inspector runs through the loop and only records which array elements are accessed, without doing the actual computation. Communication schedules are computed that satisfy the ....

....circumstances and recently has been extended to general patterns of control flow [DSvH93, Das94]. 1.2.2 Compilation systems for irregular problems. Projects that have aimed at least to some degree towards compiler support for parallelizing irregular problems are the following. Kali: Kali [KMV90, MV90, KM91] is the first compiler system that supports both regular and irregular computations on MIMD distributed memory machines. Programs written for Kali must specify a virtual processor array and assign distributed arrays to BLOCK, CYCLIC, or user specified decompositions. Instead of ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting shared data structures on distributed memory machines. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seattle, WA, March 1990.


Compiler Analysis for Irregular Problems in Fortran D - von Hanxleden, Kennedy.. (1992)   Self-citation (Koelbel)   (Correct)

....depends on the input data, typically because of some indirection in the code. In this case, it is not possible to predict at compile time which data must be prefetched. We treat this lack of information by transforming the original parallel loop into two constructs called inspector and executor [9, 10]. During program execution, the inspector examines the data references made by a processor, and calculates which off processor data need to be fetched and where these data will be stored once they are received. The executor loop then uses the information from the inspector to perform the actual ....

C. Koelbel, P. Mehrotra, and J. Van Rosendale. Supporting shared data structures on distributed memory machines. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seattle, WA, March 1990.
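
The inspector and executor split quoted in the context above can be sketched as follows. This is a minimal single-process Python illustration under stated assumptions, not code from Kali or Fortran D: the block distribution, the names `owner`, `inspector`, and `executor`, and the dictionary standing in for message passing are all hypothetical choices made for the example.

```python
def owner(j, n, nprocs):
    """Owner (0-based) of global index j under an assumed block distribution."""
    block = (n + nprocs - 1) // nprocs
    return j // block


def inspector(my_rank, my_iters, idx, n, nprocs):
    """Record which off-processor elements this processor will read.

    Returns a schedule mapping each remote owner to the sorted global
    indices to fetch from it. No actual computation is performed here.
    """
    schedule = {}
    for i in my_iters:
        j = idx[i]
        p = owner(j, n, nprocs)
        if p != my_rank:
            schedule.setdefault(p, set()).add(j)
    return {p: sorted(js) for p, js in schedule.items()}


def executor(my_iters, idx, x_global, schedule):
    """Fetch the scheduled values, then run the actual loop body."""
    # Stand-in for message passing: copy the remote values named by the schedule.
    ghost = {j: x_global[j] for js in schedule.values() for j in js}
    result = {}
    for i in my_iters:
        j = idx[i]
        # Local reads use x_global directly only because this sketch is one process.
        result[i] = ghost[j] if j in ghost else x_global[j]
    return result


n, nprocs = 8, 2
x = [10 * k for k in range(n)]
idx = [7, 0, 1, 5, 4, 2, 6, 3]      # indirection array, known only at run time
rank0_iters = range(0, 4)           # processor 0 is assumed to own iterations 0..3
sched = inspector(0, rank0_iters, idx, n, nprocs)
y = executor(rank0_iters, idx, x, sched)
```

As long as `idx` does not change, the schedule returned by `inspector` can be reused across repeated calls to `executor`, which is where the approach amortizes the inspection cost.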


Emulation of a Virtual Shared Memory Architecture - Raina (1993)   (3 citations)  (Correct)

No context found.

C. Koelbel, P. Mehrotra, and J. V. Rosendale. Supporting Shared Data Structures on Distributed Memory Architectures. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 177--186, March 1990.


Array Prefetching for Irregular Array Accesses in Titanium - Jimmy Su And (2004)   (Correct)

No context found.

C. Koelbel, P. Mehrotra, and J. Van Rosendale, "Supporting shared data structures on distributed memory machines", Symposium on Principles and Practice of Parallel Programming, 1990.


Array Prefetching for Irregular Array Accesses in Titanium - Su, Yelick (2004)   (Correct)

No context found.

C. Koelbel, P. Mehrotra, and J. Van Rosendale, "Supporting shared data structures on distributed memory machines", Symposium on Principles and Practice of Parallel Programming, 1990.


Runtime and Language Support for Compiling Adaptive.. - Hwang, Moon, Sharma (1995)   (25 citations)  (Correct)

No context found.

C. Koelbel, P. Mehrotra and J. Van Rosendale, `Supporting shared data structures on distributed memory architectures', 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, March 1990, pp. 177--186.


Booster: A High-Level Language for Portable Parallel.. - Paalvast, Sips, Breebaart (1991)   (1 citation)  (Correct)

No context found.

C. Koelbel, P. Mehrotra, J. Van Rosendale, "Supporting Shared Data Structures on Distributed Memory Architectures," In Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming PPOPP, SIGPLAN NOTICES, Vol. 25, No. 3, March 1990, pp. 177-186.


CiteSeer.IST - Copyright Penn State and NEC