E. Albert, K. Knobe, J. D. Lukas, and G. L. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems, pages 42--56. ACM Press, 1988.


This paper is cited in the following contexts:
Exploiting Superword Level Parallelism with Multimedia.. - Larsen, Amarasinghe (2000)   (20 citations)

....are then unpacked so they can be used in the sequential computation of the final statements. In the end, this method has the same effect as the transformations used for vector compilation, while only requiring loop unrolling and scalar renaming. [Figure 3: while (1) { dst[0] = (src1[0] + src2[0]) >> 1; dst[1] = (src1[1] + src2[1]) >> 1; dst[2] = (src1[2] + src2[2]) >> 1; dst[3] = (src1[3] + src2[3]) >> 1; dst += 4; src1 += 4; src2 += 4; if (dst == end) break; } An example of a hand-optimized matrix operation that proves unvectorizable.] Figure 3 shows a code segment that averages the elements ....
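For readers who want to compile the quoted loop, a minimal C sketch follows. It assumes the statements compute an integer average (add, then shift right by one), as the word "averages" in the surrounding text suggests. The function name `average_unrolled`, the `uint8_t` element type, and the requirement that the length be a positive multiple of four are assumptions for illustration, not details taken from the cited paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: averages two byte arrays four elements at a time,
 * following the hand-unrolled shape of the quoted figure.
 * Assumption: n is a positive multiple of 4. */
void average_unrolled(uint8_t *dst, const uint8_t *src1,
                      const uint8_t *src2, size_t n)
{
    assert(n > 0 && n % 4 == 0);
    const uint8_t *end = dst + n;
    do {
        /* Operands are promoted to int, so the sum cannot overflow. */
        dst[0] = (uint8_t)((src1[0] + src2[0]) >> 1);
        dst[1] = (uint8_t)((src1[1] + src2[1]) >> 1);
        dst[2] = (uint8_t)((src1[2] + src2[2]) >> 1);
        dst[3] = (uint8_t)((src1[3] + src2[3]) >> 1);
        /* Advance all three pointers by one unrolled block. */
        dst += 4;
        src1 += 4;
        src2 += 4;
    } while (dst != end);
}
```

Calling `average_unrolled(d, a, b, 4)` with `a = {0, 10, 255, 7}` and `b = {0, 20, 255, 8}` stores `{0, 15, 255, 7}` in `d`.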


E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Exploiting Superword Level Parallelism with Multimedia.. - Larsen (2000)   (20 citations)

....variable, and re-rolling the loop. In contrast, locating SLP within the loop body is simple. Since the optimized code is amenable to SLP analysis, hand optimization has had no detrimental effects on our ability to detect the available parallelism. [Figure 2.3: do { dst[0] = (src1[0] + src2[0]) >> 1; dst[1] = (src1[1] + src2[1]) >> 1; dst[2] = (src1[2] + src2[2]) >> 1; dst[3] = (src1[3] + src2[3]) >> 1; dst += 4; src1 += 4; src2 += 4; } while (dst != end); Example of an unvectorizable code sequence.] 2.3 Loop Level Parallelism: Vector parallelism, exploited by vector computers, is a subset of ....


E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Exploiting Superword Level Parallelism with Multimedia.. - Larsen, Amarasinghe (2000)   (20 citations)

....on the SPEC95fp benchmark suite are from non-vectorizable code sequences. To better explain the differences between superword level parallelism and vector parallelism, we present two short examples, shown in Figures 2 and 3. [Figure 3: do { dst[0] = (src1[0] + src2[0]) >> 1; dst[1] = (src1[1] + src2[1]) >> 1; dst[2] = (src1[2] + src2[2]) >> 1; dst[3] = (src1[3] + src2[3]) >> 1; dst += 4; src1 += 4; src2 += 4; } while (dst != end); An example of a hand-optimized matrix operation that proves unvectorizable.] Although the first example can be molded into a vectorizable form, we know of no ....


E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Achieving Speedups for APL on an SIMD Distributed Memory Machine - Greenlaw, Snyder (1990)

....the language is not parallel. APL has sequential statement execution, but it provides operators that manipulate vectors whose semantics admit parallel execution [7]. This is as opposed to other languages such as FORTRAN and C that have been extended to make them more adaptable to parallelism [3] [15] [14]. In fact, many of the proposed extensions to FORTRAN, which make it more adaptable to parallelism, are features that have always existed in APL. The semantics of APL are very compatible with the computing capabilities of SIMD parallel computers, especially those with nonshared memory: ....

....replace each APL operator to be executed sequentially by its natural parallel implementation. This approach is new for language implementation on an SIMD machine. There is no need to extend the APL language. This differs with C on the Connection Machine [14] or with extended versions of FORTRAN [3]. Any existing APL program can be run in parallel without any modifications to the code. Since many APL operators have arrays as arguments, it seems intuitive that APL will adapt well to parallelism. Our simulation is designed to test out this hypothesis for the mesh and to derive estimates of ....


E. Albert, K. Knobe, J.D. Lukas, and G.L. Steele. Compiling FORTRAN 8x array features for the Connection Machine computer system. In SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems, volume 23(9), pages 42--56. ACM, September 1988.


ADAPTing Fortran 90 Array Programs for Distributed Memory.. - John Merlin (1991)   (11 citations)

....to determine a good data partitioning [4]. The distribution scheme presented here is obviously quite basic, and was chosen as a simple starting point. Many applications would benefit from allowing more general array distributions and alignments such as those in Fortran D [5] and CM Fortran [6, 7]. It is hoped that we will be able to exploit more flexible data mappings in future research. 4 The parallelisation system: The basic idea of the parallelisation scheme is that, given a pre-defined array partitioning, the potential for data parallelism, and the communications required to achieve ....

....be portable (modulo the distribution declarations) across a wide spectrum of architectures when Fortran 90 compilers become available. Indeed, the array features are already supported as extensions to Fortran 77 by many compilers for shared memory and vector systems and SIMD processor arrays (e.g. [7]). The modular design of the system should make it easier to port to other distributed memory MIMD machines. The pre-processor converts the input program into standard Fortran 77, with no message passing or other extensions, so it can be used without modification on any target system that provides ....

E. Albert, K. Knobe, J. Lukas and G. Steele. Compiling Fortran 8x array features for the Connection Machine computer system, Symposium on Parallel Programming: Experience with Applications, Languages and Systems, New Haven, CT, July 1988.


Automatic Data Layout for Distributed Memory Machines - Kremer (1995)   (45 citations)

....mapping induces the corresponding data mapping. The majority of approaches emphasize the data mapping view of the mapping problem. The decomposition of the data mapping into alignment and distribution leads to a two-step strategy, where alignment analysis is followed by distribution analysis [AKLS88, Wei91, LC90a, LC91b, GB91, GB92b, Who92a, CGST93]. Since alignment and distribution decisions may be mutually dependent, performing distribution after alignment may lead to inferior results. To avoid this problem, some approaches consider alignment and distribution at the same time [Ke 93, ....

.... solutions of NP-complete problems in an automatic data layout tool [BKK94, KK95]. Compass: Albert, Knobe, Lukas, Natarajan, Steele, and Weiss discuss automatic data layout as part of the design and implementation at Compass of SIMD compilers for Fortran 77 extended by Fortran 8x array features [AKLS88, KLS88, KLS90, KN90, Wei91]. The target machines are the Connection Machine CM-2 and the MasPar MP-1. Automatic data layout is an integral part of these compilers. Arrays are aligned by mapping them onto virtual processors based on their usage as opposed to their declared shape. The latter ....

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Context Optimization for SIMD Execution - Kennedy, Roth (1994)

....90D program for execution on a SIMD architecture. The order that the steps are given reflects our compiler's structure, but this does not imply that all SIMD compilers are structured similarly. For a comparison, Albert et al. give an overview of compiling for the Connection Machine in Paris mode [2], and Sabot describes compiling for the Connection Machine in Slicewise mode [20]. 2.3.1 Array Distribution: To exploit parallelism, the Fortran 90D SIMD compiler distributes the data arrays across the PE array so that each PE has some of the data to process. The manner in which arrays are ....

....[Figure 14: Time for 5-point difference computation. Axes: subgrid size vs. execution time (ms). Series: MasPar Fortran, hand-optimized MPL, context-split MPL.] 5 Related Work: Work at Compass by Albert et al. describes the generation and optimization of context setting code [2]. They avoid redundant context computations when adjacent statements operate under the same context. They also perform classical optimizations on the context expressions, such as common subexpression elimination. They mention the possibility of reordering computations to minimize context changes, ....
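The idea of skipping redundant context computations can be illustrated with a small C sketch. It is only a model under stated assumptions, not the Compass implementation: each statement is tagged with a hypothetical integer id naming the activity mask (context) it runs under, and `count_context_sets` (an invented name) counts how often the mask must be set if it is reset only when it differs from the previous statement's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: ctx_ids[i] names the context (activity mask)
 * that statement i must execute under. Returns the number of
 * context-setting operations needed when adjacent statements that
 * share a context reuse it instead of recomputing it. */
size_t count_context_sets(const int *ctx_ids, size_t n)
{
    assert(ctx_ids != NULL || n == 0);
    size_t sets = 0;
    for (size_t i = 0; i < n; i++) {
        /* Set the context for the first statement, and afterwards only
         * when it changes relative to the preceding statement. */
        if (i == 0 || ctx_ids[i] != ctx_ids[i - 1])
            sets++;
    }
    return sets;
}
```

For the id sequence `{1, 1, 2, 2, 2, 1}` the function returns 3, whereas setting the context before every statement would take 6 operations.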

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Context Optimization for SIMD Execution - Kennedy, Roth (1994)

....arrays. 2.2 SIMD compilation: This section describes our overall compilation strategy. It describes the steps necessary in translating a Fortran 90D program for execution on a SIMD architecture. For a comparison, Albert et al. give an overview of compiling for the Connection Machine in Paris mode [2], and Sabot describes compiling for the Connection Machine in Slicewise mode [16]. 2.2.1 Array distribution: To exploit parallelism, the Fortran 90D SIMD compiler distributes the data arrays across the PE array so that each PE has some of the data to process. The manner in which arrays are ....

....two split versions was minimal. The new split version reduced the execution time of the hand-optimized MPL version by 12%, compared to the 13% reduction of the original split version. 5 Related work: Work at Compass by Albert et al. describes the generation and optimization of context setting code [2]. They avoid redundant context computations when adjacent statements operate under the same context. They also perform classical optimizations on the context expressions, such as common subexpression elimination. They mention the possibility of reordering computations to minimize context changes, ....

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Automatic Data Layout for Distributed Memory Machines - Kremer (1993)   (45 citations)

....3.2.1 Albert, Knobe, Lukas, Natarajan, Steele, and Weiss at Compass and Thinking Machines: Albert, Knobe, Lukas, Natarajan, Steele, and Weiss discuss automatic data layout as part of the design and implementation at Compass of SIMD compilers for Fortran 77 extended by Fortran 8x array features [AKLS88, KLS88, KLS90, KN90, Wei91]. The target machines are the Connection Machine CM-2 and the MasPar MP-1. Automatic data layout is an integral part of these compilers. Arrays are aligned by mapping them onto virtual processors based on their usage as opposed to their declared shape. The latter ....

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Optimizing Fortran90D/HPF for Distributed-Memory Computers - Roth (1997)

....the data parallelism found in array expressions or forall statements for execution on the distributed memory machine. This strategy is used in the majority of Fortran90D/HPF compilers for SIMD architectures [123, 140, 149], many of which are descendants of compiler technology developed by Compass [7, 9, 108, 109, 110], and is also employed by some MIMD compilers [28, 32]. We refer to such compilers as array operation compilers or native Fortran90 compilers. The main phases of a compiler exploiting this model are depicted in Figure 2.4. After the input is parsed, the distributions and alignments of arrays are ....

....as array operation compilers as described in the preceding chapter. The group at Compass, along with their associates at Thinking Machines Corp., were the first to investigate the challenges of compiling Fortran90-style data parallel constructs for execution on distributed memory machines [7, 8, 9, 110]. Together they created what many would consider to be the first commercially viable distributed-memory compiler. In addition to the general SIMD compiler development effort, Compass did much of the ground-breaking research in the area of data optimization [108, 109, 107, 111, 122]. The purpose ....


E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Compiling Fortran 90D/HPF for Distributed Memory MIMD Computers - Bozkus (1995)   (2 citations)

....the PARTI run-time support primitives using a set of run-time library routines that support irregular communication on MIMD distributed memory machines. The Fortran 90D/HPF compilation system may also use the PARTI [28] system to support irregular communications. CM Fortran: The CM Fortran language [29, 30] is implemented as a subset of Fortran 77, extended by Fortran 8x array features to support a data parallel programming style for the Connection Machine (CM) computer system. CM Fortran maps arrays into the CM architecture. The compiler generates code to be executed by a CM system with a DEC VAX ....

K. Knobe, J. D. Lukas, and G. L. Steele. Compiling Fortran 8x Array Features for the Connection Machine Computer System. In the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems, 1988.


Exploiting Task and Data Parallelism on a Multicomputer - Subhlok (1993)   (55 citations)

....5 presents experimental results that compare the performance of programs compiled with different styles of parallelism. Section 6 discusses related work. 2 Programming model: The language we chose is Fortran 77 augmented with Fortran 90 array syntax and data layout statements based on Fortran D [1] and the emerging High Performance Fortran (HPF) standard [8]. The choice of Fortran was based on convenience and user acceptance. We have taken a source-to-source compilation approach; the output of the compiler is a Fortran 77 program (with calls to a communication library) for all the nodes in ....

....influence on this project. Several research and production compilers have been built for compiling a program in data parallel fashion to a SPMD parallel program. These include the Fortran D compiler [10], SuperB [20], the Vienna Fortran compiler [5], and the Thinking Machines Fortran compiler [1]. The general approach is to take a sequential program in a conventional language (Fortran for the above three), possibly with directives to guide data layout and parallelism. The compiler parallelizes the program, typically using the owner-computes rule to distribute the computation among nodes. ....
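The owner-computes rule named in this excerpt can be sketched in C for a one-dimensional block distribution. This is a minimal illustration under stated assumptions, not the method of any compiler listed here: the helper names `block_lo`, `block_hi`, and `node_add` are invented, blocks are ceiling-sized, and all nodes are simulated in a single address space. A real distributed-memory program would hold only the local block on each node and would communicate to fetch non-local operands.

```c
#include <assert.h>
#include <stddef.h>

/* First global index owned by node `rank` when n elements are split
 * into ceiling-sized contiguous blocks over `nodes` nodes. */
size_t block_lo(size_t n, size_t nodes, size_t rank)
{
    size_t chunk = (n + nodes - 1) / nodes;
    size_t lo = rank * chunk;
    return lo < n ? lo : n;
}

/* One past the last global index owned by node `rank`. */
size_t block_hi(size_t n, size_t nodes, size_t rank)
{
    size_t chunk = (n + nodes - 1) / nodes;
    size_t hi = (rank + 1) * chunk;
    return hi < n ? hi : n;
}

/* Owner-computes: node `rank` executes a[i] = b[i] + c[i] only for
 * the indices i of a[] that it owns. */
void node_add(double *a, const double *b, const double *c,
              size_t n, size_t nodes, size_t rank)
{
    assert(nodes > 0 && rank < nodes);
    size_t hi = block_hi(n, nodes, rank);
    for (size_t i = block_lo(n, nodes, rank); i < hi; i++)
        a[i] = b[i] + c[i];
}
```

With 10 elements on 4 nodes the blocks are [0,3), [3,6), [6,9), and [9,10), so running `node_add` once per rank fills every element of `a` exactly once.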

Albert, E., Knobe, K., Lukas, J., and Steele, G. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages and Systems (New Haven, CT, July 1988), pp. 42--56.


Fortran D Language Specification - Fox, Hiranandani, Kennedy, Koelbel.. (1991)   (61 citations)  Self-citation (Fortran)

....to align data arrays, both within and across dimensions. Different distributions are evaluated, then communication using Crystal collective communication routines is generated. Since data decompositions are automatically calculated, no decomposition specifications are provided. CM Fortran [AKLS88, TMC89] is a version of Fortran extended with vector notation, alignment, and data layout specifications. Programmers must explicitly specify data parallelism in CM Fortran programs by marking certain array dimensions as parallel. CM Fortran has a FORALL statement [ALS90] similar to the Fortran D ....

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Symposium on Parallel Programming: Experience with Applications, Languages and Systems, New Haven, CT, July 1988.


Compiling Fortran 77D and 90D for MIMD.. - Choudhary, Fox.. (1992)   Self-citation (Fortran)

....machines. Adapt proposes to scalarize and partition Fortran 90 programs using a run-time library for Fortran 90 intrinsics [29]. The CM Fortran compiler compiles Fortran 90 with alignment and layout specifications directly to the physical machine, and can optimize floating point register usage [3]. The Fortran 90 Y compiler uses formal specification techniques to generate efficient code for the CM-2 and CM-5 [15]. Paragon is a version of C extended with array syntax, operations, reductions, permutations, and distribution specifications [14]. Our compiler resembles the Vienna Fortran 90 ....

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Compiler Optimizations for Fortran D on MIMD.. - Hiranandani, Kennedy.. (1991)   (92 citations)  Self-citation (Fortran)

....4 Related Work: We view the Fortran D compiler as a second-generation distributed memory compiler that integrates and extends analysis and optimization techniques from many other research projects. It is related to other distributed memory compilation systems such as Al [30], CM Fortran [1], Dino [28], MIMDizer [15], Pandore [3], Paragon [10], and Spot [29], but mostly builds on the following research projects. SUPERB is a semi-automatic parallelization tool that supports arbitrary user-specified contiguous BLOCK distributions [13, 34]. It originated overlaps as a means to both ....

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Symposium on Parallel Programming: Experience with Applications, Languages and Systems, New Haven, CT, July 1988.


Compiler Support for Machine-Independent Parallel.. - Hiranandani, Kennedy.. (1991)   (70 citations)  Self-citation (Fortran)

....loop peeling, etc. The Fortran D compiler will apply many of the same transformations. Booster [PvGS90] provides user-specified distribution functions defined as program views, but does not generate or optimize communications. 6.3 SIMD Compilation Systems. 6.3.1 CM Fortran: CM Fortran [AKLS88, TMC89] is a version of Fortran 77 extended with vector notation, alignment, and data layout specifications. Programmers must explicitly specify data parallelism in CM Fortran programs by marking certain array dimensions as parallel. The operating system of the underlying SIMD distributed-memory ....

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Symposium on Parallel Programming: Experience with Applications, Languages and Systems, New Haven, CT, July 1988.


Optimizing Fortran 90D Programs for SIMD Execution - Roth (1993)   Self-citation (Fortran)

....were written by Compass; in fact, Compass wrote the entire initial CM Fortran compiler (Paris version). The CM Fortran and MasPar Fortran compilers are described in more detail below. In addition to the general SIMD compiler development effort, Compass did research in the area of data optimization [1, 40, 41, 42, 43, 52]. The purpose of data optimization is to align data to improve locality and thus minimize interprocessor communication. Their method assumes an unlimited number of virtual processors. Then based on usage patterns, it maps arrays to the virtual processors, striving to align them so that ....

E. Albert, K. Knobe, J. Lukas, and G. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


Unified Compilation of Fortran 77D and 90D - Choudhary, Fox, Hiranandani.. (1993)   (6 citations)  Self-citation (Fortran)

....forall loop may be executed in parallel without synchronization. However, communication may still be required before the loop to acquire non-local values, and after the loop to update or merge non-local values. Single-statement Fortran D forall loops are identical to those supported in CM Fortran [2] and HPF [22], but multi-statement Fortran D forall loops are different from those found in HPF. 3. FORTRAN D COMPILATION STRATEGY. 3.1 OVERALL STRATEGY: Our strategy for parallelizing Fortran D programs for distributed memory MIMD computers is illustrated in Figure 1. In brief, we transform both ....

....[35] and Adaptor [12] propose to scalarize and partition Fortran 90 programs using a run-time library for Fortran 90 intrinsics. The CM Fortran compiler compiles Fortran 90 with alignment and layout specifications directly to the physical machine, and can optimize floating point register usage [2, 13]. The Fortran 90 Y compiler uses formal specification techniques to generate efficient code for the CM-2 and CM-5 [17]. Forge90, formerly Mimdizer, is an interactive parallelization system for MIMD shared and distributed memory machines from Applied Parallel Research [7]. It performs data flow and ....

Albert, E., Knobe, K., Lukas, J., and Steele, Jr., G. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS) (New Haven, CT, July 1988).


OOPAL: Integrating Array Programming in Object-Oriented.. - Mougin, Ducasse (2003)

No context found.

E. Albert, K. Knobe, J. D. Lukas, and G. L. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine computer system. In Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems, pages 42--56. ACM Press, 1988.


A Framework for Automatic Data Layout for - Distributed Memory Machines (1999)

No context found.

Albert E., Knobe K. et al. Compiling Fortran 8x array features for the Connection Machine computer system. Proceedings of the ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), New Haven, CT, July 1988.


High Performance Fortran: History, Status and Future - Mehrotra, Van Rosendale, Zima (1997)

No context found.

E. Albert, K. Knobe, J. D. Lukas, and G. L. Steele, Jr. Compiling Fortran 8x array features for the Connection Machine Computer System. In Proceedings of the Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS), pages 42--56, New Haven, CT, July 1988.

