Michael J. Quinn and Philip J. Hatcher. Data-Parallel Programming on Multicomputers. IEEE Software, 7(5):69--76, September 1990.
....expensive dot products, to complex heuristics. Specifying a data parallel computation in terms of a single element is the approach we have used in creating our data parallel extensions. We call this approach element-centered. Fundamentally, this concept is not new to data parallel languages [2, 4, 6, 13, 15, 16, 17, 20, 21]. However, we have extended the notion to encompass subset level data parallelism. By subset level data parallelism we mean allowing the definition of operations in which subsets (as opposed to elements) are the data granules, e.g. a row or a column. This is not the same as applying an ....
....data set in a relative fashion. The starting point is the element to which the operation is being applied, and the relative addresses are resolved at run time. For example, W( pixel, or W( S( pixel. The programmer may also specify boundary conditions. These mechanisms are similar to [21]. Invocation of an aggregate operation is shown on lines 18-21 of Figure 1. Assume that a data parallel object iden dp mbrfcn : return type AGG fcn name subset specifier ( arg agg arg ] arg ) void OVR fcn name ( arg ovr arg ] return type RED fcn name ....
[Article contains additional citation context not shown here]
M.J. Quinn and P.J. Hatcher, "Data-Parallel Programming on Multicomputers," IEEE Software, Sept. 1990, pp. 69-76.
....offer significant advantages over the shared memory multiprocessors with regard to cost and scalability, however, they are also more difficult to program. Much of that difficulty is due to their lack of a single global address space. Hence, the last few years have seen considerable research effort [10, 15, 12, 20, 4, 17, 18] aimed at providing a shared name space to the programmer, with the task of generating messages relegated to the compiler. Most of these parallelization systems accept a program written in a sequential or shared memory language augmented with annotations specifying distribution of data, and ....
....routines. Some of the ideas underlying our approach to estimating communication costs are closely related to those developed for automatically generating communication from sequential or shared memory language programs for multicomputers. The Fortran D compiler [10], the Dataparallel C compiler [15], and the Kali system [12] perform analysis of source references to determine the communication induced by the data partitioning scheme for each loop. Li and Chen [13] introduce the notion of matching source program references with syntactic patterns associated with collective communication ....
M.J. Quinn and P. J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7:69--76, September 1990.
....[28] These research efforts include the Fortran D compiler [30, 31] and the Superb compiler [81], both accepting Fortran 77 as the base language. The Crystal compiler [15] and the Id Nouveau compiler [62] are targeted for single assignment languages. Numerous other compilers, Dataparallel C [59], C [63], Kali [43, 44], Dino [64, 65], Al [77], Arf [67], Oxygen [66], Pandore [4], also produce parallel code for multicomputers, but require explicit parallelism in the source program. Some of the commercially available compilers for multicomputers are Mimdizer [69] and Aspar [35]. Many ....
M.J. Quinn and P. J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7:69--76, September 1990.
....Data) [Karp87] format, given a data decomposition specification. This approach has recently gained a lot of attention. It has been applied by [Callahan88, Gerndt89, Kennedy89,90] for applications to Fortran, by [Andre90] to C, by [Rogers89] to Id Nouveau, by [Koelbel90] to Kali Fortran, by [Quinn89] to C*, and by [Paalvast90] to the fourth generation parallel programming language Booster. In particular, application to Fortran shows some limitations, due to equivalencing, passing of array subsections to subroutine calls, etc. A second limitation is that the description of complex ....
M.J. Quinn, P.J. Hatcher, "Data parallel programming on multicomputers," Parallel Computing Laboratory, Department of Computer Science, University of New Hampshire, Report no. PCL-89-18, March 1989, 16 pp.
.... explicit data decompositions. From these data decomposition specifications, SPMD (Single Process Multiple Data) code [Karp87] can be generated automatically. This approach is followed by [Callahan88, Gerndt89, Kennedy89, Koelbel89] in FORTRAN, by [Rogers89] in Id Nouveau, and by [Quinn89] in C*. This concept is also followed in Booster [Paalvast90]. 3. Booster Language concepts. Booster is a high level, fourth generation, algorithm description language for sequential and parallel computers. Parallel computers may be either distributed or shared memory systems. The basic ....
M.J. Quinn and P.J. Hatcher, "Data Parallel Programming on Multicomputers," IEEE Software, September 1990, pp. 69-76.
....architectures in order to optimize computation and communication efficiency. The approach of inducing parallelism by explicitly decomposing the data is not new. In [Callahan88, Gerndt89, Kennedy89] applications to Fortran are described, in [Rogers89] to Id Nouveau, in [Koelbel87] to BLAZE, and in [Quinn89] to C*. In particular, application to Fortran is limited, because equivalencing, passing of array subsections to subroutine calls, and any form of indirect addressing cannot be translated efficiently. A second limitation is that the description of complex decompositions and especially dynamic ....
M.J. Quinn, P.J. Hatcher, Data Parallel Programming on Multicomputers, Parallel Computing Laboratory, Department of Computer Science, University of New Hampshire, report number PCL-89-18, March 1989, 16 pp.
....in a number of thread objects that overwhelms the capabilities of either the Java virtual machine or the virtual memory system. To avoid that sort of inefficiency and to make a transformation work for nested parallelism as well, threads must be reused and standard virtualization loop techniques [16] must be used. Both will increase the complexity of the transformation, since reuse of threads is difficult to achieve in Java, and since code for index set splitting and additional boundary checks for the first/last thread need to be added. Fan out and fan in restrictions. In general, data ....
Michael J. Quinn and Philip J. Hatcher. Dataparallel programming on multicomputers. IEEE Software, 7(5):69--76, September 1990.
....the absence of global address space, and consequently, the need for explicit message passing among processes makes such machines very difficult to program. This has motivated considerable research towards developing compilers that relieve the programmer of the burden of generating communication [15, 23, 17, 19, 18, 16, 21]. Such compilers take a sequential or a shared memory parallel program, annotated with directives specifying data decomposition, and generate the target SPMD program with explicit message passing. Thus, the compiler performs two essential tasks: partitioning of computation, usually based on ....
M.J. Quinn and P. J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7:69--76, September 1990.
....with directives specifying data decomposition. The compilers for these languages are responsible for partitioning the computation, and generating the communication necessary to fetch values of non local data referenced by a processor. A number of such prototype compilers have been developed [18, 33, 23, 26, 22, 25, 3, 15, 28]. Since the cost of interprocessor communication is usually orders of magnitude higher than the cost of accessing local data, it is extremely important for the compilers to optimize communication. The most common optimizations include message vectorization [18, 33], using collective communication ....
M.J. Quinn and P. J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7:69--76, September 1990.
....the fine grained operations into larger grained tasks (loops) and relaxing the lock step synchronization while maintaining semantic equivalence. The aggregation of multiple operations also allows traditional code improvement techniques to be applied to the aggregate. Recently, Quinn and Hatcher [53] have demonstrated techniques for compiling the data parallel language C* [57] for MIMD multiprocessors. However, the programs they can handle are limited by the restricted semantics of C*. Our work goes beyond this, showing how to handle loops with dependences, such as the scan primitive of APL ....
....languages such as APL [39], FP [4], and Sisal [31], loop fusion techniques in Fortran compilers, optimization of series expressions, and type checking in polymorphic languages. We now discuss how our work relates to each of these. C*: Quinn and Hatcher have worked on compiling C* for MIMD machines [54, 53, 36]. While their work has many of the same goals as ours, the two differ in the following ways: # The domain construct in C* provides the information that size inference computes; moreover, the C* model requires all domain sizes to be known at compile time. 3 Domains cannot be created on the fly ....
QUINN,M.J.,AND HATCHER , P. J. Data-parallel programming on multicomputers. IEEE Software 7,5(Sept. 1990), 69--76.
....a map operation, some form of reduction, perhaps using only a fixed set of operators, and later scans (parallel prefixes) and permutation operations. In approximately chronological order, these models are: scan [32], multiprefix [170], paralations [100, 171], the C* data parallel language [111, 165], the scan vector model and NESL [33 38], and CamlFlight [109]. As for other data parallel languages, these models are simple and fairly abstract. For instance, C* is an extension of the C language that incorporates features of the SIMD parallel model. In C* data parallelism is implemented by ....
M.J. Quinn and P.J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, pages 69--76, September 1990.
.... early performance feedback in the upper, compile time layer of the performance prediction hierarchy has been applied in many parallel programming environments, in which parallelization is performed either explicitly [12, 35, 26] or through restructuring [2, 11, 20, 36] and data partitioning [3, 22, 27, 32, 34, 37, 39, 40, 47]. Due to the procedure oriented (e.g. fork/join) source level paradigm, the computational task graph has a series parallel structure, which permits an efficient prediction, typically implemented through a recursive reduction scheme, based on the existence of a homomorphism between ....
M.J. Quinn and P.J. Hatcher, "Data-parallel programming on multicomputers," Software, Sept. 1990, pp. 69--76.
....applications, it has been estimated that 90% of scientific and engineering problems are amenable to data parallel solutions [28]. Data parallel algorithms are easier to design and debug. Control parallel algorithms tend to suffer from time related errors such as deadlocks and data incoherence [49]. Data parallel algorithms, however, are logically synchronous and inherently deterministic. Data parallel algorithms are better able to scale to large numbers of processors than control parallel algorithms [50]. Since data parallel algorithms generally work for a range of problem sizes, more ....
....algorithms is limited by the problem decomposition. Related to the data parallel/control parallel issue is Flynn's [27] classification of multiprocessor computer architectures as SIMD (single instruction stream, multiple data stream) or MIMD (multiple instruction stream, multiple data stream) [49]. In both SIMD and MIMD machines, each processor operates on its own local data. In a SIMD machine, all the processors execute the same operations simultaneously, proceeding in lock step fashion under the direction of a single control unit. In contrast, each processor of a MIMD machine can execute ....
[Article contains additional citation context not shown here]
M. J. Quinn and P. J. Hatcher. Data-Parallel Programming on Multicomputers. IEEE Software 7 (5), 69--76 (1990).
....with directives specifying data decomposition. The compilers for these languages are responsible for partitioning the computation, and generating the communication necessary to fetch values of non local data referenced by a processor. A number of such prototype compilers have been developed [13, 31, 17, 23, 16, 21, 12, 26]. The performance of these programs depends greatly on the data partitioning scheme chosen by the programmer. In general, the best partitioning scheme depends not only on program characteristics, but also on numerous machine specific parameters, and on the kind of optimizations performed by the ....
M.J. Quinn and P. J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7:69--76, September 1990.
....the fine grained operations into larger grained tasks (loops) and relaxing the lock step synchronization while maintaining semantic equivalence. The aggregation of multiple operations also allows traditional code improvement techniques to be applied to the aggregate. Recently, Quinn and Hatcher [28] have demonstrated techniques for compiling the data parallel language C* [30] for MIMD multiprocessors. However, the programs they can handle are limited by the restricted semantics of C*. Our work goes beyond this, showing how to handle loops with dependences, such as the scan primitive of APL ....
....data dependence considerations. Permutes present a major obstacle because they are impervious to dependence analysis. Recently, there has been some work on identifying idioms such as scans and reductions in Fortran programs [27]. C*: Quinn and Hatcher have worked on compiling C* for MIMD machines [28]. Their work has some of the same goals as ours. It differs from ours in two main ways: their runtime model involves virtual processor emulation by the physical processors, and they do not attempt any inter-statement storage optimizations. They also do not attempt to perform source to source ....
Michael J. Quinn and Philip J. Hatcher. Data-Parallel Programming on Multicomputers. IEEE Software, 7(5):69--76, September 1990.
....toward developing high-level languages that are efficiently portable among parallel and vector supercomputers. A common approach has been to add data parallel operations to existing languages, as exemplified by the High Performance Fortran (HPF) effort [33] and various extensions to C (such as C [49, 47], UC [5], and C [38]). Such data parallel extensions offer fine grained parallelism and a simple programming model, while permitting efficient implementation on SIMD, MIMD, and vector machines. On the other hand, it is generally agreed that although these language extensions are well suited for ....
Michael J. Quinn and Philip J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7(5):69--76, September 1990.
....supported in part by the Office of Naval Research under Contract N00014-91-J-1096. .... multicomputer. Examples of such systems include the Fortran D [9], Superb [22], Oxygen [18], and the Dataparallel C [16] compilers. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial ....
M.J. Quinn and P. J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7:69--76, September 1990.
....directed toward developing high-level languages that are efficiently portable among parallel and vector supercomputers. A common approach has been to add data parallel operations to existing languages, as exemplified by the High Performance Fortran (HPF) effort [23] and various extensions to C [35, 34, 4]. Such data parallel extensions offer fine grained parallelism and a simple programming model, while permitting efficient implementation on SIMD, MIMD, and vector machines. On the other hand, it is generally agreed that although these language extensions are ideally suited for computations on ....
M. J. Quinn and P. J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7(5):69--76, Sept. 1990.
....as does the generation of iteration code to loop over contained elements, and other mind numbing details. Details of the computation model (macro data flow), the Mentat programming language, and the Mentat run time system can be found elsewhere [7, 8, 17]. 2.3 Related Work. Dataparallel C [10, 11, 18, 19], pC [2, 15], C [14], Fortran D [5], Fortran 90 [10], and High Performance Fortran (HPF) [16] are the languages from which we have borrowed ideas and which are related to our work. We have also developed some new ideas. C and pC are based on C . Dataparallel C is based on C, but uses some ....
....C is based on C, but uses some ideas from object-oriented language design. HPF's origins are obvious. All of these languages are strictly data parallel languages, and we differ from them all in that we are combining both control and data parallelism within the same language. Dataparallel C. In [19], Quinn and Hatcher, the designers of Dataparallel C, show that a strictly SIMD program can be compiled to a distributed memory architecture without sacrificing performance. This entails loosening the synchronization points by synchronizing only when communication between physical processors is ....
[Article contains additional citation context not shown here]
M.J. Quinn and P.J. Hatcher, "Data-Parallel Programming on Multicomputers," IEEE Software, Sept. 1990, pp. 69-76.
....results in a number of thread objects that overwhelms the capabilities of either the Java virtual machine or the virtual memory system. To avoid this inefficiency and to make a transformation work for nested parallelism as well, threads must be reused and standard virtualization loop techniques (Quinn and Hatcher, 1990) must be used. Both will increase the complexity of the transformation, since reuse of threads is difficult to achieve in Java, and since code for index set splitting and additional boundary checks for the first/last thread need to be added. Lack of Efficiency 2: Fan out and fan in restrictions. The ....
Quinn, M. J. and Hatcher, P. J. (1990). Data-parallel programming on multicomputers.
....compiler. As with CM Fortran, virtual processors are generated for each element of a domain and mapped to each physical processor. Researchers have also examined synchronization problems when translating SIMD programs into equivalent SPMD programs, as well as several communication optimizations [QH90]. 6.3.3 DINO. Dino [RSW89, RSW90, RW90] is an extended version of C supporting general purpose distributed computation. Dino supports BLOCK, CYCLIC, and special stencil-based data distributions with overlaps, but provides no alignment specifications. A Dino program contains a virtual parallel ....
M. Quinn and P. Hatcher. Data parallel programming on multicomputers. IEEE Software, September 1990.
....being used to deliver very high levels of performance for a variety of applications. Over the last few years, considerable research effort has gone into developing compilers that make it easier to program such machines by relieving the programmer of the burden of managing communication explicitly [9, 22, 16, 12, 4, 17, 18]. Most of these compilers take a sequential or a shared memory parallel program, and based on the user specified partitioning of data, generate the parallel program targeted to a multicomputer. An obvious challenge before all of these efforts is to match the quality and efficiency of the code with ....
M.J. Quinn and P. J. Hatcher. Data-parallel programming on multicomputers. IEEE Software, 7:69--76, September 1990.
....paradigm. In this paradigm, parallelism comes from simultaneous operations across large sets of data, rather than from multiple threads of control [2]. Data parallelism is a synchronous paradigm and therefore well suited to SIMD machines. It has also been implemented successfully on a MIMD machine [5]. As an extension of C, C* inherits most of the drawbacks of its ancestor, but we are not concerned about those here. Neither are we concerned with limiting C* to a synchronous paradigm, even though an asynchronous one would be more general. We are concerned, however, with the principles of ....
Michael J. Quinn and Philip J. Hatcher. Dataparallel programming on multicomputers. IEEE Software, pages 69--76, September 1990.
No context found.
Michael J. Quinn and Philip J. Hatcher. Data-Parallel Programming on Multicomputers. IEEE Software, 7(5):69--76, September 1990.
No context found.
Michael J. Quinn and Philip J. Hatcher, "Data-Parallel Programming on Multicomputers," IEEE Software, pages 69--76, September 1990.