| Fox, G. C. The architecture of problems and portable parallel software systems. Tech. Rep. SCCS-134, Syracuse Center for Computational Science, Syracuse University, 1991. |
....paradigm. In essence, data parallel programs apply the same conceptual operations to all elements of large data structures. This form of parallelism occurs naturally in many scientific and engineering applications such as partial differential equation solvers and linear algebra routines [Fox91]. In these programs, a decomposition of the data domain exploits the inherent parallelism and adapts it to a particular machine. Compilers can use programmer-supplied decomposition patterns such as block and cyclic to partition computation, generate communication and synchronization, and guide ....
G. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-78b, Northeast Parallel Architectures Center, Syracuse University, Syracuse, NY 13244, 1991.
....and engineering applications on distributed memory machines is the Single Program Multiple Data (SPMD) model. In this model, parallelism is achieved by partitioning data among processors which effectively represents parallelism in a class of applications called loosely synchronous applications [Fox91]. To achieve load balance, express locality of access, reduce communication and other optimizations, several distribution and data alignment strategies are often used (e.g. block, cyclic, along rows, columns, etc.). Many parallel programming languages or language extensions have been developed ....
Geoffrey Fox. The Architecture of Problems and Portable Parallel Software Systems. Technical Report SCCS-78b, Northeast Parallel Architectures Center, Syracuse University, Syracuse, NY 13244, 1991.
....area of research for several years. Several parallelizing compilers have been developed for data parallel programs, including Fortran D [15] and Vienna Fortran [4], and for task parallel programs [9, 10]. Recent research shows that a large class of applications contain task and data parallelism [7] and it is important to exploit them in a single compiler .... [Footnote 2: The number of processors in our machine is too small for a meaningful discussion with the full size stereo.] [Figure 12: Stereo ....]
FOX, G. The architecture of problems and portable parallel software systems. Tech. Rep. CRPC-TR91-172, Northeast Parallel Architectures Center, 1991.
....3: The components of the decomposed discrete PDE problem based on the splitting of the mesh or grid D h used numerically. This discrete mesh is partitioned by interface nodes (shown as circles) into discrete subdomains D h i .... of the corresponding sequential computation is loosely synchronous [Fox91]. The programming model for loosely synchronous computations is single program multiple data, where parallelism is achieved by partitioning the underlying geometric data (continuous or discrete) of the PDE problem and allocating the disjoint subproblems or subcomputations to the processors. ....
G. C. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-134, NPAC, Syracuse University, 1991.
....and engineering applications on distributed memory machines is the Single Program Multiple Data (SPMD) model. In this model, parallelism is achieved by partitioning data among processors which effectively represents parallelism in a class of applications called loosely synchronous applications [Fox91]. To achieve load balance, express locality of access, reduce communication and other optimizations, several distribution and data alignment strategies are often used (e.g. block, cyclic, along rows, columns, etc.). Many parallel programming languages or language extensions have been developed ....
Geoffrey Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-78b, Northeast Parallel Architectures Center, Syracuse University, Syracuse, NY 13244, 1991.
....paradigm. In essence, data parallel programs apply the same conceptual operations to all elements of large data structures. This form of parallelism occurs naturally in many scientific and engineering applications such as partial differential equation solvers and linear algebra routines [Fox91]. In these programs, a decomposition of the data domain exploits the inherent parallelism and adapts it to a particular machine. Compilers can use programmer-supplied decomposition patterns such as block and cyclic to partition computation, generate communication and synchronization, and guide ....
G. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-78b, Northeast Parallel Architectures Center, Syracuse University, Syracuse, NY 13244, 1991.
....the program such that the total execution time is minimized. Experience with parallel computing has shown that a good mapping is a critical part of executing a program on such computers. This mapping can be typically performed statically or dynamically. For most regular and synchronous problems [9], this mapping can be performed at the time of compilation by giving directions in the language to decompose the data and its corresponding computations (based on the owner computes rule). We are currently developing a compiler for Fortran D, which provides a rich set of such directives [5]. Load ....
Geoffrey C. Fox. The architecture of problems and portable parallel software systems. Technical Report Revised SCCS-78b, Syracuse University, July 1991.
....the complex control structure of MIMD architecture makes the machine expensive and the system overhead large. Application problems can be classified into three categories: synchronous, loosely synchronous, and asynchronous. Table I shows a few application problems in each of the three categories [13]. • The synchronous problems have a uniform problem structure. In each time step, every processor executes the same operation over different data, resulting in a naturally balanced load. • The loosely synchronous problems can be structured iteratively with two phases: the computation phase ....
G.C. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-78b, Syracuse University, 1991.
....area of research for several years. Several parallelizing compilers have been developed for data parallel programs, including Fortran D [13] and Vienna Fortran [3], and for task parallel programs [8, 9]. Recent research shows that a large class of applications contain task and data parallelism [6] and it is important to exploit them in a single compiler framework [4, 5, 12]. There is also a large body of literature on partitioning, load balancing and scheduling of parallel programs [1, 10]. We have addressed the specific partitioning and load balancing issues that arise when task and data ....
FOX, G. The architecture of problems and portable parallel software systems. Tech. Rep. CRPC-TR91-172, Northeast Parallel Architectures Center, 1991.
.... recent solution to the problem of easily and efficiently expressing parallel algorithms that operate on irregular data structures [4]. These irregular data structures, such as unstructured sparse matrices, graphs, and trees, are becoming increasingly prevalent in computationally intensive problems [16]. Existing parallel languages are generally either data parallel or control parallel in nature. The data parallel (or collection oriented [27]) model has a single thread of control, and is easy to understand and use. Parallelism is achieved by applying functions in parallel across a set of values, ....
G. C. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-134, Syracuse Center for Computational Science, Syracuse University, 1991.
....machines. On the other hand, it is generally agreed that although these language extensions are well suited for computations on dense matrices or regular meshes, they are not as well suited for algorithms that operate on irregular structures, such as unstructured sparse matrices, graphs, or trees [29]. Languages with control parallel constructs are often better suited for such problems, but unfortunately these constructs do not port well to vector machines, SIMD machines, or MIMD machines with vector processors. Nested data parallel languages [10] combine aspects of both data parallel and ....
.... data parallel languages have been proposed for portable parallel programming, such as C [49], MPP Pascal [7], Lisp [39], UC [5], and Fortran 90 [2]. Section 2 explained some of the expressibility and efficiency limitations imposed by flat languages; these problems are also discussed elsewhere [10, 11, 29, 32, 54]. Two existing parallel languages permit the user to describe nested data parallel operations: CM Lisp [63] and Paralation Lisp [51]. However, the implementations of these languages only exploit the bottom level of parallelism; for the sparse matrix example, this results in a parallel sum for each ....
Geoffrey C. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-134, Syracuse Center for Computational Science, Syracuse University, 1991.
....between various mappings for them and show how the framework is used to obtain efficient mappings. 1 Introduction Parallelizing compilers typically support data parallelism [7, 25] or task parallelism [16, 19] but not both. However, many applications contain a mix of task and data parallelism [1, 6, 10, 12], and it is important for the compiler to exploit both styles [8, 11, 24]. In this paper, we address how program characteristics influence the tradeoffs in task and data parallel programs, and present a framework for finding good parallel mappings. There are several reasons to support both task and data ....
FOX, G. The architecture of problems and portable parallel software systems. Tech. Rep. CRPC-TR91-172, Northeast Parallel Architectures Center, 1991.
.... On the other hand, it is generally agreed that although these language extensions are ideally suited for computations on dense matrices or regular meshes, they are not as well suited for algorithms that operate on irregular structures, such as unstructured sparse matrices, graphs or trees [21]. Languages with control parallel constructs are often better suited for such problems, but unfortunately these constructs do not port well to vector machines, SIMD machines or MIMD machines with vector processors. Nested data parallel languages [45, 7, 22] combine aspects of both strict ....
....languages have been proposed for portable parallel programming, such as C [35], MPP Pascal [6], Lisp [29], UC [4], and Fortran 90 [2]. Section 2 explained some of the expressibility and efficiency limitations imposed by this type of language. These problems are also discussed elsewhere [7, 21, 39, 22, 8]. There are two existing languages that permit the user to describe nested data parallel operations: Connection Machine Lisp [45] and Paralation Lisp [36]. However, the implementations of these languages are only able to exploit the bottom level of parallelism; for the sparse matrix example, this ....
G. C. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-134, Syracuse Center for Computational Science, Syracuse University, 1991.
....applications. However, the complex control structure of MIMD architecture makes the machine expensive and the system overhead large. It is also difficult to program on an MIMD machine. Application problems can be classified into three categories: synchronous, loosely synchronous, and asynchronous [6]. • The synchronous problems have a uniform problem structure. In each time step, every processor executes the same operation over different data, resulting in a naturally balanced load. • The loosely synchronous problems can be structured iteratively with two phases, the computation phase ....
G.C. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-78b, Syracuse University, 1991.
....of the parallel machine. These specifications in Fortran 90D are the same as in Fortran 77D [5, 7]. The main advantage of Fortran 90D is that it uses high level data structures explicitly (as arrays) and so the problem architecture is clear and not hidden in values of pointers and DO loop indices [3]. Compilers can effectively map Fortran 90D into all parallel architectures suitable for synchronous problems, including MIMD SIMD parallel machines, systolic arrays and heterogeneous networks [3]. Compiler methods for converting Fortran 77D programs into distributed memory node programs are ....
.... and so the problem architecture is clear and not hidden in values of pointers and DO loop indices [3]. Compilers can effectively map Fortran 90D into all parallel architectures suitable for synchronous problems, including MIMD SIMD parallel machines, systolic arrays and heterogeneous networks [3]. Compiler methods for converting Fortran 77D programs into distributed memory node programs are discussed in [8, 9]. We are developing a Fortran 90D compiler, which converts Fortran 90D code into Fortran 77 plus message passing node programs for a distributed memory machine [12]. Fortran 90D has ....
Fox, G., The Architecture of Problems and Portable Parallel Software Systems, Technical Report SCCS-134, Syracuse Center for Computational Science. CRPC-TR91-172.
....of type (iii) involve communication of q n and dq n values from 1 or 2 cell deep layer in adjacent block. 4.2 Parallel Implementation The computations associated with data parallel PDE solvers that preserve the ordering of the corresponding sequential computations is loosely synchronous [13]. The programming model for loosely synchronous computations is SPMD (single program multiple data) where parallelism is achieved by partitioning the underlying geometric data of the PDE problem and allocating the smaller subsets of data to the processors. Efficient parallel implementation ....
G.C. Fox, The Architecture of Problems and Portable Parallel Software Systems, Technical Report SCCS-134, NPAC, Syracuse University, 1991.
....[Fox:94h], [Fox:94c], [Fox:94i], [Mills:93a]. Here, we summarize relevant features of it in Section 2. Section 3 reviews and extends a classification of problem architectures originally developed in 1988 from a rather complete survey of parallel applications at the time [Fox:88b], [Fox:88tt], [Fox:91g], [Fox:94a]. In Section 4, we show how the different problem categories or architectures are addressed by parallel software systems with different capabilities. We give illustrative examples, but not an exhaustive list of existing software systems with these characteristics. We consider High ....
Fox, G. C. "The architecture of problems and portable parallel software systems." Technical Report SCCS-134, Syracuse University, NPAC, Syracuse, NY, July 1991. Revised SCCS-78b.
....those related to complex systems, are helpful in developing a theory of computation and indeed may become more important as the computers and the problems they simulate get larger and more complicated. Here we present a review of these concepts. Several references contain more detailed discussions [34, 33, 25, 26, 27, 30, 28, 17, 21, 22, 24, 32]. In Section 2 we give an overview of the state of the art and future trends in parallel computing, concentrating on the use of parallel computers for simulation, particularly of complex systems. We describe recent progress in defining a standardized, portable, high level parallel language called ....
....compilers exist and can be incorporated into parallel compilers, and migrating existing code to parallel machines is much easier. In any case, to be generally usable, especially for scientific computing, any new language would need to implement the standard features and libraries of C and Fortran [19, 21]. The purpose of software, and in particular computer languages, is to map a problem onto a machine, as described in Section 3.1. A drawback of current software and languages is that they are often designed around the machine architecture, rather than the problem architecture. This can make it ....
[Article contains additional citation context not shown here]
Fox, G. C. (1991d). The architecture of problems and portable parallel software systems. NPAC Technical Report SCCS-134. Unpublished.
....relevant for the loosely synchronous class of iterative solvers considered in this work. In the loosely synchronous model, computations are carried out in phases. Each phase consists of computations on the local subproblem followed by interprocessor communication for nonlocal data [19] [22]. For parallel iterative PDE solvers, the data mapping problem can be formulated in two different levels: (i) at the geometric level, using the discrete geometrical data structures (element meshes or tensor grids) associated with the PDE domain and (ii) at the algebraic level using the linear ....
G. C. Fox. The architecture of problems and portable parallel software systems. Technical Report SCCS-134, NPAC, Syracuse University, 1991.
....Mapping, Remapping, Parallel, Merging, Sorting 1 Introduction Parallelization of data parallel programs on distributed memory parallel computers requires careful attention to load balancing and reduction of communication to achieve a good performance. For most regular and synchronous problems [13], mapping can be performed at the time of compilation by giving directives to decompose the data and its corresponding computations [8]. For irregular applications, achieving a good mapping is considerably more difficult; the nature of the irregularities may not be known at the time of compilation ....
G. Fox. The Architecture of Problems and Portable Parallel Software Systems. Technical report, Syracuse University, July 1991.
No context found.
Fox, G. C. "The architecture of problems and portable parallel software systems." Technical Report SCCS-134, Syracuse University, NPAC, Syracuse, NY, July 1991. Revised SCCS-78b.
No context found.
G.C. Fox, "The Architecture of Problems and Portable Parallel Software Systems", NPAC Technical Report SCCS-134.
....load balancing and reduction of communication are two important issues for achieving a good performance. It is important to map the program such that the total execution time is minimized; the mapping can typically be performed statically or dynamically. For most regular and synchronous problems [10], this mapping can be performed at the time of compilation by giving directives in the language to decompose the data and its corresponding computations (based on the owner computes rule) [3]. For a large class of scientific problems that are irregular in nature, achieving a good mapping is ....
Geoffrey C. Fox. The architecture of problems and portable parallel software systems. Technical Report Revised SCCS-78b, Syracuse University, July 1991.
....of the parallel machine. These specifications in Fortran 90D are the same as in Fortran 77D [5, 7]. The main advantage of Fortran 90D is that it uses high level data structures explicitly (as arrays) and so the problem architecture is clear and not hidden in values of pointers and DO loop indices [3]. Compilers can effectively map Fortran 90D into all parallel architectures suitable for synchronous problems, including MIMD SIMD parallel machines, systolic arrays and heterogeneous networks [3]. Compiler methods for converting Fortran 77D programs into distributed memory node programs are ....
.... and so the problem architecture is clear and not hidden in values of pointers and DO loop indices [3]. Compilers can effectively map Fortran 90D into all parallel architectures suitable for synchronous problems, including MIMD SIMD parallel machines, systolic arrays and heterogeneous networks [3]. Compiler methods for converting Fortran 77D programs into distributed memory node programs are discussed in [8, 9]. We are developing a Fortran 90D compiler, which converts Fortran 90D code into Fortran 77 plus message passing node programs for a distributed memory machine [12]. Fortran 90D has ....
Fox, G., The Architecture of Problems and Portable Parallel Software Systems, Technical Report SCCS-134, Syracuse Center for Computational Science. CRPC-TR91-172.
No context found.
Fox, G. C. The architecture of problems and portable parallel software systems. Tech. Rep. SCCS-134, Syracuse Center for Computational Science, Syracuse University, 1991.