15 citations found.
J. Subhlok. Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multiprocessors. Technical Report CMU-CS-93-212, Carnegie Mellon University, 1993.

This paper is cited in the following contexts:
Modeling the Communication Behavior of Distributed Memory.. - Foschia, Rauber, Rünger

....compilers, there is considerable research effort to build modeling tools because such tools are imperative to derive efficient implementations. The significant work includes the work related to the Fortran D compiler [4], the Paradigm compiler [5], the Suif compiler [2], the Fx compiler [34, 33], and the Vienna Fortran Compiler [11, 12]. Other approaches include the use of Petri nets [13], queuing networks, and Markov chains [35]. The Fortran D compiler contains an interactive tool that allows the programmer to select regions of the sequential input program. The tool responds with a ....

J. Subhlok. Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multiprocessors. Technical Report CMU-CS-93-212, Carnegie Mellon University, 1993.


Scheduling of Multiprocessor Tasks for Numerical Applications - Rauber, Rünger (1996)

....can be executed by independent groups of processors. This is expressed by grouping subroutine calls to modules if they are executed by the same processor group. The Fx compiler provides a mapping tool for the grouping of subroutine calls to modules and the mapping of processors to modules [17]. Although the Fx approach is similar in spirit to the TwoL approach, there are some important differences: The parallel sections in Fx cannot be nested whereas TwoL allows a hierarchical structure of the modules. This provides a larger flexibility and seems to be more natural for the ....

....quite naturally. The mapping tool of Fx is based on static runtime expressions that do not take the structure of the subroutines into consideration and the parameters of which are determined by separate runtime tests for each application. There is no verification of the predicted runtimes in [17]. An exploitation of task and data parallelism in the context of a parallelizing compiler can be found in the Paradigm compiler [20, 21, 32]. The Paradigm compiler provides a framework that expresses task parallelism by a macro dataflow graph which is similar to a graphical representation of the ....

J. Subhlok. Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multiprocessors. Technical Report CMU-CS-93-212, Carnegie Mellon University, 1993.


The Compiler TwoL for the Design of Parallel Implementations - Rauber, Rünger (1996)   (1 citation)

....parallel and, thus, no data dependence analysis is needed to extract the available parallelism within the modules. A different approach would be to use a parallel programming language like HPF or Fortran 90. A module specification is more general than the specification of a parallel section in Fx [20] because module calls are allowed to have an arbitrary call structure. Currently, we do not allow the use of recursive calls. The module specification does not determine a data distribution for composed variables of the modules nor does it contain an assignment of processors to modules. It even ....

....can be executed by independent groups of processors. This is expressed by grouping subroutine calls to modules if they are executed by the same processor group. The Fx compiler provides a mapping tool for the grouping of subroutine calls to modules and the mapping of processors to modules [20]. Although the Fx approach is similar in spirit to the TwoL approach, there are some important differences: The parallel sections in Fx cannot be nested whereas TwoL allows a hierarchical structure of the modules. This provides a larger flexibility and seems to be more natural for the development ....

[Article contains additional citation context not shown here]

J. Subhlok. Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multiprocessors. Technical Report CMU-CS-93-212, Carnegie Mellon University, 1993.


Randomized, Oblivious, Minimal Routing Algorithms for.. - Nesson (1995)

....as opposed to most parallel compilers which only do one or the other. The authors suggest that it is possible to determine the best combination of task and data parallelism for a program at compile time for a given target architecture, and that it may be possible to do this automatically [121]. Fx is targeted at applications that process a stream of input and whose computational structure is fairly static and predictable [122]. Hence, the range of applications is limited. In Fx, data parallelism is expressed through array syntax and parallel loops. Task parallelism is expressed ....

....the functions themselves. However, there is also an automatic scheme in which the user provides a representative data set for each task subroutine, and the three functions are derived by the compiler by compiling, linking, and running versions of each routine for different numbers of processors [121]. Using the outputs from the functions described above, the compiler determines which task subroutines should be mapped to the same set of processors (called modules), how many processors should be assigned to each module, and whether modules should be replicated (e.g. it may be more efficient ....

J. Subhlok, Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multicomputers, Tech. Report CMU-CS-93-212, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, November 1993. WWW URL is http://www.cs.cmu.edu.


Integrating Library Modules into Special Purpose Parallel.. - Rauber, Rünger (1997)   (1 citation)

....languages [4, 18, 9], parallelizing compilers [3], and runtime prediction models [12, 2]. A more detailed discussion of this work can be found in [26]. There is also some relationship between our work and the work on combining task and data parallelism as it is considered in Fortran M [15], Fx [29], and the Paradigm compiler [5]. In Fx, the programmer has the possibility to express task parallelism by parallel sections containing subroutine calls. Each call is followed by input and output directives identifying the input and output parameters of the subroutine. Each subroutine call in a ....

J. Subhlok. Automatic Mapping of task and data Parallel programs for Efficient Execution on Multiprocessors. Technical Report CMU-CS-93-212, Carnegie Mellon University, 1993.


A Framework for Exploiting Data and Functional.. - Ramaswamy.. (1994)   (7 citations)

....methods of [23, 27, 28] then use clustering on the nodes to form larger nodes during the construction of a schedule. The second approach to allocation and scheduling is a top down approach like the ones used by Prasanna and Agarwal in [24], Belkhale and Banerjee in [12, 13], by Subhlok et al. in [17, 29, 30], by Ramaswamy and Banerjee in [14, 15], and in this paper. Top down approaches start with the assumption of heavyweight nodes (again, in terms of computation requirements) in the MDG and break them down during the process of constructing an optimal schedule. Top down methods are considered better ....

....processing cost model used. We do not make any assumptions for our MDGs and use very realistic cost models. The work in [12, 13] also does not consider the effects of non zero data transfer costs. Their allocation and scheduling algorithms are similar to the ones we use. The research presented in [17, 29, 30] considers allocation and scheduling for a class of problems that process continuous streams of data sets. The computation for each data set has a tree structured MDG for all their benchmark programs [31]. A set of heuristics are used to decide on a good allocation and scheduling scheme. There is ....

[Article contains additional citation context not shown here]

J. Subhlok, "Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multicomputers," Tech. Rep. CMU-CS-94-106, Carnegie Mellon University, 1994.


A Framework for Generating Task Parallel Programs - Ursula Fissgus, Thomas.. (1999)

....of processors like the ASCI teraflop machines or many of the T3E installations. Applications that benefit from a combination of task and data parallelism include examples from numerical analysis [23, 24], signal processing [29], and multidisciplinary codes like Global Climate Modeling [2]. The integration of task and data parallelism is an active area of research because of its potential benefits and several approaches have been proposed recently. These include language approaches like Fortran M, Fx, and Opus which add ....

....processors into subgroups and to assign computations to different subgroups (task regions) [30]. Computations of a specific subgroup are executed in a data parallel way. The Fx compiler provides a mapping tool for the grouping of subroutine calls to modules and the mapping of processors to modules [29]. Although the Fx approach is similar in spirit to the TwoL approach, there are some important differences: The task regions in Fx cannot be nested lexically whereas TwoL allows a hierarchical structure of the modules. On the other hand, Fx allows a dynamic nested partitioning of processors by ....

[Article contains additional citation context not shown here]

J. Subhlok. Automatic Mapping of task and data Parallel programs for Efficient Execution on Multiprocessors. Technical Report CMU-CS-93-212, Carnegie Mellon University, 1993.


A Framework for Generating Group-Parallel Programs - Ursula Fissgus, Thomas.. (1998)   (1 citation)

....This is especially important for parallel machines with a large number of processors like the ASCI teraflop machines and many of the T3E installations. Applications that benefit from a combination of task and data parallelism include examples from numerical analysis [21, 20], signal processing [25], and multidisciplinary codes like Global Climate Modeling, see [2] for a good overview. The integration of task and data parallelism is an active area of research because of its potential benefits and several approaches have been proposed recently. These include language approaches like Fortran ....

....processors into subgroups and to assign computations to different subgroups (task regions) [26]. Computations of a specific subgroup are executed in a data parallel way. The Fx compiler provides a mapping tool for the grouping of subroutine calls to modules and the mapping of processors to modules [25]. Although the Fx approach is similar in spirit to the TwoL approach, there are some important differences: The task regions in Fx cannot be nested lexically whereas TwoL allows a hierarchical structure of the modules. On the other hand, Fx allows a dynamic nested partitioning of processors by ....

[Article contains additional citation context not shown here]

J. Subhlok. Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multiprocessors. Technical Report CMU-CS-93-212, Carnegie Mellon University, 1993.


Modeling the Communication Behavior of the Intel Paragon - Riccardo Foschia, Thomas .. (1997)

....compilers, there is considerable research effort to build modeling tools because such tools are imperative to derive efficient implementations. The significant work includes the work related to the Fortran D compiler [3], the Paradigm compiler [4], the Suif compiler [2], the Fx compiler [17], and the Vienna Fortran Compiler [8]. Other approaches include the use of Petri nets [9], queuing networks, and Markov chains [18]. 3 Overview of the Intel Paragon. The Intel Paragon supercomputer is a DMM with an architecture that can accommodate up to 4096 heterogeneous nodes connected in a ....

J. Subhlok. Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multiprocessors. Technical Report CMU-CS-93-212, Carnegie Mellon University, 1993.


Communication and Memory Requirements as the Basis .. - Subhlok.. (1994)   (21 citations)  Self-citation (Subhlok)

....by the programmer with directives. In this paper, we develop a framework to guide the programmer in finding a good mapping for a program, and illustrate the procedure with a few applications. A more rigorous description of the framework, along with an automatic mapping tool are discussed in [13]. 3 Application domain. The mapping framework discussed in this paper is applicable to programs that process a stream of inputs, and whose behavior is fairly static and predictable. In particular, it is not applicable to programs with dynamic behavior in terms of execution time and size of data ....

....the input data sets. These characteristics can be determined experimentally or analytically. In related research, we have shown that 5 different program executions are enough to determine the main characteristics of a program with sufficient accuracy for the purpose of identifying a good mapping [13]. The mapping procedure assumes knowledge of program and machine characteristics. In many cases, a qualitative idea of the program is sufficient for making a mapping decision, and detailed measurements are not necessary. Computer vision, image processing and signal processing are examples of ....

SUBHLOK, J. Automatic mapping of task and data parallel programs for efficient execution on multicomputers. Tech. Rep. CMU-CS-93-212, School of Computer Science, Carnegie Mellon University, November 1993.


Communication and Memory Requirements as the Basis for.. - Jaspal Subhlok (1994)   (21 citations)  Self-citation (Subhlok)

....time. They can be determined experimentally or analytically. In related research, we have shown that a small number of program executions are sufficient to determine them with sufficient accuracy for the purpose of identifying a good mapping for the programs in the domain of our mapping framework [23]. The problem that we are solving in this paper is how to best use the available processors to maximize the throughput of the system. This problem is fundamentally different from the many partitioning problems addressed in the literature (e.g. [3, 17, 21]) for several reasons: 1) the ....

....a mix of task and data parallelism, and that good mapping decisions can be based on well understood program properties. In related work, we have used this analysis to develop an automatic tool that obtains profile data from program executions, and uses it to automatically generate a good mapping [23]. Our experience is that real applications in signal and image processing often have constraints that cannot be captured easily by a fully automatic system, hence some user involvement in the mapping process is important (see [14] for a discussion of current directives). There are several reasons ....

SUBHLOK, J. Automatic mapping of task and data parallel programs for efficient execution on multicomputers. Tech. Rep. CMU-CS-93-212, School of Computer Science, Carnegie Mellon University, November 1993.


Task Parallel Programming in Fx - Subhlok, O'Hallaron, Gross (1994)   (11 citations)  Self-citation (Subhlok)

....(this collection is then called a module), but only one task subroutine is active at any given time on a single node. The programmer can control the mapping using directives. We are also developing tools to automatically choose an efficient mapping, and to assist the programmer in choosing one [Sub93]. In this paper, we assume that the mapping process is driven by explicit directives, which may be provided by the programmer, or automatically generated by a tool. The compiler currently supports mappings that are fixed at compile time, since that is simpler and requires minimal support from ....

....the selection of a mapping is not a part of the core compiler, which assumes that the mapping information is provided by the user or another tool. The program characteristics that influence the choice of a mapping are discussed in [SOG 94] and an automatic mapping scheme is discussed in [Sub93]. 4.2 Array section analysis. The compiler has to precisely determine the data elements that must be communicated between task subroutines. The input and output directives determine the variables that a task subroutine call defines and uses. Since conditional statements are not permitted inside ....

J. Subhlok. Automatic mapping of task and data parallel programs for efficient execution on multicomputers. Technical Report CMU-CS-93-212, School of Computer Science, Carnegie Mellon University, November 1993.


The CMU Task Parallel Program Suite - Peter Dinda, Thomas Gross, David.. (1994)   (68 citations)  Self-citation (Subhlok)

....1. 1D fast Fourier transform. 2. 2D fast Fourier transform. 3. Narrowband tracking radar. 4. Multibaseline stereo. 5. Airshed simulation. We identified these applications in the course of developing an integrated model of task and data parallelism for the Fx parallelizing Fortran compiler [9, 19, 17, 18, 23] and have found them to be extremely helpful. Complete Fortran 77 sources of the programs are available from the authors. Each program is fewer than 500 lines of code. The Fortran 77 sources are useful for a number of reasons. First, the sources provide an unambiguous specification of each ....

Subhlok, J. Automatic mapping of task and data parallel programs for efficient execution on multicomputers. Tech. Rep. CMU-CS-93-212, School of Computer Science, Carnegie Mellon University, November 1993.


Task Parallelism in a High Performance Fortran Framework - Gross, Hallaron, Subhlok (1994)   (67 citations)  Self-citation (Subhlok)

....of performance. The directives that control mapping may be provided by the user, or generated by an automatic mapping tool. The situation is analogous to the data layout directives in HPF. The mapping process is discussed in more detail in [24] and an automatic mapping tool is discussed in [23]. Here we briefly discuss the basic mapping criterion, which is the same whether the mapping is done by hand or by an automatic tool. In our experience, the following three dimensions have the biggest impact on the quality of a mapping: Scalability: When a computation or a subroutine is not ....

....only slightly from 10 to 40 nodes. Replication is not applied to the colffts module, since it scales nearly linearly. Figure 5 shows the steps and the resulting mapping for a 64 node parallel system. The quantitative measurements used by our automatic tool to arrive at this mapping are discussed in [23]. [Figure 5: Mapping steps and the final mapping of the example program; stages: Clustering, Replication, Final mapping onto 64 nodes; modules: colffts, rowffts, hist] In summary, a combined task and data parallel mapping is often needed to achieve the ....

[Article contains additional citation context not shown here]

SUBHLOK, J. Automatic mapping of task and data parallel programs for efficient execution on multicomputers. Tech. Rep. CMU-CS-93-212, School of Computer Science, Carnegie Mellon University, November 1993.


Final Report on Research in Parallel Computing.. - December Carnegie (1996)

No context found.

Subhlok, J. Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multicomputers. Technical Report CMU-CS-93-212, School of Computer Science, Carnegie Mellon University, November, 1993.

CiteSeer.IST - Copyright Penn State and NEC