J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, and J. Webb. Communication and memory requirements as the basis for mapping task and data parallel programs. In Proceedings of Supercomputing '94, pages 330--339, Washington, DC, November 1994.
....been one of the primary motivations for the work presented in this paper. [Figure 2: Need for Redistribution in MPMD Programs (processors executing subprogram 1 pass array A[i] to processors executing subprogram 2)] More details on MPMD programs can be found in [8, 9, 16, 17, 18, 19]. 1.2 The Data Redistribution Problem Having motivated the need for redistribution, we can now formally define a redistribution R to be the set of routines that, given an $n$-dimensional array A on a set of source processors $P_s$ with source distribution $D_s$, transfer all the elements of the ....
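To make the redistribution definition concrete, here is a minimal Python sketch of the simplest case: a one-dimensional array of n elements that moves from a BLOCK distribution on a source processor set to a CYCLIC distribution on a separate target set. The function names (block_owner, cyclic_owner, redistribution_sets) are hypothetical and not taken from the cited paper; processors are identified by their rank within each set, and only the communication sets are computed, not the transfers themselves.

```python
from collections import defaultdict


def block_owner(i: int, n: int, p: int) -> int:
    """Rank owning global index i when n elements are BLOCK-distributed over p processors."""
    block = -(-n // p)  # ceil(n / p) elements per block
    return i // block


def cyclic_owner(i: int, p: int) -> int:
    """Rank owning global index i when elements are CYCLIC-distributed over p processors."""
    return i % p


def redistribution_sets(n: int, p_src: int, p_dst: int) -> dict[tuple[int, int], list[int]]:
    """Map each (source rank, target rank) pair to the global indices it must transfer.

    Assumes a hypothetical BLOCK -> CYCLIC redistribution between disjoint processor sets.
    """
    sends: dict[tuple[int, int], list[int]] = defaultdict(list)
    for i in range(n):
        sends[(block_owner(i, n, p_src), cyclic_owner(i, p_dst))].append(i)
    return dict(sends)


print(redistribution_sets(8, 2, 3))
# {(0, 0): [0, 3], (0, 1): [1], (0, 2): [2], (1, 1): [4, 7], (1, 2): [5], (1, 0): [6]}
```

Enumerating every index is O(n) and is meant only to show what a redistribution must compute; practical runtimes derive these sets in closed form from the distribution parameters.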
J. Subhlok, D. O'Hallaron, T. Gross, P. A. Dinda, and J. Webb, "Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs," Tech. Rep. CMU-CS-94-106, Carnegie Mellon University, 1994.
....then the execution time is is the number of schedule graph nodes in the pipeline loop, and is the maximum number of processors that can be assigned. For the final step, we are investigating how known clustering techniques, e.g. reducing the makespan [2] or increasing throughput [11], apply to this environment. Determining the Pipeline Parameters The following is a simple analytical model that is used to illustrate how optimal parameters of a pipeline can be determined. The base case of the model assumes that we have a pipeline loop which consists of a sending node ....
J. Subhlok, D. R. O'Hallaron, T. Gross, P. A. Dinda, and J. Webb. Communication and memory requirements as the basis for mapping task and data parallel programs. In Proceedings of SuperComputing 1994.
.... Other papers describe the overall GrADS project approach [10] and results obtained in a related project focused on Grid execution of ScaLAPACK [26]. Many other projects have addressed specific issues discussed in this project, including resource selection (e.g., [9, 11, 30]), mapping (e.g., [31]), and process migration (e.g., [13, 25, 32]). We do not claim innovation in these areas, but instead emphasize the merits of our overall architecture and, in particular, the techniques used to integrate adaptive mechanisms into the Cactus architecture. 7 Summary and Further Work We have ....
Subhlok, J., O'Hallaron, D., Gross, T., Dinda, P. and Webb, J. Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs. in Proceedings of Supercomputing '94, Washington, DC, 1994, 330-339.
....to implement a method efficiently on a specific parallel machine, and if so, which choice is the best parallel version, before an actual implementation is realized. The prediction of exact runtimes is also important for parallelizing compilers like ASPAR [22], Paradigm [16], Crystal [27], or Fx [34]. Such compilers are usually based on a data parallel SPMD model in which the distribution of the program data among the processors of the machine plays a major role in performance, because an inappropriate data distribution may incur a large communication overhead. In this situation, it is ....
....compilers, there is considerable research effort to build modeling tools because such tools are essential for deriving efficient implementations. Significant work includes that related to the Fortran D compiler [4], the Paradigm compiler [5], the SUIF compiler [2], the Fx compiler [34, 33], and the Vienna Fortran Compiler [11, 12]. Other approaches include the use of Petri nets [13], queuing networks, and Markov chains [35]. The Fortran D compiler contains an interactive tool that allows the programmer to select regions of the sequential input program. The tool responds with a ....
J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, and J. Webb. Communication and memory Requirements as the basis for Mapping Task and Data Parallel Programs. In Proceedings of Supercomputing '94, 1994.
.... combination of task and data parallelism for a program at compile time for a given target architecture, and that it may be possible to do this automatically [121]. Fx is targeted at applications that process a stream of input and whose computational structure is fairly static and predictable [122]. Hence, the range of applications is limited. In Fx, data parallelism is expressed through array syntax and parallel loops. Task parallelism is expressed through special code regions called parallel sections, each of which consists of a loop that contains only calls to task subroutines. Input and ....
....dependencies between task subroutines. Task subroutines may also have data parallelism inside them. The central idea of Fx is to examine the scalability, memory requirements, and inter task communication costs to identify the use of task and data parallelism which provides the best performance [122]. It is assumed that task subroutines operate in a pipelined fashion (i.e. data may flow from one task subroutine to the next) and are already partitioned by the data parallel compiler to run on an arbitrary number of processors (i.e. Fx may decide how many processors should be assigned to a ....
J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, and J. Webb, Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs, in Proc. of Supercomputing '94, Washington, DC, November 1994, pp. 330--339. WWW URL is http://www.cs.cmu.edu.
....compilers, there is considerable research effort to build modeling tools because such tools are essential for deriving efficient implementations. Significant work includes that related to the Fortran D compiler [1], the Paradigm compiler [2], the SUIF compiler [3], and the Fx compiler [4]. These approaches differ from our model not only in how they exploit the mixed parallelism but also in the cost model. Modeling the runtimes of communication operations with parameterized formulas is considered in [5, 6, 7]. Optimal algorithms for several communication operations and their ....
....lead to an a priori estimation of the prospective gain of a parallel implementation. Thus, the programmer can decide on the benefits of a parallel implementation before actually performing it. Compilers for DMMs are an important application area for analytical performance prediction mechanisms [21, 4, 22, 23]. These compilers assist the programmer in generating an efficient parallel implementation. To make suggestions for the design of a parallel implementation, e.g. a selection of a data distribution, the compiler has to have access to a powerful performance prediction tool to estimate the effects ....
J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, and J. Webb. Communication and memory Requirements as the basis for Mapping Task and Data Parallel Programs. In Proceedings of Supercomputing '94, 1994.
....dependence supported. Some issues in the parallelization of recursive code (some of our benchmarks are recursive) have been addressed in [83, 84]. Additional information on the simultaneous exploitation of task (functional) and data (loop) parallelism on message passing systems can be found in [85, 86, 87]. A performance evaluation of various alternatives for the implementation of thread management on shared memory multiprocessors is presented in [88]. The study covers implementation issues such as queue organization and lock management. The performance analysis includes both analytical models ....
J. Subhlok, D. R. O'Hallaron, T. Gross, P. A. Dinda, and J. Webb, "Communication and memory requirements as the basis for mapping task and data parallel programs," Tech. Rep. CMU-CS-94-106 R, School of Computer Science, Carnegie Mellon University, August 1994.
....methods of [23, 27, 28] then use clustering on the nodes to form larger nodes during the construction of a schedule. The second approach to allocation and scheduling is a top-down approach like the ones used by Prasanna and Agarwal in [24], by Belkhale and Banerjee in [12, 13], by Subhlok et al. in [17, 29, 30], by Ramaswamy and Banerjee in [14, 15], and in this paper. Top-down approaches start with the assumption of heavyweight nodes (again, in terms of computation requirements) in the MDG and break them down during the process of constructing an optimal schedule. Top-down methods are considered better ....
....processing cost model used. We do not make any assumptions about our MDGs and use very realistic cost models. The work in [12, 13] also does not consider the effects of nonzero data transfer costs. Their allocation and scheduling algorithms are similar to the ones we use. The research presented in [17, 29, 30] considers allocation and scheduling for a class of problems that process continuous streams of data sets. The computation for each data set has a tree-structured MDG for all their benchmark programs [31]. A set of heuristics is used to decide on a good allocation and scheduling scheme. There is ....
[Article contains additional citation context not shown here]
J. Subhlok, D. O'Hallaron, T. Gross, P. A. Dinda, and J. Webb, "Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs," Tech. Rep. CMU-CS-94-106, Carnegie Mellon University, 1994.
....knowledge can be costly in a large distributed system. The impact of varying levels of global knowledge on the quality of load balancing decisions by individual processors was studied in [CasKuh87]. A dynamic load balancing approach that combines task and data load balancing is presented in [SuHa94]. This approach balances load at compile time using estimates of execution time augmented with run-time profiles of previous runs. The RVI load balancing makes use of a combination of data, computation, and task approaches to dynamic load balancing with run-time decision making using only local ....
Subhlok, J., O'Hallaron, D., et al., "Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs", Proceedings Supercomputing 1994, Pages 330-339, IEEE Computer Society Press.
....operation on another rope over another or the same domain subset. Thus both task and data parallelism are supported. 5 Example: Narrow band Tracking Radar We study the application of our model of parallelism to the narrow band tracking radar problem in the task parallel suite available from CMU [5, 21]. The suite program was in Fortran 77 and we translated it to C and to our programming model. The narrow band tracking radar benchmark is used to measure the effectiveness of various multicomputers for radar applications. The problem is interesting for studying combinations of task and data ....
Jaspal Subhlok, David O'Hallaron, Thomas Gross, Peter Dinda, and Jon Webb. Communication and Memory Requirements as the Basis for Mapping Task and Data-Parallel Programs. In Proceedings of Supercomputing '94, Washington D.C., November 1994.
....is implemented as a C library, everything is handled at runtime. Clearly this is often inefficient. Moreover, it is limited to sparse matrix structures and to the operations that can be applied to these matrices. Other languages allow explicit declaration of task parallelism [26, 29, 8]. In Fx [29, 45], HPF is extended with language constructs to define tasks explicitly. The communication between the tasks is handled by the compiler, but the user must still specify on which processor each task must be executed. Similarly, in Fortran M [26, 27], HPF is extended with language constructs that are ....
J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, J. Webb, "Communication and memory requirements as the basis for mapping task and data parallel programs," In Proc. Supercomputing '94, Washington, DC, Nov. 1994, pp. 330-339.
....and repartitionable machine (PASM). Library-based parallel systems were discussed in [8, 15]. Our work differs from the above in exploiting both task and data parallelism on distributed memory architectures. Scheduling task and data parallelism has recently been studied for program compilation [3, 13, 18, 19, 24]. The optimization function in [18, 19] is for maximizing the throughput with fixed task parameters. The work in [3, 13] deals with DAGs of fixed data distribution and processor parameters, but not with graphs with loops. In [5], techniques for optimizing the data distribution are presented for ....
....parallel systems were discussed in [8, 15]. Our work differs from the above in exploiting both task and data parallelism on distributed memory architectures. Scheduling task and data parallelism has recently been studied for program compilation [3, 13, 18, 19, 24]. The optimization function in [18, 19] is for maximizing the throughput with fixed task parameters. The work in [3, 13] deals with DAGs of fixed data distribution and processor parameters, but not with graphs with loops. In [5], techniques for optimizing the data distribution are presented for nested loops, which can be viewed as the ....
J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, and J. Webb, Communication and memory requirements as the basis for mapping task and data parallel programs. In Proc. of Supercomputing '94, pp. 330-339.
No context found.
SUBHLOK, J., O'HALLARON, D., GROSS, T., DINDA, P., AND WEBB, J. Communication and memory requirements as the basis for mapping task and data parallel programs. In Proc. Supercomputing '94 (Washington, DC, November 1994), pp. 330--339.
....and task (or function) parallel computing. Compiler and runtime support for task and data parallel computing is an active area of research, and several solutions have been proposed [4, 5, 9, 10, 19, 21]. Recent research has also examined the benefits of mixed task and data parallel programming [3, 7, 18]. This paper specifically addresses the mapping of applications composed of a linear chain of data parallel tasks that act on a stream of input data sets. In this model, each task repeatedly receives input from its predecessor task, performs its computation, and sends the output to its successor ....
SUBHLOK, J., O'HALLARON, D., GROSS, T., DINDA, P., AND WEBB, J. Communication and memory requirements as the basis for mapping task and data parallel programs. In Proceedings of Supercomputing '94 (Washington, DC, November 1994), pp. 330--339.
....and task (or function) parallel computing. Compiler and runtime support for task and data parallel computing is an active area of research, and several solutions have been proposed [2, 3, 7, 8, 13]. Recent research has also examined the tradeoffs involved in mapping task and data parallel programs [5, 12]. In this paper we address the problem of optimizing the throughput of task and data parallel programs. We address applications composed of a linear chain of data parallel tasks that act on a stream of input data sets. Each task repeatedly receives input from its predecessor, performs its ....
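The throughput objective for a linear chain of data parallel tasks can be illustrated with a brute-force sketch. It rests on simplifying assumptions of our own, not the cited papers' cost model: each task runs on its own disjoint processor subset, communication cost is folded into each task's execution-time function, each task needs a minimum number of processors to fit in memory, and steady-state throughput is the reciprocal of the slowest stage's time. The name best_throughput_mapping and its parameters are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence


def best_throughput_mapping(
    exec_time: Sequence[Callable[[int], float]],
    min_procs: Sequence[int],
    total_procs: int,
) -> tuple[float, tuple[int, ...]]:
    """Exhaustively search processor allocations for a linear chain of tasks.

    exec_time[i](p): hypothetical time task i needs per data set on p processors.
    min_procs[i]:    smallest processor count on which task i fits in memory.
    Returns (throughput, allocation); allocation is () if nothing is feasible.
    """
    best_throughput, best_alloc = 0.0, ()
    ranges = [range(m, total_procs + 1) for m in min_procs]
    for alloc in product(*ranges):
        if sum(alloc) > total_procs:  # tasks run concurrently on disjoint subsets
            continue
        # In steady state the slowest stage bounds how fast data sets leave the chain.
        bottleneck = max(t(p) for t, p in zip(exec_time, alloc))
        throughput = 1.0 / bottleneck
        if throughput > best_throughput:
            best_throughput, best_alloc = throughput, alloc
    return best_throughput, best_alloc


tasks = [lambda p: 8 / p, lambda p: 4 / p]
print(best_throughput_mapping(tasks, [1, 1], 6))  # (0.5, (4, 2))
print(best_throughput_mapping(tasks, [1, 3], 6))  # memory bound on task 1 forces (3, 3)
```

The search is exponential in the number of tasks and only makes the objective explicit; a practical mapping tool would need a more efficient search, for example dynamic programming over the chain.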
....the Fx compiler to generate mappings for several task and data parallel programs. In Table 2 we present results from FFT-Hist and two other applications: narrowband tracking radar and multibaseline stereo [6]. The properties of the latter two programs that influence their mapping are discussed in [12]. Table 2 compares the predicted optimal throughput, the corresponding measured throughput, and the measured throughput for a simple data parallel mapping. Even when all modules are rectangular, it may not be possible to map all of them together due to machine or compiler constraints discussed earlier. ....
SUBHLOK, J., O'HALLARON, D., GROSS, T., DINDA, P., AND WEBB, J. Communication and memory requirements as the basis for mapping task and data parallel programs. In Supercomputing '94 (Washington, DC, November 1994), pp. 330--339.
....data and task (or function) parallel computing. Compiler and runtime support for task and data parallel computing is an active area of research, and several solutions have been proposed [3, 4, 8, 9, 14]. Recent research has also examined the benefits of mixed task and data parallel programming [2, 6, 13]. This paper specifically addresses the mapping of applications composed of a linear chain of data parallel tasks that act on a stream of input data sets. Each task repeatedly receives input from its predecessor task, performs its computation, and sends the output to its successor task. The first ....
....the mapping tool and the Fx compiler to generate mappings for several task and data parallel programs. In Table 1 we present results from FFT-Hist and two other applications: narrowband tracking radar and multibaseline stereo [7]. The relevant properties of the latter two programs are discussed in [13]. Table 1 shows the optimal latency obtained with and without a throughput constraint. We observe that adding a throughput constraint can lead to a mapping with a significantly higher latency. The sole exception is the radar program, for which the latency remains unchanged. The reason is that the ....
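The latency/throughput trade-off described in this context can be reproduced on toy numbers. This sketch compares a pure data parallel mapping (every task on all processors, one data set at a time) with pipelined mappings on disjoint processor subsets, and returns the lowest-latency mapping that meets an optional throughput bound. The execution-time tables, the name min_latency_mapping, and the cost model (no communication cost, no clustering of several tasks onto a shared subset) are illustrative assumptions, not the cited tool's model.

```python
from itertools import product
from typing import Mapping, Optional, Sequence

# (latency, throughput, kind, allocation)
Candidate = tuple[float, float, str, tuple[int, ...]]


def min_latency_mapping(
    exec_time: Sequence[Mapping[int, float]],
    total_procs: int,
    min_throughput: float = 0.0,
) -> Optional[tuple[float, str, tuple[int, ...]]]:
    """Lowest-latency mapping whose throughput is at least min_throughput.

    exec_time[i][p] is the (hypothetical) time of task i on p processors and
    must be defined for every p in 1..total_procs.
    """
    n = len(exec_time)
    candidates: list[Candidate] = []

    # Pure data parallelism: each task uses all processors in turn, so one data
    # set finishes before the next starts and throughput = 1 / latency.
    dp_latency = sum(t[total_procs] for t in exec_time)
    candidates.append((dp_latency, 1.0 / dp_latency, "data_parallel", (total_procs,) * n))

    # Pipelined task parallelism: tasks run concurrently on disjoint subsets;
    # latency adds up the stage times, throughput is set by the slowest stage.
    for alloc in product(range(1, total_procs + 1), repeat=n):
        if sum(alloc) > total_procs:
            continue
        times = [t[p] for t, p in zip(exec_time, alloc)]
        candidates.append((sum(times), 1.0 / max(times), "pipeline", alloc))

    feasible = [c for c in candidates if c[1] >= min_throughput]
    if not feasible:
        return None
    latency, _, kind, alloc = min(feasible, key=lambda c: c[0])
    return latency, kind, alloc


t0 = {1: 8.0, 2: 4.5, 3: 3.5, 4: 3.0}
t1 = {1: 4.0, 2: 2.5, 3: 2.0, 4: 2.0}
print(min_latency_mapping([t0, t1], 4))        # (5.0, 'data_parallel', (4, 4))
print(min_latency_mapping([t0, t1], 4, 0.25))  # (7.5, 'pipeline', (3, 1))
print(min_latency_mapping([t0, t1], 4, 0.3))   # None
```

On these made-up tables, requiring a throughput of at least 0.25 rules out the data parallel mapping and raises the best achievable latency from 5.0 to 7.5.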
Subhlok, J., O'Hallaron, D., Gross, T., Dinda, P., and Webb, J. Communication and memory requirements as the basis for mapping task and data parallel programs. In Supercomputing '94 (Washington, DC, November 1994), pp. 330--339.
.... distributions in an arbitrary number of array dimensions [13, 14], an index permutation intrinsic, and a parallel DO loop that is integrated with arbitrary user-defined associative reduction operators [19]. Fx also provides a mechanism for mixing task and data parallelism in the same program [5, 17, 16]. The initial target was the Intel iWarp. Fx was later ported to the IBM SP-2, the Intel Paragon, and workstation clusters. Much of the early work on Fx was driven by the 2D fast Fourier transform (FFT) and algorithms for .... Footnote 1: Although the first meeting of the HPF Forum was not until January 1992, ....
....improve the performance of applications with functions that do not scale well. For example, using a mix of task and data parallelism doubled the throughput (compared to the most efficient data parallel code) of the 240 × 256 Fx STEREO program so that it was able to run in real time [16]. Since HPF does not currently support task parallelism, there is a risk that HPF sensor-based codes with smaller data sets will not run efficiently. This puts additional pressure on HPF developers to maximize the loop, reduction, and permutation efficiencies identified earlier. 8. Relation to ....
SUBHLOK, J., O'HALLARON, D., GROSS, T., DINDA, P., AND WEBB, J. Communication and memory requirements as the basis for mapping task and data parallel programs. In Proc. Supercomputing '94 (Washington, DC, Nov. 1994), pp. 330--339.
No context found.
J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, and J. Webb. Communication and memory requirements as the basis for mapping task and data parallel programs. In Proceedings of Supercomputing '94, pages 330--339, Washington, DC, November 1994.
No context found.
Subhlok, J., O'Hallaron, D., Gross, T., Dinda, P., Webb, J.: Communication and memory requirements as the basis for mapping task and data parallel programs. In: Proceedings of Supercomputing '94, Washington, DC (1994) 330--339
No context found.
Subhlok, J., O'Hallaron, D., Gross, T., Dinda, P., Webb, J.: Communication and memory requirements as the basis for mapping task and data parallel programs. In: Proceedings of Supercomputing '94, Washington, DC (1994) 330--339
No context found.
Jaspal Subhlok, David O'Hallaron, Thomas Gross, Peter A. Dinda, and Jon Webb. Communication and memory requirements as the basis for mapping task and data parallel programs. IEEE Proceedings of Supercomputing, 1994.
No context found.
J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, and J. Webb, "Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs," in Proceedings of Supercomputing '94, Washington D.C., Nov. 1994, pp. 330--339.
No context found.
Subhlok, J., O'Hallaron, D., Gross, T., Dinda, P. and Webb, J. "Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs" Supercomputing '94, November 1994.
No context found.
J. Subhlok, D. O'Hallaron, T. Gross, P. Dinda, J. Webb "Communication and memory requirements as the basis for mapping task and data parallel programs." Proc. Supercomputing '94, Washington, DC, Nov. 1994, pp. 330-339.
No context found.
Subhlok, J., D. O'Hallaron, T. Gross, P. Dinda, and J. Webb. Communication and Memory Requirements as the Basis for Mapping Task and Data Parallel Programs. In Supercomputing '94, pages 330-339. Washington, DC, November 1994.
CiteSeer.IST - Copyright Penn State and NEC