S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA), July 1995. http://HTTP.CS.Berkeley.EDU/~yelick/soumen/mixed-spaa95.ps.
....[Figure 8. Runtimes of the different execution schemes of the extrapolation method with 4 different stepsizes for varying numbers of processors (top and bottom panels) of a Cray T3E; legend entries: p=32 linear, p=32 extended.] ....2, 19] for an overview of systems and approaches and see [4] for a detailed investigation of the benefits of combining task and data parallel executions. Most closely related to our work concerning the parallel programming model are approaches which combine multiprocessor task and data parallelism. Several models support the programmer in writing ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA), pages 74--83, 1995.
....the tree is well balanced among the processors, then the implementation should still achieve 85% efficiency even in the worst case of bad distribution of deflations. However, it is worth considering this problem for our implementation. A possible issue is dynamic splitting versus static splitting [6]. A task list is used to keep track of the various parts of the matrix during the decomposition process and make use of data and task parallelism. This approach has been investigated for the parallel implementation of the spectral divide and conquer algorithm for the unsymmetric eigenvalue ....
Soumen Chakrabarti, James Demmel, and Katherine Yelick. Modeling the benefits of mixed data and task parallelism. Technical Report CS-95-289, Department of Computer Science, University of Tennessee, Knoxville, TN, USA, May 1995. LAPACK Working Note 97.
....language framework to aid in analysis and new language constructs to obviate the need for some analyses. For the optimization problem, we will use a combination of static and dynamic performance information. We have extensive experience using analytical models to optimize parallel programs [18, 93, 56, 113]. Some compiler projects also use performance models to help choose tile sizes in automatic cache optimizations, for example. These models are useful for eliminating bad algorithmic choices, and to a lesser extent in choosing the best one. The models are limited by their accuracy of reflecting ....
....level that is sensible to the programmer, either by creating a high level model or by refining an analytical one supplied in advance. The information in performance models will be used by the runtime system and operating system to make decisions about degree of parallelism that should be used [18], whether to load balance computation [19] or whether to move entire jobs to other clusters. In Titanium we will adapt our language and compiler analyses in two ways to address memory hierarchy optimizations on array based programs: 1) through the type system, the compiler will prove there are ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA), Santa Barbara, California, July 1995.
....the tasks. Efficient dynamic scheduling algorithms for different networks are studied by Feldmann, Sgall, and Teng in [10]. A good overview of theoretical work in this area can be found in [35]. Practical approaches for the scheduling of M tasks are presented by Chakrabarti, Demmel, and Yelick in [3], where an upper bound on the benefits of mixed (data and task) parallelism for divide and conquer task trees and for irregular task graphs is given. 7 Conclusions Many algorithms from the area of scientific computing exhibit two levels of parallelism, expressed as an upper level of M tasks each ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA), pages 74--83, 1995.
....faster set of processes (those that experience deflation) will have to wait for the other set of processes before beginning the next merge. This reduces the speed up gained through the use of the tree. A possible issue concerning load balancing the work is dynamic splitting versus static splitting [6]. In dynamic splitting, a task list is used to keep track of the various parts of the matrix during the decomposition process and to make use of data and task parallelism. This approach has been investigated for the parallel implementation of the spectral divide and conquer algorithm for the ....
S. Chakrabarti, J. Demmel, and K. Yelick, Modeling the Benefits of Mixed Data and Task Parallelism, Technical Report CS-95-289, Department of Computer Science, University of Tennessee, Knoxville, TN, 1995. LAPACK Working Note 97.
....and task (or function) parallel computing. Compiler and runtime support for task and data parallel computing is an active area of research, and several solutions have been proposed [4, 5, 9, 10, 19, 21]. Recent research has also examined the benefits of mixed task and data parallel programming [3, 7, 18]. This paper specifically addresses the mapping of applications composed of a linear chain of data parallel tasks that act on a stream of input data sets. In this model, each task repeatedly receives input from its predecessor task, performs its computation, and sends the output to its successor ....
CHAKRABARTI, S., DEMMEL, J., AND YELICK, K. Modeling the benefits of mixed data and task parallelism. In Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (Santa Barbara, CA, July 1995).
.... This method was designed to work well on parallel computers, offering both task and data parallelism [46]. Efficient parallel implementations are not straightforward to program, and the decision to switch from task to data parallelism depends on the characteristics of the underlying machine [17]. Due to such complications, all the currently available parallel software libraries, such as ScaLAPACK [22] and PeIGS [52], use algorithms based on bisection and inverse iteration. A drawback of the current divide and conquer software in LAPACK is that it needs extra workspace of more than $2n^2$ ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA), Santa Barbara, California, July 1995.
....threads of control (processes or tasks) that can synchronize and communicate in arbitrary ways. Recently, interest has arisen in integrating task and data parallelism [3, 5, 7, 9, 11, 12, 18, 19]. [Footnote: This research is supported in part by a PIONIER grant from the Netherlands Organization for Scientific Research (N.W.O.).] Such integration offers several advantages. First, programmers can use a single language to write either a data parallel or a task parallel program, whichever is most suitable for the application at hand. Second, and even more important, many applications can exploit both types of parallelism in ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In ACM Symp. on Parallel Algorithms and Architectures, 1995.
....objects, data access patterns. [Figure 1: The stages of run time parallelization in RAPID. Recoverable labels: data dependence graph (DDG); dependence-complete task graph; task assignments, data object owners; iterative asynchronous schedules and execution. Panels (a), (b), and (c) show tasks labeled T[i] and T[i,j] placed on Proc0 and Proc1.] ....
[Article contains additional citation context not shown here]
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the Benefits of Mixed Data and Task Parallelism. In Proceedings of 7th ACM Symposium on Parallel Algorithms and Architectures, pages 74--83, July 1995.
....following functional form: S(n) 1 Gamma n ) 1 Gamma ) with 0 1. The motivation for this model is to facilitate analysis. Again, the parameter has no semantic content. No prior study has demonstrated that a proposed model describes the behavior of real programs. Chakrabarti et al. [2] propose a model for efficiency of data parallel tasks; they use measurements of ScaLAPACK programs to validate this model. Many of the allocation strategies that have been proposed for malleable jobs assume that the scheduler knows the average parallelism of all jobs [16, 8, 15, 17, 3, 12] ....
Soumen Chakrabarti, James Demmel, and Katherine Yelick. Modeling the benefits of mixed data and task parallelism. In Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '95), July 1995.
....it tackles many of the same problems, namely run time adaptation to changing data layouts, use of sequential code to improve efficiency, and minimizing the overhead of parallel code. Additionally, Chakrabarti and others have analyzed the theoretical benefits of mixed data and control parallelism [12]. They conclude that best results are obtained when communication is slow or when there is a large number of processors, and that a single switch between data and control parallelism can achieve most of the benefits of a more general model. Most work on parallel divide and conquer algorithms has ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Proceedings of ACM Symposium on Parallel Algorithms and Architectures, July 1995.
....data and task (or function) parallel computing. Compiler and runtime support for task and data parallel computing is an active area of research, and several solutions have been proposed [3, 4, 8, 9, 14]. Recent research has also examined the benefits of mixed task and data parallel programming [2, 6, 13]. This paper specifically addresses the mapping of applications composed of a linear chain of data parallel tasks that act on a stream of input data sets. Each task repeatedly receives input from its predecessor task, performs its computation, and sends the output to its successor task. The first ....
Chakrabarti, S., Demmel, J., and Yelick, K. Modeling the benefits of mixed data and task parallelism. In Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (Santa Barbara, CA, July 1995).
....the tree is well balanced among the processors, then the implementation should still achieve 85% efficiency even in the worst case of bad distribution of deflations. However, it is worth considering this problem for our implementation. A possible issue is dynamic splitting versus static splitting [6]. A task list is used to keep track of the various parts of the matrix during the decomposition process and make use of data and task parallelism. This approach has been investigated for the parallel implementation of the spectral divide and conquer algorithm for the unsymmetric eigenvalue ....
Soumen Chakrabarti, James Demmel, and Katherine Yelick. Modeling the benefits of mixed data and task parallelism. Technical Report CS-95-289, Department of Computer Science, University of Tennessee, Knoxville, TN, USA, May 1995. LAPACK Working Note 97.
....and repartionable machine (PASM). Library based parallel systems were discussed in [8, 15]. Our work differs from the above in exploiting both the task and data parallelism on distributed memory architectures. Scheduling task and data parallelism has recently been studied for program compilation [3, 13, 18, 19, 24]. The optimization function in [18, 19] is for maximizing the throughput with fixed task parameters. The work in [3, 13] deals with DAGs of fixed data distribution and processor parameters, but not with graphs with loops. In [5] techniques for optimizing the data distribution are presented for ....
....both the task and data parallelism on distributed memory architectures. Scheduling task and data parallelism has recently been studied for program compilation [3, 13, 18, 19, 24]. The optimization function in [18, 19] is for maximizing the throughput with fixed task parameters. The work in [3, 13] deals with DAGs of fixed data distribution and processor parameters, but not with graphs with loops. In [5] techniques for optimizing the data distribution are presented for nested loops, which can be viewed as the optimization for one macro task. In contrast, our current scheduling scheme deals ....
S. Chakrabarti, J. Demmel, and K. Yelick, Modeling the Benefits of Mixed Data and Task Parallelism, To appear in Proc. of ACM SPAA, Santa Barbara, 1995.
....called Data Parallelism. Of course, it is possible to use a combination of these strategies for optimal scheduling, and such a strategy is referred to as Mixed Parallelism. Several researchers have worked on exploiting mixed parallelism, both in theory [BB90, FST92, LT94, TWY92] and in practice [CDY95, Cha91, RSB94, SSOG93]. In a number of problems, all the tasks may not be known in advance but may be generated dynamically as existing tasks are processed. This is the case with problems whose efficient solutions use the divide and conquer strategy. The execution of an instance of such a problem ....
....in the number of processors. If the time required to divide the subtasks is significantly higher than the cost of redistribution, communication time due to allocation of the subtasks can be ignored. Such an assumption is often made in the literature [CDY95]. Unfortunately, it is not valid for several important problems, which include quicksort, quickhull, construction of quad octrees and multidimensional binary search trees. In this chapter, we propose a new strategy called Concatenated Parallelism for efficient solution of problems resulting in ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. Technical report, University of California, Berkeley, 1995.
....programming tools such as RAPID. It is still challenging to develop a fully automatic system. In the future, it is interesting to study automatic generation of coarse-grained DAGs from sequential code [Cosnard and Loi 1995], extend our results for more complicated dependence structures [Chakrabarti et al. 1995; Girkar and Polychronopoulos 1992; Ramaswamy et al. 1994], and investigate use of the proposed techniques in performance engineered parallel systems [DARPA 1998]. While massively parallel distributed memory machines will still be valuable for high end large scale application problems in the ....
Chakrabarti, S., Demmel, J., and Yelick, K. 1995. Modeling the Benefits of Mixed Data and Task Parallelism. In Proceedings of 7th ACM Symposium on Parallel Algorithms and Architectures. 74--83.
....the other is called Data Parallelism. Of course, it is possible to use a combination of these strategies for optimal scheduling, and such a strategy is referred to as Mixed Parallelism. Several researchers have worked on exploiting mixed parallelism, both in theory [3, 6, 11, 17] and in practice [4, 5, 13, 16]. In a number of problems, all the tasks may not be known in advance but may be generated dynamically as existing tasks are processed. This is the case with problems whose efficient solutions use the divide and conquer strategy. The execution of an instance of such a problem can be represented by ....
....parallel algorithm often decreases with increase in the number of processors. If the time required to divide the subtasks is significantly higher than the cost of redistribution, communication time due to allocation of the subtasks can be ignored. Such an assumption is often made in the literature [4]. Unfortunately, it is not valid for several important problems, which include quicksort, quickhull, construction of quad octrees and multidimensional binary search trees. In this paper, we propose a new strategy called Concatenated Parallelism for efficient solution of problems resulting in ....
S. Chakrabarti, J. Demmel and K. Yelick, Modeling the benefits of mixed data and task parallelism, Computer Science Division, University of California, Berkeley.
No context found.
S. Chakrabarti, J. Demmel, and K. Yelick, "Modeling the benefits of mixed data and task parallelism", Symposium on Parallel Algorithms and Architectures, 1995.
No context found.
S. Chakrabarti, J. Demmel, and K. Yelick, "Modeling the benefits of mixed data and task parallelism", Symposium on Parallel Algorithms and Architectures, 1995.
....to be scheduled dynamically, and therefore cannot use the expensive optimization algorithms, such as linear programming, that are used in compile time scheduling. In Chapter 3 we will describe a simple and effective heuristic for scheduling divide and conquer applications with mixed parallelism [26]. The algorithm classifies the tasks into two types. The large problems near the root are allocated all the processors in turn, while small problems close to the leaves are packed in a task parallel fashion, each task being assigned exactly one processor. There is some internal frontier at which ....
....There is also task parallelism between unordered nodes. Recently algorithms have been designed for trading between locality and load balance in this scenario [33]. We will come back to similar problems in Chapter 4. The switching technique has been independently discovered after our paper [26] was published in a different context: that of scheduling tasks with penalties. Every task has a running time, and a penalty for rejection; the goal is to minimize the sum of the makespan of accepted tasks and the penalty of rejected tasks [15]. While their main result is an on line algorithm for ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA). ACM, 1995.
....HQR algorithm in the last paragraph (matrix inversion, matrix multiply and QR factorization) but also requires more floating point operations; it remains to be seen for which problems and on which machines which algorithm is faster. The sign function also entails a dynamic load balancing scheme [7] to implement its divide and conquer approach most efficiently. 3.7 Singular Value Decomposition Let A be a general real m by n matrix. The singular value decomposition (SVD) of A is the factorization $A = U S V^T$, where U and V are orthogonal, and $S = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$, $r = \min(m, n)$ ....
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA), July 1995.
No context found.
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA), July 1995. http://HTTP.CS.Berkeley.EDU/~yelick/soumen/mixed-spaa95.ps.
No context found.
S. Chakrabarti, J. Demmel, and K. Yelick. Modeling the benefits of mixed data and task parallelism. In Symposium on Parallel Algorithms and Architectures (SPAA), pages 74--83, 1995.
No context found.
Soumen Chakrabarti, James Demmel, and Katherine Yelick. Modeling the benefits of mixed data and task parallelism. In Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures, July 1995.
No context found.
Chakrabarti, S., Demmel, J., Yelick, K., Modeling the Benefits of Mixed Data and Task Parallelism, Seventh Annual ACM Symposium on Parallel Algorithms and Architectures, July 17-19, 1995, UC Santa Barbara, CA.