Minos Garofalakis and Yannis Ioannidis. Multi-dimensional resource scheduling for parallel queries. In SIGMOD Int. Conf. on Management of Data. ACM, 1996.
....in [1] where the execution time of a pipeline participant is observed as a function of the number of processors assigned to it. However, for large join queries, it is rather expected that multiple processes compete for the same processor. Pipelined and bushy parallelism [3] is studied in [1, 4, 2, 6], where the communication across logical links between interacting processes is also considered. However, the network configuration affects the selection of the physical channels to connect two communicating processors. A router may even dynamically choose different routes between the same ....
Minos Garofalakis and Yannis Ioannidis. Multi-dimensional resource scheduling for parallel queries. In SIGMOD Int. Conf. on Management of Data. ACM, 1996.
....is essentially the best possible unless NP=ZPP. In addition to their theoretical importance, these problems have several applications such as load balancing, cutting stock, and resource allocation, to name a few. One of our motivations for studying these problems comes from recent interest [12, 13, 14] in multi dimensional resource scheduling problems in parallel query optimization. A favored architecture for parallel databases is the so called shared nothing environment [5] where the parallel system consists of a set of independent processing units each of which has a set of time sharable ....
....by a single aggregate work measure. This simplification is done typically to reduce the complexity of the scheduling problem. However, for large task systems that are typically encountered in database applications, ignoring the multi dimensionality could lead to bad performance. The work in [11, 12, 13, 14] demonstrates the practical effectiveness of the multi dimensional approach. One of the basic resource scheduling problems considered in the above papers is the problem of scheduling d dimensional vectors (tasks) on d dimensional bins (machines) to minimize the maximum load on any dimension ....
Minos N. Garofalakis and Yannis E. Ioannidis. Multi-dimensional resource scheduling for parallel queries. In Proceedings of the
....2. Related Work Parallel query processing has been studied in a large variety of facets, see e.g. [PMC 90, DG92, HS93, WFA95, Gra95]. Most of the related work in this field concentrated on possibilities to speed up highly complex queries with long running times. Approaches as taken in [HM94] and [GI96] suggest a decomposition of the query plans into sub plans which are then executed in parallel on different nodes of the parallel processing environment. The granularity of this decomposition varies and can be as fine as parallelizing a single operator as studied for example in [SD89, SD90, WFA95] ....
....plans into sub plans which are then executed in parallel on different nodes of the parallel processing environment. The granularity of this decomposition varies and can be as fine as parallelizing a single operator as studied for example in [SD89, SD90, WFA95] but is often chosen coarser [HM95, CHM95, GI96, GI97]. These approaches have in common that they require communication between single nodes for shipping or exchanging partial results. This causes network contention and synchronization effects where nodes have to wait for others to complete their tasks first. As a result, a ....
M. N. Garofalakis and Y. E. Ioannidis. Multi-dimensional Resource Scheduling for Parallel Queries. In Proc. of the ACM SIGMOD Int'l. Conf. on Management of Data, pages 365-376, Montreal, Canada, June 1996.
....achieve the best possible parallelization of a given sequential plan [8, 9, 13, 3, 10]. A common approach is to incorporate many features of the target architecture in the cost model, e.g. communication costs or hardware description. Based on this information a static parallel schedule is derived [8, 4]. However, from a validation point of view increasing the number of features considered during optimization is dangerous. The prime reason is that small errors in the estimates propagate through a QEP. Such estimation errors turn out to be exponential [12] and lead to suboptimal parallel schedules. ....
....skew handling as well as intra operator parallelism. Chekuri et al. develop a more general treatment and allow for arbitrary query plans using the same cost model [2]. Again, skew is not considered. Garofalakis and Ioannidis discuss a richer cost model and focus on shared nothing architectures [4, 5]. Their scheduling heuristics are also based on the assumption that no skew affects the execution. Lo et al. study constraint processor allocation for pipelined hash joins [13, 3] and extend this approach in [10] to scheduling of separate pipelining segments. For their simulation model, they ....
M. N. Garofalakis and Y. E. Ioannidis. Multi-dimensional Resource Scheduling for Parallel Queries. In Proc. ACM SIGMOD Int'l. Conf., Montreal, Canada, June 1996.
....time) or a subgraph (for the global time). The last page time is the time when the operator terminates its processing or the execution of the plan has been completed. This model was first introduced by Ganguly et al. [29] (see subsection 3.2) and has been reused by Ioannidis et al. later [52]. The original definition does not yet introduce communication costs. In our framework communication costs are modeled with the help of special communication operators, whose associated local cost represents the communication cost [53]. 6 Experimental validation The goal of this experimental ....
M. N. Garofalakis and Y. E. Ioannidis. Multi-dimensional Resource Scheduling for Parallel Queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 365--376, Montreal, Canada, June 1996.
....the operator ordering problem which is common to both the one phase and the two phase strategy. It is the most important problem to be solved in parallel query optimization, as all other problems depend on it. Parallel scheduling is addressed elsewhere and good solutions have already been proposed (see [14]). The estimated cost associated with one ordering and the determination of the degree of intra operator parallelism will be made concrete in the experimental context for a sample parallel machine and database schema. For the general methodologies we can abstract from it. Focus on parallel ....
M. N. Garofalakis and Y. E. Ioannidis. Multi-dimensional Resource Scheduling for Parallel Queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 365--376, Montreal, Canada, June 1996.
....for parallel execution. In this paper we will discuss some of the ways in which queries can be optimized for parallel execution. We will look at various cost models, search algorithms and methods of generating query execution plans (QEPs) [GANG92], [LU91], resource allocation techniques [GAR96], [NIC93], [TAN93] and ways of dealing with load imbalance [LuTAN94]. We will focus on some of the major issues associated with parallel databases and how well these algorithms address them. Search Strategies for Finding Optimal QEPs Optimization strategies for parallel execution come in two ....
....likely to generate optimal plans since it is difficult to see how an optimal plan can be constructed without considering preemptable resources such as CPU, disks and the communication network. Modeling a QEP is the first step in searching for an optimal one. The method used in most approaches ([GAR96], [LU91], [LAN94], [GANG92]) is to model QEPs as annotated join trees. The level of granularity is thus reflected in the nodes of the join tree, which are base relations at the leaves and components of join operations in internal nodes. Figure 1. A three way join of R1, R2 and R3 Figure 2. ....
[Article contains additional citation context not shown here]
M.N. Garofalakis, Y.E. Ioannidis, "Multi-Dimensional Resource Scheduling for Parallel Queries", SIGMOD '96, 6/96 Montreal, Canada
....a cost model for schedule cost evaluation [SF96] and we are going to experiment with the GP technique in its solutions space. We further plan to measure the robustness and effectiveness of GP in comparison with heuristics for optimal QEP and schedule generation, as proposed in [SMK93, HM94, GI96] ....
Minos Garofalakis and Yannis Ioannidis. Multi-dimensional resource scheduling for parallel queries. In SIGMOD Int. Conf. on Management of Data. ACM, 1996.
....added to the total time of the resource responsible for the step (disc, cpu or communications). A bottleneck resource is established and its response time is used as the response time of the algorithm phase. The sum of the response times of all phases gives the response time of the algorithm. In [14] scheduling in shared nothing systems is studied and an analytical model of parallel execution and resource sharing is developed to estimate the response time of a given schedule (and, hence, the corresponding query). The model takes into account the overlapped use of multiple resources and ....
....of the profile. Intra operator parallelism such as pipelined or partitioned execution is taken into account during traversal and determines the way usage time is accumulated. Full details of this are given in Section 3.3. 3.1 Preparation The starting point for the analysis is an operator tree [14, 13], created from a query execution tree by decomposing each of its nodes into one or more constituent operator nodes, e.g. SCAN, AGGREGATE, BUILD, PROBE. Operator trees are generated by the compiler of the DBMS and given as input to the model. An operator tree together with a mapping of system ....
[Article contains additional citation context not shown here]
M. Garofalakis and Y. E. Ioannidis, "Multi-dimensional resource scheduling for parallel queries", In Proc. of the 1996 ACM SIGMOD Int. Conf. on Management of Data, Montreal, Canada, pp.365-376, June 1996.
....different kinds of skew [Kit90, DeW92, Ber92, Sha93] based on dynamic data redistribution. With inter operator parallelism, distributing the query's operators among all processors can also yield poor load balancing. Much research has been dedicated to inter operator load balancing in shared nothing [Meh95, Rah95, Gar96], which is done statically during optimization or dynamically prior to execution. The potential reasons for poor load balancing in shared nothing are studied in [Wil95]. First, the degree of parallelism and the allocation of processors to operators, decided in the parallel optimization ....
M. N. Garofalakis, Y. E. Ioannidis, "Multidimensional Resource Scheduling for Parallel Queries". ACM-SIGMOD Int. Conf. Montreal, June 1996.
....lower bound on the makespan, for the above choice of parameters (V, Pi and T). We give a simple approximation algorithm that matches the O(V Pi log T) makespan bound. In contrast to our algorithm, most known practical solutions use some variant of greedy list or queue type scheduling [50, 59, 90]. Jobs on arrival are placed in a list ordered by some heuristic (often FIFO). The scheduler dispatches the first ready job on the list when enough resources become available. List scheduling and its variants are appealingly simple to implement, but they can be notoriously bad, which is not ....
....justify our model. In §4.3 and §4.4 we give the makespan lower and upper bounds. In §4.5 we show how to extend the makespan algorithm to a WACT algorithm. In §4.6 we pose some unresolved problems. 4.2 Motivation 4.2.1 Databases Query scheduling in parallel databases is a topic of active research [23, 90, 119, 66, 59]. Queries arrive from many users to a front end manager process. A query is an in tree, where the internal vertices are operations like sort, merge, select, join etc. which we call jobs. We think of pipelines as collapsed into single vertices. The leaves are relations stored on disk. Edges ....
[Article contains additional citation context not shown here]
M. Garofalakis and Y. Ioannidis. Multi-dimensional resource scheduling for parallel queries. In ACM SIGMOD Conference on the Management of Data. ACM, May 1996.
....1.2 Known approaches Many simplified variants of our problem are strongly NP hard, even for makespan. Thus the goal is to find approximations for the worst case and heuristics in practical settings. All known practical solutions use some variant of greedy list or queue type scheduling [6, 9, 18]. Jobs on arrival are placed in a list ordered by some heuristic (often FIFO). The scheduler dispatches the first ready job on the list when enough resources become available. List scheduling and its variants are appealingly simple to implement; however they can be notoriously bad. In fact, ....
....scenarios to justify our model. In §3 and §4 we give the makespan lower and upper bounds. In §5 and §6 we show how to extend the makespan algorithm to a WACT algorithm. In §7 we pose unresolved problems. 2 Model Databases. Query scheduling in parallel databases is a topic of active research [2, 18, 27, 12, 9]. Queries arrive from many users to a front end manager process. A query is an in tree where the internal vertices are operations like sort, merge, select, join etc. which we call jobs. We think of pipelines as collapsed into single vertices. The leaves are relations stored on storage devices. ....
[Article contains additional citation context not shown here]
M. Garofalakis and Y. Ioannidis. Multi-dimensional resource scheduling for parallel queries. In SIGMOD conference. ACM, May 1996. To appear.
....time algorithm can achieve a better approximation ratio. Besides having theoretical importance, these problems have several applications such as load balancing, cutting stock, and resource allocation, to name a few. One of our motivations for studying these problems comes from recent interest [14, 15, 16] in multidimensional resource scheduling problems in parallel query optimization. A favored architecture for parallel databases is the so called shared nothing environment [7] where the parallel system consists of a set of independent processing units each of which has a set of time shareable ....
....by a single aggregate work measure. This simplification is done typically to reduce the complexity of the scheduling problem. For large task systems that are typically encountered in database applications, however, ignoring the multi dimensionality could lead to bad performance. The work in [13, 14, 15, 16] demonstrates the practical effectiveness of the multi dimensional approach. One of the basic resource scheduling problems considered in the above papers is the problem of scheduling d dimensional vectors (tasks) on d dimensional bins (machines) to minimize the maximum load on any ....
Minos N. Garofalakis and Yannis E. Ioannidis. Multidimensional resource scheduling for parallel queries. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 365--76, June 1996.
....modelled in [1] where the execution time of a pipeline participant is observed as a function of the number of processors assigned to it. However, for large join queries, it is rather expected that multiple processes compete for the same processor. Pipelined and bushy parallelism [3] is studied in [1, 4, 2, 6], where the communication across logical links between interacting processes is also considered. However, the network configuration affects the selection of the physical channels to connect two communicating processors. A router may even dynamically choose different routes between the same ....
Minos Garofalakis and Yannis Ioannidis. Multi-dimensional resource scheduling for parallel queries. In SIGMOD Int. Conf. on Management of Data. ACM, 1996.
....efficient cost models for parallel query optimization and propose a solution that captures all the important parameters of parallel execution. Note to referees: Parts of this paper have appeared in the Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD 96) [GI96] and in the Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB 97) [GI97]. Besides the more extensive coverage, this paper extends these earlier conference publications with significant new material, including: a) experimental results from an implementation of our ....
.... system resources can be categorized into two radically different classes with respect to their mode of usage by query plan operators: • Time Shared (TS) (or, preemptable) resources (e.g. CPUs, disks, network interfaces) that can be sliced between operators at very low overhead [GHK92, GI96]. For such resources, operators specify an amount of work (i.e. the effective time for which the resource is used) that can be stretched over the operator's execution time. • Space Shared (SS) resources (e.g. memory buffers) whose time sharing among operators introduces prohibitively high ....
[Article contains additional citation context not shown here]
Minos N. Garofalakis and Yannis E. Ioannidis. "Multi-dimensional Resource Scheduling for Parallel Queries". In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 365--376, Montreal, Canada, June 1996.
....Figure 3: The OPERATORSCHEDULE algorithm The following theorem bounds the worst case performance ratio of our algorithm. As with all theoretical results presented here, the theorem is stated without proof due to space constraints. The details can be found in the full version of this paper [GI96]. Theorem 5.1 The parallel execution time of the schedule returned by OPERATORSCHEDULE is (a) within (2d 1) of the length of the optimal schedule that uses the same degrees of intra operator parallelism for all floating operators, and (b) within (2d(fd 1) 1) of the optimal CG f schedule ....
....operator, and • T(CP) is the total response time of the critical (i.e. most time consuming) path in the plan assuming the maximum allowable degree of coarse grain parallelism for each operator. By assumption A4, OPTBOUND is indeed a lower bound on the length of the optimal CG f execution [GI96]. The results for queries of 20 and 40 joins are shown in Figure 6(b) for f = 0.7 and overlap ε = 0.5. These curves verified our expectations, showing that the average performance of TREESCHEDULE is much closer to optimal than what we would expect from the worst case bound derived in Theorem ....
[Article contains additional citation context not shown here]
M. N. Garofalakis and Y. E. Ioannidis. "Multidimensional Resource Scheduling for Parallel Queries". Unpublished manuscript, March 1996.
No context found.
Minos N. Garofalakis and Yannis E. Ioannidis. "Multidimensional Resource Scheduling for Parallel Queries". In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 365--376, Montreal, Canada, June 1996.
....amount of resource bandwidth that the service request must consume for the duration of its execution to ensure smooth delivery of the media strand(s) to the user. This multi dimensional model of resource usage was inspired by our previous work on resource scheduling for parallel database systems [GI]. Consider the content based video retrieval example mentioned earlier. Assume a media server consisting of a single CPU, three magnetic disks, and a single network interface, corresponding to rate vector dimensions 1 through 5 respectively (Figure 1(a)). Also, assume that the retrieved video ....
Minos N. Garofalakis and Yannis E. Ioannidis. "Multi-dimensional Resource Scheduling for Parallel Queries". To appear in the 1996 ACM SIGMOD International Conference on Management of Data.
....or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 23rd VLDB Conference Athens, Greece, 1997 • Time Shared (TS) (or, preemptable) resources (e.g. CPUs, disks, network interfaces) that can be sliced between operators at very low overhead [9, 11]. For such resources, operators specify an amount of work (i.e. the effective time for which the resource is used) that can be stretched over the operator's execution time. • Space Shared (SS) resources (e.g. memory buffers) whose time sharing among operators introduces prohibitively high ....
....has ignored the multi dimensional nature of database queries and has concentrated on simplified models of SS resources, resulting in unrealistic approaches to the problem. Similar limitations exist in previous efforts within the field of deterministic scheduling theory. In our earlier work [11], we have presented a multidimensional framework for query scheduling in shared nothing parallel systems with only TS resources, dealing with the full variety of bushy plans and schedules that incorporate independent and pipelined forms of inter operation parallelism as well as intra operation ....
[Article contains additional citation context not shown here]
M. N. Garofalakis and Y. E. Ioannidis. "Multi-dimensional Resource Scheduling for Parallel Queries". In Proc. of the 1996 ACM SIGMOD Intl. Conf., June 1996.