C. Chekuri, W. Hasan, and R. Motwani. Scheduling problems in parallel query optimization. In PODS, 1995.
....the total communication cost may not necessarily decrease. Plan Space: [33] discusses the plan space explored by this algorithm. It will be a subspace of the plan space explored by the exhaustive algorithm. 3.4. Two-phase Optimization. Two-phase optimization [26] has been used extensively [9, 19] in distributed and parallel query optimization, mainly because of its simplicity and ease of implementation. This algorithm works in two phases: The techniques described in [33], based on minimum selectivity, etc., can be applied orthogonally. However, since they do nothing to reflect ....
C. Chekuri, W. Hasan, and R. Motwani. Scheduling problems in parallel query optimization. In PODS, 1995.
....software was main memory size limitations). Staging was also known to improve CPU performance in the mid-seventies [AWE]. This section discusses representative pieces of work from a broad scope of research in databases, operating systems, and computer architecture. Parallel database systems [DG92][CHM95] exploit the inherent parallelism in a relational query execution plan and apply a dataflow approach for designing high-performance, scalable systems. In the GAMMA database machine project [De 90] each relational operator is assigned to a process, and all processes work in parallel to achieve ....
C. Chekuri, W. Hasan, and R. Motwani. "Scheduling Problems in Parallel Query Optimization." In Proc. PODS, pp. 255-265, 1995.
....load of the processors. The POT scheduling problem was first introduced by Hasan and Motwani for identical processor systems and was shown to be NP-hard [11]. They proposed several approximation algorithms for restricted cases of the POT. Chekuri et al. devised two algorithms for the general case [4]. The first one, called LOCALCUTS, has a worst-case performance ratio of (3 + √17)/2 and runs in O(n log n) time. The second, called BOUNDEDCUTS, has a smaller performance ratio of (1 + ε)2.87, at the expense of a higher time complexity of O((1/ε) n log n). Chekuri has also shown that there ....
....3, we present our POT scheduling algorithms for a system with two uniform processors. The solutions for POT scheduling on an arbitrary number of uniform processors are presented in section 4. 2. The model and problem definition. The following definitions are based on earlier models presented in [11, 4]. A POT is represented as a weighted operator tree P = (V, E). The weight p_k of the node k is the time to run the operator in isolation assuming all communications are local. The weight c_kj of the edge from node k to node j is the additional CPU overhead that both k and j will incur for ....
[Article contains additional citation context not shown here]
Chekuri, C., Hasan, W., and Motwani, R., "Scheduling Problems in Parallel Query Optimization," In Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 255-265, San Jose, California, May 1995.
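The POT model quoted in the excerpts above (a weighted operator tree with node weights p_k and edge weights c_kj) can be illustrated with a minimal sketch. The excerpt is cut off before it says when the edge overhead applies; the code ASSUMES it is charged to both endpoints only when they are placed on different processors. The function name `processor_loads` and the three-operator example are hypothetical and are not taken from any of the cited papers.

```python
from collections import defaultdict


def processor_loads(node_weight, edge_weight, assignment):
    """Return {processor: load} for one assignment of operators to processors.

    node_weight: {k: p_k}        time to run operator k in isolation
    edge_weight: {(k, j): c_kj}  extra CPU overhead on the edge k -> j
    assignment:  {k: processor}  where each operator is placed
    """
    loads = defaultdict(float)
    for k, p_k in node_weight.items():
        loads[assignment[k]] += p_k
    for (k, j), c_kj in edge_weight.items():
        # ASSUMPTION: the overhead is incurred by both endpoints, and only
        # when the edge crosses processors (the source text is truncated here).
        if assignment[k] != assignment[j]:
            loads[assignment[k]] += c_kj
            loads[assignment[j]] += c_kj
    return dict(loads)


# Hypothetical three-operator tree: two scans feeding one join.
node_weight = {"scan_a": 4.0, "scan_b": 3.0, "join": 5.0}
edge_weight = {("scan_a", "join"): 1.0, ("scan_b", "join"): 2.0}

together = processor_loads(
    node_weight, edge_weight, {"scan_a": 0, "scan_b": 0, "join": 0}
)
split = processor_loads(
    node_weight, edge_weight, {"scan_a": 0, "scan_b": 1, "join": 0}
)
```

Under these assumptions, placing everything on one processor gives a single load of 12.0, while moving `scan_b` to a second processor gives loads of 11.0 and 5.0: the heaviest processor improves only slightly because the crossing edge adds overhead on both sides, which is the parallelism versus communication trade-off the excerpts describe.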
....Internet and the ensuing increasingly complex socio-economic context of computation. Consider the query optimization problem, for example, arguably the most important and complex problem in databases. Trade-offs between parallelism and communication in query optimization have been studied in [CHM, HM], and elsewhere. The Mariposa wide-area database system [SAP ] was architected to make such trade-offs explicit in an advantageous way. Mariposa assumes that a subquery can be executed in many diverse database sites, and each site submits a bid for the query, specifying a delay for delivering ....
C. Chekuri, W. Hasan, and R. Motwani. Scheduling Problems in Parallel Query Optimization. Proc. of the Fourteenth ACM Symposium on Principles of Database Systems (PODS), 1995, pp. 255-265.
....query plans into sub-plans which are then executed in parallel on different nodes of the parallel processing environment. The granularity of this decomposition varies and can be as fine as parallelizing a single operator, as studied for example in [SD89, SD90, WFA95], but is often chosen coarser [HM95, CHM95, GI96, GI97]. These approaches have in common that they require communication between single nodes for shipping or exchanging partial results. This causes network contention and synchronization effects where nodes have to wait for others to complete their tasks first. As a ....
C. Chekuri, W. Hasan, and R. Motwani. Scheduling Problems in Parallel Query Optimization. In Proc. of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 255-265, San Jose, CA, USA, May 1995.
....apply to a restricted class of query plans, such as star queries and paths of pipelined operators. The heuristics proposed ignore skew handling as well as intra-operator parallelism. Chekuri et al. develop a more general treatment and allow for arbitrary query plans using the same cost model [2]. Again, skew is not considered. Garofalakis and Ioannidis discuss a richer cost model and focus on shared-nothing architectures [4, 5]. Their scheduling heuristics are also based on the assumption that no skew affects the execution. Lo et al. study constraint processor allocation for pipelined ....
C. Chekuri, W. Hasan, and R. Motwani. Scheduling Problems in Parallel Query Optimization. In Proc. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, San Jose, CA, USA, May 1995.
....and therefore avoiding supplementary communication and memory utilization. This first allocation strategy appeared to us the most important, as all shared resources are used less. The same significance was attributed by many other authors; for instance, see the algorithm developed by Hasan et al. [47] or Hameurlain et al. [48] for optimizing the communication costs. Even if communication network speed has increased dramatically over the last years (e.g. faster networks such as optical ones are emerging quickly), additional costs on the sender and receiver machines cannot be neglected. For ....
C. Chekuri, W. Hasan, and R. Motwani. Scheduling Problems in Parallel Query Optimization. In Proceedings of the Principles of Database Systems, pages 255--265, San Jose, California, USA, May 1995.
....this reduction in optimization cost may come at the price of selecting highly suboptimal plans. Nevertheless, even in this case, effective query scheduling algorithms are still necessary for distributing the execution of the plan on the run-time environment during the parallelization phase [HM94, CHM95]. Hence, resource scheduling techniques form an important component of any approach to query processing and optimization in parallel database systems. As a result, significant research effort has concentrated on the problem of minimizing the response time of a single query through ....
....query processing and optimization in parallel database systems. As a result, significant research effort has concentrated on the problem of minimizing the response time of a single query through parallelization of an execution plan, i.e. scheduling of the plan's operators on the system's sites [CHM95, GW93, HM94, Hon92, HCY94, LCRY93]. Most of these efforts, however, are based on simplifying assumptions that limit their applicability. We address the parallel query plan scheduling problem in its most general form, assuming the full variety of bushy plans and schedules that incorporate ....
[Article contains additional citation context not shown here]
Chandra Chekuri, Waqar Hasan, and Rajeev Motwani. "Scheduling Problems in Parallel Query Optimization". In Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 255--265, San Jose, California, May 1995.
....concurrently to shared resources and to optimize memory availability. When it can be applied, this heuristic therefore seems to us the most relevant. The general philosophy of this approach is found in many works (e.g. the coloring algorithm of Hasan et al. [CHM95] or the optimization algorithms of Hameurlain et al. [HM95a]). In practice, locality can be maintained only if the inner relation is distributed on the join attribute and if the processors have enough memory space to execute the join correctly. If this is not ....
C. Chekuri, W. Hasan, and R. Motwani. Scheduling Problems in Parallel Query Optimization. In Proceedings of the Principles of Database Systems, San Jose, California, USA, May 1995.
....as a classical problem of scheduling (second level). This problem is NP-complete [16], [17], so heuristic approaches must be proposed. Depending on the characteristics of the operator graph, various heuristic solutions have been considered [18]. A. Hameurlain et al. in [19] and W. Hasan et al. [17], [20] proposed scheduling strategies for a given query at run time, but neither allows operators to control query processing. Wei Hong proposed [21], [22] the idea of a runtime system controlling the query processing in order to detect errors in cost estimation models. The degree of ....
Hasan W., Chekuri C., and Motwani R. Scheduling Problems in Parallel Query Optimization. In Proceedings of Principles of Database Systems, 1995.
....of operators may also show pipelined parallelism, but we can regard those nodes to be collapsed into a single node with data parallelism. There is also task parallelism between unordered nodes. Recently algorithms have been designed for trading between locality and load balance in this scenario [33]. We will come back to similar problems in Chapter 4. The switching technique has been independently discovered after our paper [26] was published in a different context: that of scheduling tasks with penalties. Every task has a running time, and a penalty for rejection; the goal is to minimize ....
C. Chekuri, W. Hasan, and R. Motwani. Scheduling problems in parallel query optimization. In ACM Symposium on Principles of Database Systems, 1995.
....its job as soon as the first tuple has been processed by the join operator and then work in parallel with the latter. The models based on this approach suffer from the fact that they do not provide appropriate representations of all the kinds of parallelization strategies. Thus, most models, e.g. [3, 4, 5, 6], only consider data or simple precedence dependencies between operators. This prevents taking into account some possible processing strategies such as, for instance, those in which operators are ordered without reference to data streams. Furthermore, none of these models deal correctly with ....
Chekuri C., Hasan W., and Motwani R. Scheduling Problems in Parallel Query Optimization. In Proceedings of the Principles of Database Systems, 1995.
....problems: 1. compile-time optimization: minimizing the response time of a single query through parallelization of an execution plan, i.e. scheduling of the plan's operators on the system's sites (the plan is usually the result of an earlier phase of conventional centralized query optimization) [CHM95, GW93, HM94, Hon92, HCY94, LCRY93], and 2. run-time execution: achieving some system-wide performance goals (e.g. maximizing query throughput) by adaptive scheduling of the operators of multiple concurrent queries [MD93, MD95, RM95]. We address the first problem, i.e. parallelization of query ....
....Previous work on parallel query scheduling has typically ignored the multi-dimensional nature of database queries. It has simplified the allocation of resources to a mere allocation of processors, hiding the multi-dimensionality of query operators under a scalar cost metric like work or time [CHM95, GW93, HM94, HCY94, LCRY93]. This one-dimensional model of scheduling is inadequate for database operations that impose a significant load on multiple system resources. In this paper, we present a framework for multi-dimensional resource scheduling in shared-nothing parallel database systems. ....
[Article contains additional citation context not shown here]
Chandra Chekuri, Waqar Hasan, and Rajeev Motwani. "Scheduling Problems in Parallel Query Optimization". In Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 255--265, San Jose, California, May 1995.
....problems: 1. compile-time optimization: minimizing the response time of a single query through parallelization of an execution plan, i.e. scheduling of the plan's operators on the system's sites (the plan is usually the result of an earlier phase of conventional centralized query optimization) [CHM95, GW93, HM94, Hon92, HCY94, LCRY93], and 2. run-time execution: achieving some system-wide performance goals (e.g. maximizing query throughput) by adaptive scheduling of the operators of multiple concurrent queries [MD93, MD95, RM95]. We address the first problem, i.e. parallelization of query ....
....Previous work on parallel query scheduling has typically ignored the multi-dimensional nature of database queries. It has simplified the allocation of resources to a mere allocation of processors, hiding the multi-dimensionality of query operators under a scalar cost metric like work or time [CHM95, GW93, HM94, HCY94, LCRY93]. This one-dimensional model of scheduling is inadequate for database operations that impose a significant load on multiple system resources. In this paper, we present a framework for multi-dimensional resource scheduling in shared-nothing parallel database systems. ....
[Article contains additional citation context not shown here]
C. Chekuri, W. Hasan, and R. Motwani. "Scheduling Problems in Parallel Query Optimization". In Proc. of the 14th ACM Symposium on Principles of Database Systems, San Jose, California, May 1995.
....database management system [HON92] used a single-site optimizer during this phase and therefore ignored communication cost. [HM95] presents algorithms that incorporate communication costs. Dividing a plan into parts and scheduling the parts in an optimal way is itself an NP-complete problem. [CHM95] presents two approximation algorithms for dividing query plans into subplans for scheduling on a parallel machine. The original query plans must consist only of non-blocking operators such as sorts and hash table builds. We plan to implement and test these approaches to dividing query plans into ....
....too large. For example, the TPC-D queries can produce plans with up to 60 or so nodes. If each node were bid out individually to ten sites, there would be 10^60 possible solutions. To address this issue, we intend to pursue the work mentioned in Section 2 on optimal division of plan trees [CHM95]. [Figure: home site (Parser, Optimizer, Fragmenter, Query Broker), processing sites (Bidder), name server, and coordinator] ....
[Article contains additional citation context not shown here]
C. Chekuri, W. Hasan, R. Motwani, "Scheduling Problems in Parallel Query Optimization," Proceedings of the Fourteenth ACM Symposium on Principles of Database Systems (PODS), 1995, pp. 255-265.
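The solution count quoted in the excerpt above (about 60 plan nodes, each bid out to ten sites) follows from treating each node's site as an independent choice, so the counts multiply. A minimal sketch of that arithmetic, with a hypothetical helper name `assignment_count`:

```python
def assignment_count(num_nodes: int, num_sites: int) -> int:
    """Number of ways to place each plan node on one of num_sites sites.

    ASSUMPTION: every node chooses its site independently, so the
    per-node choices multiply: num_sites ** num_nodes.
    """
    return num_sites ** num_nodes


# 60 plan nodes, ten candidate sites per node, as in the excerpt.
total = assignment_count(60, 10)
```

Even halving the plan to 30 nodes still leaves 10^30 assignments under this counting, which is why the excerpt looks to dividing the plan tree into a small number of subplans instead of bidding out every node.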
....the XPRS approach to be inapplicable to architectures such as shared-nothing that have significant communication costs. Other work on parallel query optimization [SE93, LST91, SYT93, CLYY92, HLY93, ZZBS93, GHK92] also ignored modeling communication overheads of parallelism. Our earlier work [HM94, CHM95] focussed on the parallelization phase and has developed scheduling algorithms that account for the trade-off between parallelism and communication. Though query processing in parallel and distributed databases [CP84, OV91, YC84] is fundamentally similar, repartitioning intermediate results to ....
C. Chekuri, W. Hasan, and R. Motwani. Scheduling Problems in Parallel Query Optimization. In Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1995.