M.S. Chen, P.S. Yu, K.L. Wu, Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries, ICDE Conf., Tempe, 1992.
....interfaces tend to generate complex queries that may contain larger numbers of joins between relations. Consequently, the development of execution strategies for the parallel evaluation of multi-join queries has drawn the attention of the scientific community. A number of strategies were proposed [CLY92, CYW92, HoS91, HCY94, ScD90] and their performance was evaluated via simulation. However, no comparative experimental performance evaluation is available. This paper describes the proposed strategies in a common framework. Four strategies are implemented on PRISMA DB and a comparative performance evaluation is done. The ....
M. S. Chen, P. S. Yu & K. L. Wu, "Scheduling and processor allocation for parallel execution of multi-join queries," in Proc. 8th Data Engineering Conf., Tempe, Arizona, USA, February 3-7, 1992, 58-67.
....plan is first constructed for a uniprocessor machine and then parallelized. In one-step optimization, the parameters of parallelism are already taken into account when establishing the optimal plan, which thus contains scheduling information. The first approach is adopted in studies like [1], [6], [7]. An optimal sequential plan is produced at compile time, and an optimal parallelization of this plan is selected according to some heuristics at runtime. As pointed out by Lanzelotte et al., there is no guarantee that the optimal uniprocessor plan will remain optimal when parallelized ....
....is exploited to improve the quality of the results and to reduce the optimization overhead. In that aspect, our technique is innovative, since previous studies on combinatorial query optimization were based on sequential algorithms. Our technique processes bushy query trees. In models like [1], [7], [18], one bushy tree is created by a constructive algorithm, which is intended to find the optimal way of performing the joins. In models like [35], [34], [12], [9], [10], an initial tree is created using some mechanism (possibly an augmentation heuristic) and a randomized algorithm is ....
[Article contains additional citation context not shown here]
M.-S. Chen, P. Yu, and K.-L. Wu, "Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries," Proc. Eighth Int'l Conf. Data Eng., pp. 58--67, IEEE, 1992.
....size, we implemented several search strategies to cope with different sizes of search space, and let the optimizer choose the appropriate one, according to the query type (ad hoc vs. repetitive and simple vs. complex). We definitely think that compiling heuristics such as the one proposed in [4] or [5] is essential. However, the fanout of parallel architectures and execution models is quite large and it is difficult to find general heuristics. This problem is especially true with our approach since our optimizer is targeted for both shared-memory and distributed-memory architectures. Thus we ....
M-S Chen, P. S. Yu, and K-L Wu. Scheduling and processor allocation for parallel execution of multi-join queries. In Proc. Int. Conf. on Data Engineering, 1992.
....left deep and right deep, section 6 describes the heuristic approach, section 7 discusses the evaluation parameters, and the experimental design, section 8 has results and discussions, and section 9 has conclusions and future work. 2 Related Work Work in parallel query optimization [18], [1], [2], [8], [7], [10], [15], [20], [23], [6], [26], [22] has focussed on minimizing the total execution time (elapsed time) for a query. It is difficult, if not impossible, to directly compare the different techniques proposed in literature. This is because these methods have been developed for different ....
....sequential environment [21], [17] and continues to be done [14], cost models for the parallel environment are still in their infancy. This is predominantly because researchers have looked at various different architectures, namely shared everything [8], [7], [15], [22], [23], [26], shared disk [18], [1], [2], and shared nothing architectures [20], [10]. Even when considering the same architecture, methods differ in whether they differentiate between sequential and random I/O, if overlap in I/O and CPU utilization is considered, if inter-operator and pipelined parallelism is considered, and if multiple ....
[Article contains additional citation context not shown here]
M.S. Chen, P.S. Yu, and K.L. Wu. Scheduling and processor allocation for parallel execution of multi-join queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, April 1992.
....the problem, e.g. in a distributed memory architecture processor memory assignment are equivalent since each goes with the other. Disk allocation, primarily for intermediate results, will usually be guided by the data declustering approach being used. Processor assignment has been considered in [LU91, HONG92, GANG92, SRIV93, TUREK92, CHEN92b, HUA93], while memory assignment has been considered in [SRIV93 and ZIAN93]. Since memory continues to be a critical resource for the performance of database operations even in the parallel environment [SCHN90], a careful modeling of contention on it is important. A first step towards memory allocation ....
....intra-operator and pipelining are used effectively. The goals are (i) to obey the precedence constraints (i.e. some operations may not be started until operations lower in the join tree have completed), (ii) allocate processors to avoid idling as much as possible (called system fragmentation [CHEN92b]), and (iii) allocate memory to maximize its effectiveness. These goals increase the difficulty of scheduling [CHEN92b]. The first step is to transform the join tree into a more basic form, called an operator tree. Here joins are replaced by their constituent operations. For example, a sort merge ....
[Article contains additional citation context not shown here]
M.S. Chen, P.S. Yu and K. Wu, "Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries", in Proceedings of Data Engineering 92, 1992.
....which is not an LBT (i.e. the left and right subtrees of the last processed join are neither a base relation nor a single join) 4 . Next section will position the new LBT processing strategy in relation to previous works. 3 Previous works Early related work in parallel query optimization (e.g. [18,20]) has concentrated on linear trees and intra-operator parallelism (e.g. [21,22]). These works have not yet considered operator orderings including inter-operation parallelism because of its high scheduling complexity and its difficult synchronization. In the last years, several parallel DBMS ....
M.-S. Chen and P.S. Yu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. In Proceedings of the International Conference on Data Engineering, pages 58--67, February 1992.
....contention, (ii) the estimation of processor numbers allocated (logically) to an operation, (iii) parallel scheduling and allocation of resources, and (iv) the minimization of data communication costs taking into account the interconnection network topology. In this way, recently, several authors [4, 19, 23] have proposed parallelization strategies derived from search spaces represented by five structures called left-deep trees, right-deep trees, bushy trees 1 , segmented right-deep trees and zig-zag trees. Two approaches of parallelization strategies have been proposed in the literature [17] ....
....from search spaces represented by five structures called left-deep trees, right-deep trees, bushy trees 1 , segmented right-deep trees and zig-zag trees. Two approaches of parallelization strategies have been proposed in the literature [17]: one phase and two phase. For the two-phase approach [4, 11, 12, 13, 14], the first phase, called plan generation, consists in generating an execution plan (without considering the allocation of resources). Furthermore, the adequate strategy is selected for each operation. The second phase achieves an optimal allocation of resources for the execution plan generated in ....
[Article contains additional citation context not shown here]
Chen, M., et al. Scheduling and processor allocation for parallel execution of multi-join queries. In Proc. 8th Intl. Conf. Data Eng. (feb. 1992), I. C. S. Press, Ed., pp. 58--67.
....input relation consists of one million tuples. The selection selectivity is 100%, join selectivity is 50% and the joins are allocated their maximum memory allocation. There are multiple ways of scheduling such a complex query, each differing in the amount of parallelism and pipelining exploited [Schn90, Chen92a, Chen92b]. In this experiment, both left-deep and right-deep scheduling [Schn90] are considered. These represent the two extremes in query scheduling strategies; left-deep schedules have the least parallelism and limited pipelining, while right-deep schedules have the highest parallelism and maximum ....
Chen, Ming-Syan et al., "Scheduling and Processor Allocation for Parallel Execution of multi-join Queries", Proc. 8th IEEE Data Engineering Conf., Phoenix, Ariz., Feb. 1992.
....we opted to optimize dynamically during query execution to avoid problems of error propagation during the optimization phase. We used the greedy heuristic where the two remaining tables whose join result is expected to have the smallest size were selected for joining. This scheme was shown in [6] to produce query plans close to the optimal plan given a similar cost model. The same 100 instances were solved by both the traditional method where tables were only joined, and by the enhanced method where non-join attributes were projected out in order to represent the solution as an acyclic ....
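The greedy heuristic mentioned in the excerpt above (repeatedly join the two remaining tables whose result is expected to be smallest) can be illustrated with a minimal sketch. This is not the cited authors' code: the function name `greedy_join_order`, the input dictionaries, and the cost model (result size = product of the input sizes times the selectivities of all join predicates between them) are assumptions made only for illustration.

```python
from itertools import combinations


def greedy_join_order(sizes, selectivity):
    """Greedy minimum-result-size join ordering (illustrative sketch).

    sizes:       dict mapping base table name -> estimated cardinality
    selectivity: dict mapping frozenset({a, b}) -> selectivity of the join
                 predicate between base tables a and b; a missing pair is
                 treated as a cross product (selectivity 1.0).
    Returns a list of (left_tables, right_tables, estimated_size) steps.
    """
    # Each live intermediate result is keyed by the set of base tables it covers.
    live = {frozenset([name]): float(card) for name, card in sizes.items()}
    plan = []
    while len(live) > 1:
        best = None
        # Sort keys so that ties are broken deterministically.
        for left, right in combinations(sorted(live, key=sorted), 2):
            sel = 1.0
            for a in left:
                for b in right:
                    sel *= selectivity.get(frozenset((a, b)), 1.0)
            est = live[left] * live[right] * sel
            if best is None or est < best[0]:
                best = (est, left, right)
        est, left, right = best
        # Replace the two chosen inputs by their join result.
        del live[left], live[right]
        live[left | right] = est
        plan.append((sorted(left), sorted(right), est))
    return plan


if __name__ == "__main__":
    # Hypothetical statistics, chosen only to exercise the sketch.
    sizes = {"R": 1000, "S": 100, "T": 10}
    selectivity = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
    for step in greedy_join_order(sizes, selectivity):
        print(step)
```

With these hypothetical statistics the sketch joins S and T first, because that pair has the smallest estimated result, and then joins R with the intermediate result.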
M. Chen, P. S. Yu, and K. Wu, Scheduling and processor allocation for parallel execution of multi-join queries. In Proc. of the 8th International Conference on Data Engineering, 58-67, 1992.
.... by ignoring independent (bushy tree) parallelism; these include the right-deep trees of Schneider [Sch90] and the segmented right-deep trees of Chen et al. [CLYY92]. Nevertheless, the advantages offered by such parallelism, especially for large queries, have been demonstrated in prior research [CYW92]. Tan and Lu [TL93] and Niccum et al. [NSHL95] consider the general problem of scheduling bushy join plans on parallel machines exploiting all forms of intra-query parallelism and suggest heuristic methods of splitting the bushy plan into non-overlapping shelves of concurrent joins. For the same ....
....response time. We selected SYNCHRONOUS as a one-dimensional adversary since it is the state of the art method for exploiting bushy tree parallelism in parallel query execution 3 [WFA95]. Prior research has demonstrated the advantages offered by such parallelism, especially for large queries [CYW92]. To the best of our knowledge, optimal processor distribution within general join pipelines remains an open problem. We therefore decided to restrict our experiments to bushy hash join query plans so that the optimal technique of Lo et al. could be used in SYNCHRONOUS. We should stress, however, ....
M.-S. Chen, P. S. Yu, and K.-L. Wu. "Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries". In Proc. of the 8th Intl. Conference on Data Engineering, Phoenix, Arizona, February 1992.
....can be another operator, bushy parallelism cannot be exploited either. In the right-deep tree [16] all operators are executed in pipeline mode, but the memory demand is high. The zigzag tree [25] alleviates this problem by combining left-deep and right-deep subtrees. Right-deep stratified trees [1] exploit pipelining, while supporting also bushy parallelism to a limited degree. However, bushy and pipelined parallelism can be fully exploited only if the processing tree is bushy itself [5, 12]. The cost of parallel execution is not only affected by the types of parallelism exploited, but also ....
M.-S. Chen, P. Yu, and K.-L. Wu. Scheduling and processor allocation for parallel execution of multi-join queries. In Eighth Int. Conf. on Data Engineering, pages 58--67. IEEE, 1992.
....a long history of research on efficient evaluation of join operations in relational database systems in both sequential and parallel environments. Recently researchers have extended this work to efficient parallel evaluation of multi-join expressions R1 ⋈ ... ⋈ Rk of relations R1, ..., Rk [1, 2, 3, 4]. In particular, they have studied how such expressions can be evaluated in the minimum total time on different parallel machine models. The existing algorithms have used either intra-operation parallelism, inter-operation parallelism, or a combination of both. In intra-operation parallelism, each ....
P.S. Yu M.S. Chen and K.L. Wu. Scheduling and processor allocation for parallel execution of multi-join queries. Proc 8th Data Engineering Conf., 1992.
....on the number of user queries concurrently active, the number of drives and the sizes of the relations joined. 6.3 Related topics in secondary memory systems 6.3.1 Query scheduling Parallel and distributed database systems Query scheduling is common in parallel [AC88, BSCD91, HS93, Y. W95, Gra90, CYW92] and distributed database systems [S 96, CP84] Processing a plan tree accessing multiple relations each of which could be horizontally fragmented across many different sites raises many interesting scheduling issues, and a variety of algorithms have been proposed. The details of the ....
M. Chen, P.S. Yu, and K. Wu. Scheduling and processor allocation for parallel execution of multi-join queries. In Proc. International Conference on Data Engineering, pages 58--66, 1992.
No context found.
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, February 1992.
....and inter-operator parallelism was presented in [21], where a greedy scheme taking various join methods and their corresponding costs into consideration was proposed. A two-step approach to deal with join sequence scheduling and processor allocation for parallel query processing was devised in [6]. Several query plans in processing multi-join queries in a shared nothing architecture were investigated in [27]. In addition, experimental studies on evaluating various query plan generation strategies were conducted in [36]. Among various join methods, the hash join has been the focus of much ....
....of pipelined hash joins. Little effort was made to take processing power into consideration and optimize processor allocation. It has been shown that for sort merge joins, the execution of bushy trees can outperform that of linear trees, especially when the number of relations in a query is large [6]. However, as far as the hash join is concerned, the scheduling for an execution plan of a bushy tree structure is much more complicated than that of a right deep tree structure. Particularly, it is very difficult to achieve the synchronization required for the execution of bushy trees such that ....
[Article contains additional citation context not shown here]
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, February 1992.
....in the inter-operator level. Notice, however, that those two limiting factors have been phased out by the rapid increase in the capacity of multiprocessors and the trend for queries to become more complicated nowadays, thus justifying the necessity of exploiting inter-operator parallelism [4], [13], [14], [35]. Similarly to the study on intra-operator parallelism, to explore inter-operator parallelism, one has to consider the join methods employed. Among various join methods, the hash join has been elaborated upon by much research effort and reported to have superior performance to ....
....was proposed. In [10] the resource contention for parallel query execution is taken into consideration to incorporate the sources and deterrents of parallelism in the traditional execution space and minimize the response time subject to constraints on throughput. In addition, it has been shown in [4] that for sort merge joins, the execution of bushy trees can outperform that of linear trees, especially when the number of relations in a query is large. However, as far as the hash join is concerned, the scheduling for an execution plan of a bushy tree structure is much more complicated than ....
[Article contains additional citation context not shown here]
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, February 1992.
....task mix to be executed concurrently. For efficient solutions, only schemes that execute at most two tasks at a time were explored in [11]. A two-step approach to deal with join sequence scheduling and processor allocation for parallel query processing using sort merge joins was devised in [7]. Pipelining hash joins in a bushy tree and processor allocation within each pipeline were studied in [5] and [18] respectively. In addition, various query plans in processing multi-join queries in a shared nothing architecture were investigated in [24]. While most prior work on inter-operator ....
....at the compile time 2 . The join sequence numbers 1 Note that in dealing with a linear execution tree, one usually has only two joining relations residing in memory at a time, thus limiting the applicability of hash filters to the joining attribute. 2 Various heuristics, such as those in [7] and [20], can be applied to build a bushy execution tree. Note that assigning sequence numbers to joins while building a bushy tree involves little overhead. [Figure residue: relation R1(A, B) with its hash filter HFR1(B), and relation R2(B, C)] ....
[Article contains additional citation context not shown here]
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, February 1992.
....Relational operations are set oriented and this provides the query optimizer lots of flexibility in selecting the parallelizable access path. In relational database systems, joins are the most expensive operations to execute, especially with the increases in database size and query complexity [11, 27, 30, 39, 53]. For future database management, parallelism has been recognized as a solution for the efficient execution of multi-join queries [1, 17, 18, 25, 36, 42, 52, 54, 55]. As pointed out in [46], the methods to exploit parallelism in the execution of database operations in a multiprocessor system can ....
....until their operands generated by prior joins are available. Also, after a sequence of processor allocation and release, there might be a few processors left idle since they do not form a cluster large enough to execute any remaining join efficiently. This phenomenon is termed system fragmentation [11]. Clearly, execution dependency and system fragmentation, as well as the operational point selection, have to be taken into account for a better processor allocation strategy, thus complicating the minimization procedure for the query execution time. To deal with this problem, we propose and ....
[Article contains additional citation context not shown here]
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, February 1992.
....costs into consideration was proposed. A heuristic approach that deals with a query plan tree for effective resource assignments in a bottom-up manner was presented in [23]. A two-step approach to handle join sequence scheduling and processor allocation for parallel query processing was devised in [5]. A hierarchical approach was proposed in [27] to schedule the execution of multiple queries. In addition, various query plans in processing multi-join queries in a shared nothing architecture were studied in [19], [21]. Among various join methods, the hash join has been the focus of much research ....
....segmented right deep trees [4] and zigzag trees [28] resorted to simple heuristics to allocate processors to pipeline stages. It has been shown that for sort merge joins, the execution of bushy trees can outperform that of linear trees, especially when the number of relations in a query is large [5]. However, as far as the hash join is concerned, the scheduling for an execution plan of a bushy tree structure is much more complicated than that of a right deep tree structure. Particularly, it is very difficult to achieve the synchronization required for the execution of bushy trees such that ....
[Article contains additional citation context not shown here]
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, February 1992.
....in the inter-operator level. Notice, however, that those two limiting factors have been phased out by the rapid increase in the capacity of multiprocessors and the trend for queries to become more complicated nowadays, thus justifying the necessity of exploiting inter-operator parallelism [4], [9], [11], [17], [21]. Similarly to the study on intra-operator parallelism, to explore inter-operator parallelism, one has to consider the join methods employed. Among various join methods, the hash join has been elaborated upon by much research effort and reported to have superior performance to ....
....The bushy tree, on the other hand, offers more flexibility on query plan generation at the cost of searching a larger space. It has been shown that for sort merge joins, the execution of bushy trees can outperform that of linear trees, especially when the number of relations in a query is large [4]. However, as far as the hash join is concerned, the scheduling for an execution plan of a bushy tree structure is much more complicated than that of a right deep tree structure. Particularly, it is very difficult, if not impossible, to achieve the synchronization required for the execution of ....
[Article contains additional citation context not shown here]
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and processor allocation for parallel execution of multi-join queries. Proceedings of the 8th Intern 'l Conf. on Data Engineering, pages 58--67, Feb. 1992.
....task mix to be executed concurrently. For efficient solutions, only schemes that execute at most two tasks at a time were explored in [10]. A two-step approach to deal with join sequence scheduling and processor allocation for parallel query processing using sort merge joins was devised in [6]. Pipelining hash joins in a bushy tree and processor allocation within each pipeline were studied in [4] and [15] respectively. In addition, various query plans in processing multi-join queries in a shared-nothing architecture were investigated in [20]. While most prior work on inter-operator ....
....in the performance study in Section 4 is a multiprocessor system with distributed memories and shared disks. Barring any tuple placement skew, the scheme developed in this paper is applicable to the shared nothing architecture where each disk is accessible only by a single node. (Footnote 2: Various heuristics, such as those in [6] and [17], can be applied to build a bushy execution tree. Note that assigning sequence numbers to joins while building a bushy tree involves little overhead.) To facilitate our presentation and performance evaluation, the join method on which we shall demonstrate the ....
[Article contains additional citation context not shown here]
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, February 1992.
.... attention from both the academic and industrial communities because they can efficiently execute complex database operations [1], [5], [10], [17]. In relational database systems, joins are the most expensive operations to execute, especially with the increases in database size and query complexity [3], [14], [19]. Many applications usually need to specify the desired results in terms of multi-join queries, some of which may take hours or even days to complete. As a result, parallelism has been recognized as the only solution for the efficient execution of multi-join queries for future database ....
M.-S. Chen, P. S. Yu, and K.-L. Wu. Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proceedings of the 8th International Conference on Data Engineering, pages 58--67, February 1992.
No context found.
M.S. Chen, P.S. Yu, K.L. Wu, Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries, ICDE Conf., Tempe, 1992.
No context found.
CYW92 Chen, M.; Yu, P.; Wu, K. 1992: Scheduling and Processor Allocation for Parallel Execution of Multi-Join Queries. Proc. 8th IEEE Data Engineering Conference, 58-67.
No context found.
Chen, M-S., Yu, P.S., Wu, K-L., Scheduling and Processor Allocation For Parallel Execution Of Multi-Join Queries. IEEE 8 th Int. Conf. on Data Engg., 1992.