J. Hellerstein and J. Naughton. Query execution techniques for caching expensive methods. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 423--434, 1996.
....and efficiently executing queries involving dependent joins is highly applicable. A general-purpose query optimization algorithm in the presence of dependent joins is provided in [FLMS99]. A caching technique that can be applied to improve the implementation of dependent joins is discussed in [HN96]. Much of the research discussed in this section is either preliminary or complementary to WSQ/DSQ. To the best of our knowledge, no previous work has taken our approach of enabling a non-parallel database engine to support many concurrent calls to external sources during the execution of a ....
....table interface, traditional execution of queries involving WebCount or WebPages would be extremely slow due to many high-latency calls to one or more Web search engines. As mentioned in Section 2, [CDY95] proposes optimizations that can reduce the number of external calls, and caching techniques [HN96] are important for avoiding repeated external calls. But these approaches can only go so far: even after extensive optimization, a query involving WebCount or WebPages must issue some number of search engine calls. In many situations, the high latency of the search engine will dominate the ....
[Article contains additional citation context not shown here]
J.M. Hellerstein and J.F. Naughton. Query execution techniques for caching expensive methods. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 423--434, Montreal, Canada, June 1996.
....[Figure 1: Sample query on distributed data and programs (left) and one possible execution plan (right)] .... LowOzone=true is performed before the join on date (as we did in actions 1 and 2 above). However, as noted in [2, 14], if a cache is used for LowOzone at S3, it would be better to compute the join first, since LowOzone will be applied on 4 times fewer values of the image than it would if it were executed first. To cope with this observation, we present new cache-based algorithms to perform expensive selections ....
....are, while we consider programs that cannot be moved, thus addressing the extra difficulty of efficiently parallelizing data transfer and program execution. Hash- and sort-based caching algorithms for expensive functions have been studied in [14]; sort-based algorithms do not apply in our context since it would be a performance loss to stop the pipelined query execution for a sorting step. They propose a hybrid hash scheme to ensure the cache fits in memory. We envision using cache as soon as the cost of a program invocation is more ....
Joseph M. Hellerstein and Jeffrey F. Naughton. Query execution techniques for caching expensive methods. In Proc. of ACM SIGMOD Conf. on Management of Data, pages 423--434, 1996.
....are executed against the database as a particular workload, and to apply some of the existing techniques proposed in the literature to optimize a given workload. Such techniques have been considered in various contexts, such as view materialization [18, 8, 9, 3], index selection, function caching [12, 11, 5], multiple query optimization [17], and reusing query invariants [13, 16]. However, none of the above techniques exploit a key aspect of our context, namely the structure of the web site. The structure of a web site imposes a topology over the possible navigational paths through the site and ....
J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In SIGMOD, pages 423-434, 1996.
....possible. Here we assume that a QEP is a tree-like structure. The bottom part of a QEP consists of operations scanning stored database files. In general, the fewer (extra) sort operations, the better the performance. 5 Related Work. User Defined Functions have been widely studied in recent years ([7], [6], [5], [22], [21], [14], [10]). However, most of these efforts consider limited or non-parallel execution. For example, developers of IBM DB2 can specify ALLOW PARALLEL or DISALLOW PARALLEL for a user-defined scalar function [8]. Our approach allows parallel execution for more general UDFs. ....
J.M. Hellerstein and J.F. Naughton. Query execution techniques for caching expensive methods. In Proc. of ACM SIGMOD, 1996.
....the multi-operator implementations in the usual way to speed up processing further, i.e. by means of data parallelism and pipelining. 6. Related work. User Defined Functions (UDFs) have attracted increasing interest of researchers as well as the industry in recent years (see e.g. [1], [7], [15], [16], [24], [27], [32], [34], [38], [40]). However, most of the work discusses only the non-parallel execution of UDFs, special implementation techniques like caching, or it is directed towards query optimization for UDFs. In [34], pipeline parallelism for functions as well as intra-function ....
Hellerstein, J. M., Naughton, J. F.: Query Execution Techniques for Caching Expensive Methods. SIGMOD Conf. 1996: 423-434
....against the database as a particular workload, and to apply some of the existing techniques proposed in the literature to optimize a given workload. Such techniques have been considered in various contexts, such as view materialization [13, 23, 10, 12, 11, 6], index selection [7], function caching [16, 14], multiple query optimization [22], and reusing query invariants [17, 21]. However, none of the above techniques exploit a key aspect of our context, namely the structure of the web site. The structure of a web site imposes a topology over the possible navigational paths through the site and ....
....database queries that produce the content of HTML pages. In database systems, caching the result of parameterized computations has also been considered in several contexts, such as data integration [1], nested correlated queries (implemented in commercial databases), and caching for expensive methods [16, 14]. Our work takes the idea of caching further into the context of web site management: our decisions of what to cache are based on cost estimates, and we do not necessarily cache exactly the computation specified by the parameterized input, but possibly only parts of it or larger computations. In ....
J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In Proc. of ACM SIGMOD Conf. on Management of Data, pages 423--434, 1996.
....Phase 2 creates a new plan for processing the remainder of the query. Our work considers more generic reoptimization cases and allows user-defined trigger rules to exploit more reconfiguration opportunities. We note that user-defined operators have been widely studied in recent years ([10], [11], [12], [15], [18], [21], [22]). Yet most published work discusses only optimization before query evaluation starts. For example, the PREDATOR database system [22], which views the world as an integrated collection of data types, each of which supports a declarative, optimizable query language, ....
J.M. Hellerstein and J.F. Naughton. Query Execution Techniques for Caching Expensive Methods. In Proceedings of ACM SIGMOD, 1996.
....path expressions in object-oriented databases. Furthermore, several caching techniques to optimize the implementation of dependent joins are discussed in the context of the Montage system [3]. An elaborate implementation of dependent joins, which combines hashing and caching, has been proposed in [6]. Finally, we note that dependent joins have received several other names in the literature (e.g. functional join, implicit join, filter join, theta semi-join, bind join). Annotated plans: As mentioned earlier, our algorithm is going to search the space of annotated query execution plans. ....
....then any complete plan with open subplans results in the same execution as some plan without open subplans: in the example above, the two query plans have equivalent executions under the nested-loop implementation of dependent joins. However, more efficient implementations have been proposed [3, 6], for which this property no longer holds. For example, consider an implementation using caching techniques where every time the expression .... [Footnote 1: For atomic plans (plans on a single relation), Garlic considers one plan for every viable binding pattern.] .... S(y b ; z f ) Gamma . z T (z b ; w ....
J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In Proc. of ACM SIGMOD, 1996.
....possible. Here we assume that a QEP is a tree-like structure. The bottom part of a QEP consists of operations scanning stored database files. In general, the fewer (extra) sort operations, the better the performance. 5 Related Work. User Defined Functions have been widely studied in recent years ([7], [6], [5], [21], [20], [14], [10]). In [21], the PREDATOR database system, which views the world as an integrated collection of data types, each of which supports a declarative, optimizable query language, proposed optimizing queries in an ORDBMS with enhanced abstract data types [20]. However, most ....
J.M. Hellerstein and J.F. Naughton. Query execution techniques for caching expensive methods. In Proc. of ACM SIGMOD, 1996.
....Thus the three picked permutations are equivalent in their complexity to 1 3 2, 2 3 1, and 3 2 1, respectively. [Technical Report 98 1718. Department of Computer Science, Cornell University, November 1998.] .... issue was developed in [HS93]. The goal of recent work by Hellerstein and Naughton [HN97] is to optimize the execution of queries with expensive predicates by caching their arguments and results. The resulting technique, Hybrid Caching, is promising in the presence of repeated invocations of a predicate on the same arguments. Obtaining realistic estimates of the costs of user-defined ....
J. Hellerstein, J. Naughton. Query Execution Techniques for Caching Expensive Methods. ACM SIGMOD'97.
....If the outer references do not exist, we insert the values of the outer references and the corresponding subquery result into the hash table. If the probe succeeds, we can reuse the cached result instead of reexecuting the subquery. Recently, a new hybrid cache method has been suggested in [HN96] to avoid hash table thrashing. The second approach, described in [SAC+79], is to presort all the rows from the outer query block on the outer references before executing the subquery. Then in the subquery node, we simply cache the result of the last execution of the subquery. Since outer ....
....can expect better performance when using the invariant technique. In cases when there are duplicates in outer references and the outer references are not the primary ordering keys specified in the query, we can still use the above method, but we use the alternative hashing method (or hybrid cache [HN96]) to avoid the reexecution of the subquery on repeated outer reference values. Similar ideas have been proposed in other applications [JM96, HHW97]. To summarize our experiments, we have the following conclusion. The invariant technique helps when query rewriting is not applicable. As ....
Joseph M. Hellerstein and Jeffrey F. Naughton. Query execution techniques for caching expensive methods. In Proceedings of the ACM SIGMOD Conference, pages 423--434, 1996.
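The two caching strategies described in the excerpts above (probing a hash table keyed on the outer reference values, versus presorting the outer rows and remembering only the last subquery result) can be sketched roughly as follows. This is a minimal illustration, not code from any of the cited papers: the function names, the `subquery` callable, and the call counters are all hypothetical, and the hybrid cache of [HN96] is not implemented here.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def run_with_hash_cache(
    outer_refs: Iterable[Hashable],
    subquery: Callable[[Hashable], object],
) -> Tuple[List[object], int]:
    """Hash-based approach (hypothetical sketch): probe a table keyed on
    the outer reference value and execute the subquery only on a miss."""
    cache: Dict[Hashable, object] = {}
    results: List[object] = []
    calls = 0
    for key in outer_refs:
        if key not in cache:          # probe fails: execute and insert
            cache[key] = subquery(key)
            calls += 1
        results.append(cache[key])    # probe succeeds: reuse cached result
    return results, calls


def run_with_sorted_last_value(
    outer_refs: Iterable[Hashable],
    subquery: Callable[[Hashable], object],
) -> Tuple[List[object], int]:
    """Sort-based approach (hypothetical sketch): presort the outer rows on
    the outer reference, then keep only the most recent subquery result."""
    results: List[object] = []
    calls = 0
    have_last = False
    last_key = None
    last_value = None
    for key in sorted(outer_refs):
        if not have_last or key != last_key:   # new value: re-execute
            last_key, last_value = key, subquery(key)
            have_last = True
            calls += 1
        results.append(last_value)             # duplicate: reuse last result
    return results, calls


if __name__ == "__main__":
    outer = [3, 1, 3, 2, 1, 3]
    expensive = lambda k: k * 10   # stand-in for an expensive subquery
    print(run_with_hash_cache(outer, expensive))
    print(run_with_sorted_last_value(outer, expensive))
```

Both variants execute the subquery once per distinct outer reference value. The hash-based one preserves the input order but needs memory proportional to the number of distinct values, while the sort-based one needs constant cache space but changes the row order and requires a sort step.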
....query processing. This area is starting to receive the attention it deserves. A number of conventional relational query processing approaches have been applied to or extended for answering OLAP queries. Some of this work has concentrated on efficiently performing GROUP BY [8, 9, 20], aggregation [10, 23, 33, 30, 50, 68, 69], join or range queries [32, 60, 64], or supporting incomplete query answers [6, 29, 66]. Several approaches have been proposed for supporting the SQL CUBE operator, including [2, 17, 23, 42, 53, 58]. Yet another facet of query processing that has received attention in the literature is that of ....
J.M. Hellerstein and J.F. Naughton. Query execution techniques for caching expensive methods. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, pages 423--434, Montreal, Quebec, Canada, June 1996.
....a suboptimal query plan due to data arrival delays from the network. Compared to that, our work considers more generic cases and provides methods for context capturing and restoring other than intermediate results materialization. User Defined Functions have been widely studied in recent years ([10], [9], [8], [18], [17], [15], [13]). Yet most published work focuses on the optimization before the query evaluation. [13] studied UDF execution using data parallelism. It is interesting to compare our classification for reconfiguration to their classification for parallelization. Similar ....
J.M. Hellerstein and J.F. Naughton. Query execution techniques for caching expensive methods. In Proc. of ACM SIGMOD, 1996.
....Phase 2 creates a new plan for processing the remainder of the query. Our work considers more generic reconfiguration cases and provides methods for context capturing and restoring other than intermediate results materialization. User Defined Functions have been widely studied in recent years ([11], [10], [9], [21], [20], [19], [15]). Yet most published work discusses only optimization before query evaluation starts. In [21], the PREDATOR database system, which views the world as an integrated collection of data types, each of which supports a declarative, optimizable query language, proposed ....
J.M. Hellerstein and J.F. Naughton. Query execution techniques for caching expensive methods. In Proceedings of ACM SIGMOD, pages 423--434, 1996.
No context found.
J. M. Hellerstein and J. Naughton. Query Execution Techniques for Caching Expensive Methods. In SIGMOD, 1996.
....index joins with remote sources is in an asynchronous fashion as described in [GW00], requiring a SteM on S (a rendezvous buffer) to hold S tuples pending matches from the index. In order to minimize latency, a SteM on T should also be built, as a cache of previous expensive T lookups, as in [HN96]. This dataflow looks almost identical to Figure 2, except that S probes are also routed from the Eddy directly to an index access method on T. The two plans described above can be combined into a single nearly identical plan that contains one Eddy, two SteMs, and both access methods on T. In ....
Hellerstein, J., Naughton, J., Query Execution Techniques for Caching Expensive Methods. In SIGMOD(1996).
....A pipelining best-effort reorder operator appears to be substitutable for regular sort operators at other places in query plans. For instance, it can replace a sort operator that is designed to reuse memoized values of a correlated subquery or expensive user-defined function [Selinger et al. 1979; Hellerstein and Naughton 1996; Seshadri et al. 1996]. Here, online reordering amounts to computing the set of variable bindings on the fly, possibly with some duplication. Acknowledgments: We would like to thank all the members of the CONTROL project at Berkeley for many useful discussions. Mehul Shah helped clarifying the ....
Hellerstein, J. M. and Naughton, J. Query execution techniques for caching expensive methods. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, 1996.
....spreadsheet problem. A pipelining best-effort reorder operator appears to be substitutable for regular sort operators at other places in query plans. For instance, it can replace a sort operator that is designed to reuse memoized values of a correlated subquery or expensive user-defined function [HN96, S+96]. Here, online reordering amounts to computing the set of variable bindings on the fly, possibly with some duplication. Acknowledgments: We would like to thank all the members of the CONTROL project at Berkeley for many useful discussions. The idea of using reordering in general query ....
J. M. Hellerstein and J. Naughton. Query execution techniques for caching expensive methods. In SIGMOD, 1996.
....for example, or in disk-based systems that store small amounts of data on which very complex operations are performed. Even in such systems, however, PullUp can produce very poor plans if join selectivities are greater than 1. This problem can be avoided by using method caching, as described in [Hellerstein and Naughton 1996]. Query 2 (Figure 5) is the same as Query 1, except T10 is used instead of T2. This minor change causes PullUp to choose a suboptimal plan. Recall that table names reflect the relative cardinality of the tables, so in this case T10.ua1 has more .... Query 3: SELECT T3.a1 ....
....on T2 above the costly join predicate. The result of this was that the costly join predicate had to be evaluated on all tuples in the Cartesian product of T4 and the subtree containing T2, T1, and T3. This extremely bad plan required significant effort in caching at execution time (as discussed in [Hellerstein and Naughton 1996]) to avoid calling the costly selection multiple times on the same input. .... significant, however, was the effort required to debug, test, and tune the code so that it was robust enough for use in a commercial product. Of the three months spent on the Illustra version of ....
[Article contains additional citation context not shown here]
Hellerstein, J. M. and Naughton, J. F. 1996. Query Execution Techniques for Caching Expensive Methods. In Proc. ACM-SIGMOD International Conference on Management of Data (Montreal, June 1996), pp. 423--434.
....to thrash. This problem is alleviated by using unary Hybrid Hashing [Bra84]. It may be expected that the number of distinct groups in a query should be relatively small, and hence naive hashing may be acceptable in many cases. A recent optimization of unary Hybrid Hashing called Hybrid Cache [HN96] guarantees performance that is equivalent to naive hashing for the cases where the hash table fits in memory, and scales gracefully when the hash table grows too large. SQL supports aggregates of the form aggregate(DISTINCT columns). For such aggregates, the system must remove duplicates from ....
J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In Proc. ACM SIGMOD Intl. Conf. Management of Data, Montreal, June 1996, pages 423--434.
No context found.
J. Hellerstein and J. Naughton. Query execution techniques for caching expensive methods. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 423--434, 1996.
No context found.
Joseph M. Hellerstein and Jeffrey F. Naughton. Query execution techniques for caching expensive methods. In ACM SIGMOD, 1996.
No context found.
J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In SIGMOD, 1996.
No context found.
J. M. Hellerstein and J. F. Naughton. Query execution techniques for caching expensive methods. In ACM SIGMOD International Conference on Management of Data, ACM SIGMOD Record, Vol.25, No.2, pages 423--434, 1996.