S. Acharya, P.B. Gibbons, V. Poosala, and S. Ramaswamy (1999), Join synopses for approximate query answering, Proc. of the ACM SIGMOD Conference.
....the length of the segments they join. 5.8 Definition of cmax_i and cmin_i for computing MBRs. 5.9 The M regions associated with a 2M-dimensional MBR. The boundary of a region G is denoted by G = G[1], G[2], G[3], G[4]. 5.10 Computation of MINDIST. 5.11 The time taken (in seconds) to build an index using various transformations over a range of query lengths and ....
....concurrency control in multidimensional AMs. 2.8 Approximate Query Answering Techniques. Approximate query processing has recently emerged as a viable, cost-effective solution for dealing with the huge data volumes and stringent response time requirements of today's Decision Support Systems (DSS) [1, 51, 53, 61, 64, 70, 115, 144, 145]. The general approach is to first construct compact synopses of the interesting relations in the database (using a data reduction technique) and then answer the user queries by using just the synopsis. [Figure 2.10: Data reduction techniques for approximate query answering.] Data reduction ....
[Article contains additional citation context not shown here]
Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. "Join Synopses for Approximate Query Answering". In Proceedings of the 1999.
....and guide the subsequent clustering effort. In data mining, Toivonen [Toi96] examined the problem of using sampling during the discovery of association rules. Sampling has also been recently successfully applied in query optimization [CMN99, CMN98] as well as approximate query answering [GM98, AGPR99]. Independently, Palmer and Faloutsos [PF00] developed an algorithm to sample for clusters by using density information, under the assumption that clusters have a Zipfian distribution. Their technique is designed to find clusters when they differ a lot in size and density, and there is no noise. ....
S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses For Approximate Query Answering. Proceedings of SIGMOD, pages 275-286, June 1999.
.... 1 Introduction. Maintaining compact and accurate statistics on data distributions is of crucial importance for a number of tasks: 1) traditional query optimization that aims to find a good execution plan for a given query [5, 21], 2) approximate query answering and initial data exploration [13, 1, 18, 4, 12], 3) prediction of run times and result sizes of complex data extraction and data analysis tasks on data mining platforms, where absolute predictions with decent accuracy are mandatory for prioritization and scheduling of long-running tasks (sometimes including the decision whether a given data ....
....for a single data distribution such as one database table with pre-selected relevant attributes. The equally important problem of which combination of synopses to maintain on the application's various datasets and how to divide the available memory between them has received only little attention [1, 8, 23], putting the burden of selecting and tuning appropriate synopses on the database administrator. This creates a physical design problem for data synopses, which can be very difficult in advanced settings such as predicting run times of data analysis tasks or information wealth of Web sources by a ....
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. In Proceedings of the ACM SIGMOD Conference, pages 275--286. ACM Press, 1999.
....of processing general, possibly multi-join, aggregate queries over continuous data streams. On the other hand, efficient approximate multi-join processing has received considerable attention in the context of approximate query answering, a very active area of database research in recent years [1, 6, 12, 19, 20, 24]. The vast majority of existing proposals, however, rely on the assumption of a static data set which enables either several passes over the data to construct effective, multi-dimensional data synopses (e.g., histograms [20] and Haar wavelets [6, 24]) or intelligent strategies for randomizing the ....
....approximate query processing tools inapplicable in a data stream setting. (Note that, even though random sample data summaries can be easily constructed in a single pass [23], it is well known that such summaries typically give very poor result estimates for queries involving one or more joins [1, 6, 2].) Our Contributions. In this paper, we tackle the hard technical problems involved in the approximate processing of complex (possibly multi-join) aggregate decision support queries over continuous data streams with limited memory. Our approach is based on randomizing techniques that compute ....
[Article contains additional citation context not shown here]
S. Acharya, P.B. Gibbons, V. Poosala, and S. Ramaswamy. "Join Synopses for Approximate Query Answering". In Proc. of the 1999.
....6] Much less work has been done on estimating the selectivity of joins. Commercial DBMSs commonly make the uniform join assumption. One approach that has been suggested is based on random sampling: randomly sample the two tables, and compute their join. This approach is flawed in several ways [1], and some work has been devoted to alternative approaches that generate samples in a more targeted way [20]. An alternative recent approach is the work of Acharya et al. [1] on join synopses, which maintains statistics for a few distinguished joins. To our knowledge, no work has been done on ....
....is based on random sampling: randomly sample the two tables, and compute their join. This approach is flawed in several ways [1], and some work has been devoted to alternative approaches that generate samples in a more targeted way [20]. An alternative recent approach is the work of Acharya et al. [1] on join synopses, which maintains statistics for a few distinguished joins. To our knowledge, no work has been done on approaches that support selectivity estimation for queries containing both select and join operations in real-world domains. In this paper, we propose an alternative approach ....
S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD. ACM Press, 1999.
....this architecture to derive the benefits that the architecture provides, while at the same time addressing some of its limitations. One of the important limitations addressed in our work is their assumption that there is little variability in the data. Acharya, Gibbons, Poosala, and Ramaswamy [2] proposed the use of synopses (i.e. precomputed samples of relations) for answering aggregation queries. Gibbons and Matias [9] developed techniques for the fast incremental maintenance of summary statistics, and considered their application to providing approximate query answers. A key ....
Acharya S., Gibbons P., Poosala V., and Ramaswamy S. Join synopses for approximate query answering. In Proceedings of the ACM SIGMOD Conference, 1999.
....the work of Faloutsos et al. in multiple dimensions. Maximum entropy has also been used for the identification of interesting correlations in data [Tho98]. There exists a sizeable bibliography in histogramming techniques and approximate query answering [IP95, PIHS96, JKM 98, VWI98, AGPR99, SFB99, BS97, BW00]. Our approach is fundamentally different. Previous work focused on the problem of data reconstruction by constructing specialized summarized representations (typically histograms) of the data. We argue that, since data are already stored in an aggregated form in the ....
Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. Join Synopses for Approximate Query Answering. In ACM SIGMOD, pages 275--286, Philadelphia, PA, USA, June 1999.
....1. INTRODUCTION. Histograms capture distribution statistics in a space-efficient fashion. They have been designed to work well for numeric value domains, and have long been used to support cost-based query optimization [22, 11, 12, 25, 27, 26, 23, 14, 13, 15, 20, 17], approximate query answering [7, 2, 1, 29, 28, 24], data mining [16], and map simplification [3]. Query optimization is a problem of central interest to database systems. A database query is translated by a parser into a tree of physical database operators (denoting the dependencies between operators) that have to be executed and form the query ....
....is that for very large data sets on which execution of complex queries is time consuming, it is much better to provide a fast approximate answer. This is very useful for quick and approximate analysis of large data sets [2]. Research has been conducted on the construction of histograms for this task [7, 1] as well as efficient approximations of datacubes [8] via histograms [28, 29, 24]. An additional application of histograms is data mining of large time series datasets. Histograms are an alternate way to compress time series information. Through the application of the minimum description length ....
S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses For Approximate Query Answering. Proceedings of ACM SIGMOD, pages 275-286, June 1999.
.... [27, 14, 17], sampling [8, 16, 12], and parametric curve fitting techniques [32, 10], all the way to highly sophisticated methods based on kernel estimators [7], wavelets and other transforms [21, 20] for range selectivity estimation, methods based on sampling for approximation of foreign key joins [3], and other data reduction techniques [6]. Gibbons et al. have coined the term data synopsis as the general concept that abstracts from the variety of all these different representations [13]. All the prior work takes a local viewpoint in that they aim at the best possible synopsis for a single ....
....result of an equijoin between two tables with given frequency and density synopses for the two join attributes. However, the join over two approximated data distributions is often very different from an approximate representation (e.g., a sample) of a join of the underlying complete distributions [8, 3]. Therefore we extend our repertoire of data synopses and adopt the idea of [3] to capture important join result distributions in a special class of join synopses. Because the main difficulty in estimating join-result sizes lies in the accurate approximation of the density distribution (whereas the ....
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. In Proceedings of the ACM SIGMOD Conference, pages 275--286. ACM Press, 1999.
....employ uniform random sampling to improve the efficiency of the techniques. Toivonen [28] examined the problem of using sampling during the discovery of association rules. Sampling has also been recently successfully applied in query optimization [6, 5] as well as approximate query answering [7, 1]. The focus of this paper is in the design of alternative forms of sampling that can be used to expedite data mining tasks such as cluster and outlier detection. A natural question arises regarding the potential benefits of such sampling techniques. In other words, why is uniform random sampling ....
S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses For Approximate Query Answering. In Proc. of SIGMOD, pages 275--286, June 1999.
....the ability to focus their explorations quickly and effectively, without consuming inordinate amounts of valuable system resources. Prior Work. 1 The strong incentive for approximate answers has spurred a flurry of research activity on approximate query processing techniques in recent years [1, 5, 6, 11, 17, 18]. The majority of the proposed techniques, however, have been somewhat limited in their query processing scope, typically focusing on specific forms of aggregate queries. Besides the range of queries, another crucial aspect of an approximate query processing technique is the employed data ....
....two inherent limitations that restrict its applicability as an approximate query processing tool. First, a join operator applied on two uniform random samples results in a non-uniform sample of the join result that typically contains very few tuples, even when the join selectivity is fairly high [1]. Thus, join operations typically lead to significant degradations in the quality of an approximate aggregate. Join synopses [1] provide a solution, but only for foreign-key joins that are known beforehand; that is, they cannot support arbitrary join queries over any schema. Second, for a ....
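The effect described in the excerpt above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two-table schema, the table sizes, the 1% sampling fraction, and the helper names `uniform_sample` and `fk_join` are all hypothetical, and the second variant only mimics the join-synopsis idea (sample the referencing table, then join it with the complete referenced table); it is not the construction from the cited paper.

```python
import random

def uniform_sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`."""
    return [r for r in rows if rng.random() < fraction]

def fk_join(fact_rows, dim_rows):
    """Join fact rows (fk, value) with dimension rows (key, attr) on fk == key."""
    dim_index = {key: attr for key, attr in dim_rows}
    return [(fk, value, dim_index[fk]) for fk, value in fact_rows if fk in dim_index]

rng = random.Random(0)

# Hypothetical toy data: every fact row references exactly one dimension row.
dim = [(k, f"attr{k}") for k in range(1000)]
fact = [(rng.randrange(1000), 1.0) for _ in range(100_000)]

fraction = 0.01

# (a) Join of two independent uniform samples: a fact row survives only if
#     its dimension row was also sampled, so few join tuples remain.
join_of_samples = fk_join(uniform_sample(fact, fraction, rng),
                          uniform_sample(dim, fraction, rng))

# (b) Join-synopsis style: sample the fact table only, then join the sample
#     with the full dimension table, so every sampled fact row finds its match.
fact_sample = uniform_sample(fact, fraction, rng)
join_synopsis = fk_join(fact_sample, dim)

print("join of samples:", len(join_of_samples))
print("join synopsis:  ", len(join_synopsis))
```

With a sampling fraction f on both sides, variant (a) retains roughly f squared of the join result, whereas variant (b) retains roughly f of it and is a uniform sample of the foreign-key join.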
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. "Join Synopses for Approximate Query Answering". In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data.
....times to aggregate queries. However, very large database sizes may not allow true interactivity despite careful design and development of an OLAP system. Approximate query answering (AQUA) systems are being developed with a goal to reduce response times to true levels of interactivity [VL93, AGPR99b, CMN99, PG99, SFB99, MS00]. That most decision support applications can tolerate approximate answers to queries is exploited by AQUA systems to achieve truly interactive response times. Several approximate query answering approaches for different data domains have been proposed: sampling-based ....
....CMN99, PG99, SFB99, MS00]. That most decision support applications can tolerate approximate answers to queries is exploited by AQUA systems to achieve truly interactive response times. Several approximate query answering approaches for different data domains have been proposed: sampling-based [AGPR99b, AGPR99a], histogram-based [PG99], clustering-based [SFB99], probabilistic [MS00], and wavelet-based [VW99] approaches. Both histogram-based and wavelet-based approaches assume that attributes of underlying relations are numerical. The sampling-based approach does not require such assumptions. ....
[Article contains additional citation context not shown here]
S. Acharya, P.B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 275--286, Philadelphia, PA, June 1999.
....is: $m[j, a(p,j)] \le p_j < m[j, a(p,j)+1]$. A bit string of length $b = \sum_{j=0}^{d-1} b_j$ represents each cell. Such a bit string is the concatenation of the bit strings of the interval numbers of the cell. Finally, the approximation of p is the bit string of the cell that contains p. [Figure 1: Illustration of the VA File, panels a) and b).] Notice that for ....
....$m[j, a(p,j)] \le p_j < m[j, a(p,j)+1]$. A bit string of length $b = \sum_{j=0}^{d-1} b_j$ represents each cell. Such a bit string is the concatenation of the bit strings of the interval numbers of the cell. Finally, the approximation of p is the bit string of the cell that contains p. [Figure 1: Illustration of the VA File, panels a) and b).] Notice that for large d, ....
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. ACM SIGMOD Int. Conf. Management of Data, 1--3 June 1999.
....subset S of size b of the tuples in R. Given a query Q, the selectivity sel(S; Q) is computed. The value $\frac{n}{b}\, sel(S; Q)$ is used to estimate sel(R; Q). Sampling is simple and efficient, and so it is widely used for estimating the selectivity [30, 6, 10, 21, 25] or for on-line aggregation [14, 2]. Sampling can be used to estimate the selectivity of a query regardless of the dimensionality of the space, and can directly be applied to real domains. More sophisticated kernel estimation statistical techniques [36, 7, 32] have rarely been applied in database problems. One similar statistical ....
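The sampling estimator mentioned in this excerpt can be sketched as follows. The sketch assumes that sel(.; Q) denotes the number of tuples satisfying Q, that n is the size of R, and that b is the sample size, so that the sample count is scaled up by n / b; the excerpt does not spell these definitions out. The function name `estimate_count`, the synthetic two-dimensional data, and the range predicate are hypothetical.

```python
import random

def estimate_count(relation, predicate, sample_size, rng):
    """Estimate how many tuples of `relation` satisfy `predicate`
    from a uniform random sample drawn without replacement."""
    n = len(relation)
    sample = rng.sample(relation, sample_size)
    matches_in_sample = sum(1 for t in sample if predicate(t))
    # Scale the sample count by n / b (assumed reading of the excerpt).
    return (n / sample_size) * matches_in_sample

rng = random.Random(42)

# Hypothetical relation: 50,000 points drawn uniformly from the unit square.
relation = [(rng.random(), rng.random()) for _ in range(50_000)]

def in_range(t):
    """Hypothetical two-dimensional range query Q."""
    return 0.2 <= t[0] < 0.5 and 0.1 <= t[1] < 0.9

exact = sum(1 for t in relation if in_range(t))
approx = estimate_count(relation, in_range, 2_000, rng)
print("exact:", exact, "estimate:", round(approx))
```

The estimate is unbiased for uniform sampling, and the same code works unchanged for any number of dimensions because only the predicate inspects the tuple.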
S. Acharya, P.B. Gibbons, V. Poosala and S. Ramaswamy. Join Synopses for Approximate Query Answering. Proc. of the 1999 ACM SIGMOD, pp. 275-286, June 1999.
....for most practical applications, they do not fulfill our requirements of error bounds for approximations and of efficient updates. Another direction of research aims at using histograms for fast approximate answers [24]. Histograms are an efficient technique to summarize data. Systems like AQUA [1] use small pre-computed statistics (called synopses [11], which usually are histograms and samples) to provide fast approximate answers. Synopses can be used to answer queries. The AQUA system, for instance, returns fast answers with probabilistic error bounds [1] (confidence intervals, no ....
....summarize data. Systems like AQUA [1] use small pre-computed statistics (called synopses [11], which usually are histograms and samples) to provide fast approximate answers. Synopses can be used to answer queries. The AQUA system, for instance, returns fast answers with probabilistic error bounds [1] (confidence intervals, no absolute bounds). Up to a certain degree the user can theoretically trade off accuracy for faster responses by allowing smaller samples to be used. The accuracy of the query response is limited by the information stored in the histograms and samples. For further ....
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. ACM SIGMOD, 1999.
....for most practical applications, they do not fulfill our requirements for error bounds for approximations and efficient updates. Another direction of research aims at using histograms for fast approximate answers [23]. Histograms are an efficient technique to summarize data. Systems like AQUA [1] use small pre-computed statistics (called synopses [11], which usually are histograms and samples) to provide fast approximate answers. Synopses can be used to answer range queries. Since they approximate data distributions, it is not possible to get the exact answer to a query without accessing ....
....Synopses can be used to answer range queries. Since they approximate data distributions, it is not possible to get the exact answer to a query without accessing additional data (in the data sources). The AQUA system, for instance, provides fast answers with probabilistic error confidence bounds [1] (no absolute bounds). Up to a certain degree the user can theoretically trade off accuracy for faster responses by allowing smaller samples to be used. The accuracy of the query response is limited by the information stored in the histograms and samples. For further refinements (up to the exact ....
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. SIGMOD, 1999.
....is materialized (e.g., the base data) and the rest is computed on demand. These techniques, however, can impose long delays in answering queries. These limitations have prompted researchers to look for techniques to compress the datacube in such a way that only a fraction of the space is needed [9, 8, 1, 21, 24]. Since the compression techniques are lossy, one can only provide approximate answers to the queries posed to the datacube. On the other hand, the queries can be answered without incurring much disk I/O, so the response time is considerably smaller than the one experienced in uncompressed ....
....of the distribution of the data. When answering queries, the system can use the models and the retained cells to give an answer with a guaranteed maximum error level attached to it. Parametric methods to compress datacubes have an advantage over other techniques (such as the ones described in [1, 24]): the parameters computed describe the data accurately and can serve as a good basis to mine important conclusions about the underlying distribution of data. The structure of the model describes the patterns of interaction. Moreover, one can immediately know which dimension (or combinations of ....
[Article contains additional citation context not shown here]
S. Acharya, P.B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. In Proceedings of the 1999 ACM-SIGMOD International Conference on Management of Data, Philadelphia, PA, June 1999.
....[BDF 97] for a recent survey. [GM99] presented a formal framework for evaluating such sublinear space synopsis data structures, and a survey of some of the results in this area. There has been a flurry of recent work in approximate query answering (e.g., [VL93, Olk93, BDF 97, HHW97, GM98, AGPR99, HH99, VW99, IP99, AGP00, GLR00, CCMN00, CGRS00, MVW00, CDN01, LM01, Gib01, GKS01]). The work in [HHW97, AGPR99, HH99, IP99, CGRS00] looked at the problem of providing approximate answers to queries seeking aggregates (e.g., count, sum, avg) of attribute values for the tuples satisfying a ....
....data structures, and a survey of some of the results in this area. There has been a flurry of recent work in approximate query answering (e.g., [VL93, Olk93, BDF 97, HHW97, GM98, AGPR99, HH99, VW99, IP99, AGP00, GLR00, CCMN00, CGRS00, MVW00, CDN01, LM01, Gib01, GKS01]). The work in [HHW97, AGPR99, HH99, IP99, CGRS00] looked at the problem of providing approximate answers to queries seeking aggregates (e.g., count, sum, avg) of attribute values for the tuples satisfying a predicate that occur in the join of multiple relations. The count aggregate (over joins but with no other predicates) ....
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 275--286, June 1999.
....queries. Approximate query answering is becoming an indispensable means for providing fast response times to decision support queries over large data warehouses. Fast, approximate answers are often provided from small synopses of the data (such as samples, histograms, wavelet decompositions, etc.) [14, 37, 3, 25, 33, 36, 1, 6, 12, 8]. Commercial data warehouses are approaching 100 terabytes, and new decision support arenas such as click stream analysis and IP traffic analysis only increase the demand for high speed query processing over terabytes of data. Thus it is crucial to provide highly accurate approximate answers to an ....
....approximate answers to an increasingly rich set of queries. Distinct values queries are an important class of decision support queries, and good quality estimates for such queries may be returned to users as part of an online aggregation system [20, 17] or an approximate query answering system [14, 37, 2, 3, 25, 33, 36, 1, 6, 12, 8, 26]. Because the answers are returned to the users, the estimates must be highly accurate (say within 10% or better with 95% confi.... [Figure 1: Distinct Values Query template: select count(distinct target_attr) from rel where P] select count(distinct o_custkey) from orders where o_orderdate = ....
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 275--286, June 1999.
....to the function being estimated, e.g., the number of occurrences of the label within the stream. We could also store e(i) if desired. Maintaining a distinct labels sample in the presence of new data is useful for approximate query answering systems for data warehouses, such as the Aqua system [1, 2]. Average interarrival gap. Our relative error approximation of U permits a relative error approximation of the average interarrival gap I in the union of multiple streams, for the common case where time is discretized, i.e., I = (number of time slots) / (size of union). Set resemblance. The set ....
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 275--286, June 1999.
....no accumulation value, and the hash table is used merely to ensure that each distinct label is stored only once in the sample at a party. Maintaining a distinct labels sample in the presence of new data is useful for approximate query answering systems for data warehouses, such as the Aqua system [AGPR99a, AGPR99b]. As discussed in Section 1, industry benchmarks have many queries and reports over distinct values. Average interarrival gap. Our relative error approximation of U permits a relative error approximation of the average interarrival gap G in the union of multiple streams, for the common case where ....
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 275--286, June 1999.
....efficiently and with minimal overheads (Section 8). Previous work related to approximate query answering is presented in Section 9. Due to limited space, we omit the proofs of all theoretical results from this paper and refer the reader to a full version of this paper for all the details [AGPR99b]. The research in this paper was conducted as part of our efforts to develop an efficient decision support system based on approximate query answering, called Aqua [GMP97a]. A brief introduction of Aqua is presented in the next section. 2 The Aqua System. The goal of Aqua is to improve response ....
....uses them to answer queries. Currently, these statistics take the form of various types of samples and histograms on the data in the data warehouse. A key feature of Aqua is that the system provides probabilistic error confidence bounds on the answer, based on the Hoeffding and Chebychev formulas [AGPR99b]. Currently, the system handles arbitrarily complex SQL queries applying aggregate operations (avg, sum, count, etc.) over the data in the warehouse. Aqua has three key components: Statistics Collection: This component of Aqua is responsible for collecting all the synopses which Aqua uses to ....
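For readers unfamiliar with the two inequalities named in the excerpt, the following sketch shows the textbook forms applied to a sample mean. It is not taken from the cited paper and does not claim to reproduce the bounds Aqua actually computes; the function names and the example numbers are hypothetical, and the sample values are assumed to be independent.

```python
import math

def hoeffding_half_width(n, lo, hi, delta):
    """Half-width eps such that P(|sample mean - true mean| >= eps) <= delta,
    for n independent values known to lie in [lo, hi] (Hoeffding's inequality)."""
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def chebyshev_half_width(n, variance, delta):
    """Half-width eps such that P(|sample mean - true mean| >= eps) <= delta,
    for n independent values with the given variance (Chebyshev's inequality)."""
    return math.sqrt(variance / (n * delta))

# Hypothetical setting: 10,000 sampled values in [0, 100] with variance 400,
# and a 90% confidence level (delta = 0.10).
n, delta = 10_000, 0.10
print("Hoeffding half-width:", hoeffding_half_width(n, 0.0, 100.0, delta))
print("Chebyshev half-width:", chebyshev_half_width(n, 400.0, delta))
```

Hoeffding needs only the value range, while Chebyshev needs a variance; both half-widths shrink in proportion to one over the square root of the sample size, which is the accuracy versus sample size trade-off the excerpts refer to.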
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. Technical report, Bell Laboratories, Murray Hill, New Jersey, 1999. Full version of the paper appearing in SIGMOD'99.
....size is expected to exceed 400GB by the year 2000 and that a single decision process may involve more than ten fairly complex queries. A novel approach to address this problem, which has been receiving attention lately, is to provide approximate answers to the queries very quickly [HHW97, AGPR99, VW99, IP99]. This approach is particularly attractive for large scale and exploratory applications such as OLAP. For example, a typical decision making process involves posing several preliminary queries to identify interesting regions of the data. For these queries, precise answers are often ....
....to the user. Because these statistics are typically much smaller in size, the query is processed very quickly. The statistics may either be generated on the fly after the query is posed, as in the Online Aggregation approach [HHW97], or may be precomputed a priori, as in the Aqua system [AGPR99] we have developed. A popular technique for summarizing data is taking samples of the original data. In fact, this is the fundamental technique used by both the above mentioned approaches to approximate query answering. In particular, uniform random sampling, in which every item in the original ....
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 275--286, June 1999.
....with few or no accesses to the original data. 3 Aqua Technical Results and Operational Details. There are several technical problems arising in answering approximate queries. We have identified and solved a few of them, and incorporated the solutions into Aqua. Many of these results appear in [GMP97, GM98, AGPR99]. The key features of Aqua are as follows: • Novel incremental maintenance techniques for keeping histograms and samples up to date in the presence of database updates. • Improved error bounds based on a novel subsampling scheme. • Strategies for allocating space among various summary ....
....address this handicap. Incorporation of these techniques into Aqua has shown their utility in making group-by queries significantly more accurate in practice. As an illustration of query processing in Aqua, we present a key component, the query rewrite process, using a simple example (details in [AGPR99]). Figure 2 gives an example of this rewriting that takes into account join synopses. The query is based on the schema for the TPC-D benchmark. When the query is submitted to Aqua, it identifies the join being computed in the query and rewrites the query to refer to the appropriate join synopsis. ....
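The rewriting step described in this excerpt can be sketched with an in-memory SQLite database. Everything concrete here is an assumption made for illustration: the two-table schema (loosely modeled on TPC-style `orders` and `lineitem` names), the synopsis table name `js_lineitem_orders`, the 5% sampling fraction, and the hand-written rewritten query. The sketch does not reproduce Figure 2 of the cited paper or Aqua's actual rewriter.

```python
import random
import sqlite3

rng = random.Random(1)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (o_orderkey INTEGER PRIMARY KEY, o_priority TEXT);
    CREATE TABLE lineitem (l_orderkey INTEGER REFERENCES orders, l_price REAL);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(k, rng.choice(["HIGH", "LOW"])) for k in range(2_000)])
conn.executemany("INSERT INTO lineitem VALUES (?, ?)",
                 [(rng.randrange(2_000), rng.uniform(1, 100)) for _ in range(40_000)])

# Precompute the synopsis: a uniform sample of lineitem joined with orders,
# stored as an ordinary relation in the same database.
SAMPLE_FRACTION = 0.05  # hypothetical sampling rate
sampled = [row for row in conn.execute("SELECT l_orderkey, l_price FROM lineitem")
           if rng.random() < SAMPLE_FRACTION]
conn.execute("CREATE TABLE lineitem_sample (l_orderkey INTEGER, l_price REAL)")
conn.executemany("INSERT INTO lineitem_sample VALUES (?, ?)", sampled)
conn.execute("""
    CREATE TABLE js_lineitem_orders AS
    SELECT s.l_price, o.o_priority
    FROM lineitem_sample AS s JOIN orders AS o ON s.l_orderkey = o.o_orderkey
""")

# Query as posed on the original relations.
original = """
    SELECT o.o_priority, SUM(l.l_price)
    FROM lineitem AS l JOIN orders AS o ON l.l_orderkey = o.o_orderkey
    GROUP BY o.o_priority ORDER BY o.o_priority
"""
# Rewritten query: the join is replaced by the synopsis relation and the
# SUM is scaled up by the inverse of the sampling fraction.
rewritten = f"""
    SELECT o_priority, SUM(l_price) / {SAMPLE_FRACTION}
    FROM js_lineitem_orders
    GROUP BY o_priority ORDER BY o_priority
"""
exact = dict(conn.execute(original).fetchall())
approx = dict(conn.execute(rewritten).fetchall())
print("exact: ", exact)
print("approx:", approx)
```

Because the synopsis is just another table, the rewritten query runs on the unmodified database engine, which is the middleware arrangement the excerpts describe.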
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. of ACM SIGMOD Conf, June 1999.
.... approach constantly refines the answer using larger and larger sets of statistics of the data until the accurate answer is obtained [8] whereas the precomputation approach presents a small number of discrete approximate answers (typically, just one) by using precomputed summaries of the data [1]. The precomputation approach has the advantage of being faster in providing an answer because only the small summary data has to be processed at run time; on the other hand online aggregation has the flexibility of refining the answers. However, it is possible to deploy Copyright 1999 IEEE. ....
....set of histograms on the data cube when an error bound is given. Due to space limitations, we omit our experimental study from this article, and refer the reader to other papers for those. This work constitutes a part of our efforts to build an efficient data analysis system called Aqua [1]. In this system, we store the statistics (histograms and samples) as relations in the DBMS and rewrite the user query posed on the original relations as a query on the statistics relations. The rewritten query is then submitted to the DBMS for execution. This middleware architecture, coupled with ....
[Article contains additional citation context not shown here]
S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. In Proc. ACM SIGMOD Conference, 1999.
....reduction techniques for massive data sets. [GM98b] presents a formal framework for evaluating synopsis data structures and a survey of some of the results in this area. There has been a flurry of recent work in approximate query answering (e.g., [VL93, BDF 97, GMP97a, GMP97b, HHW97, GM98a, AGPR99, HH99, AGP99, MS99]). The work in [HHW97, AGPR99, HH99] has looked at the problem of providing approximate answers to queries seeking aggregates (e.g., sum, avg) of attribute values for the tuples satisfying a predicate that occur in the join of multiple relations. Thus although joins are ....
....presents a formal framework for evaluating synopsis data structures and a survey of some of the results in this area. There has been a flurry of recent work in approximate query answering (e.g., [VL93, BDF 97, GMP97a, GMP97b, HHW97, GM98a, AGPR99, HH99, AGP99, MS99]). The work in [HHW97, AGPR99, HH99] has looked at the problem of providing approximate answers to queries seeking aggregates (e.g., sum, avg) of attribute values for the tuples satisfying a predicate that occur in the join of multiple relations. Thus although joins are involved, the goal in these works is to estimate the ....
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. ACM SIGMOD International Conf. on Management of Data, June 1999.
....efficiently and with minimal overheads (Section 8). Previous work related to approximate query answering is presented in Section 9. Due to limited space, we omit the proofs of all theoretical results from this paper and refer the reader to a full version of this paper for all the details [AGPR99b]. The research in this paper was conducted as part of our efforts to develop an efficient decision support system based on approximate query answering, called Aqua [GMP97a]. A brief introduction of Aqua is presented in the next section. 2 The Aqua System The goal of Aqua is to improve response ....
....uses them to answer queries. Currently, these statistics take the form of various types of samples and histograms on the data in the data warehouse. A key feature of Aqua is that the system provides probabilistic error confidence bounds on the answer, based on the Hoeffding and Chebychev formulas [AGPR99b]. Currently, the system handles arbitrarily complex SQL queries applying aggregate operations (avg, sum, count, etc.) over the data in the warehouse. Aqua has three key components: • Statistics Collection: This component of Aqua is responsible for collecting all the synopses which Aqua uses to ....
[Article contains additional citation context not shown here]
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. Technical report, Bell Laboratories, Murray Hill, New Jersey, 1999. Full version of the paper appearing in SIGMOD'99.
No context found.
S. Acharya, P.B. Gibbons, V. Poosala, and S. Ramaswamy (1999), Join synopses for approximate query answering, Proc. of the ACM SIGMOD Conference.
No context found.
Acharya, S., Gibbons, P. B., Poosala, V., Ramaswamy, S., Join Synopses for Approximate Query Answering, In Proc. of the 1999.
No context found.
S. Acharya, P.B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 275--286, Philadelphia, PA, June 1999.
No context found.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. "Join Synopses for Approximate Query Answering". In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data.
No context found.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. In Proc. of the 1999.
No context found.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 275--286, 1999.
No context found.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD Conference, 1999.
No context found.
S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD. ACM Press, 1999.
No context found.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. of the ACM SIGMOD 1999.
No context found.
S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy, "Join Synopses for Approximate Query Answering," Proc. SIGMOD, pp. 275-286, June 1999.
No context found.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD 1999.
No context found.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD Proceedings, pages 275--286, 1999.
No context found.
Acharya S., Gibbons P., Poosala V., Ramaswamy S.: Join Synopses for Approximate Query Answering. SIGMOD Conf. (1999) 275-286
No context found.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. Proceedings of the ACM SIGMOD, pages 275-286, June 1999.