D. Gunopulos, G. Kollios, V. J. Tsotras, and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 463--474, 2000.
....centralized environments to distributed environments. The problem here is to identify the independently operated databases that are likely to contain the top N tuples for a given query; such a problem does not exist in a centralized environment. 2. Some recent techniques to construct histograms ([5, 9, 11, 13, 18, 19]) are employed here, although there is a significant difference. Histograms were traditionally used to estimate the number of tuples satisfying a certain query condition. In this paper, we modify existing techniques and propose a new technique so that they can be used to estimate the distance of ....
D. Gunopulos, G. Kollios, V.J. Tsotras and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. ACM SIGMOD Conference, 2000.
....and b, respectively [APR99]. The total number of objects intersecting q is predicted by summing the results of all buckets. Evidently, satisfactory estimation accuracy depends on the degree of uniformity of object distributions in the buckets. This can be maximized using various algorithms [MD88, PI97, APR99, GKTD00, JAS00, BGC01], which differ in the way that buckets are structured. For example, in [MD88] buckets have similar sizes (i.e., equi-width) or cover approximately the same number of objects (i.e., equi-depth), while in [PI97, APR99] bucket extents minimize the so-called spatial skew. Jin et al. [JAS00] ....
.... [APR99]: $Err_{workload} = \sum_{i=1}^{200} |est_i - act_i| \,/\, \sum_{i=1}^{200} act_i$ (8-1). We use the above definition, instead of another common metric [$|est_i - act_i| / act_i$], because the latter is often dominated by the large error of small queries (i.e., those with low $act_i$). As with previous work [APR99, GKTD00, BGC01], we aim at evaluating performance for relatively large queries, since in practice query optimization for small queries is trivial (i.e., index search, rather than sequential scan, should always be used). It is worth mentioning, however, that our solutions consistently outperform that of [CC02] ....
Gunopulos, D., Kollios, G., Tsotras, V., Domeniconi, C. Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. ACM SIGMOD, 2000.
....centralized environments to distributed environments. The problem here is to identify the independently operated databases that are likely to contain the top N tuples for a given query; such a problem does not exist in a centralized environment. 2. Some recent techniques to construct histograms ([5, 9, 11, 13, 18, 19]) are employed here, although there is a significant difference. Histograms were traditionally used to estimate the number of tuples satisfying a certain query condition. In this paper, we modify existing techniques and propose a new technique so that they can be used to estimate the distance of ....
D. Gunopulos, G. Kollios, V.J. Tsotras and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. ACM SIGMOD Conference, 2000.
....and vertical neighbors. This simplifies the problem, e.g., for compilers that automatically parallelize code, and in fact partitionings are part of the High Performance Fortran 2.0 standard [13]. Another advantage, important for example in grid files [28] and certain classes of histograms [1, 11], is that the partitioning is uniquely defined by coordinates along the axis and coordinates along the axis. Thus, tiles can be indexed very efficiently by these coordinates. This simple structure also enables optimizations for cases where many tiles are empty or almost empty, and so ....
....compression), parallel computing (e.g., load balancing), computer graphics (e.g., spatial data structures), and video compression (e.g., block matching); see [17, 22, 2, 4, 9] for some discussion. Following are a few examples that explicitly study partitionings. In databases, several authors [1, 11] have used partitionings to construct histograms in two or more dimensions. In this case, the metric used is often either MAX SUM for V-Optimal histograms [30] or SUM VAR for Equi-Depth histograms [31]. In particular, the algorithm for updating a histogram in [1] uses the structure of ....
D. Gunopulos, G. Kollios, V. J. Tsotras, C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In Proc. of the ACM SIGMOD Conference, pages 463--474, 2000.
....field. The main techniques comprise the use of multidimensional histograms [13]. Some variations over histograms include the use of parametric curve fitting techniques inside buckets [10], self-tuning histograms [1], and, lately, multidimensional histograms for dealing with real-valued attributes [9]. Other multidimensional density estimation techniques are wavelets [12] and fractal dimension concepts [8, 2]. 7 Conclusions. In this paper, we have presented a new robust scheme for answering multiattribute top-k queries by mapping them to relational selection queries. We have reported the ....
D. Gunopulos, G. Kollios, V. J. Tsotras, and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In Proceedings of the 2000.
.... A condensed cube provides accurate aggregate values. It is different from those approaches that reduce the cube size through approximation with various forms, such as wavelet [16], multivariate polynomials [3], mixed model by multivariate Gaussians [15], histogram [12], sampling [1], and others [8]. A condensed cube supports general OLAP applications. It is different from those proposals to reduce the size of a cube by tailoring it to only answering certain types of queries [10]. Contributions of our work can be summarized as follows: We introduced the concept of condensed cube which ....
D. Gunopulos, G. Kollios, V. J. Tsotras, and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In SIGMOD
....improving quality estimates in a progressive manner. The method presented in this paper differs in that it offers deterministic guarantees of error and can also be used for determining the exact answer of the aggregate query by performing a very small number of I/Os. Lately, Gunopulos et al. [3] proposed kernel estimation as an extension to simple sampling for selectivity estimation (i.e., COUNT aggregate) that improves the quality of the given answers. While sampling reduces the I/O cost of answering aggregates by visiting a random sample of the database, our approach uses a ....
....of the given answers. While sampling reduces the I/O cost of answering aggregates by visiting a random sample of the database, our approach uses a hierarchical data structure to only examine data that are relevant to the user's query. Histogram techniques (Ioannidis et al. [7], Gunopulos et al. [3]) work by subdividing the data space into a number of buckets; the aggregate information about the buckets is kept. The estimation is given by calculating the overlap of the query region with the various buckets and aggregating the estimation for the different buckets. Commonly, a uniformity ....
D. Gunopulos, G. Kollios, V. J. Tsotras, and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In SIGMOD Conference 2000, Dallas, Texas., pages 463--474. ACM Press, 2000.
....and do not support more sophisticated data mining operations such as customer profiling or association rules. On-line analytical processing (OLAP) tools are designed to support complex, multi-dimensional and multi-level on-line analysis of large volumes of data stored in data warehouses [1,2 4,8,10]. In our prior work, we have described a scalable framework developed on top of an Oracle 8-based data warehouse and a commercially available multi-dimensional OLAP server, Oracle Express, which we have used to develop applications for analyzing customer calling patterns from telecom networks and ....
Dimitrios Gunopulos, George Kollios, Vassilis Tsotras, Carlotta Domeniconi, "Approximating multi-dimensional aggregate range queries over real attributes", Proc. ACM SIGMOD '00, 2000.
No context found.
D. Gunopulos, G. Kollios, V. Tsotras, and C. Domeniconi, "Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes," Proc. SIGMOD, May 2000.
....discrete cosine transform [LKC99] on the data, using kernel estimators [BKS99, Sco92, Sil86], as well as sampling [OR90, LNS90, HS92]. Although our biased sampling technique can use any density estimation method, using kernel density estimators is a good choice. Work on query approximation [GKTD00] and in the statistics literature [Sil86, Sco92] shows that kernel functions always estimate the density of the dataset better than just using a random sample of the dataset. The technique is also very efficient. Finding a kernel density estimator is as efficient as taking a random sample, and can be ....
D. Gunopulos, G. Kollios, V. Tsotras, and C. Domeniconi. Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes. Proceedings of SIGMOD, Dallas, TX, May 2000.
....the same). [38] presents two new algorithms, PHASED and MHIST-2, that attempt to produce MaxDiff histograms. [1] consider the case that the dataset consists of rectangles rather than points, and present a histogram technique to approximate the selectivity of range queries in this environment. [18] presents a generalized histogram technique. It computes an approximation of the data distribution in the space, and then uses a constant number of buckets to approximate the data distribution. The basic characteristics in our technique are that (i) the buckets can be overlapping and (ii) can be ....
....can be used to estimate the selectivity of a query regardless of the dimensionality of the space and can be applied to real domains as is. [6] introduced kernels [11, 45] to the database community and used them to estimate the selectivity of one-dimensional range queries on metric attributes. [18] extended the technique to high dimensions, and [la] gave a new algorithm for computing the kernel bandwidths. Also, [5] use a density estimation technique to answer approximate neighbor queries, which involves clustering the data and then fitting a number of Gaussians. 3 Batch Data Mining ....
Dimitrios Gunopulos, George Kollios, Vassilis J. Tsotras, Carlotta Domeniconi. Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. In SIGMOD Conference 2000: 63-7.
....the number of points in the interior of the hyper-rectangle it represents. Since n can be very large, the problem of approximating the selectivity of a given range query Q arises naturally. Approaches proposed to address this problem include multidimensional histograms (Ioannidis & Poosala, 1999; Gunopulos et al., 2000), kernels (Shanmugasundaram et al., 1999; Gunopulos et al., 2000), and wavelets (Vitter et al., 1998; Chakrabarti et al., 2000). To formalize the notion of approximating the selectivity of range queries, let $f(x_1, \ldots, x_d)$ be a d-dimensional, non-negative function, defined in $[0,1]^d$ and ....
....Since n can be very large, the problem of approximating the selectivity of a given range query Q arises naturally. Approaches proposed to address this problem include multidimensional histograms (Ioannidis & Poosala, 1999; Gunopulos et al., 2000), kernels (Shanmugasundaram et al., 1999; Gunopulos et al., 2000), and wavelets (Vitter et al., 1998; Chakrabarti et al., 2000). To formalize the notion of approximating the selectivity of range queries, let $f(x_1, \ldots, x_d)$ be a d-dimensional, non-negative function, defined in $[0,1]^d$ and with the property $\int_{[0,1]^d} f(x_1, \ldots, x_d)\, dx_1 \cdots dx_d$ ....
Gunopulos, D., Kollios, G., Tsotras, V., & Domeniconi, C. (2000). Approximating multi-dimensional aggregate range queries over real attributes. Proc. of the ACM SIGMOD Intern. Conf. on Management of Data.
....on the data, using kernel estimators [4], as well as sampling [21, 18, 10]. Although our biased sampling technique can use any density estimation method, kernel density estimators provide the best solution because they are accurate and can be computed efficiently. Work on query approximation [9] and in the statistics literature [25, 24] shows that kernel functions always estimate the density of the dataset more accurately than using a random sample of the dataset. Finding a kernel density estimator can be done in one dataset pass. The technique is presented in detail in [9]. 2.2 The ....
....[9] and in the statistics literature [25, 24] shows that kernel functions always estimate the density of the dataset more accurately than using a random sample of the dataset. Finding a kernel density estimator can be done in one dataset pass. The technique is presented in detail in [9]. 2.2 The Proposed Biased Sampling Technique. In this section we outline our technique for density-biased sampling. Let D be a d-dimensional dataset with n points. For simplicity we assume that the space domain is $[0,1]^d$; otherwise we can scale the attributes. Let $f(x_1, \ldots, x_d) \geq 0$; ....
D. Gunopulos, G. Kollios, V. Tsotras, and C. Domeniconi. Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. In Proc. of ACM SIGMOD, Dallas, TX, May 2000.
No context found.
D. Gunopulos, G. Kollios, V. J. Tsotras, and C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 463--474, 2000.
No context found.
Gunopulos, D., Kollios, G., Tsotras, V. J., Domeniconi, C., Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes, In Proc. of the 2000.
No context found.
Gunopulos, D., Kollios, G., Tsotras, V. J. Approximating Multi-dimensional Aggregate Range Queries over Real Attributes. ACM SIGMOD, 2000, pp. 463-474.
No context found.
Gunopulos D., Kollios G., Tsotras V., Domeniconi C.: Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes. SIGMOD Conf. (2000) 463-474
No context found.
D. Gunopulos, G. Kollios, V. J. Tsotras, C. Domeniconi. Approximating multi-dimensional aggregate range queries over real attributes. In Proc. of the ACM SIGMOD Conference, pages 463--474, 2000.