J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of KDD-99, pages 223--232, 1999.
....queries over objects with nonzero extents in d-dimensional space. Formally, it is defined as: given n weighted rectangular objects and a query rectangle r in the d-dimensional space, find the cumulative weight of all the objects which intersect r. Previous work on multidimensional aggregations [16, 27, 10, 14, 26, 24, 13] considers only point objects (i.e., objects with zero extent in all dimensions) that fall on a fixed multidimensional grid. One exception is the work in [28, 30] that examines aggregations over interval (i.e., 1-dimensional) objects. We will use the term box aggregation since in recent ....
J. Shanmugasundaram, U. M. Fayyad, P. S. Bradley, "Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions", Proc. of KDD, 1999.
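The box aggregation problem defined in the excerpt above can be illustrated with a brute-force sketch. This is not the citing paper's method, only a direct reading of the definition. The names `Box`, `intersects`, and `box_aggregate` are hypothetical, and the sketch assumes closed intervals, so boxes that merely touch count as intersecting.

```python
from typing import Sequence, Tuple

# A box is one (low, high) pair per dimension; closed intervals are an assumption.
Box = Tuple[Tuple[float, float], ...]


def intersects(a: Box, b: Box) -> bool:
    """Two boxes intersect when their intervals overlap in every dimension."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def box_aggregate(objects: Sequence[Tuple[Box, float]], query: Box) -> float:
    """Sum the weights of all objects whose box intersects the query box."""
    return sum(weight for box, weight in objects if intersects(box, query))


# Three weighted rectangles in 2-dimensional space (illustrative data only).
objects = [
    (((0.0, 2.0), (0.0, 2.0)), 5.0),   # overlaps the query
    (((5.0, 6.0), (5.0, 6.0)), 7.0),   # lies entirely outside the query
    (((1.0, 3.5), (1.0, 3.5)), 2.0),   # overlaps the query
]
query = ((1.5, 3.2), (1.5, 3.2))
total = box_aggregate(objects, query)
print(total)
```

The scan costs O(n d) per query, which is the baseline that an index-based approach would aim to beat.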
....network learning for queries with somewhat different goals and different techniques to those of this paper. Mixtures of Gaussian models have been proposed for query selectivity estimation on real-valued data sets consisting of relatively low-dimensional data cubes, e.g., 5 or fewer dimensions [26]. The main contribution of [26] was the introduction of a model-based approach to query answering that resulted in a highly memory- and time-efficient data representation. More general probabilistic tools than mixture models, Bayesian networks and probabilistic relational models, were considered for ....
....with somewhat different goals and different techniques to those of this paper. Mixtures of Gaussian models have been proposed for query selectivity estimation on real-valued data sets consisting of relatively low-dimensional data cubes, e.g., 5 or fewer dimensions [26]. The main contribution of [26] was the introduction of a model-based approach to query answering that resulted in a highly memory- and time-efficient data representation. More general probabilistic tools than mixture models, Bayesian networks and probabilistic relational models, were considered for the task of selectivity ....
J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of KDD-99, pages 223-232. New York, NY: ACM Press, 1999.
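The model-based approach described in the excerpts above can be sketched as follows. This is a minimal illustration, not the cited paper's implementation: it assumes a mixture of Gaussians with independent dimensions (diagonal covariance) and estimates a COUNT over an axis-aligned range as the row count times the probability mass the mixture places in that range. The function names and the component parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One mixture component: (weight, per-dimension means, per-dimension std devs).
Component = Tuple[float, Sequence[float], Sequence[float]]


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a univariate normal."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def estimate_count(n_rows: int,
                   components: Sequence[Component],
                   query: Sequence[Tuple[float, float]]) -> float:
    """Estimate how many rows fall inside the query range.

    With independent dimensions, the mass of one component inside the
    range is the product of its per-dimension interval probabilities.
    """
    total_prob = 0.0
    for weight, means, stds in components:
        mass = weight
        for (lo, hi), mean, std in zip(query, means, stds):
            mass *= normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)
        total_prob += mass
    return n_rows * total_prob


# Hypothetical two-component model over two continuous dimensions.
model = [
    (0.6, [0.0, 0.0], [1.0, 1.0]),
    (0.4, [5.0, 5.0], [2.0, 0.5]),
]
estimate = estimate_count(10_000, model, [(-1.0, 1.0), (-1.0, 1.0)])
print(round(estimate, 1))
```

Only the component parameters need to be stored, which is where the memory and time savings noted in the excerpt would come from; the answer is an estimate, not an exact aggregate.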
....into the following categories: (1) efficient computation of full or iceberg cubes with simple or complex measures [1, 25, 18, 6, 11], (2) selective materialization of views [13, 3, 9, 10, 21], (3) computation of compressed data cubes by approximation, such as quasi-cubes, wavelet cubes, etc. [4, 23, 20, 5], (4) computation of condensed, dwarf, or quotient cubes [15, 24, 22, 16], and (5) computation of stream cubes for multidimensional regression analysis [7]. Among these categories, we believe that the first one, efficient computation of full or iceberg cubes, plays a key role because it is a ....
J. Shanmugasundaram, U. M. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. KDD'99.
....made this observation practical by introducing the lazy wavelet transform, an algorithm that translates polynomial range sums to the wavelet domain in polylogarithmic time. Wavelets are often thought of as a data approximation tool, and have been used this way for approximate range query answering [34, 32, 3, 9]. The efficacy of this approach is highly data dependent: it only works when the data have a concise wavelet approximation. Furthermore, the wavelet approximation is difficult to maintain. To avoid these problems, we use wavelets to approximate incoming queries rather than the underlying data. By ....
J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 1999.
....queries over objects with nonzero extents in d-dimensional space. Formally, it is defined as: given n weighted rectangular objects and a query rectangle r in the d-dimensional space, find the cumulative weight of all the objects which intersect r. Previous work on multidimensional aggregations [18, 36, 11, 15, 35, 31, 14] considers only point objects (i.e., objects with zero extent in all dimensions) that fall on a fixed multidimensional grid. One exception is the work in [37, 39] that examines aggregations over interval (i.e., 1-dimensional) objects. We will use the term box aggregation since in recent ....
J. Shanmugasundaram, U. M. Fayyad, P. S. Bradley, "Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions", Proc. of KDD, 1999.
....of inclusion-exclusion for query approximation has been previously mentioned in [18]. Probabilistic models of various forms have also been investigated in limited contexts. Mixtures of Gaussian independence models were investigated for generating approximate queries on real-valued data sets in [32]. Bayesian networks for the task of selectivity estimation over multiple tables in a relational database were considered in [14]. The use of statistical interaction models in dependency-based histograms for query selectivity estimation is described in [12]. However, none of this work contains a ....
J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'99), pages 223-232. New York, NY: ACM Press, 1999.
....Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002. ....scratch [8, 2, 20], choosing views to materialize under space constraints [11], handling sparsity [14], cube compression [16, 19, 17], approximation [3, 4, 18], and computing the cube under user-specified constraints [5]. In the second generation, researchers began to focus attention on extracting more semantics from a data cube, e.g., [15] studies the most general contexts under which observed patterns occur and [12] uses ....
....functions, query answering, constraints, and incremental maintenance, while Section 9 summarizes the paper and describes future work. Proofs of results are omitted for space limitations, and can be found in the full paper [13]. 1.1 Related Work. Works on compressing the cube size such as [16, 4, 7, 10] are of clear relevance to us. However, our main goal is constructing an exact and concise summary of a cube that preserves the cube semantics and lattice structure, as opposed to just compressing the cube size, which distinguishes it from all these works. The recent work on condensed cube by ....
J. Shanmugasundaram et al. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. KDD'99:223-232.
....on the size of the cube is a real problem. All methods proposed in the literature try to deal with the space problem, either by precomputing a subset of the possible group-bys [HRU96, GHRU97, Gup97, BPT97, SDN98], by estimating the values of the group-bys using approximation [GM98, VWI98, SFB99, AGP00], or by using online aggregation [HHW97] techniques. This paper defines Dwarf, a highly compressed structure for computing, storing, and querying data cubes. Dwarf solves the storage space problem by identifying prefix and suffix redundancies in the structure of the cube and factoring ....
....group-bys with partitions less than the minimum support. Recently, work has been performed on approximating Data Cubes through various forms of compression such as wavelets [VWI98], multivariate polynomials [BS98], or by using sampling [GM98, AGP00] or data probability density distributions [SFB99]. While these methods can substantially reduce the size of the Cube, they do not actually store the values of the group-bys, but rather approximate them, thus not always providing accurate results. In Cubetrees [RKR97, KR98], group-bys are mapped into orthogonal hyperplanes of a multidimensional ....
J. Shanmugasundaram, U. Fayyad, and P.S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. In Proc. of the Intl. Conf. on Knowledge Discovery and Data Mining (KDD99), 1999.
....Dynamic Data Cube [7, 6]. Another thread of research has developed approximate OLAP techniques. Nonparametric wavelet-based data compression has been explored in [16, 15], where it is also noted that constant range functions have a sparse wavelet representation. Parametric modeling is proposed in [14, 2]. All of these techniques follow the pattern of reducing the size of the data, then evaluating queries on the smaller dataset. These two threads are brought together by progressive query evaluation techniques. The pCube [11] uses an R-tree-like structure to provide progressive query evaluation ....
....entire domain has $N = 2^{dj}$ points. Definition 1 A Range in $D$ is a rectangular region of the form $R = \prod_{i=0}^{d-1} [l_i, h_i]$ where $l_i \le h_i$. An additive range query is usually defined as the sum of some measure attribute over the points in the database that lie in the range. It is noted in [14] that these sums can be written as integrals over the range of the product of a measure function and the data density function. This is the formulation we will use to make the following definition. Definition 2 Given a function $f : D \to C$ (the measure attribute), a range $R \subseteq D$, and a database of ....
J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Fifth ACM SIGKDD Inter. Conf. on Knowledge Discovery and Data Mining, August 1999.
....a condensed cube is used to answer queries. A condensed cube provides accurate aggregate values. It is different from those approaches that reduce the cube size through approximation with various forms, such as wavelet [16], multivariate polynomials [3], mixed model by multivariate Gaussians [15], histogram [12], sampling [1], and others [8]. A condensed cube supports general OLAP applications. It is different from those proposals to reduce the size of a cube by tailoring it to only answering certain types of queries [10]. Contributions of our work can be summarized as follows: We ....
....structures can efficiently provide approximate query results in many applications. The statistical methods and models used for the purpose of fast approximate query answering in the data warehouse environment include wavelet [16], multivariate polynomials [3], mixed model by multivariate Gaussians [15], histogram [12], sampling [1], etc. Several specialized data structures for fast processing of special types of queries were also proposed. Prefix sum and relative prefix sum methods [5] are proposed to answer range sum queries using a large precomputed prefix cube. More recently, the ....
J. Shanmugasundaram, U. M. Fayyad, and P. S. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. SIGKDD
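The prefix sum idea mentioned in the excerpt above can be sketched in two dimensions. This is a generic textbook illustration, not the method of the cited reference [5]: a precomputed prefix array lets any rectangular range sum be answered with four lookups by inclusion-exclusion. The names `build_prefix` and `range_sum` and the sample grid are hypothetical.

```python
from typing import List


def build_prefix(grid: List[List[int]]) -> List[List[int]]:
    """Return P where P[i][j] is the sum of grid[0..i-1][0..j-1].

    The extra row and column of zeros avoid boundary checks at query time.
    """
    rows, cols = len(grid), len(grid[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (grid[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    return prefix


def range_sum(prefix: List[List[int]], r0: int, c0: int, r1: int, c1: int) -> int:
    """Sum of grid[r0..r1][c0..c1] (inclusive bounds) in constant time."""
    return (prefix[r1 + 1][c1 + 1] - prefix[r0][c1 + 1]
            - prefix[r1 + 1][c0] + prefix[r0][c0])


# A small 3 x 3 measure grid (illustrative data only).
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
prefix = build_prefix(grid)
print(range_sum(prefix, 1, 1, 2, 2))
```

Queries are exact and constant-time, at the cost of a prefix array as large as the cube itself, which is the space overhead the excerpt alludes to.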
....of the former approach is the work of [10] which constructs data cubes specifically for improving the efficiency of a data mining algorithm. This is very different from our approach, which uses data mining to extend the functionality of OLAP. An example of the latter approach is the work of [19], which uses a clustering technique to estimate the probability density of multidimensional data. This leads to the ability to answer OLAP queries in a significantly more efficient and flexible way, by storing only a compact representation of the database produced by the clustering algorithm. ....
Shanmugasundaram, J.; Fayyad, U. and Bradley, P.S. (1999) Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. Proc. 5th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, 223-232. ACM.
....over the points in the database that lie in the range. A simple example in our gasoline domain would be the sum of gasoline volumes consumed over all states and grades. These sums can be written as integrals over the range of the product of a measure function and the data density function [Shanmugasundaram et al. 1999]. This is the formulation we will use to make the following definition. Definition 2 Given a function $f : D \to C$ (the measure attribute), a range $R \subseteq D$, and a database of points in $D$ with density function ### # # # , the range query #### ## ## is defined as #### ## ## # # ### ##### # ####### ....
....query evaluation strategy. 4 Performance Evaluation We have implemented POLAP in C using Db4 wavelets. This implementation has been tested on real data from our gasoline domain. While these results appear to compare favorably with other approximate OLAP algorithms [Vitter et al. 1998, Shanmugasundaram et al. 1999] we do not attempt a direct comparison. The purpose of this demonstration is to show how quickly POLAP converges to the exact result, and how this convergence is affected by the sparseness of the data, the size of the query relative to the domain, and the size of the domain itself. Throughout ....
Shanmugasundaram, J., Fayyad, U., and Bradley, P. (1999). Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Fifth ACM SIGKDD Inter. Conf. on Knowledge Discovery and Data Mining.
....to the query selectivity problem [20, 31, 7], multidimensional histograms [22, 27], and sampling [16, 13]. Mixtures of Gaussian independence models were proposed for selectivity query estimation on real-valued data sets from relatively low-dimensional data cubes (5 or fewer dimensions) [30]. Generalized queries were considered by [28] in the context of language modeling using context-free grammars. To our knowledge, our paper is the first to directly address the problem of generalization queries for large high-dimensional transaction data sets and to systematically compare several ....
J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of the 5th Intl. Conf. on Knowledge Discovery and Data Mining (KDD99), pp. 223-232, ACM Press, New York, 1999.
....and investigate different aspects of their performance. A specific class of probabilistic models, mixtures of Gaussian independence models, have previously been proposed for generating approximate queries on real-valued data sets from relatively low-dimensional data cubes (5 or fewer dimensions) [10]. We generalize this approach to include not only mixtures of independence models (for binary data) but also several other probabilistic models, and we demonstrate how these models can be used for approximate querying on data sets involving several hundred dimensions. The rest of this paper is ....
J. Shanmugasundaram, U. Fayyad, and P. Bradley, "Compressed data cubes for OLAP aggregate query approximation on continuous dimensions," in Proceedings of the 5th Intl. Conf. on Knowledge Discovery and Data Mining (KDD99), pp. 223-232, ACM Press, New York, 1999.
....effect. But there are applications in which, given a value of k, one desires to have a cluster model with k nonempty clusters. These include the situation when the value of k is known a priori and applications in which the cluster model is utilized as a compressed version of a specific dataset [1, 8]. The remaining portion of the paper is organized as follows. Section 2 formalizes the constrained clustering optimization problem and outlines the algorithm computing a locally optimal solution. The subproblem of computing cluster assignments so that cluster h contains at least h points is ....
J. Shanmugasundaram, U. M. Fayyad, and P. S. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proc. 5th Intl. Conf. on Knowledge Discovery and Data Mining (KDD99), pages 223--232, New York, 1999. ACM Press.
....superior to other alternatives for statistical modeling purposes [GMPS97, PE96, B95, CS96, NH99]. Utility of the statistical model computed via EM has been demonstrated in approximating OLAP aggregate queries on continuous data [SFB99] and approximating nearest-neighbor queries [BFG99]. These applications require the statistical semantics and theory, which is a substantial advantage over clustering methods that do not derive statistically proper models. We next discuss the standard EM approach to mixture model estimation. 1.1 ....
....of interest in this paper is how to best model a large data set with a mixture distribution. Once the distribution is obtained, standard statistics can then be leveraged to do all sorts of probabilistic inference, including applications in indexing [BFG99] and compressing data cubes for OLAP [SFB99]. The CURE algorithm [GRS98] is a scalable clustering technique based upon the hierarchical agglomerative clustering (HAC) approach. Initially each data record is considered a cluster and the two nearest clusters are merged. The difference between CURE and standard HAC is that clusters are ....
J. Shanmugasundaram, U. M. Fayyad and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. In S. Chaudhuri and D. Madigan (eds.), Proc. 5th Intl. Conf. on Knowledge Discovery and Data Mining (KDD99), pp. 223-232, ACM Press, New York, 1999.
J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of KDD-99, pages 223--232, 1999.
J. Shanmugasundaram, U. Fayyad, and P. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of KDD-99, pages 223--232. New York, NY: ACM Press, 1999.
J. Shanmugasundaram, U. M. Fayyad, and P. S. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In KDD'99.
J. Shanmugasundaram, U. Fayyad, and P. S. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Knowledge Discovery and Data Mining, pages 223--232, 1999.
J. Shanmugasundaram, U. Fayyad, and P.S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. In KDD 1999.
J. Shanmugasundaram, U. M. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. KDD'99.
J. Shanmugasundaram, U. M. Fayyad, and P. S. Bradley. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In KDD'99.
J. Shanmugasundaram, U. M. Fayyad, P. S. Bradley, "Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions", Proc. of KDD, 1999.
J. Shanmugasundaram, U. M. Fayyad, P. S. Bradley, "Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions", Proc. of KDD, pp. 223-232, 1999.