D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. SIGMOD Record, 26:12--17, 1997.
....into the following categories: 1) efficient computation of full or iceberg cubes with simple or complex measures [1, 25, 18, 6, 11], 2) selective materialization of views [13, 3, 9, 10, 21], 3) computation of compressed data cubes by approximation, such as quasi-cubes, wavelet cubes, etc. [4, 23, 20, 5], 4) computation of condensed, dwarf, or quotient cubes [15, 24, 22, 16], and 5) computation of stream cubes for multi-dimensional regression analysis [7]. Among these categories, we believe that the first one, efficient computation of full or iceberg cubes, plays a key role because it is a ....
D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional database. SIGMOD Record, 26:12-17, 1997.
....or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002 scratch [8, 2, 20], choosing views to materialize under space constraints [11], handling sparsity [14], cube compression [16, 19, 17], approximation [3, 4, 18], and computing the cube under user-specified constraints [5]. In the second generation, researchers began to focus attention on extracting more semantics from a data cube, e.g., [15] studies the most general contexts under which observed patterns occur and [12] uses the cube structure to ....
D. Barbara & M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. SIGMOD Record, 26:12-17, 1997.
....to obtain optimal space-time tradeoff. Data cube maintenance and refreshing issues have been addressed in [MQM97]. They present a summary-delta algorithm to incrementally maintain precomputed aggregates. A model for reducing the size of the data cube with approximation methods is presented in [BS97]. The data cube can be represented by materializing it only partly, and the other values can then be computed with a linear regression method. Data warehousing issues have been widely discussed in the WHIPS project [Wid95] [LZW 97]. Main topics include integration of heterogeneous data into the ....
....missing value is computed when performing queries. The computation can be made with a simple linear interpolation method or with an advanced function of a higher order. The system must offer methods to perform these simple interpolations internally. This approach is close to the model presented in [BS97], but they provide a method for conserving the size of the data cube by approximating its contents, whereas we concentrate on approximating missing values. In the third approach, there must be an alternate time series available. Usually the estimates are used to compensate for missing measurement values. ....
D. Barbara, M. Sullivan. Quasi-Cubes: Exploiting approximations in multidimensional databases. SIGMOD Record, Vol. 26, No. 3, September 1997, pp. 12-17.
....of aggregation function is required when a condensed cube is used to answer queries. A condensed cube provides accurate aggregate values. It is different from those approaches that reduce the cube size through approximation with various forms, such as wavelet [16], multivariate polynomials [3], mixed model by multivariate Gaussians [15], histogram [12], sampling [1], and others [8]. A condensed cube supports general OLAP applications. It is different from those proposals to reduce the size of a cube by tailoring it to only answering certain types of queries [10]. Contributions of ....
....substantially small sizes, synopsis data structures can efficiently provide approximate query results in many applications. The statistical methods and models used for the purpose of fast approximate query answering in the data warehouse environment include wavelet [16], multivariate polynomials [3], mixed model by multivariate Gaussians [15], histogram [12], sampling [1], etc. Several specialized data structures for fast processing of special types of queries were also proposed. Prefix sum and relative prefix sum methods [5] are proposed to answer range sum queries using a large ....
D. Barbará and M. Sullivan. Quasi-cubes: Exploiting approximations in multidimensional databases. SIGMOD Record, 26(3):12--17, 1997.
....Faloutsos et al. in multiple dimensions. Maximum entropy has also been used for the identification of interesting correlations in data [Tho98]. There exists a sizeable bibliography in histogramming techniques and approximate query answering [IP95, PIHS96, JKM 98, VWI98, AGPR99, SFB99, BS97, BW00]. Our approach is fundamentally different. Previous work focused on the problem of data reconstruction by constructing specialized summarized representations (typically histograms) of the data. We argue that, since data are already stored in an aggregated form in the warehouse, it is ....
Daniel Barbará and Mark Sullivan. Quasi-Cubes: Exploiting Approximations in Multidimensional Databases. SIGMOD Record, 26(3):12--17, 1997.
....information can often fit in main memory, response times are significantly reduced. This work constitutes a part of our efforts to build an efficient data analysis system called AQUA [1]. The general concept of approximate query answering based on precomputed statistics has been proposed before [1, 16, 24, 2]. However, the focus of this paper is on a specific novel approach based on histograms and handling the technical issues arising there. The viability of this idea relies on the hypothesis that many OLAP applications can readily tolerate small errors in query results in exchange for significantly ....
....[Figure 8. Effect of data dependence. Figure 9. Space required for various error bounds. Figure 10. Effect of data sparsity. Axes: Sparsity Factor; Space Needed (in bytes). Series: SIMPLE, RANDOM, GREEDY.] One way to solve this problem is to use an accurate value domain, as proposed in [2]. This can be achieved by storing a bit map of the occupation of various cells in the data cube. Many commercial systems already maintain a highly compressed bit map of this form, hence this approach seems reasonable in practice. On the other hand, in their current form, histograms use ....
[Article contains additional citation context not shown here]
D. Barbara and M. Sullivan. Quasi-cubes: exploiting approximations in multidimensional databases. SIGMOD Record, 26(3):12--17, 1997.
....the distribution used in randomizing values of an attribute. There is rich query optimization literature on estimating attribute distributions from partial information [BDF 97]. In the OLAP literature, there is work on approximating queries on sub-cubes from higher-level aggregations (e.g., [BS97]). However, these works did not have to cope with information that has been intentionally distorted. Closely related, but orthogonal to our work, is the extensive literature on access control and security (e.g., [Din78, ST90, Opp97, RG98]). Whenever sensitive information is exchanged, it must ....
D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximations in multidimensional databases. SIGMOD Record, 26(3):12--17, 1997.
No context found.
D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. In Proc. Int. Conf. on Management of Data (SIGMOD), 1997.
....that fit in memory. Moreover, we need to point out that all this I/O activity takes place before the EDA is undertaken. Chunks and their models can be stored as part of the cuboid and reused for further analysis and other uses, such as approximate query processing and other types of data mining [1, 2]. A point we need to make in this subsection is that of modeling. The general technique is quite independent of the model chosen for the chunks. Of course, there will be some models that are better suited for specific classes of data and therefore will produce smaller estimation errors. However, ....
D. Barbará and M. Sullivan. Quasi-Cubes: Exploiting Approximations in Multidimensional Databases. SIGMOD Record, 26(3), September 1997.
....that fit in memory. Moreover, we need to point out that all this I/O activity takes place before the EDA is undertaken. Chunks and their models can be stored as part of the cuboid and reused for further analysis and other uses, such as approximate query processing and other types of data mining [5, 6, 3]. A point we need to make in this subsection is that of modeling. Several choices of models are possible (a survey of methods can be found in [4]). The general technique is, however, quite independent of the model chosen for the chunks. Of course, there will be some models that are better suited ....
D. Barbará and M. Sullivan. Quasi-Cubes: Exploiting Approximations in Multidimensional Databases. SIGMOD Record, 26(3), September 1997.
....loglinear models for compressing the data cube and obtaining approximate answers. For the technique to be efficient, dense clusters in the data cube have to be identified and are then approximated by loglinear models (other approximation techniques could be used as well, e.g., regression models [3]). The approach is mainly applicable to dense, low-dimensional data cubes. It is possible to return absolute error bounds based on the approximation model and a guaranteed bound for the approximation error per cell. However, for queries with large query cubes this approach leads to slow responses. ....
D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximations in multidimensional databases. SIGMOD Record, 26(3), 1997.
....is materialized (e.g., the base data) and the rest is computed on demand. These techniques, however, can impose long delays in answering queries. These limitations have prompted researchers to look for techniques to compress the datacube in such a way that only a fraction of the space is needed [9, 8, 1, 21, 24]. Since the compression techniques are lossy, one can only provide approximate answers to the queries posed to the datacube. On the other hand, the queries can be answered without incurring much disk I/O, so the response time is considerably smaller than the one experienced in uncompressed ....
....than the one experienced in uncompressed datacubes. In this paper we present a technique to compress datacubes based on loglinear models [4] (loglinear models are a form of statistical parametric models). We have preliminarily explored a simpler parametric technique based on linear regression in [9, 8]. The technique uses loglinear models to characterize dense chunks of the datacube. These models can be used to estimate cells with a certain degree of accuracy. We keep the errors caused by the estimation process under control by storing, along with the model parameters, those cells whose ....
D. Barbará and M. Sullivan. Quasi-cubes: Exploiting approximations in multidimensional databases. SIGMOD Record, 26(3), September 1997.
No context found.