by Michael O. Akinde, Michael H. Bohlen, Theodore Johnson, Laks V. S, Divesh Srivastava
http://www.research.att.com/~divesh/papers/abj+2002-dolap.ps

Abstract:

The success of Internet applications has led to an explosive growth in the demand for bandwidth from ISPs. Managing an IP network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current OLAP tools for this task assume that the data is available in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing performed at the local sites. In this paper, we consider the problem of efficient evaluation of OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans, which are shipped to the individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC(R) data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
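The idea of shipping only partial results can be illustrated with a small sketch. This is not Skalla's actual plan format or algorithm, which the abstract does not specify; it is a generic example of distributed aggregation, and the function names, the (group, value) row layout, and the sample data are all hypothetical. Each site reduces its detail rows to a per-group (sum, count) pair, and the coordinator merges those pairs into global averages without ever seeing a detail row.

```python
from collections import defaultdict


def local_partial_aggregate(rows):
    """Run at one collection site: reduce detail rows to (sum, count) per group."""
    partial = defaultdict(lambda: [0, 0])
    for group, value in rows:
        partial[group][0] += value
        partial[group][1] += 1
    # Only this compact summary leaves the site, never the detail rows.
    return {g: (s, c) for g, (s, c) in partial.items()}


def coordinator_merge(partials):
    """Run at the coordinator: combine per-site partials into global averages."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for g, (s, c) in partial.items():
            merged[g][0] += s
            merged[g][1] += c
    return {g: s / c for g, (s, c) in merged.items()}


# Hypothetical detail data held at two collection sites: (traffic class, bytes).
site_a = [("web", 100), ("web", 300), ("mail", 50)]
site_b = [("web", 200), ("mail", 150)]

partials = [local_partial_aggregate(site_a), local_partial_aggregate(site_b)]
averages = coordinator_merge(partials)
```

Averages are carried as (sum, count) pairs because an average of per-site averages would be wrong whenever the sites hold different numbers of rows per group.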
