by Michael O. Akinde, Michael H. Bohlen, Theodore Johnson, Laks V. S, Divesh Srivastava
http://www.research.att.com/~divesh/papers/abj+2002-dolap.ps
Abstract:
The success of Internet applications has led to an explosive growth in the demand for bandwidth from ISPs. Managing an IP network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current-day OLAP tools for this task assume the availability of the data in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing being performed at the local sites. In this paper, we consider the problem of efficient evaluation of OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans which are shipped to individual sites. Salient properties of our approach are that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. We finally present an experimental study based on TPC(R) data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
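The idea of shipping only partial results to a coordinator can be illustrated with a minimal sketch. This is not Skalla's evaluation plan or algebra; it is a generic two-phase average, and the function names `local_partial` and `merge_partials`, the site names, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def local_partial(rows):
    """Runs at one collection site (hypothetical helper).

    Reduces detail rows of (group_key, value) to a (sum, count) pair per
    group, so the detail rows themselves never leave the site.
    """
    partial = defaultdict(lambda: [0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return {key: tuple(acc) for key, acc in partial.items()}


def merge_partials(partials):
    """Runs at the coordinator (hypothetical helper).

    Combines the per-site (sum, count) pairs and returns the global
    average per group.
    """
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}


# Made-up sample data: (source address, bytes) per flow at two sites.
site_a = [("10.0.0.1", 100), ("10.0.0.2", 300), ("10.0.0.1", 200)]
site_b = [("10.0.0.1", 600)]

result = merge_partials([local_partial(site_a), local_partial(site_b)])
```

The sketch ships a sum and a count rather than a local average because averages from different sites cannot be combined correctly without their counts. Each site sends one small pair per group regardless of how many detail rows it holds.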