by Khaled Alsabti, Sanjay Ranka, Vineet Singh
In Proc. 23rd VLDB Conference
ftp://ftp.cise.ufl.edu/pub/faculty/ranka/quant.ps.gz
Abstract:
The $\phi$-quantile of an ordered sequence of data values is the element with rank $\phi \cdot n$, where $n$ is the total number of values. Accurate estimates of quantiles are required in many practical applications. In this paper, we present a new algorithm for estimating quantile values for disk-resident data. Our algorithm has the following characteristics: (1) it requires only one pass over the data; (2) it is deterministic; (3) it produces good lower and upper bounds on the true values of the quantiles; (4) it requires no a priori knowledge of the distribution of the data set; (5) it has a scalable parallel formulation; (6) the extra time and memory required for each quantile beyond the first are constant. We present experimental results on the IBM SP-2. The experimental results show that the algorithm is indeed robust and does not depend on the distribution of the data sets.
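To make the definition and properties (1), (2), (3), (4) and (6) concrete, here is a minimal sketch: it computes the exact $\phi$-quantile by sorting, then derives deterministic lower and upper bounds in a single pass using a generic regular-sampling scheme (sort each in-memory run, keep every k-th element). This is an illustration only and is not claimed to be the authors' algorithm. The helper names (`exact_quantile`, `regular_samples`, `quantile_bounds`), the parameters `run_size` and `samples_per_run`, the rounding convention rank $= \lceil \phi n \rceil$, and the assumptions that all values are distinct and that $n$ is a multiple of `run_size` are hypothetical choices made for this example.

```python
import math
from typing import Iterable, List, Optional, Tuple


def exact_quantile(values: List[float], phi: float) -> float:
    """Exact phi-quantile: the element of (1-indexed) rank ceil(phi * n).

    The ceil rounding convention is an assumption for this sketch.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]


def regular_samples(stream: Iterable[float], run_size: int,
                    samples_per_run: int) -> List[float]:
    """One pass over the data: sort each in-memory run of `run_size` values
    and keep the elements of in-run rank k, 2k, ..., run_size, where
    k = run_size // samples_per_run. Returns all kept samples, sorted."""
    if run_size % samples_per_run != 0:
        raise ValueError("sketch assumes samples_per_run divides run_size")
    k = run_size // samples_per_run
    samples: List[float] = []
    run: List[float] = []
    for v in stream:
        run.append(v)
        if len(run) == run_size:
            run.sort()
            samples.extend(run[k - 1::k])  # indices k-1, 2k-1, ..., run_size-1
            run = []
    if run:
        raise ValueError("sketch assumes n is a multiple of run_size")
    samples.sort()
    return samples


def quantile_bounds(samples: List[float], n: int, run_size: int,
                    samples_per_run: int,
                    phi: float) -> Tuple[Optional[float], Optional[float]]:
    """Bounds (lower, upper) with lower <= true quantile <= upper.

    Assuming distinct values, the sample at position p (1-indexed) of the
    merged sorted sample list has global rank between k*p and
    k*p + (r - 1)*(k - 1), where r is the number of runs. `lower` is None
    when no sample is provably at or below the target rank.
    """
    k = run_size // samples_per_run
    r = n // run_size
    t = max(1, math.ceil(phi * n))  # target rank
    lower: Optional[float] = None
    upper: Optional[float] = None
    for p, x in enumerate(samples, start=1):
        if k * p + (r - 1) * (k - 1) <= t:
            lower = x  # rank(x) <= t, so x <= quantile; keep the largest
        if upper is None and k * p >= t:
            upper = x  # rank(x) >= t, so x >= quantile; keep the smallest
    return lower, upper


if __name__ == "__main__":
    # 7919 is coprime to 1000, so this is a permutation of 0..999.
    data = [(i * 7919) % 1000 for i in range(1000)]
    s = regular_samples(data, run_size=100, samples_per_run=10)
    print(exact_quantile(data, 0.5), quantile_bounds(s, 1000, 100, 10, 0.5))
```

In this sketch, each additional quantile only rescans the in-memory sample list, never the disk-resident data. The width of the rank interval, $(r-1)(k-1)$, shrinks as `samples_per_run` grows, trading memory for tighter bounds.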