  A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data (1997)

by Khaled Alsabti, Sanjay Ranka, Vineet Singh
In Proc. 23rd VLDB Conference
ftp://ftp.cise.ufl.edu/pub/faculty/ranka/quant.ps.gz

Abstract:

The φ-quantile of an ordered sequence of data values is the element with rank φ × n, where n is the total number of values. Accurate quantile estimates are needed in many practical applications. In this paper, we present a new algorithm for estimating quantile values for disk-resident data. The algorithm has the following characteristics: (1) it requires only one pass over the data; (2) it is deterministic; (3) it produces good lower and upper bounds on the true values of the quantiles; (4) it requires no a priori knowledge of the distribution of the data set; (5) it has a scalable parallel formulation; (6) the extra time and memory needed to compute each additional quantile (beyond the first) are constant. We present experimental results on the IBM SP-2. These results show that the algorithm is robust and does not depend on the distribution of the data sets.
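
To make the quantile definition and the idea of deterministic one-pass bounds concrete, here is a minimal Python sketch. It is not the paper's procedure, which the abstract does not spell out. It is a simplified illustration that assumes data arrives in equal-sized runs that each fit in memory, keeps regularly spaced samples from every sorted run, and derives the bounds by counting how many values each sample is guaranteed to dominate. The function names, the `run_size` and `samples_per_run` parameters, and the choice to round the rank φ × n up to an integer are all assumptions made for this example.

```python
import math
from itertools import islice


def exact_quantile(values, phi):
    """Element with rank ceil(phi * n), 1-indexed, in sorted order (assumed rounding)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]


def one_pass_quantile_bounds(stream, phis, run_size, samples_per_run):
    """Return {phi: (lower, upper)} after a single pass over `stream`.

    Simplifying assumptions of this sketch: every run holds exactly
    `run_size` values and `run_size` is a multiple of `samples_per_run`.
    """
    assert run_size % samples_per_run == 0
    t = run_size // samples_per_run      # values covered by each sample
    it = iter(stream)
    samples, runs, lowest = [], 0, math.inf
    while True:
        run = sorted(islice(it, run_size))   # only one run is in memory at a time
        if not run:
            break
        assert len(run) == run_size, "sketch assumes full runs"
        runs += 1
        lowest = min(lowest, run[0])
        samples.extend(run[t - 1::t])        # values at ranks t, 2t, ..., run_size
    samples.sort()
    n = runs * run_size

    bounds = {}
    for phi in phis:                         # O(1) extra work per quantile
        rank = max(1, math.ceil(phi * n))
        # At least j*t values are <= the j-th sample, so j = ceil(rank / t)
        # is guaranteed to be at or above the true quantile.
        j_up = math.ceil(rank / t)
        # At most (j-1)*t + runs*(t-1) values are < the j-th sample, so the
        # largest j keeping that count below `rank` is at or below the quantile.
        j_lo = (rank - 1 - runs * (t - 1)) // t + 1
        lower = samples[j_lo - 1] if j_lo >= 1 else lowest
        bounds[phi] = (lower, samples[j_up - 1])
    return bounds
```

With 10 runs of 1,000 values and 100 samples per run, the sketch holds 1,000 samples in memory and brackets each requested quantile between two of them. Taking more samples per run narrows the gap between the bounds at the cost of more memory.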
