25 citations found.
K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst., 15(2):208--229, 1990.

This paper is cited in the following contexts:
Counting Distinct Elements in a Data Stream - Bar-Yossef, Jayram, Kumar.. (2002)   (1 citation)

....for F0 in the data stream model. Counting the number of distinct elements in a (column of a relational) table of data is a fairly fundamental problem in databases. This has applications to estimating the selectivity of queries, designing good plans for executing a query, etc.; see, for instance, [WVT90, HNSS96]. Another application of counting distinct elements is in routing of Internet traffic. The router usually has very limited memory, but it is desirable to have the router gather various statistical properties (say, the number of distinct destination addresses) of the traffic flow. The number of ....

K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 15(2):208--229, 1990.


Loglog Counting of Large Cardinalities - Durand, Flajolet (2003)   (5 citations)

....$H \ge H_0$, where $H_0 := \log_2 m$. In case a value too close to $H_0$ is adopted (say $0 \le H - H_0 \le 3$), then the effect of hashing collisions must be compensated for. This is achieved by inverting the function that gives the expected value of the number of collisions in a hash table (see [3, 15] for an analogous discussion). The estimator is then to be changed into … (No detectable degradation of performance results from the last modification of the estimator function, and it can safely be used in all cases.) ....
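The collision correction described in this excerpt can be illustrated under a simple model. Assume a uniform H-bit hash, so that n distinct inputs occupy on average $2^H(1 - (1 - 2^{-H})^n)$ distinct hash values, and invert the exponential approximation $2^H(1 - e^{-n/2^H})$ to recover n. This is a minimal sketch with hypothetical function names. The exact corrected estimator used by Durand and Flajolet is not recoverable from the excerpt.

```python
import math


def expected_distinct_hashes(n: float, hash_bits: int) -> float:
    """Expected number of distinct H-bit hash values produced by n distinct
    inputs under a uniform hash (assumption): 2^H * (1 - (1 - 2^-H)^n)."""
    space = 2.0 ** hash_bits
    return space * (1.0 - (1.0 - 1.0 / space) ** n)


def correct_for_collisions(observed: float, hash_bits: int) -> float:
    """Map an estimate of distinct *hash values* back to distinct *inputs* by
    inverting the approximation 2^H * (1 - exp(-n / 2^H))."""
    space = 2.0 ** hash_bits
    if not 0.0 <= observed < space:
        raise ValueError("estimate must lie in [0, 2^H); use more hash bits")
    return -space * math.log1p(-observed / space)


# With 16-bit hashes, 40,000 distinct inputs collide down to roughly 30,000
# distinct hash values; the correction brings the count back near 40,000.
seen = expected_distinct_hashes(40_000, 16)
print(round(seen), round(correct_for_collisions(seen, 16)))
```

The inverse $-2^H \ln(1 - E/2^H)$ has the same form as the linear counting estimator $-m \ln V$. Here $1 - E/2^H$ plays the role of the fraction $V$ of empty cells, so both rest on the same hash-table occupancy model.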

Whang, K.-Y., Zanden, B. T. V., and Taylor, H. M. A linear-time probabilistic counting algorithm for database applications. TODS 15, 2 (1990), 208-229.


Approximate Frequency Counts over Data Streams - Manku, Motwani (2002)   (64 citations)

....approach is that it does not require a lookahead into the data stream. 7 Related and Future Work. Problems related to frequency counting that have been studied in the context of data streams include approximate frequency moments [AMS96], differences [FKSV99], distinct values estimation [FM85, WVZT90], bit counting [DGIM02], and top k queries [GM98, CCFC02]. Algorithms over data streams that pertain to aggregation include approximate quantiles [MRL99, GK01], V-optimal histograms [GKS01b], wavelet based aggregate queries [GKMS01, MVW00], and correlated aggregate queries [GKS01a]. We are ....

K.-Y. WHANG, B. T. VANDER-ZANDEN, AND H. M. TAYLOR. A linear-time probabilistic counting algorithm for database applications. ACM Trans. on Database Systems, 15(2):208--229, 1990.


Random Sampling from Databases - Olken (1993)   (37 citations)

....and Seshadri [NS90] describe the estimation of the size of projections via sampling. Approaches to the problem vary depending on whether there is an index available on some or all of the projection attributes: 1. Do the projection, then count. 2. Scan the relation, doing probabilistic counting [WVZT90, FM85, Fla85, FM83, ASW87]. 3. Sample the base relation; for each element of the sample find the number of the records in the base relation which match the sampled record on the projection attributes, call this $|x_i|$. Estimate the size of the projection as the estimated average contribution of ....
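Approach 3 can be made concrete under one natural reading of the truncated sentence. The projection size equals the sum of $1/|x_i|$ over all N records, because each distinct projection value is shared by exactly $|x_i|$ records. N times the average of $1/|x_i|$ over a uniform sample therefore estimates it without bias. The sketch below assumes that reading. The function name is hypothetical, and a `Counter` stands in for the index probe that would return $|x_i|$.

```python
import random
from collections import Counter
from typing import Hashable, Sequence


def estimate_projection_size(
    relation: Sequence[Hashable], sample_size: int, rng: random.Random
) -> float:
    """Estimate the number of distinct projection values in `relation`.

    Each record contributes 1/|x_i| to the projection size, so N times the
    sampled average contribution estimates the projection size.
    """
    # Stand-in for the index lookup that returns |x_i| (hypothetical; a real
    # system would probe an index rather than count the whole relation).
    match_counts = Counter(relation)
    positions = rng.sample(range(len(relation)), sample_size)  # no replacement
    avg_contribution = sum(1.0 / match_counts[relation[i]] for i in positions) / sample_size
    return len(relation) * avg_contribution


rows = ["a"] * 3 + ["b"] * 2 + ["c"]
print(estimate_projection_size(rows, 3, random.Random(0)))
```

Sampling the whole relation returns the exact count. When every value is duplicated equally often, any sample size does.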

Kyu-young Whang, B.T. Vander-Zanden, and H.M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 15(2):208--29, June 1990.


Managing Periodically Updated Data in Relational Databases: A.. - Gal, Eckstein (2001)   (10 citations)

....expected value $\Lambda(s, f) = \int_s^f \lambda(t)\,dt$. A homogeneous Poisson process may be considered as the special … [Footnote 4: This vector can be computed exactly and efficiently using indices. Alternatively, in the absence of an index for a given attribute, statistical methods (such as probabilistic counting [37], sampling-based estimators [14], and wavelets [22]) can be applied.] Notation: $s, f$: points in time; $R, S \in \mathcal{R}$, $R(t)$, $|R(s)|$: relations, $R$'s extension at time $t$, its cardinality at time $s$; $A$, $\mathbf{A}$, $\mathrm{dom}\,A$, $\mathrm{dom}\,\mathbf{A}$: attribute, compound attribute, domain of attribute, domain of compound ....

K.-Y. Whang, B.T. Vander Zanden, and H.M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems (TODS), 15(2):208--229, 1990.


Estimating Simple Functions on the Union of Data Streams - Gibbons, Tirthapura (2001)   (26 citations)

....and the time required to process each item is dominated by the time required to perform $O(\log(1/\delta))$ multiplications over a finite field of $\Theta(\log m)$ bits. We note that this is an interesting result in itself because of the importance of the F0 function in database optimization (see, e.g., [6, 15, 26]) and in internet traffic analysis, e.g., the number of distinct web pages requested, or the number of distinct visitors to a website. This problem has been studied in [4, 10] and elsewhere, but we do not know of an $(\epsilon, \delta)$ approximation scheme for F0 whose bounds match the bounds we obtain. ....

....Consider the zeroth frequency moment (F0) of a sequence of n items in [1: m] where m n. Estimating this function has been studied in the context of a single stream for both public coins [7, 10, 26] and stored coins [4]. (Footnote 1: Note that the time per item is critical in practice, due to the extremely high network traffic rate.) These previous algorithms trivially extend to the distributed streams model, but the space and/or time bounds for an $(\epsilon, \delta)$ approximation scheme for F0 are worse than our algorithm's bounds. The best previous bounds are due to Cohen [7], which matches our space bound, ....
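The following sketch illustrates how an F0 estimator of this general kind can work. It keeps only values whose hash has at least `level` trailing zero bits. It raises the level whenever the sample exceeds a fixed capacity, and it scales the sample size by $2^{\text{level}}$. We assume a linear hash $(a x + b) \bmod p$ with $p = 2^{61} - 1$ as a stand-in for the finite-field hashing mentioned above. The class name and capacity are hypothetical, and this is not claimed to be Gibbons and Tirthapura's exact algorithm. An $(\epsilon, \delta)$ guarantee would typically come from sizing the capacity in terms of $\epsilon$ and taking the median of $O(\log(1/\delta))$ independent copies.

```python
import random
from typing import Set

P = (1 << 61) - 1  # Mersenne prime used as the hash field size (assumption)


class DistinctSampler:
    """Level-based distinct sampling over integer items (illustrative)."""

    def __init__(self, capacity: int, rng: random.Random) -> None:
        self.capacity = capacity
        # Random linear hash h(x) = (a*x + b) mod P, a pairwise-independent family.
        self.a = rng.randrange(1, P)
        self.b = rng.randrange(0, P)
        self.level = 0
        self.sample: Set[int] = set()

    def _level_of(self, item: int) -> int:
        h = (self.a * item + self.b) % P
        # Number of trailing zero bits of the hash; 61 caps the h == 0 case.
        return (h & -h).bit_length() - 1 if h else 61

    def add(self, item: int) -> None:
        if self._level_of(item) < self.level:
            return
        self.sample.add(item)  # a set, so duplicates do not inflate the sample
        while len(self.sample) > self.capacity:
            # Too many sampled values: raise the level, halving the expected sample.
            self.level += 1
            self.sample = {x for x in self.sample if self._level_of(x) >= self.level}

    def estimate(self) -> float:
        # Each distinct value survives with probability about 2^-level.
        return len(self.sample) * 2.0 ** self.level
```

The final sample depends only on which distinct values occurred, not on how often they occurred. Two sketches built with the same hash can therefore be combined by taking the higher level and the union of their samples, which suits a union of streams.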

K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 15(2):208--229, 1990.


Domains and Active Domains: What This Distinction Implies for.. - Ciaccia, Maio (1995)

....periodically update measures for them. If an index is not present on the attribute, conventional techniques based on sorting [3] or hashing can be costly in terms of disk accesses. On the other hand, probabilistic techniques based either on random sampling (see, e.g., [15]) or on linear counting [34] provide reasonably accurate estimates at low cost. In particular, the linear counting algorithm can yield estimates within a 1% error by using a single relation scan and a bit vector with no more than $n/12$ bits [34]. The second part of Observation 1 is far less obvious and needs a deeper ....

....based either on random sampling (see, e.g., [15]) or on linear counting [34] provide reasonably accurate estimates at low cost. In particular, the linear counting algorithm can yield estimates within a 1% error by using a single relation scan and a bit vector with no more than $n/12$ bits [34]. The second part of Observation 1 is far less obvious and needs a deeper analysis. We first observe that Eqs. (2) and (3) assume the use of ads (active domains), since, according to [13, page 593], although the actual sizes of the domains might not be available, this is not a major drawback since they can be easily ....
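The linear counting idea cited here can be sketched in a few lines. Hash every value into a bitmap of m bits in one scan, measure the fraction $V$ of bits still zero, and estimate the number of distinct values as $\hat{n} = -m \ln V$. The hash (BLAKE2b truncated to 64 bits), the function names, and the byte-per-bit bitmap are assumptions chosen for clarity, not details taken from [34].

```python
import hashlib
import math
from typing import Iterable


def _bucket(value: str, m: int) -> int:
    # 64-bit BLAKE2b digest reduced mod m (the hash choice is an assumption).
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


def linear_count(values: Iterable[str], m: int) -> float:
    """Estimate the number of distinct values with an m-bit bitmap."""
    bitmap = bytearray(m)  # one byte per bit, for readability
    for v in values:
        bitmap[_bucket(v, m)] = 1  # duplicates set the same bit again
    zero_fraction = bitmap.count(0) / m
    if zero_fraction == 0:
        raise ValueError("bitmap is full; use a larger m")
    return -m * math.log(zero_fraction)


# 60,000 values with 20,000 distinct keys, hashed into a 10,000-bit map.
keys = [f"key-{i % 20000}" for i in range(60000)]
print(round(linear_count(keys, 10_000)))
```

The usage above runs at a load factor of 2. With a bitmap this small, a load factor near 12 would be expected to leave almost no zero bits, because the expected zero fraction is about $e^{-12}$. The high load factors reported for linear counting therefore pay off only when m itself is large.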

K.-Y. Whang, B. T. Vander Zanden, and H. M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst., 15(2):208--229, June 1990.


Online Prediction Algorithms for Databases and Operating Systems - Krishnan (1995)   (7 citations)

....quite different. Query optimizers have cost models that estimate the access cost as a function of the predicted number of qualifying rows and find the cheaper alternative. Models already exist in current day relational database management systems (RDBMSs) to predict selectivity for numeric fields [ASW, FlM, Iye, SAC, WVT]. With the popularity of textual data being stored in RDBMS, it has become important to predict the selectivity accurately even for alphanumeric fields. A particularly problematic predicate used against alphanumeric fields is the like predicate [Iye]. For example, consider the inventory of a ....

....phase must be minimal. In Chapter 11 we present our techniques for predicting selectivity for the like predicate; i.e. techniques for estimating alphanumeric selectivity. 10.1 Background and Related Work Models already exist in current day RDBMSs to predict selectivity for numeric fields [ASW, FlM, Iye, SAC, WVT]. Typically, in the preprocessing phase, a few numbers that capture the distribution of data are accumulated and stored in the catalog. In the earlier example dealing with salaries, the RDBMS would perform an analysis of the data in the salary field of the database, and select a small set of ....

[Article contains additional citation context not shown here]

K.Y. Whang, B. T. Vander-Zanden, and H. M. Taylor, "A Linear-Time Probabilistic Counting Algorithm for Database Applications," ACM Transactions on Database Systems 15 (June 1990), 208--229.


Aqua Project White Paper - Gibbons, Matias, Poosala (1997)   (2 citations)

....n) bits of memory. Flajolet and Martin [FM83, FM85] designed an algorithm for approximating the number of distinct values in a relation in a single pass through the data and using only O(lg n) bits of memory. Other algorithms for approximating the number of distinct values in a relation include [WVZT90, HNSS95]. Probabilistic techniques for fast parallel estimation of the size of a set were studied in [Mat92]. None of this previous work uses the new techniques described in this paper. 9 Project status and future directions. We are in the process of implementing the base Aqua system. A simple ....

K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 15(2):208--229, 1990.


Random Sampling from Databases - A Survey - Olken, Rotem (1994)   (10 citations)

....Seshadri (1990) describe the estimation of the size of projections via sampling. Approaches to the problem vary depending on whether there is an index available on some or all of the projection attributes: 1. Do the projection, then count. 2. Scan the relation, doing probabilistic counting (see Whang, Vander Zanden and Taylor (1990), Flajolet and Martin (1985), Flajolet (1985), Flajolet and Martin (1983), and Astrahan, Schkolnick and Whang (1987)). 3. Sample the base relation; for each element of the sample find the number of the records in the base relation which match the sampled record on the projection attributes, call ....

Whang, K.-y., Vander-Zanden, B. and Taylor, H. (1990). A linear-time probabilistic counting algorithm for database applications, ACM Transactions on Database Systems 15, 208--29.


New Sampling-Based Summary Statistics for Improving.. - Gibbons, Matias (1998)   (93 citations)

....n) bits of memory. Flajolet and Martin [FM83, FM85] designed an algorithm for approximating the number of distinct values in a relation in a single pass through the data and using only O(lg n) bits of memory. Other algorithms for approximating the number of distinct values in a relation include [WVZT90, HNSS95]. Alon, Matias and Szegedy [AMS96] developed sublinear space randomized algorithms for approximating various frequency moments, as well as tight bounds on the minimum possible memory required to approximate such frequency moments. Probabilistic techniques for fast parallel estimation of ....

K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 15(2):208--229, 1990.


The Space Complexity of Approximating the Frequency Moments - Alon, Matias, Szegedy (1996)   (376 citations)

....O(lg lg m) O(lg lg n) bits of memory. Flajolet and Martin [8] designed an algorithm for approximating $F_0$ using O(lg n) bits of memory. Their analysis, however, is based on the assumption that explicit families of hash functions with very strong random properties are available. Whang et al. [19] considered the problem of approximating $F_0$ in the context of databases. Here we obtain tight bounds for the minimum possible memory required to approximate the numbers $F_k$. We prove that for every $k > 0$, $F_k$ can be approximated randomly using at most $O(n^{1-1/k} \lg n)$ memory bits. We further ....
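The basic randomized estimator behind the $F_k$ bounds can be sketched as follows. It uses the AMS setting of a stream $a_1, \dots, a_m$ over values in $\{1, \dots, n\}$, with $F_k = \sum_i m_i^k$ where $m_i$ is the frequency of value $i$. Pick a uniformly random position p, let r be the number of occurrences of $a_p$ from position p onward, and output $X = m\,(r^k - (r-1)^k)$. For $k \ge 1$ the expectation of X is $F_k$. The helper names are hypothetical. This version slices a stored list for clarity, whereas a one-pass implementation would choose p by reservoir sampling and keep r as a running counter. It also averages copies with a plain mean, whereas AMS combine them with a median of means.

```python
import random
from collections import Counter
from typing import Hashable, Sequence


def exact_frequency_moment(stream: Sequence[Hashable], k: int) -> int:
    # F_k = sum over distinct values of (frequency ** k).
    return sum(c ** k for c in Counter(stream).values())


def ams_basic_estimate(stream: Sequence[Hashable], p: int, k: int) -> int:
    """One AMS estimator anchored at 0-based position p; requires k >= 1."""
    if k < 1:
        raise ValueError("this estimator requires k >= 1")
    target = stream[p]
    r = sum(1 for x in stream[p:] if x == target)  # occurrences from p onward
    return len(stream) * (r ** k - (r - 1) ** k)


def ams_estimate(stream: Sequence[Hashable], k: int, copies: int, rng: random.Random) -> float:
    # Plain mean of independent copies (AMS use a median of means for guarantees).
    total = sum(ams_basic_estimate(stream, rng.randrange(len(stream)), k) for _ in range(copies))
    return total / copies


s = [1, 2, 1, 3, 1, 2]
print(exact_frequency_moment(s, 2), ams_estimate(s, 2, 2000, random.Random(0)))
```

Averaging the basic estimator over every position reproduces $F_k$ exactly: for a value with frequency $m_i$, the terms $r^k - (r-1)^k$ for $r = 1, \dots, m_i$ telescope to $m_i^k$.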

K.-Y. Whang, B.T. Vander-Zanden, and H.M. Taylor, A linear-time probabilistic counting algorithm for database applications, ACM Transactions on Database Systems, 15(2) (1990), 208-229.


Computing Iceberg Queries Efficiently - Min Fang (1998)   (49 citations)

....how to remove these errors using our HYBRID algorithms in the next section. 3.2 Coarse counting by bucketizing elements (COARSE COUNT). Coarse counting or probabilistic counting is a technique often used for query size estimation, for computing the number of distinct targets in a relation [10, 23], for mining association rules [16], and for other applications. The simplest form of coarse counting uses an array A[1:m] of m counters and a hash function $h_1$, which maps target values from $\log_2 n$ bits to $\log_2 m$ bits, $m \ll n$. The CoarseCount algorithm works as follows: Initialize all m ....

....or q = 2. For relatively large values of memory, we recommend UNISCAN with multiple hash functions, since we can choose $K > 1$ and apply multiple hash functions within one hashing scan, as we discuss in the full version of this paper [9]. 8 Related Work. Flajolet and Martin [10] and Whang et al. [23] proposed a simple form of coarse counting for estimating the number of distinct elements in a multiset. Park et al. [16] proposed coarse counting in the context of mining association rules. All the above approaches use a single hash function for their coarse counting, and hence tend to have many ....
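The coarse-counting idea in these excerpts can be sketched as follows. One scan increments counter $A_i[h_i(v)]$ for each of K hash functions. A target stays a candidate only if every one of its counters reaches the threshold T, so no target whose true count is at least T is ever dropped. A final exact count over the candidates then removes false positives. With K = 1 this is the simplest form described above, and K > 1 mirrors the remark about multiple hash functions within one hashing scan. The seeded BLAKE2b hashes, the function names, and the final exact pass are assumptions. This is not the paper's HYBRID or UNISCAN algorithm.

```python
import hashlib
from collections import Counter
from typing import List, Sequence, Set


def _bucket(value: str, seed: int, m: int) -> int:
    # Seeded BLAKE2b used as K "independent" hash functions (assumption).
    key = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big") % m


def coarse_count_candidates(
    targets: Sequence[str], m: int, threshold: int, num_hashes: int = 1
) -> Set[str]:
    counters: List[List[int]] = [[0] * m for _ in range(num_hashes)]
    for t in targets:  # hashing scan: bump one counter per hash function
        for i in range(num_hashes):
            counters[i][_bucket(t, i, m)] += 1
    # Counters never undercount, so every true iceberg target survives this filter.
    return {
        t for t in set(targets)
        if all(counters[i][_bucket(t, i, m)] >= threshold for i in range(num_hashes))
    }


def iceberg(targets: Sequence[str], m: int, threshold: int, num_hashes: int = 1) -> Set[str]:
    candidates = coarse_count_candidates(targets, m, threshold, num_hashes)
    # Exact recount of the candidates removes false positives caused by collisions.
    exact = Counter(t for t in targets if t in candidates)
    return {t for t, c in exact.items() if c >= threshold}


data = ["x"] * 50 + ["y"] * 30 + [f"rare-{i}" for i in range(2000)] + ["z"] * 9
print(sorted(iceberg(data, m=1024, threshold=10, num_hashes=3)))
```

Adding hash functions shrinks the candidate set, because a light target must collide with heavy buckets under every hash to survive. The final exact pass then has less work to do.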

K. Whang, B.T. Vander-Zanden, and H.M. Taylor. A linear-time probabilistic counting algorithm for db applications. ACM Transactions on Database Systems, 15(2):208 -- 229, 1990.


The Space Complexity of Approximating the Frequency Moments - Alon, Matias, Szegedy (1996)   (376 citations)

....log m) O(log log n) bits of memory. Flajolet and Martin [7] designed an algorithm for approximating $F_0$ using O(log n) bits of memory. Their analysis, however, is based on the assumption that explicit families of hash functions with very strong random properties are available. Whang et al. [17] considered the problem of approximating $F_0$ in the context of databases. Here we obtain rather tight bounds for the minimum possible memory required to approximate the numbers $F_k$. We prove that for every $k > 0$, $F_k$ can be approximated randomly using at most $O(n^{1-1/k} \log n)$ memory bits. ....

K.-Y. Whang, B.T. Vander-Zanden, and H.M. Taylor, A linear-time probabilistic counting algorithm for database applications, ACM Transactions on Database Systems, 15(2) (1990), 208-229.


Estimating Alphanumeric Selectivity in the Presence of.. - Krishnan, Vitter, Iyer (1996)   (11 citations)

....on accurate cost estimation of various query reorderings [BGI]. Estimating predicate selectivity, or the fraction of rows in a database that satisfy a selection predicate, is key to determining the optimal join order. Previous work has concentrated on estimating selectivity for numeric fields [ASW, HaSa, IoP, LNS, SAC, WVT]. With the popularity of textual data being stored in databases, it has become important to estimate selectivity accurately for alphanumeric fields. A particularly problematic predicate used against alphanumeric fields is the SQL like predicate [Dat]. Techniques used for estimating numeric ....

....consulted to estimate selectivity; the processing in the query optimization phase must be minimal. Further, the space available in the metadata descriptors for any one column of the database is limited. Models already exist in current day relational DBMS to estimate selectivity for numeric fields [ASW, HaSa, IoP, LNS, SAC, WVT]. Typically, in the runstats phase, a few numbers that capture the distribution of data are accumulated and stored in the metadata, as histograms, for example. The problem of estimating alphanumeric selectivity is a natural extension to the problem of estimating numeric selectivity: the like ....
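One concrete example of the catalog statistic described here is an equi-width histogram. It stores bucket counts for a numeric column, and an optimizer can estimate the selectivity of a range predicate by assuming values are spread uniformly within each bucket. The class and method names are hypothetical, and equi-width bucketing is only one of several histogram types an optimizer might keep.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EquiWidthHistogram:
    low: float
    high: float
    counts: List[int]

    @classmethod
    def build(cls, values: Sequence[float], num_buckets: int) -> "EquiWidthHistogram":
        low, high = min(values), max(values)
        width = (high - low) / num_buckets or 1.0  # guard against a constant column
        counts = [0] * num_buckets
        for v in values:
            idx = min(int((v - low) / width), num_buckets - 1)  # max goes in last bucket
            counts[idx] += 1
        return cls(low, high, counts)

    def selectivity(self, lo: float, hi: float) -> float:
        """Estimated fraction of rows with lo <= value < hi, assuming values are
        spread uniformly within each bucket."""
        n = len(self.counts)
        width = (self.high - self.low) / n or 1.0
        estimated_rows = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.low + i * width
            b_hi = b_lo + width
            overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
            estimated_rows += count * overlap / width
        return estimated_rows / sum(self.counts)


hist = EquiWidthHistogram.build(list(range(100)), 10)
print(hist.selectivity(10, 30))
```

Only the bucket boundaries and counts need to live in the catalog, which fits the limited per-column metadata space the excerpt mentions. The uniformity assumption is what makes partial-bucket estimates approximate.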

K.Y. Whang, B. T. Vander-Zanden, and H. M. Taylor, "A Linear-Time Probabilistic Counting Algorithm for Database Applications," ACM Trans. on Database Sys. 15 (June 1990), 208--229.


Computing Iceberg Queries Efficiently - Min Fang (1998)   (49 citations)

....to remove these errors using our HYBRID algorithms in the next section. 3.2 Coarse counting by bucketizing elements (COARSE COUNT). Coarse counting or probabilistic counting is a technique often used for query size estimation, for computing the number of distinct targets in a relation [FM85, WVZT90], for mining association rules [PCY95], and for other applications. The simplest form of coarse counting uses an array A[1:m] of m counters and a hash function $h_1$, which maps target values from $\log_2 n$ bits to $\log_2 m$ bits, $m \ll n$. The CoarseCount algorithm works as follows: Initialize all ....

....relatively large values of memory, we recommend UNISCAN with multiple hash functions, since we can choose $K > 1$ and apply multiple hash functions within one hashing scan, as we discuss in the full version of this paper [FSGM 97]. 8 Related Work. Flajolet and Martin [FM85] and Whang et al. [WVZT90] proposed a simple form of coarse counting for estimating the number of distinct elements in a multiset. Park et al. [PCY95] proposed coarse counting in the context of mining association rules. All the above approaches use a single hash function for their coarse counting, and hence tend to have ....

K. Whang, B.T. Vander-Zanden, and H.M. Taylor. A linear-time probabilistic counting algorithm for db applications. ACM Transactions on Database Systems, 15(2):208 -- 229, 1990.


A One-Pass Aggregation Algorithm with the Optimal Buffer.. - Lee, Whang, Moon, Song (2002)   Self-citation (Whang)

No context found.

Whang, K., Vander-Zanden, B.T., and Taylor, H.M., "A Linear-time Probabilistic Counting Algorithm for Database Applications," ACM Trans. on Database Systems, Vol. 15, No. 2, pp. 208--229, June 1990.


A New Method For Estimating The Number Of Objects Satisfying.. - Cho, Chong-Mok (1996)   (2 citations)  Self-citation (Whang)

....estimating the cardinalities of unconditional joins and selectivity factors of a query involving partial participation classes. Partial participation has not been considered seriously in the literature [15]. Most of existing query optimization techniques [2, 8, 11, 16, 19] except for Whang et al. [24, 21, 22] have not considered partial participation. We discuss in Section 4.1 that these conventional techniques often incur large estimation errors for the queries involving partial participation classes. We also consider the effect of multi-valued attributes on the estimation of intermediate results ....

....attributes on the estimation of intermediate results cardinalities. Most of existing estimation techniques [2, 16, 19, 24, 21] have not considered multi-valued attributes seriously. The proposed technique for estimating the cardinalities of unconditional joins extends Whang's technique [24, 21, 22] so as to consider the characteristics of object-oriented databases. The proposed techniques require new types of statistics for estimating query costs. We also show that these statistics can be easily obtained by taking advantage of inherent properties of object-oriented databases. The paper is ....

[Article contains additional citation context not shown here]

K. Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear time probabilistic counting algorithm for database applications. ACM Trans. on Database Systems, 15(2):208--229 (1990).


A New Method For Estimating The Number Of Objects Satisfying An .. - Wan-Sup Cho (1996)   (2 citations)  Self-citation (Whang)

....estimating the cardinalities of unconditional joins and selectivity factors of a query involving partial participation classes. Partial participation has not been considered seriously in the literature [15]. Most of existing query optimization techniques [2, 8, 11, 16, 19] except for Whang et al. [24, 21, 22] have not considered partial participation. We discuss in Section 4.1 that these conventional techniques often incur large estimation errors for the queries involving partial participation classes. We also consider the effect of multi-valued attributes on the estimation of intermediate results ....

....attributes on the estimation of intermediate results cardinalities. Most of existing estimation techniques [2, 16, 19, 24, 21] have not considered multi-valued attributes seriously. The proposed technique for estimating the cardinalities of unconditional joins extends Whang's technique [24, 21, 22] so as to consider the characteristics of object-oriented databases. The proposed techniques require new types of statistics for estimating query costs. We also show that these statistics can be easily obtained by taking advantage of inherent properties of object-oriented databases. The paper is ....

[Article contains additional citation context not shown here]

K. Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear time probabilistic counting algorithm for database applications. ACM Trans. on Database Systems, 15(2):208--229 (1990).


Fast and Accurate Traffic Matrix Measurement Using Adaptive Cardinality.. (2005)

No context found.

K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst., 15(2):208--229, 1990.


Duplicate Detection in Click Streams - Metwally, Agrawal, Abbadi (2005)

No context found.

K. Whang, B. Vander-Zanden, and H. Taylor. A Linear-Time Probabilistic Counting Algorithm for Database Applications. ACM Transactions on Database Systems, 15:208--229, 1990.


Data Streaming Algorithms for Efficient and Accurate.. - Kumar, Sung, Xu, Wang (2004)   (1 citation)

No context found.

K. Whang, B. Vander-Zanden, and H. Taylor, "A linear-time probabilistic counting algorithm for database applications," ACM Transactions on Database Systems, 1990.


Approximate Frequency Counts over Data Streams - Manku (2002)   (64 citations)

No context found.

K.-Y. WHANG, B. T. VANDER-ZANDEN, AND H. M. TAYLOR. A linear-time probabilistic counting algorithm for database applications. ACM Trans. on Database Systems, 15(2):208--229, 1990.

CiteSeer.IST - Copyright Penn State and NEC