| P. Flajolet, G. N. Martin. Probabilistic Counting. In Proc. Symp. on Foundations of Computer Science, 1983. |
....in which the methods may be used: most notably they can be employed on the fly as well as in the context of distributed processing with minimal exchanges of information between processors and without any degradation of performances. Preliminary results about this work have been reported in [3]. 2. A PROBABILISTIC COUNTING PROCEDURE AND ITS ANALYSIS The Basic Counting Procedure We assume here that we have at our disposal a hashing function hash of the type: function hash(x: records): scalar range [0 .. 2^L - 1] that transforms records into integers sufficiently uniformly ....
P. FLAJOLET AND G. N. MARTIN, Probabilistic counting, in "Proc. 24th IEEE Sympos. Foundations of Computer Science, Nov. 1983," pp. 76-82.
.... used is O(log N log u) and the per item processing time is O(log log N). Notice that if one additionally wished to estimate just #α-rare or |X_t ∩ Y_t| (numerators of ρ and σ respectively) then we can use known results for estimating the number of distinct elements (#distinct) [15, 23] and union size of streams (|X_t ∪ Y_t|) [25] in addition to our estimates of ρ and σ and get 1 ± ε approximation for them as well, up to relative precision p. 1.3 Related Work There has been very little work on algorithms for the windowed data stream model. This model was defined ....
....c i (B) This is because for min hash function h as in Section 2, we have h(X ∪ Y) = min{h(X), h(Y)}. Thus the concatenation takes only O(k) = O(1) time. Finally, if we wish to estimate |R_α(t)| rather than ρ_α(t) we can get an estimate for the number of distinct items seen so far using [23] as described in Section 2 and multiply that by ρ_α; the result will be a 1 ± ε approximation to |R_α(t)| up to precision p. This may be desired in some cases. 4 Similarity Estimation We now consider the similarity estimation problem. Recall from Section 2 that to estimate ....
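The union property quoted in the excerpt above, h(X ∪ Y) = min{h(X), h(Y)}, can be illustrated with a minimal Python sketch. The helper names (`item_hash`, `signature`, `merge`), the use of SHA-256 as a stand-in hash, and the choice of k = 16 are assumptions made for illustration; they are not taken from the cited paper.

```python
import hashlib

def item_hash(item, seed):
    """Hypothetical stand-in hash: 64-bit integer derived from SHA-256."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def signature(items, k=16):
    """k min-hash values of a non-empty set, one per seeded hash function."""
    return [min(item_hash(x, seed) for x in items) for seed in range(k)]

def merge(sig_x, sig_y):
    """Signature of the union: componentwise minimum, O(k) time."""
    return [min(a, b) for a, b in zip(sig_x, sig_y)]

X = {"a", "b", "c"}
Y = {"c", "d", "e"}
# Merging the two signatures gives the signature of the union.
print(merge(signature(X), signature(Y)) == signature(X | Y))
```

Because the minimum over a union is the minimum of the two minima, the merge never needs to revisit the underlying items.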
P. Flajolet, G. Martin. Probabilistic Counting. In Proc. 24th Symposium on Foundations of Computer Science, 1983.
....of the stream, and update this every time an item is added or removed. We focus on these synopsis methods, since they can work in our data streams model, whereas sampling is not suited to dynamic modification of the data. The most widely applicable synopsis method is that of Flajolet and Martin [17, 18], which we describe in outline to enable comparison with our algorithm. The algorithm is shown in Figure 1. The crucial part is the set of m hash functions hash j , which map item values onto the range [1 . log n] hash j is designed so that the probability Pr[hash j (i) #] 2 # . ....
....vector, then we expect d/2 to be mapped to the first entry, d/4 to be mapped to the second, and so on until there are none expected in the (log_2 d)th. Several repetitions of this procedure are done independently, and the result scaled by an appropriate factor (1.2928). Theorem 3.1 (Theorem 2 of [17]). This procedure gives an unbiased estimate of the number of distinct values that are seen in a stream. This solves the problem of finding the number of distinct values in a stream of values, which as we know it is the Hamming norm of that stream. However, this method fails to find the Hamming ....
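The procedure outlined in the two excerpts above (hash each item to a bit position with geometrically decreasing probability, repeat independently, scale by 1.2928) might be sketched as follows. This is a rough sketch under stated assumptions, not the authors' implementation: SHA-256 stands in for the ideal hash functions, the repetitions are combined by averaging the index of the lowest unset bit, and the names `fm_estimate`, `BITS`, and `repetitions` are hypothetical.

```python
import hashlib

SCALE = 1.2928  # scaling factor quoted in the excerpt above
BITS = 32       # assumed bitmap width

def _hash(item, j):
    """Hypothetical j-th hash function: 32 bits taken from SHA-256."""
    digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def _lowest_set_bit(v):
    """0-based index of the lowest 1 bit; BITS if v is zero."""
    return BITS if v == 0 else (v & -v).bit_length() - 1

def fm_estimate(stream, repetitions=32):
    """Estimate the number of distinct items in one pass over the stream."""
    bitmaps = [0] * repetitions
    for item in stream:
        for j in range(repetitions):
            pos = _lowest_set_bit(_hash(item, j))
            if pos < BITS:
                bitmaps[j] |= 1 << pos  # position i is hit with prob. 2^-(i+1)
    total = 0
    for bm in bitmaps:
        r = 0
        while (bm >> r) & 1:  # index of the lowest unset bit
            r += 1
        total += r
    return SCALE * 2 ** (total / repetitions)

print(fm_estimate(list(range(2000)) * 2))  # duplicates do not affect the estimate
```

Since each bitmap only records which positions were hit, repeated items leave the sketch unchanged, which is what makes the estimate a count of distinct values.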
P. Flajolet and G. N. Martin. Probabilistic counting. In 24th Annual Symposium on Foundations of Computer Science, pages 76--82, 1983.
....describe the estimation of the size of projections via sampling. Approaches to the problem vary depending on whether there is an index available on some or all of the projection attributes: 1. Do the projection, then count. 2. Scan the relation, doing probabilistic counting [WVZT90, FM85, Fla85, FM83, ASW87]. 3. Sample the base relation; for each element of the sample find the number of the records in the base relation which match the sampled record on the projection attributes, call this |x_i|. Estimate the size of the projection as the estimated average contribution of each element of the ....
P. Flajolet and G.N. Martin. Probabilistic counting. In 24th Annual Symposium on Foundations of Computer Science, pages 76--82, 1983.
....and the timestamp are strictly increasing. This scheme has a worst case space requirement of O(N log R) bits. However, if the data elements arrive in a random order, the expected space complexity will be O(log N log R). 6.5 Distinct Values. It is easy to adapt the technique of Flajolet and Martin [6] to estimate the number of distinct elements in the last N data elements. Their probabilistic counting technique maintains a bitmap of size O(log R) where R is an upper bound on the number of distinct values in the data set. In the case of sliding windows, R ≤ N and a bitmap of size O(log N) ....
P. Flajolet, G. Martin. Probabilistic Counting. In Proc. 24th Symposium on Foundations of Computer Science, 1983.
....and sorting as a function of the number of passes over the data. The model was formalized by Henzinger, Raghavan, and Rajagopalan [9] who gave several algorithms and complexity results related to graph theoretic problems and their applications. Other recent results on data streams can be found in [6, 18, 19, 5, 10]. The work of Feigenbaum et al. [5, 10] constructs sketches of data streams, under the assumption that the input is ordered by the adversary. Here we intend to succinctly capture the input data in histograms, thus the attribute values are assumed to be indexed in time. This is mostly motivated from ....
P. Flajolet and G. N. Martin. Probabilistic counting. Proceedings of 24th Annual IEEE Symposium on Foundations of Computer Science, pages 76-82, 1983.
....sorting as a function of the number of passes over the data. The model was formalized by Henzinger, Raghavan, and Rajagopalan [7] who gave several algorithms and complexity results related to graph theoretic problems and their applications. Other recent results on data streams can be found in [4, 13, 14, 6]. Related Work on Clustering In this paper we shall consider models in which clusters have a distinguished point, or center. In the k-Median problem, the objective is to minimize the average distance from data points to their closest cluster centers. The 1-median problem was first posed by ....
P. Flajolet and G. N. Martin. Probabilistic Counting In Proceedings of 24th Annual IEEE Symposium on Foundations of Computer Science, pages 76-82, 1983.
....any given data set. Note that the synopsis data structure is maintained while observing the entire data set. In practice, this can be realized while the data set is loaded into the disks, and the synopsis data structure is maintained in main memory with very small overhead. Flajolet and Martin [FM83, FM85] described a randomized algorithm for estimating F 0 using only O(log u) memory bits, and analyzed its performance assuming one may use in the algorithm an explicit family of hash functions which exhibits some ideal random properties. The (log n) synopsis data structure consists of a bit ....
....the expected number of items selecting V[i] is F_0/2^i, and therefore 2^(i_0), where i_0 is the largest i such that V[i] = 1, is a good estimate for F_0. Alon et al. [AMS96] adapted the algorithm so that linear hash functions could be used instead, obtaining the following. Theorem 3.2 [FM83, AMS96] For every c > 2 there exists an algorithm that, given a sequence A of n members of U = {1, 2, ..., u}, computes a number Y using O(log u) memory bits, such that the probability that the ratio between Y and F_0 is not between 1/c and c is at most 2/c. Proof. Let d be the smallest ....
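A hedged Python sketch of the estimator described in the excerpt above: each item sets one bit V[i], and 2^(i_0) for the largest set position i_0 serves as the estimate of F_0. The linear hash here is taken modulo a Mersenne prime, which is an assumption chosen for simplicity and may differ from the construction in [AMS96]; the constants `a` and `b` and the function names are hypothetical.

```python
P = (1 << 61) - 1  # Mersenne prime 2^61 - 1, assumed modulus for the linear hash

def trailing_zeros(v, width=61):
    """Number of trailing zero bits of v; width if v is zero."""
    return width if v == 0 else (v & -v).bit_length() - 1

def f0_estimate(stream, a=1234567891011, b=987654321):
    """Return 2^(i_0), where i_0 is the largest i with V[i] = 1 (0 if empty)."""
    v = 0  # the bit vector V, packed into one integer
    for x in stream:
        h = (a * x + b) % P          # linear hash of an integer item
        v |= 1 << trailing_zeros(h)  # position i is selected with prob. about 2^-(i+1)
    if v == 0:
        return 0
    return 1 << (v.bit_length() - 1)

items = list(range(1, 500))
print(f0_estimate(items), f0_estimate(items * 3))  # the two values agree
```

A single run gives only a constant-factor guarantee with the failure probability stated in the theorem; the sketch makes no attempt to amplify it.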
P. Flajolet and G. N. Martin. Probabilistic counting. In Proc. 24th IEEE Symp. on Foundations of Computer Science, pages 76--82, November 1983.
....for maintaining random sample views. Matias et al. [MVN93, MVY94, MSY96] proposed and studied approximate data structures that provide fast approximate answers. These data structures have linear space footprints. Other works on incremental maintenance of approximate synopses include [FM83, FM85, WVZT90, HNSS95, AMS96, GMP97b, GP97]. Finally, there has been considerable work on sampling based estimation algorithms for use within a query optimizer (e.g., [HÖT88, HÖT89, LN89, LN90, LNS90, HÖD91, HS92, LS92, LNSS93, HNSS93, HNS94, LN95, HNSS95, GGMS96]). None of this previous ....
P. Flajolet and G. N. Martin. Probabilistic counting. In Proc. 24th IEEE Symp. on Foundations of Computer Science, pages 76--82, October 1983.
....but note that, in general, one pays a high cost in processing time, even just to read the input; the input has been expanded exponentially. The algorithm of Corollary 18 avoids this cost, because it is efficient in both input representations. 4.4 Earlier work on probabilistic counting In [FM83], the authors give a small space randomized algorithm that approximates the number of distinct elements in a stream. Their algorithm assumed the existence of certain ideal hash functions. Later [AMS96] improved this result by substituting a practically available family of hash functions. [AMS96] ....
P. Flajolet and G. N. Martin. Probabilistic Counting. In Proc. 24'th Foundations of Computer Science Conference, IEEE Computer Society, Los Alamitos, pages 76--82, 1983.
....approximation and randomization are necessary for approximating F k to within a relative constant if only o(min(n; m) memory bits are used. We also presented an Omega Gamma 2 m) lower bound on the memory bits required to approximate F 0 to within a relative constant, matching the upper bound of [FM83, FM85] Moreover, we presented an Omega Gamma 1 lg n) lower bound on the memory bits required to approximate F 1 to within a relative constant, matching the upper bound of [Mor78] In [AGMS96] we extended the algorithm for maintaining F k in the presence of inserts to the data set to handle ....
....22 in [MD88, PI97]. A number of probabilistic techniques have been previously proposed for various counting problems. Morris [Mor78] (see also [Fla85, HK95]) showed how to approximate the sum of a set of n values in [1: m] using only O(lg lg m lg lg n) bits of memory. Flajolet and Martin [FM83, FM85] designed an algorithm for approximating the number of distinct values in a relation in a single pass through the data and using only O(lg n) bits of memory. Other algorithms for approximating the number of distinct values in a relation include [WVZT90, HNSS95]. Probabilistic techniques for ....
P. Flajolet and G. N. Martin. Probabilistic counting. In Proc. 24th IEEE Symp. on Foundations of Computer Science, pages 76--82, October 1983.
....set, the more elements there probably were. Using only O(log r) bits in the bit vectors, there will clearly be many collisions. If the assignment of elements to bits is equiprobable, the bit vector will quickly saturate and estimates for large cardinalities will be terrible. Flajolet and Martin [10] show that by assigning elements to bits with exponentially decreasing probability, the bit vector avoids saturation. Estimates with errors that are within a given percentage of the actual are possible. By averaging the estimates from multiple random bit vector assignments, this percentage error ....
....with errors that are within a given percentage of the actual are possible. By averaging the estimates from multiple random bit vector assignments, this percentage error can be brought arbitrarily low. With V = 64 bit vectors, the error is less than 5% with high probability, independent of r [10]. Each bit vector should be of length l = ceiling(log r). To assign each object a bit, first choose a uniformly distributed random number, x, between 0 and 2^l - 1. Then compute the bit position of the first 1 in the binary representation of x. This gives a random number exponentially ....
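The bit-assignment step in the excerpt above can be checked by exhaustive enumeration. The sketch below assumes that "first 1" means the least significant set bit, which the excerpt does not specify; the function names are hypothetical.

```python
from collections import Counter

def first_one_position(x):
    """0-based index of the least significant 1 bit of x, or None if x == 0."""
    if x == 0:
        return None
    return (x & -x).bit_length() - 1

def position_counts(l):
    """How many x in [0, 2^l - 1] map to each bit position."""
    return Counter(first_one_position(x) for x in range(2 ** l))

counts = position_counts(8)
print([counts[i] for i in range(8)])  # each position gets half as many values as the previous one
```

Drawing x uniformly therefore selects position i with probability 2^-(i+1), which is the exponentially decreasing assignment the excerpt credits with avoiding saturation.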
[Article contains additional citation context not shown here]
P. Flajolet and G. N. Martin, "Probabilistic Counting," presented at Foundations of Computer Science, 1983, p. 76-82.
....to the problem vary depending on whether there is an index available on some or all of the projection attributes: 1. Do the projection, then count. 2. Scan the relation, doing probabilistic counting (see Whang, Vander Zanden and Taylor (1990), Flajolet and Martin (1985), Flajolet (1985), Flajolet and Martin (1983), and Astrahan, Schkolnick and Whang (1987)). 3. Sample the base relation; for each element of the sample find the number of the records in the base relation which match the sampled record on the projection attributes, call this |x_i|. Estimate the size of the projection as the estimated average ....
Flajolet, P. and Martin, G. (1983). Probabilistic counting, 24th Annual Symposium on Foundations of Computer Science, pp. 76--82.
....in general, one pays a high cost in processing time, even just to read the input; the input has been expanded exponentially. The algorithm of Corollary 16 avoids this cost, because it is efficient in both input representations. 4.4 Earlier work on probabilistic counting Flajolet and Martin [9] give a small space randomized algorithm that approximates the number of distinct elements in a stream. Their algorithm assumed the existence of certain ideal hash functions. Later, Alon et al. [1] improved this result by substituting a practically available family of hash functions and also gave ....
P. Flajolet and G. N. Martin. Probabilistic Counting. In Proc. 24'th Foundations of Computer Science Conference, IEEE Computer Society, Los Alamitos, pages 76--82, 1983.
....but note that, in general, one pays a high cost in processing time, even just to read the input; the input has been expanded exponentially. The algorithm of Corollary 16 avoids this cost, because it is efficient in both input representations. 4.4 Earlier work on probabilistic counting In [FM83], the authors give a small space randomized algorithm that approximates the number of distinct elements in a stream. Their algorithm assumed the existence of certain ideal hash functions. Later [AMS96] improved this result by substituting a practically available family of hash functions. [AMS96] ....
P. Flajolet and G. N. Martin. Probabilistic Counting. In Proc. 24'th Foundations of Computer Science Conference, IEEE Computer Society, Los Alamitos, pages 76--82, 1983.
....space footprints. A number of probabilistic techniques have been previously proposed for various counting problems. Morris [Mor78] (see also [Fla85, HK95]) showed how to approximate the sum of a set of n values in [1: m] using only O(lg lg m lg lg n) bits of memory. Flajolet and Martin [FM83, FM85] designed an algorithm for approximating the number of distinct values in a relation in a single pass through the data and using only O(lg n) bits of memory. Other algorithms for approximating the number of distinct values in a relation include [WVZT90, HNSS95]. Alon, Matias and Szegedy ....
P. Flajolet and G. N. Martin. Probabilistic counting. In Proc. 24th IEEE Symp. on Foundations of Computer Science, pages 76--82, October 1983.
....case, that is, the space complexity as a function of n, m, the relative error and the error probability ε. Morris [17] (see also [7], [12]) showed how to approximate F_1 (that is, how to design an approximate counter) using only O(lg lg m) (= O(lg lg n)) bits of memory. Flajolet and Martin [8] designed an algorithm for approximating F_0 using O(lg n) bits of memory. Their analysis, however, is based on the assumption that explicit families of hash functions with very strong random properties are available. Whang et al. [19] considered the problem of approximating F_0 in the context of ....
....using only O(lg n) memory bits. In addition we observe that a version of the Flajolet-Martin algorithm for approximating F_0 can be implemented and analyzed using very simple linear hash functions, and that (not surprisingly) the O(lg lg n) and the O(lg n) bounds in the algorithms of [17] and [8] for estimating F_1 and F_0 respectively are tight. We also make some comments concerning the space complexity of deterministic algorithms that approximate the frequency moments F_k as well as on the space complexity of randomized or deterministic algorithms that compute those precisely. The rest of ....
[Article contains additional citation context not shown here]
P. Flajolet and G. N. Martin, Probabilistic counting, FOCS 1983, 76-82.
....that is, the space complexity as a function of n, m, the relative error and the error probability ε. Morris [15] (see also [6], [11]) showed how to approximate F_1 (that is, how to design an approximate counter) using only O(log log m) (= O(log log n)) bits of memory. Flajolet and Martin [7] designed an algorithm for approximating F_0 using O(log n) bits of memory. Their analysis, however, is based on the assumption that explicit families of hash functions with very strong random properties are available. Whang et al. [17] considered the problem of approximating F_0 in the context ....
....using only O(log n) memory bits. In addition we observe that a version of the Flajolet-Martin algorithm for approximating F_0 can be implemented and analyzed using very simple linear hash functions, and that (not surprisingly) the O(log log n) and the O(log n) bounds in the algorithms of [15] and [7] for estimating F_1 and F_0 respectively are tight. We also make some comments concerning the space complexity of deterministic algorithms that approximate the frequency moments F_k as well as on the space complexity of randomized or deterministic algorithms that compute those precisely. The rest ....
[Article contains additional citation context not shown here]
P. Flajolet and G. N. Martin, Probabilistic counting, FOCS 1983, 76-82.
....that the value of the estimator is k was computed via the inclusion-exclusion principle. After some more approximations, the very interesting Dirichlet series N(s) = Σ_{k≥1} (−1)^ν(k) / k^s arises with the familiar sum of digits function ν(k). One of the major challenges of the papers [40, 51] is to find an analytic continuation of this series to the complex plane. This is somehow related to the infinite product Π_{k≥0} (1 − x^(2^k)) = Σ_{n≥0} (−1)^ν(n) x^n. A paper on adaptive sampling [85] might also be seen in this context. 13. ....
P. Flajolet and G. N. Martin. Probabilistic counting. In Proceedings of the 24th Annual Symposium on Foundations of Computer Science, pages 76--82, 1983.
....has been expanded exponentially. The algorithm of Corollary 18 avoids this cost, because it is efficient in both input representations. Thus, the L 2 difference is in PASST( log(M) log(n) log(1 #) # 2 , field(log(n) log(M) log(1 #) # 2 ) 4.4 Earlier work on probabilistic counting In [FM83], the authors give a small space randomized algorithm that approximates the number of distinct elements in a stream. Their algorithm assumed the existence of certain ideal hash functions. Later, [AMS99] improved this result by substituting a practically available family of hash functions. [AMS99] ....
P. Flajolet and G. N. Martin. Probabilistic Counting. In Proc. 24'th Foundations of Computer Science Conference, IEEE Computer Society, Los Alamitos, pages 76--82, 1983.
No context found.
P. Flajolet, G. N. Martin. Probabilistic Counting. In Proc. Symp. on Foundations of Computer Science, 1983.
No context found.
P. Flajolet and G. N. Martin. Probabilistic counting. In Proc. of FOCS, 1983.
No context found.
P. Flajolet and G.N. Martin. Probabilistic counting. In Proc. 24th IEEE Symposium on Foundations of Computer Science, pages 76-82, 1983.
No context found.
P. Flajolet and G. N. Martin. Probabilistic counting. In Proc. FOCS'83, pages 76--82, 1983.
No context found.
P. Flajolet and G. N. Martin. Probabilistic counting. In Proc. 24th Annual Symp. on Foundations of Computer Science (FOCS), 1983.