FLAJOLET, P., AND MARTIN, G. N. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31, 2 (1985), 182--209.


This paper is cited in the following contexts:


Counting Distinct Elements in a Data Stream - Bar-Yossef, Jayram, Kumar.. (2002)   (1 citation)

....properties (say, the number of distinct destination addresses) of the traffic flow. The number of distinct elements is also a natural quantity of interest in several large data set applications (e.g. the number of distinct queries made to a search engine over a week). Flajolet and Martin [FM85] designed the first algorithm for approximating F0 in the data stream (or what was then thought of as a one-pass) model. Unfortunately, their algorithm assumed the existence of hash functions with some ideal properties; it is not known how to construct such functions with limited space. Alon, ....

....Bar-Yossef et al. [BKS02] gave an algorithm that used O(1 # m) space (and time per element) but that had some other nice property required for their application. Cohen [Coh97] considered this problem in the context of graph theoretic applications; her algorithm is similar in spirit to that of [FM85, AMS99]; specifically, it has a high level viewpoint similar to the first algorithm in this paper. However, the implementation is very different, and does not yield an o(m) space algorithm. One of the drawbacks of the algorithms of [GT01, BKS02] is that the space and time are the product of poly(1 #) and ....

[Article contains additional citation context not shown here]

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31:182--209, 1985.


Combinatorics of Geometrically Distributed Random.. - Knopfmacher, Prodinger

....according to the geometric distribution with $P\{X = k\} = pq^{k-1}$, with $p + q = 1$. We find it useful also to use the abbreviation $Q = q^{-1}$. The motivation for this work comes from Computer Science. In this context records arise in the study of skip lists [11, 9] and probabilistic counting [2, 8]. Also, since equal letters are now allowed, there are two versions that should be considered in parallel, the standard version, and the weak version, where $>$ is replaced by $\ge$, which means that a new maximum only has to be larger or equal to the previous ones. The paper [12] contains asymptotic ....

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31:182--209, 1985.


Comparing Data Streams Using Hamming Norms (How to Zero.. - Cormode, Datar, Indyk.. (2002)

....of the stream, and update this every time an item is added or removed. We focus on these synopsis methods, since they can work in our data streams model, whereas sampling is not suited to dynamic modification of the data. The most widely applicable synopsis method is that of Flajolet and Martin [17, 18], which we describe in outline to enable comparison with our algorithm. The algorithm is shown in Figure 1. The crucial part is the set of m hash functions hash_j, which map item values onto the range [1 ... log n]. hash_j is designed so that $\Pr[\mathrm{hash}_j(i) = \ell] = 2^{-\ell}$. ....

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for database applications. Journal of Computer and System Sciences, 31:182--209, 1985.
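
The hash property described in the excerpt above (each successive level is half as likely as the one before) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function name `level`, the use of SHA-256 as a stand-in for an ideal hash, and the `max_level` cap are hypothetical and are not taken from the cited paper.

```python
import hashlib


def level(item, j=0, max_level=32):
    """Map an item to a level in [1, max_level] with Pr[level = l] close to 2^-l.

    SHA-256 salted with j stands in for the j-th ideal hash function
    (an assumption made for illustration only).
    """
    digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
    x = int.from_bytes(digest[:8], "big")  # 64 pseudo-random bits
    l = 1
    # Each trailing zero bit moves the item one level up; a trailing zero
    # occurs with probability 1/2, so the level is geometrically distributed.
    while l < max_level and x & 1 == 0:
        x >>= 1
        l += 1
    return l
```

Over a large set of distinct items, roughly half should land on level 1, a quarter on level 2, and so on, which is what makes the highest occupied level a usable signal for the number of distinct items.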


The Connectivity and Fault-Tolerance of the Internet Topology - Palmer, Siganos, Michalis (2001)   (2 citations)

....This algorithm will be horribly inefficient to use in practice because the set operations are expensive. Instead, we use a tool called approximate counting. An approximate counting algorithm takes as input a multiset and then estimates the number of distinct elements in the multiset. In [5], each possible element (for us, that is each node) is assigned a random bit using an exponential distribution (half the nodes get bit 0, a quarter get bit 1, etc.). To estimate the number of elements in a multiset, you simply OR together the bits that we assigned to each element. The estimate is ....

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31:182--209, 1985.


Random Sampling from Databases - Olken (1993)   (37 citations)

....[NS90] describe the estimation of the size of projections via sampling. Approaches to the problem vary depending on whether there is an index available on some or all of the projection attributes: 1. Do the projection, then count. 2. Scan the relation, doing probabilistic counting [WVZT90, FM85, Fla85, FM83, ASW87]. 3. Sample the base relation; for each element of the sample find the number of the records in the base relation which match the sampled record on the projection attributes, call this $|x_i|$. Estimate the size of the projection as the estimated average contribution of each ....

P. Flajolet and G.N. Martin. Probabilistic counting algorithms for database applications. Journal of Computer and System Sciences, 31(2):182--209, Oct. 1985.


A Cluster Architecture for Parallel Data Warehousing - Dehne, Eavis, Rau-Chaplin (2001)

....distributed datasets, they are not as reliable on real world data warehouses. Consequently, the use of probabilistic estimators that rely upon a single pass of the dataset has been suggested. As described in [17] our implementation builds upon the counting algorithm of Flajolet and Martin [6]. Essentially, we concatenate the dimension fields into bitvectors of length and then hash the vectors into the range . The algorithm then uses a probabilistic technique to count the number of distinct records (or hash values) that are likely to exist in the input set. To ....

P. Flajolet and G. Martin. Probabilistic counting algorithms for database applications. Journal of Computer and System Sciences, 31(2):182--209, 1985.


Reductions in Streaming Algorithms, with an.. - Bar-Yossef, Kumar..

....we make considerable new progress. Computing the number of distinct elements is a fundamental problem in its own right, besides its role as a primitive in streaming algorithm design. Surprisingly, however, this problem has not been fully solved. Alon et al. [AMS99], based on the hashing idea of [Sip83, Sto83, FM85], show how to produce an approximator $\tilde{F}_0$ in the streaming model, so that $F_0/c \le \tilde{F}_0 \le cF_0$, for any constant $c > 2$. We first show (Section 4) how to build on the algorithm of [AMS99] to obtain similar approximations for the more interesting case where $c = 1 + \epsilon$ for any $\epsilon > 0$. Our ....

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences (JCSS), 31(2):182-209, 1985.


Fast Approximation of the "Neighbourhood" Function for.. - Palmer, Gibbons

....function. 3.1 Approximate Counting. Approximate counting algorithms are used to approximate the number of distinct items in a multiset. Two different methods have been proposed in the literature. The first, which we call the BitMask (BM) approach, was proposed by Flajolet and Martin [7]. The second, which we call the Random Interval (RI) approach, was proposed by Cohen [4]. Both algorithms solve the same problem: given a set X of N elements and a multiset M of elements drawn from X, estimate the number of distinct elements in M. The RI method works as follows. Each of the n ....

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31:182--209, 1985.


Binomial Multifractal Curve Fitting for View Size.. - Nadeau, RUNAPONGSA..

....$= v - v(1 - 1/v)^n$. (1) Cardenas' formula assumes uniform distribution. However, the data distribution affects the number of rows in an aggregate. In order to capture the effect of data distribution, other methods have been developed. Probabilistic counting was introduced as a new approach in [2]. A hashing function is applied to the values, and meta data is gathered on the output. Probabilistic analysis is applied to the meta data, determining an estimate of the number of distinct .... [Figure residue: schema diagram with labels Sell, Fact Table, Calendar and columns CustID, DateID, BindID, Cost, Month, Quarter, Year, Name, City] ....

....fitting approach. The scalability in the number of rows in the fact table is currently being tested empirically using databases with up to 5 million rows. Scalability in the number of dimensions is a problem for further research. We are working on implementing the probabilistic counting approach [2]. We did not include this algorithm with the original testing since probabilistic counting requires a full scan of the fact table, and we were focusing on sampling approaches. However, probabilistic counting deserves testing along with these other approaches. Probabilistic counting is extremely ....

P. Flajolet, G. N. Martin. Probabilistic Counting Algorithms for Database Applications. Journal of Computer and System Sciences 31, 1985, pp. 182--209.


Estimating Simple Functions on the Union of Data Streams - Gibbons, Tirthapura (2001)   (26 citations)

....is an interesting result in itself because of the importance of the F0 function in database optimization (see, e.g. [6, 15, 26]) and in internet traffic analysis, e.g. the number of distinct web pages requested, or the number of distinct visitors to a website. This problem has been studied in [4, 10] and elsewhere, but we do not know of an $(\epsilon, \delta)$ approximation scheme for F0 whose bounds match the bounds we obtain. Our logarithmic space bounds for union and for F0 using coordinated sampling contrast with $\Omega(\sqrt{n})$ and $\Omega(\sqrt{m})$ lower bounds we present for union and F0 ....

....Consider the zeroth frequency moment (F0) of a sequence of n items in [1: m] where m n. Estimating this function has been studied in the context of a single stream for both public coins [7, 10, 26] and stored coins [4]. These previous algorithms trivially extend to the distributed streams model, but the space and/or time bounds for an $(\epsilon, \delta)$ approximation scheme for F0 are worse than our algorithm's bounds. The best previous bounds are due to Cohen [7] which matches our space bound, .... [Footnote 1: Note that the time per item is critical in practice, due to the extremely high network traffic rate.]

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Computer and System Sciences, 31:182--209, 1985.


Distinctness of compositions of an integer: A Probabilistic.. - Hitczenko, Louchard (2001)

....m: More generally, let us define ( R e x [F (x) F (x 1) dx, we obtain ( e 1 = e m e 1 ( Now ( e m ( which proves the lemma. Numerous applications of this lemma can be found in algorithm analysis: let us mention approximate counting ([11], [15]), tries ([33]), adaptive sampling ([35]), digital search trees ([34]), leader election ([10]), the Lempel-Ziv algorithm ([38]), polyominos analysis ([36]), data structures maxima ([28]), etc. For instance, we derive 3 = 3 ; 4 = 4 2 =2 1=80; 5 = 5 5=6 3 : Also var(Mn ....

Flajolet, P., Martin, G. Probabilistic counting algorithms for data base applications, J. Comput. System Sci. 31 (1985), 182-209.


Estimating Simple Functions on the Union of Data Streams - Gibbons, Tirthapura (2000)   (26 citations)

....that this is an interesting result in itself because of the importance of the F0 function in database optimization (see, e.g. [HNSS95]) and in IP traffic analysis, e.g. the number of distinct web pages requested, or the number of distinct visitors to a website. This problem has been studied in [FM85, AMS96], but we do not know of an $(\epsilon, \delta)$ approximation scheme for F0 whose bounds match the bounds we obtain. Our logarithmic space bounds for union and for F0 using coordinated sampling contrast with an $\Omega(\sqrt{n})$ lower bound for union and an $\Omega(m)$ lower bound for F0 using an ....

....n) sample sizes. Thus uncoordinated sampling would require $\Omega(\sqrt{n})$ workspace. Distinct Counting. Consider the zeroth frequency moment (F0) of a sequence of n items in $\{1, \ldots, m\}$, where m n. This function has been studied in the context of a single stream for both public coins [FM85] and stored coins [AMS96]. These previous algorithms trivially extend to the distributed streams model, but we do not know how to convert them into an $(\epsilon, \delta)$ approximation scheme; and very likely, even if we obtained one, the space and/or time bounds would be worse than our algorithm's ....

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Computer and System Sciences, 31:182--209, 1985.


The Connectivity and Fault-Tolerance of the Internet.. - Palmer, Siganos.. (2001)   (2 citations)

....This algorithm will be horribly inefficient to use in practice because the set operations are expensive. Instead, we use a tool called approximate counting. An approximate counting algorithm takes as input a multiset and then estimates the number of distinct elements in the multiset. In [5], each possible element (for us, that is each node) is assigned a random bit using an exponential distribution (half the nodes get bit 0, a quarter get bit 1, etc.). To estimate the number of elements in a multiset, you simply OR together the bits that we assigned to each element. The estimate is ....

P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31:182--209, 1985.
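
The bit-assignment and OR step described in the excerpt above can be sketched roughly as follows. This is a minimal single-bitmask illustration, not the citing paper's implementation: the helper names (`bit_index`, `sketch`, `estimate`) and the use of SHA-256 as the hash are assumptions, and the excerpt is truncated before it states the estimator, so the estimate shown (2^R divided by the constant 0.77351, with R the position of the lowest unset bit) follows the Flajolet-Martin analysis rather than the excerpt.

```python
import hashlib

PHI = 0.77351  # correction constant from the Flajolet-Martin analysis


def bit_index(item, width=32):
    """Return bit k with probability about 2^-(k+1): half the items get
    bit 0, a quarter get bit 1, and so on (hash choice is an assumption)."""
    x = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
    k = 0
    while k < width - 1 and x & 1 == 0:
        x >>= 1
        k += 1
    return k


def sketch(items, width=32):
    """OR together the single bit assigned to each element of the multiset."""
    mask = 0
    for item in items:
        mask |= 1 << bit_index(item, width)
    return mask


def estimate(mask):
    """Estimate the distinct count from the position of the lowest unset bit."""
    r = 0
    while mask & (1 << r):
        r += 1
    return (2 ** r) / PHI
```

Because OR is idempotent, repeated elements leave the bitmask unchanged, and the bitmask of a union is the OR of the two bitmasks. A single bitmask gives a coarse estimate; the original paper reduces the variance by averaging over many bitmasks.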


Distinctness of compositions of an integer: A Probabilistic.. - Pawe Hitczenko And (2001)

....More generally, let us define ( R e x [F (x) F (x 1) dx, we obtain ( e 1 = e m e 1 ( 12 Now ( e m ( which proves the lemma. Numerous applications of this lemma can be found in algorithm analysis: let us mention approximate counting ([9], [11]), tries ([28]), adaptive sampling ([30]), digital search trees ([29]), leader election ([8]), the Lempel-Ziv algorithm ([33]), polyominos analysis ([31]), data structures maxima ([23]), etc. For instance, we derive 3 = 3 ; 4 = 4 2 =2 1=80; 5 = 5 5=6 3 : Also var(Mn ) ....

Flajolet, P., Martin, G. Probabilistic counting algorithms for data base applications, J. Comp. and System Sciences 31 (1985), 182-209.


Series and Infinite Products Related to Binary Expansion of.. - Allouche (1992)

....equals 1 or Gamma1 according to the parity of the number of w in the binary expansion of n. For example, Y n0 (2 n 1) 2 (n 1) 4 n 1) fi(n) p 2 2 ; if fi(n) Gamma1) u(n) is the Rudin Shapiro sequence associated to pattern w = 11. In the same vein, the Flajolet product [6] Y n0 2 n 2 n 1 ffl(n) is not known. We point out that the rational fraction 2 n= 2 n 1) is related to pattern w = 01. 3. Newman Coquet sequence This is our last example. The first few terms of the Thue Morse sequence ffl(n) are (with j 1 and Gamma j Gamma1) Gamma Gamma ....

Flajolet (P.) and Martin (G. N.). -- Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, vol. 31, no. 2, October 1985, pp. 182--209.


Synopsis Data Structures for Massive Data Sets - Gibbons, Matias (1999)   (29 citations)

....given data set. Note that the synopsis data structure is maintained while observing the entire data set. In practice, this can be realized while the data set is loaded into the disks, and the synopsis data structure is maintained in main memory with very small overhead. Flajolet and Martin [FM83, FM85] described a randomized algorithm for estimating F 0 using only O(log u) memory bits, and analyzed its performance assuming one may use in the algorithm an explicit family of hash functions which exhibits some ideal random properties. The (log n) synopsis data structure consists of a bit vector ....

P. Flajolet and G. N. Martin, Probabilistic counting algorithms for data base applications, J. Computer and System Sciences 31 (1985), 182--209.


Dynamic Maintenance of Wavelet-Based Histograms - Matias, Vitter, Wang (2000)   (22 citations)

....If j is not a type (a) node, we do nothing. When the number of entries in log L reaches Max Log Size, we process the entries in the log. For any entry i in L, we update all the corresponding type (b) nodes according to (6). For a type (c) node j, we use a probabilistic counting technique [FM85]: We flip a coin with probability p(j) of heads. If the coin flips a head, we set node j's magnitude $\hat{S}(j)$ to be v(j) (a value to be determined later) and we replace the smallest node (in magnitude) in H' with node j. When all the entries in the log L have been processed, we adjust H and H ....

Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182--209, October 1985.


Online Prediction Algorithms for Databases and Operating Systems - Krishnan (1995)   (7 citations)

....quite different. Query optimizers have cost models that estimate the access cost as a function of the predicted number of qualifying rows and find the cheaper alternative. Models already exist in current day relational database management systems (RDBMSs) to predict selectivity for numeric fields [ASW, FlM, Iye, SAC, WVT]. With the popularity of textual data being stored in RDBMS, it has become important to predict the selectivity accurately even for alphanumeric fields. A particularly problematic predicate used against alphanumeric fields is the like predicate [Iye]. For example, consider the inventory of a ....

....phase must be minimal. In Chapter 11 we present our techniques for predicting selectivity for the like predicate; i.e. techniques for estimating alphanumeric selectivity. 10.1 Background and Related Work Models already exist in current day RDBMSs to predict selectivity for numeric fields [ASW, FlM, Iye, SAC, WVT]. Typically, in the preprocessing phase, a few numbers that capture the distribution of data are accumulated and stored in the catalog. In the earlier example dealing with salaries, the RDBMS would perform an analysis of the data in the salary field of the database, and select a small set of ....

[Article contains additional citation context not shown here]

P. Flajolet and G. N. Martin, "Probabilistic Counting Algorithms for Database Applications," Journal of Computer and Systems Sciences 31 (1985), 182--209.


Loglog Counting of Large Cardinalities - Durand, Flajolet (2003)   (5 citations)  Self-citation (Flajolet)

....but it has the disadvantage of not being universal, as it makes definite statistical assumptions ("stationarity") regarding the data input to the algorithm. We recommend the thorough engineering discussion of [3]. Closer to us is the Probabilistic Counting algorithm of Flajolet and Martin [7]. This uses a certain observable that has excellent statistical properties but is relatively costly to maintain in terms of storage. Indeed, Probabilistic Counting estimates cardinalities with an error close to $0.78/\sqrt{m}$ given a table of m "words", each of size about $\log_2 N_{\max}$. Yet another ....

.... of approximate counting is provided by Alon, Matias, and Szegedy in [1]. The authors discuss a class of frequency moments statistics which includes ours (as their F0 statistics). Our LogLog Algorithm has principles that evoke some of those found in the intersection of [1] and the earlier [7], but contrary to [1] we develop here a complete, eminently practical algorithmic solution and provide a very precise analysis, including bias correction, error and risk evaluation, as well as complete dimensioning rules. We estimate that our LogLog algorithm outperforms the earlier Probabilistic ....

Flajolet, P., and Martin, G. N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences 31, 2 (1985), 182-209.
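
To make the error figure in the first excerpt concrete, the following sketch evaluates it for a given table size, reading the garbled expression as a relative standard error of about 0.78 divided by the square root of m. The function names are hypothetical helpers introduced here for illustration.

```python
import math


def pc_standard_error(m):
    """Approximate relative error of Probabilistic Counting with m words."""
    return 0.78 / math.sqrt(m)


def words_needed(target_error):
    """Smallest m for which 0.78 / sqrt(m) does not exceed target_error."""
    return math.ceil((0.78 / target_error) ** 2)
```

For example, m = 64 words gives an error of about 0.0975 (just under 10%), and a 5% target needs 244 words; halving the error costs roughly four times the storage.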


Reversible Sketches for Efficient and Accurate Change .. - Schweller, Gupta.. (2004)

No context found.

FLAJOLET, P., AND MARTIN, G. N. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31, 2 (1985), 182--209.


Mutable Strings in Java: Design, Implementation and.. - Boldi, Vigna

No context found.

P. Flajolet, G. N. Martin, Probabilistic counting algorithms for data base applications, J. Comput. System Sci. 31 (2).


Spatio-Temporal Aggregation Using Sketches - Tao, Kollios, Considine, Li.. (2004)   (2 citations)

No context found.

Flajolet, P., Martin, G. Probabilistic Counting Algorithms for Data Base Applications. JCSS, 32(2): 182-209.


Synopsis Diffusion for Robust Aggregation in Sensor Networks - Nath, Gibbons (2003)   (11 citations)

No context found.

FLAJOLET, P., AND MARTIN, G. N. Probabilistic counting algorithms for database applications. Journal of Computer and System Sciences 31 (1985), 182--209.


Cluster-based Optimizations for Distributed Hash Tables - Considine (2002)

No context found.

FLAJOLET, P., AND MARTIN, G. N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences 31, 2 (1985), 182--209.



CiteSeer.IST - Copyright Penn State and NEC