Compressed counting
| Venue: | CoRR |
| Citations: | 9 - 4 self |
BibTeX
@ARTICLE{Li_compressedcounting,
author = {Ping Li},
title = {Compressed counting},
journal = {CoRR},
year = {2008}
}
Abstract
We propose Compressed Counting (CC) for approximating the αth frequency moments (0 < α ≤ 2) of data streams under a relaxed strict-Turnstile model, using maximally-skewed stable random projections. Estimators based on the geometric mean and the harmonic mean are developed. When α = 1, a simple counter suffices for counting the first moment (i.e., sum). The geometric mean estimator of CC has asymptotic variance ∝ Δ = |α − 1|, capturing the intuition that the complexity should decrease as Δ = |α − 1| → 0. However, the previous classical algorithms based on symmetric stable random projections [12, 15] required O(1/ε²) space, in order to approximate the αth moments within a 1 + ε factor, for any 0 < α ≤ 2 including α = 1. We show that using the geometric mean estimator, CC requires O(1/log(1+ε) + 2√Δ/log^{3/2}(1+ε) + o(√Δ)) space, as Δ → 0. Therefore, in the neighborhood of α = 1, the complexity of CC is essentially O(1/ε) instead of O(1/ε²). CC may be useful for estimating Shannon entropy, which can be approximated by certain functions of the αth moments with α → 1. [10, 9] suggested using α = 1 + Δ with (e.g.,) Δ < 0.0001 and ε < 10⁻⁷, to rigorously ensure reasonable approximations. Thus, unfortunately, CC is "theoretically impractical" for estimating Shannon entropy, despite its empirical success reported in [16].
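To make the "essentially O(1/ε) instead of O(1/ε²)" claim concrete, the following minimal Python sketch evaluates the leading terms inside the two bounds quoted in the abstract. It is illustrative only: the function names are hypothetical, the constants hidden by the O(·) notation are ignored, and the o(√Δ) remainder is dropped, so the numbers indicate growth rates rather than actual memory requirements.

```python
import math


def cc_space_terms(eps: float, delta: float) -> float:
    """Leading terms of the CC geometric-mean bound from the abstract:
    1/log(1+eps) + 2*sqrt(delta)/log(1+eps)**1.5.
    Assumption: hidden constants and the o(sqrt(delta)) term are ignored."""
    log_term = math.log1p(eps)  # log(1 + eps), accurate for small eps
    return 1.0 / log_term + 2.0 * math.sqrt(delta) / log_term ** 1.5


def symmetric_space_terms(eps: float) -> float:
    """Leading term of the symmetric stable projection bound: 1/eps**2.
    Assumption: hidden constants are ignored."""
    return 1.0 / eps ** 2


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        for delta in (0.0, 1e-4):
            cc = cc_space_terms(eps, delta)
            sym = symmetric_space_terms(eps)
            print(f"eps={eps}, delta={delta}: CC terms={cc:.2f}, symmetric terms={sym:.2f}")
```

Since log(1+ε) = ε − ε²/2 + ..., the first term 1/log(1+ε) behaves like 1/ε + 1/2 for small ε, which is where the O(1/ε) rate near α = 1 comes from; the second term vanishes as Δ → 0.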







