Results 1 - 4 of 4
Bounded Independence Fools Halfspaces
 In Proc. 50th Annual Symposium on Foundations of Computer Science (FOCS), 2009
Abstract

Cited by 46 (18 self)
We show that any distribution on $\{-1,+1\}^n$ that is $k$-wise independent fools any halfspace (a.k.a. linear threshold function) $h: \{-1,+1\}^n \to \{-1,+1\}$, i.e., any function of the form $h(x) = \mathrm{sign}\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$ where the $w_1, \ldots, w_n, \theta$ are arbitrary real numbers, with error $\epsilon$ for $k = O(\epsilon^{-2} \log^2(1/\epsilon))$. Our result is tight up to $\log(1/\epsilon)$ factors. Using standard constructions of $k$-wise independent distributions, we obtain the first explicit pseudorandom generators $G: \{-1,+1\}^s \to \{-1,+1\}^n$ that fool halfspaces. Specifically, we fool halfspaces with error $\epsilon$ and seed length $s = k \cdot \log n = O(\log n \cdot \epsilon^{-2} \log^2(1/\epsilon))$. Our approach combines classical tools from real approximation theory with structural results on halfspaces by Servedio (Comput. Complexity 2007).
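To make the terms in this abstract concrete, here is a minimal Python sketch. It is not taken from the paper. It builds the textbook pairwise independent ($k = 2$) distribution from parities of seed bits, checks $k$-wise independence by brute force, and measures how well a halfspace is fooled. All function names are hypothetical. The sketch assumes the convention $\mathrm{sign}(0) = +1$ and takes "fooling with error $\epsilon$" to mean $|E_D[h] - E_U[h]| \le \epsilon$, where $U$ is the uniform distribution.

```python
from itertools import combinations, product


def sign(v):
    # Assumed convention: sign(0) = +1.
    return 1 if v >= 0 else -1


def halfspace(w, theta):
    """Return h(x) = sign(sum_i w_i * x_i - theta) as a Python function."""
    return lambda x: sign(sum(wi * xi for wi, xi in zip(w, x)) - theta)


def xor_subset_distribution(t):
    """Support of a pairwise independent distribution on {-1,+1}^n, n = 2^t - 1.

    Each t-bit seed yields one point; the coordinate indexed by a nonzero
    mask is the parity of the seed bits selected by that mask.
    """
    points = []
    for seed in range(2 ** t):
        point = []
        for mask in range(1, 2 ** t):
            parity = bin(seed & mask).count("1") % 2
            point.append(1 - 2 * parity)  # parity 0 -> +1, parity 1 -> -1
        points.append(tuple(point))
    return points


def is_k_wise_independent(points, k):
    """Brute-force check that every k coordinates are jointly uniform."""
    n = len(points[0])
    for coords in combinations(range(n), k):
        counts = {}
        for p in points:
            key = tuple(p[i] for i in coords)
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != 2 ** k or len(set(counts.values())) != 1:
            return False
    return True


def fooling_error(points, h, n):
    """|E_D[h] - E_U[h]|, with D uniform over `points` and U uniform on the cube."""
    pseudo = sum(h(p) for p in points) / len(points)
    truth = sum(h(x) for x in product((-1, 1), repeat=n)) / 2 ** n
    return abs(pseudo - truth)
```

Calling `fooling_error(xor_subset_distribution(3), halfspace([1] * 7, 0), 7)` compares the majority function on 7 bits under the 3-bit-seed distribution against the uniform distribution. The brute-force enumeration is exponential in $n$, so this is only usable for toy sizes.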
Fooling Functions of Halfspaces under Product Distributions
2010
Abstract

Cited by 16 (3 self)
... under a very broad class of product distributions. This class includes not only familiar cases such as the uniform distribution on the discrete cube, the uniform distribution on the solid cube, and the multivariate Gaussian distribution, but also includes any product of discrete distributions with probabilities bounded away from 0. Our first main result shows that a recent pseudorandom generator construction of Meka and Zuckerman [MZ09], when suitably modified, can fool arbitrary functions of $d$ halfspaces under product distributions where each coordinate has bounded fourth moment. To $\epsilon$-fool any size-$s$, depth-$d$ decision tree of halfspaces, our pseudorandom generator uses seed length $O((d \log(ds/\epsilon) + \log n) \cdot \log(ds/\epsilon))$. For monotone functions of $d$ halfspaces, the seed length can be improved to $O((d \log(d/\epsilon) + \log n) \cdot \log(d/\epsilon))$. We get better bounds for larger $\epsilon$; for example, to $1/\mathrm{polylog}(n)$-fool all monotone functions of $(\log n)/\log\log n$ halfspaces, our generator requires a seed of length just $O(\log n)$. Our second main result generalizes the work of Diakonikolas et al. [DGJ+09] to show that bounded independence suffices to fool functions of halfspaces under product distributions. Assuming each coordinate satisfies a certain stronger moment condition, we show that any function computable by a size-$s$, depth-$d$ decision tree of halfspaces is $\epsilon$-fooled by $\tilde{O}(d^4 s^2/\epsilon^2)$-wise independence. Our technical contributions include: a new multidimensional version of the classical Berry-Esseen theorem; a derandomization thereof; a generalization of Servedio's [Ser07] regularity lemma for halfspaces which works under any product distribution with bounded fourth moments; an extension of this regularity lemma to functions of many halfspaces; and a new analysis of the sandwiching polynomials technique of Bazzi [Baz09] for arbitrary product distributions.
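The "decision tree of halfspaces" in this abstract can be pictured with a short Python sketch. It is an illustration only, not the paper's formalism: the class and function names are hypothetical, the convention $\mathrm{sign}(0) = +1$ is assumed, and the sketch takes no position on how the paper defines the size $s$.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: int  # output label in {-1, +1}


@dataclass
class Node:
    weights: Sequence[float]  # halfspace weights w_1..w_n queried at this node
    theta: float              # halfspace threshold
    if_negative: "Tree"       # subtree followed when the halfspace outputs -1
    if_positive: "Tree"       # subtree followed when the halfspace outputs +1


Tree = Union[Leaf, Node]


def halfspace(weights, theta, x):
    # Assumed convention: sign(0) = +1.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) - theta >= 0 else -1


def evaluate(tree, x):
    """Walk from the root, branching on one halfspace per internal node."""
    while isinstance(tree, Node):
        outcome = halfspace(tree.weights, tree.theta, x)
        tree = tree.if_positive if outcome == 1 else tree.if_negative
    return tree.value


def depth(tree):
    """Largest number of halfspace queries on any root-to-leaf path."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.if_negative), depth(tree.if_positive))


# Example: the intersection (AND) of two halfspaces as a depth-2 tree.
and_tree = Node(
    weights=(1, 0), theta=0.5,
    if_negative=Leaf(-1),
    if_positive=Node(weights=(0, 1), theta=0.5,
                     if_negative=Leaf(-1), if_positive=Leaf(1)),
)
```

Here `evaluate(and_tree, (1, 1))` follows the positive branch twice, while any input with a `-1` coordinate reaches a `Leaf(-1)`. A function of $d$ halfspaces can always be written this way by querying each halfspace in turn along every path.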
Realtime Approximate Range Motif Discovery & Data Redundancy Removal Algorithm
Abstract
Removing redundancy in the data is an important problem, as it improves resource and compute efficiency for downstream processing of massive (10 million to 100 million records) datasets. In application domains such as IR, stock markets, telecom and others, there is a strong need for realtime data redundancy removal of enormous amounts of data flowing at the rate of 1Gb/s or higher. We consider the problem of finding Range Motifs (clusters) over records in a large dataset such that records within the same cluster are approximately close to each other. This problem is closely related to approximate nearest neighbour search but is more computationally expensive. Realtime scalable approximate Range Motif discovery on massive datasets is a challenging problem. We present the design of novel sequential and parallel approximate Range Motif discovery and data deduplication algorithms using Bloom filters. We establish asymptotic upper bounds on the false positive and false negative rates for our algorithm. Further, we present a time complexity analysis of our parallel algorithm on multicore architectures. For 10 million records, our parallel algorithm can perform approximate Range Motif discovery and data deduplication, on 4 sets (clusters), in 59s, on a 16-core Intel Xeon 5570 architecture. This gives a throughput of around 170K records/s and around 700Mb/s (using records of size 4K bits). To the best of our knowledge, this is the highest realtime throughput for approximate Range Motif discovery and data redundancy removal on such massive datasets.
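As background for the Bloom filter mentioned in this abstract, the following Python sketch shows a basic Bloom filter used for exact-duplicate removal in a stream. It is not the paper's algorithm: it does not handle approximate closeness (Range Motifs) or parallelism, and the class name, default sizes, and the SHA-256 double-hashing scheme are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Bit array plus several hash positions per item.

    Membership tests can return false positives but never false negatives.
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive all positions from two 64-bit halves
        # of a single SHA-256 digest (an illustrative choice).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def deduplicate(records, num_bits=1 << 20, num_hashes=4):
    """Keep the first occurrence of each record, in arrival order.

    A false positive in the filter would wrongly drop a new record, which
    is the kind of error rate the abstract says the authors bound.
    """
    seen = BloomFilter(num_bits, num_hashes)
    unique = []
    for record in records:
        if record in seen:
            continue  # treated as a duplicate (possibly a false positive)
        seen.add(record)
        unique.append(record)
    return unique
```

For example, `deduplicate([b"a", b"b", b"a"])` drops the repeated `b"a"`. The filter uses a fixed amount of memory regardless of how many records pass through, which is why the structure suits high-rate streams; the price is a false positive rate that grows as the bit array fills.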
Abstract
2010
Abstract
under a very broad class of product distributions. This class includes not only familiar cases such as the uniform distribution on the discrete cube, the uniform distribution on the solid cube, and the multivariate Gaussian distribution, but also includes any product of discrete distributions with probabilities bounded away from 0. Our first main result shows that a recent pseudorandom generator construction of Meka and Zuckerman [MZ09], when suitably modified, can fool arbitrary functions of $d$ halfspaces under product distributions where each coordinate has bounded fourth moment. To $\epsilon$-fool any size-$s$, depth-$d$ decision tree of halfspaces, our pseudorandom generator uses seed length $O((d \log(ds/\epsilon) + \log n) \cdot \log(ds/\epsilon))$. For monotone functions of $d$ halfspaces, the seed length can be improved to $O((d \log(d/\epsilon) + \log n) \cdot \log(d/\epsilon))$. We get better bounds for larger $\epsilon$; for example, to $1/\mathrm{polylog}(n)$-fool all monotone functions of $(\log n)/\log\log n$ halfspaces, our generator requires a seed of length just $O(\log n)$. Our second main result generalizes the work of Diakonikolas et al. [DGJ+09] to show that bounded independence suffices to fool functions of halfspaces under product distributions. Assuming each coordinate satisfies a certain stronger moment condition, we show that any function computable by a size-$s$, depth-$d$ decision tree of halfspaces is $\epsilon$-fooled by $\tilde{O}(d^4 s^2/\epsilon^2)$-wise independence. Our technical contributions include: a new multidimensional version of the classical Berry-Esseen theorem; a derandomization thereof; a generalization of Servedio's [Ser07] regularity lemma for halfspaces which works under any product distribution with bounded fourth moments; an extension of this regularity lemma to functions of many halfspaces; and a new analysis of the sandwiching polynomials technique of Bazzi [Baz09] for arbitrary product distributions.