Results 1-10 of 100
Analysis of representations for domain adaptation
In NIPS, 2007
Cited by 165 (11 self)
Domain is a distribution D on an instance set X. Domain adaptation of a classifier. A classification task. Source domain (DS)
The art of uninformed decisions: A primer to property testing
Science, 2001
Cited by 157 (25 self)
Property testing is a new field in computational theory that deals with the information that can be deduced from the input when the number of allowable queries (reads from the input) is significantly smaller than its size.
Detecting Change in Data Streams
2004
Cited by 135 (3 self)
Detecting changes in a data stream is an important area of research with many applications.
Testing random variables for independence and identity
Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 2000
Cited by 78 (20 self)
Given access to independent samples of a distribution A over [n] × [m], we show how to test whether the distributions formed by projecting A to each coordinate are independent, i.e., whether A is ε-close in the L1 norm to the product distribution A1 × A2 for some distributions A1 over [n] and A2 over [m]. The sample complexity of our test is Õ(n^{2/3} m^{1/3} poly(ε^{-1})), assuming without loss of generality that m ≤ n. We also give a matching lower bound, up to poly(log n, ε^{-1}) factors. Furthermore, given access to samples of a distribution X over [n], we show how to test if X is ε-close in L1 norm to an explicitly specified distribution Y. Our test uses Õ(n^{1/2} poly(ε^{-1})) samples, which nearly matches the known tight bounds for the case when Y is uniform.
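For the uniform special case mentioned at the end of the abstract above, a standard textbook idea is to count collisions among the samples. The sketch below is not the algorithm from the paper; it only illustrates that idea. The function names, the threshold (1 + ε²/2)/n, and the choice to ignore how many samples are needed for a guarantee are all assumptions made for illustration. The threshold rests on the fact that the uniform distribution over n items has collision probability 1/n, while any distribution that is ε-far from uniform in L1 has collision probability at least (1 + ε²)/n.

```python
from collections import Counter


def collision_rate(samples):
    """Fraction of unordered sample pairs that hold the same value."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    # A value seen c times contributes c*(c-1)/2 colliding pairs.
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    return collisions / (m * (m - 1) // 2)


def looks_uniform(samples, n, eps):
    """Hypothetical helper: accept if the empirical collision rate is
    below a threshold placed between 1/n and (1 + eps^2)/n."""
    return collision_rate(samples) <= (1 + eps ** 2 / 2) / n


if __name__ == "__main__":
    balanced = [i % 50 for i in range(500)]   # every value appears 10 times
    point_mass = [0] * 500                    # all samples identical
    print(looks_uniform(balanced, n=50, eps=0.5))
    print(looks_uniform(point_mass, n=50, eps=0.5))
```

A real tester would also fix the number of samples as a function of n and ε so that the empirical collision rate concentrates; that analysis is left out here.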
On testing expansion in bounded-degree graphs
Electronic Colloquium on Computational Complexity (ECCC), 2000
Cited by 72 (5 self)
We consider testing graph expansion in the bounded-degree graph model. Specifically, we refer to algorithms for testing whether the graph has a second eigenvalue bounded above by a given threshold or is far from any graph with such (or related) property. We present a natural algorithm aimed towards achieving the foregoing task. The algorithm is given a (normalized) eigenvalue bound λ < 1, oracle access to a bounded-degree N-vertex graph, and two additional parameters ε, α > 0. The algorithm runs in time N^{0.5+α}/poly(ε), and accepts any graph having (normalized) second eigenvalue at most λ. We believe that the algorithm rejects any graph that is ε-far from having second eigenvalue at most λ^{α/O(1)}, and prove the validity of this belief under an appealing combinatorial conjecture.
Streaming and sublinear approximation of entropy and information distances
In ACM-SIAM Symposium on Discrete Algorithms, 2006
Cited by 69 (13 self)
In most algorithmic applications which compare two distributions, information-theoretic distances are more natural than standard ℓp norms. In this paper we design streaming and sublinear-time property testing algorithms for entropy and various information-theoretic distances. Batu et al. posed the problem of property testing with respect to the Jensen-Shannon distance. We present optimal algorithms for estimating bounded, symmetric f-divergences (including the Jensen-Shannon divergence and the Hellinger distance) between distributions in various property testing frameworks. Along the way, we close a (log n)/H gap between the upper and lower bounds for estimating entropy H, yielding an optimal algorithm over all values of the entropy. In a data stream setting (sublinear space), we give the first algorithm for estimating the entropy of a distribution. Our algorithm runs in polylogarithmic space and yields an asymptotic constant-factor approximation scheme. An integral part of the algorithm is an interesting use of an F0 (the number of distinct elements in a set) estimation algorithm; we also provide other results along the space/time/approximation tradeoff curve. Our results have interesting structural implications that connect sublinear-time and space-constrained algorithms. The mediating model is the random order streaming model, which assumes the input is a random permutation of a multiset and was first considered by Munro and Paterson in 1980. We show that any property testing algorithm in the combined oracle model for calculating permutation-invariant functions can be simulated in the random order model in a single pass. This addresses a question raised by Feigenbaum et al. regarding the relationship between property testing and stream algorithms. Further, we give a polylog-space PTAS for estimating the entropy of a one-pass random order stream. This bound cannot be achieved in the combined oracle (generalized property testing) model.
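As a point of comparison for the entropy-estimation problem described above, the sketch below computes the naive plug-in (empirical) entropy of a sequence of samples. It is a baseline only, not the streaming algorithm from the paper: it stores one counter per distinct element, so its space is linear in the number of distinct items rather than polylogarithmic. The function name `plug_in_entropy` is an assumption made for illustration.

```python
import math
from collections import Counter


def plug_in_entropy(samples):
    """Empirical Shannon entropy, in bits, of the observed frequencies."""
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    # H = -sum_i p_i * log2(p_i), with p_i the empirical frequency of item i.
    return -sum((c / m) * math.log2(c / m) for c in counts.values())


if __name__ == "__main__":
    print(plug_in_entropy([0, 1, 2, 3] * 25))  # four equally frequent items
    print(plug_in_entropy(["a"] * 10))         # a single repeated item
```

The plug-in estimate is known to be biased downward when the number of samples is small relative to the domain size, which is one reason more careful estimators are studied.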
Sampling Algorithms: Lower Bounds and Applications (Extended Abstract)
2001
Cited by 60 (2 self)
Ziv Bar-Yossef (Computer Science Division, U. C. Berkeley, Berkeley, CA 94720, zivi@cs.berkeley.edu); Ravi Kumar (IBM Almaden, 650 Harry Road, San Jose, CA 95120, ravi@almaden.ibm.com); D. Sivakumar (IBM Almaden, 650 Harry Road, San Jose, CA 95120, siva@almaden.ibm.com)
We develop a framework to study probabilistic sampling algorithms that approximate general functions of the form f : A^n → B, where A and B are arbitrary sets. Our goal is to obtain lower bounds on the query complexity of functions, namely the number of input variables x_i that any sampling algorithm needs to query to approximate f(x_1, ..., x_n). We define two quantitative properties of functions, the block sensitivity and the minimum Hellinger distance, that give us techniques to prove lower bounds on the query complexity. These techniques are quite general, easy to use, yet powerful enough to yield tight results. Our applications include the mean and higher statistical moments, the median and other selection functions, and the frequency moments, where we obtain lower bounds that are close to the corresponding upper bounds. We also point out some connections between sampling and streaming algorithms and lossy compression schemes.
Property Testing: A Learning Theory Perspective
Cited by 49 (9 self)
Property testing deals with tasks where the goal is to distinguish between the case that an object (e.g., function or graph) has a prespecified property (e.g., the function is linear or the graph is bipartite) and the case that it differs significantly from any such object. The task should be performed by observing only a very small part of the object, in particular by querying the object, and the algorithm is allowed a small failure probability. One view of property testing is as a relaxation of learning the object (obtaining an approximate representation of the object). Thus property testing algorithms can serve as a preliminary step to learning. That is, they can be applied in order to select, very efficiently, what hypothesis class to use for learning. This survey takes the learning-theory point of view and focuses on results for testing properties of functions that are of interest to the learning theory community. In particular, we cover results for testing algebraic properties of functions such as linearity, testing properties defined by concise representations, such as having a small DNF representation, and more.
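The linearity example mentioned above can be made concrete with the classic Blum-Luby-Rubinfeld (BLR) test for Boolean functions over GF(2): pick random x and y and check that f(x) + f(y) = f(x + y), where addition is XOR. The sketch below is a minimal illustration, not code from the survey. Inputs are encoded as n-bit integers, and the names `blr_linearity_test` and `parity_on`, the default number of trials, and the optional `rng` parameter are assumptions made for illustration.

```python
import random


def blr_linearity_test(f, n, trials=100, rng=None):
    """Query f at random points; reject on the first violated check.

    f maps an n-bit integer to 0 or 1. A linear function (a parity of
    some subset of the bits) is always accepted.
    """
    rng = rng or random.Random()
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        # Over GF(2), addition of values and of inputs is bitwise XOR.
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True


def parity_on(mask):
    """Linear function: parity of the input bits selected by mask."""
    return lambda x: bin(x & mask).count("1") % 2


if __name__ == "__main__":
    def and_of_two_bits(x):
        # Not linear: the AND of the two lowest bits.
        return (x & 1) & ((x >> 1) & 1)

    print(blr_linearity_test(parity_on(0b1011), n=8))
    print(blr_linearity_test(and_of_two_bits, n=8, trials=200))
```

The test has one-sided error: it never rejects a linear function, while a function far from linear fails each individual check with constant probability, so a modest number of trials suffices to reject it with high probability.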
The complexity of approximating the entropy
SIAM Journal on Computing, 2005
Cited by 49 (8 self)
We consider the problem of approximating the entropy of a discrete distribution under several different models of oracle access to the distribution. In the evaluation oracle model, the algorithm is given access to the explicit array of probabilities specifying the distribution. In this model, linear time in the size of the domain is both necessary and sufficient for approximating the entropy. In the generation oracle model, the algorithm has access only to independent samples from the distribution. In this case, we show that a γ-multiplicative approximation to the entropy can be obtained in O(n^{(1+η)/γ^2} log n) time for distributions with entropy Ω(γ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that this model does not permit a multiplicative approximation to the entropy in general. For the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^{1/(2γ^2)}). We next consider a combined oracle model in which the algorithm has access to both the
Algorithmic and Analysis Techniques in Property Testing
Cited by 48 (7 self)
Property testing algorithms are “ultra”-efficient algorithms that decide whether a given object (e.g., a graph) has a certain property (e.g., bipartiteness), or is significantly different from any object that has the property. To this end property testing algorithms are given the ability to perform (local) queries to the input, though the decision they need to make usually concerns properties with a global nature. In the last two decades, property testing algorithms have been designed for many types of objects and properties, amongst them, graph properties, algebraic properties, geometric properties, and more. In this article we survey results in property testing, where our emphasis is on common analysis and algorithmic techniques. Among the techniques surveyed are the following:
• The self-correcting approach, which was mainly applied in the study of property testing of algebraic properties;
• The enforce-and-test approach, which was applied quite extensively in the analysis of algorithms for testing graph properties (in the dense-graphs model), as well as in other contexts;