Results 1–10 of 136
Learnability and the Vapnik-Chervonenkis dimension
, 1989
"... Valiant’s learnability model is extended to learning classes of concepts defined by regions in Euclidean space E”. The methods in this paper lead to a unified treatment of some of Valiant’s results, along with previous results on distributionfree convergence of certain pattern recognition algorith ..."
Abstract

Cited by 727 (22 self)
 Add to MetaCart
Valiant’s learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant’s results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
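As a concrete illustration of the shattering parameter in this abstract, here is a minimal Python sketch (my own, with a hypothetical threshold class and point grid, not the paper's construction) that computes a VC dimension by brute force:

```python
# Illustrative sketch (not from the paper): brute-force check of the
# Vapnik-Chervonenkis dimension of a small concept class via shattering.
# The concept class and the point grid below are hypothetical demo choices.
from itertools import combinations

def shatters(concepts, points):
    """True if every one of the 2^|points| labelings is realized by some concept."""
    realized = {tuple(c(x) for x in points) for c in concepts}
    return len(realized) == 2 ** len(points)

def vc_dimension(concepts, domain, max_d=5):
    """Largest d such that some d-point subset of `domain` is shattered."""
    d = 0
    for k in range(1, max_d + 1):
        if any(shatters(concepts, s) for s in combinations(domain, k)):
            d = k
    return d

# Threshold concepts on the line: h_t(x) = 1 iff x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in (0.5, 1.5, 2.5, 3.5)]
print(vc_dimension(thresholds, domain=(1, 2, 3)))  # -> 1: any single point is
# shattered but no pair is, so the class satisfies the finiteness condition
# for distribution-free learnability that the abstract identifies.
```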
Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension
, 1992
"... : Let V ` f0; 1g n have VapnikChervonenkis dimension d. Let M(k=n;V ) denote the cardinality of the largest W ` V such that any two distinct vectors in W differ on at least k indices. We show that M(k=n;V ) (cn=(k + d)) d for some constant c. This improves on the previous best result of ((cn ..."
Abstract

Cited by 112 (4 self)
 Add to MetaCart
Let V ⊆ {0,1}^n have Vapnik-Chervonenkis dimension d. Let M(k/n, V) denote the cardinality of the largest W ⊆ V such that any two distinct vectors in W differ on at least k indices. We show that M(k/n, V) ≤ (cn/(k + d))^d for some constant c. This improves on the previous best result of ((cn/k) log(n/k))^d. This new result has applications in the theory of empirical processes. The author gratefully acknowledges the support of the Mathematical Sciences Research Institute at UC Berkeley and ONR grant N00014-91-J-1162. 1. Statement of Results. Let n be a natural number greater than zero. Let V ⊆ {0,1}^n. For a sequence of indices I = (i_1, ..., i_k), with 1 ≤ i_j ≤ n, let V|_I denote the projection of V onto I, i.e. V|_I = {(v_{i_1}, ..., v_{i_k}) : (v_1, ..., v_n) ∈ V}. If V|_I = {0,1}^k, then we say that V shatters the index sequence I. The Vapnik-Chervonenkis dimension of V is the size of the longest index sequence I that is shattered by V [VC71] ...
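To make the packing quantity M(k/n, V) concrete, here is a small Python sketch (my own; the demo class and greedy method are illustrative assumptions, not the paper's proof technique):

```python
# Illustrative sketch (not from the paper): a greedy lower bound on the
# packing quantity M(k/n, V), the size of the largest W within V whose
# vectors pairwise differ on at least k of the n coordinates.
from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def greedy_packing(V, k):
    """Greedily keep vectors at Hamming distance >= k from all kept ones.
    The result is k-separated, so its size lower-bounds the packing number."""
    W = []
    for v in V:
        if all(hamming(v, w) >= k for w in W):
            W.append(v)
    return W

# Hypothetical demo class: vectors in {0,1}^6 of weight <= 1 (VC dimension 1),
# ordered heaviest-first so the greedy pass finds the larger packings.
n = 6
V = sorted((v for v in product((0, 1), repeat=n) if sum(v) <= 1),
           key=sum, reverse=True)
for k in (1, 2, 3):
    print(k, len(greedy_packing(V, k)))  # -> 7, 6, 1
# Haussler's theorem caps the true packing number at (cn/(k+d))^d, which for
# d = 1 grows only linearly in n/k, versus the older ((cn/k) log(n/k))^d bound.
```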
A few notes on statistical learning theory
 In S. Mendelson & A. Smola (Eds.), Lecture Notes in Computer Science
, 2003
"... ..."
Mutual Information, Metric Entropy, and Cumulative Relative Entropy Risk
 Annals of Statistics
, 1996
"... Assume fP ` : ` 2 \Thetag is a set of probability distributions with a common dominating measure on a complete separable metric space Y . A state ` 2 \Theta is chosen by Nature. A statistician gets n independent observations Y 1 ; : : : ; Y n from Y distributed according to P ` . For each time ..."
Abstract

Cited by 55 (2 self)
 Add to MetaCart
Assume {P_θ : θ ∈ Θ} is a set of probability distributions with a common dominating measure on a complete separable metric space Y. A state θ ∈ Θ is chosen by Nature. A statistician gets n independent observations Y_1, ..., Y_n from Y distributed according to P_θ. For each time t between 1 and n, based on the observations Y_1, ..., Y_{t−1}, the statistician produces an estimated distribution P_t for P_θ and suffers a loss L(P_θ, P_t). The cumulative risk for the statistician is the average total loss up to time n. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical mechanics is the special case when the loss L(P_θ, P_t) is the relative entropy between the true distribution P_θ and the estimated distribution P_t. Here the cumulative Bayes risk from time 1 to n is the mutual information between the random parameter Θ and the observations Y_1, ..., Y_n.
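The identity the abstract states, that cumulative relative-entropy Bayes risk equals mutual information, can be checked numerically; the following Python sketch (my own, with a hypothetical discretized Bernoulli model) does so by exhaustive enumeration:

```python
# Numerical sketch (my own illustration, not from the paper): for Bernoulli
# observations with a discretized uniform prior, the cumulative Bayes risk
# under relative-entropy loss equals the mutual information between the
# parameter Theta and the observations Y_1, ..., Y_n, as the abstract states.
import itertools, math

thetas = [i / 20 for i in range(1, 20)]   # hypothetical parameter grid in (0, 1)
prior = [1 / len(thetas)] * len(thetas)
n = 4                                     # horizon

def seq_prob(y, th):                      # P(y_1, ..., y_t | theta)
    return math.prod(th if b else 1 - th for b in y)

def kl_bern(p, q):                        # relative entropy KL(Ber(p) || Ber(q))
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def predictive(prefix):                   # Bayes predictive P(Y_t = 1 | prefix)
    post = [pi * seq_prob(prefix, th) for pi, th in zip(prior, thetas)]
    z = sum(post)
    return sum(w * th for w, th in zip(post, thetas)) / z

risk = mi = 0.0
for y in itertools.product((0, 1), repeat=n):
    m_y = sum(pi * seq_prob(y, th) for pi, th in zip(prior, thetas))  # marginal
    for pi, th in zip(prior, thetas):
        w = pi * seq_prob(y, th)          # joint weight P(theta, y)
        mi += w * math.log(seq_prob(y, th) / m_y)
        risk += w * sum(kl_bern(th, predictive(y[:t])) for t in range(n))

print(f"cumulative Bayes risk: {risk:.6f}")
print(f"mutual information:    {mi:.6f}")  # the two printed values coincide
```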
Concentration inequalities and asymptotic results for ratio type empirical processes
 Ann. Probab.
, 2006
"... Let F be a class of measurable functions on a measurable space (S, S) with values in [0, 1] and let Pn = n −1 n ∑ δXi i=1 be the empirical measure based on an i.i.d. sample (X1,...,Xn) from a probability distribution P on (S, S). We study the behavior of suprema of the following type: sup rn<σP f ..."
Abstract

Cited by 42 (5 self)
 Add to MetaCart
(Show Context)
Let F be a class of measurable functions on a measurable space (S, S) with values in [0, 1] and let P_n = n^{−1} ∑_{i=1}^{n} δ_{X_i} be the empirical measure based on an i.i.d. sample (X_1, ..., X_n) from a probability distribution P on (S, S). We study the behavior of suprema of the following type: sup_{r_n < σ_P f ≤ δ_n} |P_n f − P f| / φ(σ_P f), where σ_P f ≥ Var_P^{1/2} f and φ is a continuous, strictly increasing function with φ(0) = 0. Using Talagrand’s concentration inequality for empirical processes, we establish concentration inequalities for such suprema and use them to derive several results about their asymptotic behavior, expressing the conditions in terms of expectations of localized suprema of empirical processes. We also prove new bounds for expected values of sup-norms of empirical processes in terms of the largest σ_P f and the L_2(P) norm of the envelope of the function class, which are especially suited for estimating localized suprema. With this technique, we extend to function classes most of the known results on ratio type suprema of empirical processes, including some of Alexander’s results for VC classes of sets. We also consider applications of these results to several important problems in nonparametric statistics and in learning theory, including general excess risk bounds in empirical risk minimization and their versions for L_2-regression and classification, and ratio type bounds for margin distributions in classification.
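As a rough feel for the ratio-type suprema studied here, the following Monte Carlo sketch (my own; the interval class, φ(x) = x, and all constants are illustrative assumptions) simulates one such localized supremum:

```python
# Monte Carlo sketch (my own illustration, not from the paper): the ratio-type
# supremum  sup over r_n < sigma_P f <= delta_n of |P_n f - P f| / phi(sigma_P f)
# for the hypothetical class F = {1[0, s]} with X_i ~ Uniform(0, 1), phi(x) = x.
import bisect, math, random

random.seed(0)
n = 2000
r_n, delta_n = 0.05, 0.5                  # localization band for sigma_P f

# For f_s = 1{x <= s}: P f = s and sigma_P f = sqrt(s (1 - s)).
band = [s for s in (i / 200 for i in range(1, 200))
        if r_n < math.sqrt(s * (1 - s)) <= delta_n]

def ratio_sup(sample):
    sample = sorted(sample)
    sup = 0.0
    for s in band:
        pnf = bisect.bisect_right(sample, s) / n   # P_n f = empirical CDF at s
        sup = max(sup, abs(pnf - s) / math.sqrt(s * (1 - s)))
    return sup

sups = [ratio_sup([random.random() for _ in range(n)]) for _ in range(50)]
print(sum(sups) / len(sups))  # of order 1/sqrt(n) up to logarithms, and the
# 50 replications cluster tightly around their mean, the concentration
# behavior that the abstract's Talagrand-based inequalities make precise.
```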
Rates of Convergence in Active Learning
 Submitted to the Annals of Statistics
"... We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate prova ..."
Abstract

Cited by 40 (5 self)
 Add to MetaCart
We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.
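For intuition about why active learning can beat passive rates in the benign noiseless case, here is a CAL-style disagreement-based sketch (my own illustration, not the algorithm proposed in the paper):

```python
# Minimal sketch (not the paper's algorithm): disagreement-based active
# learning, CAL-style, for noiseless thresholds h_t(x) = 1{x >= t} on [0, 1].
# A label is requested only when x falls in the current disagreement region,
# so label complexity grows like log(1/epsilon) rather than 1/epsilon.
import random

random.seed(1)
true_t = 0.37                                # hypothetical target threshold
label = lambda x: int(x >= true_t)           # noiseless labeling oracle

lo, hi = 0.0, 1.0                            # version space: thresholds in [lo, hi]
queries = 0
for _ in range(10000):
    x = random.random()                      # unlabeled data stream
    if lo <= x < hi:                         # version-space classifiers disagree at x
        queries += 1
        if label(x):                         # true threshold lies at or below x
            hi = x
        else:                                # true threshold lies above x
            lo = x
print(f"disagreement region width {hi - lo:.1e} after only {queries} label queries")
# A passive learner would need on the order of 1/epsilon labels for comparable
# accuracy; how much of this speedup survives label noise, and how to adapt
# over a nested hierarchy of classes, is what the abstract quantifies.
```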