Results 1–10 of 40
RATES OF CONVERGENCE IN ACTIVE LEARNING
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 39 (4 self)
We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.
Rademacher complexities and bounding the excess risk in active learning
, 2009
Cited by 28 (0 self)
The paper discusses sequential active learning algorithms based on estimating the level sets of the empirical risk. Localized Rademacher complexities are used in the algorithms to adaptively estimate the sample sizes needed to achieve the required accuracy of learning. Probabilistic bounds on the number of actively sampled examples are proved, and several applications to binary classification problems are considered.
On local U-statistic processes and the estimation of densities of functions of several sample variables
, 2007
Cited by 19 (4 self)
A notion of local U-statistic process is introduced and central limit theorems in various norms are obtained for it. This involves the development of several inequalities for U-processes that may be useful in other contexts. This local U-statistic process is based on an estimator of the density of a function of several sample variables proposed by Frees [J. Amer. Statist. Assoc. 89 (1994) 517–525] and, as a consequence, uniform-in-bandwidth central limit theorems in the sup- and Lp-norms are obtained for these estimators. 1. Introduction. Let X, X1, X2, ... be i.i.d. random variables taking values in R, with common density function f, and consider the kernel density estimator of f defined, for t ∈ R, by (1.1) fn(t, hn) = (nhn)^{-1} Σ_{i=1}^{n} K((t − Xi)/hn), where K is a kernel function.
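The kernel density estimator in (1.1) can be sketched as follows; the Gaussian kernel, sample, and bandwidth below are illustrative choices of ours, not taken from the paper:

```python
import numpy as np

def kde(t, x, h):
    """Kernel density estimate f_n(t, h) = (n h)^(-1) * sum_i K((t - x_i) / h).

    t : evaluation point(s), x : i.i.d. sample, h : bandwidth h_n.
    The standard Gaussian kernel K is an illustrative choice; the paper's
    results concern general kernels.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    u = (t[:, None] - x[None, :]) / h                  # (t - X_i) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # standard normal kernel
    return K.sum(axis=1) / (len(x) * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=2000)      # X_1, ..., X_n ~ N(0, 1)
f_hat = kde(0.0, sample, h=0.3)     # estimate the N(0, 1) density at t = 0
```

For this sample the estimate at t = 0 lands near the true value 1/√(2π) ≈ 0.399; the uniform-in-bandwidth results of the paper control such estimates simultaneously over a range of h.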
Activized Learning: Transforming Passive to Active with Improved Label Complexity
Cited by 11 (4 self)
Active learning methods often achieve improved performance using fewer labels compared to passive learning methods. A variety of practically successful active learning algorithms use a passive learning algorithm as a subroutine, and the essential role of the active component is to construct data sets to feed into the passive subroutine. This general idea is appealing for a variety of reasons, as it may be able ...
Adaptive Estimation of a Distribution Function and its Density in Sup-Norm Loss by Wavelet and Spline Projections
, 2008
Cited by 11 (7 self)
Given an i.i.d. sample from a distribution F on R with uniformly continuous density p0, purely data-driven estimators are constructed that efficiently estimate F in sup-norm loss and simultaneously estimate p0 at the best possible rate of convergence over Hölder balls, also in sup-norm loss. The estimators are obtained by applying a model selection procedure close to Lepski's method, with random thresholds, to projections of the empirical measure onto spaces spanned by wavelets or B-splines. Explicit constants in the asymptotic risk of the estimator are obtained, as well as oracle-type inequalities in sup-norm loss. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein analogues of the inequalities in Koltchinskii (2006) for the deviation of suprema of empirical processes from their Rademacher symmetrizations.
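The random-threshold idea can be illustrated by a Monte Carlo estimate of the supremum of a Rademacher process over a finite dictionary of functions evaluated on the sample; the cosine dictionary and all parameter values below are hypothetical stand-ins for the wavelet/spline projection kernels of the paper:

```python
import numpy as np

def rademacher_threshold(G, rng, reps=500):
    """Monte Carlo estimate of E_eps sup_j | n^(-1) sum_i eps_i G[j, i] |.

    G : (m, n) array whose row j holds (g_j(X_1), ..., g_j(X_n)) for the
    j-th function in the dictionary.  The eps_i are i.i.d. Rademacher
    (+/-1) signs; a finite dictionary is an illustrative simplification.
    """
    m, n = G.shape
    sups = np.empty(reps)
    for r in range(reps):
        eps = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        sups[r] = np.abs(G @ eps).max() / n     # sup over the dictionary
    return sups.mean()

rng = np.random.default_rng(1)
X = rng.uniform(size=400)
# toy dictionary: a few cosine functions evaluated on the sample
G = np.array([np.cos(2 * np.pi * k * X) for k in range(1, 6)])
thr = rademacher_threshold(G, rng)   # data-driven threshold, order n^(-1/2)
```

Because the threshold is computed from the observed data, it adapts to the sample at hand, which is the appeal of Rademacher symmetrization over fixed worst-case constants.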
A Local Maximal Inequality under Uniform Entropy
Cited by 9 (2 self)
We derive an upper bound for the mean of the supremum of the empirical process indexed by a class of functions that are known to have variance bounded by a small constant δ. The bound is expressed in terms of the uniform entropy integral of the class at δ. The bound yields a rate of convergence for minimum contrast estimators when applied to the modulus of continuity of the contrast functions.
Lower Bounds for Passive and Active Learning
Cited by 9 (0 self)
We develop unified information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander's capacity function. The supremum of this function has been recently rediscovered by Hanneke in the context of active learning under the name of "disagreement coefficient." For passive learning, our lower bounds match the upper bounds of Giné and Koltchinskii up to constants and generalize analogous results of Massart and Nédélec. For active learning, we provide the first known lower bounds based on the capacity function rather than the disagreement coefficient.
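As a toy illustration of the quantity involved, the disagreement coefficient can be computed in closed form for one-dimensional threshold classifiers under a uniform marginal; this worked example is ours, not the paper's:

```python
def disagreement_mass(t_star, r):
    """P(DIS(B(h*, r))) for threshold classifiers h_t(x) = 1{x >= t}
    under X ~ Uniform[0, 1].

    The distance d(h_t, h_t*) = P(h_t(X) != h_t*(X)) = |t - t*|, so the
    ball of radius r around h_t* is {h_t : |t - t*| <= r} and its
    disagreement region is (t* - r, t* + r) intersected with [0, 1].
    """
    lo = max(0.0, t_star - r)
    hi = min(1.0, t_star + r)
    return hi - lo

# ratio P(DIS(B(h*, r))) / r for shrinking r: constant 2 in this example,
# so the disagreement coefficient of the threshold class here equals 2
ratios = [disagreement_mass(0.5, r) / r for r in (0.1, 0.01, 0.001)]
```

A bounded ratio like this is what makes aggressive label savings possible; classes where the ratio grows as r shrinks are exactly where the capacity-based lower bounds bite.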
A new method for estimation and model selection: ρ-estimation. arXiv:1403.6057v1, http://arxiv.org
, 2014
Some New Asymptotic Theory for Least Squares Series: Pointwise and Uniform Results, Discussion paper
, 2014
Cited by 2 (1 self)
In econometric applications it is common that the exact form of a conditional expectation is unknown, and flexible functional forms can lead to improvements over a pre-specified functional form, especially if they nest some successful parametric, economically motivated forms. Series methods offer exactly that by approximating the unknown function with k basis functions, where k is allowed to grow with the sample size n to balance the trade-off between variance and bias. In this work we consider series estimators for the conditional mean in light of four new ingredients: (i) sharp LLNs for matrices derived from the noncommutative Khinchin inequalities, (ii) bounds on the Lebesgue factor that controls the ratio between the L∞- and L2-norms of approximation errors, (iii) maximal inequalities for processes whose entropy integrals diverge at some rate, and (iv) strong approximations to series-type processes. These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken considerably the condition on the number k of approximating functions used in series estimation from the typical k²/n → 0 to k/n → 0, up to log factors, which was previously available only for spline series.
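A minimal sketch of a least-squares series estimator of a conditional mean, assuming a polynomial basis (the paper covers general bases such as splines and wavelets); all names and parameter values are illustrative:

```python
import numpy as np

def series_estimator(x, y, k):
    """Least-squares series estimate of g(x) = E[Y | X = x].

    Uses the first k polynomial basis functions (1, x, x^2, ...); k is
    allowed to grow with the sample size n to balance bias against variance.
    """
    P = np.vander(np.asarray(x, float), k, increasing=True)   # n x k basis matrix
    beta, *_ = np.linalg.lstsq(P, np.asarray(y, float), rcond=None)
    def g_hat(t):
        T = np.vander(np.atleast_1d(np.asarray(t, float)), k, increasing=True)
        return T @ beta
    return g_hat

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=500)
y = x ** 2 + 0.05 * rng.normal(size=500)    # true conditional mean: x^2
g_hat = series_estimator(x, y, k=4)         # k = 4 basis functions
```

The condition k/n → 0 (up to log factors) discussed above governs how fast k may grow in sketches like this one while keeping the design matrix P well conditioned.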