Results 1–10 of 28
Lectures on the central limit theorem for empirical processes
Probability and Banach Spaces, 1986
Cited by 135 (9 self)
Concentration inequalities are used to derive some new inequalities for ratio-type suprema of empirical processes. These general inequalities are used to prove several new limit theorems for ratio-type suprema and to recover a number of the results from [1] and [2]. As a statistical application, an oracle inequality for nonparametric regression is obtained via ratio bounds.
Minimax-optimal rates for sparse additive models over kernel classes via convex programming
Cited by 52 (8 self)
Sparse additive models are families of d-variate functions with the additive decomposition f∗ = ∑_{j∈S} f∗_j, where S is an unknown subset of cardinality s ≪ d. In this paper, we consider the case where each univariate component function f∗_j lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function f∗ based on kernels combined with ℓ1-type convex regularization. Working within a high-dimensional framework that allows both the dimension d and sparsity s to increase with n, we derive convergence rates in the L2(P) and L2(Pn) norms over the class F_{d,s,H} of sparse additive models with each univariate function f∗_j in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the L2(P) error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the d-variate function class is assumed to be globally bounded, then much faster estimation rates are possible for any sparsity s = Ω(√n), showing that global boundedness is a significant restriction in the high-dimensional setting.
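The additive decomposition f∗ = ∑_{j∈S} f∗_j can be illustrated with a minimal sketch. The dimensions, the active set S, and the component functions below are toy values chosen for illustration, not taken from the paper:

```python
import math
import random

random.seed(2)

d, s = 10, 2                      # ambient dimension d, sparsity s << d (toy values)
S = [0, 3]                        # the unknown active subset, |S| = s
components = {0: math.sin, 3: lambda x: x * x}  # hypothetical univariate components f*_j

def f_star(x):
    """Sparse additive function: f*(x) = sum over j in S of f*_j(x_j)."""
    return sum(components[j](x[j]) for j in S)

x = [random.random() for _ in range(d)]
print(f_star(x))  # only coordinates 0 and 3 matter; the other d - s coordinates are irrelevant
```

The point of the structure is that although x lives in d dimensions, f∗ depends only on the s coordinates in S, which is what makes estimation feasible when d grows with n.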
Concentration inequalities and asymptotic results for ratio type empirical processes
Ann. Probab., 2006
Cited by 40 (5 self)
Let F be a class of measurable functions on a measurable space (S, S) with values in [0, 1] and let Pn = n^{-1} ∑_{i=1}^n δ_{X_i} be the empirical measure based on an i.i.d. sample (X1,...,Xn) from a probability distribution P on (S, S). We study the behavior of suprema of the following type: sup_{r_n < σ_P f ≤ δ_n} |Pn f − P f| / φ(σ_P f), where σ_P f ≥ Var_P^{1/2} f and φ is a continuous, strictly increasing function with φ(0) = 0. Using Talagrand’s concentration inequality for empirical processes, we establish concentration inequalities for such suprema and use them to derive several results about their asymptotic behavior, expressing the conditions in terms of expectations of localized suprema of empirical processes. We also prove new bounds for expected values of sup-norms of empirical processes in terms of the largest σ_P f and the L2(P) norm of the envelope of the function class, which are especially suited for estimating localized suprema. With this technique, we extend to function classes most of the known results on ratio type suprema of empirical processes, including some of Alexander’s results for VC classes of sets. We also consider applications of these results to several important problems in nonparametric statistics and in learning theory (including general excess risk bounds in empirical risk minimization and their versions for L2-regression and classification, and ratio type bounds for margin distributions in classification).
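The localized ratio-type supremum can be computed directly for a small concrete case. The sketch below assumes a toy finite class of indicators f_t(x) = 1{x ≤ t} on Uniform[0, 1] data (so P f_t = t and Var_P f_t = t(1 − t)), with φ the identity; none of these specific choices come from the paper:

```python
import math
import random

random.seed(0)
n = 2000
xs = [random.random() for _ in range(n)]   # i.i.d. sample from P = Uniform[0, 1]

# Toy finite class F = {f_t(x) = 1{x <= t}}: here P f_t = t, Var_P f_t = t(1 - t).
def ratio_sup(r_n, delta_n, phi=lambda sig: sig):
    """Localized ratio-type supremum: sup over f with r_n < sigma_P f <= delta_n
    of |Pn f - P f| / phi(sigma_P f), over the toy indicator class."""
    best = 0.0
    for t in (i / 200 for i in range(1, 200)):
        sigma = math.sqrt(t * (1 - t))        # sigma_P f_t = Var_P^{1/2} f_t
        if not (r_n < sigma <= delta_n):
            continue                           # localization: skip functions outside the band
        Pn_f = sum(x <= t for x in xs) / n     # empirical measure applied to f_t: Pn f_t
        best = max(best, abs(Pn_f - t) / phi(sigma))
    return best

print(ratio_sup(0.05, 0.5))
```

Dividing the deviation |Pn f − P f| by φ(σ_P f) rescales small-variance functions, which is exactly why the supremum must be restricted to the band r_n < σ_P f ≤ δ_n to stay controlled.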
RATES OF CONVERGENCE IN ACTIVE LEARNING
Submitted to the Annals of Statistics
Cited by 39 (4 self)
We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning.
Bivariate tail estimation: dependence in asymptotic independence
 Bernoulli
Cited by 31 (3 self)
In the classical setting of bivariate extreme value theory, the procedures for estimating the probability of an extreme event are not applicable if the componentwise maxima of the observations are asymptotically independent. To cope with this problem, Ledford and Tawn proposed a submodel in which the penultimate dependence is characterized by an additional parameter. We discuss the asymptotic properties of two estimators for this parameter in an extended model. Moreover, we develop an estimator for the probability of an extreme event that works in the case of asymptotic independence as well as in the case of asymptotic dependence, and prove its consistency.
Nonparametric Estimation of the Limit Dependence Function of Multivariate Extremes
Extremes, 1999
Cited by 11 (1 self)
This paper presents a new estimation procedure for the limit distribution of the maximum of a multivariate random sample. This procedure relies on a new and simple relationship between the copula of the underlying multivariate distribution function and the dependence function of its maximum attractor. The obtained characterization is then used to define a class of kernel-based estimates for the dependence function of the maximum attractor. The consistency and the asymptotic distribution of these estimates are considered.
A uniform functional law of the logarithm for the local empirical process
Ann. Probab., 2004
Cited by 10 (3 self)
We prove a uniform functional law of the logarithm for the local empirical process. To accomplish this we combine techniques from classical and abstract empirical process theory, Gaussian distributional approximation and probability on Banach spaces. The body of techniques we develop should prove useful to the study of the strong consistency of d-variate kernel-type nonparametric function estimators.
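As context for the local empirical process, a minimal sketch of the classical empirical distribution function Gn built from i.i.d. Uniform[0, 1] variables (the sample size and seed are toy choices; Gn(t) = n^{-1} #{i : Ui ≤ t} is the standard definition):

```python
import random

random.seed(1)
n = 1000
us = [random.random() for _ in range(n)]   # U1, ..., Un i.i.d. Uniform[0, 1]

def G_n(t):
    """Empirical distribution function: Gn(t) = n^{-1} * #{i : Ui <= t}."""
    return sum(u <= t for u in us) / n

# By the Glivenko-Cantelli theorem, Gn(t) tracks t uniformly in t,
# with deviations of order roughly n^{-1/2}.
max_dev = max(abs(G_n(i / 100) - i / 100) for i in range(101))
print(max_dev)
```

The local empirical process studied in the paper refines this object by zooming in on shrinking neighborhoods, where the uniform deviation behavior is correspondingly more delicate.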
Lower Bounds for Passive and Active Learning
Cited by 9 (0 self)
We develop unified information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander’s capacity function. The supremum of this function has recently been rediscovered by Hanneke in the context of active learning under the name of “disagreement coefficient.” For passive learning, our lower bounds match the upper bounds of Giné and Koltchinskii up to constants and generalize analogous results of Massart and Nédélec. For active learning, we provide the first known lower bounds based on the capacity function rather than the disagreement coefficient.
Part 1: Overview of the Probably Approximately Correct (PAC) Learning Framework
, 1995
Cited by 8 (0 self)
Here we survey some recent theoretical results on the efficiency of machine learning algorithms. The main tool described is the notion of Probably Approximately Correct (PAC) learning, introduced by Valiant. We define this learning model and then look at some of the results obtained in it. We then consider some criticisms of the PAC model and the extensions proposed to address these criticisms. Finally, we look briefly at other models recently proposed in computational learning theory.
A Statistical Theory of Active Learning
, 2013
Cited by 6 (1 self)
Active learning is a protocol for supervised machine learning, in which a learning algorithm sequentially requests the labels of selected data points from a large pool of unlabeled data. This contrasts with passive learning, where the labeled data are taken at random. The objective in active learning is to produce a highly accurate classifier, ideally using fewer labels than the number of random labeled data sufficient for passive learning to achieve the same. This article describes recent advances in our understanding of the theoretical benefits of active learning, and implications for the design of effective active learning algorithms. Much of the article focuses on a particular technique, namely disagreement-based active learning, which by now has amassed a mature and coherent literature. It also briefly surveys several alternative approaches from the literature. The emphasis is on theorems regarding the performance of a few general algorithms, including rigorous proofs where appropriate. However, the presentation is intended to be pedagogical, focusing …