Results 1 – 7 of 7
Lectures on the central limit theorem for empirical processes
 Probability and Banach Spaces
, 1986
Abstract

Cited by 135 (9 self)
Abstract. Concentration inequalities are used to derive some new inequalities for ratio-type suprema of empirical processes. These general inequalities are used to prove several new limit theorems for ratio-type suprema and to recover a number of the results from [1] and [2]. As a statistical application, an oracle inequality for nonparametric regression is obtained via ratio bounds.
Concentration inequalities and asymptotic results for ratio type empirical processes
 Ann. Probab.
, 2006
Abstract

Cited by 40 (5 self)
Let F be a class of measurable functions on a measurable space (S, S) with values in [0, 1] and let P_n = n^{-1} ∑_{i=1}^n δ_{X_i} be the empirical measure based on an i.i.d. sample (X_1, ..., X_n) from a probability distribution P on (S, S). We study the behavior of suprema of the following type: sup_{r_n < σ_P f ≤ δ_n} |P_n f − P f| / φ(σ_P f), where σ_P f ≥ Var_P^{1/2} f and φ is a continuous, strictly increasing function with φ(0) = 0. Using Talagrand’s concentration inequality for empirical processes, we establish concentration inequalities for such suprema and use them to derive several results about their asymptotic behavior, expressing the conditions in terms of expectations of localized suprema of empirical processes. We also prove new bounds for expected values of sup-norms of empirical processes in terms of the largest σ_P f and the L_2(P) norm of the envelope of the function class, which are especially suited for estimating localized suprema. With this technique, we extend to function classes most of the known results on ratio-type suprema of empirical processes, including some of Alexander’s results for VC classes of sets. We also consider applications of these results to several important problems in nonparametric statistics and in learning theory (including general excess risk bounds in empirical risk minimization and their versions for L_2-regression and classification, and ratio-type bounds for margin distributions in classification).
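The supremum studied here can be illustrated numerically on a toy example. This is a minimal sketch, not the paper's construction: the finite threshold class f_t(x) = 1{x ≤ t} on Uniform[0, 1] (so P f_t = t and σ_P f_t = √(t(1−t)) exactly), the choice φ(σ) = √σ, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(size=n)  # i.i.d. sample from P = Uniform[0, 1]

# A small finite class F of indicators f_t(x) = 1{x <= t},
# for which P f_t = t and sigma_P f_t = sqrt(t * (1 - t)) are known exactly.
ts = np.linspace(0.05, 0.95, 19)

def ratio_sup(X, ts, r_n, delta_n, phi=np.sqrt):
    """sup over {t : r_n < sigma_P f_t <= delta_n} of |P_n f_t - P f_t| / phi(sigma_P f_t)."""
    sup = 0.0
    for t in ts:
        sigma = np.sqrt(t * (1 - t))
        if r_n < sigma <= delta_n:
            Pn_f = np.mean(X <= t)  # empirical measure applied to f_t
            sup = max(sup, abs(Pn_f - t) / phi(sigma))
    return sup

print(ratio_sup(X, ts, r_n=0.1, delta_n=0.5))
```

Restricting σ_P f to the window (r_n, δ_n] and normalizing by φ(σ_P f) is exactly what distinguishes this ratio-type statistic from a plain uniform deviation sup |P_n f − P f|.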
Pattern classification and learning theory
Abstract

Cited by 19 (7 self)
1.1 A binary classification problem Pattern recognition (or classification or discrimination) is about guessing or predicting the unknown class of an observation. An observation is a collection of numerical measurements, represented by a d-dimensional vector x. The unknown nature of the observation is called a class. It is denoted by y and takes values in the set {0, 1}. (For simplicity, we restrict our attention to binary classification.) In pattern recognition, one creates a function g : R^d → {0, 1} which represents one's guess of y given x. The mapping g is called a classifier. A classifier errs on x if g(x) ≠ y. To model the learning problem, we introduce a probabilistic setting, and let (X, Y) be an R^d × {0, 1}-valued random pair. The random pair (X, Y) may be described in a variety of ways: for example, it is defined by the pair (μ, η), where μ is the probability measure for X and η is the regression of Y on X. More precisely, for a Borel-measurable set A ⊆ R^d …
Data-Dependent Margin-Based Generalization Bounds for Classification
 Proc. of 14th Annual Conference on Computational Learning Theory, COLT 2001. D. Helmbold and Bob Williamson (Eds), Lecture Notes in Artificial Intelligence
, 2001
Abstract

Cited by 17 (2 self)
We derive new margin-based inequalities for the probability of error of classifiers. The main feature of these bounds is that they can be calculated using the training data and therefore may be effectively used for model selection purposes. In particular, the bounds involve quantities such as the empirical fat-shattering dimension and covering number measured on the training data, as opposed to their worst-case counterparts traditionally used in such analyses, and appear to be sharper and more general than recent results involving empirical complexity measures. In addition, we also develop an alternative data-based bound for the generalization error of classes of convex combinations of classifiers involving an empirical complexity measure that is more easily computable than the empirical covering number or fat-shattering dimension. We also show an example of efficient computation of the new bounds.
Distribution-Dependent Vapnik-Chervonenkis Bounds
, 1999
Abstract

Cited by 9 (1 self)
Vapnik-Chervonenkis (VC) bounds play an important role in statistical learning theory as the fundamental result explaining the generalization ability of learning machines. Substantial mathematical work has been devoted over the years to improving the VC rates of convergence of empirical means to their expectations. The result obtained by Talagrand in 1994 seems to provide more or less the final word on this issue as far as universal bounds are concerned. For fixed distributions, however, this bound can be outperformed in practice. We show that it is indeed possible to replace the 2ε² under the exponential of the deviation term by the corresponding Cramér transform, as shown by large deviations theorems. We then formulate rigorous distribution-sensitive VC bounds, and we explain why these theoretical results can lead to practical estimates of the effective VC dimension of learning structures.
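The replacement of 2ε² by a Cramér transform can be written schematically as follows (standard large-deviations notation, not the paper's exact statement), for a [0, 1]-valued function f:

```latex
% Universal Hoeffding-type deviation bound:
\Pr\{P_n f - Pf > \varepsilon\} \;\le\; e^{-2n\varepsilon^2}.
% Distribution-sensitive (Chernoff/Cram\'er) refinement:
\Pr\{P_n f - Pf > \varepsilon\} \;\le\; e^{-n\,\Lambda^*(Pf+\varepsilon)},
\qquad
\Lambda^*(x) \;=\; \sup_{\lambda > 0}\bigl(\lambda x - \log \mathbb{E}\,e^{\lambda f}\bigr).
% Since \Lambda^*(Pf+\varepsilon) \ge 2\varepsilon^2 for [0,1]-valued f,
% the distribution-dependent exponent is never worse than the universal one.
```

The universal bound holds for every distribution, while the Cramér exponent Λ* depends on the law of f and can be strictly larger, which is the sense in which Talagrand's universal rate is "practically outperformed" for fixed distributions.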
Weak Learners and Improved Rates of Convergence in Boosting
 In Advances in Neural Information Processing Systems
, 2000
Abstract

Cited by 3 (1 self)
The problem of constructing weak classifiers for boosting algorithms is studied. We present an algorithm that produces a linear classifier that is guaranteed to achieve an error better than random guessing for any distribution on the data. While this weak learner is not useful for learning in general, we show that under reasonable conditions on the distribution it yields an effective weak learner for one-dimensional problems. Preliminary simulations suggest that the same behavior can be expected in higher dimensions. Additionally, we provide improved convergence-rate bounds for the generalization error in situations where the empirical error can be made small, which is exactly the situation that occurs if weak learners with guaranteed performance better than random guessing can be established.
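A one-dimensional weak learner of the kind discussed can be sketched with a decision stump. This is an illustrative substitute for the paper's algorithm, not its method: an exhaustive stump over all thresholds and both orientations; because each stump and its complement are both tried, its weighted error never exceeds 1/2.

```python
import numpy as np

def best_stump(x, y, w):
    """Exhaustive 1-D decision stump h(x) = s * sign(x - t), over all
    thresholds t between sorted points and both signs s in {-1, +1}.
    Labels y are in {-1, +1}; weights w sum to 1. Trying both s and -s
    guarantees the returned weighted error is at most 1/2."""
    order = np.argsort(x)
    x, y, w = x[order], y[order], w[order]
    thresholds = np.concatenate(([x[0] - 1.0], (x[:-1] + x[1:]) / 2, [x[-1] + 1.0]))
    best = (None, None, np.inf)
    for t in thresholds:
        for s in (-1, 1):
            pred = s * np.where(x > t, 1, -1)
            err = float(np.sum(w[pred != y]))
            if err < best[2]:
                best = (t, s, err)
    return best

# Toy weighted sample: separable, so the stump attains zero weighted error.
x = np.array([0.1, 0.2, 0.8, 0.9])
y = np.array([-1, -1, 1, 1])
w = np.full(4, 0.25)
t, s, err = best_stump(x, y, w)
print(t, s, err)
```

In a boosting loop the weights w would be reweighted toward misclassified points at each round, and the stump re-fit on the new distribution.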
On Uniform Deviations of General Empirical Risks with Unboundedness, Dependence, and High Dimensionality
 Journal of Machine Learning Research
, 2009
Abstract

Cited by 2 (0 self)
The statistical learning theory of risk minimization depends heavily on probability bounds for uniform deviations of the empirical risks. Classical probability bounds using Hoeffding’s inequality cannot accommodate more general situations with unbounded loss and dependent data. The current paper introduces an inequality that extends Hoeffding’s inequality to handle these more general situations. We will apply this inequality to provide probability bounds for uniform deviations in a very general framework, which can involve discrete decision rules, unbounded loss, and a dependence structure that can be more general than either martingale or strong mixing. We will consider two examples with high-dimensional predictors: autoregression (AR) with ℓ1-loss, and an ARX model with variable selection for sign classification, which uses both lagged responses and exogenous predictors.
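The classical bounded-i.i.d. baseline that this work extends can be checked by simulation. A minimal sketch under illustrative assumptions (Uniform[0, 1] losses, arbitrary choices of n and ε), not the paper's extended inequality: for [0, 1]-valued i.i.d. losses, Hoeffding gives P(|P_n f − P f| > ε) ≤ 2 exp(−2nε²).

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 500, 2000
eps = 0.1

# i.i.d. [0, 1]-valued losses with mean 1/2; each row is one empirical risk.
losses = rng.uniform(size=(trials, n))
dev = np.abs(losses.mean(axis=1) - 0.5)

# Fraction of trials where the empirical risk deviates by more than eps,
# versus the two-sided Hoeffding bound 2 * exp(-2 * n * eps**2).
empirical = np.mean(dev > eps)
hoeffding = 2 * np.exp(-2 * n * eps**2)
print(empirical, hoeffding)
```

The point of the paper is precisely that this argument breaks down once losses are unbounded or the data are dependent, which is what its extended inequality is designed to cover.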