Results 1–10 of 27
Statistical performance of support vector machines
 Ann. Statist.
, 2008
Abstract

Cited by 57 (8 self)
The support vector machine (SVM) algorithm is well known to the computer learning community for its very good practical results. The goal of the present paper is to study this algorithm from a statistical perspective, using tools of concentration theory and empirical processes. Our main result builds on the observation made by other authors that the SVM can be viewed as a statistical regularization procedure. From this point of view, it can also be interpreted as a model selection principle using a penalized criterion. It is then possible to adapt general methods related to model selection in this framework to study two important points: (1) what is the minimum penalty and how does it compare to the penalty actually used in the SVM algorithm; (2) is it possible to obtain “oracle inequalities” in that setting, for the specific loss function used in the SVM algorithm? We show that the answer to the latter question is positive and provides relevant insight to the former. Our result shows that it is possible to obtain fast rates of convergence for SVMs.
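The regularization view described above is easy to make concrete. Below is a minimal illustrative sketch (our own construction, not the paper's): a linear SVM fitted by subgradient descent on the penalized empirical hinge risk, where the term lam * ||w||^2 plays the role of the penalty discussed in the abstract. All function names and parameter values are hypothetical.

```python
import numpy as np

def svm_objective(w, b, X, y, lam):
    """Penalized criterion: empirical hinge risk plus lam * ||w||^2.
    The quadratic penalty is the regularization term the paper studies."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return hinge.mean() + lam * np.dot(w, w)

def fit_svm_gd(X, y, lam=0.1, lr=0.1, steps=500):
    """Minimize the penalized criterion by (sub)gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        margins = y * (X @ w + b)
        active = margins < 1.0  # points with a nonzero hinge subgradient
        gw = -(y[active][:, None] * X[active]).sum(axis=0) / n + 2.0 * lam * w
        gb = -y[active].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b
```

The paper's question (1) then concerns how small the penalty weight can be taken while an oracle-type guarantee is preserved.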
Ranking and empirical minimization of U-statistics
, 2008
Abstract

Cited by 37 (2 self)
The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one is “better,” with minimum ranking risk. Since the natural estimates of the risk are of the form of a U-statistic, results of the theory of U-processes are required for investigating the consistency of empirical risk minimizers. We establish, in particular, a tail inequality for degenerate U-processes, and apply it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification. Convex risk minimization methods are also studied.
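The U-statistic form of the empirical ranking risk can be written down directly. The sketch below (our own illustration; the function name is hypothetical) averages a mis-ranking indicator over all pairs with distinct labels:

```python
from itertools import combinations

def empirical_ranking_risk(scores, labels):
    """U-statistic estimate of the ranking risk: the fraction of
    label-discordant pairs whose score order disagrees with the
    label order (ties in scores count as mis-rankings)."""
    errors, pairs = 0, 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # tied labels carry no ranking information
        pairs += 1
        if (labels[i] - labels[j]) * (scores[i] - scores[j]) <= 0:
            errors += 1
    return errors / pairs
```

Because each term involves a pair of observations, the summands are not independent, which is why U-process tools are needed to analyze minimizers of this risk.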
ℓ1-regularized linear regression: Persistence and oracle inequalities
, 2009
Abstract

Cited by 30 (7 self)
We study the predictive performance of ℓ1-regularized linear regression, including the case where the number of covariates is substantially larger than the sample size. We introduce a new analysis method that does not require uniformly bounded covariates, an assumption that was often necessary with previous techniques. This technique provides an answer to a conjecture of Greenshtein and Ritov [12] regarding the “persistence” rate for linear regression and allows us to prove an oracle inequality for the error of the regularized minimizer.
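For illustration only (this is textbook coordinate descent with soft thresholding, not the paper's analysis technique), the ℓ1-regularized least-squares estimator the abstract refers to can be sketched as:

```python
import numpy as np

def lasso_cd(X, y, lam, sweeps=200):
    """Coordinate descent for (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(sweeps):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]  # residual excluding feature j
            rho = X[:, j] @ r / n
            # soft-thresholding step induced by the ℓ1 penalty
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w
```

The ℓ1 penalty drives most coordinates exactly to zero, which is the sparsity mechanism behind the high-dimensional regime (more covariates than observations) studied in the paper.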
Ranking and scoring using empirical risk minimization
 Proceedings of the Eighteenth Annual Conference on Computational Learning Theory (COLT)
, 2005
Abstract

Cited by 29 (8 self)
A general model is proposed for studying ranking problems. We investigate learning methods based on empirical minimization of the natural estimates of the ranking risk. The empirical estimates are of the form of a U-statistic. Inequalities from the theory of U-statistics and U-processes are used to obtain performance bounds for the empirical risk minimizers. Convex risk minimization methods are also studied to give a theoretical framework for ranking algorithms based on boosting and support vector machines. Just like in binary classification, fast rates of convergence are achieved under certain noise assumptions. General sufficient conditions are proposed in several special cases that guarantee fast rates of convergence.
Regularization in Kernel Learning
, 2008
Abstract

Cited by 22 (2 self)
Under mild assumptions on the kernel, we obtain the best known error rates in a regularized learning scenario taking place in the corresponding reproducing kernel Hilbert space. The main novelty in the analysis is a proof that one can use a regularization term that grows significantly slower than the standard quadratic growth in the RKHS norm.
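For context, the standard quadratic RKHS-norm regularization that the paper improves on is the one used in kernel ridge regression. A minimal sketch (our own, with an RBF kernel and parameter values chosen arbitrarily):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge(X, y, lam=0.1, gamma=1.0):
    """Regularized least squares in the RKHS with the standard quadratic
    penalty lam * ||f||_H^2; by the representer theorem the solution is
    alpha = (K + n * lam * I)^{-1} y."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    def predict(Z):
        return rbf_kernel(Z, X, gamma) @ alpha
    return predict
```

The paper's point is that this quadratic penalty in the RKHS norm can be replaced by one growing significantly more slowly while still yielding the best known error rates.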
Sharper lower bounds on the performance of the Empirical Risk Minimization Algorithm
, 2009
Abstract

Cited by 10 (3 self)
In this note we study lower bounds on the empirical minimization algorithm. To explain the basic set up of this algorithm, let (Ω, µ) be a probability space and set X to be a random variable taking values in Ω, distributed according to µ. We are interested in the function learning (noiseless) problem, in which one observes n
Local Complexities for Empirical Risk Minimization
 In Proceedings of the 17th Annual Conference on Learning Theory (COLT)
, 2004
Abstract

Cited by 9 (2 self)
We present sharp bounds on the risk of the empirical minimization algorithm under mild assumptions on the class. We introduce the notion of isomorphic coordinate projections and show that this leads to a sharper error bound than the best previously known. The quantity which governs this bound on the empirical minimizer is the largest fixed point of the function ξn(r) = E sup{|Ef − Enf| : f ∈ F, Ef = r}. We prove that this is the best estimate one can obtain using “structural results”, and that it is possible to estimate the error rate from data. We then prove that the bound on the empirical minimization algorithm can be improved further by a direct analysis, and that the correct error rate is the maximizer of ξ′n(r) − r, where ξ′n(r) = E sup{Ef − Enf : f ∈ F, Ef = r}.
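For intuition only, the expected supremum of empirical deviations appearing in ξn(r) can be approximated by crude Monte Carlo for a small finite class. This is our own sketch (it ignores the constraint Ef = r and approximates Ef on a large reference sample); all names are hypothetical.

```python
import numpy as np

def expected_sup_deviation(fs, big_sample, n, trials=2000, seed=0):
    """Monte Carlo estimate of E sup_{f in F} (Ef - E_n f) for a finite
    class F of vectorized functions of a real variable."""
    rng = np.random.default_rng(seed)
    vals = np.array([f(big_sample) for f in fs])  # |F| x N table of f(x_k)
    Ef = vals.mean(axis=1)  # "population" means on the reference sample
    total = 0.0
    for _ in range(trials):
        idx = rng.integers(0, big_sample.size, size=n)  # draw an n-sample
        Enf = vals[:, idx].mean(axis=1)  # empirical means on the n-sample
        total += (Ef - Enf).max()
    return total / trials
```

As expected from the 1/sqrt(n) scale of empirical deviations, the estimate shrinks as n grows, which is what makes the fixed point of ξn the driver of the error rate.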
Combining PAC-Bayesian and Generic Chaining Bounds
, 2007
Abstract

Cited by 9 (1 self)
There exist many different generalization error bounds in statistical learning theory. Each of these bounds contains an improvement over the others for certain situations or algorithms. Our goal is, first, to underline the links between these bounds, and second, to combine the different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester (1998), which is interesting for randomized predictions, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand (see Talagrand, 1996), in a way that also takes into account the variance of the combined functions. We also show how this connects to Rademacher based bounds.
On the optimality of the empirical risk minimization procedure for the Convex Aggregation problem
, 2011
Abstract

Cited by 5 (3 self)
We study the performance of empirical risk minimization (ERM), with respect to the quadratic risk, in the context of convex aggregation, in which one wants to construct a procedure whose risk is as close as possible to the risk of the best function in the convex hull of an arbitrary finite class F. We show that ERM performed in the convex hull of F is an optimal aggregation procedure for the convex aggregation problem. We also show that if this procedure is used for the problem of model selection aggregation, in which one wants to mimic the performance of the best function in F itself, then its rate is the same as the one achieved for the convex aggregation problem, and thus is far from optimal. These results are obtained in deviation and are sharp up to logarithmic factors.
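ERM over the convex hull of a finite class amounts to minimizing the empirical quadratic risk over simplex weights on the class members. A minimal projected-gradient sketch (our own; function names, step size and iteration count are arbitrary):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def convex_aggregate(preds, y, steps=500, lr=0.1):
    """ERM in the convex hull of a finite class: find simplex weights w
    minimizing the empirical quadratic risk of sum_j w_j f_j.
    preds has one row per class member, one column per observation."""
    M, n = preds.shape
    w = np.full(M, 1.0 / M)
    for _ in range(steps):
        resid = w @ preds - y
        grad = 2.0 * preds @ resid / n
        w = project_simplex(w - lr * grad)
    return w
```

The returned weights define the aggregated predictor whose excess risk over the best convex combination is what the paper bounds.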
Adaptive noisy clustering
, 2013
Abstract

Cited by 5 (1 self)
The problem of adaptive noisy clustering is investigated. Given a set of noisy observations Zi = Xi + ǫi, i = 1,..., n, the goal is to design clusters associated with the law of the Xi’s, with unknown density f with respect to the Lebesgue measure. Since we observe a corrupted sample, a direct approach such as the popular k-means is not suitable in this case. In this paper, we propose a noisy k-means minimization, which is based on the k-means loss function and a deconvolution estimator of the density f. In particular, this approach suffers from a dependence on the bandwidth involved in the deconvolution kernel. Fast rates of convergence for the excess risk are obtained for a particular choice of the bandwidth, which depends on the smoothness of the density f. We then turn to the main issue of the paper: the data-driven choice of the bandwidth. We state an adaptive upper bound for a new selection rule, called ERC (Empirical Risk Comparison). This selection rule is based on Lepski’s principle, where empirical risks associated with different bandwidths are compared. Finally, we illustrate that this adaptive rule can be used in many statistical problems of M-estimation where the empirical risk depends on a nuisance parameter.
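For reference, the plain (noiseless) k-means loss minimization that the noisy variant modifies can be sketched with standard Lloyd iterations. This is our own baseline sketch, not the paper's deconvolution-based procedure; the initialization scheme and names are ours.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Lloyd's algorithm on the raw sample. The paper's noisy variant
    instead minimizes the k-means loss against a deconvolution estimate
    of the latent density f, rather than the empirical distribution."""
    # farthest-point initialization: deterministic and well spread out
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)          # assign points to nearest center
        for j in range(k):
            if np.any(labels == j):         # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

Applied directly to the noisy Zi, this procedure targets the corrupted distribution, which is exactly why the abstract argues a deconvolution step is needed to cluster according to the law of the Xi's.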