Results 1–10 of 17
Statistical performance of support vector machines
 Ann. Statist.
, 2008
Cited by 56 (8 self)
The support vector machine (SVM) algorithm is well known to the computer learning community for its very good practical results. The goal of the present paper is to study this algorithm from a statistical perspective, using tools of concentration theory and empirical processes. Our main result builds on the observation made by other authors that the SVM can be viewed as a statistical regularization procedure. From this point of view, it can also be interpreted as a model selection principle using a penalized criterion. It is then possible to adapt general methods related to model selection in this framework to study two important points: (1) what is the minimum penalty and how does it compare to the penalty actually used in the SVM algorithm; (2) is it possible to obtain “oracle inequalities” in that setting, for the specific loss function used in the SVM algorithm? We show that the answer to the latter question is positive and provides relevant insight to the former. Our result shows that it is possible to obtain fast rates of convergence for SVMs.
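The penalized-criterion view of the SVM described in this abstract can be illustrated with a toy linear SVM trained by subgradient descent on the empirical hinge risk plus a quadratic penalty. This is a generic sketch for illustration only, not the paper's construction; the regularization weight `lam`, the step-size schedule, and the synthetic data are all assumptions:

```python
import numpy as np

def hinge_loss(w, X, y, lam):
    """Penalized empirical hinge risk: (1/n) sum_i max(0, 1 - y_i <w, x_i>) + lam ||w||^2."""
    margins = y * (X @ w)
    return np.mean(np.maximum(0.0, 1.0 - margins)) + lam * (w @ w)

def train_linear_svm(X, y, lam=0.1, steps=500, lr=0.1):
    """Subgradient descent on the penalized criterion with a decaying step size."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(steps):
        margins = y * (X @ w)
        active = margins < 1.0  # points where the hinge is not flat
        grad = -(y[active, None] * X[active]).sum(axis=0) / n + 2.0 * lam * w
        w -= lr / np.sqrt(1.0 + t) * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=200))  # noisy linear labels
w = train_linear_svm(X, y)
acc = float(np.mean(np.sign(X @ w) == y))
```

The penalty `lam * ||w||^2` is exactly the regularization term whose minimal size the paper investigates.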
Suboptimality of penalized empirical risk minimization in classification
 In Proceedings of the 20th Annual Conference on Computational Learning Theory (COLT). Lecture Notes in Computer Science 4539, 142–156
, 2007
Cited by 15 (3 self)
Let F be a set of M classification procedures with values in [−1, 1]. Given a loss function, we want to construct a procedure which mimics at the best possible rate the best procedure in F. This fastest rate is called optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that optimal rates of aggregation can be either ((log M)/n)^{1/2} or (log M)/n. We prove that, if all the M classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low noise condition) when the loss function is somewhat more than convex, whereas, in that case, aggregation procedures with exponential weights achieve the optimal rate of aggregation.
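The exponential-weights aggregation mentioned in this abstract weights each procedure by the exponential of its negative (rescaled) empirical risk. A minimal sketch follows; the temperature value and the two toy classifiers are illustrative assumptions, not the paper's setting:

```python
import numpy as np

def exponential_weights(preds, y, loss, temperature):
    """Weight M procedures by exp(-n * empirical_risk / temperature), normalized.

    preds: (M, n) array whose j-th row holds the predictions of procedure j.
    The aggregate is then the weighted combination sum_j w_j * preds[j]."""
    n = len(y)
    risks = np.array([np.mean(loss(p, y)) for p in preds])
    logits = -n * risks / temperature
    logits -= logits.max()  # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

rng = np.random.default_rng(1)
y = rng.choice([-1.0, 1.0], size=100)
good = np.where(rng.random(100) < 0.9, y, -y)  # ~90% accurate procedure
bad = np.where(rng.random(100) < 0.6, y, -y)   # ~60% accurate procedure
zero_one = lambda p, t: (p != t).astype(float)
w = exponential_weights(np.stack([good, bad]), y, zero_one, temperature=1.0)
```

With n observations in the exponent, the weights concentrate sharply on the lower-risk procedure, which is what drives the fast (log M)/n aggregation rate.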
Optimal rates of aggregation in classification under low noise assumption
, 2007
Cited by 10 (0 self)
In the same spirit as Tsybakov, we define the optimality of an aggregation procedure in the problem of classification. Using an aggregate with exponential weights, we obtain an optimal rate of convex aggregation for the hinge risk under the margin assumption. Moreover, we obtain an optimal rate of model selection aggregation under the margin assumption for the excess Bayes risk.
Margin-adaptive model selection in statistical learning
 Submitted to the Annals of Statistics
, 2008
Cited by 10 (0 self)
A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. We tackle in this paper the problem of adaptivity to this condition in the context of model selection, in a general learning framework. Actually, we consider a weaker version of this condition that allows us to take into account that learning within a small model can be much easier than in a large one. Requiring this “strong margin adaptivity” makes the model selection problem more challenging. We first prove, in a very general framework, that some penalization procedures (including local Rademacher complexities) exhibit this adaptivity when the models are nested. Contrary to previous results, this holds with penalties that only depend on the data. Our second main result is that strong margin adaptivity is not always possible when the models are not nested: for every model selection procedure (even a randomized one), there is a problem for which it does not demonstrate strong margin adaptivity.
Optimal oracle inequality for aggregation of classifiers under low noise condition
, 2006
Adapting to unknown smoothness by aggregation of thresholded wavelet estimators
, 2006
Cited by 5 (2 self)
We study the performance of an adaptive procedure based on a convex combination, with data-driven weights, of term-by-term thresholded wavelet estimators. For the bounded regression model with random uniform design, and for the nonparametric density model, we show that the resulting estimator is optimal in the minimax sense over all Besov balls under the L2 risk, without any logarithmic factor.
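The term-by-term thresholding that the procedure above combines can be sketched with soft thresholding of empirical coefficients at the universal level. This is a standard textbook choice used here purely for illustration; the paper's data-driven convex weighting is more refined, and the sparse synthetic coefficients are an assumption:

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Term-by-term soft thresholding: shrink each coefficient toward zero by t."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

rng = np.random.default_rng(2)
n = 256
theta = np.zeros(n)
theta[:8] = rng.normal(scale=5.0, size=8)   # a few large true coefficients
observed = theta + rng.normal(size=n)       # noisy empirical coefficients (sigma = 1)
t = np.sqrt(2.0 * np.log(n))                # universal threshold sigma * sqrt(2 log n)
estimate = soft_threshold(observed, t)
err_thresh = float(np.mean((estimate - theta) ** 2))
err_raw = float(np.mean((observed - theta) ** 2))
```

Thresholding kills the pure-noise coefficients while keeping the large ones, which is why it beats the raw empirical coefficients in mean squared error on sparse signals.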
Risk bounds for CART classifiers under a margin condition
 Pattern Recognition
Cited by 2 (2 self)
Nonasymptotic risk bounds for Classification And Regression Trees (CART) classifiers are obtained in the binary supervised classification framework under a margin assumption on the joint distribution of the covariates and the labels. These risk bounds are derived conditionally on the construction of the maximal binary tree and allow us to prove that the linear penalty used in the CART pruning algorithm is valid under the margin condition. It is also shown that, conditionally on the construction of the maximal tree, the final selection by test sample does not dramatically alter the estimation accuracy of the Bayes classifier.
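The linear penalty validated above has the familiar cost-complexity form R_n(T) + α·|leaves(T)|. A minimal sketch of the final selection step over a pruning sequence follows; the risk and leaf-count values are made up for illustration:

```python
import numpy as np

def select_subtree(risks, n_leaves, alpha):
    """Return the index of the pruned subtree minimizing R_n(T) + alpha * |leaves(T)|."""
    crit = np.asarray(risks, dtype=float) + alpha * np.asarray(n_leaves, dtype=float)
    return int(np.argmin(crit))

# Hypothetical pruning sequence: empirical risk decreases as the tree grows.
risks = [0.40, 0.25, 0.18, 0.15, 0.14, 0.138]
leaves = [1, 2, 4, 8, 16, 32]
idx_small = select_subtree(risks, leaves, alpha=0.001)  # light penalty -> larger tree
idx_large = select_subtree(risks, leaves, alpha=0.05)   # heavy penalty -> smaller tree
```

A larger α charges more per leaf and so selects a smaller subtree, which is the trade-off the margin condition makes provably safe.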
Aggregation of SVM Classifiers Using Sobolev Spaces
Cited by 1 (1 self)
This paper investigates the statistical performance of Support Vector Machines (SVM) and considers the problem of adaptation to the margin parameter and to complexity. In particular, we provide a classifier with no tuning parameter, obtained as a combination of SVM classifiers. Our contribution is twofold: (1) we propose learning rates for SVM using Sobolev spaces and build a numerically realizable aggregate that converges at the same rate; (2) we present practical experiments with this method of aggregation for SVM using both Sobolev spaces and Gaussian kernels.
Aggregation of penalized empirical risk minimizers in regression
, 2009
Cited by 1 (0 self)
We give a general result concerning the rates of convergence of penalized empirical risk minimizers (PERM) in the regression model. Then, we consider the problem of agnostic learning of the regression function, and give in this context an oracle inequality and a lower bound for PERM over a finite class. These results hold for a general multivariate random design, the only assumption being the compactness of the support of its law (allowing discrete distributions, for instance). Using these results, we then construct adaptive estimators, considering as examples adaptive estimation over anisotropic Besov spaces or reproducing kernel Hilbert spaces. Finally, we provide empirical evidence that aggregation leads to more stable estimators than the more standard cross-validation or generalized cross-validation methods for the selection of the smoothing parameter when the number of observations is small.
Adapting to Unknown Smoothness by Aggregation
We consider a multiwavelet thresholding method for nonparametric estimation. An adaptive procedure based on a convex combination of weighted term-by-term thresholded wavelet estimators is proposed. In the density estimation framework, we prove that it is optimal in the minimax sense over Besov balls under the L2 risk, without any extra logarithmic term. Key words and phrases: aggregation, oracle inequalities, margin, wavelets, threshold estimators, density estimation