Results 1–10 of 24
Comparing Learning Methods for Classification, 2006
Cited by 10 (0 self)
Abstract: We address the consistency property of cross validation (CV) for classification. Sufficient conditions are obtained on the data splitting ratio to ensure that the better classifier between two candidates will be favored by CV with probability approaching 1. Interestingly, it turns out that for comparing two general learning methods, the ratio of the training sample size to the evaluation size does not have to approach 0 for consistency in selection, as is required for comparing parametric regression models (Shao (1993)). In fact, the ratio may be allowed to converge to infinity or to any positive constant, depending on the situation. In addition, we discuss confidence intervals and sequential instability in selection for comparing classifiers. Key words and phrases: classification, comparing learning methods, consistency in selection, cross validation paradox, sequential instability.
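The splitting-ratio idea in the abstract above can be sketched numerically: repeatedly split the data, evaluate two candidate classifiers on the held-out part, and count how often the better one is favored. Everything below (the two toy classifiers, the synthetic Gaussian data, and all function names) is illustrative, not the paper's construction.

```python
# Illustrative sketch: hold-out comparison of two classifiers over many
# random splits. With a clearly better candidate, the fraction of splits
# favoring it approaches 1 as sample sizes grow.
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean(train_X, train_y, test_X):
    # Classify by the nearer class mean (a deliberately simple "method A").
    m0 = train_X[train_y == 0].mean(axis=0)
    m1 = train_X[train_y == 1].mean(axis=0)
    d0 = np.linalg.norm(test_X - m0, axis=1)
    d1 = np.linalg.norm(test_X - m1, axis=1)
    return (d1 < d0).astype(int)

def majority(train_X, train_y, test_X):
    # Predict the majority training label everywhere ("method B", clearly worse).
    label = int(train_y.mean() > 0.5)
    return np.full(len(test_X), label)

def cv_prefers_A(X, y, n_train, n_splits=50):
    """Fraction of random splits on which method A has the lower hold-out error."""
    wins = 0
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        tr, te = idx[:n_train], idx[n_train:]
        err_A = np.mean(nearest_mean(X[tr], y[tr], X[te]) != y[te])
        err_B = np.mean(majority(X[tr], y[tr], X[te]) != y[te])
        wins += err_A < err_B
    return wins / n_splits

# Two well-separated Gaussian classes.
n = 400
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]
print(cv_prefers_A(X, y, n_train=n // 2))  # close to 1.0 for separated classes
```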
Segmentation of the mean of heteroscedastic data via cross-validation, 2010
Cited by 9 (4 self)
Abstract: This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge of the variations of the noise. A new family of change-point detection procedures is proposed, showing that cross-validation methods can be successful in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. The robustness to heteroscedasticity of the proposed procedures is supported by an extensive simulation study, together with recent theoretical results. An application to Comparative Genomic Hybridization (CGH) data is provided, showing that robustness to heteroscedasticity can indeed be required for their analysis.
V-FOLD CROSS-VALIDATION IMPROVED: V-FOLD PENALIZATION
SUBMITTED TO THE ANNALS OF STATISTICS, 2008
Cited by 5 (2 self)
Abstract: We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call “V-fold penalization”. Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it “overpenalizes” all the more as V is large. Hence, asymptotic optimality requires V to go to infinity. However, when the signal-to-noise ratio is low, it appears that overpenalizing is necessary, so the optimal V is not always the largest one, despite the variability issue. This is confirmed on simulated data. In order to improve on the prediction performance of VFCV, we define a new model selection procedure, called “V-fold penalization” (penVF). It is a V-fold subsampling version of Efron’s bootstrap penalties, so it has the same computational cost as VFCV while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, we prove that penVF satisfies a non-asymptotic oracle inequality with a leading constant that tends to 1 as the sample size goes to infinity. In particular, this implies adaptivity to the smoothness of the regression function, even with highly heteroscedastic noise. Moreover, it is easy to overpenalize with penVF, independently of the V parameter. A simulation study shows that this results in a significant improvement over VFCV in non-asymptotic situations.
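For reference, plain V-fold cross-validation for model selection, the baseline that penVF is designed to improve on, can be sketched as follows. The polynomial regression setup and all names are illustrative assumptions, not the paper's framework.

```python
# Illustrative V-fold cross-validation for model selection: choose a
# polynomial degree by minimizing the V-fold CV estimate of the risk.
import numpy as np

rng = np.random.default_rng(1)

def vfold_cv_error(x, y, degree, V=5):
    """Mean squared V-fold CV error of a degree-`degree` polynomial fit."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, V)
    errs = []
    for v in range(V):
        test = folds[v]
        train = np.concatenate([folds[u] for u in range(V) if u != v])
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[test])
        errs.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errs))

# Data from a cubic signal with noise; CV should reject underfitting degrees.
x = np.linspace(-1, 1, 200)
y = x ** 3 - x + rng.normal(scale=0.1, size=x.size)
scores = {d: vfold_cv_error(x, y, d) for d in range(1, 9)}
best = min(scores, key=scores.get)
print(best)
```

Note the trade-off the abstract describes: the choice V=5 here is conventional, but the paper's point is precisely that a bounded V overpenalizes and that the best V depends on the signal-to-noise ratio.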
Density estimation via cross-validation: Model selection point of view, 2009
Cited by 5 (2 self)
Abstract: The problem of model selection by cross-validation is addressed in the density estimation framework. Extensively used in practice, cross-validation (CV) remains poorly understood, especially in the non-asymptotic setting, which is the main concern of this work. A recurrent problem with CV is the computation time it involves. This drawback is overcome here thanks to closed-form expressions for the CV estimator of the risk for a broad class of widespread estimators: projection estimators. In order to shed new light on CV procedures with respect to the cardinality p of the test set, the CV estimator is interpreted as a penalized criterion with a random penalty. For instance, the amount of penalization is shown to increase with p. A theoretical assessment of the CV performance is carried out thanks to two oracle inequalities applying to bounded and square-integrable densities, respectively. For several collections of models, adaptivity results with respect to Hölder and Besov spaces are derived as well.
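Closed-form CV expressions of the kind mentioned above exist for simple projection estimators; the textbook leave-one-out CV risk for a histogram (Rudemo's identity) is one concrete instance. The data and the bin-selection loop below are illustrative, not the paper's general construction.

```python
# Illustrative closed-form leave-one-out CV risk for a histogram density
# estimator (a classical identity; the histogram is the simplest projection
# estimator). No refitting per left-out point is needed.
import numpy as np

rng = np.random.default_rng(2)

def histogram_cv_risk(data, n_bins, lo=0.0, hi=1.0):
    """LOO-CV estimate of the L2 risk (up to an estimator-independent
    constant) for an equal-width histogram on [lo, hi]."""
    n = len(data)
    h = (hi - lo) / n_bins
    counts, _ = np.histogram(data, bins=n_bins, range=(lo, hi))
    p_hat = counts / n
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * np.sum(p_hat ** 2)

# Choose the bin count minimizing the CV risk.
data = rng.beta(2, 5, size=1000)
risks = {b: histogram_cv_risk(data, b) for b in range(2, 60)}
best_bins = min(risks, key=risks.get)
print(best_bins)
```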
Supplement to “Parametric or nonparametric? A parametricness index for model selection.” DOI:10.1214/11-AOS899SUPP, 2011
On the within-family Kullback-Leibler risk in Gaussian predictive models, 2012
Cited by 1 (1 self)
Abstract: We consider estimating the predictive density under Kullback-Leibler loss in a high-dimensional Gaussian model. Decision-theoretic properties of the within-family prediction error, the minimal risk among estimates in the class G of all Gaussian densities, are discussed. We show that in sparse models, the class G is minimax suboptimal. We produce asymptotically sharp upper and lower bounds on the within-family prediction errors for various subfamilies of G. Under mild regularity conditions, in the subfamily where the covariance structure is represented by a single data-dependent parameter Σ = d · I, the Kullback-Leibler risk has a tractable decomposition which can subsequently be minimized to yield optimally flattened predictive density estimates. The optimal predictive risk can be explicitly expressed in terms of the corresponding mean squared error of the location estimate, and so the role of shrinkage in the predictive regime can be determined from point estimation theory. Our results demonstrate that some of the decision-theoretic parallels between the predictive density estimation and point estimation regimes can be explained by second-moment-based concentration properties of the quadratic loss.
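The tractability of the Σ = d · I subfamily can be traced to a standard Gaussian identity (stated here for reference, not quoted from the paper): the Kullback-Leibler divergence between two p-dimensional Gaussians with scalar covariances separates into a location term and a scale term,

```latex
\mathrm{KL}\!\left(N(\mu_1, d_1 I_p)\,\middle\|\,N(\mu_2, d_2 I_p)\right)
  = \frac{1}{2}\left[\frac{\lVert \mu_1 - \mu_2 \rVert^2}{d_2}
    + p\,\frac{d_1}{d_2} - p + p \log\frac{d_2}{d_1}\right],
```

so minimizing over the scale d₂ for a fixed location estimate is a one-dimensional problem, which is consistent with the abstract's claim that the optimal predictive risk can be expressed through the mean squared error of the location estimate.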
Catching Up Faster by Switching Sooner: A predictive approach to adaptive estimation with an application to the AIC-BIC Dilemma, 2011
Cited by 1 (0 self)
Abstract: Prediction and estimation based on Bayesian model selection and model averaging, and derived methods such as BIC, do not always converge at the fastest possible rate. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods, and use it to define a modification of the Bayesian predictive distribution, called the switch distribution. When used as an adaptive estimator, the switch distribution does achieve optimal cumulative risk convergence rates in nonparametric density estimation and Gaussian regression problems. We show that the minimax cumulative risk is obtained under very weak conditions and without knowledge of the underlying degree of smoothness. Unlike other adaptive model selection procedures such as AIC and leave-one-out cross-validation, BIC and Bayes factor model selection are typically statistically consistent. We show that this property is retained by the switch distribution, which thus solves the AIC-BIC dilemma for cumulative risk. The switch distribution has an efficient implementation. We compare its performance to AIC, BIC and Bayes on a regression problem with simulated data.
Concentration inequalities of the cross-validation estimator for empirical risk minimiser, arXiv preprint arXiv:1011.0096, 2010
Cited by 1 (0 self)
Abstract: In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for empirical risk minimizers. In the general setting, we prove sanity-check bounds in the spirit of Kearns et al. (1999): “bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate”. General loss functions and classes of predictors with finite VC-dimension are considered. We closely follow the formalism introduced by Dudoit et al. (2003) to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, k-fold cross-validation, hold-out cross-validation (or split sample), and leave-υ-out cross-validation. In particular, we focus on proving the consistency of the various cross-validation procedures. We point out the interest of each cross-validation procedure in terms of rate of convergence. An estimation curve with transition phases, depending on the cross-validation procedure and not only on the percentage of observations in the test sample, gives a simple rule for how to choose the cross-validation procedure. An interesting consequence is that the size of the test sample is not required to grow to infinity for the consistency of the cross-validation procedure.