Results 1–10 of 31
Sharp Oracle Inequalities for Aggregation of Affine Estimators
, 2012
Abstract

Cited by 18 (0 self)
We consider the problem of combining a (possibly uncountably infinite) set of affine estimators in the nonparametric regression model with heteroscedastic Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a PAC-Bayesian type inequality that leads to sharp oracle inequalities in discrete as well as continuous settings. The framework is general enough to cover combinations of various procedures such as least squares regression, kernel ridge regression, shrinkage estimators, and many other estimators used in the literature on statistical inverse problems. As a consequence, we show that the proposed aggregate provides an adaptive estimator in the exact minimax sense without either discretizing the range of tuning parameters or splitting the set of observations. We also illustrate numerically the good performance achieved by the exponentially weighted aggregate.
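The exponentially weighted aggregate described in this abstract can be sketched numerically. The following is a minimal toy illustration (function and variable names are ours, and the risk values are stand-ins for the unbiased risk estimates used in the paper):

```python
import numpy as np

def exp_weighted_aggregate(estimates, risks, beta):
    """Combine candidate estimates (rows of `estimates`) with
    exponential weights w_j proportional to exp(-risks[j] / beta),
    where risks[j] is a risk estimate for candidate j and beta > 0
    is the temperature parameter."""
    w = np.exp(-(risks - risks.min()) / beta)  # shift for numerical stability
    w /= w.sum()
    return w @ estimates

# Toy example: two constant candidate estimators of a 2-point signal.
estimates = np.array([[1.0, 1.0],
                      [3.0, 3.0]])
risks = np.array([0.1, 5.0])   # the first candidate looks far better
f_hat = exp_weighted_aggregate(estimates, risks, beta=1.0)
# f_hat is pulled almost entirely toward the low-risk candidate
```

Because the weights depend exponentially on the estimated risks, candidates with clearly suboptimal risk receive negligible weight, which is the mechanism behind the oracle-type guarantees.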
Sparse estimation by exponential weighting
 Statist. Sci
, 2012
Abstract

Cited by 16 (1 self)
Abstract. Consider a regression model with fixed design and Gaussian noise where the regression function can potentially be well approximated by a function that admits a sparse representation in a given dictionary. This paper resorts to exponential weights to exploit this underlying sparsity by implementing the principle of sparsity pattern aggregation. This model-selection take on sparse estimation allows us to derive sparsity oracle inequalities in several popular frameworks, including ordinary sparsity, fused sparsity and group sparsity. One striking aspect of these theoretical results is that they hold under no condition on the dictionary. Moreover, we describe an efficient implementation of the sparsity pattern aggregation principle that compares favorably to state-of-the-art procedures on some basic numerical examples. Key words and phrases: High-dimensional regression, exponential weights, sparsity, fused sparsity, group sparsity, sparsity oracle inequalities, sparsity pattern aggregation, sparsity prior, sparse regression.
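Sparsity pattern aggregation can be sketched by brute force on a tiny dictionary. The code below is our own toy implementation: it enumerates every sparsity pattern, fits least squares on each, and combines the fits with exponential weights built from a Mallows-type risk proxy and a simplified sparsity-favoring prior (both are stand-ins for the paper's exact construction):

```python
import numpy as np
from itertools import combinations

def sparsity_pattern_aggregate(X, y, sigma2, beta=4.0):
    """Exhaustively enumerate sparsity patterns of the columns of X,
    fit least squares on each pattern, and aggregate the fits with
    exponential weights: score = -(RSS + 2*sigma2*|S|)/(beta*sigma2)
    plus a toy prior penalizing larger patterns."""
    n, d = X.shape
    fits, scores = [], []
    for k in range(d + 1):
        for S in combinations(range(d), k):
            S = list(S)
            if S:
                coef, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
                f = X[:, S] @ coef
            else:
                f = np.zeros(n)                     # empty pattern: zero fit
            risk = np.sum((y - f) ** 2) + 2 * sigma2 * k   # unbiased risk proxy
            prior = -2 * k * np.log(d + 1)          # sparsity-favoring prior (toy)
            fits.append(f)
            scores.append(-risk / (beta * sigma2) + prior)
    scores = np.array(scores)
    w = np.exp(scores - scores.max())               # stable softmax weights
    w /= w.sum()
    return np.array(fits).T @ w

# Toy data: y lies exactly in the span of the first column.
X = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])
y = np.array([2., 0., 2., 0.])
fit = sparsity_pattern_aggregate(X, y, sigma2=0.01)
```

Exhaustive enumeration is exponential in d; the paper's point is precisely that the aggregation can be implemented efficiently, which this sketch does not attempt.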
Sparsity regret bounds for individual sequences in online linear regression
 JMLR Workshop and Conference Proceedings, 19 (COLT 2011 Proceedings):377–396
, 2011
Abstract

Cited by 16 (0 self)
We consider the problem of online linear regression on arbitrary deterministic sequences when the ambient dimension d can be much larger than the number of time rounds T. We introduce the notion of a sparsity regret bound, which is a deterministic online counterpart of recent risk bounds derived in the stochastic setting under a sparsity scenario. We prove such regret bounds for an online learning algorithm called SeqSEW, based on exponential weighting and data-driven truncation. In a second part we apply a parameter-free version of this algorithm to the stochastic setting (regression model with random design). This yields risk bounds of the same flavor as in Dalalyan and Tsybakov (2012a) but which solve two questions left open therein. In particular, our risk bounds are adaptive (up to a logarithmic factor) to the unknown variance of the noise if the latter is Gaussian. We also address the regression model with fixed design.
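SeqSEW itself aggregates over a continuous prior with data-driven truncation; as a much-simplified sketch of the underlying exponential-weighting protocol (our own toy code, not the paper's algorithm), consider a finite set of linear experts under square loss:

```python
import numpy as np

def exp_weights_forecaster(xs, ys, experts, eta):
    """Exponential-weights forecaster over a finite set of linear
    experts (rows of `experts`) under square loss: predict with the
    weighted mean of the experts' predictions, then update the
    log-weights with the revealed losses."""
    logw = np.zeros(experts.shape[0])
    total_loss = 0.0
    for x, y in zip(xs, ys):
        preds = experts @ x
        w = np.exp(logw - logw.max())
        w /= w.sum()
        total_loss += (w @ preds - y) ** 2   # forecaster's square loss
        logw -= eta * (preds - y) ** 2       # exponential-weight update
    return total_loss

# Deterministic toy sequence: expert 0 is perfect, expert 1 always predicts 0.
xs = np.ones((50, 1))
ys = np.ones(50)
experts = np.array([[1.0], [0.0]])
loss = exp_weights_forecaster(xs, ys, experts, eta=1.0)
# loss stays bounded while the bad expert alone would suffer cumulative loss 50
```

The regret of the forecaster against the best expert stays bounded on this sequence; the sparsity regret bounds of the paper quantify the analogue when the comparison class is the continuum of sparse linear predictors.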
Linear regression with random projections
, 2010
Abstract

Cited by 13 (1 self)
We investigate a method for regression that makes use of a randomly generated subspace GP ⊂ F (of finite dimension P) of a given large (possibly infinite-dimensional) function space F, for example, L2([0,1]^d; R). GP is defined as the span of P random features that are linear combinations of basis functions of F weighted by i.i.d. random Gaussian coefficients. We give practical motivation for the use of this approach, detail the link that this random projections method shares with RKHS and Gaussian object theory, prove approximation error bounds, both in deterministic and random design, when searching for the best regression function in GP rather than in F, and derive excess risk bounds for a specific regression algorithm (least squares regression in GP). This paper stresses the motivation to study such methods; thus the analysis developed is kept simple for explanatory purposes and leaves room for future developments.
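A minimal sketch of the construction, in our own notation: `phi` stands for the N basis functions of F evaluated at the n design points, and the random subspace is the span of Gaussian mixtures of its columns.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection_fit(phi, y, P):
    """Build P random features as Gaussian linear combinations of the
    basis functions (columns of phi), i.e. a random P-dimensional
    subspace G_P of span(phi), then run least squares in G_P."""
    N = phi.shape[1]
    A = rng.normal(size=(N, P)) / np.sqrt(P)  # i.i.d. Gaussian mixing weights
    G = phi @ A                                # n x P random feature matrix
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    return G @ coef

# Toy check: a target already in span(phi) is recovered once P >= N,
# since the random features then span the same column space.
phi = np.vander(np.linspace(0.0, 1.0, 10), 3)  # 10 points, 3 basis functions
y = phi @ np.array([1.0, 2.0, 3.0])
fit = random_projection_fit(phi, y, P=5)
```

The interesting regime in the paper is of course P much smaller than the (possibly infinite) dimension of F, where the approximation error bounds quantify what is lost by restricting to G_P.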
Sparse Single-Index Model
, 2011
Abstract

Cited by 12 (2 self)
Let (X, Y) be a random pair taking values in R^p × R. In the so-called single-index model, one has Y = f⋆(θ⋆ᵀX) + W, where f⋆ is an unknown univariate measurable function, θ⋆ is an unknown vector in R^p, and W denotes a random noise satisfying E[W | X] = 0. The single-index model is known to offer a flexible way to model a variety of high-dimensional real-world phenomena. However, despite its relative simplicity, this dimension reduction scheme is faced with severe complications as soon as the underlying dimension becomes …
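A quick simulation of the single-index model (our own sketch; the names and the choice f⋆ = tanh are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_single_index(n, theta, f, noise_sd=0.05):
    """Draw n pairs (X, Y) from the single-index model
    Y = f(theta^T X) + W, with W an independent centered Gaussian
    noise (so E[W | X] = 0 holds)."""
    X = rng.normal(size=(n, len(theta)))
    W = rng.normal(scale=noise_sd, size=n)
    return X, f(X @ theta) + W

theta_star = np.array([0.6, 0.8])  # unit norm, for identifiability
X, Y = simulate_single_index(500, theta_star, np.tanh)
# Y depends on the p-dimensional X only through the scalar index theta^T X
```

Normalizing θ⋆ to unit norm is the usual identifiability convention, since any rescaling of θ⋆ can be absorbed into f⋆.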
PAC-Bayesian bound for Gaussian process regression and multiple kernel additive model
 In COLT, arXiv:1102.3616v1 [math.ST]
, 2012
Abstract

Cited by 6 (0 self)
We develop a PAC-Bayesian bound for the convergence rate of a Bayesian variant of Multiple Kernel Learning (MKL), an estimation method for the sparse additive model. Standard analyses of MKL require a strong condition on the design analogous to the restricted eigenvalue condition used in the analysis of the Lasso and the Dantzig selector. In this paper, we apply the PAC-Bayesian technique to show that the Bayesian variant of MKL achieves the optimal convergence rate without such strong conditions on the design. Basically, our approach is a combination of PAC-Bayes and recently developed theories of nonparametric Gaussian process regression. Our bound is developed in a fixed design situation. Our analysis includes the existing Gaussian process result as a special case, and the proof is much simpler by virtue of the PAC-Bayesian technique. We also give the convergence rate of the Bayesian variant of the Group Lasso as a finite-dimensional special case.
Prediction of time series by statistical learning: general losses and fast rates
, 2012
Abstract

Cited by 5 (2 self)
Abstract: We establish rates of convergence in time series forecasting using the statistical learning approach based on oracle inequalities. A series of papers (e.g. [MM98, Mei00, BCV01, AW12]) extends the oracle inequalities obtained for i.i.d. observations to time series under weak dependence conditions. Given a family of predictors and n observations, oracle inequalities state that a predictor forecasts the series as well as the best predictor in the family, up to a remainder term ∆n. Using the PAC-Bayesian approach, we establish, under weak dependence conditions, oracle inequalities with optimal rates of convergence ∆n. We extend results given in [AW12] for the absolute loss function to any Lipschitz loss function, with rates ∆n ∼ √(c(Θ)/n), where c(Θ) measures the complexity of the model. We apply the method with quantile loss functions to forecast the French GDP. Under additional conditions on the loss functions (satisfied by the quadratic loss function) and on the time series, we refine the rates of convergence to ∆n ∼ c(Θ)/n. We achieve for the first time these fast rates for uniformly mixing processes. These rates are known to be optimal in the i.i.d. case, see [Tsy03], and for individual sequences, see [CBL06]. In particular, we generalize the results of [DT08] on sparse regression estimation to the case of autoregression.
Optimal aggregation of affine estimators
 Journal of Machine Learning Research, Proceedings Track 19 (Dalalyan and Salmon)
, 2011
Abstract

Cited by 3 (1 self)
We consider the problem of combining a (possibly uncountably infinite) set of affine estimators in the nonparametric regression model with heteroscedastic Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a PAC-Bayesian type inequality that leads to sharp oracle inequalities in discrete as well as continuous settings. The framework is general enough to cover combinations of various procedures (such as least squares regression, kernel ridge regression, and shrinkage estimators) used in the literature on statistical inverse problems. As a consequence, we show that the proposed aggregate provides an adaptive estimator in the exact minimax sense without either discretizing the range of tuning parameters or splitting the set of observations. We also illustrate numerically the good performance achieved by the exponentially weighted aggregate.