Results 1–10 of 52
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
"... ..."
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
 Annals of Statistics, 40(2):1171
, 2012
"... We analyze a class of estimators based on convex relaxation for solving highdimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low rank matrix � ⋆ with a second matrix Ɣ ⋆ endowed with a complementary for ..."
Abstract

Cited by 61 (8 self)
We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low rank matrix Θ⋆ with a second matrix Γ⋆ endowed with a complementary form of low-dimensional structure; this setup includes many statistical models of interest, including factor analysis, multi-task regression and robust covariance estimation. We derive a general theorem that bounds the Frobenius norm error for an estimate of the pair (Θ⋆, Γ⋆) obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer. Our results use a “spikiness” condition that is related to, but milder than, singular vector incoherence. We specialize our general result to two cases that have been studied in past work: low rank plus an entrywise sparse matrix, and low rank plus a columnwise sparse matrix. For both models, our theory yields non-asymptotic Frobenius error bounds for both deterministic and stochastic noise matrices, and applies to matrices Θ⋆ that can be exactly or approximately low rank, and matrices Γ⋆ that can be exactly or approximately sparse. Moreover, for the case of stochastic noise matrices and the identity observation operator, we establish matching lower bounds on the minimax error. The sharpness of our non-asymptotic predictions is confirmed by numerical simulations.
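For the identity observation operator and the low-rank plus entrywise-sparse case discussed in this abstract, the convex estimator can be approximated by alternating the proximal operators of the two regularizers: singular value thresholding for the nuclear norm on Θ⋆ and entrywise soft thresholding for an ℓ1 penalty on Γ⋆. The sketch below is illustrative only; the function names and regularization values are ours, not the paper's:

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: proximal operator of tau * nuclear norm
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    # Entrywise soft thresholding: proximal operator of tau * l1 norm
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def decompose(Y, lam_nuc=0.5, lam_l1=0.05, n_iter=100):
    """Alternating exact minimization of
    0.5*||Y - Theta - Gamma||_F^2 + lam_nuc*||Theta||_* + lam_l1*||Gamma||_1."""
    Theta = np.zeros_like(Y)
    Gamma = np.zeros_like(Y)
    for _ in range(n_iter):
        Theta = svt(Y - Gamma, lam_nuc)   # low-rank component update
        Gamma = soft(Y - Theta, lam_l1)   # sparse component update
    return Theta, Gamma
```

Each alternating step solves one of the two subproblems exactly, so the convex objective decreases monotonically; the paper's analysis additionally imposes the spikiness condition to make the decomposition identifiable.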
Tight conditions for consistent variable selection in high dimensional nonparametric regression
"... ..."
(Show Context)
High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
, 2013
"... The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and outpu ..."
Abstract

Cited by 9 (2 self)
The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing nonlinear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion (HSIC). We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments for classification and regression with thousands of features.
Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm with Minimax Optimal Rates
 COLT 2013, JMLR: Workshop and Conference Proceedings, vol. 30
, 2013
"... ar ..."
(Show Context)
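No abstract survives for this entry, but the strategy named in the title — partition the data, fit kernel ridge regression independently on each block, and average the block predictors — can be sketched as follows. All function names and hyperparameter choices here are illustrative, not the paper's:

```python
import numpy as np

def krr_fit(X, y, lam=0.1, sigma=1.0):
    # Kernel ridge regression with a Gaussian kernel:
    # alpha = (K + lam * n * I)^{-1} y
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = len(y)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return X, alpha, sigma

def krr_predict(model, Xq):
    Xtr, alpha, sigma = model
    d2 = ((Xq[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ alpha

def dc_krr(X, y, n_blocks=4, lam=0.1, sigma=1.0):
    """Divide-and-conquer KRR: fit one KRR per disjoint data block,
    then predict by averaging the block estimators."""
    idx = np.random.default_rng(0).permutation(len(y))
    models = [krr_fit(X[b], y[b], lam, sigma)
              for b in np.array_split(idx, n_blocks)]
    return lambda Xq: np.mean([krr_predict(m, Xq) for m in models], axis=0)
```

The computational appeal is that each block solves an (n/m) × (n/m) linear system instead of one n × n system; the paper's contribution is showing that, for a suitable number of blocks, the averaged estimator still attains minimax-optimal rates.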
Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness. The Annals of Statistics
, 2013
"... We investigate the learning rate of multiple kernel leaning (MKL) with ℓ1 and elasticnet regularizations. The elasticnet regularization is a composition of an ℓ1regularizer for inducing the sparsity and an ℓ2regularizer for controlling the smoothness. We focus on a sparse setting where the total ..."
Abstract

Cited by 8 (4 self)
We investigate the learning rate of multiple kernel learning (MKL) with ℓ1 and elastic-net regularizations. The elastic-net regularization is a composition of an ℓ1-regularizer for inducing sparsity and an ℓ2-regularizer for controlling smoothness. We focus on a sparse setting where the total number of kernels is large but the number of nonzero components of the ground truth is relatively small, and show sharper convergence rates than the learning rates previously shown for both ℓ1 and elastic-net regularizations. Our analysis shows that a trade-off between sparsity and smoothness appears when selecting which of the ℓ1 and elastic-net regularizations to use: if the ground truth is smooth, the elastic-net regularization is preferred; otherwise, the ℓ1 regularization is preferred.
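The "composition" of the two regularizers can be made concrete in the finite-dimensional analogue: the proximal operator of the elastic-net penalty first soft-thresholds (the ℓ1 part, inducing sparsity) and then uniformly shrinks (the ℓ2 part, enforcing smoothness). A hypothetical sketch, not drawn from the paper:

```python
import numpy as np

def prox_elastic_net(v, lam1, lam2):
    """Proximal operator of w -> lam1*||w||_1 + (lam2/2)*||w||_2^2:
    soft-thresholding (sparsity) followed by uniform shrinkage (smoothness)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0) / (1.0 + lam2)
```

Setting lam2 = 0 recovers the pure ℓ1 (Lasso) proximal operator, which mirrors the abstract's point that the choice between the two penalties hinges on how much smoothing the ground truth can tolerate.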
Nonparametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
"... We consider regularized risk minimization in a large dictionary of Reproducing kernel Hilbert Spaces (RKHSs) over which the target function has a sparse representation. This setting, commonly referred to as Sparse Multiple Kernel Learning (MKL), may be viewed as the nonparametric extension of group ..."
Abstract

Cited by 8 (0 self)
We consider regularized risk minimization in a large dictionary of reproducing kernel Hilbert spaces (RKHSs) over which the target function has a sparse representation. This setting, commonly referred to as Sparse Multiple Kernel Learning (MKL), may be viewed as the nonparametric extension of group sparsity in linear models. While the two dominant algorithmic strands of sparse learning, namely convex relaxations using the ℓ1 norm (e.g., Lasso) and greedy methods (e.g., OMP), have both been rigorously extended for group sparsity, the sparse MKL literature has so far mainly adopted the former with mild empirical success. In this paper, we close this gap by proposing a Group-OMP based framework for sparse MKL. Unlike ℓ1-MKL, our approach decouples the sparsity regularizer (via a direct ℓ0 constraint) from the smoothness regularizer (via RKHS norms), which leads to better empirical performance and a simpler optimization procedure that only requires a black-box single-kernel solver. The algorithmic development and empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds and sparse recovery conditions analogous to those for OMP [27] and Group-OMP [16].
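The greedy strand referred to above can be sketched in its linear, finite-dimensional form, where each RKHS is replaced by a group of columns: pick the group most correlated with the current residual, then refit least squares on all selected groups. This is a simplified illustration of Group-OMP, not the paper's kernelized algorithm:

```python
import numpy as np

def group_omp(X, y, groups, k):
    """Group orthogonal matching pursuit (linear analogue):
    greedily select the group maximizing ||X_g^T residual||_2,
    then refit least squares on the union of selected groups."""
    selected = []
    residual = y.copy()
    for _ in range(k):
        scores = {g: np.linalg.norm(X[:, idx].T @ residual)
                  for g, idx in groups.items() if g not in selected}
        selected.append(max(scores, key=scores.get))
        cols = np.concatenate([groups[g] for g in selected])
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        residual = y - X[:, cols] @ beta
    return selected
```

In the paper's setting, "refit" means calling a black-box single-kernel solver on the selected kernels, and the ℓ0 constraint is simply the iteration budget k.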
PAC-Bayesian bound for Gaussian process regression and multiple kernel additive model
 In COLT, arXiv:1102.3616v1 [math.ST]
, 2012
"... We develop a PACBayesian bound for the convergence rate of a Bayesian variant of Multiple Kernel Learning (MKL) that is an estimation method for the sparse additive model. Standard analyses for MKL require a strong condition on the design analogous to the restricted eigenvalue condition for the ana ..."
Abstract

Cited by 6 (0 self)
We develop a PAC-Bayesian bound for the convergence rate of a Bayesian variant of Multiple Kernel Learning (MKL), which is an estimation method for the sparse additive model. Standard analyses for MKL require a strong condition on the design analogous to the restricted eigenvalue condition used in the analysis of the Lasso and the Dantzig selector. In this paper, we apply the PAC-Bayesian technique to show that the Bayesian variant of MKL achieves the optimal convergence rate without such strong conditions on the design. Our approach is essentially a combination of PAC-Bayes and recently developed theories of nonparametric Gaussian process regression. Our bound is developed in a fixed design situation. Our analysis includes the existing Gaussian process result as a special case, and the proof is much simpler by virtue of the PAC-Bayesian technique. We also give the convergence rate of the Bayesian variant of the group Lasso as a finite-dimensional special case.
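For reference, the nonparametric building block invoked above, Gaussian process regression, has a closed-form posterior. A minimal sketch with an RBF kernel follows; the function name and hyperparameter names are ours, and this illustrates only the regressor, not the PAC-Bayesian analysis:

```python
import numpy as np

def gp_posterior(X, y, Xq, sigma_n=0.1, ell=1.0):
    """Posterior mean and variance of GP regression with an RBF kernel.
    sigma_n: observation noise std; ell: kernel length scale."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * ell ** 2))
    K = k(X, X) + sigma_n ** 2 * np.eye(len(X))   # noisy train covariance
    Ks = k(Xq, X)                                  # query-train covariance
    mean = Ks @ np.linalg.solve(K, y)
    # Prior variance k(x,x)=1 minus the explained variance, per query point
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var
```

In the multiple kernel additive model the prior is instead a sum of such GPs, one per kernel, with a sparsity-inducing prior over which components are active.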
Group lasso for high dimensional sparse quantile regression models
, 2011
"... iv ..."
(Show Context)
Volume Ratio, Sparsity, and Minimaxity under Unitarily Invariant Norms
"... Abstract—This paper presents a nonasymptotic study of the minimax estimation of highdimensional mean and covariance matrices. Based on the convex geometry of finitedimensional Banach spaces, we develop a unified volume ratio approach for determining minimax estimation rates of unconstrained mean ..."
Abstract

Cited by 6 (2 self)
This paper presents a non-asymptotic study of the minimax estimation of high-dimensional mean and covariance matrices. Based on the convex geometry of finite-dimensional Banach spaces, we develop a unified volume ratio approach for determining minimax estimation rates of unconstrained mean and covariance matrices under all unitarily invariant norms. We also establish the rate for estimating mean matrices with group sparsity, where the sparsity constraint introduces an additional term in the rate whose dependence on the norm differs completely from that of the unconstrained counterpart.