Structured variable selection with sparsity-inducing norms
, 2011
Abstract

Cited by 187 (27 self)
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsity-inducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1-norm and the group ℓ1-norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for least-squares linear regression in low- and high-dimensional settings.
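As a rough illustration of the norm described above, a sum of Euclidean norms over possibly overlapping subsets of variables can be evaluated in a few lines; the groups and weights below are hypothetical examples, not taken from the paper.

```python
import numpy as np

def structured_norm(w, groups):
    """Sum of Euclidean norms over (possibly overlapping) index subsets.

    With singleton groups this reduces to the l1-norm; with a disjoint
    partition it is the group l1-norm; overlapping groups further
    constrain the allowed nonzero patterns of the minimizer.
    """
    return sum(np.linalg.norm(w[list(g)]) for g in groups)

w = np.array([1.0, -2.0, 0.0, 3.0])

# Singleton groups recover the l1-norm.
l1 = structured_norm(w, [[0], [1], [2], [3]])

# Overlapping groups (a hypothetical hierarchy: {0,1} and {1,2,3}).
overlap = structured_norm(w, [[0, 1], [1, 2, 3]])
```

Used as a regularizer, this penalty tends to zero out whole groups at once, which is what produces the structured nonzero patterns the abstract refers to.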
On the conditions used to prove oracle results for the Lasso
 Electron. J. Stat
Abstract

Cited by 103 (5 self)
Abstract: Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue condition [2] or the slightly weaker compatibility condition [18] are sufficient for oracle results. We argue that both conditions allow for a fairly general class of design matrices. Hence, optimality of the Lasso for prediction and estimation holds in more general situations than appears from coherence [5, 4] or restricted isometry [10] assumptions.
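For intuition about eigenvalue-type design conditions like those compared above, one can brute-force a crude sparse-eigenvalue quantity on a tiny design matrix; this is only a finite proxy (the actual restricted eigenvalue condition involves a cone constraint on the whole coefficient vector), and the matrix sizes here are illustrative.

```python
import itertools
import numpy as np

def min_sparse_eigenvalue(X, s):
    """Minimum over all size-s supports S of the smallest eigenvalue of
    X_S^T X_S / n.  A crude finite proxy for restricted-eigenvalue-type
    quantities; only feasible for very small p."""
    n, p = X.shape
    best = np.inf
    for S in itertools.combinations(range(p), s):
        G = X[:, S].T @ X[:, S] / n
        best = min(best, np.linalg.eigvalsh(G)[0])
    return best

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))   # n >> p: a well-conditioned design
val = min_sparse_eigenvalue(X, 2)
```

On random Gaussian designs with many more rows than columns this quantity stays bounded away from zero, which is the kind of behavior the sufficient conditions in the paper exploit.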
Adaptive forward-backward greedy algorithm for learning sparse representations
 IEEE Trans. Inform. Theory
, 2011
Abstract

Cited by 101 (9 self)
Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with nonzero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory.
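A minimal sketch of the idea of forward additions with adaptive backward deletions can be written as follows. The stopping threshold `eps` and the deletion factor `nu` are simplified placeholders, not the paper's exact rules, and the sketch refits by least squares at every step for clarity rather than speed.

```python
import numpy as np

def foba(X, y, eps=1e-3, max_iter=20, nu=0.5):
    """Simplified forward-backward greedy sketch.

    Forward: add the feature that most reduces the least-squares loss.
    Backward: after each addition, delete any feature whose removal
    raises the loss by less than nu times the last forward gain.
    """
    n, p = X.shape
    support, beta = [], np.zeros(p)

    def refit(S):
        b = np.zeros(p)
        if S:
            b[S] = np.linalg.lstsq(X[:, S], y, rcond=None)[0]
        return b

    def loss(b):
        r = y - X @ b
        return float(r @ r) / (2 * n)

    for _ in range(max_iter):
        base = loss(beta)
        best_gain, best_j = 0.0, None
        for j in range(p):                      # forward step
            if j in support:
                continue
            gain = base - loss(refit(support + [j]))
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None or best_gain < eps:
            break
        support.append(best_j)
        beta = refit(support)
        while len(support) > 1:                 # adaptive backward step
            inc, jdrop = min(
                (loss(refit([k for k in support if k != j])) - loss(beta), j)
                for j in support
            )
            if inc >= nu * best_gain:
                break
            support.remove(jdrop)
            beta = refit(support)
    return beta, sorted(support)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[[2, 5]] = [3.0, -2.0]
y = X @ beta_true + 0.01 * rng.standard_normal(100)
beta_hat, S = foba(X, y)
```

The backward step is what distinguishes this from plain forward greedy selection: a feature added early can still be removed later once better features explain the same variation.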
A Probabilistic and RIPless Theory of Compressed Sensing
, 2010
Abstract

Cited by 95 (3 self)
This paper introduces a simple and very general theory of compressive sensing. In this theory, the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all models — e.g. Gaussian, frequency measurements — discussed in the literature, but also provides a framework for new measurement strategies as well. We prove that if the probability distribution F obeys a simple incoherence property and an isotropy property, one can faithfully recover approximately sparse signals from a minimal number of noisy measurements. The novelty is that our recovery results do not require the restricted isometry property (RIP) — they make use of a much weaker notion — or a random model for the signal. As an example, the paper shows that a signal with s nonzero entries can be faithfully recovered from about s log n Fourier coefficients that are contaminated with noise.
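As a toy illustration of recovering a sparse signal from a small number of random measurements, the sketch below draws Gaussian sensing vectors and solves the ℓ1-penalized least-squares problem with a generic ISTA iteration. The sizes and penalty level are illustrative, and nothing here is specific to the paper's RIPless analysis.

```python
import numpy as np

def ista(A, y, lam, steps=1000):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        g = A.T @ (A @ x - y)                # gradient of the quadratic part
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

rng = np.random.default_rng(0)
n, m, s = 256, 80, 5                         # signal length, measurements, sparsity
x_true = np.zeros(n)
idx = rng.choice(n, s, replace=False)
x_true[idx] = 3.0 * rng.standard_normal(s)
A = rng.standard_normal((m, n)) / np.sqrt(m) # random Gaussian sensing matrix
y = A @ x_true + 0.01 * rng.standard_normal(m)
x_hat = ista(A, y, lam=0.02)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

With far fewer measurements than signal dimensions (here 80 versus 256), the ℓ1 solution still lands close to the true sparse signal, which is the phenomenon the theory quantifies.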
Near-ideal model selection by ℓ1 minimization
, 2008
Abstract

Cited by 84 (4 self)
We consider the fundamental problem of estimating the mean of a vector y = Xβ + z, where X is an n × p design matrix in which one can have far more variables than observations and z is a stochastic error term: the so-called ‘p > n’ setup. When β is sparse, or more generally, when there is a sparse subset of covariates providing a close approximation to the unknown mean vector, we ask whether or not it is possible to accurately estimate Xβ using a computationally tractable algorithm. We show that in a surprisingly wide range of situations, the lasso happens to nearly select the best subset of variables. Quantitatively speaking, we prove that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error one would achieve with an oracle supplying perfect information about which variables should be included in the model and which should not. Interestingly, our results describe the average performance of the lasso; that is, the performance one can expect in a vast majority of cases where Xβ is a sparse or nearly sparse superposition of variables, but not in all cases. Our results are non-asymptotic and widely applicable, since they simply require that pairs of predictor variables are not too collinear.
Visual classification with multi-task joint sparse representation
 In CVPR
, 2010
Abstract

Cited by 66 (1 self)
Abstract — We address the problem of visual classification with multiple features and/or multiple instances. Motivated by the recent success of multi-task joint covariate selection, we formulate this problem as a multi-task joint sparse representation model to combine the strength of multiple features and/or instances for recognition. A joint sparsity-inducing norm is utilized to enforce class-level joint sparsity patterns among the multiple representation vectors. The proposed model can be efficiently optimized by a proximal gradient method. Furthermore, we extend our method to the setup where features are described in kernel matrices. We then investigate two applications of our method to visual classification: 1) fusing multiple kernel features for object categorization and 2) robust face recognition in video with an ensemble of query images. Extensive experiments on challenging real-world data sets demonstrate that the proposed method is competitive with state-of-the-art methods in the respective applications. Index Terms — Feature fusion, multi-task learning, sparse representation, visual classification.
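Joint sparsity of the kind described above is typically encouraged with an ℓ2,1-type norm on the matrix of representation vectors, whose proximal operator (the building block of a proximal gradient method) is row-wise soft-thresholding. A minimal sketch of that operator, not the authors' implementation:

```python
import numpy as np

def prox_l21(W, tau):
    """Proximal operator of tau * sum_i ||W[i, :]||_2.

    Shrinks each row of W toward zero by tau in Euclidean norm and zeroes
    rows whose norm falls below tau; because whole rows vanish at once,
    the same sparsity pattern is shared across columns (tasks).
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * W

W = np.array([[3.0, 4.0],    # row norm 5 -> shrunk toward zero
              [0.3, 0.4]])   # row norm 0.5 < tau -> zeroed entirely
P = prox_l21(W, 1.0)
```

The zeroed rows are what produce the class-level joint sparsity patterns the abstract mentions: either a dictionary atom is used by all tasks, or by none.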
Sparse recovery with Orthogonal Matching Pursuit under RIP
 DFG-SPP 1324 Report Series, http://www.dfgspp1324.de, 2010
Abstract

Cited by 62 (2 self)
Abstract—This paper presents a new analysis for the orthogonal matching pursuit (OMP) algorithm. It is shown that if the restricted isometry property (RIP) is satisfied at sparsity level O(k̄), then OMP can stably recover a k̄-sparse signal in 2-norm under measurement noise. For compressed sensing applications, this result implies that in order to uniformly recover a k̄-sparse signal in R^d, only O(k̄ ln d) random projections are needed. This analysis improves earlier results on OMP that depend on stronger conditions, which can only be satisfied with Ω(k̄² ln d) or Ω(k̄^1.6 ln d) random projections. Index Terms—Estimation theory, feature selection, greedy algorithms, statistical learning, sparse recovery.
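The generic OMP iteration that this analysis concerns (select the column most correlated with the current residual, then refit by least squares on the selected set) can be sketched as follows; the data below are an illustrative example, not the paper's experiments.

```python
import numpy as np

def omp(X, y, k):
    """Orthogonal matching pursuit: greedily build a size-k support."""
    n, p = X.shape
    support = []
    r = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(X.T @ r)))   # column most correlated with residual
        support.append(j)
        coef = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        r = y - X[:, support] @ coef          # residual orthogonal to chosen columns
    beta = np.zeros(p)
    beta[support] = coef
    return beta, sorted(support)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
beta_true = np.zeros(30)
beta_true[[4, 11, 20]] = [2.0, -3.0, 1.5]
y = X @ beta_true + 0.01 * rng.standard_normal(100)
beta_hat, S = omp(X, y, k=3)
```

Because the residual is re-orthogonalized against all selected columns at every step, no column is ever selected twice, which is what distinguishes OMP from plain matching pursuit.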
On the Asymptotic Properties of The Group Lasso Estimator in Least Squares Problems
Abstract

Cited by 58 (0 self)
We derive conditions guaranteeing estimation and model selection consistency, oracle properties and persistence for the group-lasso estimator and model selector proposed by Yuan and Lin (2006) for least squares problems when the covariates have a natural grouping structure. We study both the case of a fixed-dimensional parameter space with increasing sample size and the case when the model complexity changes with the sample size.
Structured Sparsity through Convex Optimization
, 2012
Abstract

Cited by 47 (6 self)
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through regularization by the ℓ1-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the ℓ1-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of nonlinear variable selection.
Self-concordant analysis for logistic regression
Abstract

Cited by 46 (14 self)
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2-norm and regularization by the ℓ1-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
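For concreteness, ℓ1-regularized logistic regression of the kind analyzed above can be solved with a generic proximal gradient iteration; the sketch below uses a crude Lipschitz bound for the step size and illustrative data, and is not the paper's method.

```python
import numpy as np

def logistic_lasso(X, y, lam, steps=2000):
    """Proximal gradient for  min_w  mean(log(1 + exp(-y * Xw))) + lam * ||w||_1.

    Labels y are +/-1.  The logistic-loss gradient is -(1/n) X^T (y * sigma(-y*Xw)),
    and its Lipschitz constant is at most ||X||_2^2 / (4n).
    """
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / (4 * n)
    w = np.zeros(p)
    for _ in range(steps):
        m = y * (X @ w)                                 # margins
        g = -(X.T @ (y / (1.0 + np.exp(m)))) / n        # gradient of logistic loss
        z = w - g / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
w_true = np.zeros(20)
w_true[[0, 3]] = [2.0, -2.0]
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(200))
w_hat = logistic_lasso(X, y, lam=0.02)
```

Note the iteration is identical to ISTA for the lasso except for the loss gradient, which is the kind of square-loss-to-logistic-loss transfer the abstract describes.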