Results 1–10 of 27
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Learning with Structured Sparsity
Abstract

Cited by 127 (15 self)
This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea. A general theory is developed for learning with structured sparsity, based on the notion of coding complexity associated with the structure. Moreover, a structured greedy algorithm is proposed to efficiently solve the structured sparsity problem. Experiments demonstrate the advantage of structured sparsity over standard sparsity.
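The structured greedy algorithm is not spelled out in this abstract; as a rough illustration of the greedy template it builds on, here is a plain forward greedy selection sketch (all names are ours, and the structured version would score whole blocks of features by coding complexity rather than single features):

```python
import numpy as np

def greedy_select(X, y, k):
    """Plain forward greedy selection: repeatedly add the feature most
    correlated with the current residual, then refit by least squares.
    (Simplified, unstructured stand-in for the paper's structured
    greedy algorithm.)"""
    support, residual = [], y.astype(float).copy()
    for _ in range(k):
        scores = np.abs(X.T @ residual)
        scores[support] = -np.inf          # never reselect a feature
        support.append(int(np.argmax(scores)))
        Xs = X[:, support]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        residual = y - Xs @ beta
    return sorted(support)

# Toy usage: with an orthogonal design, greedy selection recovers the
# two active features of y = 2*e0 + 3*e2.
X = np.eye(4)
y = np.array([2.0, 0.0, 3.0, 0.0])
selected = greedy_select(X, y, 2)
```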
Estimation of (near) low-rank matrices with noise and high-dimensional scaling
Abstract

Cited by 95 (14 self)
We study an instance of high-dimensional statistical inference in which the goal is to use N noisy observations to estimate a matrix Θ* ∈ R^{k×p} that is assumed to be either exactly low rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. We consider an M-estimator based on regularization by the trace or nuclear norm over matrices, and analyze its performance under high-dimensional scaling. We provide non-asymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models, and apply to both exactly low-rank and approximately low-rank matrices. We then illustrate their consequences for a number of specific learning models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections. Simulations show excellent agreement with the high-dimensional scaling of the error predicted by our theory.
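Nuclear-norm-regularized M-estimators of this kind are typically computed with proximal methods whose core step is singular-value soft-thresholding. A minimal sketch of that step (our own illustrative code, not the paper's):

```python
import numpy as np

def nuclear_norm_prox(Theta, lam):
    """Prox of lam * ||.||_* (nuclear norm): soft-threshold the singular
    values. Core step of proximal algorithms for trace-norm regularized
    M-estimation (illustrative sketch)."""
    U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

# A rank-1 matrix plus small noise: thresholding the singular values
# kills the noise directions and returns a (numerically) rank-1 matrix.
rng = np.random.default_rng(0)
Theta_noisy = np.ones((5, 1)) @ np.ones((1, 4)) + 0.01 * rng.standard_normal((5, 4))
Theta_hat = nuclear_norm_prox(Theta_noisy, lam=0.5)
```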
SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION
 SUBMITTED TO THE ANNALS OF STATISTICS
, 2010
Abstract

Cited by 78 (3 self)
In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B* ∈ R^{p×K} of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the ℓ1/ℓ2 norm is used for support union recovery, or recovery of the set of s rows for which B* is nonzero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter θ(n, p, s) := n/[2ψ(B*) log(p − s)]. Here n is the sample size, and ψ(B*) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that θ(n, p, s) exceeds a critical level θ_u, and fails for sequences such that θ(n, p, s) lies below a critical level θ_ℓ. For the special case of the standard Gaussian ensemble, we show that θ_ℓ = θ_u, so that the characterization is sharp. The sparsity-overlap function ψ(B*) reveals that, if the design is uncorrelated on the active rows, ℓ1/ℓ2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
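The ℓ1/ℓ2 block regularization behind the multivariate group Lasso removes entire rows of B at once via a block soft-thresholding operator. A small illustrative sketch (variable names are ours):

```python
import numpy as np

def row_block_soft_threshold(B, lam):
    """l1/l2 block shrinkage: shrink each row's l2 norm by lam, zeroing
    rows whose norm falls below lam. This is the operation through which
    the multivariate group Lasso selects whole rows (illustrative sketch)."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return scale * B

B = np.array([[3.0, 4.0],   # row norm 5.0: kept and shrunk
              [0.3, 0.4],   # row norm 0.5: zeroed out
              [0.0, 0.0]])  # already zero: stays zero
B_hat = row_block_soft_threshold(B, lam=1.0)
```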
Structured Sparsity through Convex Optimization
, 2012
Abstract

Cited by 47 (6 self)
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the ℓ1-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of nonlinear variable selection.
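The structured norms described here sum ℓ2 norms over (possibly overlapping) groups of variables. A toy sketch of evaluating such a norm, with groups and names of our own invention:

```python
import numpy as np

def group_norm(w, groups):
    """Structured norm Omega(w) = sum_g ||w_g||_2 over a list of index
    groups, which may be disjoint or overlapping (illustrative sketch)."""
    return sum(np.linalg.norm(w[list(g)]) for g in groups)

w = np.array([3.0, 4.0, 0.0, 5.0])
disjoint = group_norm(w, [[0, 1], [2, 3]])   # 5 + 5
overlap = group_norm(w, [[0, 1], [1, 3]])    # 5 + sqrt(4^2 + 5^2)
```

With overlapping groups, a coordinate contributes to every group containing it, which is what distinguishes these norms from the plain (disjoint) group Lasso penalty.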
ℓ1-penalized quantile regression in high-dimensional sparse models. Available at arXiv:0904.2931
, 2009
Abstract

Cited by 20 (5 self)
We consider median regression and, more generally, a possibly infinite collection of quantile regressions in high-dimensional sparse models. In these models, the number of regressors p is very large, possibly larger than the sample size n, but only at most s regressors have a nonzero impact on each conditional quantile of the response variable, where s grows more slowly than n. Since ordinary quantile regression is not consistent in this case, we consider ℓ1-penalized quantile regression (ℓ1-QR), which penalizes the ℓ1-norm of regression coefficients, as well as the post-penalized QR estimator (post-ℓ1-QR), which applies ordinary QR to the model selected by ℓ1-QR. First, we show that under general conditions ℓ1-QR is consistent at the near-oracle rate √(s/n) log(p ∨ n), uniformly in the compact set U ⊂ (0,1) of quantile indices. In deriving this result, we propose a partly pivotal, data-driven choice of the penalty level and show that it satisfies the requirements for achieving this rate. Second, we show that under similar conditions post-ℓ1-QR is consistent at the near-oracle rate √(s/n) log(p ∨ n), uniformly over U, even if the ℓ1-QR-selected models miss some components of the true models, and the rate could be even closer to the oracle rate otherwise. Third, we characterize conditions under which ℓ1-QR contains the true model as a submodel, and derive bounds on the dimension of the selected model, uniformly over U; we also provide conditions under which hard-thresholding selects the minimal true model, uniformly over U.
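The ℓ1-penalized quantile regression objective combines the quantile ("check", or pinball) loss with an ℓ1 penalty. A minimal evaluation sketch (illustrative, not the authors' implementation):

```python
import numpy as np

def pinball(r, tau):
    """Quantile ('check') loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (r < 0).astype(float))

def l1_qr_objective(beta, X, y, tau, lam):
    """Objective of l1-penalized quantile regression: mean pinball loss
    of the residuals plus lam times the l1 norm of beta (names are ours)."""
    return pinball(y - X @ beta, tau).mean() + lam * np.abs(beta).sum()

# At tau = 0.5 the pinball loss is half the absolute loss, so the method
# specializes to l1-penalized median regression.
obj = l1_qr_objective(np.zeros(2), np.eye(2), np.array([1.0, -1.0]),
                      tau=0.5, lam=0.1)
```

The asymmetry of the loss is what targets a given quantile: for τ = 0.9, positive residuals cost 0.9 per unit while negative residuals cost only 0.1.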
A selective review of group selection in high-dimensional models
 Statistical Science
, 2012
Abstract

Cited by 17 (2 self)
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome-wide association studies. We also highlight some issues that require further study. Key words and phrases: Bi-level selection, group LASSO, concave group selection, penalized regression, sparsity, oracle property.
Multi-task sparse discriminant analysis (MtSDA) with overlapping categories
 In AAAI
, 2010
Abstract

Cited by 9 (1 self)
Multi-task learning aims at combining information across tasks to boost prediction performance, especially when the number of training samples is small and the number of predictors is very large. In this paper, we first extend the Sparse Discriminant Analysis (SDA) of Clemmensen et al. We call this Multi-task Sparse Discriminant Analysis (MtSDA). MtSDA formulates multi-label prediction as a quadratic optimization problem, whereas SDA obtains single labels via a nearest class mean rule. Second, we propose a class of equicorrelation matrices to use in MtSDA, which includes the identity matrix. MtSDA with both matrices is compared with single-task learning (SVM and LDA+SVM) and multi-task learning (HSML). The comparisons are made on real data sets in terms of AUC and F-measure. The results show that MtSDA outperforms the other methods substantially almost all the time, and in some cases MtSDA with the equicorrelation matrix substantially outperforms MtSDA with the identity matrix.
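The abstract does not define its equicorrelation matrices, but the standard form is (1 − ρ)I + ρ11ᵀ, which reduces to the identity at ρ = 0 (the special case the abstract mentions). A sketch under that assumption:

```python
import numpy as np

def equicorrelation(K, rho):
    """K x K equicorrelation matrix (1 - rho) * I + rho * ones((K, K)):
    ones on the diagonal, rho everywhere else. Assumed standard form;
    the paper's exact parameterization may differ."""
    return (1.0 - rho) * np.eye(K) + rho * np.ones((K, K))

E = equicorrelation(3, 0.5)   # unit diagonal, 0.5 off-diagonal
```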
Union Support Recovery in Multi-task Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2010
Abstract

Cited by 9 (0 self)
We sharply characterize the performance of different penalization schemes for the problem of selecting the relevant variables in the multi-task setting. Previous work focuses on the regression problem, where conditions on the design matrix complicate the analysis. A clearer and simpler picture emerges by studying the normal means model. This model, often used in the field of statistics, is a simplified model that provides a laboratory for studying complex procedures.