Results 11  20
of
207
Lassotype recovery of sparse representations for highdimensional data
 ANNALS OF STATISTICS
, 2009
"... The Lasso is an attractive technique for regularization and variable selection for highdimensional data, where the number of predictor variables pn is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only ..."
Abstract

Cited by 250 (14 self)
 Add to MetaCart
(Show Context)
The Lasso is an attractive technique for regularization and variable selection for highdimensional data, where the number of predictor variables pn is potentially much larger than the number of samples n. However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the socalled irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables. Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the ℓ2norm sense for fixed designs under conditions on (a) the number sn of nonzero components of the vector βn and (b) the minimal singular values of design matrices that are induced by selecting small subsets of variables. Furthermore, a rate of convergence result is obtained on the ℓ2 error with an appropriate choice of the smoothing parameter. The rate is shown to be
Signal recovery from partial information via Orthogonal Matching Pursuit
 IEEE TRANS. INFORM. THEORY
, 2005
"... This article demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results ..."
Abstract

Cited by 191 (8 self)
 Add to MetaCart
This article demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results for OMP, which require O(m 2) measurements. The new results for OMP are comparable with recent results for another algorithm called Basis Pursuit (BP). The OMP algorithm is much faster and much easier to implement, which makes it an attractive alternative to BP for signal recovery problems.
The sparsity and bias of the lasso selection in highdimensional linear regression. Ann. Statist. Volume 36, Number 4, 15671594. Alexandre Belloni Duke University Fuqua
 School of Business 1 Towerview Drive Durham, NC 277080120 PO Box 90120 Email: abn5@duke.edu Victor Chernozhukov Massachusetts Institute of Technology Department of Economics and Operations research Center 50 Memorial Drive Room E52262f Cambridge, MA 02
, 2008
"... showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7 2541–2567] formalized the neighborho ..."
Abstract

Cited by 191 (27 self)
 Add to MetaCart
showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7 2541–2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the ℓαloss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs. 1. Introduction. Consider
The Convex Geometry of Linear Inverse Problems
, 2010
"... In applications throughout science and engineering one is often faced with the challenge of solving an illposed inverse problem, where the number of available measurements is smaller than the dimension of the model to be estimated. However in many practical situations of interest, models are constr ..."
Abstract

Cited by 189 (20 self)
 Add to MetaCart
In applications throughout science and engineering one is often faced with the challenge of solving an illposed inverse problem, where the number of available measurements is smaller than the dimension of the model to be estimated. However in many practical situations of interest, models are constrained structurally so that they only have a few degrees of freedom relative to their ambient dimension. This paper provides a general framework to convert notions of simplicity into convex penalty functions, resulting in convex optimization solutions to linear, underdetermined inverse problems. The class of simple models considered are those formed as the sum of a few atoms from some (possibly infinite) elementary atomic set; examples include wellstudied cases such as sparse vectors (e.g., signal processing, statistics) and lowrank matrices (e.g., control, statistics), as well as several others including sums of a few permutations matrices (e.g., ranked elections, multiobject tracking), lowrank tensors (e.g., computer vision, neuroscience), orthogonal matrices (e.g., machine learning), and atomic measures (e.g., system identification). The convex programming formulation is based on minimizing the norm induced by the convex hull of the atomic set; this norm is referred to as the atomic norm. The facial
Nearly unbiased variable selection under minimax concave penalty
 Annals of Statistics
, 2010
"... ar ..."
Informationtheoretic limits on sparsity recovery in the highdimensional and noisy setting
, 2007
"... Abstract—The problem of sparsity pattern or support set recovery refers to estimating the set of nonzero coefficients of an un3 p known vector 2 based on a set of n noisy observations. It arises in a variety of settings, including subset selection in regression, graphical model selection, signal de ..."
Abstract

Cited by 131 (2 self)
 Add to MetaCart
Abstract—The problem of sparsity pattern or support set recovery refers to estimating the set of nonzero coefficients of an un3 p known vector 2 based on a set of n noisy observations. It arises in a variety of settings, including subset selection in regression, graphical model selection, signal denoising, compressive sensing, and constructive approximation. The sample complexity of a given method for subset recovery refers to the scaling of the required sample size n as a function of the signal dimension p, sparsity index k (number of nonzeroes in 3), as well as the minimum value min of 3 over its support and other parameters of measurement matrix. This paper studies the informationtheoretic limits of sparsity recovery: in particular, for a noisy linear observation model based on random measurement matrices drawn from general Gaussian measurement matrices, we derive both a set of sufficient conditions for exact support recovery using an exhaustive search decoder, as well as a set of necessary conditions that any decoder, regardless of its computational complexity, must satisfy for exact support recovery. This analysis of fundamental limits complements our previous work on sharp thresholds for support set recovery over the same set of random measurement ensembles using the polynomialtime Lasso method (`1constrained quadratic programming). Index Terms—Compressed sensing, `1relaxation, Fano’s method, highdimensional statistical inference, informationtheoretic
For most large underdetermined systems of equations, the minimal l1norm nearsolution approximates the sparsest nearsolution
 Comm. Pure Appl. Math
, 2004
"... We consider inexact linear equations y ≈ Φα where y is a given vector in R n, Φ is a given n by m matrix, and we wish to find an α0,ɛ which is sparse and gives an approximate solution, obeying �y − Φα0,ɛ�2 ≤ ɛ. In general this requires combinatorial optimization and so is considered intractable. On ..."
Abstract

Cited by 122 (1 self)
 Add to MetaCart
(Show Context)
We consider inexact linear equations y ≈ Φα where y is a given vector in R n, Φ is a given n by m matrix, and we wish to find an α0,ɛ which is sparse and gives an approximate solution, obeying �y − Φα0,ɛ�2 ≤ ɛ. In general this requires combinatorial optimization and so is considered intractable. On the other hand, the ℓ 1 minimization problem min �α�1 subject to �y − Φα�2 ≤ ɛ, is convex, and is considered tractable. We show that for most Φ the solution ˆα1,ɛ = ˆα1,ɛ(y, Φ) of this problem is quite generally a good approximation for ˆα0,ɛ. We suppose that the columns of Φ are normalized to unit ℓ 2 norm 1 and we place uniform measure on such Φ. We study the underdetermined case where m ∼ An, A> 1 and prove the existence of ρ = ρ(A) and C> 0 so that for large n, and for all Φ’s except a negligible fraction, the following approximate sparse solution property of Φ holds: For every y having an approximation �y − Φα0�2 ≤ ɛ by a coefficient vector α0 ∈ R m with fewer than ρ · n nonzeros, we have �ˆα1,ɛ − α0�2 ≤ C · ɛ. This has two implications. First: for most Φ, whenever the combinatorial optimization result α0,ɛ would be very sparse, ˆα1,ɛ is a good approximation to α0,ɛ. Second: suppose we are given noisy data obeying y = Φα0 + z where the unknown α0 is known to be sparse and the noise �z�2 ≤ ɛ. For most Φ, noisetolerant ℓ 1minimization will stably recover α0 from y in the presence of noise z. We study also the barelydetermined case m = n and reach parallel conclusions by slightly different arguments. The techniques include the use of almostspherical sections in Banach space theory and concentration of measure for eigenvalues of random matrices.
The LittlewoodOfford problem and invertibility of random matrices.
 Adv. Math.
, 2008
"... Abstract We prove two basic conjectures on the distribution of the smallest singular value of random n×n matrices with independent entries. Under minimal moment assumptions, we show that the smallest singular value is of order n −1/2 , which is optimal for Gaussian matrices. Moreover, we give a opt ..."
Abstract

Cited by 105 (18 self)
 Add to MetaCart
(Show Context)
Abstract We prove two basic conjectures on the distribution of the smallest singular value of random n×n matrices with independent entries. Under minimal moment assumptions, we show that the smallest singular value is of order n −1/2 , which is optimal for Gaussian matrices. Moreover, we give a optimal estimate on the tail probability. This comes as a consequence of a new and essentially sharp estimate in the LittlewoodOfford problem: for i.i.d. random variables X k and real numbers a k , determine the probability p that the sum k a k X k lies near some number v. For arbitrary coefficients a k of the same order of magnitude, we show that they essentially lie in an arithmetic progression of length 1/p. Published by Elsevier Inc.
Highdimensional graphical model selection using ℓ1regularized logistic regression
 Advances in Neural Information Processing Systems 19
, 2007
"... We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. Our fram ..."
Abstract

Cited by 102 (2 self)
 Add to MetaCart
We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. Our framework applies to the highdimensional setting, in which both the number of nodes p and maximum neighborhood sizes d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. Under certain assumptions on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d 3 log p), with the error decaying as O(exp(−Cn/d 3)) for some constant C. If these same assumptions are imposed directly on the sample matrices, we show that n = Ω(d 2 log p) samples are sufficient.
Minimax rates of estimation for highdimensional linear regression over balls
, 2009
"... Abstract—Consider the highdimensional linear regression model,where is an observation vector, is a design matrix with, is an unknown regression vector, and is additive Gaussian noise. This paper studies the minimax rates of convergence for estimating in eitherloss andprediction loss, assuming tha ..."
Abstract

Cited by 97 (19 self)
 Add to MetaCart
(Show Context)
Abstract—Consider the highdimensional linear regression model,where is an observation vector, is a design matrix with, is an unknown regression vector, and is additive Gaussian noise. This paper studies the minimax rates of convergence for estimating in eitherloss andprediction loss, assuming that belongs to anball for some.Itisshown that under suitable regularity conditions on the design matrix, the minimax optimal rate inloss andprediction loss scales as. The analysis in this paper reveals that conditions on the design matrix enter into the rates forerror andprediction error in complementary ways in the upper and lower bounds. Our proofs of the lower bounds are information theoretic in nature, based on Fano’s inequality and results on the metric entropy of the balls, whereas our proofs of the upper bounds are constructive, involving direct analysis of least squares overballs. For the special case, corresponding to models with an exact sparsity constraint, our results show that although computationally efficientbased methods can achieve the minimax rates up to constant factors, they require slightly stronger assumptions on the design matrix than optimal algorithms involving leastsquares over theball. Index Terms—Compressed sensing, minimax techniques, regression analysis. I.