Results 1-10 of 1,309
Regularization and variable selection via the Elastic Net.
 J. R. Stat. Soc. Ser. B
, 2005
Cited by 973 (11 self)
We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case. An algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like the LARS algorithm does for the lasso.
Regularization paths for generalized linear models via coordinate descent
, 2009
Cited by 724 (15 self)
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
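The cyclical coordinate descent idea can be sketched for the linear-regression case as follows. This is a minimal illustration, not the paper's software: it assumes the objective (1/2n)·||y − Xβ||² + λ·[α·||β||₁ + (1 − α)/2·||β||²] with no intercept and no standardization, solves for a single λ rather than a full path, and the names `soft_threshold` and `elastic_net_cd` are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_sweeps=100, tol=1e-10):
    """Cyclical coordinate descent for elastic-net penalized least squares.

    X is a list of n rows with p entries each, y a list of n responses.
    Assumed objective (no intercept, no standardization):
        (1/2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||^2)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta because beta starts at zero
    # Mean squared value of each column, used in the update denominator.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Inner product of column j with the partial residual,
            # i.e. the residual with coordinate j's own contribution added back.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

Setting `alpha=1.0` gives the lasso and `alpha=0.0` gives ridge regression. A regularization path could be obtained by calling the function over a decreasing grid of `lam` values and reusing the previous solution as a starting point, which this sketch does not do.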
Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization
 SIAM Review
, 2010
Cited by 562 (20 self)
The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard, because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is Ω(r(m + n) log mn), where m, n are the dimensions of the matrix, and r is its rank. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this pre-existing concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
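For reference, the relaxation described in this abstract can be written as a pair of problems. The notation is assumed here rather than taken from the listing: \mathcal{A} denotes the linear transformation, b the right-hand side of the equality constraints, and \sigma_i(X) the singular values of X.

```latex
\begin{align*}
\text{(affine rank minimization)} \quad
  & \min_{X \in \mathbb{R}^{m \times n}} \operatorname{rank}(X)
    \quad \text{subject to } \mathcal{A}(X) = b, \\
\text{(nuclear norm relaxation)} \quad
  & \min_{X \in \mathbb{R}^{m \times n}} \|X\|_*
    \quad \text{subject to } \mathcal{A}(X) = b,
    \qquad \|X\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i(X).
\end{align*}
```

When X is restricted to be diagonal, the rank becomes the number of nonzero diagonal entries and the nuclear norm becomes their ℓ1 norm, which is the correspondence with cardinality minimization that the abstract refers to.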
Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems
 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING
, 2007
Cited by 539 (17 self)
Many problems in signal processing and statistical inference involve finding sparse solutions to underdetermined, or ill-conditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined with a sparseness-inducing (ℓ1) regularization term. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution, and compressed sensing are a few well-known examples of this approach. This paper proposes gradient projection (GP) algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method. Computational experiments show that these GP approaches perform well in a wide range of applications, often being significantly faster (in terms of computation time) than competing methods. Although the performance of GP methods tends to degrade as the regularization term is de-emphasized, we show how they can be embedded in a continuation scheme to recover their efficient practical performance.
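The bound-constrained formulation mentioned above can be illustrated with a small sketch. It assumes the objective ½·||y − Ax||² + τ·||x||₁ and the standard splitting x = u − v with u, v ≥ 0, which turns the ℓ1 term into the linear term τ·Σ(u + v). The sketch uses a fixed, user-supplied step size rather than the line-search or Barzilai-Borwein variants the abstract describes, and the names `matvec`, `rmatvec`, and `gp_l1` are hypothetical.

```python
def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a matrix stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def gp_l1(A, y, tau, step, n_iter=500):
    """Projected gradient on the split (u, v >= 0) form of the l1 problem.

    Minimizes 0.5 * ||y - A(u - v)||^2 + tau * sum(u) + tau * sum(v)
    subject to u >= 0 and v >= 0, and returns x = u - v.
    Assumption: `step` is a fixed step size, small enough for convergence
    (for example, at most 1 / (2 * largest eigenvalue of A^T A)).
    """
    p = len(A[0])
    u = [0.0] * p
    v = [0.0] * p
    for _ in range(n_iter):
        x = [ui - vi for ui, vi in zip(u, v)]
        r = [axi - yi for axi, yi in zip(matvec(A, x), y)]  # A x - y
        g = rmatvec(A, r)                                   # A^T (A x - y)
        # Gradient is (tau + g) with respect to u and (tau - g) with respect to v;
        # projecting onto the nonnegative orthant is a componentwise max with zero.
        u = [max(0.0, ui - step * (tau + gi)) for ui, gi in zip(u, g)]
        v = [max(0.0, vi - step * (tau - gi)) for vi, gi in zip(v, g)]
    return [ui - vi for ui, vi in zip(u, v)]
```

When A is the identity, the minimizer is the soft-thresholded version of y, which gives a quick sanity check: with `y = [3.0, -0.5, -2.0]`, `tau = 1.0`, and `step = 0.5`, the iterates approach `[2.0, 0.0, -1.0]`.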
On Model Selection Consistency of Lasso
, 2006
Cited by 477 (20 self)
Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection.
SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR
 SUBMITTED TO THE ANNALS OF STATISTICS
, 2007
Cited by 472 (11 self)
We exhibit an approximate equivalence between the Lasso estimator and Dantzig selector. For both methods we derive parallel oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.
Efficient sparse coding algorithms
 In NIPS
, 2007
Cited by 445 (14 self)
Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1-regularized least squares problem and an L2-constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons.
Sparse Reconstruction by Separable Approximation
, 2007
Cited by 373 (38 self)
Finding sparse approximate solutions to large underdetermined linear systems of equations is a common problem in signal/image processing and statistics. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution and reconstruction, and compressed sensing (CS) are a few well-known areas in which problems of this type appear. One standard approach is to minimize an objective function that includes a quadratic (ℓ2) error term added to a sparsity-inducing (usually ℓ1) regularizer. We present an algorithmic framework for the more general problem of minimizing the sum of a smooth convex function and a nonsmooth, possibly nonconvex, sparsity-inducing function. We propose iterative methods in which each step is an optimization subproblem involving a separable quadratic term (diagonal Hessian) plus the original sparsity-inducing term. Our approach is suitable for cases in which this subproblem can be solved much more rapidly than the original problem. In addition to solving the standard ℓ2-ℓ1 case, our approach handles other problems, e.g., ℓp regularizers with p ≠ 1, or group-separable (GS) regularizers. Experiments with CS problems show that our approach provides state-of-the-art speed for the standard ℓ2-ℓ1 problem, and is also efficient on problems with GS regularizers. Index Terms: sparse approximation, compressed sensing, optimization, reconstruction.
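The subproblem with a separable quadratic term can be written out as below. The notation is assumed for illustration: f is the smooth term, c the sparsity-inducing term with weight \tau, x^k the current iterate, and \alpha_k > 0 the scalar that defines the diagonal Hessian approximation \alpha_k I.

```latex
\begin{align*}
x^{k+1} &\in \arg\min_{z} \; (z - x^k)^{\top} \nabla f(x^k)
          + \frac{\alpha_k}{2} \|z - x^k\|_2^2 + \tau\, c(z) \\
        &= \arg\min_{z} \; \frac{1}{2} \|z - u^k\|_2^2 + \frac{\tau}{\alpha_k}\, c(z),
          \qquad u^k = x^k - \frac{1}{\alpha_k} \nabla f(x^k).
\end{align*}
% For the l1 regularizer c(z) = \|z\|_1 the problem separates across
% components and has the closed-form soft-thresholding solution:
\[
x^{k+1}_i = \operatorname{sign}(u^k_i)\, \max\!\left( |u^k_i| - \frac{\tau}{\alpha_k},\; 0 \right).
\]
```

The second line follows by completing the square in z, which is why the subproblem is cheap whenever c is separable or group-separable.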
Probing the Pareto frontier for basis pursuit solutions
, 2008
Cited by 365 (5 self)
The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the one-norm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a root-finding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required. The primal-dual solution of this problem gives function and derivative information needed for the root-finding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.