
## An interior-point method for large-scale l1-regularized logistic regression (2007)


Venue: Journal of Machine Learning Research

Citations: 277 (8 self)

### Citations

7184 | Convex Optimization
- Boyd, Vandenberghe
- 2004
Citation Context: ..., strictly convex, and bounded below, and so has a unique minimizer which we denote (v⋆(t), w⋆(t), u⋆(t)). This defines a curve in R × R^n × R^n, parametrized by t, called the central path. (See Boyd and Vandenberghe, 2004, Chap. 11 for more on the central path and its properties.) With the point (v⋆(t), w⋆(t), u⋆(t)) we associate θ⋆(t) = (1/m)(1 − plog(v⋆(t), w⋆(t))), which can be shown to be dual feasible. ...

3995 | Regression shrinkage and selection via the lasso
- Tibshirani
- 1996
Citation Context: ...used in other contexts such as geophysics (Claerbout and Muir, 1973; Taylor et al., 1979; Levy and Fullagar, 1981; Oldenburg et al., 1983). In statistics, it is used in the well-known Lasso algorithm (Tibshirani, 1996) for ℓ1-regularized linear regression, and its extensions such as the fused Lasso (Tibshirani et al., 2005), the grouped Lasso (Kim et al., 2006; Yuan and Lin, 2006; Zhao et al., 2007), and the monotone Lasso ...

3542 | Compressed sensing - Donoho - 2006

3379 | The elements of statistical learning
- Hastie, Tibshirani, et al.
- 2001
Citation Context: ...smooth convex optimization problem, and can be solved by a wide variety of methods, such as gradient descent, steepest descent, Newton, quasi-Newton, or conjugate-gradients (CG) methods (see, for example, Hastie et al., 2001, §4.4). Once we find maximum likelihood values of v and w, that is, a solution of (2), we can predict the probability of the two possible outcomes, given a new feature vector x ∈ R^n, using the ass...
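The snippet above notes that the logistic maximum-likelihood problem can be solved by plain gradient methods. A minimal illustrative sketch, not the paper's code (labels taken in {−1, +1}; all names are ours):

```python
# Hedged sketch: fit (v, w) by gradient descent on the average logistic loss
# lavg(v, w) = (1/m) * sum_i log(1 + exp(-y_i (w^T x_i + v))).
import numpy as np

def fit_logistic(X, y, lr=0.5, iters=2000):
    """X: (m, n) features; y: (m,) labels in {-1, +1}. Returns (v, w)."""
    m, n = X.shape
    v, w = 0.0, np.zeros(n)
    for _ in range(iters):
        z = y * (X @ w + v)          # margins y_i (w^T x_i + v)
        s = -y / (1.0 + np.exp(z))   # chain rule: d/dz log(1 + exp(-z)) times y_i
        v -= lr * s.mean()           # gradient step on the intercept
        w -= lr * (X.T @ s) / m      # gradient step on the weights
    return v, w
```

On linearly separable toy data this drives all training margins positive; a real solver would add regularization and a convergence test.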

3220 | Numerical Optimization
- Nocedal, Wright
- 1999
Citation Context: ...than 10%) at early iterations, and solve it more accurately as the duality gap decreases. This adaptive rule is similar in spirit to standard methods used in inexact and truncated Newton methods (see Nocedal and Wright, 1999). The computational effort of the truncated Newton interior-point algorithm is the product of s, the total number of PCG steps required over all iterations, and the cost of a PCG step, which is O(p), ...

3039 | Generalized Linear Models
- McCullagh, Nelder
- 1983
Citation Context: ...bution, this ℓ1-regularized classification problem reduces to the ℓ1-regularized probit regression problem. More generally, ℓ1-regularized estimation problems that arise in generalized linear models (McCullagh and Nelder, 1989; Hastie et al., 2001) for binary response variables (which include logistic and probit models) can be formulated as problems of the form (19); see Park and Hastie (2006a) for the precise formulation. In...

2681 | Atomic decomposition by basis pursuit - Chen, Donoho, et al. - 1998

2559 | Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information - Candès, Romberg, et al. - 2006

1724 | Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
- Golub
- 1999
Citation Context: ...e four small or medium standard data sets taken from the UCI machine learning benchmark repository (Newman et al., 1998) and other sources. The first data set is leukemia cancer gene expression data (Golub et al., 1999), the second is colon tumor gene expression data (Alon et al., 1999), the third is ionosphere data (Newman et al., 1998), and the fourth is spambase data (Newman et al., 1998). For each data set, we ...

1475 | Linear and Nonlinear Programming - Luenberger - 1984

1402 | An introduction to compressive sampling
- Candes, Wakin
- 2008
Citation Context: ...and statistics. In signal processing, the idea of ℓ1 regularization comes up in several contexts including basis pursuit denoising [8] and a signal recovery method from incomplete measurements (e.g., [4, 7, 6, 11, 12, 52, 51]). In statistics, the idea of ℓ1 regularization is used in the well-known Lasso algorithm [50] for feature selection and its extensions including the elastic net [60]. Some of these problems do not ha...

1367 | Decoding by linear programming - Candes, Tao - 2005

1360 | Stable signal recovery from incomplete and inaccurate information
- Candes, Romberg, et al.
Citation Context: ...and statistics. In signal processing, the idea of ℓ1 regularization comes up in several contexts including basis pursuit denoising [8] and a signal recovery method from incomplete measurements (e.g., [4, 7, 6, 11, 12, 52, 51]). In statistics, the idea of ℓ1 regularization is used in the well-known Lasso algorithm [50] for feature selection and its extensions including the elastic net [60]. Some of these problems do not ha...

1286 | Least angle regression - Efron, Hastie, et al. - 2004

1126 | Model selection and estimation in regression with grouped variables - Yuan, Lin - 2007

1122 | Nonlinear Programming. Athena Scientific
- Bertsekas
- 1999
Citation Context: ...ity Conditions The objective function of the ℓ1-regularized LRP, lavg(v,w) + λ‖w‖₁, is convex but not differentiable, so we use a first-order optimality condition based on subdifferential calculus (see Bertsekas, 1999, Prop. B.24 or Borwein and Lewis, 2000, §2). The average logistic loss is differentiable, with ∇_v lavg(v,w) = (1/m) ∑_{i=1}^m f′(w^T a_i + v b_i) b_i = −(1/m) ∑_{i=1}^m (1 − plog(v,w)_i) b_i = −(1/m) b^T (1 − plog(v,w)) ...

1028 | Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays
- Alon
- 1999
Citation Context: ...learning benchmark repository (Newman et al., 1998) and other sources. The first data set is leukemia cancer gene expression data (Golub et al., 1999), the second is colon tumor gene expression data (Alon et al., 1999), the third is ionosphere data (Newman et al., 1998), and the fourth is spambase data (Newman et al., 1998). For each data set, we considered four values of the regularization parameter: λ = 0.5λmax, ...

822 | Regression models and life-tables
- Cox
- 1972
Citation Context: ...m interior-point method for ℓ1-regularized LRPs to general ℓ1-regularized convex (twice differentiable) loss problems. Other possible extensions include ℓ1-regularized Cox proportional hazards models (Cox, 1972). The associated ℓ1-regularized problem does not have the form (19), but the idea behind the custom interior-point method for ℓ1-regularized LRPs can be readily extended. The reader is referred to Pa...

820 | UCI repository of machine learning databases
- Newman, Hettich, et al.
- 1998
Citation Context: ...oblems), is available online (www.stanford.edu/~boyd/l1_logreg). 4.1 Benchmark Problems The data are four small or medium standard data sets taken from the UCI machine learning benchmark repository (Newman et al., 1998) and other sources. The first data set is leukemia cancer gene expression data (Golub et al., 1999), the second is colon tumor gene expression data (Alon et al., 1999), the third is ionosphere data (...

752 | Applied Numerical Linear Algebra
- Demmel
- 1997
Citation Context: ...uncated Newton methods have been applied to interior-point methods (see, for example, Vandenberghe and Boyd, 1995 and Portugal et al., 2000). 5.1 Preconditioned Conjugate Gradients The PCG algorithm (Demmel, 1997, §6.6) computes an approximate solution of the linear equations Hx = −g, where H ∈ R^{N×N} is symmetric positive definite. It uses a preconditioner P ∈ R^{N×N}, also symmetric positive definite. ...
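As a hedged, textbook-style sketch of the PCG iteration this snippet describes (solving Hx = −g with a user-supplied preconditioner; illustrative names, not the paper's implementation):

```python
# Preconditioned conjugate gradients for H x = -g, H symmetric positive definite.
# P_inv is a callable applying the inverse of the preconditioner P to a vector.
import numpy as np

def pcg(H, g, P_inv, tol=1e-8, max_iter=200):
    b = -g
    x = np.zeros_like(b)
    r = b - H @ x                 # residual
    z = P_inv(r)                  # preconditioned residual
    p = z.copy()                  # search direction
    rz = r @ z
    for _ in range(max_iter):
        Hp = H @ p
        alpha = rz / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        if np.linalg.norm(r) < tol:
            break
        z = P_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

A diagonal (Jacobi) preconditioner, `P_inv = lambda r: r / np.diag(H)`, is a common cheap choice.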

732 | An iterative thresholding algorithm for linear inverse problems with a sparsity constraint
- Daubechies, Defrise, et al.
- 2004
Citation Context: ...computational methods for ℓ1-regularized LSPs include coordinate-wise descent methods [19], sequential subspace optimization methods [34], bound optimization methods [17], iterated shrinkage methods [9, 16], gradient methods [35], and gradient projection algorithms [18]. Some of these methods including the gradient projection algorithms [18] can handle efficiently very large problems. 1.3 Outline The ...

648 | The adaptive Lasso and its oracle properties - Zou - 2006

616 | Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization - Donoho, Elad - 2003

600 | Computer Solution of Large Sparse Positive Definite Systems
- George, Liu
- 1981
Citation Context: ...loiting sparsity in forming tA^T D0 A + D3, and using a sparse Cholesky factorization to factor Hred, the complexity can be much smaller than O(mn²) flops (see Boyd and Vandenberghe, 2004, App. C or George and Liu, 1981). 3.4.2 FEWER EXAMPLES THAN FEATURES When m ≤ n, that is, there are fewer examples than features, the matrix Hred is a diagonal matrix plus a rank m + 1 matrix, so we can use the Sherman-Morrison-Woo...
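The snippet above invokes the Sherman-Morrison-Woodbury formula for the diagonal-plus-low-rank case m ≤ n. A minimal numerical sketch of the idea, with illustrative names rather than the paper's Hred:

```python
# Solve (diag(d) + U U^T) x = b via the Sherman-Morrison-Woodbury formula,
# at O(n k^2) cost for thin U (n x k) instead of forming the full n x n matrix.
import numpy as np

def solve_diag_plus_lowrank(d, U, b):
    Dinv_b = b / d                       # D^{-1} b
    Dinv_U = U / d[:, None]              # D^{-1} U
    k = U.shape[1]
    small = np.eye(k) + U.T @ Dinv_U     # k x k "capacitance" matrix
    return Dinv_b - Dinv_U @ np.linalg.solve(small, U.T @ Dinv_b)
```

The k×k solve replaces an n×n factorization, which is the source of the speedup when k ≪ n.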

550 | NewsWeeder: Learning to filter netnews
- Lang
- 1995
Citation Context: ...basic interior-point method that uses sparse linear algebra methods, when the regularization parameter is not too small. 5.3.2 A LARGE SPARSE PROBLEM Our next example uses the 20 Newsgroups data set (Lang, 1995). We processed the data set in a way similar to Keerthi and DeCoste (2005). The positive class consists of the 10 groups with names of form sci.*, comp.*, and misc.forsale, and the negative class con...

520 | Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems
- Figueiredo, Novak, et al.
- 2007
Citation Context: ...e descent methods [19], sequential subspace optimization methods [34], bound optimization methods [17], iterated shrinkage methods [9, 16], gradient methods [35], and gradient projection algorithms [18]. Some of these methods including the gradient projection algorithms [18] can handle efficiently very large problems. 1.3 Outline The main purpose of this paper is to describe a specialized interior-p...

478 | Just relax: convex programming methods for identifying sparse signals in noise - Tropp - 2006

453 | Stable recovery of sparse overcomplete representations in the presence of noise
- Donoho, Elad, et al.
Citation Context: ...and statistics. In signal processing, the idea of ℓ1 regularization comes up in several contexts including basis pursuit denoising [8] and a signal recovery method from incomplete measurements (e.g., [4, 7, 6, 11, 12, 52, 51]). In statistics, the idea of ℓ1 regularization is used in the well-known Lasso algorithm [50] for feature selection and its extensions including the elastic net [60]. Some of these problems do not ha...

445 | On model selection consistency of lasso - Zhao, Yu - 2006

416 | Statistical inference under order restrictions
- Barlow, Bremner, et al.
- 1972
Citation Context: ...minimize ‖Ax − y‖₂² + λ‖x‖₁ subject to x1 ≤ · · · ≤ xn. This problem is related to the monotone Lasso [24], which is an extension of isotonic regression, which has been extensively studied in statistics [1, 43]. The ℓ1-regularized isotonic regression problem arises in several contexts. As an example, the problem of finding monotone trends in a Gaussian setting can be cast as a problem of this form [47, 21]. ...

385 |
Minimization methods for nondifferentiable functions
- Shor
- 1985
Citation Context: ...Zhao and Yu (2006), and Zou et al. (2007). To solve the ℓ1-regularized LRP (5), generic methods for nondifferentiable convex problems can be used, such as the ellipsoid method or subgradient methods (Shor, 1985; Polyak, 1987). These methods are usually very slow in practice, however. (Because ℓ1-regularized LR typically results in a weight vector with (many) zero components, we cannot simply ignore the nond...

317 | Sparsity and smoothness via the fused lasso - Tibshirani, Saunders, et al. - 2005

294 | Wavelet Shrinkage: Asymptopia
- Donoho, Johnstone, et al.
- 1993
Citation Context: ...2001), signal recovery from incomplete measurements (Candès et al., 2006, 2005; Donoho, 2006), and wavelet thresholding (Donoho et al., 1995), decoding of linear codes (Candès and Tao, 2005), portfolio optimization (Lobo et al., 2005), controller design (Hassibi et al., 1999), computer-aided design of integrated circuits (Boyd et al., 200...

279 | Interior Point Algorithms: Theory and Analysis - Ye - 1997

271 | A Direct Formulation for Sparse PCA Using Semidefinite Programming - d’Aspremont, Ghaoui, et al.

264 | Sparse principal component analysis - Zou, Hastie, et al. - 2006

255 | Introduction to Optimization
- Polyak
- 1987
Citation Context: ...(2006), and Zou et al. (2007). To solve the ℓ1-regularized LRP (5), generic methods for nondifferentiable convex problems can be used, such as the ellipsoid method or subgradient methods (Shor, 1985; Polyak, 1987). These methods are usually very slow in practice, however. (Because ℓ1-regularized LR typically results in a weight vector with (many) zero components, we cannot simply ignore the nondifferentiabili...

239 | A new approach to variable selection in least squares problems - Osborne, Presnell, et al. - 2000

196 | Convex Analysis and Nonlinear Optimization
- Borwein, Lewis
- 2000
Citation Context: ...nction of the ℓ1-regularized LRP, lavg(v,w) + λ‖w‖₁, is convex but not differentiable, so we use a first-order optimality condition based on subdifferential calculus (see Bertsekas, 1999, Prop. B.24 or Borwein and Lewis, 2000, §2). The average logistic loss is differentiable, with ∇_v lavg(v,w) = (1/m) ∑_{i=1}^m f′(w^T a_i + v b_i) b_i = −(1/m) ∑_{i=1}^m (1 − plog(v,w)_i) b_i = −(1/m) b^T (1 − plog(v,w)), and ∇_w ...
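The gradient expression quoted in this snippet can be verified numerically. Below is a hedged sketch comparing the analytic ∂lavg/∂v, for lavg(v,w) = (1/m) ∑ f(wᵀaᵢ + vbᵢ) with f(z) = log(1 + exp(−z)), against a central finite difference on synthetic data (all names are ours):

```python
# Numerical check of d/dv lavg = (1/m) sum_i f'(w^T a_i + v b_i) b_i,
# where f(z) = log(1 + exp(-z)) so f'(z) = -1 / (1 + exp(z)).
import numpy as np

def lavg(v, w, A, b):
    return np.mean(np.log1p(np.exp(-(A @ w + v * b))))

def grad_v(v, w, A, b):
    z = A @ w + v * b
    return np.mean(-b / (1.0 + np.exp(z)))
```

A central difference with step h agrees with `grad_v` to roughly O(h²), which is a quick sanity check on the sign conventions above.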

196 | The entire regularization path for the support vector machine - Hastie, Rosset, et al. - 2004

189 | Sparse multinomial logistic regression: Fast algorithms and generalization bounds - Krishnapuram, Carin, et al. - 2005

184 | Large-scale Bayesian logistic regression for text categorization
- Genkin, Lewis, et al.
- 2007
Citation Context: ...an, 2004), bound optimization algorithms (Krishnapuram et al., 2005), online algorithms (Balakrishnan and Madigan, 2006; Perkins and Theiler, 2003), coordinate descent methods (Friedman et al., 2007; Genkin et al., 2006), and the Gauss-Seidel method (Shevade and Keerthi, 2003). Some of these methods can handle very large problems (assuming some sparsity in the data) with modest accuracy. But the additional computati...

176 | A toolkit for statistical language modeling, text retrieval, classification and clustering (http://www.cs.cmu.edu/~mccallum/bow) - McCallum - 1996

170 | 1-norm support vector machine
- Zhu, Rosset, et al.
- 2003
Citation Context: ...selection (Wainwright et al., 2007), maximum likelihood estimation of graphical models (Banerjee et al., 2006; Dahl et al., 2005), boosting (Rosset et al., 2004), and ℓ1-norm support vector machines (Zhu et al., 2004). A recent survey of the idea can be found in Tropp (2006). Donoho and Elad (2003) and Tropp (2006) give some theoretical analysis of why ℓ1 regularization leads to a sparse model in linear regressio...

155 | Basis pursuit
- Chen
- 1995
Citation Context: ...et al., 2005), the grouped Lasso (Kim et al., 2006; Yuan and Lin, 2006; Zhao et al., 2007), and the monotone Lasso (Hastie et al., 2007). The idea also comes up in signal processing in basis pursuit (Chen and Donoho, 1994; Chen et al., 2001), signal recovery from incomplete measurements (Candès et al., 2006, 2005; Donoho, 2006), and wavel...

155 | Bayesian parameter estimation via variational methods - Jaakkola, Jordan - 2000

134 | Recovery of exact sparse representations in the presence of bounded noise
- Fuchs
- 2005
Citation Context: ...n the number of observations is smaller than the number of features [15, 50]. Recently, theoretical properties of ℓ1-regularized linear regression have been studied by several researchers; see, e.g., [20, 28, 32, 54, 59, 58]. 3 Optimality conditions and dual problem In this section, we give some preliminaries needed later. 3.1 Optimality conditions The objective function of the ℓ1-regularized LSP (4) is convex but not di...

133 | Piecewise linear regularized solution paths (Annals of Statistics) - Rosset, Zhu - 2004

122 | A tutorial on geometric programming
- Boyd, Kim, et al.
- 2007
Citation Context: ...n to be quite efficient compared to other standard solvers. MOSEK can solve ℓ1-regularized LRPs using the separable convex formulation (9), or by treating the problem as a geometric program (GP) (see Boyd et al., 2006). We used both formulations and report the better results here in each case. MOSEK uses a stopping criterion based on the duality gap, like our method. IRLS-LARS alternates between approximating the ...

121 | Modern Regression Methods
- Ryan
- 1997
Citation Context: ...Appendix A. Standardization Standardization is a widely used pre-processing step applied to the feature vector, so that each (transformed) feature has zero mean and unit variance (over the examples) (Ryan, 1997). The mean feature vector is µ = (1/m) ∑_{i=1}^m x_i, and the vector of feature standard deviations σ is defined by σ_j = ((1/m) ∑_{i=1}^m (x_{ij} − µ_j)²)^{1/2}, j = 1, ..., n, where x_{ij} is the jth compon...
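The standardization step this snippet describes is straightforward to sketch; a minimal illustration using the population standard deviation, as in the σ_j formula above (names are ours):

```python
# Standardize each feature to zero mean and unit variance over the m examples.
import numpy as np

def standardize(X):
    """X: (m, n). Returns (X_std, mu, sigma)."""
    mu = X.mean(axis=0)                            # mu = (1/m) sum_i x_i
    sigma = np.sqrt(((X - mu) ** 2).mean(axis=0))  # population std per feature
    return (X - mu) / sigma, mu, sigma
```

Note this uses the 1/m normalization of the text rather than the 1/(m−1) sample variance.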

119 | Grouped and hierarchical model selection through composite absolute penalties
- Zhao, Rocha, et al.
- 2006
Citation Context: ...Lasso algorithm (Tibshirani, 1996) for ℓ1-regularized linear regression, and its extensions such as the fused Lasso (Tibshirani et al., 2005), the grouped Lasso (Kim et al., 2006; Yuan and Lin, 2006; Zhao et al., 2007), and the monotone Lasso (Hastie et al., 2007). The idea also comes up in signal processing in basis pursuit (Chen and Donoho, 1994; Chen et al., ...

113 | User’s Guide for NPSOL (Version 4.0): a Fortran package for nonlinear programming
- Gill, Murray, et al.
- 1986
Citation Context: ...ers that can directly handle the problem (6) (and therefore, carry out ℓ1-regularized LR) include for example LOQO (Vanderbei, 1997), LANCELOT (Conn et al., 1992), MOSEK (MOSEK ApS, 2002), and NPSOL (Gill et al., 1986). These general purpose solvers can solve small and medium scale ℓ1-regularized LRPs quite effectively. Other recently developed computational methods for ℓ1-regularized logistic regression include t...

113 | Nonlinear Optimization
- Ruszczyński
- 2006
Citation Context: ...search direction in Newton’s method is computed approximately, using an iterative method such as PCG, the overall algorithm is called a conjugate gradient Newton method, or a truncated Newton method (Ruszczynski, 2006; Dembo and Steihaug, 1983). Truncated Newton methods have been applied to interior-point methods (see, for example, Vandenberghe and Boyd, 1995 and Portugal et al., 2000). 5.1 Preconditioned Conjugat...

109 | LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization - Conn, Gould, Toint - 1992

108 | A modified finite Newton method for fast solution of large scale linear SVMs - Keerthi, DeCoste

108 | A comparison of numerical optimizers for logistic regression (http://research.microsoft.com/en-us/um/people/minka/papers/logreg/minka-logreg.pdf) - Minka - 2003

104 | A simple and efficient algorithm for gene selection using sparse logistic regression - Shevade, Keerthi - 2003

101 | A primal-dual potential reduction method for problems involving matrix inequalities, Mathematical Programming 69
- Vandenberghe, Boyd
- 1995
Citation Context: ...a conjugate gradient Newton method, or a truncated Newton method (Ruszczynski, 2006; Dembo and Steihaug, 1983). Truncated Newton methods have been applied to interior-point methods (see, for example, Vandenberghe and Boyd, 1995 and Portugal et al., 2000). 5.1 Preconditioned Conjugate Gradients The PCG algorithm (Demmel, 1997, §6.6) computes an approximate solution of the linear equations Hx = −g, where H ∈ R^{N×N} is symmetri...

99 | High-dimensional graphical model selection using ℓ1-regularized logistic regression
- Wainwright, Ravikumar, et al.
- 2007
Citation Context: ...s than one with nonsparse w. It is not surprising that ℓ1-regularized LR can outperform ℓ2-regularized LR, especially when the number of observations is smaller than the number of features (Ng, 2004; Wainwright et al., 2007). We refer to the number of nonzero components in w as its cardinality, denoted card(w). Thus, ℓ1-regularized LR tends to yield w with card(w) small; the regularization parameter λ roughly controls c...

96 | Trust region Newton method for large-scale logistic regression - Lin, Weng, et al. - 2008

94 | Numerical path following - Allgower, Georg - 1997

82 | A survey of truncated-Newton methods - Nash - 2000

73 | Exponential priors for maximum entropy models
- Goodman
Citation Context: ...Lee et al., 2006; Lokhorst, 1999), a generalized LASSO method (Roth, 2004) that extends the LASSO method proposed in Osborne et al. (2000) to generalized linear models, generalized iterative scaling (Goodman, 2004), bound optimization algorithms (Krishnapuram et al., 2005), online algorithms (Balakrishnan and Madigan, 2006; Perkins and Theiler, 2003), coordinate descent methods (Friedman et al., 2007; Genkin e...

73 | A bound optimization approach to wavelet-based image deconvolution
- Figueiredo, Nowak
Citation Context: ...problems. Other recently developed computational methods for ℓ1-regularized LSPs include coordinate-wise descent methods [19], sequential subspace optimization methods [34], bound optimization methods [17], iterated shrinkage methods [9, 16], gradient methods [35], and gradient projection algorithms [18]. Some of these methods including the gradient projection algorithms [18] can handle efficiently v...

72 | The generalized LASSO - Roth - 2004

70 | Fast solution of l1-norm minimization problems when the solution may be sparse
- Donoho, Tsaig
- 2006
Citation Context: ...hms, such as the conjugate gradients (CG) or LSQR algorithm [40], to compute the search step. Recently, several researchers have proposed homotopy methods and variants for solving ℓ1-regularized LSPs [14, 23, 15, 44, 39]. Using the piecewise linear property of the regularization path, path-following methods can compute efficiently the entire solution path in an ℓ1-regularized LSP. When the solution of (15) is extreme...

68 | Efficient l1 regularized logistic regression
- Lee, Lee, et al.
Citation Context: ...purpose solvers can solve small and medium scale ℓ1-regularized LRPs quite effectively. Other recently developed computational methods for ℓ1-regularized logistic regression include the IRLS method (Lee et al., 2006; Lokhorst, 1999), a generalized LASSO method (Roth, 2004) that extends the LASSO method proposed in Osborne et al. (2000) to generalized linear models, generalized iterative scaling (Goodman, 2004), ...

65 | Blockwise sparse regression - Kim, Kim, et al.

62 | Convex optimization techniques for fitting sparse Gaussian graphical models - Banerjee, Ghaoui, et al. - 2006

50 | An l1 regularization-path algorithm for generalized linear models (unpublished) - Park, Hastie - 2006

42 | Recovery of the acoustic impedance from reflection seismograms
- Oldenburg, Scheuer, et al.
- 1983
Citation Context: ...f model or feature selection (or just sparsity of solution) is quite old, and widely used in other contexts such as geophysics (Claerbout and Muir, 1973; Taylor et al., 1979; Levy and Fullagar, 1981; Oldenburg et al., 1983). In statistics, it is used in the well-known Lasso algorithm (Tibshirani, 1996) for ℓ1-regularized linear regression, and its extensions such as the fused Lasso (Tibshirani et al., 2005), the groupe...

40 | Image denoising with shrinkage and redundant representations
- Elad, Matalon, et al.
- 2006
Citation Context: ...computational methods for ℓ1-regularized LSPs include coordinate-wise descent methods [19], sequential subspace optimization methods [34], bound optimization methods [17], iterated shrinkage methods [9, 16], gradient methods [35], and gradient projection algorithms [18]. Some of these methods including the gradient projection algorithms [18] can handle efficiently very large problems. 1.3 Outline The ...

39 | Reconstruction of a sparse spike train from a portion of its spectrum and application to high-resolution deconvolution
- Levy, Fullagar
- 1981
Citation Context: ...zation for the purposes of model or feature selection (or just sparsity of solution) is quite old, and widely used in other contexts such as geophysics (Claerbout and Muir, 1973; Taylor et al., 1979; Levy and Fullagar, 1981; Oldenburg et al., 1983). In statistics, it is used in the well-known Lasso algorithm (Tibshirani, 1996) for ℓ1-regularized linear regression, and its extensions such as the fused Lasso (Tibshirani e...

38 | Online Feature Selection using Grafting
- Perkins, Theiler
- 2003
Citation Context: ...t al. (2000) to generalized linear models, generalized iterative scaling (Goodman, 2004), bound optimization algorithms (Krishnapuram et al., 2005), online algorithms (Balakrishnan and Madigan, 2006; Perkins and Theiler, 2003), coordinate descent methods (Friedman et al., 2007; Genkin et al., 2006), and the Gauss-Seidel method (Shevade and Keerthi, 2003). Some of these methods can handle very large problems (assuming some...

34 | Tracking curved regularized optimization solution paths
- Rosset
- 2005
Citation Context: ...r vice versa. Using this fact, the entire regularization path in a (small or medium size) ℓ1-regularized linear regression problem can be computed efficiently (Hastie et al., 2004; Efron et al., 2004; Rosset, 2005; Rosset and Zhu, 2007; Osborne et al., 2000). These methods are related to numerical continuation techniques for following piecewise smooth curves, which have been well studied in the optimization li...

31 | Low-authority controller design via convex optimization
- Hassibi, How, et al.
- 1999
Citation Context: ...ndès et al., 2006, 2005; Donoho, 2006), and wavelet thresholding (Donoho et al., 1995), decoding of linear codes (Candès and Tao, 2005), portfolio optimization (Lobo et al., 2005), controller design (Hassibi et al., 1999), computer-aided design of integrated circuits (Boyd et al., 2001), computer vision (Bhusnurmath and Taylor, 2007), sparse principal component analysis (d’Aspremont et al., 2005; Zou et al., 2006), g...

31 | Primal-dual interior-point methods (SIAM) - Wright - 1997

30 | A truncated primal-infeasible dual-feasible network interior point method (Networks 35) - Portugal, Resende, et al. - 2000

25 | Forward Stagewise Regression and the Monotone Lasso - Hastie, Taylor, et al.

25 | Deconvolution with the l1 norm
- Taylor, Banks, et al.
- 1979
Citation Context: ...l idea of ℓ1 regularization for the purposes of model or feature selection (or just sparsity of solution) is quite old, and widely used in other contexts such as geophysics (Claerbout and Muir, 1973; Taylor et al., 1979; Levy and Fullagar, 1981; Oldenburg et al., 1983). In statistics, it is used in the well-known Lasso algorithm (Tibshirani, 1996) for ℓ1-regularized linear regression, and its extensions such as the ...

23 | Robust modeling of erratic data
- Claerbout, Muir
- 1973
Citation Context: ...maller card(w). The general idea of ℓ1 regularization for the purposes of model or feature selection (or just sparsity of solution) is quite old, and widely used in other contexts such as geophysics (Claerbout and Muir, 1973; Taylor et al., 1979; Levy and Fullagar, 1981; Oldenburg et al., 1983). In statistics, it is used in the well-known Lasso algorithm (Tibshirani, 1996) for ℓ1-regularized linear regression, and its ex...

21 | Algorithms for sparse linear classifiers in the massive data setting - Balakrishnan, Madigan

21 | Feature selection, ℓ1 vs. ℓ2 regularization, and rotational invariance
- Ng
- 2004
Citation Context: ...arsimonious than one with nonsparse w. It is not surprising that ℓ1-regularized LR can outperform ℓ2-regularized LR, especially when the number of observations is smaller than the number of features (Ng, 2004; Wainwright et al., 2007). We refer to the number of nonzero components in w as its cardinality, denoted card(w). Thus, ℓ1-regularized LR tends to yield w with card(w) small; the regularization param...

18 | Design of robust global power and ground networks - Boyd, Vandenberghe, et al.

17 | Maximum likelihood estimation of Gaussian graphical models: numerical implementation and topology selection (preprint)
- Dahl, Roychowdhury, et al.
- 2005
Citation Context: ...incipal component analysis (d’Aspremont et al., 2005; Zou et al., 2006), graphical model selection (Wainwright et al., 2007), maximum likelihood estimation of graphical models (Banerjee et al., 2006; Dahl et al., 2005), boosting (Rosset et al., 2004), and ℓ1-norm support vector machines (Zhu et al., 2004). A recent survey of the idea can be found in Tropp (2006). Donoho and Elad (2003) and Tropp (2006) give some t...

17 | Logistic regression for data mining and high-dimensional classification
- Komarek
- 2004
Citation Context: ...l and Wright (1999), and Nash (2000). Other methods that have been used include optimization transfer (Krishnapuram and Hartemink, 2005; Zhang et al., 2004) and iteratively re-weighted least squares (Komarek, 2004). Newton’s method and variants are very effective for small and medium sized problems, while conjugate-gradients and limited memory Newton (or truncated Newton) methods can handle very large problems...

17 | Interior point methodology for 3-D PET reconstruction
- Johnson, Seidel, et al.
- 2000
Citation Context: ...ms in which there are fast algorithms for the matrix-vector operations with A and A^T. Specialized interior-point methods that exploit such algorithms can scale to large problems, as demonstrated in [8, 26]. High-quality implementations of specialized interior-point methods include l1-magic [5] and PDCO [48], which use iterative algorithms, such as the conjugate gradients (CG) or LSQR algorithm [40], to...

15 | Regularization path algorithms for detecting gene interactions - Park, Hastie - 2006

15 | L1-magic: A Collection of MATLAB Routines for Solving the Convex Optimization Programs Central to Compressive Sampling, 2006 [Online]. Available: www.acm.caltech.edu/l1magic
- Candès, Romberg
Citation Context: ...alized interior-point methods that exploit such algorithms can scale to large problems, as demonstrated in [8, 26]. High-quality implementations of specialized interior-point methods include l1-magic [5] and PDCO [48], which use iterative algorithms, such as the conjugate gradients (CG) or LSQR algorithm [40], to compute the search step. Recently, several researchers have proposed homotopy methods an...

13 |
The lasso and generalised linear models
- Lokhorst
- 1999
Citation Context ...an solve small and medium scale ℓ1-regularized LRPs quite effectively. Other recently developed computational methods for ℓ1-regularized logistic regression include the IRLS method (Lee et al., 2006; Lokhorst, 1999), a generalized LASSO method (Roth, 2004) that extends the LASSO method proposed in Osborne et al. (2000) to generalized linear models, generalized iterative scaling (Goodman, 2004), bound optimizati... |

13 |
2004: Monotonic regression filters for trending gradual deterioration faults
- Gorinevsky
Citation Context ...s [1, 43]. The ℓ1-regularized isotonic regression problem arises in several contexts. As an example, the problem of finding monotone trends in a Gaussian setting can be cast as a problem of this form [47, 21]. As another example, the problem of estimating accumulating damage trend from a series of structural health monitoring (SHM) images can be formulated as a problem of the form in which the variables a... |

11 | The MOSEK Optimization Tools Version 2.5. User’s Manual and Reference, 2002. Available from www.mosek.com - ApS |

9 | Surrogate maximization/minimization algorithms and extensions
- Zhang, Kwok, et al.
- 2007
Citation Context ...ple, Luenberger (1984), Lin et al. (2007), Minka (2003), Nocedal and Wright (1999), and Nash (2000). Other methods that have been used include optimization transfer (Krishnapuram and Hartemink, 2005; Zhang et al., 2004) and iteratively re-weighted least squares (Komarek, 2004). Newton’s method and variants are very effective for small and medium sized problems, while conjugate-gradients and limited memory Newton (o... |

5 | Optimal estimation of deterioration from diagnostic image sequence
- Gorinevsky, Kim, et al.
Citation Context ... accumulating damage trend from a series of structural health monitoring (SHM) images can be formulated as a problem of the form in which the variables are three-dimensional (a time series of images) [22]. We take the change of variables z1 = x1, zk = xk − xk−1, k = 2, . . ., n, to reformulate the isotonic regression problem as an ℓ1-regularized LSP with monotonicity constraints: minimi... |
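The change of variables described in this context can be sketched directly: under z1 = x1, zk = xk − xk−1, monotonicity of x (x1 ≤ x2 ≤ … ≤ xn) becomes plain nonnegativity of zk for k ≥ 2, and the inverse map is a cumulative sum. A minimal sketch (the helper names are ours):

```python
def to_increments(x):
    """Change of variables: z1 = x1, zk = xk - x_{k-1} for k >= 2."""
    return [x[0]] + [x[k] - x[k - 1] for k in range(1, len(x))]

def from_increments(z):
    """Inverse map: xk = z1 + ... + zk (cumulative sum)."""
    x, total = [], 0.0
    for zk in z:
        total += zk
        x.append(total)
    return x
```

With this substitution, the ℓ1 penalty on the differences xk − xk−1 becomes a linear term over the nonnegative variables z2, …, zn, which is what turns the isotonic problem into the constrained ℓ1-regularized LSP mentioned above.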

3 |
SparseLab: Seeking Sparse Solutions to Linear Systems of Equations 2006 [Online]. Available: http://sparselab.stanford.edu
- Donoho, Stodden, et al.
Citation Context ...PDCO-LSQR is a Matlab implementation of a primal-dual interior-point method [48] that uses the LSQR algorithm [40] to compute the search direction. (PDCO-CHOL and PDCO-LSQR are available in SparseLab [13], a library of Matlab routines for finding sparse solutions to underdetermined systems.) l1-magic [5] is a Matlab package devoted to compressed sensing problems. PDCO-LSQR and l1-magic implement inter... |

2 | Pathwise coordinate optimization, 2007. Manuscript available from www-stat.stanford.edu/˜hastie/pub.htm - Friedman, T, et al. |

2 | Efficient minimization methods of mixed ℓ1-ℓ1 and ℓ2-ℓ1 norms for image restoration - Fu, Ng, et al. |

2 | The Lasso for variable selection in the Cox model - Tibshirani - 1997 |

2 |
Iterative Methods for Linear and Nonlinear Equations
- Kelley
- 1995
Citation Context ...to the Newton system (17). The PCG algorithm uses a preconditioner P ∈ R2n×2n , which is symmetric and positive definite. We will not go into the details of the PCG algorithm, and refer the reader to [27], [46, §6.7], or [38, §5]. The preconditioner used in the PCG algorithm approximates the Hessian of t‖Ax − y‖₂² with its diagonal entries, while retaining the Hessian of the logarithmic barrier Φ(x, ... |
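The PCG iteration referred to in this context can be sketched generically. Below is a textbook preconditioned conjugate-gradient loop with a caller-supplied matrix-vector product and preconditioner; this is a generic sketch under our own naming, not the cited paper's implementation:

```python
def pcg(matvec, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients for Ax = b, A symmetric PD.

    matvec(v) applies A to v; precond(r) applies P^{-1} to r.
    Returns the approximate solution x (lists of floats throughout).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual r = b - A x, with x = 0
    z = precond(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / sum(pi * qi for pi, qi in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x
```

In the setting described above, `matvec` would apply the Newton-system coefficient matrix and `precond` would invert a diagonal approximation of the data-fitting Hessian combined with the exact Hessian of the log barrier; the usage below just demonstrates the loop on a small SPD system with a Jacobi (diagonal) preconditioner.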

1 |
Solving the graph cut problem via ℓ1 norm minimization
- Bhusnurmath, Taylor
- 2007
Citation Context ... codes (Candès and Tao, 2005), portfolio optimization (Lobo et al., 2005), controller design (Hassibi et al., 1999), computer-aided design of integrated circuits (Boyd et al., 2001), computer vision (Bhusnurmath and Taylor, 2007), sparse principal component analysis (d’Aspremont et al., 2005; Zou et al., 2006), graphical model selection (Wainwright et al., 2007), maximum likelihood estimation of graphical models (Banerjee et... |