
## A quasi-Newton proximal splitting method

Venue: Advances in Neural Information Processing Systems (NIPS)

Citations: 8 (0 self)

### Citations

5217 | Convex analysis
- Rockafellar
- 1970
Citation Context: ...ex analysis We here collect some results from convex analysis that are key for our proof. Some lemmata are listed without proof and can be either easily proved or found in standard references such as [31, 1]. A.1 Background Functions. Definition 13 (Indicator function). Let C be a nonempty subset of H. The indicator function ıC of C is ıC(x) = 0 if x ∈ C, and +∞ otherwise; dom(ıC) = C. Definition 14 (Infima...
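The indicator function defined in this snippet is the standard way of encoding a constraint set, and its proximity operator is the Euclidean projection onto the set. A minimal sketch, assuming NumPy and an illustrative box constraint (the names `indicator_box` and `prox_indicator_box` are ours, not the paper's):

```python
import numpy as np

def indicator_box(x, lo, hi):
    """Indicator function of the box {x : lo <= x <= hi}: 0 inside, +inf outside."""
    return 0.0 if np.all((x >= lo) & (x <= hi)) else np.inf

def prox_indicator_box(x, lo, hi):
    """The prox of the indicator of a set is the Euclidean projection onto it;
    for a box this reduces to a componentwise clip."""
    return np.clip(x, lo, hi)

x = np.array([-2.0, 0.5, 3.0])
print(indicator_box(x, 0.0, 1.0))       # inf (x lies outside the box)
print(prox_indicator_box(x, 0.0, 1.0))  # [0.  0.5 1. ]
```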

3215 | Numerical optimization
- NOCEDAL, WRIGHT
- 1999
Citation Context: ...c rank-one update that satisfies the secant condition. The inequality 〈sk, yk〉 > 0 is the curvature condition, and it is guaranteed for all strictly convex objectives. Following the recommendation in [26], we skip updates whenever 〈sk, yk〉 cannot be guaranteed to be non-zero given standard floating-point precision. A value of γ = 0.8 works well in most situations. We have tested picking γ adaptively, ...
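The update-skipping rule quoted here can be sketched as a guard on the curvature inner product 〈s, y〉; the relative-scale threshold below is an illustrative choice, not the paper's exact test:

```python
import numpy as np

def safe_curvature(s, y, eps=1e-10):
    """Accept a quasi-Newton update only when <s, y> is safely positive
    relative to the vector scales, guarding against floating-point noise.
    The threshold form is an illustrative choice, not the paper's."""
    return np.dot(s, y) > eps * np.linalg.norm(s) * np.linalg.norm(y)

s = np.array([1e-8, 0.0]); y = np.array([0.0, 1e-8])
print(safe_curvature(s, y))  # False: this update would be skipped
```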

1025 | A fast iterative shrinkage-thresholding algorithm for linear inverse problems
- Beck, Teboulle
- 2009
Citation Context: ...ic choice of step-length tk motivated by quasi-Newton methods. Numerical evidence suggests the SPG/SpaRSA method is highly effective, although convergence results are not as strong as for ISTA. FISTA [7] is a multi-step accelerated version of ISTA inspired by the work of Nesterov. The stepsize t is chosen in a similar way to ISTA; in our implementation, we tweak the original approach by using a Barzi...

547 | A limited memory algorithm for bound constrained optimization
- Byrd, Lu, et al.
- 1995
Citation Context: ...asi-Newton methods to solve (P) and that extends naturally and elegantly from the unconstrained to the constrained case. Most well-known quasi-Newton methods for constrained problems, such as L-BFGS-B [2], are only applicable to box constraints l ≤ x ≤ u. The power of our approach is that it applies to a wide variety of useful non-smooth functionals (see §3.1.4 for a list) and that it does not rely on...

362 | Sparse reconstruction by separable approximation
- Wright, Nowak, et al.
- 2009
Citation Context: ...ts over-relaxation factors α ∈ (0, 1) [3]. The spectral projected gradient (SPG) [4] method was designed as an extension of the Barzilai-Borwein spectral step-length method to constrained problems. In [5], it was extended to non-smooth problems by allowing general proximity operators; we refer to this as SPG/SpaRSA (N.B. we do not use the SpaRSA implementation since we do not use warm-starts or restar...

303 | Two point step size gradient methods
- Barzilai, Borwein
- 1988
Citation Context: ...perators; we refer to this as SPG/SpaRSA (N.B. we do not use the SpaRSA implementation since we do not use warm-starts or restarts, in order to be fair to all algorithms). The Barzilai-Borwein method [6] uses a specific choice of step-length tk motivated by quasi-Newton methods. Numerical evidence suggests the SPG/SpaRSA method is highly effective, although convergence results are not as strong as for...
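The Barzilai-Borwein choice of step length referred to here is tk = 〈s, s〉/〈s, y〉 with s = xk − xk−1 and y = ∇f(xk) − ∇f(xk−1). A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def bb_step(x, x_prev, g, g_prev):
    """Barzilai-Borwein step length t_k = <s,s>/<s,y>, where
    s = x_k - x_{k-1} and y = grad f(x_k) - grad f(x_{k-1})."""
    s = x - x_prev
    y = g - g_prev
    return s.dot(s) / s.dot(y)

# On f(x) = 0.5*||x||^2 the gradient is x itself, so y = s and the
# BB step equals 1, i.e. exactly the inverse curvature of this quadratic.
x0 = np.array([1.0, 2.0]); x1 = np.array([0.5, 1.0])
print(bb_step(x1, x0, x1, x0))  # 1.0
```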

271 | Convex Analysis and Monotone Operator Theory in Hilbert Spaces
- Bauschke, Combettes
- 2011
Citation Context: ...t is possible to generalize gradient descent with proximal gradient descent (which includes projected gradient descent as a sub-case), which is just the application of the forward-backward algorithm [1]. Unlike gradient descent, it is not easy to adapt quasi-Newton and CG methods to problems involving constraints and non-smooth terms. Much has been written on the topic, and approaches generally...

257 | Proximal splitting methods in signal processing. Fixed-Point Algorithms for
- Combettes, Pesquet
- 2011
Citation Context: ...ithm in (5) is variously known as proximal descent or the iterated shrinkage/thresholding algorithm (IST or ISTA). It has a grounded convergence theory, and also admits over-relaxation factors α ∈ (0, 1) [3]. The spectral projected gradient (SPG) [4] method was designed as an extension of the Barzilai-Borwein spectral step-length method to constrained problems. In [5], it was extended to non-smooth proble...
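The IST/ISTA iteration discussed in this snippet is x_{k+1} = prox_{t·h}(x_k − t∇f(x_k)); for ℓ1-regularized least squares the prox is soft thresholding. A self-contained sketch with a fixed step 1/L (a simplification, not the paper's implementation):

```python
import numpy as np

def soft_threshold(x, tau):
    """prox of tau*||.||_1: componentwise shrinkage toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, b, lam, iters=500):
    """ISTA for min 0.5*||Ax - b||^2 + lam*||x||_1 with constant step
    t = 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - t * A.T @ (A @ x - b), t * lam)
    return x

# With A = I the solution is just soft thresholding of b by lam.
A = np.eye(2); b = np.array([3.0, 0.1])
print(ista(A, b, lam=1.0))  # [2. 0.]
```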

232 | A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices. Computing 62
- Hackbusch
- 1999
Citation Context: ...an be solved by our algorithm (see Remark 10). The challenge here is adapting this to a robust quasi-Newton update. For some matrices that are well-approximated by low-rank blocks, such as H-matrices [30], it may be possible to choose Bk ≡ B to be a fixed preconditioner. Acknowledgments: SB would like to acknowledge the Fondation Sciences Mathématiques de Paris for his fellowship. ...

207 | Nonmonotone spectral projected gradient methods on convex sets
- Birgin, Martinez, et al.
- 2000
Citation Context: ...descent or iterated shrinkage/thresholding algorithm (IST or ISTA). It has a grounded convergence theory, and also admits over-relaxation factors α ∈ (0, 1) [3]. The spectral projected gradient (SPG) [4] method was designed as an extension of the Barzilai-Borwein spectral step-length method to constrained problems. In [5], it was extended to non-smooth problems by allowing general proximity operators;...

168 | Scalable training of L1-regularized log-linear models
- Andrew, Gao
- 2007
Citation Context: ...on the free variables, and shows good results compared to L-BFGS-B, SPG, GENCAN and TRON. We also compare to several active set approaches specialized for ℓ1 penalties: “Orthant-wise Learning” (OWL) [14], “Projected Scaled Sub-gradient + Active Set” (PSSas) [15], “Fixed-point continuation + Active Set” (FPC AS) [16], and “CG + IST” (CGIST) [17]. Other approaches: By transforming the problem into a sta...

143 | Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization
- Zhu, Byrd, et al.
- 1997
Citation Context: ...hods take a simple step, such as gradient projection, to identify active variables, and then use a more advanced quadratic model to solve for the free variables. A well-known such method is L-BFGS-B [2, 11], which handles general box-constrained problems; we test an updated version [12]. A recent bound-constrained solver is ASA [13], which uses a conjugate gradient (CG) solver on the free variables, and s...

123 | Fonctions convexes duales et points proximaux dans un espace hilbertien
- Moreau
- 1962
Citation Context: ...SPG to this case). 3 Proximity operators and proximal calculus. We only recall essential definitions. More notions and results from convex analysis can be found in §A. Definition 4 (Proximity operator [22]). Let h ∈ Γ0(H). Then, for every x ∈ H, the function z ↦ (1/2)‖x − z‖² + h(z) achieves its infimum at a unique point, denoted proxh x. The uniquely-valued operator proxh : H → H thus defined is the ...
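Definition 4 can be checked numerically: proxh x is the unique minimizer of z ↦ (1/2)‖x − z‖² + h(z). A one-dimensional sketch with h = |·|, whose prox has the closed soft-thresholding form; the grid comparison is illustrative only:

```python
import numpy as np

def prox_abs(x, tau=1.0):
    """Closed-form prox of tau*|.| in 1-D (soft thresholding)."""
    return np.sign(x) * max(abs(x) - tau, 0.0)

# Verify the defining property: prox_h(x) minimizes
# z -> 0.5*(x - z)**2 + |z|, here checked over a fine grid.
x = 1.7
z = np.linspace(-3, 3, 600001)
obj = 0.5 * (x - z) ** 2 + np.abs(z)
print(prox_abs(x))        # 0.7 (up to floating-point rounding)
print(z[np.argmin(obj)])  # ~0.7, the grid minimizer agrees
```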

83 | Fast optimization methods for l1 regularization: A comparative study and two new approaches
- Schmidt, Fung, et al.
- 2007
Citation Context: ...L-BFGS-B, SPG, GENCAN and TRON. We also compare to several active set approaches specialized for ℓ1 penalties: “Orthant-wise Learning” (OWL) [14], “Projected Scaled Sub-gradient + Active Set” (PSSas) [15], “Fixed-point continuation + Active Set” (FPC AS) [16], and “CG + IST” (CGIST) [17]. Other approaches: By transforming the problem into a standard conic programming problem, the generic problem is ame...

69 | Quasi-Newton methods and their applications to function minimisation
- Broyden
- 1967
Citation Context: ...All quasi-Newton methods update an approximation to the (inverse) Hessian that satisfies the secant condition: Hk yk = sk, where yk = ∇f(xk) − ∇f(xk−1) and sk = xk − xk−1 (16). Algorithm 1 follows the SR1 method [24], which uses a rank-1 update to the inverse Hessian approximation at every step. The SR1 method is perhaps less well-known than BFGS, but it has the crucial property that updates are rank-1, rather th...
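The SR1 update mentioned here modifies the inverse-Hessian approximation H by a rank-1 term so that the secant condition H y = s holds exactly. A standard safeguarded form (the skip threshold is an illustrative choice, not the paper's):

```python
import numpy as np

def sr1_update(H, s, y, tol=1e-8):
    """SR1 rank-1 update of an inverse-Hessian approximation H so that
    H_new @ y == s (the secant condition); the update is skipped when
    the denominator is too small, a standard SR1 safeguard."""
    r = s - H @ y
    denom = r @ y
    if abs(denom) < tol * np.linalg.norm(r) * np.linalg.norm(y):
        return H  # skip: update would be numerically unstable
    return H + np.outer(r, r) / denom

H = np.eye(3)
s = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
H_new = sr1_update(H, s, y)
print(np.allclose(H_new @ y, s))  # True: secant condition satisfied
```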

60 | Diagonal preconditioning for first order primal-dual algorithms in convex optimization
- Pock, Chambolle
Citation Context: ...l diagonal case was considered in several papers in the 1980s as a simple quasi-Newton method, but never widely adopted. More recent attempts include a static choice Bk ≡ B for a primal-dual method [9]. A convergence rate analysis of forward-backward splitting with static and variable Bk, where one of the operators is maximal strongly monotone, is given in [10]. Active set approaches: Active set metho...

51 | A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation
- Wen, Yin, et al.
- 2010
Citation Context: ...ral active set approaches specialized for ℓ1 penalties: “Orthant-wise Learning” (OWL) [14], “Projected Scaled Sub-gradient + Active Set” (PSSas) [15], “Fixed-point continuation + Active Set” (FPC AS) [16], and “CG + IST” (CGIST) [17]. Other approaches: By transforming the problem into a standard conic programming problem, the generic problem is amenable to interior-point methods (IPM). IPM requires sol...

51 | Optimizing costly functions with simple constraints: A limited-memory projected quasi-newton algorithm
- Schmidt, Berg
Citation Context: ...roximation. Yu et al. [18] propose a non-smooth modification of BFGS and L-BFGS, and test on problems where h is typically a hinge loss or related function. The projected quasi-Newton (PQN) algorithm [19, 20] is perhaps the most elegant and logical extension of quasi-Newton methods, but it involves solving a sub-iteration. PQN proposes the SPG [4] algorithm for the subproblems, and finds that this is an e...

48 | Generalized forward-backward splitting
- Raguet, Fadili, et al.
- 2011
Citation Context: ...rices. The proposed method can be extended in several ways. Although we focused on forward-backward splitting, our approach can be easily extended to the new generalized forward-backward algorithm of [29]. However, if we switch to a primal-dual setting, which is desirable because it can handle more complicated objective functionals, updating Bk is non-obvious. Though one can think of non-diagonal pre-...

37 | A quasi-newton approach to nonsmooth convex optimization problems in machine learning
- Yu, Vishwanathan, et al.
- 2010
Citation Context: ...sampling the Hessian. The main issues are speed and robust stopping criteria for the approximations. Yet another approach is to include the non-smooth h term in the quadratic approximation. Yu et al. [18] propose a non-smooth modification of BFGS and L-BFGS, and test on problems where h is typically a hinge loss or related function. The projected quasi-Newton (PQN) algorithm [19, 20] is perhaps the mo...

37 | Practical Methods
- FLETCHER
- 1980
Citation Context: ...uses a square operator A with dimensions n = 13³ = 2197, chosen as a 3D discrete differential operator. This example stems from a numerical analysis problem to solve a discretized PDE as suggested by [28]. For this example, we set λ = 1. For all the solvers, we use the same parameters as in the previous example. Unlike the previous example, Fig. 1b now shows that L-BFGS-B is very slow on this problem....

35 | Convergence rates in forward-backward splitting
- Chen, Rockafellar
- 1997
Citation Context: ...c choice Bk ≡ B for a primal-dual method [9]. A convergence rate analysis of forward-backward splitting with static and variable Bk, where one of the operators is maximal strongly monotone, is given in [10]. Active set approaches: Active set methods take a simple step, such as gradient projection, to identify active variables, and then use a more advanced quadratic model to solve for the free variables....

34 | A new active set algorithm for box constrained optimization
- Hager, Zhang
Citation Context: ...to solve for the free variables. A well-known such method is L-BFGS-B [2, 11], which handles general box-constrained problems; we test an updated version [12]. A recent bound-constrained solver is ASA [13], which uses a conjugate gradient (CG) solver on the free variables, and shows good results compared to L-BFGS-B, SPG, GENCAN and TRON. We also compare to several active set approaches specialized for ...

31 | A semismooth Newton method for Tikhonov functionals with sparsity constraints
- Griesse, Lorenz

21 | Adaptive restart for accelerated gradient schemes
- O’Donoghue, Candès
Citation Context: ...he work of Nesterov. The stepsize t is chosen in a similar way to ISTA; in our implementation, we tweak the original approach by using a Barzilai-Borwein step size, a standard line search, and restart [8], since this led to improved performance. Nesterov acceleration can be viewed as an over-relaxed version of ISTA with a specific, non-constant over-relaxation parameter αk. The above approaches assume...
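The adaptive restart cited here resets the momentum of an accelerated method whenever a restart test fires; one common variant is the gradient-based test ∇f(y)·(x_k − x_{k−1}) > 0 (momentum pointing uphill). A sketch on a smooth quadratic, with illustrative parameters rather than the paper's solver:

```python
import numpy as np

def accel_gradient_restart(grad, x0, t, iters=500):
    """Nesterov-style accelerated gradient descent with gradient-based
    adaptive restart: momentum is reset whenever <grad(y), x_new - x> > 0,
    i.e. the momentum step points uphill."""
    x, y, theta = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        g = grad(y)
        x_new = y - t * g
        if g @ (x_new - x) > 0:  # restart: drop all momentum
            theta, y = 1.0, x_new
        else:                    # usual accelerated extrapolation
            theta_new = 0.5 * (1 + np.sqrt(1 + 4 * theta ** 2))
            y = x_new + ((theta - 1) / theta_new) * (x_new - x)
            theta = theta_new
        x = x_new
    return x

# Minimize the quadratic f(x) = 0.5 * x^T diag(1, 10) x with step 1/L.
d = np.array([1.0, 10.0])
x = accel_gradient_restart(lambda v: d * v, np.array([1.0, 1.0]), t=0.1)
print(np.allclose(x, 0, atol=1e-6))  # True
```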

14 | Remark on “Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization”
- Morales, Nocedal
- 2011
Citation Context: ...and then use a more advanced quadratic model to solve for the free variables. A well-known such method is L-BFGS-B [2, 11], which handles general box-constrained problems; we test an updated version [12]. A recent bound-constrained solver is ASA [13], which uses a conjugate gradient (CG) solver on the free variables, and shows good results compared to L-BFGS-B, SPG, GENCAN and TRON. We also compare to...

10 | Projected Newton-type methods in machine learning
- Schmidt, Kim, et al.
- 2011
Citation Context: ...roximation. Yu et al. [18] propose a non-smooth modification of BFGS and L-BFGS, and test on problems where h is typically a hinge loss or related function. The projected quasi-Newton (PQN) algorithm [19, 20] is perhaps the most elegant and logical extension of quasi-Newton methods, but it involves solving a sub-iteration. PQN proposes the SPG [4] algorithm for the subproblems, and finds that this is an e...

8 | Tackling box-constrained optimization via a new projected quasi-Newton approach
- Kim, Sra, et al.
Citation Context: ...s well as trying H0 to be non-constant on the diagonal, but found no consistent improvements. 5 Numerical experiments and comparisons. Consider the unconstrained LASSO problem (1). Many codes, such as [27] and L-BFGS-B [2], handle only non-negativity or box constraints. Using the standard change of variables by introducing the ...

7 | Proximal Newton-type methods for minimizing convex objective functions in composite form, arXiv preprint arXiv:1206.1623
- Lee, Sun, et al.
- 2012
Citation Context: ...ely much more expensive to evaluate than projecting onto the constraints. Again, the cost of the sub-problem solver (and a suitable stopping criterion for this inner solve) are issues. As discussed in [21], it is possible to generalize PQN to general non-smooth problems whenever the proximity operator is known (since, as mentioned above, it is possible to extend SPG to this case). 3 Proximity operators...

2 | High-order methods for basis pursuit
- Goldstein, Setzer
- 2011
Citation Context: ...cialized for ℓ1 penalties: “Orthant-wise Learning” (OWL) [14], “Projected Scaled Sub-gradient + Active Set” (PSSas) [15], “Fixed-point continuation + Active Set” (FPC AS) [16], and “CG + IST” (CGIST) [17]. Other approaches: By transforming the problem into a standard conic programming problem, the generic problem is amenable to interior-point methods (IPM). IPM requires solving a Newton-step equation, ...

1 | Seminal papers in nonlinear optimization. In An introduction to algorithms for continuous optimization
- Gould
- 2006
Citation Context: ...n BFGS, but it has the crucial property that updates are rank-1, rather than rank-2, and it has been described thus: “[SR1] has now taken its place alongside the BFGS method as the pre-eminent updating formula.” [25]. We propose two important modifications to SR1. The first is to use limited-memory, as is commonly done with BFGS. In particular, we use zero-memory, which means that at every iteration, a new diag...