A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization
, 2013
"... Linear optimization is many times algorithmically simpler than non-linear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are example of problems for which we have simple and efficient combinatorial algorithms, but whose non-linear convex count ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
(Show Context)
Linear optimization is often algorithmically simpler than non-linear convex optimization. Linear optimization over matroid polytopes, matching polytopes, and path polytopes are examples of problems for which we have simple and efficient combinatorial algorithms, but whose non-linear convex counterparts are harder and admit significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online, and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state of the art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and of Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for non-smooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods.
Key words. Frank-Wolfe algorithm; conditional gradient methods; linear programming; first-order methods; online convex optimization; online learning; stochastic optimization
AMS subject classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15
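As a point of reference for the LO-oracle computational model this abstract describes, here is a minimal sketch of the classical conditional gradient (Frank-Wolfe) template that makes exactly one linear-optimization call per iteration. It is not the paper's linearly convergent variant; the simplex domain, `lin_oracle`, and the standard step size are illustrative assumptions.

```python
import numpy as np

def frank_wolfe(grad, lin_oracle, x0, num_iters=100):
    """Classical conditional gradient (Frank-Wolfe) template.

    grad(x)       -- gradient of the smooth objective at x
    lin_oracle(g) -- returns argmin_{v in domain} <g, v> (the LO oracle)
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)
        v = lin_oracle(g)            # single linear optimization step
        gamma = 2.0 / (k + 2.0)      # standard O(1/k) step size
        x = (1.0 - gamma) * x + gamma * v
    return x

# Example: minimize ||x - b||^2 over the probability simplex, where the
# LO oracle simply picks the best vertex (a coordinate basis vector).
b = np.array([0.1, 0.7, 0.2])
grad = lambda x: 2.0 * (x - b)
lin_oracle = lambda g: np.eye(len(g))[np.argmin(g)]
x_star = frank_wolfe(grad, lin_oracle, x0=np.ones(3) / 3)
```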
New Analysis and Results for the Conditional Gradient Method
, 2013
"... We present new results for the conditional gradient method (also known as the Frank-Wolfe method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size ru ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
We present new results for the conditional gradient method (also known as the Frank-Wolfe method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size rules and computational guarantees that depend naturally on the warm-start quality of the initial (and subsequent) iterates. Our results include computational guarantees for both duality/bound gaps and the so-called Wolfe gaps. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions.
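The "Wolfe gap" mentioned in this abstract has a simple computable form. The sketch below, with assumed `grad` and `lin_oracle` callables as in the previous example, shows how it is obtained from a single LO-oracle call; for convex objectives it upper-bounds the suboptimality and so serves as a stopping criterion.

```python
import numpy as np

def wolfe_gap(grad, lin_oracle, x):
    """Wolfe gap G(x) = <grad f(x), x - v>, where v is the LO-oracle answer.

    For a convex objective f, G(x) >= f(x) - min f, so it doubles as a
    computable bound gap / stopping criterion.
    """
    g = grad(x)
    v = lin_oracle(g)
    return float(np.dot(g, x - v))
```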
On Lower Complexity Bounds for Large-Scale Smooth Convex Optimization
- ArXiv e-prints
, 2014
"... Abstract In this note we present tight lower bounds on the information-based complexity of large-scale smooth convex minimization problems. We demonstrate, in particular, that the k-step Conditional Gradient (a.k.a. Frank-Wolfe) algorithm as applied to minimizing smooth convex functions over the n- ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
In this note we present tight lower bounds on the information-based complexity of large-scale smooth convex minimization problems. We demonstrate, in particular, that the k-step Conditional Gradient (a.k.a. Frank-Wolfe) algorithm, as applied to minimizing smooth convex functions over the n-dimensional box with n ≥ k, is optimal, up to an O(ln n) factor, in terms of information-based complexity.
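For intuition about the domain in this lower bound, the LO oracle over an n-dimensional box is trivial to implement; a sketch, assuming the box [-1, 1]^n:

```python
import numpy as np

def box_lin_oracle(g, lo=-1.0, hi=1.0):
    """LO oracle over the box [lo, hi]^n: minimize <g, v> coordinate-wise,
    i.e. return hi where g_i < 0 and lo where g_i >= 0."""
    return np.where(g < 0, hi, lo)
```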
Conditional gradient sliding for convex optimization
, 2014
"... Abstract In this paper, we present a new conditional gradient type method for convex optimization by utilizing a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Different from the classic conditional gradient method, the conditional gradient sliding ( ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
In this paper, we present a new conditional gradient type method for convex optimization that uses a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Unlike the classic conditional gradient method, the conditional gradient sliding (CGS) algorithm developed herein can skip the computation of gradients from time to time and, as a result, can achieve the optimal complexity bounds in terms of not only the number of calls to the LO oracle but also the number of gradient evaluations. More specifically, we show that the CGS method requires O(1/√ε) and O(log(1/ε)) gradient evaluations, respectively, for solving smooth and strongly convex problems, while still maintaining the optimal O(1/ε) bound on the number of calls to the LO oracle. We also develop variants of the CGS method that achieve the optimal complexity bounds for solving stochastic optimization problems and an important class of saddle point optimization problems. To the best of our knowledge, this is the first time that these types of projection-free optimal first-order methods have been developed in the literature. Some preliminary numerical results are also provided to demonstrate the advantages of the CGS method.
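To illustrate the mechanism that lets a scheme of this kind skip exact projections, the sketch below approximately solves a prox-type quadratic subproblem with plain conditional gradient steps, stopping once the subproblem's Wolfe gap falls below a tolerance. The function names, stopping rule, and line search are illustrative assumptions, not the paper's exact CGS procedure.

```python
import numpy as np

def approx_prox_by_cg(g, u, beta, lin_oracle, eta, max_iters=1000):
    """Approximately solve min_{x in X} <g, x> + (beta/2)*||x - u||^2
    using conditional gradient steps, terminating once the subproblem's
    Wolfe gap is at most eta.  An outer accelerated scheme can call this
    routine instead of an exact projection (illustrative sketch only).
    """
    x = u.copy()
    for _ in range(max_iters):
        grad_phi = g + beta * (x - u)          # gradient of the subproblem
        v = lin_oracle(grad_phi)               # one LO-oracle call
        gap = float(np.dot(grad_phi, x - v))   # Wolfe gap of the subproblem
        if gap <= eta:
            break
        d = v - x
        denom = beta * float(np.dot(d, d))
        # exact line search for the quadratic subproblem, clipped to [0, 1]
        alpha = 1.0 if denom == 0.0 else min(1.0, gap / denom)
        x = x + alpha * d
    return x
```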
Iteration bounds for finding ε-stationary points of structured nonconvex optimization
- Working Paper
, 2014
"... In this paper we study proximal conditional-gradient (CG) and proximal gradient-projection type algorithms for a block-structured constrained nonconvex optimization model, which arises naturally from tensor data analysis. First, we introduce a new notion of -stationarity, which is suitable for the s ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
In this paper we study proximal conditional-gradient (CG) and proximal gradient-projection type algorithms for a block-structured constrained nonconvex optimization model, which arises naturally from tensor data analysis. First, we introduce a new notion of ε-stationarity, which is suitable for the structured problem under consideration. We then propose two types of first-order algorithms for the model, based on the proximal conditional-gradient (CG) method and the proximal gradient-projection method, respectively. If the nonconvex objective function is in the form of a mathematical expectation, we discuss how to incorporate randomized sampling to avoid computing the expectations exactly. For the general block optimization model, the proximal subroutines are performed for each block according to either the block-coordinate-descent (BCD) or the maximum-block-improvement (MBI) updating rule. If the gradient of the nonconvex part of the objective f satisfies ‖∇f(x) − ∇f(y)‖_q ≤ M‖x − y‖_p^δ, where δ = p/q and 1/p + 1/q = 1, then we prove that the new algorithms have an overall iteration complexity bound of O(1/ε^q) for finding an ε-stationary solution. If f is concave, the iteration complexity reduces to O(1/ε). Our numerical experiments on tensor approximation problems show promising performance of the new algorithms.
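The BCD and MBI updating rules referenced in this abstract can be summarized generically. In the sketch below, `update_block` stands in for the paper's per-block proximal conditional-gradient or proximal gradient-projection subroutine, which the abstract does not spell out; all names are illustrative.

```python
def bcd_pass(blocks, update_block):
    """One cyclic block-coordinate-descent (BCD) pass: update every block
    in turn with the given per-block subroutine."""
    for i in range(len(blocks)):
        blocks[i] = update_block(blocks, i)
    return blocks

def mbi_step(blocks, update_block, objective):
    """One maximum-block-improvement (MBI) step: tentatively update each
    block, then commit only the single block update that decreases the
    objective the most."""
    best_i, best_val, best_update = None, objective(blocks), None
    for i in range(len(blocks)):
        trial = list(blocks)
        trial[i] = update_block(trial, i)
        val = objective(trial)
        if val < best_val:
            best_i, best_val, best_update = i, val, trial[i]
    if best_i is not None:
        blocks[best_i] = best_update
    return blocks
```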
Unifying lower bounds on the oracle complexity of nonsmooth convex optimization via information theory
, 2014
"... ..."
Hybrid conditional gradient-smoothing algorithms with applications to sparse and low rank regularization
- Regularization, Optimization, Kernels, and Support Vector Machines
, 2014
"... Conditional gradient methods are old and well studied optimization algorithms. Their origin dates at least to the 50’s and the Frank-Wolfe algorithm for quadratic programming [18] but they apply to much more general optimization problems. General formulations of conditional gradient algorithms have ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Conditional gradient methods are old and well-studied optimization algorithms. Their origin dates at least to the 1950s and the Frank-Wolfe algorithm for quadratic programming [18], but they apply to much more general optimization problems. General formulations of conditional gradient algorithms have been studied in the …
Semi-Proximal Mirror Prox
"... First-order methods for composite minimization minx∈X f (x) + h(x) f and h are convex, f is smooth, h is simple. (Acc)Proximal gradient methods (when h proximal-friendly) Proximal operator: proxh(η) = argminx∈X { 12‖x − η‖22 + h(x)} For example, when h(x) = ‖x‖1, reduces to soft thresholding. Wors ..."
Abstract
- Add to MetaCart
First-order methods for composite minimization solve min_{x ∈ X} f(x) + h(x), where f and h are convex, f is smooth, and h is simple. (Accelerated) proximal gradient methods apply when h is proximal-friendly, i.e., when the proximal operator prox_h(η) = argmin_{x ∈ X} { (1/2)‖x − η‖_2^2 + h(x) } is easy to compute; for example, when h(x) = ‖x‖_1 it reduces to soft thresholding. The worst-case complexity bound for first-order (proximal) oracles is O(1/√ε). Conditional gradient methods apply when h is LMO-friendly, i.e., when the (composite) linear minimization oracle LMO_h(η) = argmin_{x ∈ X} { 〈η, x〉 + h(x) } is easy to compute; for example, when h(x) = ‖x‖_nuc or the indicator δ_{‖x‖_nuc ≤ 1}(x), it reduces to computing the top pair of singular vectors. The worst-case (and optimal) complexity bound for LMOs is O(1/ε).
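The two oracle examples named above correspond to standard closed forms; a short sketch, where `lam` and `radius` are illustrative parameters:

```python
import numpy as np

def prox_l1(eta, lam=1.0):
    """Proximal operator of lam*||.||_1 (soft thresholding):
    argmin_x 0.5*||x - eta||_2^2 + lam*||x||_1."""
    return np.sign(eta) * np.maximum(np.abs(eta) - lam, 0.0)

def lmo_nuclear_ball(G, radius=1.0):
    """LMO for the nuclear-norm ball {X : ||X||_nuc <= radius}:
    argmin_X <G, X> = -radius * u1 * v1^T, with (u1, v1) the top
    singular-vector pair of G.  A full SVD is used here for clarity;
    in practice a Lanczos / power method computes just the top pair."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return -radius * np.outer(U[:, 0], Vt[0, :])
```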