
## A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization (2013)

Citations: 11 (2 self)

### Citations

898 | Combinatorial Optimization - Polyhedra and Efficiency
- Schrijver
- 2003
Citation Context: ...ple greedy algorithm for linear optimization, and the flow polytope (convex hull of all s−t paths in a directed acyclic graph) for which linear optimization amounts to finding a minimum-weight path [32]. Other important examples include the set of rotations for which linear optimization is very efficient using Wahba’s algorithm [36], and the bounded cone of positive semidefinite matrices, for which ...

539 | Interior-point polynomial algorithms in convex programming
- Nesterov, Nemirovski
- 1994
Citation Context: ...of choice for coping with very large scale optimization tasks. While theoretically attaining inferior convergence rate compared to other efficient optimization algorithms (e.g. interior point methods [31]), modern optimization problems are often so large that using second-order information or other super-linear operations becomes practically infeasible. The computational bottleneck of (sub)gradient de...

394 | Gradient methods for minimizing composite objective function
- Nesterov
- 2007
Citation Context: ...learning; stochastic optimization AMS subject classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15 1. Introduction. First-order optimization methods, such as (sub)gradient-descent methods [35, 29, 30] and conditional-gradient methods [10, 8, 6, 15, 20], are often the method of choice for coping with very large scale optimization tasks. While theoretically attaining inferior convergence rate compar...

334 | Problem complexity and method efficiency in optimization
- Nemirovsky, Yudin
- 1983
Citation Context: ...learning; stochastic optimization AMS subject classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15 1. Introduction. First-order optimization methods, such as (sub)gradient-descent methods [35, 29, 30] and conditional-gradient methods [10, 8, 6, 15, 20], are often the method of choice for coping with very large scale optimization tasks. While theoretically attaining inferior convergence rate compar...

309 | An algorithm for quadratic programming
- Frank, Wolfe
- 1956
Citation Context: ...classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15 1. Introduction. First-order optimization methods, such as (sub)gradient-descent methods [35, 29, 30] and conditional-gradient methods [10, 8, 6, 15, 20], are often the method of choice for coping with very large scale optimization tasks. While theoretically attaining inferior convergence rate compared to other efficient optimization algorithms (e.g. ...

298 | Online convex programming and generalized infinitesimal gradient ascent
- Zinkevich
Citation Context: ...the domain. In all results we omit the dependencies on constants and the dimension, these dependencies will be fully detailed in the sequel. We also consider the setting of online convex optimization [38, 33, 16, 23]. In this setting, a decision maker is iteratively required to choose a point in a fixed convex decision set. After choosing his point, an adversary chooses some convex function and the decision maker...

209 | Logarithmic regret algorithms for online convex optimization
- Hazan, Kalai, et al.
- 2006
Citation Context: ...onstruction requires only a single call to the oracle OP. 2.2 Online convex optimization and its application to stochastic and offline optimization In the problem of online convex optimization (OCO) [33, 14, 13], a decision maker is iteratively required to choose a point x_t ∈ K where K is a fixed convex set. After choosing the point x_t a convex loss function f_t(x) is chosen and the decision maker incurs loss...

174 | On the generalization ability of on-line learning algorithms
- Cesa-Bianchi, Conconi, et al.
- 2004
Citation Context: ... results for martingales, one can also derive error bounds that hold with high probability and not only in expectation, but these are beyond the scope of this paper. We refer the interested reader to [4] for more details. 2.2.3. Non-smooth optimization. As in stochastic optimization (see previous subsection), an algorithm for OCO also implies an algorithm for offline convex optimization. Thus a condi...

122 | Online convex optimization in the bandit setting: gradient descent without a gradient
- Flaxman, Kalai, et al.
- 2005
Citation Context: ...arantees optimal O(log T) regret. In the partial information setting the RFTL rule (2.4) with the algorithmic conversion of the bandit problem to that of the full information problem established in [9], yields an algorithm with regret O(T^{3/4}), which is the best to date. Our algorithms for online optimization are based on iteratively approximat...

86 | Revisiting Frank-Wolfe: Projection-free sparse convex optimization
- Jaggi
- 2013
Citation Context: ...te choice for the sequence of step sizes {α_t}, the approximation error strictly decreases on each iteration. This leads to the following theorem (for a proof see for instance the modern survey [20]). Theorem 2.4. There is an explicit choice for the sequence of step sizes {α_t} such that for every t ≥ 2, the iterate x_t of Algorithm 1 satisfies f(x_t) − f(x*) = O(βD²/t). The relativel...
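The step-size schedule behind Theorem 2.4 can be illustrated with a minimal Frank-Wolfe sketch. This is not the paper's algorithm, only the classical method the theorem describes; the simplex domain and quadratic objective below are illustrative assumptions.

```python
import numpy as np

def frank_wolfe(grad, linear_opt, x0, num_iters):
    """Minimal conditional gradient (Frank-Wolfe) sketch.

    grad: gradient oracle of a smooth convex f.
    linear_opt: linear optimization oracle returning argmin_{p in K} <c, p>.
    Uses the classical step size alpha_t = 2/(t+2), which gives the
    O(beta * D^2 / t) approximation error of the theorem above.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        p = linear_opt(grad(x))            # one linear optimization step over K
        alpha = 2.0 / (t + 2.0)            # explicit step-size schedule
        x = (1.0 - alpha) * x + alpha * p  # convex combination stays feasible
    return x

# Illustrative instance: minimize f(x) = ||x - b||^2 over the probability
# simplex, where linear optimization simply picks the best vertex.
b = np.array([0.1, 0.7, 0.2])

def simplex_lo(c):
    p = np.zeros_like(c)
    p[np.argmin(c)] = 1.0
    return p

x = frank_wolfe(lambda x: 2.0 * (x - b), simplex_lo, np.ones(3) / 3.0, 500)
```

Note that each iteration touches the domain only through `linear_opt`, which is the point of the method: no projection is ever computed.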

84 | Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm
- Clarkson
Citation Context: ...classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15 1. Introduction. First-order optimization methods, such as (sub)gradient-descent methods [35, 29, 30] and conditional-gradient methods [10, 8, 6, 15, 20], are often the method of choice for coping with very large scale optimization tasks. While theoretically attaining inferior convergence rate compared to other efficient optimization algorithms (e.g. ...

60 | Online learning and online convex optimization
- Shalev-Shwartz
- 2011
Citation Context: ...after t linear optimization steps over the domain, omitting constants. In the online setting we give the order of the regret after T rounds. We also consider the setting of online convex optimization [33, 29, 13, 21]. In this setting, a decision maker is iteratively required to choose a point in a fixed convex decision set. After choosing his point, an adversary chooses some convex function and the decision maker...

58 | Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization
- Hazan, Kale
- 2011
Citation Context: ...optimal regret bound attainable scales like √T [5] where T is the length of the game. In the case that all loss functions are strongly convex, the optimal regret bound attainable scales like log(T) [18]. 2.2.1. Algorithms for OCO. A simple algorithm that attains optimal regret of O(√T) for general convex losses is known as the Regularized Follow The Leader algorithm (RFTL) [16]. On time t the al...
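The O(√T) regret guarantee mentioned in this excerpt can be sketched with online gradient descent, a standard member of the RFTL family. The unit-ball domain and linear losses below are illustrative assumptions, not the cited paper's setting.

```python
import numpy as np

def ogd_unit_ball(cost_vectors, eta):
    """Online gradient descent for linear losses f_t(x) = <c_t, x> over the
    Euclidean unit ball. With eta ~ 1/sqrt(T) the regret against the best
    fixed point in hindsight is O(sqrt(T)); for strongly convex losses,
    shrinking step sizes eta_t ~ 1/(sigma*t) would give O(log T) regret."""
    x = np.zeros(len(cost_vectors[0]))
    total_loss = 0.0
    for c in cost_vectors:
        total_loss += float(c @ x)   # incur the round's loss, then update
        x = x - eta * c              # gradient of <c, x> is c
        norm = np.linalg.norm(x)
        if norm > 1.0:               # project back onto the unit ball
            x /= norm
    return total_loss

rng = np.random.default_rng(0)
T = 2000
costs = [rng.standard_normal(3) for _ in range(T)]
alg_loss = ogd_unit_ball(costs, eta=1.0 / np.sqrt(T))

# Best fixed point in hindsight points opposite the summed cost vector.
best_loss = -float(np.linalg.norm(np.sum(costs, axis=0)))
regret = alg_loss - best_loss
```

The point of the sketch is the scaling: the regret stays on the order of √T even though the total loss grows linearly in T.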

58 | Block-coordinate Frank-Wolfe optimization for structural SVMs. arXiv preprint arXiv:1207.4747
- Lacoste-Julien, Jaggi, et al.
- 2012
Citation Context: ...efinite cone this means that the solution has low rank). Due to these two properties, conditional gradient methods have attracted much attention in the machine learning community in recent years, see [21, 25, 20, 7, 13, 34, 27, 2]. It is known that in general the convergence rate 1/t is also optimal for this method without further assumptions, as shown in [6, 15, 20]. In case the objective function is both smooth and strongly ...

58 | A least squares estimate of satellite attitude
- Wahba
- 1965
Citation Context: ... which linear optimization amounts to finding a minimum-weight path [32]. Other important examples include the set of rotations for which linear optimization is very efficient using Wahba’s algorithm [36], and the bounded cone of positive semidefinite matrices, for which linear optimization amounts to a leading eigenvector computation whereas projections require computing the singular value decomposit...
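The cost gap this excerpt alludes to, a leading eigenvector versus a full decomposition, can be illustrated with a few lines of power iteration; the diagonal test matrix is an illustrative assumption.

```python
import numpy as np

def leading_eigenvector(A, iters=1000):
    """Power iteration sketch: for a symmetric PSD matrix, linear
    optimization over the (bounded) PSD cone reduces to a leading
    eigenvector computation, which needs only matrix-vector products --
    much cheaper than the full eigendecomposition/SVD that a Euclidean
    projection onto the cone would require."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        v = A @ v                  # one matrix-vector product per step
        v /= np.linalg.norm(v)     # renormalize to avoid overflow
    return v

A = np.diag([3.0, 1.0, 0.5])       # leading eigenvector is +/- e_1
v = leading_eigenvector(A)
```

Each step costs one matrix-vector product, so for sparse or structured matrices the oracle runs in time proportional to the number of nonzeros rather than the cubic cost of a dense decomposition.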

48 | A simple algorithm for nuclear norm regularized problems
- Jaggi, Sulovsky
- 2010
Citation Context: ...efinite cone this means that the solution has low rank). Due to these two properties, conditional gradient methods have attracted much attention in the machine learning community in recent years, see [21, 25, 20, 7, 13, 34, 27, 2]. It is known that in general the convergence rate 1/t is also optimal for this method without further assumptions, as shown in [6, 15, 20]. In case the objective function is both smooth and strongly ...

48 | Online learning and online convex optimization. Foundations and Trends
- Shalev-Shwartz
- 2011
Citation Context: ...the domain. In all results we omit the dependencies on constants and the dimension, these dependencies will be fully detailed in the sequel. We also consider the setting of online convex optimization [38, 33, 16, 23]. In this setting, a decision maker is iteratively required to choose a point in a fixed convex decision set. After choosing his point, an adversary chooses some convex function and the decision maker...

46 | The convex optimization approach to regret minimization. Optimization for Machine Learning 1
- Hazan
Citation Context: ...after t linear optimization steps over the domain, omitting constants. In the online setting we give the order of the regret after T rounds. We also consider the setting of online convex optimization [33, 29, 13, 21]. In this setting, a decision maker is iteratively required to choose a point in a fixed convex decision set. After choosing his point, an adversary chooses some convex function and the decision maker...

39 | Sparse approximate solutions to semidefinite programs
- Hazan
- 2008
Citation Context: ...classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15 1. Introduction. First-order optimization methods, such as (sub)gradient-descent methods [35, 29, 30] and conditional-gradient methods [10, 8, 6, 15, 20], are often the method of choice for coping with very large scale optimization tasks. While theoretically attaining inferior convergence rate compared to other efficient optimization algorithms (e.g. ...

39 | Large-scale convex minimization with a low-rank constraint
- Shalev-Shwartz, Gonen, et al.
- 2011
Citation Context: ...efinite cone this means that the solution has low rank). Due to these two properties, conditional gradient methods have attracted much attention in the machine learning community in recent years, see [21, 25, 20, 7, 13, 34, 27, 2]. It is known that in general the convergence rate 1/t is also optimal for this method without further assumptions, as shown in [6, 15, 20]. In case the objective function is both smooth and strongly ...

29 | Lifted coordinate descent for learning with trace-norm regularization
- Dudik, Harchaoui, et al.
- 2012

28 | Linear convergence of a modified Frank-Wolfe algorithm for computing minimum-volume enclosing ellipsoids
- Ahipasaoglu, Sun, et al.
- 2008
Citation Context: ... set. Here we emphasize that in this work we do not make any assumptions on the location of the optimum in the convex domain and our convergence rates are independent of it. Ahipasaoglu, Sun and Todd [1] gave a variant of the conditional gradient algorithm with away steps that achieves a linear convergence rate for the specific case in which the ...

26 | Projection-free online learning
- Hazan, Kale
- 2012
Citation Context: ...lgorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of [20, 16]. Our online algorithms also imply conditional gradient algorithms for non-smooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods. 1 Introduction ...

25 | Hedging structured concepts
- Koolen, Warmuth, et al.
- 2010
Citation Context: ...after t linear optimization steps over the domain, omitting constants. In the online setting we give the order of the regret after T rounds. We also consider the setting of online convex optimization [33, 29, 13, 21]. In this setting, a decision maker is iteratively required to choose a point in a fixed convex decision set. After choosing his point, an adversary chooses some convex function and the decision maker...

22 | Convergence theory in nonlinear programming
- Wolfe
- 1970
Citation Context: ...bitrarily bad convergence rate. We note that the suggestion of using “away steps” to accelerate the convergence of the FW algorithm for strongly convex objectives was already made by Wolfe himself in [37]. Beck and Teboulle [3] gave a linearly converging conditional gradient algorithm for solving convex linear systems, but as in [12], their convergence rate depends on the distance of the optimum from t...

20 | Some comments on Wolfe’s ‘away step’
- Guélat, Marcotte
- 1986
Citation Context: ... computing projections. This is also the case with the algorithm for smooth and strongly convex optimization in the recent work of Lan [26]. In case the convex set is a polytope, Guélat and Marcotte [12] have shown that the algorithm of Frank and Wolfe [10] converges at a linear rate assuming that the optimal point in the polytope is bounded away from the boundary. The convergence rate is proportional t...

20 | Sparse Convex Optimization Methods for Machine Learning. PhD thesis
- Jaggi
- 2011
Citation Context: ...to the term ‖p_t − x_t‖ in the above analysis, which may remain as large as the diameter of P while the term f(x_t) − f(x*) keeps on shrinking, forcing us to choose values of α_t that decrease like 1/t [5, 12, 17]. Notice that if f is σ-strongly convex for some σ > 0, then knowing that for some iteration t it holds that f(x_t) − f(x*) ≤ ε implies that ‖x_t − x*‖² ≤ ε/σ. Thus when choosing p_t, denoting r = √(ε/σ),...

19 | On the equivalence between herding and conditional gradient algorithms
- Bach, Lacoste-Julien, et al.
- 2012
Citation Context: ...finite cone this means that the solution has low rank). Due to these two properties, conditional gradient methods have attracted much attention in the machine learning community in recent years, see [19, 22, 18, 6, 1, 30, 23, 2]. It is known that in general the convergence rate 1/t is also optimal for this method without further assumptions, as shown in [5, 12, 17]. In case the objective function is both smooth and strongly ...

15 | Convergence rates for conditional gradient sequences generated by implicit step length rules
- Dunn
- 1980

15 | A hybrid algorithm for convex semidefinite optimization. In: ICML
- Laue
- 2012

11 | Efficient algorithms for online decision problems
- Kalai, Vempala
- 2005
Citation Context: ... perform only a single linear optimization step over the domain on each iteration while enjoying optimal regret guarantees in terms of the game length, answering an open question of Kalai and Vempala [22], and Hazan and Kale [19]. Using existing techniques we give an extension of this algorithm to the partial information setting which obtains the best known regret bound for this setting. Finally, our ...

11 | The complexity of large-scale convex programming under a linear optimization oracle. arXiv preprint arXiv:1309.5550
- Lan
- 2013
Citation Context: ... problem on each iteration which is computationally equivalent to computing projections. This is also the case with the algorithm for smooth and strongly convex optimization in the recent work of Lan [26]. In case the convex set is a polytope, Guélat and Marcotte [12] have shown that the algorithm of Frank and Wolfe [10] converges at a linear rate assuming that the optimal point in the polytope is bound...

10 | A conditional gradient method with linear rate of convergence for solving convex linear systems
- Beck, Teboulle
Citation Context: ...ce rate. We note that the suggestion of using “away steps” to accelerate the convergence of the FW algorithm for strongly convex objectives was already made by Wolfe himself in [37]. Beck and Teboulle [3] gave a linearly converging conditional gradient algorithm for solving convex linear systems, but as in [12], their convergence rate depends on the distance of the optimum from the boundary of the set...

8 | An affine invariant linear convergence analysis for Frank-Wolfe algorithms
- Lacoste-Julien, Jaggi
- 2013
Citation Context: ...ndeed the technical heart of this work. We also provide convergence rates with detailed dependencies on natural parameters of the problem. After our work first appeared [11], Jaggi and Lacoste-Julien [24] presented a refined analysis of a variant of the conditional gradient algorithm with away steps from [12] that achieves a linear convergence rate without the assumption on the location of the optimum...

7 | Minimization methods for non-differentiable functions
- Shor
- 1985
Citation Context: ...learning; stochastic optimization AMS subject classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15 1. Introduction. First-order optimization methods, such as (sub)gradient-descent methods [35, 29, 30] and conditional-gradient methods [10, 8, 6, 15, 20], are often the method of choice for coping with very large scale optimization tasks. While theoretically attaining inferior convergence rate compar...

6 | A regularization of the Frank-Wolfe method and unification of certain nonlinear programming methods
- Migdalas
- 1994
Citation Context: ...ist extensions of the basic method which achieve faster rates under various assumptions. One such extension of the conditional-gradient algorithm with linear convergence rate was presented by Migdalas [28]; however, the algorithm requires solving a regularized linear problem on each iteration, which is computationally equivalent to computing projections. This is also the case with the algorithm for smoo...

3 | Conditional gradient algorithms for norm-regularized smooth convex optimization
- Harchaoui, Juditsky, Nemirovski
Citation Context: ...ditive error after O(ε^{-2}) linear optimization steps over the domain and O(ε^{-2}) calls to the subgradient oracle. Also relevant to our work is the very recent work of Harchaoui, Juditsky and Nemirovski [14] who give methods for i) minimizing a norm over the intersection of a cone and the level set of a convex smooth function and ii) minimizing the sum of a convex smooth function and a multiple of a norm...

1 | Prediction, Learning, and Games
- Cesa-Bianchi, Lugosi
- 2006
Citation Context: ...r the online setting in the special case in which all loss functions are linear, also known as online linear optimization. In this setting their algorithm achieves regret of O(√T), which is optimal [5]. On iteration t their algorithm plays a point in the decision set that minimizes the cumulative loss on all previous iterations plus a vector whose entries are independent random variables. The work ...

1 | Large-scale image classification with trace-norm regularization
- Harchaoui, Douze, Dudík, Malick, et al.

1 | Hedging structured concepts
- Koolen, Warmuth, Kivinen
Citation Context: ...the domain. In all results we omit the dependencies on constants and the dimension, these dependencies will be fully detailed in the sequel. We also consider the setting of online convex optimization [38, 33, 16, 23]. In this setting, a decision maker is iteratively required to choose a point in a fixed convex decision set. After choosing his point, an adversary chooses some convex function and the decision maker...