## On Lower Complexity Bounds for Large-Scale Smooth Convex Optimization (2014)

Citations: 5 (1 self)

### Citations

539 | Introductory lectures on convex optimization: a basic course. Number 87 in Applied Optimization - Nesterov - 2003 |

518 | Smooth minimization of non-smooth functions - Nesterov |
Citation Context: ... function with Lipschitz continuous gradients; a novelty here, if any, stems from the fact that we need Lipschitz continuity of the gradient w.r.t. a given, not necessarily Euclidean, norm, while the standard Moreau envelope technique is adjusted to the case of the Euclidean norm^2). Section 4 contains the main result of this note: a lower bound on the information-based complexity of smooth convex minimization. ^1) For the O(1)L/t^2 lower risk bound, see [4]; an O(1)L/t^2 upper risk bound in the case in question is achieved by Nesterov's celebrated optimal algorithm for smooth convex minimization [6, 7]. ^2) It may well happen that the extensions of the classical Moreau results which we present in Section 3 are known, so that the material in this section does not pretend to be novel. This being said, at this point in time we do not have at our disposal references to the smoothing results we need, and therefore we decided to augment these simple results with their proofs, in order to make our presentation self-contained. 2 The problem: Let E be an n-dimensional Euclidean space, and ‖·‖ be a norm on E (not necessarily the Euclidean one). Let, further, X be a nonempty closed and bounded conve... |
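For orientation, the classical (Euclidean) Moreau envelope that the excerpt contrasts with the paper's non-Euclidean setting: for a convex f and smoothing parameter β > 0 (standard textbook material, not the paper's extension),

```latex
f_\beta(x) \;=\; \min_{y \in E}\Big\{\, f(y) + \tfrac{1}{2\beta}\,\lVert x - y \rVert_2^2 \,\Big\}.
```

If f is L_f-Lipschitz, then f_β is convex with a (1/β)-Lipschitz gradient and satisfies f_β(x) ≤ f(x) ≤ f_β(x) + β·L_f^2/2; this smoothness-versus-accuracy trade-off is what Moreau-type smoothing arguments exploit.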

431 | The volume of convex bodies and Banach space geometry - Pisier - 1999 |

334 | Problem complexity and method efficiency in optimization - Nemirovsky, Yudin - 1983 |
Citation Context: ...e function of k = 1, 2, ... defined by Risk_{F,X,O}(k) = inf_M [ Risk_M(k) := sup_{f ∈ F} [f(x_k(M, f)) − Opt(f)] ], where the right-hand-side infimum is taken over all k-step solution algorithms M utilizing the oracle O. The inverse to the risk function, C_{F,X,O}(ε) = min{k : Risk_{F,X,O}(k) ≤ ε} for ε > 0, is called the information-based complexity of the family F_X taken with respect to the oracle O. The standard reference on information-based complexity of various broad families of convex programs is [4]; for some recent developments, see [8, 1] and references therein. This note is primarily motivated by the recently renewed interest in the Conditional Gradient (a.k.a. Frank-Wolfe) method, originating from [3], for solving problems (P_{f,X}) with smooth convex objectives f. This method utilizes the standard first-order oracle O(f, x) = (f(x), ∇f(x)) and possesses two remarkable features: 1. A dimension-independent sublinear convergence rate depending solely on t and on the properly measured smoothness parameters of f. Specifically, assuming w.l.o.g. that X linearly spans E, the set (1/2)[X − X] is the unit ball of a certain norm on E depending solely on X; we denote this norm ‖·‖_X and its conjugate norm ‖·‖_{X,∗}. Assuming that the objective f in (P_{f,X}) is convex and (κ, L)-smooth on X, meaning tha... (∗ The research was supported by the NSF grant CMMI-1232623. † cguzman@gatech.edu ‡ arkadi.nemirovski@isye.gatech.edu) |
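The definitions quoted in this excerpt, reconstructed as display equations (notation as in the excerpt; a typeset restatement, not new material):

```latex
\mathrm{Risk}_{\mathcal F,X,\mathcal O}(k)
  \;=\; \inf_{\mathcal M}\Big[\,\mathrm{Risk}_{\mathcal M}(k)
  := \sup_{f\in\mathcal F}\big[f\big(x_k(\mathcal M,f)\big) - \mathrm{Opt}(f)\big]\Big],
\qquad
\mathcal C_{\mathcal F,X,\mathcal O}(\varepsilon)
  \;=\; \min\big\{k : \mathrm{Risk}_{\mathcal F,X,\mathcal O}(k)\le\varepsilon\big\}.
```

Here the infimum runs over all k-step methods M that query the oracle O, and C_{F,X,O}(ε) is the least number of oracle calls that guarantees accuracy ε on every f in the family.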

309 | An algorithm for quadratic programming - Frank, Wolfe - 1956 |
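The Conditional Gradient method described in the excerpts admits a very short implementation: each step calls the first-order oracle once and solves one linear problem over X. Below is a minimal sketch for a least-squares objective over an ℓ1-ball, where the linear minimization oracle has a closed form; the problem data, the radius r, and the classical 2/(t+2) step size are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Minimal Conditional Gradient (Frank-Wolfe) sketch for
#   min_x f(x) = 0.5 * ||A x - b||^2   over  X = {x : ||x||_1 <= r}.
# The instance (A, b, r) and the 2/(t+2) step rule are illustrative
# assumptions, not taken from the paper under discussion.

def frank_wolfe(A, b, r=1.0, iters=200):
    n = A.shape[1]
    x = np.zeros(n)                      # feasible start: 0 lies in the l1 ball
    for t in range(iters):
        grad = A.T @ (A @ x - b)         # first-order oracle: gradient of f
        # Linear minimization oracle over the l1 ball: the minimizing
        # vertex is -r * sign(grad_i) * e_i at the coordinate of largest |grad_i|.
        i = np.argmax(np.abs(grad))
        s = np.zeros(n)
        s[i] = -r * np.sign(grad[i])
        gamma = 2.0 / (t + 2.0)          # classical step size, O(1/t) rate
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 10))
    b = A @ np.array([0.5, -0.5] + [0.0] * 8)  # planted sparse solution
    x = frank_wolfe(A, b, r=1.0)
    print(np.sum(np.abs(x)))                   # l1 norm of the final iterate
```

Note that the iterates remain feasible by construction (each update is a convex combination of points in X), which is the projection-free feature that makes the method attractive in large-scale settings.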

295 | A method of solving a convex programming problem with convergence rate O(1/k^2) - Nesterov - 1983 |

226 | Fundamentals of convex analysis - Hiriart-Urruty, Lemaréchal - 2004 |

86 | Revisiting Frank-Wolfe: Projection-free sparse convex optimization - Jaggi - 2013 |

84 | Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm - CLARKSON |

74 | Information-theoretic lower bounds on the oracle complexity of convex optimization - Agarwal, Bartlett, et al. - 2010 |

53 | An algorithm for quadratic programming. Naval research logistics quarterly - Frank, Wolfe - 1956 |

39 | Sparse approximate solutions to semidefinite programs - Hazan - 2008 |

23 | Conditional gradient algorithms for normregularized smooth convex optimization. - Harchaoui, Juditsky, et al. - 2014 |

20 | Sparse Convex Optimization Methods for Machine Learning. PhD thesis, - Jaggi - 2011 |

17 | Approximate Methods in Optimization Problems. - Demyanov, Rubinov - 1970 |

13 | Information-based complexity, feedback and dynamics in convex programming - Raginsky, Rakhlin |

11 | The complexity of large-scale convex programming under a linear optimization oracle. arXiv preprint arXiv:1309.5550 - Lan - 2013 |

10 | Information-based complexity of linear operator equations. - Nemirovski - 1992 |

10 | Universal gradient methods for convex optimization problems. - Nesterov - 2015 |

7 | Dual subgradient algorithms for large-scale nonsmooth learning problems. - Cox, Juditsky, et al. - 2013 |

7 | Efficient Methods - Nemirovski - 1994 |

2 | A polynomial time conditional gradient algorithm with applications to online and stochastic optimization - Garber, Hazan - 2013 |

2 | Optimal methods for smooth convex optimization (in Russian). Jurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki - Nemirovskii, Nesterov - 1985 |
Citation Context: ... by O(1)/(t ln n), so that the Conditional Gradient algorithm is in this case nearly optimal (up to an O(ln n) factor) in terms of information-based complexity. In fact, in what follows we provide tight lower bounds on the information-based complexity of minimizing (κ, L)-smooth functions on ‖·‖_p-balls, where 2 ≤ p ≤ ∞, so that the just-mentioned lower complexity bound for smooth convex minimization over the unit box is the special case p = ∞ of the bounds to be presented. It should be mentioned that for the case p < ∞ these bounds, obtained by the second author of this note, were announced in [5, 2]; however, aside from the very special case p = 2, the highly technical original proofs of the bounds were never published. Motivated as explained above, we recently revisited the original proofs and were able to simplify them dramatically, thus making them publishable. The rest of this note is organized as follows. In Section 2 we restate in full detail the problem we are interested in. Section 3 is devoted to the main component of our construction: a Moreau-type scheme for approximating a convex Lipschitz continuous function by a convex function with Lipschitz continuous gradients; a... |

1 | On optimality of Krylov’s information when solving linear operator equations - Nemirovski - 1991 |

1 | Optimal methods of smooth convex optimization (in Russian) - Nemirovskii, Nesterov - 1985 |

1 | Optimal methods for the solution of large-scale convex programming problems - Khachiyan, Nemirovski, et al. - 1993 |

1 | Modern Mathematical Methods in Optimization - Elster - 1993 |