Results 1–10 of 23
Optimization with first-order surrogate functions
In Proceedings of the International Conference on Machine Learning (ICML), 2013
Cited by 23 (2 self)
In this paper, we study optimization methods consisting of iteratively minimizing surrogates of an objective function. By proposing several algorithmic variants and simple convergence analyses, we make two main contributions. First, we provide a unified viewpoint for several first-order optimization techniques such as accelerated proximal gradient, block coordinate descent, or Frank-Wolfe algorithms. Second, we introduce a new incremental scheme that experimentally matches or outperforms state-of-the-art solvers for large-scale optimization problems typically arising in machine learning.
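As a hedged illustration of the surrogate-minimization viewpoint in this abstract (not the paper's incremental scheme): plain gradient descent arises from exactly minimizing the standard first-order quadratic surrogate at each iterate. The quadratic objective below is made up for the example.

```python
import numpy as np

def minimize_via_surrogates(grad, L, x0, n_iters=100):
    """Iteratively and exactly minimize the first-order surrogate
    g_k(x) = f(x_k) + <grad f(x_k), x - x_k> + (L/2) * ||x - x_k||^2,
    whose minimizer x_{k+1} = x_k - grad f(x_k) / L is a gradient step."""
    x = x0.copy()
    for _ in range(n_iters):
        x = x - grad(x) / L  # closed-form minimizer of the surrogate
    return x

# Illustrative smooth objective f(x) = 0.5 * x^T A x - b^T x,
# with grad f(x) = A x - b and Lipschitz constant L = ||A||_2 = 2.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 1.0])
x_hat = minimize_via_surrogates(lambda x: A @ x - b, L=2.0, x0=np.zeros(2))
```

Because L upper-bounds the curvature of f, each surrogate majorizes f and touches it at x_k, so every surrogate minimization decreases the objective.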
New Analysis and Results for the Conditional Gradient Method
, 2013
Cited by 9 (0 self)
We present new results for the conditional gradient method (also known as the Frank-Wolfe method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size rules and computational guarantees that depend naturally on the warm-start quality of the initial (and subsequent) iterates. Our results include computational guarantees for both duality/bound gaps and the so-called Wolfe gaps. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions.
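A minimal sketch of the conditional gradient (Frank-Wolfe) iteration this abstract analyzes, using the simple-averaging step-size rule γ_k = 2/(k+2) over the probability simplex; the objective and data are illustrative, not from the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=500):
    """Frank-Wolfe over the probability simplex {x >= 0, sum(x) = 1}.

    The linear minimization oracle over the simplex simply returns the
    vertex e_i with i = argmin_i grad(x)_i.
    """
    x = x0.copy()
    for k in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0            # LMO: best simplex vertex
        gamma = 2.0 / (k + 2.0)          # simple-averaging step-size rule
        x = (1 - gamma) * x + gamma * s  # convex combination stays feasible
    return x

# Illustrative smooth objective f(x) = 0.5 * ||x - c||^2 with c in the
# simplex, so grad f(x) = x - c and the constrained minimizer is c itself.
c = np.array([0.2, 0.5, 0.3])
x_star = frank_wolfe_simplex(lambda x: x - c, np.array([1.0, 0.0, 0.0]))
```

Note the method never projects: every iterate is a convex combination of simplex vertices, which is what makes the per-iteration cost depend only on the LMO.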
Duality between subgradient and conditional gradient methods
HAL-00861118, version 1, 12 Sep 2013
, 2013
Cited by 8 (2 self)
Given a convex optimization problem and its dual, there are many possible first-order …
Convex relaxations of structured matrix factorizations
, 2013
Cited by 6 (2 self)
We consider the factorization of a rectangular matrix X into a positive linear combination of rank-one factors of the form uv⊤, where u and v belong to certain sets U and V that may encode specific structures of the factors, such as positivity or sparsity. In this paper, we show that computing the optimal decomposition is equivalent to computing a certain gauge function of X, and we provide a detailed analysis of these gauge functions and their polars. Since these gauge functions are typically hard to compute, we present semidefinite relaxations and several algorithms that may recover approximate decompositions with approximation guarantees. We illustrate our results with simulations on finding decompositions with elements in {0,1}. As side contributions, we present a detailed analysis of variational quadratic representations of norms as well as a new iterative basis pursuit algorithm that can deal with inexact first-order oracles.
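A hedged illustration of the gauge viewpoint in this abstract for the one tractable special case: when U and V are Euclidean unit balls, the gauge of X with respect to the atoms uv⊤ is the nuclear norm, and the SVD yields an optimal positive combination of rank-one factors. The structured sets U, V studied in the paper are not modeled here.

```python
import numpy as np

# Illustrative matrix; singular values are sqrt(45) and sqrt(5).
X = np.array([[3.0, 0.0], [4.0, 5.0]])
U, s, Vt = np.linalg.svd(X)

# X as a positive combination of rank-one atoms u_i v_i^T, with
# nonnegative weights s_i (the singular values).
X_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))

# The gauge value for unit-ball atoms is the sum of the weights.
nuclear_norm = s.sum()
```

For structured U and V (e.g. nonnegative or sparse factors), no such closed form exists, which is what motivates the paper's semidefinite relaxations.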
On Lower Complexity Bounds for Large-Scale Smooth Convex Optimization. arXiv e-prints
, 2014
Cited by 5 (1 self)
In this note we present tight lower bounds on the information-based complexity of large-scale smooth convex minimization problems. We demonstrate, in particular, that the k-step Conditional Gradient (a.k.a. Frank-Wolfe) algorithm as applied to minimizing smooth convex functions over the n-dimensional box with n ≥ k is optimal, up to an O(ln n) factor, in terms of information-based complexity.
A smoothing approach for composite conditional gradient with nonsmooth loss
, 2014
Cited by 2 (0 self)
We consider learning problems where the nonsmoothness lies both in the convex empirical risk and in the regularization penalty. Examples of such problems include learning with nonsmooth loss functions and atomic-decomposition regularization penalties. Such doubly nonsmooth learning problems prevent the use of recently proposed composite conditional gradient algorithms for training, which are particularly attractive for large-scale applications: indeed, they rely on the assumption that the empirical risk part of the objective is smooth. We propose a composite conditional gradient algorithm with smoothing to tackle such learning problems. We set up a framework allowing us to systematically design parametrized smooth surrogates of nonsmooth loss functions. We then propose a smoothed composite conditional gradient algorithm for which we prove theoretical guarantees on the accuracy. We present promising experimental results on collaborative filtering tasks.
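To illustrate the smoothing idea in this abstract: a classical parametrized smooth surrogate of the absolute loss is the Huber function, the Moreau envelope of |t| with parameter μ. The paper's surrogate family may differ; this is only the textbook instance.

```python
import numpy as np

def huber(t, mu):
    """Moreau envelope of |t| with parameter mu > 0:
    t^2 / (2*mu) for |t| <= mu, and |t| - mu/2 otherwise.
    It is smooth with a (1/mu)-Lipschitz gradient and satisfies the
    uniform approximation bound |t| - mu/2 <= huber(t, mu) <= |t|."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= mu, t * t / (2 * mu), np.abs(t) - mu / 2)
```

The trade-off the paper must balance is visible in the two bounds: a small μ makes the surrogate tighter but its gradient less smooth, which degrades the conditional gradient rate.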
Solving Variational Inequalities with Monotone Operators on Domains Given by Linear Minimization Oracles
, 2015
Semi-Proximal Mirror Prox
First-order methods for composite minimization: min_{x∈X} f(x) + h(x), where f and h are convex, f is smooth, and h is simple.
(Accelerated) proximal gradient methods (when h is proximal-friendly). Proximal operator: prox_h(η) = argmin_{x∈X} { (1/2)‖x − η‖₂² + h(x) }. For example, when h(x) = ‖x‖₁, this reduces to soft thresholding. The worst-case complexity bound for first-order oracles is O(1/√ε).
Conditional gradient methods (when h is LMO-friendly). (Composite) linear minimization oracle (LMO): LMO_h(η) = argmin_{x∈X} { ⟨η, x⟩ + h(x) }. For example, when h(x) = ‖x‖_nuc or δ_{‖x‖_nuc ≤ 1}(x), this reduces to computing the top pair of singular vectors. The worst-case (also optimal) complexity bound for LMOs is O(1/ε).
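A hedged sketch contrasting the two oracles named above, for the l1 cases where both have closed forms: the proximal operator of λ‖·‖₁ (soft thresholding) versus the LMO for the indicator of the l1 ball, which returns a single signed vertex. Both helpers are illustrative names, not from the slides.

```python
import numpy as np

def prox_l1(eta, lam):
    """prox_{lam * ||.||_1}(eta): coordinatewise soft thresholding."""
    return np.sign(eta) * np.maximum(np.abs(eta) - lam, 0.0)

def lmo_l1_ball(g, radius=1.0):
    """argmin_{||x||_1 <= radius} <g, x>: a single signed, scaled vertex
    of the l1 ball, at the coordinate where |g| is largest."""
    i = np.argmax(np.abs(g))
    x = np.zeros_like(g)
    x[i] = -radius * np.sign(g[i])
    return x
```

The structural difference drives the complexity gap stated above: the prox solves a strongly convex subproblem (enabling O(1/√ε) acceleration), while the LMO only solves a linear one, giving the O(1/ε) conditional gradient rate.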
Asynchronous Parallel Block-Coordinate Frank-Wolfe
We develop mini-batched parallel Frank-Wolfe (conditional gradient) methods for smooth convex optimization subject to block-separable constraints. Our work includes the basic (batch) Frank-Wolfe algorithm as well as the recently proposed Block-Coordinate Frank-Wolfe (BCFW) method [18] as special cases. Our algorithm permits asynchronous updates within the mini-batch, and is robust to stragglers and faulty worker threads. Our analysis reveals how the potential speedups over BCFW depend on the mini-batch size and how one can provably obtain large problem-dependent speedups. We present several experiments to indicate the empirical behavior of our methods, obtaining significant speedups over competing state-of-the-art (and synchronous) methods on structural SVMs.
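A hedged sequential sketch of the BCFW special case this abstract builds on: the feasible set is a product of blocks (here, two probability simplices), and each iteration applies the LMO and a convex-combination update to one randomly chosen block only, with step size γ = 2m/(k + 2m) for m blocks. Mini-batching and asynchrony, the paper's actual contributions, are omitted; all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def bcfw(grad, blocks, x0, n_iters=2000):
    """Block-coordinate Frank-Wolfe over a product of probability simplices.

    blocks: list of index arrays partitioning the coordinates of x.
    """
    x = x0.copy()
    m = len(blocks)
    for k in range(n_iters):
        blk = blocks[rng.integers(m)]       # pick one block at random
        g = grad(x)[blk]
        s = np.zeros(len(blk))
        s[np.argmin(g)] = 1.0               # simplex LMO on that block only
        gamma = 2.0 * m / (k + 2.0 * m)     # BCFW-style step-size schedule
        x[blk] = (1 - gamma) * x[blk] + gamma * s
    return x

# Separable objective f(x) = 0.5 * ||x - c||^2, where each half of c lies
# in its own simplex, so the blockwise minimizers are the halves of c.
c = np.array([0.3, 0.7, 0.6, 0.4])
blocks = [np.array([0, 1]), np.array([2, 3])]
x_hat = bcfw(lambda x: x - c, blocks, np.array([1.0, 0.0, 1.0, 0.0]))
```

Because each update touches a single block, different blocks can in principle be handed to different workers, which is the starting point for the mini-batched and asynchronous variants analyzed in the paper.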