Results 1-10 of 11
The complexity of large-scale convex programming under a linear optimization oracle.
, 2013
"... Abstract This paper considers a general class of iterative optimization algorithms, referred to as linearoptimizationbased convex programming (LCP) methods, for solving largescale convex programming (CP) problems. The LCP methods, covering the classic conditional gradient (CG) method (a.k.a., Fra ..."
Abstract

Cited by 11 (1 self)
Abstract: This paper considers a general class of iterative optimization algorithms, referred to as linear-optimization-based convex programming (LCP) methods, for solving large-scale convex programming (CP) problems. The LCP methods, covering the classic conditional gradient (CG) method (a.k.a. the Frank-Wolfe method) as a special case, can only solve a linear optimization subproblem at each iteration. In this paper, we first establish a series of lower complexity bounds for the LCP methods to solve different classes of CP problems, including smooth, nonsmooth and certain saddle-point problems. We then formally establish the theoretical optimality, or near optimality in the large-scale case, of the CG method and its variants for different classes of CP problems. We also introduce several new optimal LCP methods, obtained by properly modifying Nesterov's accelerated gradient method, and demonstrate their possible advantages over the classic CG method for solving certain classes of large-scale CP problems.
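To make the oracle model concrete: the linear subproblem an LCP method solves at each iteration often has a closed-form solution. Below is a minimal sketch (illustrative only, not code from the paper) of the classic CG/Frank-Wolfe method over the unit l1 ball; the oracle, the standard open-loop step 2/(t+2), and the toy quadratic instance are all our own choices.

```python
import numpy as np

def lo_oracle_l1(g):
    # Linear optimization subproblem over the unit l1 ball:
    # argmin_{||s||_1 <= 1} <g, s> is a signed standard basis vector.
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -np.sign(g[i])
    return s

def conditional_gradient(grad, x0, iters=200):
    # Classic CG / Frank-Wolfe with the standard step size 2/(t+2):
    # only a linear subproblem is solved per iteration, no projection.
    x = x0.copy()
    for t in range(iters):
        s = lo_oracle_l1(grad(x))
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x + gamma * s
    return x

# toy smooth CP instance: min 0.5*||x - b||^2 over the unit l1 ball
b = np.array([2.0, -0.5, 0.0])
x_hat = conditional_gradient(lambda x: x - b, np.zeros(3))
```

Because every iterate is a convex combination of the oracle's vertices, feasibility is maintained for free, which is the appeal of the LO-oracle model.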
Conditional gradient sliding for convex optimization
, 2014
"... Abstract In this paper, we present a new conditional gradient type method for convex optimization by utilizing a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Different from the classic conditional gradient method, the conditional gradient sliding ( ..."
Abstract

Cited by 2 (0 self)
Abstract: In this paper, we present a new conditional gradient type method for convex optimization by utilizing a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Different from the classic conditional gradient method, the conditional gradient sliding (CGS) algorithm developed herein can skip the computation of gradients from time to time and, as a result, can achieve the optimal complexity bounds in terms of not only the number of calls to the LO oracle, but also the number of gradient evaluations. More specifically, we show that the CGS method requires O(1/√ε) and O(log(1/ε)) gradient evaluations, respectively, for solving smooth and strongly convex problems, while still maintaining the optimal O(1/ε) bound on the number of calls to the LO oracle. We also develop variants of the CGS method which can achieve the optimal complexity bounds for solving stochastic optimization problems and an important class of saddle-point optimization problems. To the best of our knowledge, this is the first time that these types of projection-free optimal first-order methods have been developed in the literature. Some preliminary numerical results are also provided to demonstrate the advantages of the CGS method.
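The sliding idea can be sketched as follows: the outer loop is an accelerated-gradient scheme, and the projection step is replaced by an inner Frank-Wolfe loop that touches only the LO oracle, so each outer iteration costs exactly one gradient of f. The sketch below is a simplified rendering; the parameter schedules for gamma, beta and eta are ad-hoc assumptions of ours, not the paper's exact constants.

```python
import numpy as np

def lo_l1(g):
    # LO oracle over the unit l1 ball: argmin_{||s||_1 <= 1} <g, s>
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -np.sign(g[i])
    return s

def cnd_g(g, u, beta, eta, max_iter=100):
    # Inner Frank-Wolfe on phi(x) = <g, x> + (beta/2)*||x - u||^2, run
    # until the FW gap drops below eta. Only the LO oracle is called here;
    # no new gradients of f are evaluated.
    x = u.copy()
    for _ in range(max_iter):
        grad_phi = g + beta * (x - u)
        s = lo_l1(grad_phi)
        gap = grad_phi @ (x - s)
        if gap <= eta:
            break
        # exact line search for the quadratic phi along the segment [x, s]
        gamma = min(1.0, gap / (beta * np.dot(x - s, x - s)))
        x = x + gamma * (s - x)
    return x

def cgs(grad_f, L, x0, N=50):
    # Simplified conditional gradient sliding: one gradient of f per
    # outer iteration; all feasibility handling lives inside cnd_g.
    x, y = x0.copy(), x0.copy()
    for k in range(1, N + 1):
        gamma = 3.0 / (k + 2)
        beta = 3.0 * L / (k + 1)
        eta = L / (N * k)              # crude accuracy schedule (assumption)
        z = (1 - gamma) * y + gamma * x
        x = cnd_g(grad_f(z), x, beta, eta)
        y = (1 - gamma) * y + gamma * x
    return y

# toy instance: min 0.5*||x - b||^2 over the unit l1 ball
b = np.array([2.0, -0.5, 0.0])
y_hat = cgs(lambda v: v - b, L=1.0, x0=np.zeros(3))
```

The gradient-skipping is visible in the structure: `grad_f` is evaluated once per outer step, while `cnd_g` may call the LO oracle many times against that same fixed gradient.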
A smoothing approach for composite conditional gradient with nonsmooth loss
, 2014
"... Abstract We consider learning problems where the nonsmoothness lies both in the convex empirical risk and in the regularization penalty. Examples of such problems include learning with nonsmooth loss functions and atomic decomposition regularization penalty. Such doubly nonsmooth learning problems ..."
Abstract

Cited by 2 (0 self)
Abstract: We consider learning problems where the nonsmoothness lies both in the convex empirical risk and in the regularization penalty. Examples of such problems include learning with nonsmooth loss functions and with atomic-decomposition regularization penalties. Such doubly nonsmooth learning problems prevent the use of the recently proposed composite conditional gradient algorithms for training, which are particularly attractive for large-scale applications, because those algorithms rely on the assumption that the empirical-risk part of the objective is smooth. We propose a composite conditional gradient algorithm with smoothing to tackle such learning problems. We set up a framework for systematically designing parametrized smooth surrogates of nonsmooth loss functions. We then propose a smoothed composite conditional gradient algorithm, for which we prove theoretical guarantees on the accuracy. We present promising experimental results on collaborative filtering tasks.
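As an example of the kind of parametrized smooth surrogate such a framework produces, the Moreau-envelope (Huber) smoothing of the absolute loss has a (1/mu)-Lipschitz gradient and stays uniformly within mu/2 of the original loss. This surrogate is our illustrative choice, not necessarily the one used in the paper.

```python
import numpy as np

def smoothed_abs(r, mu=0.1):
    # Huber / Moreau-envelope smoothing of |r|: quadratic for |r| <= mu,
    # linear in the tails; the smoothing parameter mu trades smoothness
    # (gradient is (1/mu)-Lipschitz) against approximation error (mu/2).
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= mu, r * r / (2 * mu), np.abs(r) - mu / 2)

r = np.linspace(-2.0, 2.0, 401)
gap = np.abs(r) - smoothed_abs(r)   # uniform error: 0 <= gap <= mu/2
```

Running a smooth composite conditional gradient method on the surrogate and shrinking mu then yields guarantees on the original nonsmooth objective, which is the paper's overall strategy.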
Efficient Second Order Online Learning by Sketching
"... Abstract We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for illconditioned data. SON is an enhanced version of the Online Newton Step, which, via sketching techniques enjoys a running time linear in the dimens ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract: We propose Sketched Online Newton (SON), an online second-order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data. SON is an enhanced version of the Online Newton Step which, via sketching techniques, enjoys a running time linear in the dimension and the sketch size. We further develop sparse forms of the sketching methods (such as Oja's rule), making the computation linear in the sparsity of the features. Together, these improvements eliminate all computational obstacles of previous second-order online learning approaches.
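One sketching primitive in this family is the Frequent Directions sketch, which maintains a small matrix B whose Gram matrix approximates that of the data stream. The minimal version below is an illustration of the sketching idea, not SON's implementation; the 2*||A||_F^2/ell error bound in the test is the classic FD guarantee, stated here as an assumption.

```python
import numpy as np

def frequent_directions(A, ell):
    # Stream the rows of A (n x d) into a sketch B (ell x d) such that
    # B^T B approximates A^T A from below in the semidefinite order.
    n, d = A.shape
    B = np.zeros((ell, d))
    for a in A:
        empty = np.flatnonzero(~B.any(axis=1))
        if empty.size == 0:
            # sketch is full: shrink all singular values by the smallest
            # one squared, which zeroes out at least the last row of B
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2
            B = np.sqrt(np.maximum(s ** 2 - delta, 0.0))[:, None] * Vt
            empty = np.flatnonzero(~B.any(axis=1))
        B[empty[0]] = a
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
B = frequent_directions(A, ell=6)
```

A second-order online learner can then use B^T B (plus regularization) in place of the full Gram matrix, keeping per-step cost linear in the dimension and the sketch size.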
Asynchronous Parallel Block-Coordinate Frank-Wolfe
"... Abstract We develop minibatched parallel FrankWolfe (conditional gradient) methods for smooth convex optimization subject to blockseparable constraints. Our work includes the basic (batch) FrankWolfe algorithm as well as the recently proposed BlockCoordinate FrankWolfe (BCFW) method [18] as s ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract: We develop mini-batched parallel Frank-Wolfe (conditional gradient) methods for smooth convex optimization subject to block-separable constraints. Our work includes the basic (batch) Frank-Wolfe algorithm as well as the recently proposed Block-Coordinate Frank-Wolfe (BCFW) method [18] as special cases. Our algorithm permits asynchronous updates within the mini-batch and is robust to stragglers and faulty worker threads. Our analysis reveals how the potential speedups over BCFW depend on the mini-batch size and how one can provably obtain large problem-dependent speedups. We present several experiments indicating the empirical behavior of our methods, obtaining significant speedups over competing state-of-the-art (and synchronous) methods on structural SVMs.
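A serial sketch of the block-coordinate step that such mini-batched methods parallelize: each iteration draws one block, calls the LO oracle only on that block's factor of the feasible set, and takes a line-search step. The product-of-l1-balls domain and the quadratic objective are our illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def lo_l1(g):
    # per-block LO oracle: argmin over the unit l1 ball of <g, s>
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -np.sign(g[i])
    return s

def bcfw(grad, x0, blocks, iters=500, seed=0):
    # Block-coordinate Frank-Wolfe over a product of l1 balls: each
    # iteration updates ONE randomly chosen block; a mini-batched
    # variant would hand several such block updates to worker threads.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        blk = blocks[rng.integers(len(blocks))]
        g = grad(x)[blk]
        s = lo_l1(g)
        d = x[blk] - s
        denom = d @ d
        # exact line search for a quadratic objective along the block segment
        gamma = min(1.0, (g @ d) / denom) if denom > 0 else 0.0
        x[blk] = x[blk] + gamma * (s - x[blk])
    return x

# min 0.5*||x - b||^2 over the product of two unit l1 balls
b = np.array([2.0, -0.5, 0.0, 1.5, 0.0])
blocks = [slice(0, 3), slice(3, 5)]
x_hat = bcfw(lambda x: x - b, np.zeros(5), blocks)
```

Because each update touches only its own block, updates on disjoint blocks commute, which is what makes the asynchronous mini-batched execution analyzable.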
Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms
"... Abstract We study parallel and distributed FrankWolfe algorithms; the former on shared memory machines with minibatching, and the latter in a delayed update framework. In both cases, we perform computations asynchronously whenever possible. We assume blockseparable constraints as in BlockCoordi ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract: We study parallel and distributed Frank-Wolfe algorithms; the former on shared-memory machines with mini-batching, and the latter in a delayed-update framework. In both cases, we perform computations asynchronously whenever possible. We assume block-separable constraints as in the Block-Coordinate Frank-Wolfe (BCFW) method.
Variance-Reduced and Projection-Free Stochastic Optimization
"... Abstract The FrankWolfe optimization algorithm has recently regained popularity for machine learning applications due to its projectionfree property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to the g ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract: The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to its gradient descent counterpart. In this work, leveraging a recent variance reduction technique, we propose two stochastic Frank-Wolfe variants which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve 1 − ε accuracy. For example, we improve from O(1/ε) to O(ln(1/ε)) if the objective function is smooth and strongly convex, and from O( ...
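The variance reduction technique in question (an SVRG-style estimator) keeps a snapshot point and replaces the plain stochastic gradient grad_i(x) with grad_i(x) - grad_i(x_snap) + full_grad(x_snap): still unbiased, but with far smaller variance when x is close to the snapshot. A small numerical check on a toy least-squares instance of our own:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 5
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def grad_i(x, i):
    # gradient of the i-th summand 0.5*(<a_i, x> - y_i)^2
    return (A[i] @ x - y[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - y) / n

x_snap = rng.standard_normal(d)              # snapshot (anchor) point
g_snap = full_grad(x_snap)
x = x_snap + 0.01 * rng.standard_normal(d)   # current iterate near the snapshot

# variance-reduced estimator vs. the plain stochastic gradient, over all i
vr = np.array([grad_i(x, i) - grad_i(x_snap, i) + g_snap for i in range(n)])
plain = np.array([grad_i(x, i) for i in range(n)])
```

Feeding the low-variance estimator into a stochastic Frank-Wolfe step is the rough shape of the variants the abstract describes; the payoff is fewer stochastic gradient evaluations for the same target accuracy.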
Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets
, 2015
"... The FrankWolfe method (a.k.a. conditional gradient algorithm) for smooth optimization has regained much interest in recent years in the context of large scale optimization and machine learning. A key advantage of the method is that it avoids projections the computational bottleneck in many appl ..."
Abstract
 Add to MetaCart
The Frank-Wolfe method (a.k.a. conditional gradient algorithm) for smooth optimization has regained much interest in recent years in the context of large-scale optimization and machine learning. A key advantage of the method is that it avoids projections (the computational bottleneck in many applications), replacing them with a linear optimization step. Despite this advantage, the known convergence rates of the FW method fall behind standard first-order methods for most settings of interest. It is an active line of research to derive faster linear-optimization-based algorithms for various settings of convex optimization. In this paper we consider the special case of optimization over strongly convex sets, for which we prove that the vanilla FW method converges at a rate of 1/t². This gives a quadratic improvement in convergence rate compared to the general case, in which convergence is of the order 1/t and known to be tight. We show that, on the one hand, various balls induced by ℓp norms, Schatten norms and group norms are strongly convex, while, on the other hand, linear optimization over these sets is straightforward and admits a closed-form solution. We further show how several previous fast-rate results for the FW method follow easily from our analysis.
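For example, over an l2 ball (the simplest strongly convex set) the linear optimization step has the closed form s = -r*g/||g||_2. A minimal sketch of that oracle (our own illustration):

```python
import numpy as np

def lo_l2_ball(g, r=1.0):
    # Closed-form linear optimization over the l2 ball of radius r:
    # argmin_{||s||_2 <= r} <g, s> = -r * g / ||g||_2.
    nrm = np.linalg.norm(g)
    return np.zeros_like(g) if nrm == 0.0 else -r * g / nrm

g = np.array([3.0, 4.0])
s = lo_l2_ball(g, r=2.0)   # -> [-1.2, -1.6], achieving <g, s> = -r*||g||_2 = -10
```

The same one-line form extends to the other strongly convex balls the paper discusses, which is why FW is attractive there: a closed-form oracle call replaces a projection.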