Results 1–10 of 29
Revisiting Frank-Wolfe: Projection-free sparse convex optimization
In ICML, 2013
Abstract

Cited by 86 (2 self)
We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration consists of a low-rank update, and discuss the broad application areas of this approach.
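The projection-free iteration and the duality gap certificate described above can be sketched in a few lines. The example below is our own minimal illustration (the objective, feasible set, and step-size rule are standard textbook choices, not taken from the paper): Frank-Wolfe minimizing a smooth quadratic over the probability simplex, where the linear subproblem reduces to picking a single vertex.

```python
import numpy as np

def frank_wolfe_simplex(A, b, iters=500, tol=1e-8):
    """Sketch of Frank-Wolfe for min ||Ax - b||^2 over the probability simplex.

    Hypothetical illustration; names and problem choice are ours.
    """
    n = A.shape[1]
    x = np.ones(n) / n                       # feasible starting point
    gap = float("inf")
    for t in range(iters):
        grad = 2.0 * A.T @ (A @ x - b)       # gradient of the smooth objective
        i = int(np.argmin(grad))             # LMO over the simplex: best vertex
        s = np.zeros(n)
        s[i] = 1.0
        gap = grad @ (x - s)                 # Frank-Wolfe duality gap certificate
        if gap < tol:
            break
        gamma = 2.0 / (t + 2.0)              # standard diminishing step size
        x = (1 - gamma) * x + gamma * s      # convex combination stays feasible
    return x, gap
```

The quantity `gap` upper-bounds the primal suboptimality at the current iterate, which is what makes it usable as a stopping certificate without knowing the optimal value.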
Conditional gradient algorithms for norm-regularized smooth convex optimization
, 2013
Abstract

Cited by 23 (6 self)
Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖·‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimize over the cone the sum of f and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, and (b) ‖·‖ is “too complicated” to allow for the computationally cheap Bregman projections required in first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of K and the unit ‖·‖-ball. Motivating examples are given by the nuclear norm, with K being either the entire space of matrices or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates, and outline some applications.
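A concrete instance of the “easy linear subproblem” assumption above is the nuclear-norm ball: minimizing a linear form ⟨G, X⟩ over {X : ‖X‖_* ≤ 1} is attained at a rank-one matrix built from the top singular pair of G. The sketch below (helper name ours) uses a full SVD for clarity; in practice one would compute only the leading singular pair, e.g. by power iteration.

```python
import numpy as np

def nuclear_lmo(G):
    """Linear minimization oracle for the nuclear-norm unit ball (our sketch).

    Returns the minimizer of <G, X> over {X : ||X||_* <= 1}, which is the
    rank-one matrix -u v^T built from the top singular pair (u, v) of G.
    """
    U, S, Vt = np.linalg.svd(G, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    return -np.outer(u, v)   # extreme point of the nuclear-norm ball
```

Each conditional-gradient iteration therefore costs one leading-SVD computation and produces a rank-one update, which is why these methods scale to large matrix problems.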
Large-scale image classification with trace-norm regularization
In IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2012
Abstract

Cited by 19 (2 self)
With the advent of larger image classification datasets such as ImageNet, designing scalable and efficient multiclass classification algorithms is now an important challenge. We introduce a new scalable learning algorithm for large-scale multiclass image classification, based on the multinomial logistic loss and the trace-norm regularization penalty. Reframing the challenging nonsmooth optimization problem as a surrogate infinite-dimensional optimization problem with a regular ℓ1-regularization penalty, we propose a simple and provably efficient accelerated coordinate descent algorithm. Furthermore, we show how to perform efficient matrix computations in the compressed domain for quantized dense visual features, scaling up to 100,000s of examples, 1,000s-dimensional features, and 100s of categories. Promising experimental results on the “Fungus”, “Ungulate”, and “Vehicles” subsets of ImageNet are presented, where we show that our approach performs significantly better than state-of-the-art approaches for Fisher vectors with 16 Gaussians.
Accelerated Training for Matrix-norm Regularization: A Boosting Approach
Abstract

Cited by 17 (4 self)
Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as the trace norm. Although recent developments in sparse approximation have offered promising solution methods, current approaches either apply only to matrix-norm constrained problems or provide suboptimal convergence rates. In this paper, we propose a boosting method for regularized learning that guarantees ε accuracy within O(1/ε) iterations. Performance is further accelerated by interlacing boosting with fixed-rank local optimization, exploiting a simpler local objective than previous work. The proposed method yields state-of-the-art performance on large-scale problems. We also demonstrate an application to latent multiview learning, for which we provide the first efficient weak oracle.
A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization
, 2013
Abstract

Cited by 11 (2 self)
Linear optimization is often algorithmically simpler than nonlinear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are examples of problems for which we have simple and efficient combinatorial algorithms, but whose nonlinear convex counterparts are harder and admit significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state of the art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for nonsmooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods.
Key words: Frank-Wolfe algorithm; conditional gradient methods; linear programming; first-order methods; online convex optimization; online learning; stochastic optimization.
AMS subject classifications: 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15.
Conditional gradient algorithms for machine learning
, 2012
Abstract

Cited by 8 (1 self)
We consider penalized formulations of machine learning problems with a regularization penalty that has conic structure. For several important learning problems, state-of-the-art optimization approaches such as proximal gradient algorithms are difficult to apply and computationally expensive, preventing their use for large-scale learning. We present a conditional gradient algorithm with theoretical guarantees, and show promising experimental results on two large-scale real-world datasets.
Orthogonal rank-one matrix pursuit for low-rank matrix completion
 SIAM Journal on Scientific Computing
A smoothing approach for composite conditional gradient with nonsmooth loss
, 2014
Abstract

Cited by 2 (0 self)
We consider learning problems where the nonsmoothness lies both in the convex empirical risk and in the regularization penalty. Examples of such problems include learning with nonsmooth loss functions and an atomic decomposition regularization penalty. Such doubly nonsmooth learning problems prevent the use of recently proposed composite conditional gradient algorithms for training, which are particularly attractive for large-scale applications; indeed, those algorithms rely on the assumption that the empirical risk part of the objective is smooth. We propose a composite conditional gradient algorithm with smoothing to tackle such learning problems. We set up a framework that allows us to systematically design parametrized smooth surrogates of nonsmooth loss functions. We then propose a smoothed composite conditional gradient algorithm, for which we prove theoretical guarantees on the accuracy. We present promising experimental results on collaborative filtering tasks.
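The parametrized-smooth-surrogate idea can be illustrated with a classic example (our choice, not the paper's construction): replacing the nonsmooth absolute loss |r| by its Moreau envelope, the Huber function, which stays within μ/2 of |r| everywhere and has a gradient that is 1/μ-Lipschitz. Smaller μ gives a closer but less smooth surrogate, which is the accuracy/smoothness trade-off such frameworks manage.

```python
import numpy as np

def smoothed_abs(r, mu):
    """Moreau-envelope (Huber) smoothing of the absolute loss (our sketch).

    Equals r^2 / (2*mu) near zero and |r| - mu/2 in the tails, so it
    lower-bounds |r| with uniform error at most mu/2.
    """
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= mu,
                    r * r / (2.0 * mu),    # quadratic region: smooth at 0
                    np.abs(r) - mu / 2.0)  # linear region: matches |r| up to mu/2
```

A composite conditional gradient method can then be run on the smooth surrogate, with μ chosen to match the target accuracy.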