Results 1–10 of 30
Revisiting Frank-Wolfe: Projection-free sparse convex optimization
In ICML, 2013
Cited by 76 (2 self)

Abstract:
We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
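As a concrete illustration of the duality gap certificate described above, here is a minimal Frank-Wolfe sketch on an assumed toy problem (least squares over the unit ℓ1-ball). The problem setup and function names are illustrative choices, not the paper's experimental setting.

```python
import numpy as np

def lmo_l1(grad):
    """Linear minimization oracle for the unit l1-ball:
    argmin_{||s||_1 <= 1} <grad, s> is a signed standard basis vector."""
    i = int(np.argmax(np.abs(grad)))
    s = np.zeros_like(grad)
    s[i] = -np.sign(grad[i])
    return s

def frank_wolfe(A, b, iters=2000, tol=1e-6):
    """Frank-Wolfe on f(x) = 0.5 * ||A x - b||^2 over the unit l1-ball."""
    x = np.zeros(A.shape[1])
    gap = np.inf
    for t in range(iters):
        grad = A.T @ (A @ x - b)
        s = lmo_l1(grad)
        # Duality gap certificate: g(x) = <grad, x - s> upper-bounds
        # the primal suboptimality f(x) - f(x*), so it doubles as a
        # computable stopping criterion.
        gap = float(grad @ (x - s))
        if gap <= tol:
            break
        x = x + 2.0 / (t + 2.0) * (s - x)   # standard step size 2/(t+2)
    return x, gap
```

The iterate stays a convex combination of at most t+1 atoms after t steps, which is the sparsity property the abstract refers to.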
Conditional gradient algorithms for norm-regularized smooth convex optimization
2013
Cited by 21 (6 self)

Abstract:
Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖·‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimize over the cone the sum of f and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, and (b) ‖·‖ is "too complicated" to allow for the computationally cheap Bregman projections required in first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of K and the unit ‖·‖-ball. Motivating examples are given by the nuclear norm with K being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates, and outline some applications.
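The nuclear-norm example above makes the linear minimization step concrete: minimizing a linear form over a nuclear-norm ball only requires the top singular pair of the gradient, so each conditional gradient step is a rank-one update. Below is an illustrative sketch under that assumption; the name `lmo_nuclear` and the `radius` parameter are hypothetical, not from the paper.

```python
import numpy as np

def lmo_nuclear(grad, radius=1.0):
    """Linear minimization oracle for the nuclear-norm ball:
    argmin_{||S||_* <= radius} <grad, S> = -radius * u1 @ v1.T,
    where (u1, v1) is the top singular vector pair of grad."""
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return -radius * np.outer(U[:, 0], Vt[0, :])
```

The optimal value is minus the radius times the spectral norm of the gradient; in large-scale settings the full SVD here would typically be replaced by a Lanczos-style computation of just the leading singular pair.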
Large-scale image classification with trace-norm regularization
In IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2012
Cited by 20 (3 self)

Abstract:
With the advent of larger image classification datasets such as ImageNet, designing scalable and efficient multi-class classification algorithms is now an important challenge. We introduce a new scalable learning algorithm for large-scale multi-class image classification, based on the multinomial logistic loss and the trace-norm regularization penalty. Reframing the challenging non-smooth optimization problem into a surrogate infinite-dimensional optimization problem with a regular ℓ1-regularization penalty, we propose a simple and provably efficient accelerated coordinate descent algorithm. Furthermore, we show how to perform efficient matrix computations in the compressed domain for quantized dense visual features, scaling up to 100,000s of examples, 1,000s-dimensional features, and 100s of categories. Promising experimental results on the "Fungus", "Ungulate", and "Vehicles" subsets of ImageNet are presented, where we show that our approach performs significantly better than state-of-the-art approaches for Fisher vectors with 16 Gaussians.
Accelerated Training for Matrix-norm Regularization: A Boosting Approach
Cited by 16 (4 self)

Abstract:
Sparse learning models typically combine a smooth loss with a non-smooth penalty, such as the trace norm. Although recent developments in sparse approximation have offered promising solution methods, current approaches either apply only to matrix-norm constrained problems or provide suboptimal convergence rates. In this paper, we propose a boosting method for regularized learning that guarantees ε accuracy within O(1/ε) iterations. Performance is further accelerated by interlacing boosting with fixed-rank local optimization, exploiting a simpler local objective than previous work. The proposed method yields state-of-the-art performance on large-scale problems. We also demonstrate an application to latent multiview learning for which we provide the first efficient weak oracle.
A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization
, 1301
Cited by 5 (1 self)

Abstract:
Linear optimization is often algorithmically simpler than nonlinear convex optimization. Linear optimization over matroid polytopes, matching polytopes, and path polytopes are examples of problems for which we have simple and efficient combinatorial algorithms, but whose nonlinear convex counterparts are harder and admit significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online, and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state of the art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm, we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and of Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for non-smooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods.

Key words: Frank-Wolfe algorithm; conditional gradient methods; linear programming; first-order methods; online convex optimization; online learning; stochastic optimization

AMS subject classifications: 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15
Convex Co-embedding
Cited by 1 (1 self)

Abstract:
We present a general framework for association learning, where entities are embedded in a common latent space to express relatedness via geometry, an approach that underlies the state of the art for link prediction, relation learning, multi-label tagging, relevance retrieval, and ranking. Although current approaches rely on local training methods applied to non-convex formulations, we demonstrate how general convex formulations can be achieved for entity embedding, both for standard multilinear and prototype-distance models. We investigate an efficient optimization strategy that allows scaling. An experimental evaluation reveals the advantages of global training in different case studies.
Constrained relative entropy minimization with applications to multi-task learning
2013