Results 1  10
of
86
Stochastic blockcoordinate frankwolfe optimization for structural svms. arXiv preprint:1207.4747
, 2012
"... We propose a randomized blockcoordinate variant of the classic FrankWolfe algorithm for convex optimization with blockseparable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full FrankWolfe algorithm. We also show that, w ..."
Abstract

Cited by 58 (6 self)
 Add to MetaCart
(Show Context)
We propose a randomized blockcoordinate variant of the classic FrankWolfe algorithm for convex optimization with blockseparable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full FrankWolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the blockcoordinate FrankWolfe algorithm allows us to compute the optimal stepsize and yields a computable duality gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers. 1.
Conditional gradient algorithms for normregularized smooth convex optimization
, 2013
"... Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖ · ‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimiz ..."
Abstract

Cited by 23 (6 self)
 Add to MetaCart
(Show Context)
Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖ · ‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimize over the cone the sum of f and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, (b) ‖ · ‖ is “too complicated ” to allow for computationally cheap Bregman projections required in the firstorder proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of K and the unit ‖ · ‖ball. Motivating examples are given by the nuclear norm with K being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable to handle our problems of interest, provide the related theoretical efficiency estimates and outline some applications. 1
Square deal: Lower bounds and improved relaxations for tensor recovery
 CoRR
"... Recovering a lowrank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially subopt ..."
Abstract

Cited by 22 (0 self)
 Add to MetaCart
(Show Context)
Recovering a lowrank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a Kway tensor of length n and Tucker rank r from Gaussian measurements requires Ω(rnK−1) observations. In contrast, a certain (intractable) nonconvex formulation needs only O(rK+nrK) observations. We introduce a very simple, new convex relaxation, which partially bridges this gap. Our new formulation succeeds with O(rbK/2cndK/2e) observations. While these results pertain to Gaussian measurements, simulations strongly suggest that the new norm also outperforms the sum of nuclear norms for tensor completion from a random subset of entries. Our lower bound for the sumofnuclearnorms model follows from a new result on recovering signals with multiple sparse structures (e.g. sparse, low rank), which perhaps surprisingly demonstrates the significant suboptimality of the commonly used recovery approach via minimizing the sum of individual sparsity inducing norms (e.g. l1, nuclear norm). Our new formulation for lowrank tensor recovery however opens the possibility in reducing the sample complexity by exploiting several structures jointly. 1
Efficient Image and Video Colocalization with FrankWolfe Algorithm
 In ECCV
"... Abstract. In this paper, we tackle the problem of performing efficient colocalization in images and videos. Colocalization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images or videos. Building upon recent stateoftheart m ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
(Show Context)
Abstract. In this paper, we tackle the problem of performing efficient colocalization in images and videos. Colocalization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images or videos. Building upon recent stateoftheart methods, we show how we are able to naturally incorporate temporal terms and constraints for video colocalization into a quadratic programming framework. Furthermore, by leveraging the FrankWolfe algorithm (or conditional gradient), we show how our optimization formulations for both images and videos can be reduced to solving a succession of simple integer programs, leading to increased efficiency in both memory and speed. To validate our method, we present experimental results on the PASCAL VOC 2007 dataset for images and the YouTubeObjects dataset for videos, as well as a joint combination of the two. 1
The complexity of largescale convex programming under a linear optimization oracle.
, 2013
"... Abstract This paper considers a general class of iterative optimization algorithms, referred to as linearoptimizationbased convex programming (LCP) methods, for solving largescale convex programming (CP) problems. The LCP methods, covering the classic conditional gradient (CG) method (a.k.a., Fra ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
(Show Context)
Abstract This paper considers a general class of iterative optimization algorithms, referred to as linearoptimizationbased convex programming (LCP) methods, for solving largescale convex programming (CP) problems. The LCP methods, covering the classic conditional gradient (CG) method (a.k.a., FrankWolfe method) as a special case, can only solve a linear optimization subproblem at each iteration. In this paper, we first establish a series of lower complexity bounds for the LCP methods to solve different classes of CP problems, including smooth, nonsmooth and certain saddlepoint problems. We then formally establish the theoretical optimality or nearly optimality, in the largescale case, for the CG method and its variants to solve different classes of CP problems. We also introduce several new optimal LCP methods, obtained by properly modifying Nesterov's accelerated gradient method, and demonstrate their possible advantages over the classic CG for solving certain classes of largescale CP problems.
A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization
, 2013
"... Linear optimization is many times algorithmically simpler than nonlinear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are example of problems for which we have simple and efficient combinatorial algorithms, but whose nonlinear convex count ..."
Abstract

Cited by 11 (2 self)
 Add to MetaCart
(Show Context)
Linear optimization is many times algorithmically simpler than nonlinear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are example of problems for which we have simple and efficient combinatorial algorithms, but whose nonlinear convex counterpart is harder and admits significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous stateoftheart. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for nonsmooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods. Key words. frankwolfe algorithm; conditional gradient methods; linear programming; firstorder methods; online convex optimization; online learning; stochastic optimization AMS subject classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15
New Analysis and Results for the Conditional Gradient Method
, 2013
"... We present new results for the conditional gradient method (also known as the FrankWolfe method). We derive computational guarantees for arbitrary stepsize sequences, which are then applied to various stepsize rules, including simple averaging and constant stepsizes. We also develop stepsize ru ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
(Show Context)
We present new results for the conditional gradient method (also known as the FrankWolfe method). We derive computational guarantees for arbitrary stepsize sequences, which are then applied to various stepsize rules, including simple averaging and constant stepsizes. We also develop stepsize rules and computational guarantees that depend naturally on the warmstart quality of the initial (and subsequent) iterates. Our results include computational guarantees for both duality/bound gaps and the socalled Wolfe gaps. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions.
Weakly Supervised Action Labeling in Videos Under Ordering Constraints
"... Abstract. We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk ” then “sit ” then “answer phone” extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discrimin ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
(Show Context)
Abstract. We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk ” then “sit ” then “answer phone” extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals and each time interval of each video clip is assigned one action label, while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner. We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies. 1
Conditional gradient algorithms for machine learning
, 2012
"... We consider penalized formulations of machine learning problems with regularization penalty having conic structure. For several important learning problems, stateoftheart optimization approaches such as proximal gradient algorithms are difficult to apply and computationally expensive, preventing ..."
Abstract

Cited by 8 (1 self)
 Add to MetaCart
We consider penalized formulations of machine learning problems with regularization penalty having conic structure. For several important learning problems, stateoftheart optimization approaches such as proximal gradient algorithms are difficult to apply and computationally expensive, preventing from using them for largescale learning purpose. We present a conditional gradient algorithm, with theoretical guarantees, and show promising experimental results on two largescale realworld datasets.
Duality between subgradient and conditional gradient methods
 HAL00861118, VERSION 1  12 SEP 2013
, 2013
"... Given a convex optimization problem and its dual, there are many possible firstorder ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
Given a convex optimization problem and its dual, there are many possible firstorder