Results 1-10 of 74
Stochastic block-coordinate Frank-Wolfe optimization for structural SVMs. arXiv preprint arXiv:1207.4747
, 2012
Cited by 52 (4 self)
We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal step-size and yields a computable duality gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers.
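The block-coordinate update described in this abstract can be illustrated on a toy problem. The sketch below is not the structural SVM dual: it assumes a simple quadratic objective f(x) = 0.5 * ||sum_i x_i - target||^2 with each block x_i constrained to the probability simplex, chosen only because the line search and the duality gap then have closed forms. The function name `block_coordinate_fw` and all parameters are hypothetical.

```python
import random


def block_coordinate_fw(target, n_blocks, num_iters=2000, seed=0):
    """Toy block-coordinate Frank-Wolfe over a product of simplices.

    Minimizes f(x) = 0.5 * ||sum_i x_i - target||^2, where every block x_i
    lies in the probability simplex. The objective is an illustrative
    assumption, not the structural SVM dual.
    """
    rng = random.Random(seed)
    dim = len(target)
    blocks = [[1.0 / dim] * dim for _ in range(n_blocks)]
    # Residual r = sum_i x_i - target; it is also the gradient w.r.t. each block.
    r = [sum(b[c] for b in blocks) - target[c] for c in range(dim)]

    for _ in range(num_iters):
        i = rng.randrange(n_blocks)              # pick one block at random
        j = min(range(dim), key=lambda c: r[c])  # block LMO: best simplex vertex
        direction = [(1.0 if c == j else 0.0) - blocks[i][c] for c in range(dim)]
        block_gap = -sum(rc * dc for rc, dc in zip(r, direction))
        sq_norm = sum(dc * dc for dc in direction)
        if block_gap <= 0.0 or sq_norm == 0.0:
            continue                             # no descent available in this block
        # Exact line search for the quadratic, clipped to stay feasible.
        step = min(1.0, block_gap / sq_norm)
        blocks[i] = [xc + step * dc for xc, dc in zip(blocks[i], direction)]
        r = [rc + step * dc for rc, dc in zip(r, direction)]

    objective = 0.5 * sum(rc * rc for rc in r)
    # Over a product domain the Frank-Wolfe gap is the sum of the block gaps.
    r_min = min(r)
    gap = sum(sum(rc * xc for rc, xc in zip(r, b)) - r_min for b in blocks)
    return blocks, objective, gap
```

Each iteration touches a single block and calls one linear minimization oracle, which is where the lower iteration cost comes from. The returned gap upper-bounds the suboptimality of the toy objective, so it can serve as a stopping criterion.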
Conditional gradient algorithms for norm-regularized smooth convex optimization
, 2013
Cited by 21 (6 self)
Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖ · ‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimize over the cone the sum of f and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, and (b) ‖ · ‖ is “too complicated” to allow for the computationally cheap Bregman projections required in first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of K and the unit ‖ · ‖-ball. Motivating examples are given by the nuclear norm with K being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates and outline some applications.
Square deal: Lower bounds and improved relaxations for tensor recovery
CoRR
Cited by 21 (0 self)
Recovering a low-rank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a K-way tensor of length n and Tucker rank r from Gaussian measurements requires $\Omega(rn^{K-1})$ observations. In contrast, a certain (intractable) nonconvex formulation needs only $O(r^K + nrK)$ observations. We introduce a very simple, new convex relaxation, which partially bridges this gap. Our new formulation succeeds with $O(r^{\lfloor K/2 \rfloor} n^{\lceil K/2 \rceil})$ observations. While these results pertain to Gaussian measurements, simulations strongly suggest that the new norm also outperforms the sum of nuclear norms for tensor completion from a random subset of entries. Our lower bound for the sum-of-nuclear-norms model follows from a new result on recovering signals with multiple sparse structures (e.g. sparse, low rank), which perhaps surprisingly demonstrates the significant suboptimality of the commonly used recovery approach via minimizing the sum of individual sparsity-inducing norms (e.g. $\ell_1$, nuclear norm). Our new formulation for low-rank tensor recovery, however, opens the possibility of reducing the sample complexity by exploiting several structures jointly.
Efficient Image and Video Co-localization with Frank-Wolfe Algorithm
In ECCV
Cited by 12 (0 self)
Abstract. In this paper, we tackle the problem of performing efficient co-localization in images and videos. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images or videos. Building upon recent state-of-the-art methods, we show how we are able to naturally incorporate temporal terms and constraints for video co-localization into a quadratic programming framework. Furthermore, by leveraging the Frank-Wolfe algorithm (or conditional gradient), we show how our optimization formulations for both images and videos can be reduced to solving a succession of simple integer programs, leading to increased efficiency in both memory and speed. To validate our method, we present experimental results on the PASCAL VOC 2007 dataset for images and the YouTube-Objects dataset for videos, as well as a joint combination of the two.
New analysis and results for the conditional gradient method
, 2013
Cited by 9 (0 self)
We present new results for the conditional gradient method (also known as the Frank-Wolfe method). We derive computational guarantees for arbitrary step-size sequences, which are then applied to various step-size rules, including simple averaging and constant step-sizes. We also develop step-size rules and complexity bounds that depend naturally on the warm-start quality of the initial (and subsequent) iterates. Our results include complexity bounds for optimality bound gap and the Wolfe gap. Lastly, we present complexity bounds in the presence of approximate computation of gradients and/or linear optimization subproblem solutions. The results herein are mostly a condensation of the paper [1].
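For readers unfamiliar with the method, a minimal sketch of the conditional gradient iteration and the Wolfe gap follows. It assumes the feasible set is the probability simplex (so the linear optimization subproblem reduces to picking the coordinate with the smallest gradient entry) and uses the common pre-set step-size 2/(k+2); neither choice is taken from the paper, and the names `wolfe_gap` and `conditional_gradient` are hypothetical.

```python
def wolfe_gap(g, x):
    """Wolfe gap <g, x> - min_s <g, s>, with s ranging over the simplex."""
    return sum(gc * xc for gc, xc in zip(g, x)) - min(g)


def conditional_gradient(grad, x0, num_iters=2000):
    """Frank-Wolfe on the probability simplex with step-size 2 / (k + 2).

    `grad` maps a point (list of floats) to the gradient at that point.
    Returns the final iterate and the smallest Wolfe gap seen so far.
    """
    x = list(x0)
    best_gap = float("inf")
    for k in range(num_iters):
        g = grad(x)
        best_gap = min(best_gap, wolfe_gap(g, x))
        # Linear optimization subproblem: the minimizing vertex e_j.
        j = min(range(len(x)), key=lambda c: g[c])
        step = 2.0 / (k + 2.0)
        # Convex combination of the current point and the vertex e_j.
        x = [(1.0 - step) * xc for xc in x]
        x[j] += step
    return x, best_gap
```

For a convex objective the Wolfe gap upper-bounds the optimality gap at the same iterate, which is why it is useful as a computable stopping criterion. Other step-size rules can be tried by changing the single line that sets `step`.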
Weakly Supervised Action Labeling in Videos Under Ordering Constraints
Cited by 9 (1 self)
Abstract. We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk” then “sit” then “answer phone”, extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals and each time interval of each video clip is assigned one action label, while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner. We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies.
Duality between subgradient and conditional gradient methods
hal-00861118, version 1 - 12 Sep 2013
, 2013
Convex relaxations of structured matrix factorizations
, 2013
Cited by 7 (3 self)
We consider the factorization of a rectangular matrix X into a positive linear combination of rank-one factors of the form uv⊤, where u and v belong to certain sets U and V that may encode specific structures regarding the factors, such as positivity or sparsity. In this paper, we show that computing the optimal decomposition is equivalent to computing a certain gauge function of X and we provide a detailed analysis of these gauge functions and their polars. Since these gauge functions are typically hard to compute, we present semidefinite relaxations and several algorithms that may recover approximate decompositions with approximation guarantees. We illustrate our results with simulations on finding decompositions with elements in {0,1}. As side contributions, we present a detailed analysis of variational quadratic representations of norms as well as a new iterative basis pursuit algorithm that can deal with inexact first-order oracles.
Intersecting singularities for multi-structured estimation
Cited by 6 (1 self)
We address the problem of designing a convex nonsmooth regularizer encouraging multiple structural effects simultaneously. Focusing on the inference of sparse and low-rank matrices, we suggest a new complexity index and a convex penalty approximating it. The new penalty term can be written as the trace norm of a linear function of the matrix. By analyzing theoretical properties of this family of regularizers we come up with oracle inequalities and compressed sensing results ensuring the quality of our regularized estimator. We also provide algorithms and supporting numerical experiments.
A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization
, 1301
Cited by 5 (1 self)
Abstract. Linear optimization is many times algorithmically simpler than nonlinear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are examples of problems for which we have simple and efficient combinatorial algorithms, but whose nonlinear convex counterpart is harder and admits significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state of the art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for nonsmooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods.
Key words. Frank-Wolfe algorithm; conditional gradient methods; linear programming; first-order methods; online convex optimization; online learning; stochastic optimization
AMS subject classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15