Revisiting Frank-Wolfe: Projection-free sparse convex optimization
In ICML, 2013

Cited by 86 (2 self)
We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
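For readers unfamiliar with the method, the following is a minimal sketch of a generic Frank-Wolfe iteration with a duality gap certificate, not the paper's own implementation. The objective f(x) = 0.5 * ||x - b||^2, the probability-simplex domain, the step size 2/(k+2), and the names `frank_wolfe_simplex` and `duality_gap` are all assumptions chosen for illustration.

```python
def gradient(x, b):
    """Gradient of the illustrative objective f(x) = 0.5 * ||x - b||^2."""
    return [xi - bi for xi, bi in zip(x, b)]


def duality_gap(x, b):
    """Gap <x - s, grad f(x)>, where s is the simplex vertex minimizing <s, grad f(x)>."""
    grad = gradient(x, b)
    return sum(xi * gi for xi, gi in zip(x, grad)) - min(grad)


def frank_wolfe_simplex(b, iterations=200):
    """Run Frank-Wolfe on the probability simplex (assumed domain)."""
    n = len(b)
    x = [0.0] * n
    x[0] = 1.0  # start at a vertex of the simplex
    for k in range(iterations):
        grad = gradient(x, b)
        # Linear subproblem: over the simplex the minimizer is the vertex e_i
        # whose gradient coordinate is smallest.
        i = min(range(n), key=lambda j: grad[j])
        gamma = 2.0 / (k + 2.0)  # standard open-loop step size
        # Convex combination of the iterate and the vertex: no projection needed.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i] += gamma
    return x


b = [0.2, 0.5, 0.3, 0.0]  # hypothetical target; it lies in the simplex
x = frank_wolfe_simplex(b)
objective = 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
gap = duality_gap(x, b)
```

Each step moves toward a single vertex, so after k steps the iterate has at most k + 1 nonzero coordinates, and for a convex objective the gap upper-bounds the primal error without knowledge of the optimum.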
Large-scale convex minimization with a low-rank constraint
In Proceedings of the 28th International Conference on Machine Learning, 2011

Cited by 40 (1 self)
We address the problem of minimizing a convex function over the space of large matrices with low rank. While this optimization problem is hard in general, we propose an efficient greedy algorithm and derive its formal approximation guarantees. Each iteration of the algorithm involves (approximately) finding the left and right singular vectors corresponding to the largest singular value of a certain matrix, which can be calculated in linear time. This leads to an algorithm which can scale to large matrices arising in several applications such as matrix completion for collaborative filtering and robust low-rank matrix approximation.
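The per-iteration subproblem mentioned above, approximating the leading singular vectors, can be illustrated with plain power iteration. This is a generic sketch and not the authors' code; the function name `top_singular_pair`, the dense list-of-lists matrix, and the fixed iteration count are assumptions, and the input is assumed to be nonzero.

```python
import math


def top_singular_pair(A, iterations=100):
    """Approximate the largest singular value of A and its singular vectors.

    Uses power iteration on A^T A. Assumes A is a nonzero dense matrix given
    as a list of rows; convergence depends on the gap between the two largest
    singular values.
    """
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n  # arbitrary unit-norm starting vector
    for _ in range(iterations):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = A^T u
        norm_w = math.sqrt(sum(wj * wj for wj in w))
        v = [wj / norm_w for wj in w]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(ui * ui for ui in u))  # ||A v|| estimates sigma_max
    u = [ui / sigma for ui in u]
    return sigma, u, v


A = [[3.0, 0.0], [0.0, 1.0]]  # hypothetical example matrix
sigma, u, v = top_singular_pair(A)
```

Each pass costs two matrix-vector products, so for a sparse matrix stored appropriately the work per pass is proportional to the number of nonzero entries.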
Lifted coordinate descent for learning with trace-norm regularization
AISTATS, 2012

Cited by 29 (5 self)
We consider the minimization of a smooth loss with trace-norm regularization, which is a natural objective in multi-class and multi-task learning. Even though the problem is convex, existing approaches rely on optimizing a non-convex variational bound, which is not guaranteed to converge, or repeatedly perform singular-value decomposition, which prevents scaling beyond moderate matrix sizes. We lift the non-smooth convex problem into an infinitely dimensional smooth problem and apply coordinate descent to solve it. We prove that our approach converges to the optimum, and is competitive with or outperforms the state of the art.
Projection-free Online Learning

Cited by 26 (5 self)
The computational bottleneck in applying online learning to massive data sets is usually the projection step. We present efficient online learning algorithms that eschew projections in favor of much more efficient linear optimization steps using the Frank-Wolfe technique. We obtain a range of regret bounds for online convex optimization, with better bounds for specific cases such as stochastic online smooth convex optimization. Besides the computational advantage, other desirable features of our algorithms are that they are parameter-free in the stochastic case and produce sparse decisions. We apply our algorithms to computationally intensive applications of collaborative filtering, and show the theoretical improvements to be clearly visible on standard datasets.
Conditional gradient algorithms for norm-regularized smooth convex optimization
2013

Cited by 23 (6 self)
Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖ · ‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimize over the cone the sum of f and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, (b) ‖ · ‖ is “too complicated” to allow for computationally cheap Bregman projections required in the first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of K and the unit ‖ · ‖-ball. Motivating examples are given by the nuclear norm with K being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates and outline some applications.
Understanding alternating minimization for matrix completion
In Symposium on Foundations of Computer Science, 2014

Cited by 16 (1 self)
Alternating minimization is a widely used and empirically successful heuristic for matrix completion and related low-rank optimization problems. Theoretical guarantees for alternating minimization have been hard to come by and are still poorly understood. This is in part because the heuristic is iterative and non-convex in nature. We give a new algorithm based on alternating minimization that provably recovers an unknown low-rank matrix from a random subsample of its entries under a standard incoherence assumption. Our results reduce the sample size requirements of the alternating minimization approach by at least a quartic factor in the rank and the condition number of the unknown matrix. These improvements apply even if the matrix is only close to low-rank in the Frobenius norm. Our algorithm runs in nearly linear time in the dimension of the matrix and, in a broad range of parameters, gives the strongest sample bounds among all subquadratic time algorithms that we are aware of. Underlying our work is a new robust convergence analysis of the well-known Power Method for computing the dominant singular vectors of a matrix. This viewpoint leads to a conceptually simple understanding of alternating minimization. In addition, we contribute a new technique for controlling the coherence of intermediate solutions arising in iterative algorithms based on a smoothed analysis of the QR factorization. These techniques may be of interest beyond their application here.
The complexity of large-scale convex programming under a linear optimization oracle
2013

Cited by 11 (1 self)
This paper considers a general class of iterative optimization algorithms, referred to as linear-optimization-based convex programming (LCP) methods, for solving large-scale convex programming (CP) problems. The LCP methods, covering the classic conditional gradient (CG) method (a.k.a. Frank-Wolfe method) as a special case, can only solve a linear optimization subproblem at each iteration. In this paper, we first establish a series of lower complexity bounds for the LCP methods to solve different classes of CP problems, including smooth, nonsmooth and certain saddle-point problems. We then formally establish the theoretical optimality or near-optimality, in the large-scale case, for the CG method and its variants to solve different classes of CP problems. We also introduce several new optimal LCP methods, obtained by properly modifying Nesterov's accelerated gradient method, and demonstrate their possible advantages over the classic CG for solving certain classes of large-scale CP problems.
A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization
2013

Cited by 11 (2 self)
Linear optimization is many times algorithmically simpler than nonlinear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are examples of problems for which we have simple and efficient combinatorial algorithms, but whose nonlinear convex counterpart is harder and admits significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state of the art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for nonsmooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods.
Key words. Frank-Wolfe algorithm; conditional gradient methods; linear programming; first-order methods; online convex optimization; online learning; stochastic optimization
AMS subject classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15
Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization

Cited by 10 (0 self)
We describe novel subgradient methods for a broad class of matrix optimization problems involving nuclear norm regularization. Unlike existing approaches, our method executes very cheap iterations by combining low-rank stochastic subgradients with efficient incremental SVD updates, made possible by highly optimized and parallelizable dense linear algebra operations on small matrices. Our practical algorithms always maintain a low-rank factorization of iterates that can be conveniently held in memory and efficiently multiplied to generate predictions in matrix completion settings. Empirical comparisons confirm that our approach is highly competitive with several recently proposed state-of-the-art solvers for such problems.
Convex Collective Matrix Factorization

Cited by 10 (2 self)
In many applications, multiple interlinked sources of data are available and they cannot be represented by a single adjacency matrix, to which large-scale factorization methods could be applied. Collective matrix factorization is a simple yet powerful approach to jointly factorize multiple matrices, each of which represents a relation between two entity types. Existing algorithms to estimate parameters of collective matrix factorization models are based on non-convex formulations of the problem; in this paper, a convex formulation of this approach is proposed. This enables the derivation of large-scale algorithms to estimate the parameters, including an iterative eigenvalue thresholding algorithm. Numerical experiments illustrate the benefits of this new approach.