Results 1–10 of 45
Revisiting Frank-Wolfe: Projection-free sparse convex optimization
In ICML, 2013
Cited by 76 (2 self)
We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
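The scheme the abstract describes can be sketched in a few lines: each iteration solves a linear subproblem over the domain, and the inner product between the gradient and the step direction gives a computable duality gap certificate. The following is a minimal illustration on the ℓ1-ball (function and oracle names are ours, not the paper's API):

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, iters=200, tol=1e-8):
    """Generic Frank-Wolfe (conditional gradient) loop with a
    duality-gap certificate; gap >= f(x) - min over the domain."""
    x = x0.copy()
    gap = np.inf
    for t in range(iters):
        g = grad_f(x)
        s = lmo(g)                        # linear subproblem: argmin_{s in D} <g, s>
        gap = g @ (x - s)                 # duality gap certificate (always >= 0)
        if gap < tol:
            break
        x = x + (2.0 / (t + 2)) * (s - x) # standard 2/(t+2) step size
    return x, gap

# Example: least squares over the l1-ball of radius r (atoms are ±r*e_i).
rng = np.random.default_rng(0)
A, b, r = rng.standard_normal((30, 10)), rng.standard_normal(30), 2.0
grad = lambda x: 2 * A.T @ (A @ x - b)

def l1_lmo(g):
    # a vertex of the l1-ball minimizing the linear form <g, s>
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -r * np.sign(g[i])
    return s

x, gap = frank_wolfe(grad, l1_lmo, np.zeros(10))
```

Note that the iterate is always a convex combination of at most t+1 atoms after t steps, which is the sparsity property the abstract's optimality result refers to.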
Large-scale convex minimization with a low-rank constraint
In Proceedings of the 28th International Conference on Machine Learning, 2011
Cited by 36 (1 self)
We address the problem of minimizing a convex function over the space of large matrices with low rank. While this optimization problem is hard in general, we propose an efficient greedy algorithm and derive its formal approximation guarantees. Each iteration of the algorithm involves (approximately) finding the left and right singular vectors corresponding to the largest singular value of a certain matrix, which can be calculated in linear time. This leads to an algorithm which can scale to large matrices arising in several applications such as matrix completion for collaborative filtering and robust low-rank matrix approximation.
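The linear-time primitive the abstract refers to is a power iteration for the leading singular pair, and each greedy step adds the resulting rank-1 atom. A toy sketch on a masked least-squares (matrix completion) objective, with an exact line search along the rank-1 direction (this is an illustration of the greedy rank-1 idea, not the paper's exact algorithm):

```python
import numpy as np

def top_singular_pair(M, iters=100):
    """Approximate leading singular vectors of M by power iteration;
    each iteration costs O(nnz(M))."""
    v = np.ones(M.shape[1]) / np.sqrt(M.shape[1])
    for _ in range(iters):
        u = M @ v
        u /= np.linalg.norm(u)
        v = M.T @ u
        v /= np.linalg.norm(v)
    return u, v

rng = np.random.default_rng(1)
target = rng.standard_normal((20, 4)) @ rng.standard_normal((4, 15))  # rank-4 data
mask = rng.random((20, 15)) < 0.5                                     # observed entries
X = np.zeros_like(target)
for _ in range(50):
    G = mask * (X - target)        # gradient of 0.5 * ||mask*(X - target)||_F^2
    u, v = top_singular_pair(-G)   # best rank-1 atom of the negative gradient
    D = np.outer(u, v)
    step = -(G * D).sum() / ((mask * D) ** 2).sum()   # exact line search
    X = X + step * D
fit = np.sqrt(((mask * (X - target)) ** 2).sum())
init = np.sqrt(((mask * target) ** 2).sum())
```

After 50 rank-1 updates the residual on the observed entries is far below its starting value, mirroring the greedy progress the abstract's approximation guarantees quantify.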
Lifted coordinate descent for learning with trace-norm regularization
AISTATS, 2012
Cited by 31 (5 self)
We consider the minimization of a smooth loss with trace-norm regularization, which is a natural objective in multi-class and multi-task learning. Even though the problem is convex, existing approaches rely on optimizing a non-convex variational bound, which is not guaranteed to converge, or repeatedly perform singular-value decomposition, which prevents scaling beyond moderate matrix sizes. We lift the non-smooth convex problem into an infinite-dimensional smooth problem and apply coordinate descent to solve it. We prove that our approach converges to the optimum, and is competitive with or outperforms the state of the art.
Conditional gradient algorithms for norm-regularized smooth convex optimization
2013
Cited by 21 (6 self)
Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖·‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimize over the cone the sum of f and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior-point algorithms, and (b) ‖·‖ is “too complicated” to allow for the computationally cheap Bregman projections required by first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of K and the unit ‖·‖-ball. Motivating examples are given by the nuclear norm with K being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates, and outline some applications.
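For the nuclear-norm example the abstract highlights, the "easy linear subproblem" is concrete: the unit nuclear-norm ball has rank-1 extreme points u vᵀ, so minimizing a linear form over it only needs the leading singular pair of the gradient. A small sketch (a full SVD is used here for clarity; at scale one would use an iterative method):

```python
import numpy as np

def nuclear_lmo(G):
    """argmin over {S : ||S||_* <= 1} of <G, S>.
    The minimizer is -u1 @ v1.T for the leading singular pair of G,
    and the attained value is -sigma_max(G)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return -np.outer(U[:, 0], Vt[0])

G = np.random.default_rng(2).standard_normal((6, 5))
S = nuclear_lmo(G)
```

This is exactly the oracle a Conditional Gradient method would call once per iteration in setting (2) of the abstract, replacing an expensive nuclear-norm projection.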
Projection-free Online Learning
Cited by 21 (4 self)
The computational bottleneck in applying online learning to massive data sets is usually the projection step. We present efficient online learning algorithms that eschew projections in favor of much more efficient linear optimization steps using the Frank-Wolfe technique. We obtain a range of regret bounds for online convex optimization, with better bounds for specific cases such as stochastic online smooth convex optimization. Besides the computational advantage, other desirable features of our algorithms are that they are parameter-free in the stochastic case and produce sparse decisions. We apply our algorithms to computationally intensive applications of collaborative filtering, and show the theoretical improvements to be clearly visible on standard datasets.
Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization
Cited by 9 (0 self)
We describe novel subgradient methods for a broad class of matrix optimization problems involving nuclear norm regularization. Unlike existing approaches, our method executes very cheap iterations by combining low-rank stochastic subgradients with efficient incremental SVD updates, made possible by highly optimized and parallelizable dense linear algebra operations on small matrices. Our practical algorithms always maintain a low-rank factorization of iterates that can be conveniently held in memory and efficiently multiplied to generate predictions in matrix completion settings. Empirical comparisons confirm that our approach is highly competitive with several recently proposed state-of-the-art solvers for such problems.
Convex Collective Matrix Factorization
Cited by 8 (2 self)
In many applications, multiple interlinked sources of data are available, and they cannot be represented by a single adjacency matrix to which large-scale factorization methods could be applied. Collective matrix factorization is a simple yet powerful approach to jointly factorize multiple matrices, each of which represents a relation between two entity types. Existing algorithms to estimate the parameters of collective matrix factorization models are based on non-convex formulations of the problem; in this paper, a convex formulation of this approach is proposed. This enables the derivation of large-scale algorithms to estimate the parameters, including an iterative eigenvalue thresholding algorithm. Numerical experiments illustrate the benefits of this new approach.
Scalable Matrix-valued Kernel Learning for High-dimensional Nonlinear Multivariate Regression and Granger Causality
Cited by 7 (2 self)
We propose a general matrix-valued multiple kernel learning framework for high-dimensional nonlinear multivariate regression problems. This framework allows a broad class of mixed-norm regularizers, including those that induce sparsity, to be imposed on a dictionary of vector-valued Reproducing Kernel Hilbert Spaces. We develop a highly scalable and eigendecomposition-free algorithm that orchestrates two inexact solvers for simultaneously learning both the input and output components of separable matrix-valued kernels. As a key application enabled by our framework, we show how high-dimensional causal inference tasks can be naturally cast as sparse function estimation problems, leading to novel nonlinear extensions of a class of Graphical Granger Causality techniques. Our algorithmic developments and extensive empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds.
Understanding alternating minimization for matrix completion
In Symposium on Foundations of Computer Science, 2014
Cited by 7 (1 self)
Alternating minimization is a widely used and empirically successful heuristic for matrix completion and related low-rank optimization problems. Theoretical guarantees for alternating minimization have been hard to come by and are still poorly understood. This is in part because the heuristic is iterative and non-convex in nature. We give a new algorithm based on alternating minimization that provably recovers an unknown low-rank matrix from a random subsample of its entries under a standard incoherence assumption. Our results reduce the sample size requirements of the alternating minimization approach by at least a quartic factor in the rank and the condition number of the unknown matrix. These improvements apply even if the matrix is only close to low-rank in the Frobenius norm. Our algorithm runs in nearly linear time in the dimension of the matrix and, in a broad range of parameters, gives the strongest sample bounds among all subquadratic time algorithms that we are aware of. Underlying our work is a new robust convergence analysis of the well-known Power Method for computing the dominant singular vectors of a matrix. This viewpoint leads to a conceptually simple understanding of alternating minimization. In addition, we contribute a new technique for controlling the coherence of intermediate solutions arising in iterative algorithms based on a smoothed analysis of the QR factorization. These techniques may be of interest beyond their application here.
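The basic heuristic the abstract analyzes is easy to state: with one factor fixed, each row of the other factor solves a small least-squares problem over its observed entries. A bare-bones sketch on synthetic exact-rank data (illustrative only; it uses random initialization and omits the paper's SVD-based start and smoothed-QR coherence control):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 30, 25, 3
M = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))  # exact rank-k matrix
mask = rng.random((n, m)) < 0.6                                # observed entries

U = rng.standard_normal((n, k))
V = rng.standard_normal((m, k))
for _ in range(30):
    for i in range(n):                 # fix V, solve for each row of U
        obs = mask[i]
        U[i] = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)[0]
    for j in range(m):                 # fix U, solve for each row of V
        obs = mask[:, j]
        V[j] = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)[0]
err = np.abs(mask * (U @ V.T - M)).max()   # residual on observed entries
```

Each half-step is a convex least-squares solve, but the joint problem is non-convex, which is why the global recovery guarantee the abstract describes is non-trivial.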
Stochastic Gradient Descent with Only One Projection
Cited by 6 (4 self)
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., the positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that with high probability, the proposed algorithms achieve an O(1/√T) convergence rate for general convex optimization, and an O(ln T/T) rate for strongly convex optimization under mild conditions on the domain and the objective function.
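A toy rendering of the "only one projection" idea on a ball-constrained problem: run unprojected stochastic gradient steps, average the iterates, and project only the final average. This is a simplified sketch of the spirit of the approach, not the paper's algorithm (which handles general convex domains via a penalty on constraint violation):

```python
import numpy as np

rng = np.random.default_rng(4)
d, T, R = 5, 2000, 1.0
w_star = rng.standard_normal(d)      # ground-truth linear model
w = np.zeros(d)
avg = np.zeros(d)
for t in range(1, T + 1):
    x = rng.standard_normal(d)
    y = w_star @ x + 0.1 * rng.standard_normal()
    g = (w @ x - y) * x              # stochastic gradient of 0.5*(w.x - y)^2
    w -= (0.1 / np.sqrt(t)) * g      # no projection inside the loop
    avg += (w - avg) / t             # running average of iterates
# the single projection, onto the Euclidean ball of radius R
norm = np.linalg.norm(avg)
w_out = avg if norm <= R else (R / norm) * avg
```

The point of the construction is that feasibility is enforced exactly once, so the per-iteration cost is a plain gradient step even when projecting onto the domain is expensive.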