Results 1–6 of 6
Duality between subgradient and conditional gradient methods
HAL-00861118, version 1, 12 Sep 2013
, 2013
Abstract

Cited by 8 (2 self)
Given a convex optimization problem and its dual, there are many possible first-order …
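The duality named in the title can be made concrete: iterates of the conditional gradient (Frank-Wolfe) method on one problem correspond to subgradient steps on its Fenchel dual. As an illustrative sketch only (the ℓ1-ball constraint, the oracle, and the step size are assumptions for the example, not the paper's construction):

```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius=1.0, iters=200):
    """Conditional gradient (Frank-Wolfe) over the l1 ball of the given radius."""
    x = x0.copy()
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle over the l1 ball: a signed vertex
        # along the coordinate where the gradient is largest in magnitude.
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (k + 2.0)          # standard open-loop step size
        x = (1 - gamma) * x + gamma * s  # move toward the oracle vertex
    return x
```

On min ½‖x − b‖² over the unit ℓ1 ball with b = (2, 0), the iterates land on the vertex (1, 0), which is the projection of b onto the ball.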
Structured Low-Rank Matrix Factorization: Optimality, Algorithm, and Applications to Image Processing
Abstract

Cited by 5 (0 self)
Recently, convex solutions to low-rank matrix factorization problems have received increasing attention in machine learning. However, in many applications the data can display other structures beyond simply being low-rank. For example, images and videos present complex spatio-temporal structures, which are largely ignored by current low-rank methods. In this paper we explore a matrix factorization technique suitable for large datasets that captures additional structure in the factors by using a projective tensor norm, which includes classical image regularizers such as total variation and the nuclear norm as particular cases. Although the resulting optimization problem is not convex, we show that under certain conditions on the factors, any local minimizer for the factors yields a global minimizer for their product. Examples in biomedical video segmentation and hyperspectral compressed recovery show the advantages of our approach on high-dimensional datasets.
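The optimality claim above, that local minimizers of the factors yield a global minimizer of their product, echoes the classical variational (factored) form of nuclear-norm regularization. A minimal sketch of that form follows; the rank, step size, regularization weight, and plain gradient descent are all illustrative assumptions, and this is not the paper's projective-tensor-norm algorithm:

```python
import numpy as np

def factorize(X, r, lam=0.1, lr=0.01, iters=2000, seed=0):
    """Gradient descent on 0.5*||X - U V^T||_F^2 + (lam/2)*(||U||_F^2 + ||V||_F^2),
    the variational form whose optimal value matches nuclear-norm regularization
    when r is large enough."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = 0.1 * rng.standard_normal((m, r))  # small random init to escape the saddle at 0
    V = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        R = U @ V.T - X                    # residual of the current factorization
        # Simultaneous gradient step on both factors.
        U, V = U - lr * (R @ V + lam * U), V - lr * (R.T @ U + lam * V)
    return U, V
```

On a small rank-1 matrix the product U Vᵀ recovers X up to the shrinkage induced by the regularizer.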
Tight convex relaxations for sparse matrix factorization
, 2014
Abstract

Cited by 3 (0 self)
Based on a new atomic norm, we propose a new convex formulation for sparse matrix factorization problems in which the number of nonzero elements of the factors is assumed fixed and known. The formulation counts sparse PCA with multiple factors, subspace clustering and low-rank sparse bilinear regression as potential applications. We compute slow rates and an upper bound on the statistical dimension (Amelunxen et al., 2013) of the suggested norm for rank-1 matrices, showing that its statistical dimension is an order of magnitude smaller than those of the usual ℓ1-norm, trace norm and their combinations. Even though our convex formulation is in theory hard and does not lead to provably polynomial-time algorithmic schemes, we propose an active set algorithm leveraging the structure of the convex problem to solve it and show promising numerical results.
Editor: U.N.Known
Abstract
We extend the well-known BFGS quasi-Newton method and its limited-memory variant L-BFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We apply the resulting subLBFGS algorithm to L2-regularized risk minimization with the binary hinge loss. To extend our algorithm to the multiclass and multilabel settings we develop a new, efficient, exact line search algorithm. We prove its worst-case time complexity bounds, and show that it can also extend a recently developed bundle method to the multiclass and multilabel settings. We also apply the direction-finding component of our algorithm to L1-regularized risk minimization with logistic loss. In all these contexts our methods perform comparably to or better than specialized state-of-the-art solvers on a number of publicly available datasets. Open source software implementing our algorithms is freely available for download.
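For context, the L2-regularized binary hinge loss that subLBFGS is applied to can also be minimized by a plain subgradient method, the baseline such quasi-Newton schemes aim to beat. A hedged sketch (Pegasos-style 1/(λt) step size; the toy data, λ, and iteration count are assumptions, and this is not the authors' algorithm):

```python
import numpy as np

def hinge_subgrad(X, y, lam=0.01, iters=500):
    """Plain subgradient descent on the L2-regularized binary hinge loss
    min_w (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>)."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        margins = y * (X @ w)
        active = margins < 1                 # points with nonzero hinge loss
        # One valid subgradient of the objective at w.
        g = lam * w - (X[active].T @ y[active]) / n
        w -= g / (lam * t)                   # decreasing 1/(lam*t) step size
    return w
```

On a small linearly separable dataset the returned w classifies every training point correctly.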
• Assumptions – f: R^n → R Lipschitz-continuous ⇒ f* has compact support C
, 2013
Abstract
Wolfe’s universal algorithm (www.di.ens.fr/~fbach/wolfe_anonymous.pdf)
Conditional gradients everywhere
• Conditional gradient and subgradient method
  – Fenchel duality
  – Generalized conditional gradient and mirror descent
• Conditional gradient and greedy algorithms
  – Relationship with basis pursuit, matching pursuit
• Conditional gradient and herding
  – Properties of conditional gradient iterates
  – Relationships with sampling
Composite optimization problems: min_{x ∈ R^p} h(x) + f(Ax)
FAST GRADIENT ALGORITHMS FOR STRUCTURED SPARSITY (University of Alberta)
Abstract
Many machine learning problems can be formulated under the composite minimization framework, which usually involves a smooth loss function and a nonsmooth regularizer. Many algorithms have thus been proposed, and the main focus has been on first-order gradient methods, due to their applicability in very large scale application domains. A common requirement of many of these popular gradient algorithms is access to the proximal map of the regularizer, which unfortunately may not be easily computable in scenarios such as structured sparsity. In this thesis we first identify conditions under which the proximal map of a sum of functions is simply the composition of the proximal …
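The thesis's opening question, when the proximal map of a sum equals the composition of the individual proximal maps, has a well-known concrete instance: the elastic-net penalty, ℓ1 plus a squared ℓ2 term. A minimal sketch of that special case (the constants are illustrative; the thesis's general conditions are broader than this example):

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal map of lam*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_elastic_net(v, lam, mu):
    """Prox of f(x) = lam*||x||_1 + (mu/2)*||x||^2, computed as a
    composition: the prox of the quadratic (scaling by 1/(1+mu))
    applied after the prox of the l1 term."""
    return soft_threshold(v, lam) / (1.0 + mu)
```

For v = 1.5, λ = 0.5, μ = 1 this returns soft(1.5, 0.5)/2 = 0.5, which matches direct minimization of λ|x| + (μ/2)x² + ½(x − v)².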