Results 1 - 10 of 47
Revisiting Frank-Wolfe: Projection-free sparse convex optimization
In ICML, 2013
"... We provide stronger and more general primaldual convergence results for FrankWolfetype algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approxi ..."
Abstract

Cited by 86 (2 self)
We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration consists of a low-rank update, and discuss the broad application areas of this approach.
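To illustrate the duality gap certificate and the sparsity of Frank-Wolfe iterates, here is a minimal sketch over the probability simplex, where the linear subproblem simply picks the vertex with the smallest gradient coordinate. The function name, the 2/(t+2) step size and the quadratic test objective are assumptions for illustration, not the paper's code.

```python
def frank_wolfe_simplex(grad, x0, max_iter=1000, tol=1e-6):
    """Frank-Wolfe (conditional gradient) over the probability simplex.

    Returns the final iterate and its duality gap g(x) = <grad f(x), x - s>,
    which upper-bounds f(x) - f(x*) for convex f.
    """
    x = list(x0)

    def lmo_and_gap(x):
        g = grad(x)
        # Linear minimization oracle: best simplex vertex is e_i, i = argmin_i g_i.
        i = min(range(len(g)), key=g.__getitem__)
        gap = sum(gj * xj for gj, xj in zip(g, x)) - g[i]
        return i, gap

    for t in range(max_iter):
        i, gap = lmo_and_gap(x)
        if gap <= tol:
            return x, gap
        gamma = 2.0 / (t + 2)  # standard step size, no line search
        # Convex combination with vertex e_i: at most one new nonzero per step.
        x = [(1 - gamma) * xj for xj in x]
        x[i] += gamma
    _, gap = lmo_and_gap(x)  # certificate for the iterate actually returned
    return x, gap


b = [0.2, 0.3, 0.5]  # hypothetical target inside the simplex, so f* = 0
x, gap = frank_wolfe_simplex(lambda x: [2 * (xi - bi) for xi, bi in zip(x, b)],
                             [1.0, 0.0, 0.0], max_iter=2000)
```

After t steps the iterate is a convex combination of at most t + 1 vertices, which is the sparsity the abstract refers to; the returned `gap` is a computable bound on the suboptimality.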
Structured Sparsity through Convex Optimization
2012
"... Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1norm. In this paper, we consider sit ..."
Abstract

Cited by 47 (6 self)
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the ℓ1-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the ℓ1-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of nonlinear variable selection.
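For the simplest structured norm, the sum of ℓ2-norms over disjoint groups, the proximal operator has a closed form (block soft-thresholding), which shows how whole groups are zeroed at once. This is a minimal sketch assuming disjoint groups only; overlapping groups, also covered by the abstract, need other algorithms. The function name is hypothetical.

```python
import math


def prox_group_lasso(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 for DISJOINT groups.

    Each group is shrunk toward zero by block soft-thresholding; a group whose
    norm is at most lam is set exactly to zero (group-level sparsity).
    Coordinates not listed in any group are left unpenalized.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[j] ** 2 for j in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * w[j]
    return out


w = [3.0, 4.0, 0.1, 0.1]
print(prox_group_lasso(w, [[0, 1], [2, 3]], lam=1.0))  # first group scaled by 0.8, second zeroed
```

With singleton groups the same operator reduces to ordinary soft-thresholding, i.e. the ℓ1-norm case.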
On the Equivalence between Herding and Conditional Gradient Algorithms
2012
"... We show that the herding procedure of Welling (2009b) takes exactly the form of a standard convex optimization algorithm—namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider fast ..."
Abstract

Cited by 19 (5 self)
We show that the herding procedure of Welling (2009b) takes exactly the form of a standard convex optimization algorithm, namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations. The experiments indicate that while we can improve over herding on the task of approximating integrals, the original herding algorithm more often tends to approach the maximum entropy distribution, shedding more light on the learning bias behind herding.
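A minimal sketch of the equivalence, written directly as conditional gradient: minimize J(g) = ½‖g − μ‖² over the convex hull of feature maps Φ(x), with step size 1/(t+1) so that g_t is the empirical mean of the chosen points. The finite candidate set, the Gaussian kernel and its bandwidth, and all names are assumptions for illustration.

```python
import math


def gauss_kernel(a, b, bandwidth=0.5):
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))


def herding_conditional_gradient(cands, probs, n_samples, k=gauss_kernel):
    """Conditional gradient on J(g) = 0.5 * ||g - mu||^2 over the marginal polytope.

    g_t is the mean feature map of the points chosen so far (step size 1/(t+1)),
    so each linear minimization step picks argmax_x mu(x) - (1/t) sum_i k(x, x_i).
    """
    # Mean embedding of the target distribution, evaluated at each candidate point.
    mu = [sum(p * k(x, y) for y, p in zip(cands, probs)) for x in cands]
    ksum = [0.0] * len(cands)  # sum_i k(x, x_i) over chosen points, per candidate x
    chosen = []
    for t in range(n_samples):
        scores = mu if t == 0 else [m - s / t for m, s in zip(mu, ksum)]
        j = max(range(len(cands)), key=scores.__getitem__)
        chosen.append(cands[j])
        for i, x in enumerate(cands):
            ksum[i] += k(x, cands[j])
    return chosen


samples = herding_conditional_gradient([0.0, 1.0, 2.0], [0.5, 0.25, 0.25], 40)
```

The selected points behave like deterministic pseudo-samples whose empirical frequencies track the target probabilities; the faster alternatives mentioned in the abstract would change the step size or add line search.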
Fast Semidifferential-based Submodular Function Optimization
2013
"... We present a practical and powerful new framework for both unconstrained and constrained submodular function optimization based on discrete semidifferentials (sub and superdifferentials). The resulting algorithms, which repeatedly compute and then efficiently optimize submodular semigradients, off ..."
Abstract

Cited by 14 (3 self)
We present a practical and powerful new framework for both unconstrained and constrained submodular function optimization based on discrete semidifferentials (sub- and superdifferentials). The resulting algorithms, which repeatedly compute and then efficiently optimize submodular semigradients, offer new methods and generalize many old ones for submodular optimization. Our approach, moreover, takes steps towards providing a unifying paradigm applicable to both submodular minimization and maximization, problems that historically have been treated quite distinctly. The practicality of our algorithms is important since interest in submodularity, owing to its natural and wide applicability, has recently been rising within machine learning. We analyze theoretical properties of our algorithms for minimization and maximization, and show that many state-of-the-art maximization algorithms are special cases. Lastly, we complement our theoretical analyses with supporting empirical experiments.
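To make "compute and then optimize a semigradient" concrete, here is a minimal sketch of minimization by majorization: at the current set X, build a tight modular upper bound of f and minimize it exactly. It assumes the standard bound f(Y) ≤ f(X) − Σ_{j∈X∖Y} f(j | X∖{j}) + Σ_{j∈Y∖X} f(j | ∅), valid for submodular f; whether this matches the paper's exact variant is an assumption, and `mmin` is a hypothetical name.

```python
def mmin(f, ground, x0, max_iter=100):
    """Minimize a submodular f by repeatedly minimizing a tight modular upper bound.

    At the current set X the bound (assumed here, valid for submodular f) is
        m(Y) = f(X) - sum over j in (X minus Y) of f(j | X - {j})
                    + sum over j in (Y minus X) of f(j | empty set).
    It equals f(X) at Y = X, so each accepted step strictly decreases f.
    """
    X = frozenset(x0)
    empty = frozenset()
    for _ in range(max_iter):
        fX = f(X)
        # Modular weights: loss from removing j in X, or gain of adding j to the empty set.
        w = {j: (fX - f(X - {j})) if j in X else (f(frozenset([j])) - f(empty))
             for j in ground}
        # The bound is modular in Y, so it is minimized elementwise:
        # drop j in X when w[j] > 0, add j outside X when w[j] < 0.
        Y = frozenset(j for j in ground
                      if (j in X and w[j] <= 0) or (j not in X and w[j] < 0))
        if Y == X or f(Y) >= fX:
            return X
        X = Y
    return X


def f(Y):
    # Hypothetical toy objective: concave function of |Y| minus a modular term (submodular).
    return 3 * min(len(Y), 2) - 2 * len(Y & {0, 1})


result = mmin(f, range(4), {2, 3})  # moves from f = 6 to the empty set, f = 0
```

This is a local method: started from the full ground set on the same toy f, it stops immediately, so the starting set matters.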
Convex relaxation of combinatorial penalties
2011
"... In this paper, we propose an unifying view of several recently proposed structured sparsityinducing norms. We consider the situation of a model simultaneously (a) penalized by a setfunction defined on the support of the unknown parameter vector which represents prior knowledge on supports, and (b) r ..."
Abstract

Cited by 12 (8 self)
In this paper, we propose a unifying view of several recently proposed structured sparsity-inducing norms. We consider the situation of a model simultaneously (a) penalized by a set-function defined on the support of the unknown parameter vector, which represents prior knowledge on supports, and (b) regularized in ℓp-norm. We show that the natural combinatorial optimization problems obtained may be relaxed into convex optimization problems and introduce a notion, the lower combinatorial envelope of a set-function, that characterizes the tightness of our relaxations. We moreover establish links with norms based on latent representations, including the latent group Lasso and block-coding, and with norms obtained from submodular functions.
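For the link with norms obtained from submodular functions, a known special case (from earlier work on submodular norms, stated here as an assumption about how it fits this paper's p = ∞ setting) is that for a nondecreasing submodular F with F(∅) = 0, the relaxation of F(supp(w)) is the Lovász extension of F evaluated at |w|. This sketch computes it with the usual sorting formula; the function names and the group example are hypothetical.

```python
def lovasz_extension(F, w):
    """Lovasz extension of a set function F with F(empty) = 0, evaluated at w."""
    order = sorted(range(len(w)), key=lambda i: -w[i])
    total, prev, S = 0.0, F(frozenset()), set()
    for i in order:
        S.add(i)
        cur = F(frozenset(S))
        total += w[i] * (cur - prev)  # weight times marginal gain along the sorted chain
        prev = cur
    return total


def submodular_linf_norm(F, w):
    """Omega(w) = f(|w|), assumed nondecreasing submodular F with F(empty) = 0."""
    return lovasz_extension(F, [abs(v) for v in w])


GROUPS = [{0, 1}, {2}]  # hypothetical disjoint groups


def groups_hit(A):
    # Number of groups intersecting A: a nondecreasing submodular set-function.
    return sum(1 for g in GROUPS if g & A)


w = [1.0, -3.0, 2.0]
print(submodular_linf_norm(len, w))                       # l1 norm: 6.0
print(submodular_linf_norm(lambda A: min(len(A), 1), w))  # l_inf norm: 3.0
print(submodular_linf_norm(groups_hit, w))                # sum of per-group max |w_i|: 5.0
```

Cardinality recovers the ℓ1-norm, and counting intersected groups recovers an ℓ1/ℓ∞ group norm, which is the kind of structured norm the abstract unifies.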
Reflection methods for user-friendly submodular optimization
"... Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While gener ..."
Abstract

Cited by 10 (4 self)
Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is a need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections; its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks.
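The basic reflection step can be shown on a toy problem: averaged alternating reflections (Douglas-Rachford), z ← ½(z + R_A R_B z) with R = 2P − I, whose "shadow" P_B(z) converges to a point in the intersection of two convex sets. This is a generic sketch with a box and a hyperplane standing in for the sets in the paper's continuous formulation (which involves base polytopes of the decomposed functions); all names are hypothetical.

```python
def proj_box(x, lo=0.0, hi=1.0):
    return [min(max(v, lo), hi) for v in x]


def proj_hyperplane(x, a, b):
    # Euclidean projection onto {y : <a, y> = b}.
    r = (sum(ai * xi for ai, xi in zip(a, x)) - b) / sum(ai * ai for ai in a)
    return [xi - r * ai for xi, ai in zip(x, a)]


def reflect(proj, x):
    # Reflection through a convex set: R(x) = 2 P(x) - x.
    p = proj(x)
    return [2 * pi - xi for pi, xi in zip(p, x)]


def averaged_alternating_reflections(proj_a, proj_b, z0, iters=200):
    z = list(z0)
    for _ in range(iters):
        rz = reflect(proj_a, reflect(proj_b, z))
        z = [0.5 * (zi + ri) for zi, ri in zip(z, rz)]
    return proj_b(z)  # the shadow sequence P_B(z) is the approximate solution


x = averaged_alternating_reflections(
    proj_box, lambda v: proj_hyperplane(v, [1.0, 1.0], 1.5), [3.0, -2.0])
```

Each step needs only the two projections, which is why such schemes parallelize well when a problem splits into simple pieces.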
From MAP to marginals: Variational inference in Bayesian submodular models
In Neural Information Processing Systems (NIPS), 2014
"... Submodular optimization has found many applications in machine learning and beyond. We carry out the first systematic investigation of inference in probabilistic models defined through submodular functions, generalizing regular pairwise MRFs and Determinantal Point Processes. In particular, we pres ..."
Abstract

Cited by 6 (1 self)
Submodular optimization has found many applications in machine learning and beyond. We carry out the first systematic investigation of inference in probabilistic models defined through submodular functions, generalizing regular pairwise MRFs and Determinantal Point Processes. In particular, we present L-FIELD, a variational approach to general log-submodular and log-supermodular distributions based on sub- and supergradients. We obtain both lower and upper bounds on the log-partition function, which enables us to compute probability intervals for marginals, conditionals and marginal likelihoods. We also obtain fully factorized approximate posteriors, at the same computational cost as ordinary submodular optimization. Our framework results in convex problems for optimizing over differentials of submodular functions, which we show how to solve optimally. We provide theoretical guarantees of the approximation quality with respect to the curvature of the function. We further establish natural relations between our variational approach and the classical mean-field method. Lastly, we empirically demonstrate the accuracy of our inference scheme on several submodular models.
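The upper bound on the log-partition function for a log-supermodular model P(A) ∝ exp(−F(A)) follows from one line of reasoning: any s in the base polytope of a submodular F with F(∅) = 0 satisfies s(A) ≤ F(A), so Z ≤ Σ_A exp(−s(A)) = Π_j (1 + exp(−s_j)). This sketch only enumerates greedy vertices of the base polytope, whereas the abstract describes optimizing the bound as a convex problem, so the minimum below is a valid but possibly loose bound; the lower bound is not shown. Names and the toy cut function are hypothetical.

```python
import itertools
import math


def greedy_base(F, order):
    # Greedy vertex s of the base polytope B(F); for submodular F with F(empty) = 0,
    # s(A) <= F(A) for every A (a modular lower bound, tight on prefixes of order).
    s, S, prev = [0.0] * len(order), set(), F(frozenset())
    for j in order:
        S.add(j)
        cur = F(frozenset(S))
        s[j] = cur - prev
        prev = cur
    return s


def log_z_upper(F, order):
    """Upper bound on log Z for P(A) proportional to exp(-F(A))."""
    s = greedy_base(F, order)
    return sum(math.log1p(math.exp(-sj)) for sj in s)


def log_z_exact(F, n):
    # Brute-force enumeration of all 2^n subsets (only for tiny n).
    vals = [-F(frozenset(A)) for r in range(n + 1)
            for A in itertools.combinations(range(n), r)]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


EDGES = [(0, 1), (1, 2)]  # hypothetical toy graph: a path on 3 nodes


def cut(A):
    # Graph cut value: submodular, with cut(empty) = 0.
    return sum((i in A) != (j in A) for i, j in EDGES)


exact = log_z_exact(cut, 3)
best_upper = min(log_z_upper(cut, p) for p in itertools.permutations(range(3)))
```

The factorized distribution q(A) ∝ exp(−s(A)) attached to the chosen s is the kind of fully factorized approximate posterior the abstract mentions.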
Convex relaxations of structured matrix factorizations
2013
"... We consider the factorization of a rectangular matrix X into a positive linear combination of rankone factors of the form uv ⊤ , where u and v belongs to certain sets U and V, that may encode specific structures regarding the factors, such as positivity or sparsity. In this paper, we show that comp ..."
Abstract

Cited by 6 (2 self)
We consider the factorization of a rectangular matrix X into a positive linear combination of rank-one factors of the form uv⊤, where u and v belong to certain sets U and V that may encode specific structures regarding the factors, such as positivity or sparsity. In this paper, we show that computing the optimal decomposition is equivalent to computing a certain gauge function of X, and we provide a detailed analysis of these gauge functions and their polars. Since these gauge functions are typically hard to compute, we present semidefinite relaxations and several algorithms that may recover approximate decompositions with approximation guarantees. We illustrate our results with simulations on finding decompositions with elements in {0,1}. As side contributions, we present a detailed analysis of variational quadratic representations of norms as well as a new iterative basis pursuit algorithm that can deal with inexact first-order oracles.
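As a sanity check on the gauge-function view, consider the unstructured case where U and V are both unit ℓ2 balls: the convex hull of the rank-one factors uv⊤ is the nuclear-norm unit ball, so the gauge of X is the sum of its singular values and the SVD gives an optimal decomposition. This is a minimal sketch of that easy case only (the structured sets in the abstract are what make the gauge hard); the function name is hypothetical and the code uses NumPy's `numpy.linalg.svd`.

```python
import numpy as np


def gauge_l2_balls(X):
    """Gauge of X w.r.t. conv{u v^T : ||u||_2 <= 1, ||v||_2 <= 1}.

    That set is the nuclear-norm unit ball, so the gauge is the sum of singular
    values; the SVD gives a decomposition X = sum_k s_k u_k v_k^T attaining it.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    factors = [(float(s[k]), U[:, k], Vt[k, :]) for k in range(len(s))]
    return float(s.sum()), factors


X = np.array([[3.0, 0.0], [0.0, -4.0]])
value, factors = gauge_l2_balls(X)
# Rebuild X from the positive combination of rank-one factors.
X_rebuilt = sum(lam * np.outer(u, v) for lam, u, v in factors)
```

Replacing the ℓ2 balls with, say, nonnegative or {0,1}-valued vectors breaks this closed form, which is where the semidefinite relaxations come in.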
Submodular-Bregman and the Lovász-Bregman Divergences with Applications: Extended Version
"... We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodularBregman divergences. We consider two kinds of submodular Bregman divergence, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the ..."
Abstract

Cited by 6 (4 self)
We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds of submodular-Bregman divergence, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to those of the (standard continuous) Bregman divergence. We demonstrate how the submodular-Bregman divergences generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular-Bregman divergence. We point out a number of applications of the submodular-Bregman and the Lovász-Bregman divergences, and in particular show that a proximal algorithm defined through the submodular-Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász-Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that, unlike with other order-based distance measures, computing the mean ordering is extremely efficient. Finally, we provide a clustering framework for the submodular-Bregman divergences, and we derive fast algorithms for clustering sets of binary vectors (equivalently sets of sets).
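A minimal sketch of the Lovász-Bregman divergence, assuming a submodular F with F(∅) = 0 and the greedy formula for the vertex h_σ of its base polytope. Since the Lovász extension satisfies f(y) = ⟨h_{σ_y}, y⟩, the Bregman form f(x) − f(y) − ⟨h_{σ_y}, x − y⟩ collapses to f(x) − ⟨h_{σ_y}, x⟩, which depends on y only through its ordering σ_y. Function names, the tie-breaking rule and the choice F(A) = √|A| are assumptions.

```python
import math


def greedy_base(F, order):
    """Vertex h_sigma of the base polytope of F for ordering `order` (greedy algorithm)."""
    h, S, prev = [0.0] * len(order), set(), F(frozenset())
    for j in order:
        S.add(j)
        cur = F(frozenset(S))
        h[j] = cur - prev
        prev = cur
    return h


def decreasing_order(x):
    # Ties broken by index (an assumed convention).
    return sorted(range(len(x)), key=lambda i: (-x[i], i))


def lovasz_bregman(F, x, y):
    """d(x, y) = f(x) - <h_{sigma_y}, x>, with f the Lovasz extension of F."""
    h_x = greedy_base(F, decreasing_order(x))
    h_y = greedy_base(F, decreasing_order(y))
    f_x = sum(a * b for a, b in zip(h_x, x))  # Lovasz extension at x
    return f_x - sum(a * b for a, b in zip(h_y, x))


def F(A):
    # Concave function of cardinality, hence submodular, with F(empty) = 0.
    return math.sqrt(len(A))


x = [3.0, 1.0, 2.0]
```

Nonnegativity follows because f(x) is the maximum of ⟨h, x⟩ over the base polytope, attained at h_{σ_x}.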
The Lovász-Bregman Divergence and connections to rank aggregation, clustering and web ranking
"... We extend the recently introduced theory of Lovász Bregman (LB) divergences [19] in several ways. We show that they represent a distortion between a “score ” and an “ordering”, thus providing a new view of rank aggregation and order based clustering with interesting connections to web ranking. We sh ..."
Abstract

Cited by 6 (1 self)
We extend the recently introduced theory of Lovász-Bregman (LB) divergences [19] in several ways. We show that they represent a distortion between a “score” and an “ordering”, thus providing a new view of rank aggregation and order-based clustering with interesting connections to web ranking. We show how the LB divergences have a number of properties akin to many permutation-based metrics, and in fact have as special cases forms very similar to the Kendall-τ metric. We also show how the LB divergences subsume a number of commonly used ranking measures in information retrieval, like the NDCG [22] and AUC [35]. Unlike the traditional permutation-based metrics, however, the LB divergence naturally captures a notion of “confidence” in the orderings, thus providing a new representation for applications involving aggregating scores as opposed to just orderings. We show how a number of recently used web ranking models are forms of Lovász-Bregman rank aggregation, and also observe that a natural form of the Mallows model using the LB divergence has been used as a conditional ranking model for the “Learning to Rank” problem.
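LB rank aggregation is cheap for a short reason: with d(x, σ) = f(x) − ⟨h_σ, x⟩, the total Σ_i d(x_i, σ) equals Σ_i f(x_i) − ⟨h_σ, Σ_i x_i⟩, and the second term is maximized by the greedy vertex for the ordering of Σ_i x_i. So the aggregate ordering is just the sorted sum of scores. This sketch checks that against brute force on a toy instance; it assumes a submodular F with F(∅) = 0, and the names and data are hypothetical.

```python
import itertools
import math


def greedy_base(F, order):
    """Vertex h_sigma of the base polytope of F for ordering `order`."""
    h, S, prev = [0.0] * len(order), set(), F(frozenset())
    for j in order:
        S.add(j)
        cur = F(frozenset(S))
        h[j] = cur - prev
        prev = cur
    return h


def lb_to_order(F, x, order):
    """LB divergence between a score vector x and an ordering: f(x) - <h_order, x>."""
    h_x = greedy_base(F, sorted(range(len(x)), key=lambda i: -x[i]))
    f_x = sum(a * b for a, b in zip(h_x, x))  # Lovasz extension at x
    return f_x - sum(a * b for a, b in zip(greedy_base(F, order), x))


def lb_aggregate(scores):
    """Ordering minimizing sum_i d(x_i, sigma): sort the summed score vector."""
    n = len(scores[0])
    total = [sum(x[i] for x in scores) for i in range(n)]
    return sorted(range(n), key=lambda i: -total[i])


def F(A):
    return math.sqrt(len(A))  # submodular, F(empty) = 0


scores = [[3.0, 1.0, 2.0], [2.5, 0.5, 3.0], [1.0, 2.0, 0.0]]
best = lb_aggregate(scores)  # [0, 2, 1]
```

The result does not depend on F at all, which is why computing a mean ordering under the LB divergence costs only a sort.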