Results 1–10 of 14
Revisiting Frank-Wolfe: Projection-free sparse convex optimization
In ICML, 2013
Abstract

Cited by 76 (2 self)
We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
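The duality-gap certificate this abstract refers to is cheap to compute alongside the iterates. A minimal sketch of the vanilla Frank-Wolfe loop over the probability simplex, with synthetic data (the objective, problem sizes, and tolerance are illustrative assumptions, not the paper's experiments):

```python
import numpy as np

# Minimize f(x) = 0.5*||Ax - b||^2 over the probability simplex.
# Synthetic data; dimensions are illustrative only.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)

def f(x):
    r = A @ x - b
    return 0.5 * r @ r

x = np.ones(10) / 10.0                 # feasible start: simplex barycentre
gap = np.inf
for t in range(500):
    grad = A.T @ (A @ x - b)
    s = np.zeros(10)
    s[np.argmin(grad)] = 1.0           # LMO: a linear objective is minimized at a vertex
    d = s - x
    gap = grad @ (x - s)               # duality-gap certificate: f(x) - f(x*) <= gap
    if gap < 1e-6:
        break
    Ad = A @ d
    denom = Ad @ Ad
    gamma = 1.0 if denom == 0.0 else min(1.0, gap / denom)  # exact line search
    x = x + gamma * d
```

Since each iteration adds at most one vertex, the iterate after t steps is a convex combination of at most t + 1 atoms, which is the sparsity property the abstract highlights.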
Duality between subgradient and conditional gradient methods
hal-00861118, version 1, 12 Sep 2013
, 2013
A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization
, 1301
Abstract

Cited by 5 (1 self)
Abstract. Linear optimization is often algorithmically simpler than nonlinear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are examples of problems for which we have simple and efficient combinatorial algorithms, but whose nonlinear convex counterparts are harder and admit significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state-of-the-art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for non-smooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods. Key words. Frank-Wolfe algorithm; conditional gradient methods; linear programming; first-order methods; online convex optimization; online learning; stochastic optimization AMS subject classifications. 65K05; 90C05; 90C06; 90C25; 90C30; 90C27; 90C15
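The paper's linear rate comes from its own single-oracle-call scheme. As a more familiar illustration of how conditional gradient methods can escape the O(1/t) barrier for strongly convex objectives, here is a hedged sketch of the classic away-step Frank-Wolfe variant (not the paper's algorithm) projecting a point onto the simplex; all data are illustrative:

```python
import numpy as np

# Away-step Frank-Wolfe for f(x) = 0.5*||x - c||^2 over the probability
# simplex (Euclidean projection of c).  Vanilla Frank-Wolfe zig-zags at
# O(1/t) when the optimum sits on a face; "away" steps that shrink the
# weight of bad active vertices restore a linear rate.
n = 6
c = np.array([0.9, 0.4, -0.3, 0.1, -0.8, 0.2])   # illustrative point
x = np.zeros(n); x[0] = 1.0
weights = {0: 1.0}                     # active vertices -> convex weights

for t in range(1000):
    g = x - c                          # gradient (Hessian is the identity)
    s = int(np.argmin(g))              # Frank-Wolfe vertex
    d_fw = -x.copy(); d_fw[s] += 1.0   # direction e_s - x
    gap_fw = -(g @ d_fw)
    if gap_fw < 1e-9:
        break
    v = max(weights, key=lambda i: g[i])   # away vertex: worst active one
    d_aw = x.copy(); d_aw[v] -= 1.0        # direction x - e_v
    if gap_fw >= -(g @ d_aw):
        d, gamma_max, away = d_fw, 1.0, False
    else:
        alpha = weights[v]
        d, gamma_max, away = d_aw, alpha / (1.0 - alpha), True
    gamma = min(gamma_max, -(g @ d) / (d @ d))   # exact line search
    x = x + gamma * d
    if away:                           # maintain the convex combination
        weights = {i: w * (1.0 + gamma) for i, w in weights.items()}
        weights[v] -= gamma
    else:
        weights = {i: w * (1.0 - gamma) for i, w in weights.items()}
        weights[s] = weights.get(s, 0.0) + gamma
    weights = {i: w for i, w in weights.items() if w > 1e-12}
```

The explicit active-set bookkeeping is what distinguishes this family from the plain method: an away step with the maximal step size is a "drop" step that removes a vertex entirely.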
Optimally-Weighted Herding is Bayesian Quadrature
Abstract

Cited by 2 (0 self)
Herding and kernel herding are deterministic methods of choosing samples which summarise a probability distribution. A related task is choosing samples for estimating integrals using Bayesian quadrature. We show that the criterion minimised when selecting samples in kernel herding is equivalent to the posterior variance in Bayesian quadrature. We then show that sequential Bayesian quadrature can be viewed as a weighted version of kernel herding which achieves performance superior to any other weighted herding method. We demonstrate empirically a rate of convergence faster than O(1/N). Our results also imply an upper bound on the empirical error of the Bayesian quadrature estimate.
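A hedged sketch of the connection the abstract describes: kernel herding greedily matches the mean embedding of the target, and re-weighting the herded points with Bayesian-quadrature weights w = K⁻¹z can only improve the worst-case integration error. The target, kernel, grid resolution, and point count below are illustrative assumptions, and the grid argmax stands in for a continuous optimizer:

```python
import numpy as np

# Target p = N(0,1), RBF kernel with lengthscale ell.
ell = 1.0
grid = np.linspace(-4.0, 4.0, 801)

def k(x, y):
    return np.exp(-(x - y) ** 2 / (2.0 * ell ** 2))

def mean_embedding(x):
    # Closed form of E_{y ~ N(0,1)} k(x, y) for the RBF kernel.
    return ell / np.sqrt(ell ** 2 + 1.0) * np.exp(-x ** 2 / (2.0 * (ell ** 2 + 1.0)))

# Kernel herding: greedily pick the point where the empirical embedding
# lags the target embedding the most.
pts = []
for t in range(12):
    obj = mean_embedding(grid)
    for p in pts:
        obj -= k(p, grid) / (len(pts) + 1.0)
    pts.append(float(grid[np.argmax(obj)]))

# Bayesian-quadrature weights on the same points: w = K^{-1} z.
X = np.array(pts)
K = k(X[:, None], X[None, :]) + 1e-8 * np.eye(len(X))  # jitter for stability
z = mean_embedding(X)
w = np.linalg.solve(K, z)

est_bq = w @ np.cos(X)          # BQ estimate of E[cos x] = exp(-1/2)
est_herd = np.mean(np.cos(X))   # plain herding estimate: uniform weights
```

Comparing `est_bq` and `est_herd` against the closed-form value exp(-1/2) on examples like this is one way to see the "weighted herding" advantage the paper formalizes.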
A Greedy Framework for First-Order Optimization
Abstract

Cited by 1 (0 self)
Introduction. Recent work has shown many connections between conditional gradient and other first-order optimization methods, such as herding [3] and subgradient descent [2]. By considering a type of proximal conditional method, which we call boosted mirror descent (BMD), we are able to unify all of these algorithms into a single framework, which can be interpreted as taking successive argmins of a sequence of surrogate functions. Using a standard online learning analysis based on ...
• Assumptions – f: ℝⁿ → ℝ Lipschitz-continuous ⇒ f* has compact support C
, 2013
Abstract
Wolfe’s universal algorithm www.di.ens.fr/~fbach/wolfe_anonymous.pdf
Conditional gradients everywhere:
• Conditional gradient and subgradient method – Fenchel duality – Generalized conditional gradient and mirror descent
• Conditional gradient and greedy algorithms – Relationship with basis pursuit, matching pursuit
• Conditional gradient and herding – Properties of conditional gradient iterates – Relationships with sampling
Composite optimization problems: min_{x ∈ ℝᵖ} h(x) + f(Ax)
Predicting the Future Behavior of a Time-Varying Probability Distribution
Abstract
We study the problem of predicting the future, though only in the probabilistic sense of estimating a future state of a time-varying probability distribution. This is not only an interesting academic problem; solving this extrapolation problem also has many practical applications, e.g. for training classifiers that have to operate under time-varying conditions. Our main contribution is a method for predicting the next step of the time-varying distribution from a given sequence of sample sets from earlier time steps. For this we rely on two recent machine learning techniques: embedding probability distributions into a reproducing kernel Hilbert space, and learning operators by vector-valued regression. We illustrate the working principles and the practical usefulness of our method by experiments on synthetic and real data. We also highlight an exemplary application: training a classifier in a domain adaptation setting without having access to examples from the test time distribution at training time.
MEASURING SAMPLE QUALITY WITH STEIN’S METHOD
, 2015
Abstract
To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein’s method that bounds the discrepancy between sample and target expectations over a large class of test functions. We use our tool to compare exact, biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.
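The paper computes its discrepancy via linear programming; a closed-form kernelized cousin from follow-up work, the kernel Stein discrepancy, makes the core idea easy to sketch. For an N(0,1) target the score function is s(x) = -x, so the measure below never touches the target's normalizing constant; the kernel choice, lengthscale, and sample sizes are illustrative assumptions:

```python
import numpy as np

def ksd(x, ell=1.0):
    """Kernel Stein discrepancy of sample x w.r.t. N(0,1), RBF kernel."""
    d = x[:, None] - x[None, :]
    k = np.exp(-d ** 2 / (2.0 * ell ** 2))
    # Stein kernel k0 = dxdy k + s(x) dy k + s(y) dx k + s(x) s(y) k,
    # collapsed into one expression for the score s(x) = -x:
    k0 = k * (1.0 / ell ** 2 - d ** 2 / ell ** 4 - d ** 2 / ell ** 2
              + x[:, None] * x[None, :])
    return np.sqrt(k0.mean())          # V-statistic estimate

rng = np.random.default_rng(1)
exact = rng.standard_normal(300)       # unbiased sample from the target
shifted = exact + 1.0                  # asymptotically biased sample
```

Unlike effective sample size, this measure penalizes the shifted sample even though its internal correlation structure is identical: `ksd(shifted)` stays bounded away from zero while `ksd(exact)` shrinks as the sample grows.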
cal Methods
Abstract
5.2. SiGMa – Simple Greedy Matching: a tool for aligning large knowledge bases