Results

Results 1 - 2 of 2

### Semi-Proximal Mirror Prox


Abstract

First-order methods for composite minimization: min_{x∈X} f(x) + h(x), where f and h are convex, f is smooth, and h is simple.

(Accelerated) proximal gradient methods (when h is proximal-friendly). Proximal operator: prox_h(η) = argmin_{x∈X} { ½‖x − η‖₂² + h(x) }. For example, when h(x) = ‖x‖₁, this reduces to soft thresholding. The worst-case complexity bound for first-order oracles is O(1/√ε).

Conditional gradient methods (when h is LMO-friendly). (Composite) linear minimization oracle (LMO): LMO_h(η) = argmin_{x∈X} { ⟨η, x⟩ + h(x) }. For example, when h(x) = ‖x‖_nuc or δ_{‖x‖_nuc ≤ 1}(x), this reduces to computing the top pair of singular vectors. The worst-case (and optimal) complexity bound for LMOs is O(1/ε).
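As an illustrative sketch (not from the abstract itself), both oracles named above fit in a few lines of NumPy: soft thresholding implements the ℓ₁ prox, and the top singular pair gives the LMO over the nuclear-norm ball.

```python
import numpy as np

def prox_l1(eta, lam):
    # prox of h(x) = lam * ||x||_1: soft thresholding, applied entrywise
    return np.sign(eta) * np.maximum(np.abs(eta) - lam, 0.0)

def lmo_nuclear_ball(eta):
    # argmin over {||X||_nuc <= 1} of <eta, X> is -u1 v1^T,
    # where (u1, v1) is the top singular pair of eta
    u, s, vt = np.linalg.svd(eta, full_matrices=False)
    return -np.outer(u[:, 0], vt[0, :])
```

In practice one would compute only the top singular pair (e.g. power iteration or `scipy.sparse.linalg.svds` with `k=1`) rather than the full SVD used here for brevity; that cheap oracle is exactly what makes the conditional gradient approach attractive for nuclear-norm problems.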

### Variance-Reduced and Projection-Free Stochastic Optimization


Abstract

The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to its gradient descent counterpart. In this work, leveraging a recent variance reduction technique, we propose two stochastic Frank-Wolfe variants which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve 1 − ε accuracy. For example, we improve from O(1/ε) to O(ln(1/ε)) if the objective function is smooth and strongly convex, and from O( ...
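The variance-reduction idea the abstract alludes to can be sketched as SVRG-style snapshot gradients combined with ordinary Frank-Wolfe (LMO) steps. The problem instance, names, and step schedule below are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem f(x) = (1/2n) ||Ax - b||^2,
# minimized over the l1 ball of radius r (a common projection-free testbed)
n, d, r = 200, 20, 1.0
A = rng.standard_normal((n, d))
x_true = np.zeros(d); x_true[:3] = [0.4, -0.3, 0.2]
b = A @ x_true + 0.01 * rng.standard_normal(n)

def f(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

def grad(x, idx):
    # Mini-batch gradient over the rows in idx
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def lmo_l1(g):
    # argmin over {||s||_1 <= r} of <g, s>: signed vertex at the largest |g_i|
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -r * np.sign(g[i])
    return s

x = np.zeros(d)
full_idx = np.arange(n)
for epoch in range(30):
    x_snap = x.copy()
    g_snap = grad(x_snap, full_idx)      # full-gradient snapshot (SVRG-style)
    for t in range(10):
        idx = rng.integers(0, n, size=10)
        # Variance-reduced gradient estimate: unbiased, with variance that
        # shrinks as x approaches the snapshot point
        g = grad(x, idx) - grad(x_snap, idx) + g_snap
        gamma = 2.0 / (epoch * 10 + t + 2)
        x = (1 - gamma) * x + gamma * lmo_l1(g)

print(f(np.zeros(d)), f(x))
```

Each inner step touches only a mini-batch, yet the snapshot correction keeps the gradient estimate accurate, which is what lets such variants reduce the number of stochastic gradient evaluations relative to plain stochastic Frank-Wolfe.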