Hybrid Random/Deterministic Parallel Algorithms for Convex and Nonconvex Big Data Optimization
"... We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (possibly nonseparable), convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (possibly nonseparable), convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of this work is a novel parallel, hybrid random/deterministic decomposition scheme wherein, at each iteration, a subset of (block) variables is updated at the same time by minimizing a convex surrogate of the original nonconvex function. To tackle huge-scale problems, the (block) variables to be updated are chosen according to a mixed random and deterministic procedure, which captures the advantages of both pure deterministic and random update-based schemes. Almost sure convergence of the proposed scheme is established. Numerical results show that on huge-scale problems the proposed hybrid random/deterministic algorithm compares favorably to random and deterministic schemes on both convex and nonconvex problems.
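To make the update rule concrete, here is a minimal Python sketch of one iteration of such a hybrid scheme. It assumes the smooth term F has a gradient oracle grad_F, the convex term is λ‖x‖₁ (a separable special case), and the convex surrogate is the standard linearization-plus-proximal model, so each block update reduces to a soft-thresholded gradient step. The block scoring rule and the split between greedy and random picks are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def hybrid_block_update(x, grad_F, lam, tau, blocks, n_rand, n_greedy, rng):
    """One iteration of a hybrid random/deterministic block scheme (illustrative).

    For each selected block we minimize a simple convex surrogate of the
    objective: the linearization of F at x plus a proximal term plus
    lam*||.||_1, whose closed-form minimizer is a soft-thresholded
    gradient step.
    """
    g = grad_F(x)
    # Candidate surrogate minimizer for every coordinate (soft-thresholding).
    z = x - g / tau
    step = np.sign(z) * np.maximum(np.abs(z) - lam / tau, 0.0)
    # Greedy score: size of the prox-gradient residual per block.
    scores = np.array([np.linalg.norm((step - x)[b]) for b in blocks])
    greedy = np.argsort(scores)[-n_greedy:]            # deterministic picks
    rest = [i for i in range(len(blocks)) if i not in set(greedy)]
    random_part = rng.choice(rest, size=min(n_rand, len(rest)), replace=False)
    for i in np.concatenate([greedy, random_part]):    # parallelizable loop
        b = blocks[i]
        x[b] = step[b]                                 # blockwise surrogate minimizer
    return x
```

A full solver would repeat this update until a stationarity measure falls below a tolerance; the point of the hybrid selection is that the greedy picks accelerate progress while the random picks keep the per-iteration selection cost low on huge-scale problems.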
Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems
"... Abstract The iteration complexity of the block-coordinate descent (BCD) type algorithm has been under extensive investigation. It was recently shown that for convex problems the classical cyclic BCGD (block coordinate gradient descent) achieves an O(1/r) complexity (r is the number of passes of all ..."
Abstract
- Add to MetaCart
(Show Context)
The iteration complexity of block-coordinate descent (BCD) type algorithms has been under extensive investigation. It was recently shown that for convex problems the classical cyclic BCGD (block coordinate gradient descent) achieves an O(1/r) complexity, where r is the number of passes over all blocks. However, such bounds depend at least linearly on K (the number of variable blocks), and are therefore at least K times worse than those of the gradient descent (GD) and proximal gradient (PG) methods. In this paper, we close this theoretical performance gap between cyclic BCD and GD/PG. First, we show that for a family of quadratic nonsmooth problems, the complexity bounds for cyclic Block Coordinate Proximal Gradient (BCPG), a popular variant of BCD, can match those of GD/PG in terms of the dependency on K (up to a log²(K) factor). Second, we establish an improved complexity bound for Coordinate Gradient Descent (CGD) for general convex problems which can match that of GD in certain scenarios. Our bounds are sharper than the known bounds, which are always at least K times worse than those of GD. Our analyses do not depend on the update order of the blocks inside each cycle, so our results also apply to BCD methods with random permutations (random sampling without replacement, another popular variant).
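For concreteness, here is a minimal Python sketch of cyclic BCPG on a quadratic nonsmooth instance, min_x ½‖Ax − b‖² + λ‖x‖₁, the problem family for which the matching bounds are stated; passing a random generator switches to the random-permutation (without-replacement) variant the analysis also covers. The block partition and per-block step sizes are simplified assumptions.

```python
import numpy as np

def cyclic_bcpg(A, b, lam, blocks, n_cycles, rng=None):
    """Cyclic block coordinate proximal gradient for
    min_x 0.5*||A x - b||^2 + lam*||x||_1 (a quadratic nonsmooth problem).

    Each pass updates the blocks in order, or in a fresh random
    permutation when rng is given.
    """
    x = np.zeros(A.shape[1])
    # Block Lipschitz constants: largest eigenvalue of each diagonal
    # block of A^T A, used as per-block step-size parameters.
    L = [np.linalg.eigvalsh(A[:, blk].T @ A[:, blk]).max() for blk in blocks]
    for _ in range(n_cycles):                      # one cycle = one pass r
        order = rng.permutation(len(blocks)) if rng is not None else range(len(blocks))
        for i in order:
            blk = blocks[i]
            g = A[:, blk].T @ (A @ x - b)          # block gradient
            z = x[blk] - g / L[i]
            # Proximal (soft-thresholding) step for the l1 term.
            x[blk] = np.sign(z) * np.maximum(np.abs(z) - lam / L[i], 0.0)
    return x
```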
NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization
"... Abstract We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of N nonconvex L i /N -smooth functions, plus a nonsmooth regularizer. The proposed NonconvEx primal-dual SpliTTing (NESTT) algorithm splits the problem into N subproblems, and utilizes ..."
Abstract
- Add to MetaCart
(Show Context)
We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of N nonconvex L_i/N-smooth functions, plus a nonsmooth regularizer. The proposed NonconvEx primal-dual SpliTTing (NESTT) algorithm splits the problem into N subproblems and utilizes an augmented Lagrangian based primal-dual scheme to solve it in a distributed and stochastic manner. With a special non-uniform sampling, a version of NESTT achieves an ε-stationary solution using O((Σᵢ √(L_i/N))²/ε) gradient evaluations, which can be up to O(N) times better than the (proximal) gradient descent methods. It also achieves a Q-linear convergence rate for nonconvex ℓ₁-penalized quadratic problems with polyhedral constraints. Further, we reveal a fundamental connection between primal-dual based methods and a few primal-only methods such as IAG/SAG/SAGA.
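The sketch below is not the full NESTT primal-dual scheme; it only illustrates the non-uniform sampling idea behind the improved rate, with probabilities p_i ∝ √(L_i/N), inside a simple importance-weighted stochastic proximal step for an ℓ₁ regularizer. All names, the step size alpha, and the plain proximal update are assumptions made for illustration.

```python
import numpy as np

def nestt_style_sampling(L):
    """Non-uniform sampling probabilities of the kind NESTT uses:
    p_i proportional to sqrt(L_i / N), so that components with larger
    smoothness constants are sampled more often."""
    N = len(L)
    p = np.sqrt(np.asarray(L, dtype=float) / N)
    return p / p.sum()

def sampled_prox_step(x, grads, L, lam, alpha, rng):
    """One illustrative stochastic proximal step (not the full
    primal-dual NESTT update): draw component i ~ p, rescale its
    gradient by 1/(N p_i) so the estimator is unbiased for the
    gradient of (1/N) * sum_i g_i, then soft-threshold for the
    l1 regularizer."""
    p = nestt_style_sampling(L)
    i = rng.choice(len(L), p=p)
    g = grads[i](x) / (len(L) * p[i])   # unbiased estimate of the average gradient
    z = x - alpha * g
    return np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)
```

The importance weighting is what makes the √(L_i/N) sampling pay off: components with small L_i are touched rarely, which is where the up-to-O(N) saving in gradient evaluations over (proximal) gradient descent comes from.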