Results 1–10 of 11
On the complexity analysis of randomized block-coordinate descent methods
Mathematical Programming
, 2014
Abstract

Cited by 31 (2 self)
In this paper we analyze the randomized block-coordinate descent (RBCD) methods proposed in ...
An Accelerated Proximal Coordinate Gradient Method
, 2014
Abstract

Cited by 11 (2 self)
We develop an accelerated randomized proximal coordinate gradient (APCG) method for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains better convergence rates than the state-of-the-art stochastic dual coordinate ascent (SDCA) method.
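As a point of comparison for the dual-ERM setting the abstract mentions, the following is a minimal sketch of plain (non-accelerated) SDCA for ridge regression, min_w 1/(2n) Σ (aᵢᵀw − yᵢ)² + (λ/2)‖w‖², using the closed-form squared-loss dual update. It is not the paper's APCG method, and all problem data and constants below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 40, 5, 0.1
A = rng.standard_normal((n, d))   # rows a_i are the examples
y = rng.standard_normal(n)

alpha = np.zeros(n)               # dual variables, one per example
w = np.zeros(d)                   # primal iterate, w = (1/(lam*n)) A^T alpha
for _ in range(20000):
    i = int(rng.integers(n))
    # closed-form maximizer of the dual along coordinate i (squared loss)
    delta = (y[i] - A[i] @ w - alpha[i]) / (1.0 + A[i] @ A[i] / (lam * n))
    alpha[i] += delta
    w += delta * A[i] / (lam * n)  # maintain the primal-dual link

# reference solution of the ridge normal equations
w_star = np.linalg.solve(A.T @ A + lam * n * np.eye(d), A.T @ y)
```

The dual update is the standard one for the squared loss; smooth losses give SDCA (and, per the abstract, APCG with better constants) linear convergence.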
Parallel successive convex approximation for nonsmooth nonconvex optimization
, 2014
Abstract

Cited by 4 (2 self)
Consider the problem of minimizing the sum of a smooth (possibly nonconvex) and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to solving this problem is the block coordinate descent (BCD) method, whereby at each iteration only one variable block is updated while the remaining variables are held fixed. With recent advances in multicore parallel processing technology, it is desirable to parallelize the BCD method by allowing multiple blocks to be updated simultaneously at each iteration of the algorithm. In this work, we propose an inexact parallel BCD approach where, at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method for both randomized and cyclic variable selection rules. We analyze the asymptotic and nonasymptotic convergence behavior of the algorithm for both convex and nonconvex objective functions. The numerical experiments suggest that for a special case of the Lasso minimization problem, the cyclic block selection rule can outperform the randomized rule.
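The cyclic-versus-randomized comparison in the abstract can be reproduced in miniature with single-coordinate proximal updates on the Lasso objective 1/2‖Ax − b‖² + λ‖x‖₁. This is a hedged sketch, not the paper's inexact parallel scheme; the data and iteration counts are synthetic placeholders.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t*|.|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(A, b, lam, rule, iters=3000, seed=1):
    n = A.shape[1]
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)   # per-coordinate curvature ||A_i||^2
    r = b - A @ x                    # running residual
    rng = np.random.default_rng(seed)
    for k in range(iters):
        i = k % n if rule == "cyclic" else int(rng.integers(n))
        rho = A[:, i] @ r + col_sq[i] * x[i]
        x_new = soft_threshold(rho, lam) / col_sq[i]  # exact coordinate min
        r -= A[:, i] * (x_new - x[i])
        x[i] = x_new
    return x, 0.5 * r @ r + lam * np.abs(x).sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x_cyc, f_cyc = lasso_cd(A, b, lam=0.5, rule="cyclic")
x_rnd, f_rnd = lasso_cd(A, b, lam=0.5, rule="random")
```

With A of full column rank the minimizer is unique, so both selection rules reach the same objective value; the abstract's observation concerns how fast they get there.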
Hybrid Random/Deterministic Parallel Algorithms for Convex and Nonconvex Big Data Optimization
Abstract

Cited by 3 (2 self)
We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (possibly nonseparable) convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of this work is a novel parallel, hybrid random/deterministic decomposition scheme wherein, at each iteration, a subset of (block) variables is updated at the same time by minimizing a convex surrogate of the original nonconvex function. To tackle huge-scale problems, the (block) variables to be updated are chosen according to a mixed random and deterministic procedure, which captures the advantages of both pure deterministic and random update-based schemes. Almost sure convergence of the proposed scheme is established. Numerical results show that on huge-scale problems the proposed hybrid random/deterministic algorithm compares favorably to random and deterministic schemes on both convex and nonconvex problems.
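The hybrid selection idea alone can be illustrated on a smooth quadratic f(x) = 1/2 xᵀAx − bᵀx: half the updated coordinates are chosen greedily (largest gradient entries, the deterministic part) and half uniformly at random, then all are updated in parallel with a conservative gradient step. This is only a toy sketch of the selection rule, not the paper's surrogate-minimization scheme; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 8, 4                        # n coordinates, up to s updated per iteration
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)
L = np.linalg.eigvalsh(A)[-1]      # global Lipschitz constant of grad f
x = np.zeros(n)

for _ in range(5000):
    g = A @ x - b                  # gradient of f
    greedy = np.argsort(-np.abs(g))[: s // 2]            # deterministic part
    rand_part = rng.choice(n, size=s // 2, replace=False)  # random part
    S = np.union1d(greedy, rand_part)
    x[S] -= g[S] / L               # safe simultaneous step on selected blocks

grad_norm = np.linalg.norm(A @ x - b)
```

The step 1/L guarantees descent even when the selected coordinates are updated simultaneously; the random half ensures every coordinate is eventually visited.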
Separable Approximations and Decomposition Methods for the Augmented Lagrangian
, 2013
Abstract

Cited by 2 (2 self)
In this paper we study decomposition methods based on separable approximations for minimizing the augmented Lagrangian. In particular, we study and compare the Diagonal Quadratic Approximation Method (DQAM) of Mulvey and Ruszczyński [13] and the Parallel Coordinate Descent Method (PCDM) of Richtárik and Takáč [23]. We show that the two methods are equivalent for feasibility problems up to the selection of a single stepsize parameter. Furthermore, we prove an improved complexity bound for PCDM under strong convexity, and show that this bound is at least 8(L′/L̄)(ω − 1)² times better than the best known bound for DQAM, where ω is the degree of partial separability and L′ and L̄ are the maximum and average of the block Lipschitz constants of the gradient of the quadratic penalty appearing in the augmented Lagrangian.
Coordinate Descent Algorithms
, 2014
Abstract

Cited by 2 (0 self)
Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate descent approach, together with variants and extensions and their convergence properties, mostly with reference to convex objectives. We pay particular attention to a certain problem structure that arises frequently in machine learning applications, showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type. We also present some parallel variants and discuss their convergence properties under several models of parallel execution.
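The fundamental scheme this survey describes, successive exact minimization along coordinate directions, can be sketched on a convex quadratic f(x) = 1/2 xᵀAx − bᵀx, where it reduces to the Gauss-Seidel iteration. The matrix, vector, and sweep count below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)        # symmetric positive definite
b = rng.standard_normal(n)
x = np.zeros(n)

for _ in range(200):               # cyclic sweeps over all coordinates
    for i in range(n):
        # exact minimizer of f along coordinate i: solve df/dx_i = 0
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]

x_star = np.linalg.solve(A, b)     # reference solution of A x = b
```

For a positive definite A the sweeps converge to the unique minimizer; the survey's variants replace the exact coordinate minimization with gradient or proximal steps and randomize or parallelize the coordinate choice.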
A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming
, 2015
Abstract
We propose a randomized nonmonotone block proximal gradient (RNBPG) method for minimizing the sum of a smooth (possibly nonconvex) function and a block-separable (possibly nonconvex nonsmooth) function. At each iteration, this method randomly picks a block according to any prescribed probability distribution and typically solves several associated proximal subproblems, which usually have a closed-form solution, until a certain progress on the objective value is achieved. In contrast to the usual randomized block coordinate descent method [23, 20], our method has a nonmonotone flavor and uses variable stepsizes that can partially utilize the local curvature information of the smooth component of the objective function. We show that any accumulation point of the solution sequence of the method is almost surely a stationary point of the problem, and that the method is capable of finding an approximate stationary point with high probability. We also establish a sublinear rate of convergence for the method in terms of the minimal expected squared norm of certain proximal gradients over the iterations. When the problem under consideration is convex, we show that the expected objective val ...