Results 1–10 of 22
Accelerated, parallel and proximal coordinate descent
, 2014
"... We propose a new stochastic coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special cas ..."
Abstract

Cited by 31 (6 self)
We propose a new stochastic coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method has been proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate 2ω̄L̄R²/(k+1)², where k is the iteration counter, ω̄ is the average degree of separability of the loss function, L̄ is the average of the Lipschitz constants associated with the coordinates and individual functions in the sum, and R is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent. The fact that the method depends on the average degree of separability, and not on the maximum degree of separability, can be attributed to the use of new safe large step sizes, leading to an improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel stochastic coordinate descent algorithms based on the concept of ESO.
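As a point of reference for the setting this abstract describes, here is a minimal sketch of plain (serial, non-accelerated) randomized coordinate descent with coordinate-wise step sizes 1/L_i on a least-squares objective. This is a generic baseline, not the paper's APPROX method; the function name and problem instance are illustrative.

```python
import numpy as np

def rand_coord_descent(A, b, iters=2000, seed=0):
    """Plain randomized coordinate descent for f(x) = 0.5*||Ax - b||^2.
    Each step updates one coordinate with step size 1/L_i, where
    L_i = ||A[:, i]||^2 is the coordinate-wise Lipschitz constant."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L = (A ** 2).sum(axis=0)      # per-coordinate Lipschitz constants
    x = np.zeros(n)
    r = -b                        # residual r = Ax - b (x = 0 initially)
    for _ in range(iters):
        i = rng.integers(n)
        g = A[:, i] @ r           # partial derivative df/dx_i
        step = g / L[i]
        x[i] -= step
        r -= step * A[:, i]       # keep the residual in sync cheaply
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])
b = np.array([5.0, 5.0, 1.5])
x = rand_coord_descent(A, b)
```

Maintaining the residual incrementally is the serial analogue of the abstract's point about avoiding full-dimensional vector operations: each update touches only one column of A.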
An almost-linear-time algorithm for approximate max flow in undirected graphs, and its multicommodity generalizations
"... In this paper we present an almost linear time algorithm for solving approximate maximum flow in undirected graphs. In particular, given a graph with m edges we show how to produce a 1−ε approximate maximum flow in time O(m 1+o(1) · ε −2). Furthermore, we present this algorithm as part of a general ..."
Abstract

Cited by 14 (7 self)
In this paper we present an almost-linear-time algorithm for solving approximate maximum flow in undirected graphs. In particular, given a graph with m edges we show how to produce a (1−ε)-approximate maximum flow in time O(m^(1+o(1)) · ε^(−2)). Furthermore, we present this algorithm as part of a general framework that also allows us to achieve a running time of O(m^(1+o(1)) · ε^(−2) · k²) for the maximum concurrent k-commodity flow problem, the first such algorithm with an almost-linear dependence on m. We also note that Jonah Sherman has independently produced an almost-linear-time algorithm for maximum flow, and we thank him for coordinating submissions.
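The almost-linear-time algorithm itself is well beyond a short snippet, but the problem it approximates can be stated concretely. Below is a textbook exact max-flow routine (Edmonds–Karp, O(V·E²) via BFS augmenting paths) for comparison only; the paper's contribution is computing a (1−ε)-approximate flow in m^(1+o(1)) · ε^(−2) time, far faster on large sparse graphs. The example graph is illustrative.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Exact max flow via Edmonds-Karp (shortest augmenting paths).
    cap: dict-of-dicts of edge capacities on a directed graph."""
    # residual capacities, including zero-capacity reverse edges
    res = {u: dict(vs) for u, vs in cap.items()}
    for u in cap:
        for v in cap[u]:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path s -> t in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow               # no augmenting path left
        # bottleneck capacity along the path, then augment
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= aug
            res[v][u] += aug
        flow += aug

caps = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
print(edmonds_karp(caps, "s", "t"))   # -> 5
```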
An Accelerated Proximal Coordinate Gradient Method
, 2014
"... We develop an accelerated randomized proximal coordinate gradient (APCG) method, for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordina ..."
Abstract

Cited by 11 (2 self)
We develop an accelerated randomized proximal coordinate gradient (APCG) method for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains better convergence rates than the state-of-the-art stochastic dual coordinate ascent (SDCA) method.
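The "proximal coordinate gradient" building block in such composite methods is a gradient step on one coordinate followed by the proximal operator of the nonsmooth term. A minimal sketch for an ℓ1 term, where the prox is soft-thresholding, is below; this shows the generic step structure, not the APCG method itself, and the names are illustrative.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t*|.| (soft-thresholding), elementwise:
    argmin_x 0.5*(x - v)^2 + t*|x|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def coord_prox_step(x, i, grad_i, L_i, lam):
    """One proximal coordinate gradient step on coordinate i for
    min f(x) + lam*||x||_1: gradient step of size 1/L_i, then prox."""
    x = x.copy()
    x[i] = prox_l1(x[i] - grad_i / L_i, lam / L_i)
    return x

# e.g. from x = [1.0], with partial gradient 2.0, L_i = 1, lam = 0.5:
x1 = coord_prox_step(np.array([1.0]), 0, 2.0, 1.0, 0.5)
```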
Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2014
"... ..."
Distributed Block Coordinate Descent for Minimizing Partially Separable Functions
"... In this work we propose a distributed randomized block coordinate descent method for minimizing a convex function with a huge number of variables/coordinates. We analyze its complexity under the assumption that the smooth part of the objective function is partially block separable, and show that th ..."
Abstract

Cited by 6 (0 self)
In this work we propose a distributed randomized block coordinate descent method for minimizing a convex function with a huge number of variables/coordinates. We analyze its complexity under the assumption that the smooth part of the objective function is partially block separable, and show that the degree of separability directly influences the complexity. This extends the results in [22] to a distributed environment. We first show that partially block separable functions admit an expected separable overapproximation (ESO) with respect to a distributed sampling, compute the ESO parameters, and then specialize complexity results from recent literature that hold under the generic ESO assumption. We describe several approaches to distributing and synchronizing the computation across a cluster of multicore computers and provide promising computational results.
Unregularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. arXiv preprint arXiv:1506.07512
, 2015
"... Abstract We develop a family of accelerated stochastic algorithms that optimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear leastsquares regression, across a wide range of problem settings. To achiev ..."
Abstract

Cited by 5 (0 self)
We develop a family of accelerated stochastic algorithms that optimize sums of convex functions. Our algorithms improve upon the fastest known running times for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide range of problem settings. To achieve this, we establish a framework, based on the classical proximal point algorithm, for accelerating recent fast stochastic algorithms in a black-box fashion. Empirically, we demonstrate that the resulting algorithms exhibit notions of stability that are advantageous in practice. Both in theory and in practice, the provided algorithms reap the computational benefits of adding a large strongly convex regularization term, without incurring a corresponding bias to the original ERM problem.
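The proximal point idea the abstract builds on can be sketched in a few lines: repeatedly (and only approximately) minimize the original objective plus a strongly convex quadratic centered at the current iterate, so each subproblem is well-conditioned. The sketch below uses plain gradient descent as the inner solver on a toy least-squares objective; the paper instead plugs in fast stochastic solvers, and all names and parameter choices here are illustrative.

```python
import numpy as np

def approx_prox_point(grad_f, x0, lam=1.0, outer=20, inner=50, lr=0.1):
    """Approximate proximal point sketch: each outer step approximately
    minimizes f(y) + (lam/2)*||y - x_k||^2, a strongly convex and hence
    easier subproblem, using a few inner gradient steps."""
    x = x0.copy()
    for _ in range(outer):
        center, y = x.copy(), x.copy()
        for _ in range(inner):
            # gradient of the regularized subproblem
            y -= lr * (grad_f(y) + lam * (y - center))
        x = y          # the added quadratic introduces no bias in the limit
    return x

# toy objective: f(y) = 0.5*||Ay - b||^2
A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([2.0, 1.0, 2.0])
grad = lambda y: A.T @ (A @ y - b)
x = approx_prox_point(grad, np.zeros(2))
```

Because each outer step recenters the quadratic at the latest iterate, the sequence converges to the minimizer of f itself, which is the "without incurring a corresponding bias" point in the abstract.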
Matching the universal barrier without paying the costs: Solving linear programs with Õ(√rank) linear system solves
 CoRR
"... In this paper we present a new algorithm for solving linear programs that requires only Õ( rank(A)L) iterations where A is the constraint matrix of a linear program with m constraints and n variables and L is the bit complexity of a linear program. Each iteration of our method consists of solving ..."
Abstract

Cited by 4 (1 self)
In this paper we present a new algorithm for solving linear programs that requires only Õ(√rank(A) · L) iterations, where A is the constraint matrix of a linear program with m constraints and n variables and L is the bit complexity of the linear program. Each iteration of our method consists of solving Õ(1) linear systems and additional nearly-linear-time computation. Our method improves upon the previous best iteration bound by a factor of Ω̃((m/rank(A))^(1/4)) for methods with polynomial-time-computable iterations, and by Ω̃((m/rank(A))^(1/2)) for methods which solve at most Õ(1) linear systems in each iteration. Our method is parallelizable and amenable to linear-algebraic techniques for accelerating the linear system solver. As such, up to polylogarithmic factors we either match or improve upon the best previous running times for solving linear programs in both depth and work for different ratios of m and rank(A). Moreover, our method matches up to polylogarithmic factors a theoretical limit established by Nesterov and Nemirovski in 1994 regarding the use of a “universal barrier” for interior point methods, thereby resolving a longstanding open question regarding the running time of polynomial-time interior point methods for linear programming.
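For readers unfamiliar with the interior point framework the abstract improves, here is a toy log-barrier path-following sketch: minimize t·cᵀx minus a log barrier on the constraint slacks by Newton steps, repeatedly increasing t. This is the textbook scheme, not the paper's barrier or iteration scheme, and the LP instance and parameter choices are illustrative.

```python
import numpy as np

def barrier_lp(c, A, b, x0, t0=1.0, mu=10.0, outers=8, newton=30):
    """Log-barrier interior point sketch for: min c.x  s.t.  A x <= b.
    Minimizes t*c.x - sum(log(b - A x)) by Newton steps from a strictly
    feasible x0, multiplying the path parameter t by mu each round."""
    x, t = x0.astype(float), t0
    for _ in range(outers):
        for _ in range(newton):
            s = b - A @ x                       # slacks, must stay > 0
            d = 1.0 / s
            g = t * c + A.T @ d                 # gradient
            H = A.T @ (A * (d ** 2)[:, None])   # Hessian
            step = np.linalg.solve(H, g)        # one linear system solve
            a = 1.0                             # backtrack to stay feasible
            while np.any(b - A @ (x - a * step) <= 0):
                a *= 0.5
            x = x - a * step
        t *= mu
    return x

# toy LP: min -x1 - x2  s.t.  0 <= x1 <= 1,  0 <= x2 <= 1  (optimum (1, 1))
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])
x = barrier_lp(c, A, b, x0=np.array([0.5, 0.5]))
```

The cost per iteration is dominated by the linear system solve on H, which is why iteration bounds like Õ(√rank(A) · L), counting such solves, are the headline quantity.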
Parallel successive convex approximation for nonsmooth nonconvex optimization
, 2014
"... Consider the problem of minimizing the sum of a smooth (possibly nonconvex) and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to solve this problem is the block coordinate descent (BCD) method whereby at each iteration only one variable block is up ..."
Abstract

Cited by 4 (2 self)
Consider the problem of minimizing the sum of a smooth (possibly nonconvex) and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to solving this problem is the block coordinate descent (BCD) method, whereby at each iteration only one variable block is updated while the remaining variables are held fixed. With recent advances in multicore parallel processing technology, it is desirable to parallelize the BCD method by allowing multiple blocks to be updated simultaneously at each iteration of the algorithm. In this work, we propose an inexact parallel BCD approach where, at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method for both randomized and cyclic variable selection rules. We analyze the asymptotic and nonasymptotic convergence behavior of the algorithm for both convex and nonconvex objective functions. The numerical experiments suggest that for the special case of the Lasso problem, the cyclic block selection rule can outperform the randomized rule.
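The cyclic rule mentioned at the end of the abstract is easy to make concrete for the Lasso, where each coordinate subproblem has a closed-form soft-thresholding solution. The sketch below is a plain serial cyclic coordinate descent baseline, not the paper's inexact parallel method; names and the test instance are illustrative.

```python
import numpy as np

def lasso_cyclic_cd(A, b, lam, sweeps=200):
    """Cyclic coordinate descent for the Lasso:
    min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Each coordinate is minimized exactly via soft-thresholding."""
    n = A.shape[1]
    col_sq = (A ** 2).sum(axis=0)
    x = np.zeros(n)
    r = b - A @ x                  # residual b - Ax
    for _ in range(sweeps):
        for i in range(n):         # cyclic selection rule
            # exact minimization over x_i with the others held fixed
            rho = A[:, i] @ r + col_sq[i] * x[i]
            x_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[i]
            r += (x[i] - x_new) * A[:, i]
            x[i] = x_new
    return x

# with A = I the Lasso solution is just soft-thresholding of b
x = lasso_cyclic_cd(np.eye(3), np.array([3.0, 0.5, -2.0]), lam=1.0)
```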