Coordinate descent with arbitrary sampling I: Algorithms and complexity. arXiv preprint arXiv:1412.8060 (2014)

by Zheng Qu, Peter Richtárik

Citing documents: Results 1 - 5 of 5

Adding vs. averaging in distributed primal-dual optimization

by Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč, 2015
"... Abstract Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of t ..."
Abstract - Cited by 4 (1 self) - Add to MetaCart
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization. Our framework, COCOA+, allows for additive combination of local updates to the global parameters at each iteration, whereas previous schemes with convergence guarantees only allow conservative averaging. We give stronger (primal-dual) convergence rate guarantees for both COCOA and our new variants, and generalize the theory for both methods to cover non-smooth convex loss functions. We provide an extensive experimental comparison that shows the markedly improved performance of COCOA+ on several real-world distributed datasets, especially when scaling up the number of machines.
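
To make the adding-versus-averaging distinction concrete, here is a minimal sketch of how a driver might combine the local updates produced by K machines in one round; the function name aggregate and the unit aggregation weight in the additive branch are illustrative assumptions, not the parameters analysed in the paper, and the local solver producing each update is left abstract.

import numpy as np

def aggregate(w, local_updates, scheme="average"):
    """Combine local updates from K machines into the global model w.

    scheme="average": conservative averaging (each update scaled by 1/K),
                      as in the earlier COCOA scheme.
    scheme="add":     additive combination of local updates, the option
                      COCOA+ permits (here with aggregation weight 1).
    """
    K = len(local_updates)
    if scheme == "average":
        return w + sum(local_updates) / K
    if scheme == "add":
        return w + sum(local_updates)
    raise ValueError("unknown scheme")

# Toy round: 4 machines each propose an update to a 3-dimensional model.
rng = np.random.default_rng(0)
w = np.zeros(3)
local_updates = [0.1 * rng.standard_normal(3) for _ in range(4)]
print(aggregate(w, local_updates, "average"))
print(aggregate(w, local_updates, "add"))

The additive branch takes more aggressive steps per round; the abstract's point is that COCOA+ retains convergence guarantees in this regime, whereas earlier analyses covered only the averaging branch.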

Stochastic dual coordinate ascent with adaptive probabilities. ICML 2015

by Dominik Csiba, Zheng Qu
"... This paper introduces AdaSDCA: an adap-tive variant of stochastic dual coordinate as-cent (SDCA) for solving the regularized empir-ical risk minimization problems. Our modifica-tion consists in allowing the method adaptively change the probability distribution over the dual variables throughout the ..."
Abstract - Cited by 3 (2 self) - Add to MetaCart
This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves a provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is of a theoretical character, as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods.
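
As a rough illustration of what adaptively changing the sampling distribution over the dual variables can look like, the sketch below draws a coordinate with probability proportional to a per-coordinate score that is refreshed as the run progresses; the scoring rule and all names here are illustrative assumptions, not the AdaSDCA rule, whose adaptive probabilities and complexity analysis are specific to the paper.

import numpy as np

def sample_coordinate(scores, rng):
    """Draw a dual coordinate i with probability proportional to scores[i]."""
    p = scores / scores.sum()
    return rng.choice(len(scores), p=p)

# Toy adaptive loop: start from uniform scores and refresh them each step.
rng = np.random.default_rng(0)
n = 5
scores = np.ones(n)                      # uniform sampling initially
for t in range(10):
    i = sample_coordinate(scores, rng)
    # ... an SDCA-style update of dual variable alpha_i would go here ...
    observed = rng.random()              # placeholder for a progress/residual measurement at i
    scores[i] = 0.9 * scores[i] + 0.1 * (observed + 1e-6)   # illustrative score refresh

Maintaining and renormalizing non-uniform probabilities adds bookkeeping on every step, which is one way adaptive schemes become expensive and a reason the paper distinguishes the theoretical AdaSDCA from the practical AdaSDCA+.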

Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses

by Dominik Csiba , 2015
"... In this work we develop a new algorithm for regularized empirical risk minimization. Our method extends recent techniques of Shalev-Shwartz [02/2015], which enable a dual-free analysis of SDCA, to arbitrary mini-batching schemes. Moreover, our method is able to better utilize the information in the ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
In this work we develop a new algorithm for regularized empirical risk minimization. Our method extends recent techniques of Shalev-Shwartz [02/2015], which enable a dual-free analysis of SDCA, to arbitrary mini-batching schemes. Moreover, our method is able to better utilize the information in the data defining the ERM problem. For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes. The advantage of a dual-free analysis comes from the fact that it guarantees convergence even for non-convex loss functions, as long as the average loss is convex. We illustrate through experiments the utility of being able to design arbitrary mini-batching schemes.
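
For orientation only, here is a minimal sketch of a dual-free SDCA-style update applied over a sampled mini-batch, following the template of Shalev-Shwartz's dual-free analysis for min_w (1/n) sum_i phi_i(w) + (lambda/2)||w||^2; the uniform mini-batch sampler, the step size eta, and all names are illustrative assumptions, not the mini-batching schemes or theoretical parameters studied in the paper.

import numpy as np

def dual_free_sdca_minibatch(grad_phi, n, d, lam, eta, tau, iters, rng):
    """Dual-free SDCA-style iterations with uniform mini-batches of size tau.

    grad_phi(i, w) must return the gradient of phi_i at w.
    """
    alpha = np.zeros((n, d))              # pseudo-dual vectors, one per example
    w = np.zeros(d)                       # invariant: w = (1/(lam*n)) * sum_i alpha_i
    for _ in range(iters):
        batch = rng.choice(n, size=tau, replace=False)          # uniform mini-batch
        grads = {i: grad_phi(i, w) + alpha[i] for i in batch}   # all computed at the same w
        for i, g in grads.items():
            alpha[i] -= eta * lam * n * g                       # pseudo-dual update
            w -= eta * g                                        # preserves the invariant on w
    return w

# Toy usage: ridge-regression-style losses phi_i(w) = 0.5 * (a_i^T w - b_i)^2.
rng = np.random.default_rng(0)
n, d = 20, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
grad_phi = lambda i, w: (A[i] @ w - b[i]) * A[i]
print(dual_free_sdca_minibatch(grad_phi, n, d, lam=0.1, eta=0.01, tau=4, iters=500, rng=rng))

The appeal of the dual-free view, as the abstract notes, is that only gradients of the individual losses are needed, so the same template runs even when individual phi_i are non-convex, with guarantees as long as the average loss is convex.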

Citation Context

... We assume the knowledge of parameters $v_1, \dots, v_n > 0$ for which
$$\mathbf{E}\left\| \sum_{i \in \hat{S}} A_i h_i \right\|^2 \le \sum_{i=1}^{n} p_i v_i \|h_i\|^2. \qquad (6)$$
Tight and easily computable formulas for such parameters can be found in [19]. For instance, whenever $\mathrm{Prob}(|\hat{S}| \le \tau) = 1$, inequality (6) holds with $v_i = \tau \|A_i\|^2$. To simplify the exposition, we will write $B^{(t)} \stackrel{\text{def}}{=} \|w^{(t)} - w^*\|^2$ and $C_i^{(t)} \stackrel{\text{def}}{=} \|\alpha_i^{(t)} - \alpha_i^*\|^2$, $i = 1, 2, \dots, n$. (7) ...
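
The claim that $v_i = \tau \|A_i\|^2$ satisfies inequality (6) whenever $|\hat{S}| \le \tau$ with probability 1 can be checked numerically. The sketch below does this for tau-nice sampling (uniformly random subsets of fixed size tau, so p_i = tau/n) on random data, computing the left-hand side exactly by enumerating all subsets; the data sizes and the use of the spectral norm for ||A_i|| are illustrative assumptions.

import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d, tau = 6, 4, 3
A = [rng.standard_normal((d, 2)) for _ in range(n)]     # blocks A_i
h = [rng.standard_normal(2) for _ in range(n)]          # directions h_i
p = tau / n                                             # tau-nice sampling: p_i = tau/n

# Left-hand side of (6): exact expectation over all subsets S of size tau.
subsets = list(itertools.combinations(range(n), tau))
lhs = np.mean([np.linalg.norm(sum(A[i] @ h[i] for i in S)) ** 2 for S in subsets])

# Right-hand side of (6) with v_i = tau * ||A_i||^2 (spectral norm).
rhs = sum(p * tau * np.linalg.norm(A[i], 2) ** 2 * np.linalg.norm(h[i]) ** 2 for i in range(n))

print(lhs <= rhs, lhs, rhs)

The inequality holds because $\|\sum_{i \in S} A_i h_i\|^2 \le |S| \sum_{i \in S} \|A_i h_i\|^2$ by Cauchy-Schwarz, and taking expectations over the random subset introduces exactly one factor $p_i$ per term.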

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

by Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric P. Xing
"... Abstract We study parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework. In both cases, we perform computations asynchronously whenever possible. We assume block-separable constraints as in Block-Coordi ..."
Abstract - Add to MetaCart
We study parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework. In both cases, we perform computations asynchronously whenever possible. We assume block-separable constraints as in the Block-Coordinate Frank-Wolfe (BCFW) method ...
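
For readers unfamiliar with the baseline, here is a minimal sketch of the standard serial block-coordinate Frank-Wolfe step over a product of blocks, the template the parallel and distributed variants build on; the product-of-simplices constraint set, the 2n/(k+2n) step size, and all names are illustrative assumptions rather than the asynchronous schemes of this paper.

import numpy as np

def lmo_simplex(g):
    """Linear minimization oracle over the probability simplex:
    returns the vertex e_j that minimizes <s, g>."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

def bcfw(grad_f, x, iters, rng):
    """Serial block-coordinate Frank-Wolfe on a product of simplices.

    x has shape (n_blocks, block_size); grad_f(x) returns a gradient of the same shape.
    """
    n_blocks = x.shape[0]
    for k in range(iters):
        i = rng.integers(n_blocks)                  # pick one block uniformly at random
        s_i = lmo_simplex(grad_f(x)[i])             # solve the LMO on that block only
        gamma = 2.0 * n_blocks / (k + 2.0 * n_blocks)
        x[i] = (1 - gamma) * x[i] + gamma * s_i     # convex combination keeps feasibility
    return x

# Toy usage: f(x) = 0.5 * ||x - target||^2 over a product of three simplices.
rng = np.random.default_rng(0)
target = rng.random((3, 4))
grad_f = lambda x: x - target
x0 = np.full((3, 4), 0.25)
print(bcfw(grad_f, x0, iters=300, rng=rng))

Because each step touches a single block and needs only a linear minimization oracle on that block, the method lends itself to the parallel and asynchronous execution the abstract describes.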

An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization

by N. S. Aybat, Z. Wang, G. Iyengar
"... Abstract We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to minimize the sum of composite convex functions, where each term in the sum is a private cost function belonging to a node, and only nodes connected by an edge can directly communicate with each other. This optimi ..."
Abstract - Add to MetaCart
We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to minimize the sum of composite convex functions, where each term in the sum is a private cost function belonging to a node, and only nodes connected by an edge can directly communicate with each other. This optimization model abstracts a number of applications in distributed sensing and machine learning. We show that any limit point of DFAL iterates is optimal; and for any $\epsilon > 0$, an $\epsilon$-optimal and $\epsilon$-feasible solution can be computed within $O(\log(\epsilon^{-1}))$ DFAL iterations, which require $O\big(\tfrac{\psi_{\max}^{1.5}}{d_{\min}}\,\epsilon^{-1}\big)$ proximal gradient computations and communications per node in total, where $\psi_{\max}$ denotes the largest eigenvalue of the graph Laplacian, and $d_{\min}$ is the minimum degree of the graph. We also propose an asynchronous version of DFAL by incorporating randomized block coordinate descent methods; and demonstrate the efficiency of DFAL on large-scale sparse-group LASSO problems.
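
Since the complexity above is counted in proximal gradient computations, it may help to see what one such computation looks like in the simplest case. The sketch below is a plain, centralized proximal gradient step for an l1-regularized least-squares problem, shown only to illustrate the operation being counted; it is not the DFAL algorithm, and the sparse-group LASSO prox used in the paper's experiments would additionally include a groupwise shrinkage. All names and sizes are illustrative.

import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_gradient_step(x, grad_f, step, lam):
    """One proximal gradient step for min_x f(x) + lam * ||x||_1."""
    return soft_threshold(x - step * grad_f(x), step * lam)

# Toy usage: f(x) = 0.5 * ||Ax - b||^2 (a LASSO problem).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
grad_f = lambda x: A.T @ (A @ x - b)
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L the largest eigenvalue of A^T A
x = np.zeros(10)
for _ in range(200):
    x = prox_gradient_step(x, grad_f, step, lam=0.5)
print(x)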