Results 1 - 7 of 7
Coordinate descent with arbitrary sampling I: Algorithms and complexity, 2014
"... The design and complexity analysis of randomized coordinate descent methods, and in par-ticular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion of expected separable overapproximation (ESO). This refers to an in-equality involving the objec ..."
Cited by 5 (1 self)
Abstract:
The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion of expected separable overapproximation (ESO). This refers to an inequality involving the objective function and the sampling, capturing in a compact way certain smoothness properties of the function in a random subspace spanned by the sampled coordinates. ESO inequalities were previously established for special classes of samplings only, almost invariably for uniform samplings. In this paper we develop a systematic technique for deriving these inequalities for a large class of functions and for arbitrary samplings. We demonstrate that one can recover existing ESO results using our general approach, which is based on the study of eigenvalues associated with samplings and the data describing the function.
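For reference, the ESO inequality this abstract refers to is usually stated in the following form (standard in the coordinate descent literature; the notation below is my rendering, not quoted from the paper):

```latex
% f admits an expected separable overapproximation (ESO) with respect to a
% sampling \hat{S}, with parameters v_1, \dots, v_n > 0, if for all
% x, h \in \mathbb{R}^n:
\mathbf{E}\left[ f\left( x + h_{[\hat{S}]} \right) \right]
  \;\le\; f(x) + \sum_{i=1}^{n} p_i \left( \nabla_i f(x)\, h_i + \frac{v_i}{2}\, h_i^2 \right)
% Here h_{[\hat{S}]} zeroes out the coordinates of h outside \hat{S}, and
% p_i = \mathbf{P}(i \in \hat{S}) are the marginals of the sampling. The
% eigenvalue analysis mentioned in the abstract concerns finding small
% admissible parameters v_i for a given sampling and data.
```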
Adding vs. averaging in distributed primal-dual optimization, 2015
"... Abstract Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of t ..."
Cited by 4 (1 self)
Abstract:
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization. Our framework, COCOA+, allows for additive combination of local updates to the global parameters at each iteration, whereas previous schemes with convergence guarantees only allow conservative averaging. We give stronger (primal-dual) convergence rate guarantees for both COCOA and our new variants, and generalize the theory for both methods to cover non-smooth convex loss functions. We provide an extensive experimental comparison that shows the markedly improved performance of COCOA+ on several real-world distributed datasets, especially when scaling up the number of machines.
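To make the adding-vs-averaging distinction concrete, here is a minimal sketch under my own toy setup (the local solver is a plain gradient step on a least-squares partition, not the paper's local dual subproblem solver; all names are hypothetical):

```python
import numpy as np

# Toy sketch of the aggregation step in a COCOA-style distributed method.

def local_update(w, A_k, b_k, stepsize=0.1):
    """One crude local step on machine k's data (least-squares loss)."""
    grad = A_k.T @ (A_k @ w - b_k) / len(b_k)
    return -stepsize * grad                 # machine k's proposed change to w

def outer_round(w, partitions, mode="add"):
    deltas = [local_update(w, A_k, b_k) for (A_k, b_k) in partitions]
    if mode == "average":                   # conservative: scale updates by 1/K
        return w + sum(deltas) / len(deltas)
    # COCOA+-style: add local updates in full; in the paper, safety comes from
    # making the local subproblems more conservative (sigma' >= K), not from
    # damping the aggregation step.
    return w + sum(deltas)

rng = np.random.default_rng(0)
d, K = 5, 4
partitions = [(rng.standard_normal((20, d)), rng.standard_normal(20))
              for _ in range(K)]
w = np.zeros(d)
for _ in range(50):
    w = outer_round(w, partitions, mode="add")
```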
Stochastic dual coordinate ascent with adaptive probabilities, ICML 2015
"... This paper introduces AdaSDCA: an adap-tive variant of stochastic dual coordinate as-cent (SDCA) for solving the regularized empir-ical risk minimization problems. Our modifica-tion consists in allowing the method adaptively change the probability distribution over the dual variables throughout the ..."
Cited by 3 (2 self)
Abstract:
This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves a provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is of a theoretical character, as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods.
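As a rough illustration of adaptive coordinate sampling (a minimal sketch: the probability rule below, proportional to the absolute dual residual, is an illustrative stand-in for AdaSDCA's formula, which is derived in the paper; the setup is ridge regression so the coordinate step has a closed form):

```python
import numpy as np

# SDCA for ridge regression with adaptively recomputed sampling probabilities.

rng = np.random.default_rng(0)
n, d, lam = 200, 10, 0.1
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

alpha = np.zeros(n)                     # dual variables, one per example
w = A.T @ alpha / (lam * n)             # primal iterate maintained from alpha
sq_norms = (A ** 2).sum(axis=1)

for t in range(5000):
    resid = y - A @ w - alpha           # per-coordinate dual residuals
    p = np.abs(resid) + 1e-12
    p /= p.sum()                        # adaptive distribution over coordinates
    i = rng.choice(n, p=p)              # recomputed naively here each step; the
                                        # paper discusses why this is expensive
    # Closed-form maximization of the dual over coordinate i (squared loss),
    # so the step is a valid ascent step under any sampling distribution.
    delta = resid[i] / (1.0 + sq_norms[i] / (lam * n))
    alpha[i] += delta
    w += delta * A[i] / (lam * n)
```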
Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses, 2015
"... In this work we develop a new algorithm for regularized empirical risk minimization. Our method extends recent techniques of Shalev-Shwartz [02/2015], which enable a dual-free analysis of SDCA, to arbitrary mini-batching schemes. Moreover, our method is able to better utilize the information in the ..."
Cited by 1 (0 self)
Abstract:
In this work we develop a new algorithm for regularized empirical risk minimization. Our method extends recent techniques of Shalev-Shwartz [02/2015], which enable a dual-free analysis of SDCA, to arbitrary mini-batching schemes. Moreover, our method is able to better utilize the information in the data defining the ERM problem. For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes. The advantage of a dual-free analysis comes from the fact that it guarantees convergence even for non-convex loss functions, as long as the average loss is convex. We illustrate through experiments the utility of being able to design arbitrary mini-batching schemes.
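The flavor of a dual-free update with mini-batching can be sketched as follows (my paraphrase of a dual-free SDCA step in the spirit of Shalev-Shwartz's method, with a simple uniform mini-batch standing in for the arbitrary samplings the paper allows; the stepsize theta is hand-picked here, whereas the paper derives admissible values from ESO-type parameters):

```python
import numpy as np

# Dual-free SDCA sketch: maintain a pseudo-dual vector alpha_i per example
# (no conjugate functions needed, hence "dual-free"), with the invariant
# w = (1 / (lam * n)) * sum_i alpha_i.

rng = np.random.default_rng(0)
n, d, lam, tau, theta = 200, 10, 0.1, 8, 0.02
A = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

def grad_phi(i, w):
    """Gradient in w of the logistic loss phi_i(w) = log(1 + exp(-y_i a_i^T w))."""
    return -y[i] * A[i] / (1.0 + np.exp(y[i] * (A[i] @ w)))

alpha = np.zeros((n, d))
w = alpha.sum(axis=0) / (lam * n)
p = tau / n                              # marginal probability of each i under
                                         # tau-nice (uniform mini-batch) sampling
for t in range(2000):
    S = rng.choice(n, size=tau, replace=False)
    for i in S:
        r = grad_phi(i, w) + alpha[i]    # residual; zero at the optimum
        alpha[i] -= (theta / p) * r
        w -= (theta / (lam * n * p)) * r # keeps the invariant in sync
```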
Primal-Dual Rates and Certificates (ETH Zürich, Switzerland)
"... Abstract We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates. Such certificates and corresponding rate of convergence guarantees are important for practitioners to diagnose progress, in particular in machine learning applications. We ob ..."
Abstract:
We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates. Such certificates and the corresponding rate-of-convergence guarantees are important for practitioners to diagnose progress, in particular in machine learning applications. We obtain new primal-dual convergence rates, e.g., for the Lasso as well as for many L1-, Elastic Net-, group Lasso- and TV-regularized problems. The theory applies to any norm-regularized generalized linear model. Our approach provides efficiently computable duality gaps which are globally defined, without modifying the original problems in the region of interest.
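A concrete instance of such a certificate is the classical duality gap for the Lasso, computable from any primal iterate by rescaling the residual onto the dual feasible set (a standard construction shown here for illustration; the snippet is mine, not code from the paper, which generalizes such certificates well beyond this case):

```python
import numpy as np

# Duality gap certificate for the Lasso:
#   min_w  0.5 * ||A w - b||^2 + lam * ||w||_1
# Any w yields a dual-feasible point by scaling the residual; the resulting
# gap is an efficiently computable upper bound on the suboptimality of w.

def lasso_duality_gap(A, b, w, lam):
    r = b - A @ w                                   # residual
    primal = 0.5 * (r @ r) + lam * np.abs(w).sum()
    scale = min(1.0, lam / (np.abs(A.T @ r).max() + 1e-15))
    theta = scale * r                               # feasible: ||A^T theta||_inf <= lam
    dual = 0.5 * (b @ b) - 0.5 * ((b - theta) @ (b - theta))
    return primal - dual                            # >= P(w) - P(w*) >= 0

rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
print(lasso_duality_gap(A, b, np.zeros(20), lam=0.5))
```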
Stochastic Dual Newton Ascent for Empirical Risk Minimization
"... We study the problem of minimizing the average of a large number of 1/γ-smooth convex functions penal-ized with a 1-strongly convex regularizer. min ..."
Abstract:
We study the problem of minimizing the average of a large number of 1/γ-smooth convex functions penalized with a 1-strongly convex regularizer:

min_{w ∈ R^d} P(w) := (1/n) ∑_{i=1}^n φ_i(w) + g(w),

where each φ_i is convex and 1/γ-smooth and g is 1-strongly convex.
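To give a flavor of the method named in the title (this sketch is my reconstruction from the title and the problem statement above, not the paper's algorithm): for squared loss the dual objective is an exact quadratic, so a Newton-type step restricted to a randomly sampled block of dual variables reduces to solving a small linear system over that block.

```python
import numpy as np

# Toy block-Newton ascent on the dual of ridge regression. The Newton step
# over a sampled block S exactly maximizes the (quadratic) dual over those
# coordinates. This only illustrates the flavor of stochastic dual Newton
# ascent; the paper's stepsizes and generality are not reproduced here.

rng = np.random.default_rng(0)
n, d, lam, tau = 100, 20, 0.1, 8
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Dual of ridge: D(alpha) = (1/n) sum_i (y_i alpha_i - alpha_i^2 / 2)
#                           - alpha^T G alpha / (2 lam n^2),  with G = A A^T
G = A @ A.T
alpha = np.zeros(n)

for t in range(500):
    S = rng.choice(n, size=tau, replace=False)       # sampled block of duals
    grad_S = (y[S] - alpha[S]) / n - G[S] @ alpha / (lam * n ** 2)
    H_SS = -np.eye(tau) / n - G[np.ix_(S, S)] / (lam * n ** 2)
    alpha[S] -= np.linalg.solve(H_SS, grad_S)        # exact Newton step on S

w = A.T @ alpha / (lam * n)     # recover the primal solution from alpha
```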