PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent - in ICML, 2015
"... Stochastic Dual Coordinate Descent (DCD) is one of the most efficient ways to solve the fam-ily of `2-regularized empirical risk minimiza-tion problems, including linear SVM, logistic regression, and many others. The vanilla im-plementation of DCD is quite slow; however, by maintaining primal variab ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
(Show Context)
Stochastic Dual Coordinate Descent (DCD) is one of the most efficient ways to solve the family of ℓ2-regularized empirical risk minimization problems, including linear SVM, logistic regression, and many others. The vanilla implementation of DCD is quite slow; however, by maintaining primal variables while updating dual variables, the time complexity of DCD can be significantly reduced. Such a strategy forms the core algorithm in the widely used LIBLINEAR package. In this paper, we parallelize the DCD algorithms in LIBLINEAR. In recent research, several synchronized parallel DCD algorithms have been proposed; however, they fail to achieve good speedup in the shared-memory multi-core setting. We propose a family of parallel asynchronous stochastic dual coordinate descent algorithms (PASSCoDe). Each thread repeatedly selects a random dual variable and conducts coordinate updates using the primal variables stored in shared memory. We analyze the convergence properties of DCD when different locking/atomic mechanisms are applied. For the implementation with atomic operations, we show linear convergence under mild conditions. For the implementation without any atomic operations or locking, we present a novel error analysis for PASSCoDe in the multi-core environment, showing that the converged solution is the exact solution of a primal problem with a perturbed regularizer. Experimental results show that our methods are much faster than previous parallel coordinate descent solvers.
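The primal-maintenance trick mentioned in this abstract can be illustrated with a short, single-threaded sketch of dual coordinate descent for the L1-loss linear SVM, assuming the usual dual formulation; PASSCoDe runs updates of this kind concurrently from many threads against a shared w, and none of the names or defaults below come from the paper.

```python
# Minimal sketch: keep w = sum_i alpha_i * y_i * x_i in sync with the dual
# variables so each coordinate update costs O(d) (or O(nnz(x_i))) instead
# of requiring a pass over all data. Threading and locking/atomic variants
# from PASSCoDe are omitted.
import numpy as np

def dcd_linear_svm(X, y, C=1.0, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                 # dual variables, 0 <= alpha_i <= C
    w = np.zeros(d)                     # maintained primal vector
    sq_norm = (X ** 2).sum(axis=1)      # precomputed ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):    # stochastic coordinate order
            if sq_norm[i] == 0.0:
                continue
            grad = y[i] * (w @ X[i]) - 1.0                     # dual gradient in alpha_i
            new_ai = min(max(alpha[i] - grad / sq_norm[i], 0.0), C)
            w += (new_ai - alpha[i]) * y[i] * X[i]             # keep w consistent
            alpha[i] = new_ai
    return w, alpha
```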
Adding vs. averaging in distributed primal-dual optimization, 2015
"... Abstract Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of t ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization. Our framework, COCOA+, allows for additive combination of local updates to the global parameters at each iteration, whereas previous schemes with convergence guarantees only allow conservative averaging. We give stronger (primal-dual) convergence rate guarantees for both COCOA and our new variants, and generalize the theory for both methods to cover non-smooth convex loss functions. We provide an extensive experimental comparison that shows the markedly improved performance of COCOA+ on several real-world distributed datasets, especially when scaling up the number of machines.
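The adding-versus-averaging contrast in this abstract reduces to how the per-machine updates are scaled before being applied; the sketch below simulates the machines as list entries and uses a placeholder local solver, so it illustrates only the aggregation rule, not the paper's framework or its local subproblems.

```python
# Schematic outer round: conservative averaging (gamma = 1/K) of local updates
# versus the additive combination (gamma = 1) that COCOA+ permits.
# local_solver stands in for any approximate solver of the local subproblem.
import numpy as np

def outer_round(w, shards, local_solver, additive=True):
    K = len(shards)
    # In a real deployment each delta is computed in parallel on its machine
    # and only the d-dimensional update vectors are communicated.
    deltas = [local_solver(w, X_k, y_k) for (X_k, y_k) in shards]
    gamma = 1.0 if additive else 1.0 / K      # adding vs. averaging
    return w + gamma * np.sum(deltas, axis=0)
```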
Distributed box-constrained quadratic optimization for dual linear SVM - in ICML, 2015
"... Training machine learning models sometimes needs to be done on large amounts of data that exceed the capacity of a single machine, motivat-ing recent works on developing algorithms that train in a distributed fashion. This paper pro-poses an efficient box-constrained quadratic opti-mization algorith ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
Training machine learning models sometimes needs to be done on amounts of data that exceed the capacity of a single machine, motivating recent work on algorithms that train in a distributed fashion. This paper proposes an efficient box-constrained quadratic optimization algorithm for distributed training of linear support vector machines (SVMs) on large data. Our key technical contribution is an analytical solution to the problem of computing the optimal step size at each iteration, using an efficient method that requires only O(1) communication cost to ensure fast convergence. With this optimal step size, our approach is superior to other methods by possessing global linear convergence, or, equivalently, O(log(1/ε)) iteration complexity for an ε-accurate solution, when distributedly solving the non-strongly-convex linear SVM dual problem. Experiments also show that our method is significantly faster than state-of-the-art distributed linear SVM algorithms, including DSVM-AVE, DisDCA, and TRON.
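The analytical step size the abstract highlights can be sketched for the standard linear SVM dual: along an update direction the objective is quadratic, so its exact minimizer has a closed form whose only cross-machine quantities are a d-dimensional partial sum and a couple of scalars. The box constraints and the paper's particular direction choice are omitted, and the names below are illustrative only.

```python
# For f(alpha) = 0.5 * alpha' Q alpha - 1' alpha with Q_ij = y_i y_j x_i' x_j,
# the exact minimizer along direction dvec is
#     eta = -(grad' dvec) / (dvec' Q dvec),
# and dvec' Q dvec = ||sum_i dvec_i y_i x_i||^2, so each machine contributes
# only a d-dimensional partial sum and two scalars (O(1) allreduce rounds).
import numpy as np

def exact_step_size(X, y, alpha, dvec):
    w = X.T @ (alpha * y)          # maintained primal vector (allreduce over shards)
    u = X.T @ (dvec * y)           # direction mapped to primal space (allreduce)
    grad = y * (X @ w) - 1.0       # gradient of the dual objective
    denom = u @ u
    return 0.0 if denom == 0.0 else -(grad @ dvec) / denom
```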
A Distributed Algorithm for Training Nonlinear Kernel Machines
"... Abstract. This paper concerns the distributed training of nonlinear ker-nel machines on Map-Reduce. We show that a re-formulation of Nyström approximation based solution which is solved using gradient based tech-niques is well suited for this, especially when it is necessary to work with a large nu ..."
Abstract
- Add to MetaCart
(Show Context)
This paper concerns the distributed training of nonlinear kernel machines on Map-Reduce. We show that a reformulation of the Nyström-approximation-based solution, solved using gradient-based techniques, is well suited for this, especially when it is necessary to work with a large number of basis points. The main advantages of this approach are: avoidance of computing the pseudo-inverse of the kernel sub-matrix corresponding to the basis points; simplicity and efficiency of the distributed part of the computations; and friendliness to stage-wise addition of basis points. We implement the method using an AllReduce tree on Hadoop and demonstrate its value on a few large benchmark datasets.
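A minimal sketch of the kind of Nyström reformulation alluded to here: the model is parameterized directly as f(x) = k(x, basis) @ beta and fitted with a gradient method, so the pseudo-inverse of the basis kernel sub-matrix is never formed. The RBF kernel, squared loss, and plain gradient descent are assumed for illustration and are not the paper's exact formulation; the distributed AllReduce of per-shard gradient contributions is indicated only by a comment.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # Pairwise squared distances via broadcasting, then the Gaussian kernel.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_nystrom_gd(X, y, basis, lam=1e-3, lr=0.1, steps=500):
    Phi = rbf_kernel(X, basis)                   # n x m features k(x_i, basis_j)
    beta = np.zeros(basis.shape[0])
    for _ in range(steps):
        resid = Phi @ beta - y                   # squared-loss residuals
        grad = Phi.T @ resid / len(y) + lam * beta   # summed via AllReduce when data is sharded
        beta -= lr * grad
    return beta
```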