Results 1 – 6 of 6
Solving ridge regression using sketched preconditioned SVRG. In ICML, 2016.
Cited by 1 (0 self)
Abstract We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods. By equipping Stochastic Variance Reduced Gradient (SVRG) with this preconditioning process, we obtain a significant speedup relative to fast stochastic methods such as SVRG, SDCA and SAG.
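As a concrete illustration of the preconditioning idea, here is a minimal sketch of preconditioned SVRG for ridge regression. The preconditioner below is built from a uniform row subsample standing in for the paper's linear sketch, and the function name, step size, and other parameters are illustrative assumptions, not the authors' construction.

```python
import numpy as np

def precond_svrg_ridge(A, b, lam, step, epochs, inner_iters, sketch_rows, rng=None):
    """Sketch of preconditioned SVRG for ridge regression:
    min_x (1/2n)||Ax - b||^2 + (lam/2)||x||^2.
    The preconditioner uses a uniform row subsample as a stand-in for
    the paper's linear sketch; all parameters are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    # Cheap approximation of the Hessian (1/n) A^T A + lam*I from a row subsample.
    idx = rng.choice(n, size=sketch_rows, replace=False)
    P_inv = np.linalg.inv(A[idx].T @ A[idx] / sketch_rows + lam * np.eye(d))

    def grad_i(x, i):  # gradient of the i-th component function
        return A[i] * (A[i] @ x - b[i]) + lam * x

    x = np.zeros(d)
    for _ in range(epochs):
        snap = x.copy()
        full = A.T @ (A @ snap - b) / n + lam * snap   # exact snapshot gradient
        for _ in range(inner_iters):
            i = rng.integers(n)
            g = grad_i(x, i) - grad_i(snap, i) + full  # variance-reduced gradient
            x = x - step * (P_inv @ g)                 # preconditioned step
    return x
```

With a good preconditioner the effective condition number of the problem shrinks, which is where the claimed speedup over plain SVRG comes from.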
Tight Complexity Bounds for Optimizing Composite Objectives
Abstract We provide tight upper and lower bounds on the complexity of minimizing the average of m convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate. For nonsmooth functions, having access to prox oracles reduces the complexity and we present optimal methods based on smoothing that improve over methods using just gradient accesses.
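The accelerated gradient descent method named above is standard; for reference, a textbook form for the L-smooth, µ-strongly convex case can be sketched as follows (this is not the paper's accelerated SVRG variant, which is more involved):

```python
import numpy as np

def agd(grad, x0, L, mu, iters):
    """Nesterov's accelerated gradient method, constant-momentum form,
    for an L-smooth and mu-strongly convex function."""
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # momentum coefficient
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    for _ in range(iters):
        x_next = y - grad(y) / L          # gradient step at the extrapolated point
        y = x_next + beta * (x_next - x)  # momentum extrapolation
        x = x_next
    return x
```

This achieves the accelerated O(√(L/µ) ln(1/ε)) iteration complexity that the abstract identifies as optimal in the deterministic setting.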
Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives
Abstract Many classical algorithms are found, until several years later, to outlive the confines in which they were conceived, and continue to be relevant in unforeseen settings. In this paper, we show that SVRG is one such method: originally designed for strongly convex objectives, it is also very robust in non-strongly convex or sum-of-non-convex settings. More precisely, we provide new analysis to improve the state-of-the-art running times in both settings by applying either SVRG or a novel variant of it. Since non-strongly convex objectives include important examples such as Lasso or logistic regression, and sum-of-non-convex objectives include famous examples such as stochastic PCA and are even believed to be related to training deep neural nets, our results also imply better performance in these applications.
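For readers unfamiliar with the method, the basic SVRG loop (the Johnson–Zhang estimator that the papers above build on) can be sketched as follows; the step size and epoch lengths here are illustrative, not the tuned values from any of these papers:

```python
import numpy as np

def svrg(grad_i, x0, n, step, epochs, inner_iters, rng=None):
    """Basic SVRG loop.  grad_i(x, i) returns the gradient of the
    i-th of n component functions at x."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        # Full gradient at the snapshot point (one pass over the data).
        snapshot = x.copy()
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(inner_iters):
            i = rng.integers(n)
            # Variance-reduced estimator: unbiased, and its variance
            # vanishes as x and snapshot approach the optimum.
            g = grad_i(x, i) - grad_i(snapshot, i) + full_grad
            x -= step * g
    return x
```

The key property is that the estimator's variance shrinks as the iterates converge, which is what allows a constant step size and linear convergence in the strongly convex case.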
On the Iteration Complexity of Oblivious First-Order Optimization Algorithms
Abstract We consider a broad class of first-order optimization algorithms which are oblivious, in the sense that their step sizes are scheduled regardless of the function under consideration, except for limited side-information such as smoothness or strong convexity parameters. With the knowledge of these two parameters, we show that any such algorithm attains an iteration complexity lower bound of Ω(√(L/ε)) for L-smooth convex functions, and Ω(√(L/µ) ln(1/ε)) for L-smooth µ-strongly convex functions. These lower bounds are stronger than those in the traditional oracle model, as they hold independently of the dimension. To attain these, we abandon the oracle model in favor of a structure-based approach which builds upon a recently proposed framework.
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
Abstract The amount of data available in the world is growing faster than our ability to deal with it. However, if we take advantage of its internal structure, data may become much smaller for machine learning purposes. In this paper we focus on one of the fundamental machine learning tasks, empirical risk minimization (ERM), and provide faster algorithms with the help of the clustering structure of the data. We introduce a simple notion of raw clustering that can be efficiently computed from the data, and propose two algorithms based on clustering information. Our accelerated algorithm ClusterACDM is built on a novel Haar transformation applied to the dual space of the ERM problem, and our variance-reduction-based algorithm ClusterSVRG introduces a new gradient estimator using clustering. Our algorithms outperform their classical counterparts ACDM and SVRG, respectively.
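The abstract does not specify the ClusterSVRG estimator, so the following is only a hedged sketch of one plausible clustering-based variance-reduced estimator for least squares: the snapshot correction uses cluster centroids instead of individual rows, so the snapshot pass costs O(k) component gradients rather than O(n), while the estimator remains unbiased. The function name, parameters, and estimator details are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def cluster_svrg_lsq(A, b, labels, step, epochs, inner_iters, rng=None):
    """Clustering-based variance-reduced SGD sketch for least squares.
    labels[i] gives the cluster index of row i.  The control variate
    is the gradient of row i's cluster centroid at the snapshot."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    k = labels.max() + 1
    # Centroids of the data rows, per-cluster targets, and cluster sizes.
    C = np.vstack([A[labels == c].mean(axis=0) for c in range(k)])
    bc = np.array([b[labels == c].mean() for c in range(k)])
    sizes = np.array([(labels == c).sum() for c in range(k)])

    x = np.zeros(d)
    for _ in range(epochs):
        snap = x.copy()
        # Snapshot correction from centroids only: O(k) work, not O(n).
        cg = C * (C @ snap - bc)[:, None]            # per-centroid gradients
        mean_cg = (sizes[:, None] * cg).sum(0) / n   # size-weighted average
        for _ in range(inner_iters):
            i = rng.integers(n)
            c = labels[i]
            # Unbiased: E[cg[labels[i]]] equals mean_cg by construction.
            g = A[i] * (A[i] @ x - b[i]) - cg[c] + mean_cg
            x -= step * g
    return x
```

When rows within a cluster are close to their centroid, the control variate tracks the per-row gradient tightly and most of the variance cancels, mirroring the intuition that structured data is "smaller" for optimization purposes.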
Regularized Nonlinear Acceleration
Abstract We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple and small linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.
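The weight computation described in the abstract lends itself to a compact sketch: weights minimize the norm of the combined residual subject to summing to one, with Tikhonov regularization. The Gram-matrix normalization and the default regularization constant below are practical choices assumed for illustration, not taken from the paper.

```python
import numpy as np

def rna_extrapolate(xs, lam=1e-10):
    """Regularized nonlinear acceleration sketch: combine iterates
    xs[0..k] with weights c minimizing ||R c||^2 + lam*||c||^2 subject
    to sum(c) = 1, where R holds successive residuals."""
    X = np.asarray(xs, dtype=float)    # iterates, shape (k+1, d)
    R = np.diff(X, axis=0)             # residuals x_{t+1} - x_t, shape (k, d)
    M = R @ R.T                        # small k-by-k Gram matrix
    M = M / np.linalg.norm(M)          # scale for numerical stability
    k = M.shape[0]
    # Closed form for the constrained regularized least-squares problem.
    z = np.linalg.solve(M + lam * np.eye(k), np.ones(k))
    c = z / z.sum()
    return c @ X[:-1]                  # extrapolated estimate of the optimum
```

Because the scheme only reads the iterate sequence and solves a small linear system, it can run alongside any base optimizer and be refreshed as new iterates arrive, matching the abstract's description of online, on-the-fly operation.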