Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems
"... In this paper we show how to accelerate randomized coordinate descent methods and achieve faster convergence rates without paying periteration costs in asymptotic running time. In particular, we show how to generalize and efficiently implement a method proposed by Nesterov, giving faster asymptotic ..."
Abstract

Cited by 20 (4 self)
In this paper we show how to accelerate randomized coordinate descent methods and achieve faster convergence rates without paying per-iteration costs in asymptotic running time. In particular, we show how to generalize and efficiently implement a method proposed by Nesterov, giving faster asymptotic running times for various algorithms that use standard coordinate descent as a black box. In addition to providing a proof of convergence for this new general method, we show that it is numerically stable, efficiently implementable, and in certain regimes, asymptotically optimal. To highlight the computational power of this algorithm, we show how it can be used to create faster linear system solvers in several regimes:
• We show how this method achieves a faster asymptotic runtime than conjugate gradient for solving a broad class of symmetric positive definite systems of equations.
• We improve the best known asymptotic convergence guarantees for Kaczmarz methods, a popular technique for image reconstruction and solving overdetermined systems of equations, by accelerating a randomized algorithm of Strohmer and Vershynin.
• We achieve the best known running time for solving Symmetric Diagonally Dominant (SDD) systems of equations in the unit-cost RAM model, obtaining an O(m log^{3/2} n √(log log n) log((log n)/ε)) asymptotic running time by accelerating a recent solver by Kelner et al.
Beyond the independent interest of these solvers, we believe they highlight the versatility of the approach of this paper, and we hope that they will open the door for further algorithmic improvements in the future.
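For reference, the unaccelerated randomized Kaczmarz baseline of Strohmer and Vershynin mentioned above can be sketched as follows (a minimal NumPy sketch, not the paper's accelerated variant; the function name and fixed iteration count are illustrative):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz (Strohmer-Vershynin): repeatedly project the
    iterate onto a random constraint <a_i, x> = b_i, sampling row i with
    probability proportional to ||a_i||^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Orthogonal projection onto the hyperplane of constraint i.
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x
```

For a consistent overdetermined system, the expected squared error contracts by a factor of (1 − σ_min(A)²/‖A‖_F²) per iteration, which is the rate the paper's acceleration improves.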
Navigating Central Path with Electrical Flows: From Flows to Matchings, and Back
FOCS, 2013
"... ..."
(Show Context)
Matching the universal barrier without paying the costs: Solving linear programs with Õ(√rank) linear system solves
 CoRR
"... In this paper we present a new algorithm for solving linear programs that requires only Õ( rank(A)L) iterations where A is the constraint matrix of a linear program with m constraints and n variables and L is the bit complexity of a linear program. Each iteration of our method consists of solving ..."
Abstract

Cited by 4 (1 self)
In this paper we present a new algorithm for solving linear programs that requires only Õ(√(rank(A)) · L) iterations, where A is the constraint matrix of a linear program with m constraints and n variables and L is the bit complexity of the linear program. Each iteration of our method consists of solving Õ(1) linear systems and additional nearly linear time computation. Our method improves upon the previous best iteration bound by a factor of Ω̃((m / rank(A))^{1/4}) for methods with polynomial time computable iterations and by Ω̃((m / rank(A))^{1/2}) for methods which solve at most Õ(1) linear systems in each iteration. Our method is parallelizable and amenable to linear algebraic techniques for accelerating the linear system solver. As such, up to polylogarithmic factors we either match or improve upon the best previous running times for solving linear programs in both depth and work for different ratios of m and rank(A). Moreover, our method matches up to polylogarithmic factors a theoretical limit established by Nesterov and Nemirovski in 1994 regarding the use of a “universal barrier” for interior point methods, thereby resolving a longstanding open question regarding the running time of polynomial time interior point methods for linear programming.
A Novel, Simple Interpretation of Nesterov’s Accelerated Method as a Combination of Gradient and Mirror Descent. arXiv e-prints, abs/1407.1537
, 2014
"... Firstorder methods play a central role in largescale convex optimization. Even though many variations exist, each suited to a particular problem form, almost all such methods fundamentally rely on two types of algorithmic steps and two corresponding types of analysis: gradientdescent steps, which ..."
Abstract

Cited by 2 (1 self)
First-order methods play a central role in large-scale convex optimization. Even though many variations exist, each suited to a particular problem form, almost all such methods fundamentally rely on two types of algorithmic steps and two corresponding types of analysis: gradient-descent steps, which yield primal progress, and mirror-descent steps, which yield dual progress. In this paper, we observe that the performances of these two types of steps are complementary, so that faster algorithms can be designed by coupling the two steps and combining their analyses. In particular, we show how to obtain a conceptually simple interpretation of Nesterov’s accelerated gradient method [Nes83, Nes04, Nes05], a cornerstone algorithm in convex optimization. Nesterov’s method is the optimal first-order method for the class of smooth convex optimization problems. However, to the best of our knowledge, the proof of the fast convergence of Nesterov’s method has not found a clear interpretation and is still regarded by many as crucially relying on an “algebraic trick” [Jud13]. We apply our novel insights to express Nesterov’s algorithm as a natural coupling of gradient descent and mirror descent, and to write its proof of convergence as a simple combination of the convergence analyses of the two underlying steps. We believe that the complementary view of gradient descent and mirror descent proposed in this paper will prove very useful in the design of first-order methods, as it allows us to design fast algorithms in a conceptually easier way. For instance, our view greatly facilitates the adaptation of nontrivial variants of Nesterov’s method to specific scenarios, such as packing and covering problems [AO14b, AO14a].
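A minimal Euclidean instantiation of the gradient/mirror coupling described in this abstract might look as follows (a sketch for an L-smooth objective, using the standard linear-coupling step-size schedule; the function name and parameters are illustrative):

```python
import numpy as np

def nesterov_via_coupling(grad, x0, L, steps):
    """Accelerated method as a coupling of a gradient step (primal
    progress) and a Euclidean mirror-descent step (dual progress):
    query the gradient at x = tau*z + (1-tau)*y and feed it to both."""
    y = np.asarray(x0, dtype=float)
    z = y.copy()
    for k in range(steps):
        alpha = (k + 2) / (2.0 * L)   # mirror step size, grows with k
        tau = 1.0 / (alpha * L)       # coupling weight, = 2/(k+2)
        x = tau * z + (1 - tau) * y
        g = grad(x)
        y = x - g / L                 # gradient step: guaranteed descent
        z = z - alpha * g             # mirror step (Euclidean prox)
    return y
```

With this schedule the coupled iteration recovers the O(L‖x0 − x*‖²/T²) accelerated rate, which is the point of the coupling view.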
Constructing LinearSized Spectral Sparsification in AlmostLinear Time
"... We present the first almostlinear time algorithm for constructing linearsized spectral sparsification for graphs. This improves all previous constructions of linearsized spectral sparsification, which requires Ω(n2) time [1], [2], [3]. A key ingredient in our algorithm is a novel combination of t ..."
Abstract

Cited by 1 (0 self)
We present the first almost-linear time algorithm for constructing linear-sized spectral sparsification for graphs. This improves all previous constructions of linear-sized spectral sparsification, which require Ω(n²) time [1], [2], [3]. A key ingredient in our algorithm is a novel combination of two techniques used in the literature for constructing spectral sparsification: random sampling by effective resistance [4], and adaptive constructions based on barrier functions [1], [3].
Keywords: algorithmic spectral graph theory; spectral sparsification
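The first ingredient, sampling by effective resistance, can be illustrated on a small graph (a dense-pseudoinverse sketch in the spirit of [4], not the paper's almost-linear-time construction; the function name and the sample count q are illustrative):

```python
import numpy as np

def sparsify_by_effective_resistance(edges, weights, n, q, seed=0):
    """Sample q edges with probability proportional to w_e * R_eff(e)
    and reweight so the sparsifier's Laplacian matches the original in
    expectation. Dense pinv is fine for a toy sketch only."""
    rng = np.random.default_rng(seed)
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lp = np.linalg.pinv(L)
    # Effective resistance of edge (u,v): (e_u - e_v)^T L^+ (e_u - e_v).
    r = np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])
    p = weights * r
    p = p / p.sum()
    new_w = np.zeros(len(edges))
    for _ in range(q):
        i = rng.choice(len(edges), p=p)
        new_w[i] += weights[i] / (q * p[i])  # unbiased reweighting
    return new_w  # zero entries are edges dropped from the sparsifier
```

Each sample is an unbiased estimate of the Laplacian, so quadratic forms x^T L x are preserved up to sampling error; the sparsifiers in the paper make this deterministic and linear-sized.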
New Results in the Theory of Approximation: Fast Graph Algorithms and Inapproximability
, 2013
"... For several basic optimization problems, it is NPhard to find an exact solution. As a result, understanding the best possible tradeoff between the running time of an algorithm and its approximation guarantee, is a fundamental question in theoretical computer science, and the central goal of the th ..."
Abstract
For several basic optimization problems, it is NP-hard to find an exact solution. As a result, understanding the best possible tradeoff between the running time of an algorithm and its approximation guarantee is a fundamental question in theoretical computer science, and the central goal of the theory of approximation. There are two aspects to the theory of approximation: (1) efficient approximation algorithms that establish tradeoffs between approximation guarantee and running time, and (2) inapproximability results that give evidence against them. In this thesis, we contribute to both facets of the theory of approximation. In the first part of this thesis, we present the first nearly-linear-time algorithm for Balanced Separator (given a graph, partition its vertices into two roughly equal parts without cutting too many edges) that achieves the best approximation guarantee possible for algorithms in its class. This is a classic graph partitioning problem and has deep connections to several areas of both theory and practice, such as metric embeddings, Markov chains, and clustering.
Scaling • Blocking flows
, 2014
"... In the previous lecture we finished covering the multiplicative weights method and its application to solving LP’s and we introduced the maximum flow problem and the FordFulkerson algorithm to solve it. In this lecture we discuss two new methods to solve the maximum flow problem ..."
Abstract
In the previous lecture we finished covering the multiplicative weights method and its application to solving LPs, and we introduced the maximum flow problem and the Ford-Fulkerson algorithm to solve it. In this lecture we discuss two new methods to solve the maximum flow problem.
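A classical instantiation of the blocking-flow idea named in this entry's title is Dinic's algorithm, which repeatedly builds a BFS level graph and saturates a blocking flow in it (a compact sketch, not necessarily the lecture's own presentation):

```python
from collections import deque

class Dinic:
    """Dinic's max-flow: BFS builds a level graph, then DFS pushes
    flow along level-increasing paths until a blocking flow is found."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # edge = [to, cap, rev_index]

    def add_edge(self, u, v, cap):
        self.adj[u].append([v, cap, len(self.adj[v])])
        self.adj[v].append([u, 0, len(self.adj[u]) - 1])  # residual arc

    def _bfs(self, s, t):
        self.level = [-1] * self.n
        self.level[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v, cap, _ in self.adj[u]:
                if cap > 0 and self.level[v] < 0:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] >= 0

    def _dfs(self, u, t, f):
        if u == t:
            return f
        while self.it[u] < len(self.adj[u]):
            e = self.adj[u][self.it[u]]
            v, cap, rev = e
            if cap > 0 and self.level[v] == self.level[u] + 1:
                d = self._dfs(v, t, min(f, cap))
                if d > 0:
                    e[1] -= d
                    self.adj[v][rev][1] += d
                    return d
            self.it[u] += 1  # dead end: never retry this edge
        return 0

    def max_flow(self, s, t):
        flow = 0
        while self._bfs(s, t):
            self.it = [0] * self.n
            while True:
                f = self._dfs(s, t, float("inf"))
                if f == 0:
                    break
                flow += f
        return flow
```

Because shortest-path distances strictly increase between phases, at most n − 1 blocking-flow phases are needed, which is the source of Dinic's improvement over plain Ford-Fulkerson.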
Stochastic Spectral Descent for Discrete Graphical Models
"... Abstract—Interest in deep probabilistic graphical models has increased in recent years, due to their stateoftheart performance on many machine learning applications. Such models are typically trained with the stochastic gradient method, which can take a significant number of iterations to conver ..."
Abstract
Abstract—Interest in deep probabilistic graphical models has increased in recent years, due to their state-of-the-art performance on many machine learning applications. Such models are typically trained with the stochastic gradient method, which can take a significant number of iterations to converge. Since the computational cost of gradient estimation is prohibitive even for modestly sized models, training becomes slow and practically usable models are kept small. In this paper we propose a new, largely tuning-free algorithm to address this problem. Our approach derives novel majorization bounds based on the Schatten-∞ norm. Intriguingly, the minimizers of these bounds can be interpreted as gradient methods in a non-Euclidean space. We thus propose using a stochastic gradient method in non-Euclidean space. We both provide simple conditions under which our algorithm is guaranteed to converge, and demonstrate empirically that our algorithm leads to dramatically faster training and improved predictive ability compared to stochastic gradient descent for both directed and undirected graphical models.
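The deterministic core of such an update, steepest descent with respect to the Schatten-∞ norm, can be sketched as follows (a toy, non-stochastic illustration of the general spectral-descent step form, not the paper's algorithm; the function name and the constant L are illustrative):

```python
import numpy as np

def spectral_descent_step(W, G, L):
    """Steepest descent w.r.t. the Schatten-infinity norm: minimizing
    <G, d> + (L/2)||d||_{S-inf}^2 gives direction -U V^T from the SVD
    of the gradient G = U diag(s) V^T, scaled by the nuclear norm of G."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return W - (s.sum() / L) * (U @ Vt)
```

The step moves every singular direction of the gradient by the same amount, which is what makes it a gradient method in a non-Euclidean geometry rather than the Frobenius one.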