Results 1–10 of 68
Distributed Subgradient Methods for Multiagent Optimization
, 2007
"... We study a distributed computation model for optimizing a sum of convex objective functions corresponding to multiple agents. For solving this (not necessarily smooth) optimization problem, we consider a subgradient method that is distributed among the agents. The method involves every agent minimiz ..."
Abstract

Cited by 234 (24 self)
 Add to MetaCart
(Show Context)
We study a distributed computation model for optimizing a sum of convex objective functions corresponding to multiple agents. For solving this (not necessarily smooth) optimization problem, we consider a subgradient method that is distributed among the agents. The method involves every agent minimizing his/her own objective function while exchanging information locally with other agents in the network over a time-varying topology. We provide convergence results and convergence rate estimates for the subgradient method. Our convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.
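The consensus-plus-subgradient update the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm in full: the complete-graph mixing matrix, the constant step size, and the toy objective below are our own assumptions.

```python
import numpy as np

def distributed_subgradient(subgrads, W, x0, alpha=0.01, iters=500):
    """Each agent keeps a local estimate, averages its neighbors' estimates
    through the doubly stochastic mixing matrix W, then steps along the
    subgradient of its own objective (constant step size alpha)."""
    x = np.array([np.asarray(x0, float) for _ in subgrads])
    for _ in range(iters):
        mixed = W @ x  # local information exchange over the network
        x = mixed - alpha * np.array([g(mixed[i]) for i, g in enumerate(subgrads)])
    return x.mean(axis=0)  # agents' estimates agree up to O(alpha)

# Toy example: agent i minimizes |x - c_i|, so the network jointly
# minimizes sum_i |x - c_i|, whose optimum is the median of the c_i.
subgrads = [lambda x, c=c: np.sign(x - c) for c in (0.0, 1.0, 2.0)]
W = np.full((3, 3), 1.0 / 3.0)  # complete graph, uniform weights
res = distributed_subgradient(subgrads, W, [0.0])
```

With a constant step size the iterates only reach an O(alpha) neighborhood of the optimum, which is consistent with the accuracy-versus-iterations tradeoff the abstract mentions.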
Maximum margin planning
 In Proceedings of the 23rd International Conference on Machine Learning (ICML'06)
, 2006
"... Imitation learning of sequential, goaldirected behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to cost so an optimal policy in ..."
Abstract

Cited by 139 (28 self)
 Add to MetaCart
(Show Context)
Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior. Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant in problems where A* and dynamic programming approaches make learning policies tractable beyond the limitations of a QP formulation. We demonstrate our approach applied to route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is a much more difficult task.
(Online) Subgradient Methods for Structured Prediction
"... Promising approaches to structured learning problems have recently been developed in the maximum margin framework. Unfortunately, algorithms that are computationally and memory efficient enough to solve large scale problems have lagged behind. We propose using simple subgradientbased techniques for ..."
Abstract

Cited by 86 (15 self)
 Add to MetaCart
(Show Context)
Promising approaches to structured learning problems have recently been developed in the maximum margin framework. Unfortunately, algorithms that are computationally and memory efficient enough to solve large-scale problems have lagged behind. We propose using simple subgradient-based techniques for optimizing a regularized risk formulation of these problems in both online and batch settings, and analyze the theoretical convergence, generalization, and robustness properties of the resulting techniques. These algorithms are simple, memory efficient, fast to converge, and have small regret in the online setting. We also investigate a novel convex regression formulation of structured learning. Finally, we demonstrate the benefits of the subgradient approach on three structured prediction problems.
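To illustrate the subgradient technique for regularized risk minimization, here is a sketch of the flat (non-structured) special case, a binary SVM with hinge loss; the data, the 1/t step-size schedule, and the function name are our own assumptions, not the paper's structured setting.

```python
import numpy as np

def subgradient_svm(X, y, lam=0.1, alpha0=1.0, epochs=100):
    """Batch subgradient descent on the regularized hinge-loss risk
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w)))."""
    w = np.zeros(X.shape[1])
    for t in range(1, epochs + 1):
        margins = y * (X @ w)
        active = margins < 1                       # points with nonzero hinge loss
        g = lam * w - (X[active].T @ y[active]) / len(y)  # a subgradient of the risk
        w -= (alpha0 / t) * g                      # diminishing step size
    return w

# Tiny linearly separable example (illustrative data).
X = np.array([[2.0], [-2.0]])
y = np.array([1.0, -1.0])
w = subgradient_svm(X, y)
```

The same step structure carries over to the structured case, where the subgradient is computed from the loss-augmented inference problem rather than a pointwise margin.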
Approximate Primal Solutions and Rate Analysis for Dual Subgradient Methods
, 2007
"... We study primal solutions obtained as a byproduct of subgradient methods when solving the Lagrangian dual of a primal convex constrained optimization problem (possibly nonsmooth). The existing literature on the use of subgradient methods for generating primal optimal solutions is limited to the met ..."
Abstract

Cited by 82 (7 self)
 Add to MetaCart
(Show Context)
We study primal solutions obtained as a by-product of subgradient methods when solving the Lagrangian dual of a primal convex constrained optimization problem (possibly nonsmooth). The existing literature on the use of subgradient methods for generating primal optimal solutions is limited to methods producing such solutions only asymptotically (i.e., in the limit as the number of subgradient iterations increases to infinity). Furthermore, no convergence rate results are known for these algorithms. In this paper, we propose and analyze dual subgradient methods using averaging to generate approximate primal optimal solutions. These algorithms use a constant stepsize, as opposed to the diminishing stepsize that predominantly appears in existing primal recovery schemes. We provide estimates on the convergence rate of the primal sequences. In particular, we provide bounds on the amount of feasibility violation of the generated approximate primal solutions. We also provide upper and lower bounds on the primal function values at the approximate solutions. The feasibility violation and primal value estimates are given per iteration, thus providing practical stopping criteria. Our analysis relies on the Slater condition and the inherited boundedness properties of the dual problem under this condition.
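The primal-averaging idea can be sketched on a one-dimensional toy problem (our choice, not the paper's): minimize x^2 subject to x >= 1, whose optimum is x* = 1 with multiplier mu* = 2. The dual iterate takes a constant-stepsize projected subgradient step, and the primal estimate is the running average of the Lagrangian minimizers.

```python
def dual_subgradient_averaging(alpha=0.05, iters=2000):
    """Dual subgradient method with primal averaging, sketched on
    min x^2 s.t. x >= 1 (so x* = 1, mu* = 2). At dual iterate mu, the
    Lagrangian minimizer is x(mu) = mu / 2, the dual subgradient is the
    constraint value 1 - x(mu), and the primal estimate is the running
    average of the x(mu) iterates."""
    mu = 0.0
    x_sum = 0.0
    for _ in range(iters):
        x = mu / 2.0                            # argmin_x x^2 + mu * (1 - x)
        mu = max(0.0, mu + alpha * (1.0 - x))   # projected dual ascent, constant step
        x_sum += x
    return x_sum / iters, mu

x_avg, mu = dual_subgradient_averaging()
```

Note that individual iterates x(mu) are infeasible (below 1) along the way; only the average approaches feasibility, which is exactly what the paper's per-iteration violation bounds quantify.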
A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets. arXiv preprint arXiv:1202.6258
, 2012
"... We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in ..."
Abstract

Cited by 76 (11 self)
 Add to MetaCart
(Show Context)
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training objective and reducing the testing objective quickly.
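The gradient-memory idea (stochastic average gradient, SAG) is simple enough to sketch; the toy quadratic objective, step size, and function name below are illustrative assumptions.

```python
import numpy as np

def sag(grads, n, dim, alpha=0.05, iters=1000, seed=0):
    """Stochastic Average Gradient: keep the most recently seen gradient for
    each of the n component functions, and step along the average of the
    stored gradients after refreshing one randomly chosen entry."""
    rng = np.random.default_rng(seed)
    memory = np.zeros((n, dim))    # one stored gradient per component function
    total = np.zeros(dim)          # running sum of stored gradients
    x = np.zeros(dim)
    for _ in range(iters):
        i = rng.integers(n)
        g = grads[i](x)
        total += g - memory[i]     # replace function i's old gradient
        memory[i] = g
        x -= (alpha / n) * total
    return x

# Toy example: f_i(x) = (x - c_i)^2 / 2, so the minimizer is mean(c) = 2.
grads = [lambda x, c=c: x - c for c in (1.0, 2.0, 3.0)]
x = sag(grads, n=3, dim=1)
```

Only one component gradient is evaluated per iteration, as in plain SGD, yet the averaged memory lets the step direction approach the full gradient, which is the mechanism behind the linear rate the abstract claims.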
Parallel stochastic gradient algorithms for large-scale matrix completion
 Mathematical Programming Computation
, 2013
"... This paper develops Jellyfish, an algorithm for solving dataprocessing problems with matrixvalued decision variables regularized to have low rank. Particular examples of problems solvable by Jellyfish include matrix completion problems and leastsquares problems regularized by the nuclear norm or ..."
Abstract

Cited by 71 (7 self)
 Add to MetaCart
This paper develops Jellyfish, an algorithm for solving data-processing problems with matrix-valued decision variables regularized to have low rank. Particular examples of problems solvable by Jellyfish include matrix completion problems and least-squares problems regularized by the nuclear norm or γ2-norm. Jellyfish implements a projected incremental gradient method with a biased, random ordering of the increments. This biased ordering allows for a parallel implementation that admits a speedup nearly proportional to the number of processors. On large-scale matrix completion tasks, Jellyfish is orders of magnitude more efficient than existing codes. For example, on the Netflix Prize data set, prior art computes rating predictions in approximately 4 hours, while Jellyfish solves the same problem in under 3 minutes on a 12-core workstation.
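The underlying per-entry update is easy to sketch on a low-rank factorization. This serial sketch is our own simplification: it shows only the incremental gradient step on one observed entry at a time, not Jellyfish's biased ordering or its parallel scheduling; the hyperparameters and function name are assumptions.

```python
import numpy as np

def sgd_matrix_completion(entries, shape, rank=2, alpha=0.05, reg=1e-3,
                          epochs=500, seed=0):
    """Incremental (stochastic) gradient on a low-rank factorization
    M ~ L @ R.T, visiting one observed entry (i, j, value) at a time.
    Jellyfish's contribution is a biased random ordering of these updates
    so that non-conflicting entries can run in parallel; this serial
    version shows only the underlying update."""
    rng = np.random.default_rng(seed)
    m, n = shape
    L = 0.1 * rng.standard_normal((m, rank))
    R = 0.1 * rng.standard_normal((n, rank))
    for _ in range(epochs):
        rng.shuffle(entries)           # fresh random ordering each epoch
        for i, j, v in entries:
            err = L[i] @ R[j] - v      # residual on the observed entry
            gi = err * R[j] + reg * L[i]
            gj = err * L[i] + reg * R[j]
            L[i] -= alpha * gi         # update only the two touched rows
            R[j] -= alpha * gj
    return L, R

# Toy example: a fully observed rank-1 matrix.
M = np.outer([0.5, 1.0, 1.5], [1.0, 2.0])
entries = [(i, j, M[i, j]) for i in range(3) for j in range(2)]
Lf, Rf = sgd_matrix_completion(entries, (3, 2))
```

Because each update touches only row i of L and row j of R, updates on entries that share no row or column commute, which is what makes the parallel scheduling possible.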
Bundle methods for machine learning
 JMLR
"... We present a globally convergent method for regularized risk minimization problems. Our method applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special ..."
Abstract

Cited by 50 (10 self)
 Add to MetaCart
(Show Context)
We present a globally convergent method for regularized risk minimization problems. Our method applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special case of our approach. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ε) steps to ε precision for general convex problems and in O(log(1/ε)) steps for continuously differentiable problems. We demonstrate in experiments the performance of our approach.
Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization
 NIPS'11, 25th Annual Conference on Neural Information Processing Systems
, 2011
"... We consider the problem of optimizing the sum of a smooth convex function and a nonsmooth convex function using proximalgradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the nonsmooth term. We show that b ..."
Abstract

Cited by 48 (6 self)
 Add to MetaCart
(Show Context)
We consider the problem of optimizing the sum of a smooth convex function and a nonsmooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the nonsmooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen fixed error level on a set of structured sparsity problems.
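For reference, here is the error-free basic proximal-gradient method (ISTA) on an l1-regularized least-squares problem; the paper studies what happens when the gradient or the prox below is computed inexactly. The problem instance and function names are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximity operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(A, b, lam, iters=500):
    """Basic proximal-gradient (ISTA) for min 0.5*||Ax - b||^2 + lam*||x||_1:
    a gradient step on the smooth term, then the prox of the nonsmooth term."""
    Lip = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / Lip, lam / Lip)
    return x

# Toy example with A = I: the solution is soft_threshold(b, lam).
A = np.eye(3)
b = np.array([3.0, 0.5, -2.0])
x = proximal_gradient(A, b, lam=1.0)
```

The paper's result says that if the gradient and prox errors in each iteration shrink fast enough, this scheme (and its accelerated variant) keeps the exact method's rate.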
Subgradient methods for saddle-point problems
 Journal of Optimization Theory and Applications
, 2009
"... We study subgradient methods for computing the saddle points of a convexconcave function. Our motivation is coming from networking applications where dual and primaldual subgradient methods have attracted much attention in designing decentralized network protocols. We first present a subgradient al ..."
Abstract

Cited by 39 (0 self)
 Add to MetaCart
(Show Context)
We study subgradient methods for computing the saddle points of a convex-concave function. Our motivation comes from networking applications, where dual and primal-dual subgradient methods have attracted much attention in designing decentralized network protocols. We first present a subgradient algorithm for generating approximate saddle points and provide per-iteration convergence rate estimates on the constructed solutions. We then focus on Lagrangian duality, where we consider a convex primal optimization problem and its Lagrangian dual problem, and generate approximate primal-dual optimal solutions as approximate saddle points of the Lagrangian function. We present a variation of our subgradient method under the Slater constraint qualification and provide stronger estimates on the convergence rate of the generated primal sequences. In particular, we provide bounds on the amount of feasibility violation and on the primal objective function values at the approximate solutions. Our algorithm is particularly well-suited for problems where the subgradient of the dual function cannot be evaluated easily (equivalently, where the minimum of the Lagrangian function at a dual solution cannot be computed efficiently), thus impeding the use of dual subgradient methods.
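The basic scheme, subgradient descent in x and ascent in y with averaging of the iterates, can be sketched on a simple convex-concave quadratic; the test function, step size, and function name are our own illustrative choices.

```python
import numpy as np

def saddle_subgradient(grad_x, grad_y, x0, y0, alpha=0.01, iters=5000):
    """Subgradient descent in x and ascent in y on a convex-concave L(x, y),
    returning the running averages of the iterates, which approximate a
    saddle point (the paper gives per-iteration rate estimates for them)."""
    x, y = np.array(x0, float), np.array(y0, float)
    xs, ys = np.zeros_like(x), np.zeros_like(y)
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - alpha * gx, y + alpha * gy  # descend in x, ascend in y
        xs += x
        ys += y
    return xs / iters, ys / iters

# Toy example: L(x, y) = x^2 + x*y - y^2, whose saddle point is (0, 0).
gx = lambda x, y: 2 * x + y
gy = lambda x, y: x - 2 * y
x_avg, y_avg = saddle_subgradient(gx, gy, [1.0], [1.0])
```

Averaging matters: for merely convex-concave functions (e.g. bilinear ones), the raw iterates can circle the saddle point while their averages still converge toward it.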
A quasi-Newton approach to nonsmooth convex optimization
 In ICML
, 2008
"... We extend the wellknown BFGS quasiNewton method and its limitedmemory variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: The local quadratic model, the identification of a descent direc ..."
Abstract

Cited by 37 (2 self)
 Add to MetaCart
(Show Context)
We extend the well-known BFGS quasi-Newton method and its limited-memory variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We apply the resulting subLBFGS algorithm to L2-regularized risk minimization with the binary hinge loss, and its direction-finding component to L1-regularized risk minimization with the logistic loss. In both settings our generic algorithms perform comparably to or better than their counterparts in specialized state-of-the-art solvers.