Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
 IEEE TRANSACTIONS ON AUTOMATIC CONTROL
, 2010
Abstract

Cited by 93 (12 self)
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent coordination, estimation in sensor networks, and large-scale machine learning. We develop and analyze distributed algorithms based on dual subgradient averaging, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis allows us to clearly separate the convergence of the optimization algorithm itself and the effects of communication dependent on the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network and confirm this prediction's sharpness both by theoretical lower bounds and simulations for various networks. Our approach includes the cases of deterministic optimization and communication as well as problems with stochastic optimization and/or communication.
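The dual subgradient averaging scheme this abstract describes can be sketched roughly as follows. The mixing matrix P, the stepsize, and the Euclidean prox mapping x = -alpha*z are our own illustrative choices, not the paper's exact algorithm or notation.

```python
import numpy as np

def distributed_dual_averaging(grad_fns, P, dim, steps, alpha=0.1):
    """Rough sketch of dual subgradient averaging: each node mixes its
    neighbors' dual variables through the doubly stochastic matrix P,
    adds its local subgradient, and maps back to the primal with the
    Euclidean prox x = -alpha * z (for psi(x) = ||x||^2 / 2)."""
    n = len(grad_fns)
    z = np.zeros((n, dim))        # dual variables, one row per node
    x = np.zeros((n, dim))        # primal iterates
    x_avg = np.zeros((n, dim))    # running averages (the returned output)
    for t in range(1, steps + 1):
        g = np.array([grad_fns[i](x[i]) for i in range(n)])
        z = P @ z + g             # consensus step plus local subgradient
        x = -alpha * z            # prox step back to the primal
        x_avg += (x - x_avg) / t  # running average of the iterates
    return x_avg
```

With a complete-graph mixing matrix and local quadratics, the node averages concentrate near the global minimizer, up to an O(alpha) bias from the constant stepsize.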
Distributed delayed stochastic optimization
, 2011
Abstract

Cited by 52 (5 self)
We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible. In application to distributed optimization, we show n-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as O(1/√(nT)), which is known to be optimal even in the absence of delays.
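A toy version of the master/worker loop with stale gradients might look like this; it is a sketch under our own naming, with a fixed `delay` standing in for asynchrony.

```python
from collections import deque

def delayed_sgd(grad, x0, steps, delay, eta=0.05):
    """Sketch of a gradient loop where each update uses a gradient
    evaluated `delay` iterations earlier, mimicking workers that
    report stale gradients back to a master node."""
    x = x0
    pending = deque()                  # gradients "in flight" from workers
    for _ in range(steps):
        pending.append(grad(x))        # a worker starts computing at x_t
        if len(pending) > delay:       # its result lands `delay` steps later
            x = x - eta * pending.popleft()
    return x
```

For a smooth objective and a small enough stepsize relative to the delay, the iterates still converge, consistent with the abstract's claim that delays are asymptotically negligible.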
Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey
, 2010
Abstract

Cited by 47 (2 self)
We survey incremental methods for minimizing a sum ∑_{i=1}^m f_i(x) consisting of a large number of convex component functions f_i. Our methods consist of iterations applied to single components, and have proved very effective in practice. We introduce a unified algorithmic framework for a variety of such methods, some involving gradient and subgradient iterations, which are known, and some involving combinations of subgradient and proximal methods, which are new and offer greater flexibility in exploiting the special structure of f_i. We provide an analysis of the convergence and rate of convergence properties of these methods, including the advantages offered by randomization in the selection of components. We also survey applications in inference/machine learning, signal processing, and large-scale and distributed optimization.
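The basic incremental iteration the survey studies, one component per step with a diminishing stepsize, can be sketched as follows; the names and the 1/k stepsize rule are our illustrative choices.

```python
import random

def incremental_subgradient(subgrads, x0, alpha0=1.0, passes=200, randomize=False):
    """Sketch of an incremental (sub)gradient method for min_x sum_i f_i(x):
    each inner step applies a single component's (sub)gradient with a
    diminishing stepsize alpha0 / k, cycling through the components
    (or visiting them in random order when randomize=True)."""
    x = float(x0)
    m = len(subgrads)
    k = 0
    for _ in range(passes):
        order = random.sample(range(m), m) if randomize else range(m)
        for i in order:
            k += 1
            x -= (alpha0 / k) * subgrads[i](x)
    return x
```

The `randomize` flag mirrors the randomized component selection whose advantages the abstract mentions; the cyclic order is used below for a deterministic check.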
Giannakis, “Distributed sparse linear regression”
 IEEE Trans. Signal Process
, 2010
Abstract

Cited by 44 (8 self)
The Lasso is a popular technique for joint estimation and continuous variable selection, especially well-suited for sparse and possibly underdetermined linear regression problems. This paper develops algorithms to estimate the regression coefficients via Lasso when the training data are distributed across different agents, and their communication to a central processing unit is prohibited due to, e.g., communication cost or privacy reasons. A motivating application is explored in the context of wireless communications, whereby sensing cognitive radios collaborate to estimate the radio-frequency power spectrum density. Attaining different tradeoffs between complexity and convergence speed, three novel algorithms are obtained after reformulating the Lasso into a separable form, which is iteratively minimized using the alternating-direction method of multipliers so as to gain the desired degree of parallelization. Interestingly, the per-agent estimate updates are given by simple soft-thresholding operations, and inter-agent communication overhead remains at an affordable level. Without exchanging elements from the different training sets, the local estimates consent to the global Lasso solution, i.e., the fit that would be obtained if the entire data set were centrally available. Numerical experiments with both simulated and real data demonstrate the merits of the proposed distributed schemes, corroborating their convergence and global optimality. The ideas in this paper can be easily extended for the purpose of fitting related models in a distributed fashion, including the adaptive Lasso, elastic net, fused Lasso and nonnegative garrote. Index Terms—Distributed linear regression, Lasso, parallel optimization, sparse estimation.
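The soft-thresholding update mentioned in the abstract is the prox operator of the ℓ1 norm. A centralized ADMM Lasso loop built around it, shown here only as a stand-in for the paper's distributed variant and using our own parameter names, looks like:

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the prox operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_admm(A, b, lam, rho=1.0, iters=200):
    """Centralized ADMM for min 0.5*||Ax - b||^2 + lam*||x||_1:
    the x-update is a cached ridge solve, the z-update is a
    soft-threshold, and u accumulates the scaled dual variable."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    ridge_inv = np.linalg.inv(AtA + rho * np.eye(n))  # factor once, reuse
    for _ in range(iters):
        x = ridge_inv @ (Atb + rho * (z - u))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return z
```

In the paper's setting the same soft-threshold appears in each agent's local update; here it is applied once per global iteration.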
Cooperative Convex Optimization in Networked Systems: Augmented Lagrangian Algorithms with Directed Gossip Communication
 IEEE Transactions on Signal Processing
Abstract

Cited by 26 (12 self)
We study distributed optimization in networked systems, where nodes cooperate to find the optimal quantity of common interest, x = x⋆. The objective function of the corresponding optimization problem is the sum of the nodes' private (known only to the respective node) convex objectives, and each node imposes a private convex constraint on the allowed values of x. We solve this problem for generic connected network topologies with asymmetric random link failures with a novel distributed, decentralized algorithm. We refer to this algorithm as AL–G (augmented Lagrangian gossiping), and to its variants as AL–MG (augmented Lagrangian multi-neighbor gossiping) and AL–BG (augmented Lagrangian broadcast gossiping). The AL–G algorithm is based on the augmented Lagrangian dual function. Dual variables are updated by the standard method of multipliers, at a slow time scale. To update the primal variables, we propose a novel, Gauss-Seidel type, randomized algorithm, at a fast time scale. AL–G uses unidirectional gossip communication, only between immediate neighbors in the network, and is resilient to random link failures. For networks with reliable communication (i.e., no failures), the simplified AL–BG algorithm reduces communication, computation and data storage cost. We prove convergence for all proposed algorithms and demonstrate their effectiveness by simulations.
Diffusion adaptation over networks
 in Academic Press Library in Signal Processing
, 2014
Communication-Efficient Algorithms for Statistical Optimization
Abstract

Cited by 19 (4 self)
We study two communication-efficient algorithms for distributed statistical optimization on large-scale data. The first algorithm is an averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as O(N⁻¹ + (N/m)⁻²). Whenever m ≤ √N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of the bootstrap. Requiring only a single round of communication, it has mean-squared error that decays as O(N⁻¹ + (N/m)⁻³), and so is more robust to the amount of parallelization. We complement our theoretical results with experiments on large-scale problems from the internet search domain. In particular, we show that our methods efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which consists of N ≈ 2.4×10⁸ samples and d ≥ 700,000 dimensions.
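The first algorithm's split/solve/average pattern is easy to sketch; the shard sizes, the least-squares local solver, and all names below are our assumptions, not the paper's code.

```python
import numpy as np

def average_mixture(X, y, m):
    """Sketch of the one-shot averaging scheme: split the N samples
    into m shards, solve least squares locally on each shard, and
    average the m local estimates (a single round of communication)."""
    shards = zip(np.array_split(X, m), np.array_split(y, m))
    local_fits = [np.linalg.lstsq(Xi, yi, rcond=None)[0] for Xi, yi in shards]
    return np.mean(local_fits, axis=0)
```

The averaging step is what drives the (N/m)⁻² term in the abstract's error bound: each local estimate carries shard-level error, and averaging cancels its leading-order bias-free component.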
Push-sum distributed dual averaging for convex optimization
 in IEEE CDC
, 2012
Abstract

Cited by 19 (7 self)
In this paper we extend and analyze the distributed dual averaging algorithm [1] to handle communication delays and general stochastic consensus protocols. Assuming each network link experiences some fixed bounded delay, we show that distributed dual averaging converges and the error decays at a rate O(1/√T), where T is the number of iterations. This bound is an improvement over [1] by a logarithmic factor in T for networks of fixed size. Finally, we extend the algorithm to the case of using general non-averaging consensus protocols. We prove that the bias introduced in the optimization can be removed by a simple correction that depends on the stationary distribution of the consensus matrix.
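The bias-removal idea for non-averaging consensus can be illustrated with a minimal push-sum sketch; the column-stochastic matrix P and the names are our assumptions.

```python
import numpy as np

def push_sum_consensus(values, P, steps=100):
    """Sketch of the push-sum ratio trick: run the same column-stochastic
    mixing matrix P on a value vector x and a weight vector w started at
    all-ones. Although P is not doubly stochastic, the ratio x/w at every
    node converges to the network-wide average of the initial values."""
    x = np.array(values, dtype=float)
    w = np.ones_like(x)
    for _ in range(steps):
        x = P @ x   # column-stochastic mixing preserves sum(x)
        w = P @ w   # ... and sum(w) = n
    return x / w
```

Dividing by w is exactly the kind of correction based on the mixing matrix's stationary behavior that the abstract refers to: x alone would converge to a skewed combination of the initial values, and w accumulates the same skew, so the ratio cancels it.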
On the O(1/k) Convergence of Asynchronous Distributed Alternating Direction Method of Multipliers
, 2013
Abstract

Cited by 18 (0 self)
We consider a network of agents that are cooperatively solving a global optimization problem, where the objective function is the sum of privately known local objective functions of the agents and the decision variables are coupled via linear constraints. Recent literature focused on special cases of this formulation and studied their distributed solution through either subgradient-based methods with O(1/√k) rate of convergence (where k is the iteration number) or Alternating Direction Method of Multipliers (ADMM) based methods, which require a synchronous implementation and a globally known order on the agents. In this paper, we present a novel asynchronous ADMM-based distributed method for the general formulation and show that it converges at the rate O(1/k).