Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
, 2010
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
 IEEE TRANSACTIONS ON AUTOMATIC CONTROL
, 2010
Cited by 97 (12 self)
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multiagent coordination, estimation in sensor networks, and large-scale machine learning. We develop and analyze distributed algorithms based on dual subgradient averaging, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis allows us to clearly separate the convergence of the optimization algorithm itself and the effects of communication dependent on the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network and confirm this prediction’s sharpness both by theoretical lower bounds and simulations for various networks. Our approach includes the cases of deterministic optimization and communication as well as problems with stochastic optimization and/or communication.
Distributed stochastic subgradient projection algorithms for convex optimization
 Journal of Optimization Theory and Applications
, 2010
Cited by 87 (1 self)
We consider a distributed multiagent network system where the goal is to minimize a sum of convex objective functions of the agents subject to a common convex constraint set. Each agent maintains an iterate sequence and communicates the iterates to its neighbors. Then, each agent combines weighted averages of the received iterates with its own iterate, and adjusts the iterate by using subgradient information (known with stochastic errors) of its own function and by projecting onto the constraint set. The goal of this paper is to explore the effects of stochastic subgradient errors on the convergence of the algorithm. We first consider the behavior of the algorithm in mean, and then the convergence with probability 1 and in mean square. We consider general stochastic errors that have uniformly bounded second moments and obtain bounds on the limiting performance of the algorithm in mean for diminishing and nondiminishing stepsizes. When the means of the errors diminish, we prove that there is mean consensus between the agents and mean convergence to the optimum function value for diminishing stepsizes. When the mean errors diminish sufficiently fast, we strengthen the results to consensus and convergence of the iterates to an optimal solution with probability 1 and in mean square.
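The update described in this abstract (average with neighbors, take a noisy subgradient step, project) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scalar decision variable, the local objectives f_i(x) = |x - c_i|, the interval constraint, the Gaussian noise model, the 1/k stepsize, and all function and parameter names are assumptions made for the example.

```python
import random


def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def distributed_subgradient(targets, weights, lo, hi, steps=2000, noise=0.1, seed=0):
    """Hypothetical sketch: agent i holds f_i(x) = |x - targets[i]|.

    weights is assumed to be a doubly stochastic mixing matrix over the agents.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for k in range(1, steps + 1):
        alpha = 1.0 / k  # diminishing stepsize (an assumption of this sketch)
        # Step 1: each agent forms a weighted average of the iterates it receives.
        mixed = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_x = []
        for i in range(n):
            # Step 2: subgradient of |v - c_i| at the averaged point, plus a stochastic error.
            d = mixed[i] - targets[i]
            g = (d > 0) - (d < 0)
            g += rng.gauss(0.0, noise)
            # Step 3: subgradient step followed by projection onto the constraint set.
            new_x.append(project(mixed[i] - alpha * g, lo, hi))
        x = new_x
    return x
```

With three agents holding c = (1, 2, 6) and a doubly stochastic weight matrix, the sum of the local objectives is minimized at the median, 2, so the agents' iterates should end up close to each other and close to 2.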
Optimal distributed online prediction using minibatches
, 2010
Cited by 73 (9 self)
Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed minibatch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely related distributed stochastic optimization problem, achieving an asymptotically linear speedup over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.
Distributed delayed stochastic optimization
, 2011
Cited by 55 (6 self)
We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible. In application to distributed optimization, we show n-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as O(1/√(nT)), which is known to be optimal even in the absence of delays.
On Distributed Convex Optimization Under Inequality and Equality Constraints
 UNIVERSITY OF CALIFORNIA, SAN DIEGO (UC SAN
, 2012
Cited by 52 (8 self)
We consider a general multiagent convex optimization problem where the agents are to collectively minimize a global objective function subject to a global inequality constraint, a global equality constraint, and a global constraint set. The objective function is defined by a sum of local objective functions, while the global constraint set is produced by the intersection of local constraint sets. In particular, we study two cases: one where the equality constraint is absent, and the other where the local constraint sets are identical. We devise two distributed primal-dual subgradient algorithms based on the characterization of the primal-dual optimal solutions as the saddle points of the Lagrangian and penalty functions. These algorithms can be implemented over networks with dynamically changing topologies that satisfy a standard connectivity property, and allow the agents to asymptotically agree on optimal solutions and optimal values of the optimization problem under Slater’s condition.
Incremental stochastic subgradient algorithms for convex optimization
 SIAM J. OPTIM
, 2008
Cited by 49 (7 self)
In this paper we study the effect of stochastic errors on two constrained incremental subgradient algorithms. We view the incremental subgradient algorithms as decentralized network optimization algorithms as applied to minimize a sum of functions, when each component function is known only to a particular agent of a distributed network. We first study the standard cyclic incremental subgradient algorithm in which the agents form a ring structure and pass the iterate in a cycle. We consider the method with stochastic errors in the subgradient evaluations and provide sufficient conditions on the moments of the stochastic errors that guarantee almost sure convergence when a diminishing stepsize is used. We also obtain almost sure bounds on the algorithm’s performance when a constant stepsize is used. We then consider the Markov randomized incremental subgradient method, which is a noncyclic version of the incremental algorithm where the sequence of computing agents is modeled as a time nonhomogeneous Markov chain. Such a model is appropriate for mobile networks, as the network topology changes across time in these networks. We establish the convergence results and error bounds for the Markov randomized method in the presence of stochastic errors for diminishing and constant stepsizes, respectively.
Giannakis, “Distributed sparse linear regression”
 IEEE Trans. Signal Process
, 2010
Cited by 46 (8 self)
The Lasso is a popular technique for joint estimation and continuous variable selection, especially well-suited for sparse and possibly underdetermined linear regression problems. This paper develops algorithms to estimate the regression coefficients via Lasso when the training data are distributed across different agents, and their communication to a central processing unit is prohibited for, e.g., communication cost or privacy reasons. A motivating application is explored in the context of wireless communications, whereby sensing cognitive radios collaborate to estimate the radio-frequency power spectrum density. Attaining different tradeoffs between complexity and convergence speed, three novel algorithms are obtained after reformulating the Lasso into a separable form, which is iteratively minimized using the alternating-direction method of multipliers so as to gain the desired degree of parallelization. Interestingly, the per-agent estimate updates are given by simple soft-thresholding operations, and inter-agent communication overhead remains at an affordable level. Without exchanging elements from the different training sets, the local estimates consent to the global Lasso solution, i.e., the fit that would be obtained if the entire data set were centrally available. Numerical experiments with both simulated and real data demonstrate the merits of the proposed distributed schemes, corroborating their convergence and global optimality. The ideas in this paper can be easily extended for the purpose of fitting related models in a distributed fashion, including the adaptive Lasso, elastic net, fused Lasso and nonnegative garrote. Index Terms: Distributed linear regression, Lasso, parallel optimization, sparse estimation.
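For readers unfamiliar with the soft-thresholding operation mentioned in this abstract, a minimal sketch of the standard operator follows. It shows only the generic scalar operator, which is the minimizer of 0.5 * (z - x)^2 + t * |z| over z; it is not the paper's per-agent update, and the function names are chosen for this example.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] are set to zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def soft_threshold_vector(v, t):
    """Apply the scalar operator entrywise to a list of numbers."""
    return [soft_threshold(x, t) for x in v]
```

Entries whose magnitude does not exceed the threshold are set exactly to zero, which is how Lasso-type updates produce sparse estimates.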
Spread of (mis)information in social networks
, 2009
Cited by 43 (7 self)
We provide a model to investigate the tension between information aggregation and spread of misinformation in large societies (conceptualized as networks of agents communicating with each other). Each individual holds a belief represented by a scalar. Individuals meet pairwise and exchange information, which is modeled as both individuals adopting the average of their pre-meeting beliefs. When all individuals engage in this type of information exchange, the society will be able to effectively aggregate the initial information held by all individuals. There is also the possibility of misinformation, however, because some of the individuals are “forceful,” meaning that they influence the beliefs of (some of) the other individuals they meet, but do not change their own opinion. The paper characterizes how the presence of forceful agents interferes with information aggregation. Under the assumption that even forceful agents obtain some information (however infrequent) from some others (and additional weak regularity conditions), we first show that beliefs in this class of societies converge to a consensus among all individuals. This consensus value is a random variable, however, and we characterize its behavior. Our main results quantify the extent of misinformation in the society by either providing bounds or exact results (in some special cases) on how far the consensus value can be from the benchmark without forceful agents (where there is efficient information aggregation). The worst outcomes obtain when there are several forceful agents and forceful agents themselves update their beliefs only on the basis of information they obtain from individuals most likely to have received their own information previously.
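The pairwise averaging model in this abstract can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: meetings are drawn uniformly at random among all pairs, and a forceful agent here never updates its belief at all, which is the extreme case (the paper assumes forceful agents still occasionally obtain information). The function name and parameters are hypothetical.

```python
import random


def simulate_beliefs(beliefs, forceful=frozenset(), meetings=5000, seed=0):
    """Hypothetical sketch of pairwise belief averaging with forceful agents."""
    rng = random.Random(seed)
    x = list(beliefs)
    n = len(x)
    for _ in range(meetings):
        # Two distinct individuals meet (uniform pair selection is an assumption).
        i, j = rng.sample(range(n), 2)
        avg = 0.5 * (x[i] + x[j])
        # Regular agents adopt the average of the pre-meeting beliefs;
        # forceful agents keep their own belief in this simplified model.
        if i not in forceful:
            x[i] = avg
        if j not in forceful:
            x[j] = avg
    return x
```

Without forceful agents, each meeting preserves the sum of beliefs, so the society converges to the average of the initial beliefs. With one agent that never updates, everyone else is eventually pulled to that agent's belief, which illustrates how forceful agents can move the consensus away from the efficient benchmark.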
Opinion Dynamics and Learning in Social Networks
, 2010
Cited by 40 (0 self)
We provide an overview of recent research on belief and opinion dynamics in social networks. We discuss both Bayesian and non-Bayesian models of social learning and focus on the implications of the form of learning (e.g., Bayesian vs. non-Bayesian), the sources of information (e.g., observation vs. communication), and the structure of social networks in which individuals are situated on three key questions: (1) whether social learning will lead to consensus, i.e., to agreement among individuals starting with different views; (2) whether social learning will effectively aggregate dispersed information and thus weed out incorrect beliefs; (3) whether media sources, prominent agents, politicians and the state will be able to manipulate beliefs and spread misinformation in a society.