Results 1 - 10 of 36
Constrained consensus and optimization in multi-agent networks
- IEEE Transactions on Automatic Control, 2008
"... We present distributed algorithms that can be used by multiple agents to align their estimates with a particular value over a network with time-varying connectivity. Our framework is general in that this value can represent a consensus value among multiple agents or an optimal solution of an optimiz ..."
Cited by 115 (8 self)
We present distributed algorithms that can be used by multiple agents to align their estimates with a particular value over a network with time-varying connectivity. Our framework is general in that this value can represent a consensus value among multiple agents or an optimal solution of an optimization problem, where the global objective function is a combination of local agent objective functions. Our main focus is on constrained problems where the estimate of each agent is restricted to lie in a different constraint set. To highlight the effects of constraints, we first consider a constrained consensus problem and present a distributed “projected consensus algorithm” in which agents combine their local averaging operation with projection on their individual constraint sets. This algorithm can be viewed as a version of an alternating projection method with weights that are varying over time and across agents. We establish convergence and convergence rate results for the projected consensus algorithm. We next study a constrained optimization problem for optimizing the …
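For illustration, here is a minimal sketch of a projected-consensus update of the kind described above: each agent averages its neighbors' estimates and then projects the result onto its own constraint set. The interval constraint sets, the fixed doubly stochastic weight matrix, and the initial estimates below are invented for the example; the paper treats time-varying weights and general closed convex sets.

```python
import numpy as np

# Sketch of a projected-consensus iteration (illustrative values, not the paper's setup):
# an averaging step followed by projection onto each agent's own constraint set.
W = np.array([[0.50, 0.25, 0.25],   # fixed doubly stochastic weights; the paper
              [0.25, 0.50, 0.25],   # allows these to vary over time
              [0.25, 0.25, 0.50]])
intervals = [(0.0, 2.0), (1.0, 3.0), (1.5, 4.0)]   # agent i's constraint set X_i
x = np.array([0.0, 3.0, 4.0])                      # initial estimates

for _ in range(200):
    x = W @ x                                      # local averaging with neighbors
    x = np.array([np.clip(xi, lo, hi)              # projection: clipping for intervals
                  for xi, (lo, hi) in zip(x, intervals)])

print(x)   # all estimates end up close to a common point in the intersection [1.5, 2]
```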
Revisiting Log-Linear Learning: Asynchrony, Completeness and Payoff-Based Implementation
2008
"... Log-linear learning is a learning algorithm with equilibrium selection properties. Log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of log-linear learning has centered around ..."
Cited by 42 (11 self)
Log-linear learning is a learning algorithm with equilibrium selection properties. Log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of log-linear learning has centered around explicitly computing the stationary distribution. This analysis relied on a highly structured setting: i) players’ utility functions constitute a potential game, ii) players update their strategies one at a time, which we refer to as asynchrony, iii) at any stage, a player can select any action in the action set, which we refer to as completeness, and iv) each player is endowed with the ability to assess the utility he would have received for any alternative action provided that the actions of all other players remain fixed. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we seek to address to what degree one can relax the structural assumptions while maintaining that only potential function maximizers are the stochastically stable action profiles. In this paper, we introduce slight variants of log-linear learning to include both synchronous updates and incomplete action sets. In both settings, we prove that only potential function maximizers are stochastically stable. Furthermore, we introduce a payoff-based version of log-linear learning, in which players are only aware of the utility they received and the action that they played. Note that log-linear learning in its original form is not a payoff-based learning algorithm. In payoff-based log-linear learning, we also prove that only potential maximizers are stochastically stable. The key enabler for these results is to change the focus of the analysis away from deriving the explicit form of the stationary distribution of the learning process towards characterizing the stochastically stable states. The resulting analysis uses the theory of resistance trees for regular perturbed Markov decision processes, thereby allowing a relaxation of the aforementioned structural assumptions.
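As a rough illustration of the baseline (asynchronous, complete) form of log-linear learning discussed above, the sketch below runs it on a toy two-player coordination game; the game, the temperature value, and the horizon are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy identical-interest (potential) game: both players get 2 if they match on
# action 1, 1 if they match on action 0, and 0 if they mismatch.
def utility(action, other_action):
    if action == other_action:
        return 2.0 if action == 1 else 1.0
    return 0.0

profile = [0, 0]        # current joint action
temperature = 0.1       # noise parameter; smaller values favor potential maximizers

for _ in range(10_000):
    i = rng.integers(2)                               # asynchrony: one updater per step
    utils = np.array([utility(a, profile[1 - i]) for a in (0, 1)])
    probs = np.exp(utils / temperature)
    probs /= probs.sum()                              # Boltzmann choice over the full action set
    profile[i] = int(rng.choice(2, p=probs))

print(profile)   # the process spends most of its time at [1, 1], the potential maximizer
```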
Payoff-based dynamics for multi-player weakly acyclic games
- SIAM Journal on Control and Optimization, 2009
"... We consider repeated multiplayer games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multiagent coo ..."
Cited by 33 (12 self)
We consider repeated multiplayer games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multiagent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff-based” processes in which, at any stage, players know only their own actions and (noise corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff-based processes for increasingly general scenarios and prove that, after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
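The abstract does not reproduce the three processes themselves. As a generic illustration of a payoff-based rule in this spirit, the sketch below has each player track only a baseline action and the best payoff it has realized, replay the baseline most of the time, occasionally experiment, and adopt an experiment only when its own realized payoff improves. The game and parameter values are invented, and this is a simplified stand-in rather than the processes analyzed in the paper.

```python
import random

random.seed(0)

# Generic payoff-based adjustment sketch (a simplified stand-in, not the paper's
# three processes): players observe only their own played action and realized payoff.
PAYOFF = [[1.0, 0.0],        # identical-interest 2x2 game: both players receive
          [0.0, 2.0]]        # PAYOFF[a0][a1]

baseline_action = [0, 0]                   # action each player falls back on
baseline_payoff = [PAYOFF[0][0]] * 2       # best payoff realized so far
epsilon = 0.1                              # experimentation probability

for _ in range(5000):
    played = [a if random.random() > epsilon else random.randrange(2)
              for a in baseline_action]
    realized = PAYOFF[played[0]][played[1]]
    for i in range(2):
        if realized > baseline_payoff[i]:  # own payoff improved: adopt the new action
            baseline_action[i] = played[i]
            baseline_payoff[i] = realized

print(baseline_action)   # settles on [1, 1], the efficient pure Nash equilibrium
```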
Payoff Based Dynamics for Multi-Player Weakly Acyclic Games
- SIAM Journal on Control and Optimization, Special Issue on Control and Optimization in Cooperative Networks, 2007
"... We consider repeated multi-player games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multi-agent coo ..."
Cited by 28 (15 self)
We consider repeated multi-player games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multi-agent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff based” processes in which, at any stage, players only know their own actions and (noise corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff based processes for increasingly general scenarios and prove that after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
Regret based dynamics: Convergence in weakly acyclic games
- In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007
"... Regret based algorithms have been proposed to control a wide variety of multi-agent systems. The appeal of regretbased algorithms is that (1) these algorithms are easily implementable in large scale multi-agent systems and (2) there are existing results proving that the behavior will asymptotically ..."
Cited by 28 (11 self)
Regret-based algorithms have been proposed to control a wide variety of multi-agent systems. The appeal of regret-based algorithms is that (1) these algorithms are easily implementable in large-scale multi-agent systems and (2) there are existing results proving that the behavior will asymptotically converge to a set of points of “no regret” in any game. We illustrate, through a simple example, that no-regret points need not reflect desirable operating conditions for a multi-agent system. Multi-agent systems often exhibit an additional structure (i.e., being “weakly acyclic”) that has not been exploited in the context of regret-based algorithms. In this paper, we introduce a modification of regret-based algorithms by (1) exponentially discounting the memory and (2) bringing in a notion of inertia in players’ decision process. We show how these modifications can lead to an entire class of regret-based algorithms that provide almost sure convergence to a pure Nash equilibrium in any weakly acyclic game.
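As an illustration of the two modifications highlighted above, the sketch below runs a regret-matching-style update with exponentially discounted regrets and inertia on a toy coordination game; the game, the discount factor, and the inertia level are invented, and the sketch is a simplified illustration rather than the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Regret-based sketch with exponentially discounted memory and inertia
# (illustrative parameters; not the paper's exact algorithm).
def u(my_action, other_action):             # coordination game: 1 if matched, else 0
    return 1.0 if my_action == other_action else 0.0

n_actions = 2
regret = [np.zeros(n_actions), np.zeros(n_actions)]   # discounted regret per action
profile = [0, 1]                                       # start mismatched
rho, inertia = 0.1, 0.3                                # discount rate and inertia level

for _ in range(2000):
    next_profile = list(profile)
    for i in range(2):
        other = profile[1 - i]
        realized = u(profile[i], other)
        hypothetical = np.array([u(a, other) for a in range(n_actions)])
        regret[i] = (1 - rho) * regret[i] + rho * (hypothetical - realized)
        positive = np.maximum(regret[i], 0.0)
        if rng.random() < inertia or positive.sum() == 0.0:
            continue                                   # inertia: repeat the last action
        next_profile[i] = int(rng.choice(n_actions, p=positive / positive.sum()))
    profile = next_profile

print(profile)   # ends up at a matched profile, a pure Nash equilibrium of the game
```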
DCOPs meet the real world: Exploring unknown reward matrices with applications to mobile sensor networks
2009
"... Abstract Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms ..."
Cited by 26 (6 self)
Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.
Achieving Pareto Optimality Through Distributed Learning
2012
"... We propose a simple payoff-based learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any n-person finite strategic-form game with generic payoffs. The algorithm follows the theme of exploration versus exploitation and is hence stochastic in natu ..."
Cited by 21 (5 self)
We propose a simple payoff-based learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any n-person finite strategic-form game with generic payoffs. The algorithm follows the theme of exploration versus exploitation and is hence stochastic in nature. We prove that if all agents adhere to this algorithm, then the agents will select the action profile that maximizes the sum of the agents’ payoffs a high percentage of time. The algorithm requires no communication. Agents respond solely to changes in their own realized payoffs, which are affected by the actions of other agents in the system in ways that they do not necessarily understand. The method can be applied to the optimization of complex systems with many distributed components, such as the routing of information in networks and the design and control of wind farms. The proof of the proposed learning algorithm relies on the theory of large deviations for perturbed Markov chains.
Distributed Welfare Games
"... We consider a variation of the resource allocation problem. In the traditional problem, there is a global planner who would like to assign a set of players to a set of resources so as to maximize welfare. We consider the situation where the global planner does not have the authority to assign player ..."
Cited by 20 (7 self)
We consider a variation of the resource allocation problem. In the traditional problem, there is a global planner who would like to assign a set of players to a set of resources so as to maximize welfare. We consider the situation where the global planner does not have the authority to assign players to resources; rather, players are self-interested. The question that emerges is how the global planner can entice the players to settle on an allocation that is desirable with respect to the global welfare. To study this question, we focus on a class of games that we refer to as distributed welfare games. Within this context, we investigate how the global planner should distribute the welfare to the players. We measure the efficacy of a distribution rule in two ways: (i) Does a pure Nash equilibrium exist? (ii) How does the welfare associated with a pure Nash equilibrium compare to the global welfare associated with the optimal allocation? In this paper we explore the applicability of cost sharing methodologies for distributing welfare in such resource allocation problems. We demonstrate that obtaining desirable distribution rules, such as distribution rules that are budget balanced and guarantee the existence of a pure Nash equilibrium, often comes at a significant informational and computational cost. In light of this, we derive a systematic procedure for designing desirable distribution rules with a minimal informational and computational cost for a special class of distributed welfare games. Furthermore, we derive a bound on the price of anarchy for distributed welfare games in a variety of settings. Lastly, we highlight the implications of these results using the problem of sensor coverage.
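To make the notion of a distribution rule concrete, the sketch below applies one common candidate, a marginal-contribution rule, to a toy sensor-coverage allocation and searches exhaustively for a pure Nash equilibrium. The resource values and player count are invented, and the marginal-contribution rule is only one example of a distribution rule, not the procedure derived in the paper.

```python
import itertools

# Toy sensor-coverage instance of a distributed welfare game with a
# marginal-contribution distribution rule (illustrative values only).
VALUES = {"r1": 3.0, "r2": 2.0, "r3": 1.0}   # value of covering each resource
N_PLAYERS = 3                                # each player selects one resource

def welfare(allocation):                     # global welfare: value of covered resources
    return sum(VALUES[r] for r in set(allocation))

def utility(i, allocation):                  # player i's share: its marginal contribution
    return welfare(allocation) - welfare(allocation[:i] + allocation[i + 1:])

# Exhaustively look for a pure Nash equilibrium under this distribution rule.
for alloc in itertools.product(VALUES, repeat=N_PLAYERS):
    if all(utility(i, alloc) >= utility(i, alloc[:i] + (r,) + alloc[i + 1:])
           for i in range(N_PLAYERS) for r in VALUES):
        print(alloc, welfare(alloc))         # ('r1', 'r2', 'r3') 6.0: full coverage
        break
```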
A model-free approach to wind farm control using game theoretic methods
- IEEE Transactions on Control Systems Technology, 2013
"... Abstract — This brief explores the applicability of recent results in game theory and cooperative control to the problem of optimizing energy production in wind farms. One such result is a model-free control strategy that is completely decentralized and leads to efficient system behavior in virtuall ..."
Cited by 17 (1 self)
This brief explores the applicability of recent results in game theory and cooperative control to the problem of optimizing energy production in wind farms. One such result is a model-free control strategy that is completely decentralized and leads to efficient system behavior in virtually any distributed system. We demonstrate that this learning rule can provably maximize energy production in wind farms without explicitly modeling the aerodynamic interaction amongst the turbines. Index Terms — Cooperative systems, networked control systems, wind farms.
Distributed coverage games for mobile visual sensors (I): Reaching the set of Nash equilibria
- In Proc. of the 48th IEEE Conf. on Decision and Control and 28th Chinese Control Conference, 2009
"... the set of global optima ..."