Revisiting Log-Linear Learning: Asynchrony, Completeness and Payoff-Based Implementation
, 2008
Cited by 42 (14 self)
Log-linear learning is a learning algorithm with equilibrium selection properties. Log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of log-linear learning has centered around explicitly computing the stationary distribution. This analysis relied on a highly structured setting: i) players’ utility functions constitute a potential game, ii) players update their strategies one at a time, which we refer to as asynchrony, iii) at any stage, a player can select any action in the action set, which we refer to as completeness, and iv) each player is endowed with the ability to assess the utility he would have received for any alternative action provided that the actions of all other players remain fixed. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we seek to address to what degree one can relax the structural assumptions while maintaining that only potential function maximizers are the stochastically stable action profiles. In this paper, we introduce slight variants of log-linear learning to include both synchronous updates and incomplete action sets. In both settings, we prove that only potential function maximizers are stochastically stable. Furthermore, we introduce a payoff-based version of log-linear learning, in which players are only aware of the utility they received and the action that they played. Note that log-linear learning in its original form is not a payoff-based learning algorithm. In payoff-based log-linear learning, we also prove that only potential maximizers are stochastically stable. The key enabler for these results is to change the focus of the analysis away from deriving the explicit form of the stationary distribution of the learning process towards characterizing the stochastically stable states. The resulting analysis uses the theory of resistance trees for regular perturbed Markov decision processes, thereby allowing a relaxation of the aforementioned structural assumptions.
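For readers unfamiliar with the structured setting (i)-(iv) described in this abstract, the following is a minimal sketch of classical asynchronous log-linear learning, not the variants introduced in the paper. The two-player game, the potential values in `PHI`, the temperature parameter `tau`, and all function names are illustrative assumptions and do not come from the abstract.

```python
import math
import random

# Hypothetical 2-player identical-interest game: both players receive PHI[profile],
# so PHI also serves as a potential function. Its unique maximizer is (1, 1).
PHI = {(0, 0): 1.0, (1, 1): 2.0, (0, 1): 0.0, (1, 0): 0.0}
ACTIONS = [[0, 1], [0, 1]]
UTILITIES = [lambda p: PHI[p], lambda p: PHI[p]]


def log_linear_step(profile, tau, rng):
    """One asynchronous update: a single randomly chosen player revises."""
    i = rng.randrange(len(profile))
    # Evaluate every alternative action with the other players' actions held fixed.
    candidates = [profile[:i] + (a,) + profile[i + 1:] for a in ACTIONS[i]]
    scores = [UTILITIES[i](c) / tau for c in candidates]
    top = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    # Choose the next joint action with probability proportional to exp(utility / tau).
    return rng.choices(candidates, weights=weights, k=1)[0]


def fraction_at_maximizer(steps, tau, seed=0):
    """Fraction of steps the joint action spends at the potential maximizer."""
    rng = random.Random(seed)
    profile = (0, 0)  # start at the inferior equilibrium
    hits = 0
    for _ in range(steps):
        profile = log_linear_step(profile, tau, rng)
        hits += profile == (1, 1)
    return hits / steps


if __name__ == "__main__":
    print(fraction_at_maximizer(50_000, tau=0.5))
```

In this toy game the stationary distribution is proportional to exp(PHI / tau), so at `tau=0.5` the process should spend roughly 85% of its time at (1, 1). Lowering `tau` concentrates more mass on the potential maximizer but makes escapes from the inferior equilibrium rarer, so longer runs are needed.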
Cooperative control and potential games
 IEEE Trans. Syst., Man, Cybern. B
, 2009
Cited by 32 (7 self)
We present a view of cooperative control using the language of learning in games. We review the game-theoretic concepts of potential and weakly acyclic games, and demonstrate how several cooperative control problems, such as consensus and dynamic sensor coverage, can be formulated in these settings. Motivated by this connection, we build upon game-theoretic concepts to better accommodate a broader class of cooperative control problems. In particular, we extend existing learning algorithms to accommodate restricted action sets caused by the limitations of agent capabilities and group-based decision making. Furthermore, we also introduce a new class of games called sometimes weakly acyclic games for time-varying objective functions and action sets, and provide distributed algorithms for convergence to an equilibrium. Index Terms: Cooperative control, game theory, learning in games, multiagent systems.
Regret based dynamics: Convergence in weakly acyclic games
In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
, 2007
Cited by 28 (12 self)
Regret-based algorithms have been proposed to control a wide variety of multiagent systems. The appeal of regret-based algorithms is that (1) these algorithms are easily implementable in large-scale multiagent systems and (2) there are existing results proving that the behavior will asymptotically converge to a set of points of “no-regret” in any game. We illustrate, through a simple example, that no-regret points need not reflect desirable operating conditions for a multiagent system. Multiagent systems often exhibit an additional structure (i.e., being “weakly acyclic”) that has not been exploited in the context of regret-based algorithms. In this paper, we introduce a modification of regret-based algorithms by (1) exponentially discounting the memory and (2) bringing in a notion of inertia in players’ decision process. We show how these modifications can lead to an entire class of regret-based algorithms that provide almost sure convergence to a pure Nash equilibrium in any weakly acyclic game.
Achieving Pareto Optimality Through Distributed Learning
, 2012
Cited by 19 (5 self)
We propose a simple payoff-based learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any n-person finite strategic-form game with generic payoffs. The algorithm follows the theme of exploration versus exploitation and is hence stochastic in nature. We prove that if all agents adhere to this algorithm, then the agents will select the action profile that maximizes the sum of the agents’ payoffs a high percentage of the time. The algorithm requires no communication. Agents respond solely to changes in their own realized payoffs, which are affected by the actions of other agents in the system in ways that they do not necessarily understand. The method can be applied to the optimization of complex systems with many distributed components, such as the routing of information in networks and the design and control of wind farms. The proof of the proposed learning algorithm relies on the theory of large deviations for perturbed Markov chains.
Distributed Welfare Games
Cited by 19 (6 self)
We consider a variation of the resource allocation problem. In the traditional problem, there is a global planner who would like to assign a set of players to a set of resources so as to maximize welfare. We consider the situation where the global planner does not have the authority to assign players to resources; rather, players are self-interested. The question that emerges is how can the global planner entice the players to settle on a desirable allocation with respect to the global welfare? To study this question, we focus on a class of games that we refer to as distributed welfare games. Within this context, we investigate how the global planner should distribute the welfare to the players. We measure the efficacy of a distribution rule in two ways: (i) Does a pure Nash equilibrium exist? (ii) How does the welfare associated with a pure Nash equilibrium compare to the global welfare associated with the optimal allocation? In this paper we explore the applicability of cost sharing methodologies for distributing welfare in such resource allocation problems. We demonstrate that obtaining desirable distribution rules, such as distribution rules that are budget balanced and guarantee the existence of a pure Nash equilibrium, often comes at a significant informational and computational cost. In light of this, we derive a systematic procedure for designing desirable distribution rules with a minimal informational and computational cost for a special class of distributed welfare games. Furthermore, we derive a bound on the price of anarchy for distributed welfare games in a variety of settings. Lastly, we highlight the implications of these results using the problem of sensor coverage.
A model-free approach to wind farm control using game theoretic methods
 IEEE Transactions on Control Systems Technology
, 2013
Cited by 16 (1 self)
This brief explores the applicability of recent results in game theory and cooperative control to the problem of optimizing energy production in wind farms. One such result is a model-free control strategy that is completely decentralized and leads to efficient system behavior in virtually any distributed system. We demonstrate that this learning rule can provably maximize energy production in wind farms without explicitly modeling the aerodynamic interaction amongst the turbines. Index Terms: Cooperative systems, networked control systems, wind farms.
Distributed coverage games for mobile visual sensors (I): Reaching the set of Nash equilibria
 In Proc. of the 48th IEEE Conf. on Decision and Control and 28th Chinese Control Conference
, 2009
"... the set of global optima ..."
Designing Games for Distributed Optimization
Cited by 12 (2 self)
The central goal in multiagent systems is to design local control laws for the individual agents to ensure that the emergent global behavior is desirable with respect to a given system level objective. Ideally, a system designer seeks to satisfy this goal while conditioning each agent’s control law on the least amount of information possible. Unfortunately, there are no existing methodologies for addressing this design challenge. The goal of this paper is to address this challenge using the field of game theory. Utilizing game theory for the design and control of multiagent systems requires two steps: (i) defining a local objective function for each decision maker and (ii) specifying a distributed learning algorithm to reach a desirable operating point. One of the core advantages of this game theoretic approach is that this two-step process can be decoupled by utilizing specific classes of games. For example, if the designed objective functions result in a potential game, then the system designer can utilize distributed learning algorithms for potential games to complete step (ii) of the design process. Unfortunately, designing agent objective functions to meet objectives such as locality of information and efficiency of resulting equilibria within the framework of potential games is fundamentally challenging and in many cases impossible. In this paper we develop a systematic methodology for meeting these objectives using a broader framework of games termed state based potential games. State based potential games are an extension of potential games in which an additional state variable is introduced into the game environment, hence permitting more flexibility in our design space. Furthermore, state based potential games possess an underlying structure that can be exploited by distributed learning algorithms in a similar fashion to potential games, hence providing a new baseline for our decomposition.
Decoupling Coupled Constraints Through Utility Design
Cited by 10 (3 self)
The central goal in multiagent systems is to engineer a decision making architecture where agents make independent decisions in response to local information while ensuring that the emergent global behavior is desirable with respect to a given system level objective. In many systems this control design is further complicated by coupled constraints on the agents’ behavior. This paper seeks to address the design of such algorithms using the field of game theory. In particular, we derive a systematic methodology for designing local agent utility functions such that (i) all resulting pure Nash equilibria of the designed game optimize the given system level objective and satisfy the given coupled constraint and (ii) the resulting game possesses an inherent structure that can be exploited in distributed learning, e.g., potential games. Such developments would greatly simplify the control design by eliminating the need to explicitly consider the constraint. One key to this realization is introducing an estimate of the coupled constraint and incorporating exterior penalty functions and barrier functions into the design of the agents’ utility functions.