Learning Efficient Nash Equilibria in Distributed Systems
, 2010
Abstract. An individual’s learning rule is completely uncoupled if it does not depend on the actions or payoffs of anyone else. We propose a variant of log-linear learning that is completely uncoupled and that selects an efficient pure Nash equilibrium in all generic n-person games that possess at least one pure Nash equilibrium. In games that do not have such an equilibrium, there is a simple formula that expresses the long-run probability of the various disequilibrium states in terms of two factors: i) the sum of payoffs over all agents, and ii) the maximum payoff gain that results from a unilateral deviation by some agent. This welfare/stability tradeoff criterion provides a novel framework for analyzing the selection of disequilibrium as well as equilibrium states in n-person games. JEL: C72, C73. 1. Learning equilibrium in complex interactive systems. Game theory has traditionally focussed on situations that involve a small number of players. In these environments it makes sense to assume that players know the structure of the game and can predict the strategic behavior of their opponents. But there are many situations involving huge numbers of players where these assumptions are not particularly persuasive.
Aspiration learning in coordination games
 in IEEE Conference on Decision and Control
, 2010
Abstract. We consider the problem of distributed convergence to efficient outcomes in coordination games through payoff-based learning dynamics, namely aspiration learning. The proposed learning scheme assumes that players reinforce actions that perform well by playing them again; otherwise they randomize among alternative actions. Our first contribution is the characterization of the asymptotic behavior of the induced Markov chain of the iterated process by an equivalent finite-state Markov chain, which simplifies previously introduced analysis of aspiration learning. We then characterize explicitly the behavior of the proposed aspiration learning in a generalized version of so-called coordination games, an example of which is network formation games. In particular, we show that in coordination games the expected percentage of time that the efficient action profile is played can become arbitrarily large.
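The reinforce-or-randomize rule described in this abstract can be illustrated with a minimal sketch. It is not the paper's exact scheme: the 2x2 payoff table, the smoothing constant `step`, the initial aspiration of 1.0, and the choice to redraw uniformly over all actions when dissatisfied are all assumptions made for illustration.

```python
import random

# Hypothetical 2x2 coordination game; the payoff values are illustrative assumptions.
PAYOFF = {("A", "A"): 1.0, ("B", "B"): 0.5, ("A", "B"): 0.0, ("B", "A"): 0.0}
ACTIONS = ["A", "B"]


def aspiration_step(action, payoff, aspiration, rng, step=0.1):
    """One update for one player, using only its own action and payoff."""
    if payoff >= aspiration:
        # Satisfied: reinforce the action by playing it again.
        next_action = action
    else:
        # Dissatisfied: redraw uniformly over all actions (an assumed simplification).
        next_action = rng.choice(ACTIONS)
    # The aspiration level tracks realized payoffs by exponential smoothing (assumed form).
    next_aspiration = aspiration + step * (payoff - aspiration)
    return next_action, next_aspiration


def simulate(rounds=2000, seed=0):
    """Return the fraction of rounds in which the efficient profile (A, A) is played."""
    rng = random.Random(seed)
    actions = [rng.choice(ACTIONS), rng.choice(ACTIONS)]
    aspirations = [1.0, 1.0]  # assumed optimistic initial aspirations
    efficient_rounds = 0
    for _ in range(rounds):
        profile = (actions[0], actions[1])
        payoff = PAYOFF[profile]  # both players receive the same payoff in this game
        if profile == ("A", "A"):
            efficient_rounds += 1
        for i in range(2):
            actions[i], aspirations[i] = aspiration_step(
                actions[i], payoff, aspirations[i], rng
            )
    return efficient_rounds / rounds


if __name__ == "__main__":
    print(simulate())
```

Each player's update uses only its own action and realized payoff, which is what makes the rule payoff-based. The fraction returned by `simulate` depends on the seed and on the assumed parameters, so it should not be read as a result from the paper.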
Achieving pareto optimal equilibria in energy efficient clustered ad hoc networks
 in Military Communication Conference, Milcom
, 2012
Abstract. In this paper, a decentralized iterative algorithm, namely the optimal dynamic learning (ODL) algorithm, is analysed. The algorithm is shown to achieve a Pareto optimal working point while exploiting only a minimal amount of information. Its performance is analysed in a clustered ad hoc network, where radio devices are assumed to operate above a minimal signal-to-interference-plus-noise ratio (SINR) threshold while minimizing the global power consumption. Sufficient analytical conditions for ODL to converge to the desired working point are provided; moreover, numerical simulations show the ability of the algorithm to configure an interference-limited network. The performance of ODL and that of a Nash-equilibrium-reaching algorithm are compared numerically and studied as a function of available resources. The gain of ODL is shown to be larger when available radio resources are scarce.
Game Theory and Distributed Control
, 2012
Game theory has been employed traditionally as a modeling tool for describing and influencing behavior in societal systems. Recently, game theory has emerged as a valuable tool for controlling or prescribing behavior in distributed engineered systems. The rationale for this new perspective stems from the parallels between the underlying decision making architectures in both societal systems and distributed engineered systems. In particular, both settings involve an interconnection of decision making elements whose collective behavior depends on a compilation of local decisions that are based on partial information about each other and the state of the world. Accordingly, there is extensive work in game theory that is relevant to the engineering agenda. Similarities notwithstanding, there remain important differences between the constraints and objectives in societal and engineered systems that require looking at game theoretic methods from a new perspective. This chapter provides an overview of selected recent developments of game theoretic methods in this role as a framework for distributed control in engineered systems.
CORASMA Program on Cognitive Radio for Tactical Networks: High Fidelity Simulator and First Results on Dynamic Frequency Allocation
Abstract. This paper reports some preliminary results of the “cognitive radio for dynamic spectrum management” (CORASMA) program, which is dedicated to the evaluation of cognitive solutions for tactical wireless networks. It presents two main aspects of the program: the simulator and the cognitive solutions proposed by the authors. The first part is dedicated to the simulator. We explain the rationale used to design its architecture, and how this architecture allows different cognitive solutions to be assessed and compared in an operational context. The second part addresses dynamic frequency allocation, one of the cognitive solutions tackled in the CORASMA program. We first give an overview of the challenges attached to this problem in the military context and then present the technical solutions studied by the authors for this purpose. Finally, we present some results obtained from the simulator as an illustration.
Information Management in the Smart Grid: A Learning Game Approach
, 2013
In this article, the smart grid is modeled as a decentralized and hierarchical network, made of three categories of agents: producers, providers and microgrids. To optimize their decisions concerning the energy prices and the traded quantities of energy, the agents need to forecast the energy production and the demand of the microgrids. The biases resulting from the decentralized learning might create imbalances between demand and supply, leading to penalties for the providers and for the producers. We determine analytically prices that guarantee that the producers avoid such penalties, shifting all the risk onto the providers. Additionally, we prove that collaborative learning, through a grand coalition of providers where information is shared and forecasts are aligned on a single value, minimizes their average risk. Simulations run on a toy network show that the convergence rates of the collaborative learning strategy are clearly superior to the rates resulting from distributed learning using external and internal regret minimization.
Learning in a Black Box
, 2013
Many interactive environments can be represented as games, but they are so large and complex that individual players are in the dark about what others are doing and how their own payoffs are affected. This paper analyzes learning behavior in such ‘black box’ environments, where players’ only source of information is their own history of actions taken and payoffs received. Specifically, we study repeated public goods games, where players must decide how much to contribute at each stage, but they do not know how much others have contributed or how others’ contributions affect their own payoffs. We identify two key features of the players’ learning dynamics. First, if a player’s realized payoff increases he is less inclined to change his strategy, whereas if his realized payoff decreases he is more inclined to change his strategy. Second, if increasing his own contribution results in higher payoffs he will tend to increase his contribution still further, whereas the reverse holds if an increase in contribution leads to lower payoffs. These two effects are clearly present when players have no information about the game; moreover they are still present even when players have full information. Convergence to Nash equilibrium
Learning in Engineered Multiagent Systems (dissertation)
Consider the problem of maximizing the total power produced by a wind farm. Due to aerodynamic interactions between wind turbines, each turbine maximizing its individual power, as is the case in present-day wind farms, does not lead to optimal farm-level power capture. Further, there are no good models to capture these aerodynamic interactions, rendering model-based optimization techniques ineffective. Thus, model-free distributed algorithms are needed that help turbines adapt their power production online so as to maximize farm-level power capture. Motivated by such problems, the main focus of this dissertation is a distributed model-free optimization problem in the context of multiagent systems. The setup comprises a fixed number of agents, each of which can pick an action and observe the value of its individual utility function. An individual’s utility function may depend on the collective action taken by all agents. The exact functional form (or model) of the agent utility functions, however, is unknown; an agent can only measure the numeric value of its utility. The objective of the multiagent system is to optimize the welfare function (i.e., the sum of the individual utility functions).
A behavioral study of "noise" in coordination games
Abstract. 'Noise' in this study, in the sense of evolutionary game theory, refers to deviations from prevailing behavioral rules. Analyzing data from a laboratory experiment on coordination in networks, we tested 'what kind of noise' is supported by behavioral evidence. This empirical analysis complements a growing theoretical literature on 'how noise matters' for equilibrium selection. We find that the vast majority of decisions (96%) constitute myopic best responses, but deviations continue to occur with probabilities that are sensitive to their costs, that is, less frequent when implying larger payoff losses relative to the myopic best response. In addition, deviation rates vary with patterns of realized payoffs that are related to trial-and-error behavior. While there is little evidence that deviations are clustered in time or space, there is evidence of individual heterogeneity.