Results 1 – 10 of 32
Revisiting Log-Linear Learning: Asynchrony, Completeness and Payoff-Based Implementation
, 2008
"... Loglinear learning is a learning algorithm with equilibrium selection properties. Loglinear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of loglinear learning has centered around ..."
Abstract

Cited by 42 (11 self)
Log-linear learning is a learning algorithm with equilibrium selection properties. Log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of log-linear learning has centered around explicitly computing the stationary distribution. This analysis relied on a highly structured setting: i) players' utility functions constitute a potential game, ii) players update their strategies one at a time, which we refer to as asynchrony, iii) at any stage, a player can select any action in the action set, which we refer to as completeness, and iv) each player is endowed with the ability to assess the utility he would have received for any alternative action provided that the actions of all other players remain fixed. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we seek to address to what degree one can relax the structural assumptions while maintaining that only potential function maximizers are the stochastically stable action profiles. In this paper, we introduce slight variants of log-linear learning to include both synchronous updates and incomplete action sets. In both settings, we prove that only potential function maximizers are stochastically stable. Furthermore, we introduce a payoff-based version of log-linear learning, in which players are only aware of the utility they received and the action that they played. Note that log-linear learning in its original form is not a payoff-based learning algorithm. In payoff-based log-linear learning, we also prove that only potential maximizers are stochastically stable. The key enabler for these results is to change the focus of the analysis away from deriving the explicit form of the stationary distribution of the learning process towards characterizing the stochastically stable states. The resulting analysis uses the theory of resistance trees for regular perturbed Markov decision processes, thereby allowing a relaxation of the aforementioned structural assumptions.
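The asynchronous, complete variant described in this abstract can be sketched in a few lines. The two-player identical-interest game, the temperature tau, and the horizon below are illustrative assumptions, not the paper's setup:

```python
import math
import random

# Minimal sketch of asynchronous log-linear learning on an illustrative
# 2-player identical-interest (hence potential) game; the game, the
# temperature tau, and the horizon are assumptions for illustration.
ACTIONS = [0, 1]

def utility(joint):
    # Both players earn the potential: 2 if both play 1, 1 if both play 0,
    # 0 on miscoordination. The potential maximizer is (1, 1).
    a, b = joint
    if a == b:
        return 2.0 if a == 1 else 1.0
    return 0.0

def log_linear_step(joint, tau):
    # Asynchrony: exactly one randomly chosen player revises per stage.
    i = random.randrange(2)
    # Completeness: the reviser weighs every action in its action set by
    # exp(utility / tau), holding the other player's action fixed.
    trials = [joint[:i] + (a,) + joint[i + 1:] for a in ACTIONS]
    weights = [math.exp(utility(t) / tau) for t in trials]
    r = random.random() * sum(weights)
    for t, w in zip(trials, weights):
        r -= w
        if r <= 0:
            return t
    return trials[-1]

if __name__ == "__main__":
    random.seed(0)
    joint, counts = (0, 0), {}
    for _ in range(20000):
        joint = log_linear_step(joint, tau=0.2)
        counts[joint] = counts.get(joint, 0) + 1
    # At low temperature the process concentrates on the potential
    # maximizer (1, 1), matching the equilibrium selection property.
    print(max(counts, key=counts.get))
```

Note that each revision uses hypothetical payoffs for all alternative actions, which is exactly why the original rule is not payoff-based.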
Achieving Pareto Optimality Through Distributed Learning
, 2012
"... We propose a simple payoffbased learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any nperson finite strategicform game with generic payoffs. The algorithm follows the theme of exploration versus exploitation and is hence stochastic in natu ..."
Abstract

Cited by 21 (5 self)
We propose a simple payoff-based learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any n-person finite strategic-form game with generic payoffs. The algorithm follows the theme of exploration versus exploitation and is hence stochastic in nature. We prove that if all agents adhere to this algorithm, then the agents will select the action profile that maximizes the sum of the agents' payoffs a high percentage of time. The algorithm requires no communication. Agents respond solely to changes in their own realized payoffs, which are affected by the actions of other agents in the system in ways that they do not necessarily understand. The method can be applied to the optimization of complex systems with many distributed components, such as the routing of information in networks and the design and control of wind farms. The proof of the proposed learning algorithm relies on the theory of large deviations for perturbed Markov chains.
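The exploration-versus-exploitation structure of such payoff-based rules can be illustrated with a hedged sketch; this is not the authors' exact rule, and the state layout, acceptance criterion, and names are assumptions:

```python
import random

def payoff_based_step(state, realized_payoff, actions, epsilon):
    """One update of a generic payoff-based rule (illustrative, not the
    paper's exact algorithm).

    state = (benchmark_action, benchmark_payoff, last_action); the agent
    sees only its own realized payoff, never other agents' actions.
    """
    bench_a, bench_u, last_a = state
    if last_a == bench_a:
        # Refresh the benchmark payoff (other agents may have moved).
        bench_u = realized_payoff
    elif realized_payoff > bench_u:
        # Exploitation: keep an experiment that paid off.
        bench_a, bench_u = last_a, realized_payoff
    # Exploration: with small probability try a uniformly random action.
    next_a = random.choice(actions) if random.random() < epsilon else bench_a
    return (bench_a, bench_u, next_a)

# A profitable experiment (action 1 earned 2.0 > benchmark 1.0) is adopted.
print(payoff_based_step((0, 1.0, 1), 2.0, [0, 1], epsilon=0.0))  # → (1, 2.0, 1)
```

The rule is completely uncoupled in the sense used throughout these papers: the update reads nothing but the agent's own action history and realized payoffs.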
Distributed power allocation with SINR constraints using trial and error learning
 in Military Communications and Information Systems Conference, MCC, Gdańsk
, 2012
"... Abstract—In this paper, we address the problem of global transmit power minimization in a selfconfiguring network where radio devices are subject to operate at a minimum signal to interference plus noise ratio (SINR) level. We model the network as a parallel Gaussian interference channel and we int ..."
Abstract

Cited by 8 (7 self)
Abstract—In this paper, we address the problem of global transmit power minimization in a self-configuring network where radio devices are required to operate at a minimum signal to interference plus noise ratio (SINR) level. We model the network as a parallel Gaussian interference channel and we introduce a fully decentralized algorithm (based on trial and error) able to statistically achieve a configuration where the performance demands are met. Contrary to existing solutions, our algorithm requires only local information and can learn stable and efficient working points by using only one-bit feedback. We model the network under two different game theoretical frameworks: normal form and satisfaction form. We show that the converging points correspond to equilibrium points, namely Nash and satisfaction equilibrium. Similarly, we provide sufficient conditions for the algorithm to converge in both formulations. Moreover, we provide analytical results to estimate the algorithm's performance as a function of the network parameters. Finally, numerical results are provided to validate our theoretical conclusions.
Multiagent learning in large anonymous games
 Journal of Artificial Intelligence Research
, 2011
"... Abstract In large systems, it is important for agents to learn to act effectively, but sophisticated multiagent learning algorithms generally do not scale. An alternative approach is to find restricted classes of games where simple, efficient algorithms converge. It is shown that stage learning ef ..."
Abstract

Cited by 5 (0 self)
Abstract—In large systems, it is important for agents to learn to act effectively, but sophisticated multiagent learning algorithms generally do not scale. An alternative approach is to find restricted classes of games where simple, efficient algorithms converge. It is shown that stage learning efficiently converges to Nash equilibria in large anonymous games if best-reply dynamics converge. Two features are identified that improve convergence. First, rather than making learning more difficult, more agents are actually beneficial in many settings. Second, providing agents with statistical information about the behavior of others can significantly reduce the number of observations needed.
On the complexity of trial and error
, 2013
"... Motivated by certain applications from physics, biochemistry, economics, and computer science in which the objects under investigation are unknown or not directly accessible because of various limitations, we propose a trialanderror model to examine search problems in which inputs are unknown. M ..."
Abstract

Cited by 4 (2 self)
Motivated by certain applications from physics, biochemistry, economics, and computer science in which the objects under investigation are unknown or not directly accessible because of various limitations, we propose a trial-and-error model to examine search problems in which inputs are unknown. More specifically, we consider constraint satisfaction problems ⋀i Ci, where the constraints Ci are hidden, and the goal is to find a solution satisfying all constraints. We can adaptively propose a candidate solution (i.e., trial), and there is a verification oracle that either confirms that it is a valid solution, or returns the index i of a violated constraint (i.e., error), with the exact content of Ci still hidden. We studied the time and trial complexities of a number of
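The trial-and-error oracle model described above can be sketched concretely; the hidden constraints, the naive random-perturbation strategy, and the trial budget below are illustrative assumptions, not the paper's algorithms:

```python
import random

# Toy sketch of the hidden-constraint model: the solver proposes
# assignments (trials) and the verification oracle either accepts or
# reveals only the *index* of some violated constraint (an error),
# never the constraint's content.

def make_oracle(constraints):
    def oracle(assignment):
        for i, c in enumerate(constraints):
            if not c(assignment):
                return i      # error: index of a violated constraint
        return None           # trial accepted: valid solution
    return oracle

def trial_and_error_search(oracle, n_vars, max_trials=10000):
    assignment = [False] * n_vars
    for _ in range(max_trials):
        if oracle(assignment) is None:
            return assignment
        # The violated constraint's content is hidden, so this naive
        # strategy just perturbs the assignment at random.
        j = random.randrange(n_vars)
        assignment[j] = not assignment[j]
    return None

random.seed(0)
hidden = [lambda a: a[0] or a[1],   # hidden clause: x0 or x1
          lambda a: not a[2]]       # hidden clause: not x2
sol = trial_and_error_search(make_oracle(hidden), 3)
```

The point of the model is exactly this information asymmetry: the solver learns which constraint failed, but must infer its content through further trials.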
Achieving Pareto optimal equilibria in energy efficient clustered ad hoc networks
 in Military Communication Conference, Milcom
, 2012
"... Abstract—In this paper, a decentralized iterative algorithm, namely the optimal dynamic learning (ODL) algorithm, is analysed. The ability of this algorithm of achieving a Pareto optimal working point exploiting only a minimal amount of information is shown. The algorithm performance is analysed in ..."
Abstract

Cited by 3 (2 self)
Abstract—In this paper, a decentralized iterative algorithm, namely the optimal dynamic learning (ODL) algorithm, is analysed. The ability of this algorithm to achieve a Pareto optimal working point while exploiting only a minimal amount of information is shown. The algorithm's performance is analysed in a clustered ad hoc network, where radio devices are assumed to operate above a minimal signal to interference plus noise ratio (SINR) threshold while minimizing the global power consumption. Sufficient analytical conditions for ODL to converge to the desired working point are provided; moreover, numerical simulations show the ability of the algorithm to configure an interference-limited network. The performances of ODL and of a Nash-equilibrium-reaching algorithm are numerically compared, and their performance as a function of the available resources is studied. The gain of ODL is shown to be larger when the amount of available radio resources is scarce.
Game Theory and Distributed Control
, 2012
"... Game theory has been employed traditionally as a modeling tool for describing and influencing behavior in societal systems. Recently, game theory has emerged as a valuable tool for controlling or prescribing behavior in distributed engineered systems. The rationale for this new perspective stems fro ..."
Abstract

Cited by 3 (1 self)
Game theory has been employed traditionally as a modeling tool for describing and influencing behavior in societal systems. Recently, game theory has emerged as a valuable tool for controlling or prescribing behavior in distributed engineered systems. The rationale for this new perspective stems from the parallels between the underlying decision making architectures in both societal systems and distributed engineered systems. In particular, both settings involve an interconnection of decision making elements whose collective behavior depends on a compilation of local decisions that are based on partial information about each other and the state of the world. Accordingly, there is extensive work in game theory that is relevant to the engineering agenda. Similarities notwithstanding, there remain important differences between the constraints and objectives in societal and engineered systems that require looking at game theoretic methods from a new perspective. This chapter provides an overview of selected recent developments of game theoretic methods in this role as a framework for distributed control in engineered systems.
Efficiency and equilibrium in trial and error learning
, 2010
"... Abstract. In trial and error learning, agents experiment with new strategies and adopt them with a probability that depends on their realized payoffs. Such rules are completely uncoupled, that is, each agent's behaviour depends only on his own realized payoffs and not on the payoffs or actions ..."
Abstract

Cited by 2 (0 self)
Abstract. In trial and error learning, agents experiment with new strategies and adopt them with a probability that depends on their realized payoffs. Such rules are completely uncoupled, that is, each agent's behaviour depends only on his own realized payoffs and not on the payoffs or actions of anyone else. We show that by modifying a trial and error learning rule proposed by Young (2009) we obtain a completely uncoupled learning process that selects a Pareto optimal equilibrium whenever a pure equilibrium exists. When a pure equilibrium does not exist, there is a simple formula that relates the long-run likelihood of each disequilibrium state to the total payoff over all agents and the maximum payoff gain that would result from a unilateral deviation by some agent. This welfare/stability trade-off criterion provides a novel framework for analyzing the selection of disequilibrium as well as equilibrium states in finite n-person games. Acknowledgements. We thank Gabriel Kreindler for suggesting a number of improvements.
Joint channel and power allocation in tactical cognitive networks: Enhanced trial and errors
 the Military Communications and Information Systems Conference (MCC), Saint–Malo
, 2013
"... Abstract—In tactical networks the presence of a central controller (e.g., a base station) is made impractical by the unpredictability of the nodes ’ positions and by the fact that its presence can be exploited by hostile entities. As a consequence, selfconfiguring networks are sought for military a ..."
Abstract

Cited by 2 (2 self)
Abstract—In tactical networks the presence of a central controller (e.g., a base station) is made impractical by the unpredictability of the nodes' positions and by the fact that its presence can be exploited by hostile entities. As a consequence, self-configuring networks are sought for military and emergency communication networks. In such networks, the transmission parameters, most notably the transmission channel and the power level, are set by the devices following specific behavioural rules. In this context, an algorithm for self-configuring wireless networks is presented, analysed and enhanced to meet the specific needs of tactical networks. Such an algorithm, based on the concept of trial and error, is tested under static and mobile situations, and different metrics are considered to show its performance. In particular, the stability and performance improvements with respect to previously proposed versions of the algorithm are detailed.