Results 1 - 10 of 104
Revisiting Log-Linear Learning: Asynchrony, Completeness and Payoff-Based Implementation
, 2008
Cited by 42 (11 self)
Log-linear learning is a learning algorithm with equilibrium selection properties. Log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of log-linear learning has centered around explicitly computing the stationary distribution. This analysis relied on a highly structured setting: i) players’ utility functions constitute a potential game, ii) players update their strategies one at a time, which we refer to as asynchrony, iii) at any stage, a player can select any action in the action set, which we refer to as completeness, and iv) each player is endowed with the ability to assess the utility he would have received for any alternative action provided that the actions of all other players remain fixed. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we seek to address to what degree one can relax the structural assumptions while maintaining that only potential function maximizers are the stochastically stable action profiles. In this paper, we introduce slight variants of log-linear learning to include both synchronous updates and incomplete action sets. In both settings, we prove that only potential function maximizers are stochastically stable. Furthermore, we introduce a payoff-based version of log-linear learning, in which players are only aware of the utility they received and the action that they played. Note that log-linear learning in its original form is not a payoff-based learning algorithm. In payoff-based log-linear learning, we also prove that only potential maximizers are stochastically stable. The key enabler for these results is to change the focus of the analysis away from deriving the explicit form of the stationary distribution of the learning process towards characterizing the stochastically stable states. The resulting analysis uses the theory of resistance trees for regular perturbed Markov decision processes, thereby allowing a relaxation of the aforementioned structural assumptions.
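As a rough illustration of the asynchronous, complete-action-set setting described in this abstract, the sketch below implements the standard log-linear (Boltzmann) update in Python: one randomly chosen player resamples its action with probability proportional to exp(utility / temperature). The two-player coordination game, the function names and the temperature value are assumptions made for the example, not taken from the paper.

```python
import math
import random


def log_linear_step(profile, utilities, action_sets, temperature, rng):
    """One asynchronous update: a single randomly chosen player resamples its action."""
    i = rng.randrange(len(profile))
    weights = []
    for action in action_sets[i]:
        trial = list(profile)
        trial[i] = action
        # Weight each candidate action by exp(utility / temperature),
        # holding the other players' actions fixed.
        weights.append(math.exp(utilities[i](tuple(trial)) / temperature))
    new_action = rng.choices(action_sets[i], weights=weights, k=1)[0]
    updated = list(profile)
    updated[i] = new_action
    return tuple(updated)


def coordination_payoff(profile):
    """Hypothetical identical-interest game, so the common payoff is also the potential."""
    a, b = profile
    if a != b:
        return 0.0
    return 2.0 if a == 1 else 1.0


def fraction_at_maximizer(steps=20000, temperature=0.2, seed=0):
    """Estimate the fraction of time spent at the potential maximizer (1, 1)."""
    rng = random.Random(seed)
    utilities = [coordination_payoff, coordination_payoff]
    action_sets = [[0, 1], [0, 1]]
    profile = (0, 0)  # start at the inferior equilibrium
    hits = 0
    for _ in range(steps):
        profile = log_linear_step(profile, utilities, action_sets, temperature, rng)
        if profile == (1, 1):
            hits += 1
    return hits / steps
```

With these assumed payoffs the potential maximizer is the profile (1, 1), and calling fraction_at_maximizer() estimates how often play sits there; lowering the temperature should push that fraction toward one.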
Payoff-based dynamics for multi-player weakly acyclic games
 SIAM J. CONTROL OPT
, 2009
Cited by 33 (12 self)
We consider repeated multi-player games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multiagent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff-based” processes in which, at any stage, players know only their own actions and (noise corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff-based processes for increasingly general scenarios and prove that, after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
Payoff Based Dynamics for Multi-Player Weakly Acyclic Games
 SIAM JOURNAL ON CONTROL AND OPTIMIZATION, SPECIAL ISSUE ON CONTROL AND OPTIMIZATION IN COOPERATIVE NETWORKS
, 2007
Cited by 28 (15 self)
We consider repeated multi-player games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multiagent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff based” processes, in which at any stage, players only know their own actions and (noise corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff based processes for increasingly general scenarios and prove that after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
Delay-sensitive resource management in multi-hop cognitive radio networks
 in: Proceedings of the IEEE DySPAN
, 2008
Cited by 22 (0 self)
Dynamic resource management by the various cognitive nodes fundamentally changes the passive way that wireless nodes currently adapt their transmission strategies to match available wireless resources, by enabling them to consciously influence the wireless system dynamics based on the gathered information about other network nodes. In this paper, we discuss the main challenges of performing such dynamic resource management by emphasizing the distributed information in the dynamic multi-agent system. Specifically, the decisions on how to adapt the aforementioned resource management at sources and relays need to be performed in an informationally decentralized manner, as the tolerable delay does not allow propagating information back and forth throughout the multi-hop infrastructure to a centralized decision maker. The term “cognitive” refers in our paper both to the capability of the network nodes to achieve large spectral efficiencies through exploitation and mitigation of channel and interference variability by dynamically using different frequency bands, and to their ability to learn the “environment” (channel conditions and source characteristics) and the actions of competing nodes through the designed information exchange. We propose dynamic resource management algorithms, performed at each network node and integrated with multi-agent learning, that explicitly consider the timeliness and the cost of such information exchange. The results show that our dynamic resource management approach improves the PSNR of multiple video streams by more than 3 dB compared with state-of-the-art dynamic frequency channel/route selection approaches without learning capability, when the network resources are limited.
Theoretical considerations of potential-based reward shaping for multi-agent systems
 In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
, 2011
Cited by 21 (12 self)
Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequentially, can alter the joint policy converged upon.
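The policy-invariance argument rests on a telescoping sum, which a few lines of Python can check numerically. The sketch assumes the usual form of the shaping term, F(s, s') = gamma * phi(s') - phi(s), which the abstract does not spell out; the potential function phi, the trajectory and the rewards are made up for illustration.

```python
def shaping_reward(phi, state, next_state, gamma):
    """Assumed potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(next_state) - phi(state)


def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def shaped_minus_original(states, rewards, phi, gamma):
    """Difference between the shaped and unshaped discounted returns of one trajectory."""
    shaped = [
        r + shaping_reward(phi, s, s_next, gamma)
        for r, s, s_next in zip(rewards, states, states[1:])
    ]
    return discounted_return(shaped, gamma) - discounted_return(rewards, gamma)


def telescoped_difference(states, phi, gamma):
    """Closed form of the same difference: gamma^T * phi(s_T) - phi(s_0)."""
    horizon = len(states) - 1
    return (gamma ** horizon) * phi(states[-1]) - phi(states[0])


# Hypothetical trajectory: four states, three transitions, phi(s) = s.
states = [0, 1, 2, 3]
rewards = [0.0, 0.0, 1.0]
phi = lambda s: float(s)
gamma = 0.9

difference = shaped_minus_original(states, rewards, phi, gamma)
```

The shaping terms cancel pairwise, so the difference depends only on the first and last states. In the infinite-horizon limit with bounded phi the final term vanishes and the difference reduces to -phi(s_0), which does not depend on the actions taken. The multi-agent results in the paper go further than this single-trajectory check.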
Social Reward Shaping in the Prisoner’s Dilemma (Short Paper)
Cited by 20 (3 self)
Reward shaping is a well-known technique applied to help reinforcement-learning agents converge more quickly to near-optimal behavior. In this paper, we introduce social reward shaping, which is reward shaping applied in the multiagent-learning framework. We present preliminary experiments in the iterated Prisoner’s Dilemma setting that show that agents using social reward shaping appropriately can behave more effectively than other classical learning and non-learning strategies. In particular, we show that these agents can both lead (encourage adaptive opponents to stably cooperate) and follow (adopt a best-response strategy when paired with a fixed opponent), where better-known approaches achieve only one of these objectives.
Frequency Adjusted Multiagent Q-learning
Cited by 19 (11 self)
Multiagent learning is a crucial method to control or find solutions for systems in which more than one entity needs to be adaptive. In today’s interconnected world, such systems are ubiquitous in many domains, including auctions in economics, swarm robotics in computer science, and politics in social sciences. Multiagent learning is inherently more complex than single-agent learning and has a relatively thin theoretical framework supporting it. Recently, multiagent learning dynamics have been linked to evolutionary game theory, allowing the interpretation of learning as an evolution of competing policies in the mind of the learning agents. The dynamical system from evolutionary game theory that has been linked to Q-learning predicts the expected behavior of the learning agents. Closer analysis, however, allows for two interesting observations: the predicted behavior is not always the same as the actual behavior, and in case of deviation, the predicted behavior is more desirable. This discrepancy is elucidated in this article, and based on these new insights Frequency Adjusted Q (FAQ) learning is proposed. This variation of Q-learning perfectly adheres to the predictions of the evolutionary model for an arbitrarily large part of the policy space. In addition to the theoretical discussion, experiments in the three classes of two-agent two-action games illustrate the superiority of FAQ-learning.
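A sketch of the frequency-adjusted update may help make the idea concrete. It assumes the rule as it is commonly stated for FAQ-learning, in which the Q-learning step for action i is scaled by min(beta / x_i, 1), where x_i is the probability of playing i under a Boltzmann policy; the abstract itself does not give the formula, so the exact form should be checked against the article. Function names and parameter values are illustrative.

```python
import math


def boltzmann_policy(q_values, temperature):
    """Softmax action probabilities; the max is subtracted for numerical stability."""
    top = max(q_values)
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def faq_update(q_values, action, reward, policy, alpha, beta, gamma=0.0):
    """Assumed FAQ rule: scale the Q-learning step by min(beta / x_action, 1)."""
    step = min(beta / policy[action], 1.0) * alpha
    target = reward + gamma * max(q_values)
    updated = list(q_values)
    updated[action] += step * (target - q_values[action])
    return updated


# Hypothetical single-state example with two actions.
q_values = [0.0, 0.0]
policy = boltzmann_policy(q_values, temperature=0.5)
q_values = faq_update(q_values, action=0, reward=1.0, policy=policy,
                      alpha=0.1, beta=0.25)
```

Under this assumed rule, the expected step per action (selection probability times step size) equals alpha * beta whenever x_i >= beta, so rarely chosen actions are no longer updated more slowly in expectation.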
An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems
 ADVANCES IN COMPLEX SYSTEMS
, 2011
Cited by 18 (9 self)
This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of reward shaping in two problem domains within the context of RoboCup KeepAway by designing three reward shaping schemes, encouraging specific behaviour such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
Multiagent learning for engineers
 Artificial Intelligence
, 2007
Cited by 18 (5 self)
As suggested by the title of Shoham, Powers, and Grenager’s position paper [34], the ultimate lens through which the multiagent learning framework should be assessed is “what is the question?”. In this paper, we address this question by presenting challenges motivated by engineering applications and discussing the potential appeal of multiagent learning to meet these challenges. Moreover, we highlight various differences in the underlying assumptions and issues of concern that generally distinguish engineering applications from models that are typically considered in the economic game theory literature.