Results 1–10 of 74
Cooperative Multi-Agent Learning: The State of the Art
Autonomous Agents and Multi-Agent Systems, 2005
Cited by 182 (8 self)
Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multiagent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multiagent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multiagent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multiagent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multiagent learning problem domains, and a list of multiagent learning resources.
Coordination in Multiagent Reinforcement Learning: A Bayesian Approach
In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, 2003
Cited by 66 (6 self)
Much emphasis in multiagent reinforcement learning (MARL) research is placed on ensuring that MARL algorithms (eventually) converge to desirable equilibria. As in standard reinforcement learning, convergence generally requires sufficient exploration of strategy space. However, exploration often comes at a price in the form of penalties or foregone opportunities. In multiagent settings, the problem is exacerbated by the need for agents to "coordinate" their policies on equilibria. We propose a Bayesian model for optimal exploration in MARL problems that allows these exploration costs to be weighed against their expected benefits using the notion of value of information. Unlike standard RL models, this model requires reasoning about how one's actions will influence the behavior of other agents. We develop tractable approximations to optimal Bayesian exploration, and report on experiments illustrating the benefits of this approach in identical interest games.
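The core trade-off the abstract describes can be illustrated with a single-agent caricature of value of information: weigh the expected improvement in the eventual decision against the cost of exploring. The two-armed setup and its numbers below are illustrative assumptions; the paper's model is fully Bayesian and multiagent, additionally reasoning about how one's exploration shifts the other agents' behavior.

```python
# A minimal value-of-information calculation, a sketch only.
KNOWN = 0.5                      # arm A: known expected payoff
B_VALUES = [0.9, 0.1]            # arm B's payoff is one of these...
B_PROBS = [0.5, 0.5]             # ...with this prior probability

# Decision value if we must commit now, under the prior:
value_now = max(KNOWN, sum(p * v for p, v in zip(B_PROBS, B_VALUES)))

# Decision value if exploration first revealed B's true payoff:
value_informed = sum(p * max(KNOWN, v) for p, v in zip(B_PROBS, B_VALUES))

# It is worth paying up to this much (in foregone reward) to explore.
voi = value_informed - value_now
assert abs(voi - 0.2) < 1e-12
```

Here exploring arm B is worthwhile whenever its cost is below 0.2, even though B's prior mean (0.5) makes it look no better than the known arm.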
Improving coevolutionary search for optimal multiagent behaviors
In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), 2003
Cited by 34 (12 self)
Evolutionary computation is a useful technique for learning behaviors in multiagent systems. Among the several types of evolutionary computation, one natural and popular method is to coevolve multiagent behaviors in multiple, cooperating populations. Recent research has suggested that coevolutionary systems may favor stability rather than performance in some domains. In order to improve upon existing methods, this paper examines the idea of modifying traditional coevolution, biasing it to search for maximal rewards. We introduce a theoretical justification of the improved method and present experiments in three problem domains. We conclude that biasing can help coevolution find better results in some multiagent problem domains.
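The biasing idea can be sketched as follows: in cooperative coevolution an individual is normally scored by its rewards with collaborators sampled from the other population, which penalizes high-reward actions that are fragile under miscoordination; a biased fitness mixes in the best reward observed so far, steering search toward maximal rewards. The weight `DELTA` and the climbing-game payoff rows are illustrative assumptions, not the paper's exact formulation.

```python
DELTA = 0.8  # weight on the optimistic (maximum-reward) component; an assumption

def biased_fitness(sampled_rewards, best_so_far):
    """Blend the mean sampled reward with the best reward ever observed."""
    mean = sum(sampled_rewards) / len(sampled_rewards)
    return (1 - DELTA) * mean + DELTA * max(best_so_far, max(sampled_rewards))

# Rewards one agent's two actions earn against the three possible
# collaborator actions in the climbing game.
row0 = [11, -30, 0]   # optimal jointly, but risky under mismatch
row1 = [0, 0, 5]      # safe but suboptimal

# Plain mean fitness prefers the safe action...
assert sum(row0) / 3 < sum(row1) / 3
# ...while the biased fitness credits action 0 for its best outcome.
assert biased_fitness(row0, 11) > biased_fitness(row1, 5)
```

The contrast shows why unbiased coevolution can settle on stable-but-mediocre behaviors: averaging over collaborators buries the reward of the optimal joint action under miscoordination penalties.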
Theoretical advantages of lenient learners: An evolutionary game theoretic perspective
 Journal of Machine Learning Research
Cited by 24 (12 self)
This paper presents the dynamics of multiple learning agents from an evolutionary game theoretic perspective. We provide replicator dynamics models for cooperative coevolutionary algorithms and for traditional multiagent Q-learning, and we extend these differential equations to account for lenient learners: agents that forgive possible mismatched teammate actions that resulted in low rewards. We use these extended formal models to study the convergence guarantees for these algorithms, and also to visualize the basins of attraction to optimal and suboptimal solutions in two benchmark coordination problems. The paper demonstrates that lenience provides learners with more accurate information about the benefits of performing their actions, resulting in higher likelihood of convergence to the globally optimal solution. In addition, the analysis indicates that the choice of learning algorithm has an insignificant impact on the overall performance of multiagent learning algorithms; rather, the performance of these algorithms depends primarily on the level of lenience that the agents exhibit to one another. Finally, the research herein supports the strength and generality of evolutionary game theory as a backbone for multiagent learning.
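Leniency in independent Q-learning can be sketched as follows: a lenient agent collects several rewards for an action (arising from different, possibly mismatched teammate actions) and updates toward the maximum, forgiving the low ones. The climbing-game payoffs are a standard coordination benchmark; `ALPHA`, `KAPPA`, and the random teammate are illustrative assumptions.

```python
import random

random.seed(0)

# Climbing game: joint action (my action, teammate action) -> shared reward.
PAYOFF = [[11, -30, 0],
          [-30, 7, 6],
          [0, 0, 5]]

ALPHA = 0.1   # learning rate
KAPPA = 10    # degree of leniency: rewards collected per update

def lenient_update(q, action, sample_reward):
    """Move q[action] toward the best of KAPPA sampled rewards."""
    target = max(sample_reward(action) for _ in range(KAPPA))
    q[action] += ALPHA * (target - q[action])

# This agent learns while its teammate acts uniformly at random.
# Leniency forgives the -30 miscoordination penalties, so action 0
# (part of the optimal joint action) ends up valued highest.
q = [0.0, 0.0, 0.0]
sample = lambda a: PAYOFF[a][random.randrange(3)]
for _ in range(2000):
    lenient_update(q, random.randrange(3), sample)

assert max(range(3), key=lambda a: q[a]) == 0
```

Without the `max` over `KAPPA` samples, the same learner would average in the -30 penalties and prefer the safe but suboptimal action 2.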
Theoretical considerations of potential-based reward shaping for multiagent systems
In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2011
Cited by 21 (12 self)
Potential-based reward shaping has previously been proven both to be equivalent to Q-table initialisation and to guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multiagent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multiagent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and that the Nash equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
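The single-agent shaping function this work builds on has the form F(s, s') = γφ(s') − φ(s). A quick numerical check shows why it preserves policy rankings: the shaping terms telescope, so any trajectory's discounted return changes only by a trajectory-independent −φ(s₀) plus a discounted terminal term. The chain of states and the potential function below are illustrative assumptions.

```python
GAMMA = 0.9

def phi(state):
    """Illustrative potential: negative distance to a goal at state 5."""
    return -abs(5 - state)

def plain_return(rewards):
    """Discounted return of the environment rewards alone."""
    return sum((GAMMA ** t) * r for t, r in enumerate(rewards))

def shaped_return(transitions, rewards):
    """Discounted return when each reward r for (s, s') becomes
    r + GAMMA * phi(s') - phi(s)."""
    g = 0.0
    for t, ((s, s_next), r) in enumerate(zip(transitions, rewards)):
        g += (GAMMA ** t) * (r + GAMMA * phi(s_next) - phi(s))
    return g

# A toy three-step trajectory and its environment rewards.
transitions = [(0, 1), (1, 2), (2, 3)]
rewards = [0.0, 0.0, 1.0]

# Telescoping: shaped = plain + GAMMA^T * phi(s_T) - phi(s_0).
lhs = shaped_return(transitions, rewards)
rhs = plain_return(rewards) + GAMMA ** 3 * phi(3) - phi(0)
assert abs(lhs - rhs) < 1e-9
```

The paper's contribution is showing which parts of this single-agent guarantee carry over when several agents are shaped simultaneously in a stochastic game.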
Asymmetric multiagent reinforcement learning
2004
Cited by 17 (4 self)
A novel model for asymmetric multiagent reinforcement learning is introduced in this paper. The model addresses the problem where the information states of the agents involved in the learning task are not equal; some agents (leaders) have information about how their opponents (followers) will select their actions, and based on this information leaders encourage followers to select actions that lead to improved payoffs for the leaders. This kind of configuration arises, e.g., in semi-centralized multiagent systems with an external global utility associated with the system. We present a brief literature survey of multiagent reinforcement learning based on Markov games and then propose an asymmetric learning model that utilizes the theory of Markov games. Additionally, we construct a practical learning method based on the proposed learning model and study its convergence properties. Finally, we test our model with a simple example problem and a larger two-layer pricing application.
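The leader-follower asymmetry rests on a Stackelberg-style solution concept: the leader commits to an action knowing the follower will best-respond to it. A minimal sketch for a one-shot bimatrix game follows; the payoff tables are illustrative assumptions, not the paper's pricing application.

```python
def follower_best_response(b_payoff, leader_action):
    """Follower maximises its own payoff given the leader's action."""
    row = b_payoff[leader_action]
    return max(range(len(row)), key=lambda j: row[j])

def stackelberg(a_payoff, b_payoff):
    """Leader picks the action whose induced follower response
    yields the leader the highest payoff."""
    leader = max(range(len(a_payoff)),
                 key=lambda i: a_payoff[i][follower_best_response(b_payoff, i)])
    return leader, follower_best_response(b_payoff, leader)

# Payoffs indexed [leader action][follower action].
A = [[3, 0],    # leader's payoffs
     [4, 1]]
B = [[2, 1],    # follower's payoffs
     [0, 3]]

# Leader action 0 induces follower response 0 (leader gets 3);
# leader action 1 induces response 1 (leader gets only 1).
assert stackelberg(A, B) == (0, 0)
```

In the learning model this reasoning is lifted from one-shot games to the stage games of a Markov game, with the leader's commitment shaping the follower's learned policy.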
Learning to coordinate using commitment sequences in cooperative multiagent systems
In Proceedings of the Third Symposium on Adaptive Agents and Multiagent Systems (AAMAS-03), 2003
Cited by 14 (2 self)
We report on an investigation of the learning of coordination in cooperative multiagent systems. Specifically, we study solutions that are applicable to independent agents, i.e., agents that do not observe one another’s actions and do not explicitly communicate with each other. In previously published work (Kapetanakis and Kudenko, 2002) we presented a reinforcement learning approach that converges to the optimal joint action even in scenarios with high miscoordination costs. However, this approach failed in fully stochastic environments. In this paper, we present a novel approach based on reward estimation with a shared action-selection protocol. The new technique is applicable in fully stochastic environments where mutual observation of actions is not possible. We demonstrate empirically that our approach causes the agents to almost always converge to the optimal joint action, even in difficult stochastic scenarios with high miscoordination penalties.
Resource Allocation Games with Changing Resource Capacities
In Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS-2003), 2002
Cited by 13 (2 self)
In this paper we study a class of resource allocation games which are inspired by the El Farol Bar problem. We consider a system of competitive agents that have to choose between several resources characterized by their time-dependent capacities. The agents using a particular resource are rewarded if their number does not exceed the resource capacity, and punished otherwise. Agents use a set of strategies to decide what resource to choose, and use a simple reinforcement learning scheme to update the accuracy of strategies. A strategy in our model is simply a lookup table that suggests to an agent what resource to choose based on the actions of its neighbors at the previous time step. In other words, the agents form a social network whose connectivity controls the average number of neighbors with whom each agent interacts. This statement of the adaptive resource allocation problem allows us to fully parameterize it by a small set of numbers. We study the behavior of the system via numeric simulations of 100 to 5000 agents using one to ten resources. Our results indicate that for a certain range of parameters the system as a whole adapts effectively to the changing capacity levels and results in very little under- or over-utilization of the resources.
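The model described above can be sketched in a few lines: each agent holds a few random lookup-table strategies mapping its neighbors' previous choices to a resource, plays its currently best-scoring strategy, and reinforces every strategy that would have suggested an under-capacity resource. The sizes below and the two-neighbor ring topology are illustrative assumptions, far below the paper's 100 to 5000 agents.

```python
import random

random.seed(1)

N_AGENTS, N_RES, N_STRATS, CAPACITY = 30, 2, 3, 15

def random_strategy():
    """Lookup table: (left, right neighbour's last choice) -> resource."""
    return {(a, b): random.randrange(N_RES)
            for a in range(N_RES) for b in range(N_RES)}

strategies = [[random_strategy() for _ in range(N_STRATS)]
              for _ in range(N_AGENTS)]
scores = [[0] * N_STRATS for _ in range(N_AGENTS)]
choices = [random.randrange(N_RES) for _ in range(N_AGENTS)]

for step in range(300):
    # Each agent observes its two neighbours on a ring.
    keys = [(choices[(i - 1) % N_AGENTS], choices[(i + 1) % N_AGENTS])
            for i in range(N_AGENTS)]
    new = [strategies[i][max(range(N_STRATS), key=lambda s: scores[i][s])][keys[i]]
           for i in range(N_AGENTS)]
    usage = [new.count(r) for r in range(N_RES)]
    # Reinforce every strategy whose suggestion landed under capacity.
    for i in range(N_AGENTS):
        for s in range(N_STRATS):
            suggested = strategies[i][s][keys[i]]
            scores[i][s] += 1 if usage[suggested] <= CAPACITY else -1
    choices = new
```

Note the full parameterization the abstract mentions: agent count, resource count, strategies per agent, capacity, and neighborhood size determine the whole system.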
Learning against multiple opponents
In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, 2006
Cited by 13 (1 self)
We address the problem of learning in repeated n-player (as opposed to 2-player) general-sum games, paying particular attention to the rarely addressed situation in which there is a mixture of agents of different types. We propose new criteria requiring that the agents employing a particular learning algorithm work together to achieve a joint best-response against a target class of opponents, while guaranteeing they each achieve at least their individual security-level payoff against any possible set of opponents. We then provide algorithms that provably meet these criteria for two target classes: stationary strategies and adaptive strategies with a bounded memory. We also demonstrate that the algorithm for stationary strategies outperforms existing algorithms in tests spanning a wide variety of repeated games with more than two players.
Reaching Pareto-optimality in prisoner’s dilemma using conditional joint action learning
Autonomous Agents and Multi-Agent Systems, 2005
Cited by 12 (3 self)
We consider a repeated Prisoner’s Dilemma game where two independent learning agents play against each other. We assume that the players can observe each other’s actions but are oblivious to the payoff received by the other player. The multiagent learning literature has provided mechanisms that allow agents to converge to Nash equilibrium. In this paper we define a special class of learner, called a conditional joint action learner (CJAL), which attempts to learn the conditional probability of an action taken by the other agent given its own action and uses it to decide its next course of action. We prove that when played against itself, if the payoff structure of the Prisoner’s Dilemma game satisfies certain conditions, using a limited exploration technique these agents can actually learn to converge to the Pareto-optimal solution that dominates the Nash equilibrium, while maintaining individual rationality. We analytically derive the conditions under which such a phenomenon can occur and present experimental results to support our claim.
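The CJAL estimator can be sketched as follows: the agent counts how often the opponent played each action conditioned on its own action, then picks the action with the highest expected payoff under that conditional model. The Laplace prior, the epsilon-greedy exploration, and the scripted history below are illustrative assumptions.

```python
import random

# Prisoner's Dilemma payoffs for this player, indexed
# [my action][their action]; 0 = cooperate, 1 = defect.
PAYOFF = [[3, 0],
          [5, 1]]

class CJAL:
    def __init__(self):
        # counts[a][b]: times the opponent played b when I played a
        # (initialised with a Laplace prior of 1, an assumption).
        self.counts = [[1, 1], [1, 1]]

    def expected(self, a):
        """Expected payoff of my action a under the conditional model."""
        total = sum(self.counts[a])
        return sum(self.counts[a][b] / total * PAYOFF[a][b] for b in range(2))

    def act(self, epsilon):
        if random.random() < epsilon:   # limited exploration
            return random.randrange(2)
        return max(range(2), key=self.expected)

    def observe(self, a, b):
        self.counts[a][b] += 1

# Scripted history: the opponent mostly cooperated when this agent
# cooperated, and mostly defected when it defected.
agent = CJAL()
for _ in range(9):
    agent.observe(0, 0)
agent.observe(0, 1)
for _ in range(9):
    agent.observe(1, 1)
agent.observe(1, 0)

# E[cooperate] = (10/12)*3 = 2.5 beats E[defect] = (2/12)*5 + (10/12)*1.
assert agent.expected(0) > agent.expected(1)
assert agent.act(epsilon=0.0) == 0
```

This is the mechanism behind the paper's result: once conditional estimates reflect that defection is met with defection, mutual cooperation maximizes each agent's own expected payoff.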