Results 1–10 of 29
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 65 (2 self)
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
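The payoff propagation described in this abstract can be sketched as max-plus message passing on a small coordination graph. Everything below — the three-agent chain, the edge payoff tables, and the function names — is invented for illustration; this is a minimal sketch of the technique, not the authors' implementation.

```python
# Max-plus payoff propagation on an invented tree-structured coordination
# graph with three agents (0-1-2) and binary actions. Messages mu[(i, j)]
# summarize what agent i can contribute for each action of neighbor j.

ACTIONS = [0, 1]                        # each agent chooses action 0 or 1
EDGES = {(0, 1): [[3, 1], [0, 2]],      # local payoff f01[a0][a1]
         (1, 2): [[2, 0], [1, 4]]}      # local payoff f12[a1][a2]

def neighbors(i):
    return [j for (u, v) in EDGES
            for j in ((v,) if u == i else (u,) if v == i else ())]

def payoff(i, j, ai, aj):
    # look up the edge payoff regardless of the stored edge orientation
    return EDGES[(i, j)][ai][aj] if (i, j) in EDGES else EDGES[(j, i)][aj][ai]

def max_plus(n_agents, iters=10):
    # mu[(i, j)][aj]: message from agent i to neighbor j about j's action aj
    mu = {(i, j): [0.0, 0.0] for i in range(n_agents) for j in neighbors(i)}
    for _ in range(iters):
        mu = {(i, j): [max(payoff(i, j, ai, aj)
                           + sum(mu[(k, i)][ai] for k in neighbors(i) if k != j)
                           for ai in ACTIONS)
                       for aj in ACTIONS]
              for (i, j) in mu}
    # each agent picks the action maximizing the sum of its incoming messages
    return [max(ACTIONS, key=lambda ai: sum(mu[(k, i)][ai] for k in neighbors(i)))
            for i in range(n_agents)]

print(max_plus(3))                      # prints [1, 1, 1], the optimal joint action
```

On a tree such as this chain the messages stabilize after a few sweeps and the decoded joint action is globally optimal, which matches the theoretical guarantee the abstract alludes to.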
Utile coordination: Learning interdependencies among cooperative agents
 In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG'05)
, 2005
Abstract

Cited by 29 (0 self)
… a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state representation because only necessary coordination is modeled. We apply our method within the framework of coordination graphs, in which value rules represent the coordination dependencies between the agents for a specific context. The algorithm is first applied to a small illustrative problem, and next to a large predator-prey problem in which two predators have to capture a single prey.
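The trigger for adding a coordination dependency — a statistically significant difference in expected returns — can be sketched with a simple two-sample test. The data, the threshold, and the function name below are assumptions made for illustration, not the paper's actual statistic.

```python
from statistics import mean, stdev

# Invented sketch: compare the returns observed in some state when the
# other agent took one action versus another; add a coordination
# dependency only if the difference looks statistically significant.

def needs_coordination(returns_a, returns_b, threshold=2.0):
    """Welch's t statistic on two samples of observed returns; a large
    value suggests the other agent's action matters in this state."""
    na, nb = len(returns_a), len(returns_b)
    var_a, var_b = stdev(returns_a) ** 2, stdev(returns_b) ** 2
    t = abs(mean(returns_a) - mean(returns_b)) / (var_a / na + var_b / nb) ** 0.5
    return t > threshold

# returns when the other predator moved toward vs. away from the prey
print(needs_coordination([9, 10, 11, 10], [2, 3, 2, 3]))   # True
print(needs_coordination([5, 6, 5, 6], [6, 5, 6, 5]))      # False
```

Because the test fires only where the other agent's choice demonstrably changes returns, the state representation stays compact everywhere else, which is the point the abstract makes.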
Anytime Algorithms for Multiagent Decision Making Using Coordination Graphs
 In Proc. Intl. Conf. on Systems, Man and Cybernetics
, 2004
Abstract

Cited by 22 (8 self)
Coordination graphs provide a tractable framework for cooperative multiagent decision making by decomposing the global payoff function into a sum of local terms. In this paper we review some distributed algorithms for action selection in a coordination graph and discuss their pros and cons. For real-time decision making we emphasize the need for anytime algorithms for action selection: algorithms that improve the quality of the solution over time. We describe variable elimination, coordinate ascent, and the max-plus algorithm, the latter being an instance of the belief propagation algorithm in Bayesian networks. We discuss some interesting open problems related to the use of the max-plus algorithm in real-time multiagent decision making.
Learning of Coordination: Exploiting Sparse Interactions in Multiagent Systems
, 2009
Abstract

Cited by 17 (3 self)
Creating coordinated multiagent policies in environments with uncertainty is a challenging problem, which can be greatly simplified if the coordination needs are known to be limited to specific parts of the state space, as previous work has successfully shown. In this work, we assume that such needs are unknown and we investigate coordination learning in multiagent settings. We contribute a reinforcement-learning-based algorithm in which independent decision-makers (agents) learn both individual policies and when and how to coordinate. We focus on problems in which the interaction between the agents is sparse, exploiting this property to minimize the coupling of the learning processes for the different agents. We introduce a two-layer extension of Q-learning, in which we augment the action space of each agent with a coordination action that uses information from other agents to decide the correct action. Our results show that our agents learn to coordinate in the regions of the state space where they need to, and to act independently where they need not.
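A minimal sketch of this two-layer idea, under the assumption that the augmentation is a single COORDINATE pseudo-action whose selection defers to a second value table conditioned on another agent's state. All class, state, and action names here are invented; the paper's actual construction may differ.

```python
import random

# Invented sketch of a two-layer extension of Q-learning: the independent
# layer chooses among primitive actions plus one COORDINATE pseudo-action;
# picking COORDINATE hands the decision to a joint-state value table.

COORDINATE = "coordinate"

class TwoLayerAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.95):
        self.actions = actions + [COORDINATE]   # augmented action space
        self.q = {}        # (state, action) -> value: independent layer
        self.q_joint = {}  # ((state, other_state), action) -> value: joint layer
        self.alpha, self.gamma = alpha, gamma

    def select(self, state, other_state, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(self.actions[:-1])
        act = max(self.actions, key=lambda a: self.q.get((state, a), 0.0))
        if act == COORDINATE:  # second layer: decide using joint information
            act = max(self.actions[:-1],
                      key=lambda a: self.q_joint.get(((state, other_state), a), 0.0))
        return act

    def update(self, state, action, reward, next_state):
        old = self.q.get((state, action), 0.0)
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)
```

Where the independent layer learns that COORDINATE is the best choice, the agent pays attention to the other agent; elsewhere it acts from its compact single-agent table, which is how the sparse-interaction assumption keeps the learning processes decoupled.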
Using the maxplus algorithm for multiagent decision making in coordination graphs
 In RoboCup2005: Robot Soccer World Cup IX
, 2005
Abstract

Cited by 14 (4 self)
Coordination graphs offer a tractable framework for cooperative multiagent decision making by decomposing the global payoff function into a sum of local terms. Each agent can in principle select an optimal individual action based on a variable elimination algorithm performed on this graph. This results in optimal behavior for the group, but its worst-case time complexity is exponential in the number of agents, and it can be slow in densely connected graphs. Moreover, variable elimination is not appropriate for real-time systems as it requires that the complete algorithm terminates before a solution can be reported. In this paper, we investigate the max-plus algorithm, an instance of the belief propagation algorithm in Bayesian networks, as an approximate alternative to variable elimination. In this method the agents exchange appropriate payoff messages over the coordination graph, and based on these messages compute their individual actions. We provide empirical evidence that this method converges to the optimal solution for tree-structured graphs (as shown by theory), and that it finds near-optimal solutions in graphs with cycles, while being much faster than variable elimination.
Evolutionary multiagent systems
 In Proceedings of the 8th International Conference on Parallel Problem Solving from Nature (PPSN-04)
, 2004
Abstract

Cited by 10 (0 self)
In multiagent learning, agents must learn to select actions that maximize their utility given the action choices of the other agents. Cooperative coevolution offers a way to evolve multiple elements that together form a whole, by using a separate population for each element. We apply this setup to the problem of multiagent learning, arriving at an evolutionary multiagent system (EAMAS). We study a problem that requires agents to select their actions in parallel, and investigate the problem-solving capacity of the EAMAS for a wide range of settings. Second, we investigate the transfer of the COllective INtelligence (COIN) framework to the EAMAS. COIN is a proven engineering approach for the learning of cooperative tasks in multiagent systems, and consists of re-engineering the utilities of the agents so that they contribute to the global utility. We find that, as in the reinforcement learning case, the use of the Wonderful Life Utility specified by COIN also leads to improved results for the EAMAS.
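The Wonderful Life Utility mentioned in the abstract can be illustrated on a toy global utility (the congestion-style task below is invented, not one from the paper): each agent is rewarded with the global utility minus the global utility computed with its own action clamped to a null action, i.e. its marginal contribution to the group.

```python
# Invented illustration of COIN's Wonderful Life Utility. The global
# utility counts distinct non-null choices, so an agent that duplicates
# another's choice contributes nothing and receives a reward of zero.

NULL = None

def global_utility(joint_action):
    # toy task: only distinct non-null choices add to the global utility
    return len(set(a for a in joint_action if a is not NULL))

def wonderful_life_utility(joint_action, i):
    clamped = list(joint_action)
    clamped[i] = NULL                  # "what if agent i had done nothing?"
    return global_utility(joint_action) - global_utility(clamped)

# agent 1 duplicates agent 0's choice, so its marginal contribution is zero
print(wonderful_life_utility(["a", "a", "b"], 1))   # 0
print(wonderful_life_utility(["a", "a", "b"], 2))   # 1
```

The re-engineered per-agent reward is aligned with the global utility by construction, which is why optimizing it individually tends to improve the group outcome.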
Learning multiagent state space representations
Abstract

Cited by 9 (3 self)
This paper describes an algorithm, called CQ-learning, which learns to adapt the state representation for multiagent systems in order to coordinate with other agents. We propose a multi-level approach which builds a progressively more advanced representation of the learning problem. The idea is that agents start with a minimal single-agent state space representation, which is expanded only when necessary. In cases where agents detect conflicts, they automatically expand their state to explicitly take the other agents into account. These conflict situations are then analyzed in an attempt to find an abstract representation which generalises over the problem states. Our system allows agents to learn effective policies, while avoiding the exponential state space growth typical of multiagent environments. Furthermore, the method we introduce to generalise over conflict states allows knowledge to be transferred to unseen and possibly more complex situations. Our research departs from previous efforts in this area of multiagent learning because our agents combine state space generalisation with an agent-centric point of view. The algorithms that we introduce can be used in robotic systems to automatically reduce the sensor information to what is essential to solve the problem at hand. This is a necessity when dealing with multiple agents, since learning in such environments is a cumbersome task due to the massive amount of information, much of which may be irrelevant. In our experiments we demonstrate a simulation of such environments using various gridworlds.
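The expand-only-when-necessary mechanism can be sketched as follows, with all names invented for illustration: the agent keys its action values by its local state alone until a conflict is detected there, after which that state's key is augmented with the other agent's state.

```python
# Invented sketch of CQ-learning-style state expansion: a compact
# single-agent key everywhere, an augmented multiagent key only in the
# states where a conflict with another agent has been observed.

class CQAgent:
    def __init__(self):
        self.conflict_states = set()   # local states where conflicts occurred
        self.q = {}                    # state key -> action-value table

    def state_key(self, local, other):
        if local in self.conflict_states:
            return (local, other)      # expanded view: include the other agent
        return local                   # compact single-agent view elsewhere

    def mark_conflict(self, local):
        self.conflict_states.add(local)

agent = CQAgent()
print(agent.state_key("cell_3", "cell_7"))   # cell_3 (independent view)
agent.mark_conflict("cell_3")
print(agent.state_key("cell_3", "cell_7"))   # ('cell_3', 'cell_7')
```

Because only conflict states carry the joint key, the table grows with the number of genuine interactions rather than with the full joint state space, which is the scaling argument the abstract makes.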
Cooperative adaptive cruise control: a reinforcement learning approach
 In The Fourth Workshop on Agents in Traffic and Transportation
, 2006
Partial Local FriendQ Multiagent Learning: Application to Team Automobile Coordination Problem
Abstract

Cited by 2 (1 self)
Real-world multiagent coordination problems are important issues for reinforcement learning techniques. In general, these problems are partially observable, and this characteristic makes the solution computation intractable. Most of the existing approaches calculate exact or approximate solutions using the world model for only one agent. To handle a special case of partial observability, this article presents an approach that approximates the policy by measuring a degree of observability for a purely cooperative vehicle coordination problem. We empirically compare the performance of the policy learned for totally observable problems with the performance of policies for different degrees of observability. If each degree of observability is associated with communication costs, multiagent system designers are able to choose a compromise between the performance of the policy and the cost of obtaining the associated degree of observability of the problem. Finally, we show how the available space surrounding an agent influences the degree of observability required for a near-optimal solution.
Coordinated Learning for Loosely Coupled Agents with Sparse Interactions
Abstract

Cited by 1 (1 self)
Multiagent learning is a challenging problem in the area of multiagent systems because of the non-stationary environment caused by the interdependencies between agents. Learning for coordination becomes more difficult when agents do not know the structure of the environment and have only local observability. In this paper, an approach is proposed to enable autonomous agents to learn where and how to coordinate their behaviours in an environment where the interactions between agents are sparse. Our approach first adopts a statistical method to detect those states where coordination is most necessary. A Q-learning-based coordination mechanism is then applied to coordinate agents' behaviours based on their local observability of the environment. We test our approach in grid-world domains to show its good performance.