Results 1–10 of 14
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Journal of Machine Learning Research, 2006
Cited by 65 (2 self)
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
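The payoff propagation step this abstract describes can be sketched in code. Below is a minimal max-plus-style message-passing loop, assuming a small coordination graph with tabular edge payoffs; function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def max_plus(edges, payoffs, n_agents, n_actions, n_iters=20):
    """Payoff propagation (max-plus) on a coordination graph.

    edges:   list of (i, j) agent pairs with i < j
    payoffs: dict mapping (i, j) -> n_actions x n_actions array f_ij(a_i, a_j)
    Returns a joint action that approximately maximizes the sum of edge
    payoffs (exact on tree-structured graphs).
    """
    nbrs = {i: [] for i in range(n_agents)}
    for (u, v) in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # msgs[(i, j)][a_j]: message sent from agent i to its neighbor j
    msgs = {(i, j): np.zeros(n_actions)
            for (u, v) in edges for (i, j) in ((u, v), (v, u))}
    for _ in range(n_iters):
        for (i, j) in list(msgs):
            # local edge payoff, oriented so that rows index a_i
            f = payoffs[(i, j)] if (i, j) in payoffs else payoffs[(j, i)].T
            # incoming messages to i from all neighbors except j
            incoming = sum((msgs[(k, i)] for k in nbrs[i] if k != j),
                           np.zeros(n_actions))
            m = np.max(f + incoming[:, None], axis=0)
            msgs[(i, j)] = m - m.mean()   # normalize to keep values bounded
    # each agent picks the action maximizing its summed incoming messages
    return [int(np.argmax(sum(msgs[(k, i)] for k in nbrs[i])))
            for i in range(n_agents)]
```

On a tree such as a three-agent chain, the fixed point of the messages recovers the exact maximizer of the summed edge payoffs.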
The Cross-Entropy Method for Policy Search in Decentralized POMDPs, 2008
Cited by 11 (9 self)
Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs. Povzetek (Slovenian abstract): The paper describes a new method for multiagent planning.
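The CE loop described in this abstract (sample pure solutions from a parametrized distribution, evaluate them, refit the distribution to the elite samples) can be illustrated on a generic combinatorial problem. A minimal sketch, assuming binary decision variables with independent Bernoulli sampling parameters standing in for the stochastic policy; all names are illustrative:

```python
import numpy as np

def cross_entropy_search(evaluate, n_vars, n_samples=100, n_elite=10,
                         n_iters=30, alpha=0.7, rng=None):
    """Generic cross-entropy method over binary decision vectors.

    evaluate: maps a 0/1 vector to a scalar value (higher is better).
    The Bernoulli parameter vector p plays the role of the stochastic
    policy from which pure candidates are sampled. Returns the best
    candidate found.
    """
    rng = np.random.default_rng(rng)
    p = np.full(n_vars, 0.5)          # initial sampling distribution
    best, best_val = None, -np.inf
    for _ in range(n_iters):
        samples = (rng.random((n_samples, n_vars)) < p).astype(int)
        vals = np.array([evaluate(s) for s in samples])
        elite = samples[np.argsort(vals)[-n_elite:]]   # top candidates
        i = int(np.argmax(vals))
        if vals[i] > best_val:
            best, best_val = samples[i].copy(), vals[i]
        # smoothed update toward the elite sample mean
        p = alpha * elite.mean(axis=0) + (1 - alpha) * p
    return best
```

The smoothing factor `alpha` keeps the distribution from collapsing prematurely, mirroring the role of the learning rate in the paper's stochastic-policy updates.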
QD-learning: A collaborative distributed strategy for multiagent reinforcement learning through consensus + innovations
IEEE Transactions on Signal Processing, 2013
Cited by 2 (1 self)
The paper considers a class of multiagent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition and local agent cost statistics. Specifically, with the agents' objective consisting of minimizing a network-averaged infinite-horizon discounted cost, the paper proposes a distributed version of Q-learning, QD-learning, in which the network agents collaborate by means of local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is only aware of its local online cost data and that the inter-agent communication network is weakly connected, the proposed distributed scheme is shown to yield, almost surely (a.s.), the desired value function and the optimal stationary control policy at each network agent asymptotically. The analytical techniques developed in the paper to address the mixed time-scale stochastic dynamics of the consensus + innovations form, which arise as a result of the proposed interactive distributed scheme, are of independent interest.
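The "consensus + innovations" structure of the update can be sketched as a single tabular step: a consensus term pulling an agent's estimate toward its neighbors' estimates, plus the usual temporal-difference innovation on the local cost (signs follow the cost-minimization convention). This is a hypothetical illustration, not the paper's exact scheme; all names and the tabular representation are assumptions:

```python
import numpy as np

def qd_step(Q, agent, nbrs, s, a, cost, s_next, alpha, beta, gamma):
    """One QD-learning-style update for one agent (costs are minimized).

    Q:    array [n_agents, n_states, n_actions] of local Q estimates
    nbrs: neighbors of `agent` in the communication graph
    alpha weights the local temporal-difference innovation; beta weights
    the consensus pull toward neighboring estimates.
    """
    q = Q[agent, s, a]
    consensus = sum(q - Q[j, s, a] for j in nbrs)
    innovation = cost + gamma * Q[agent, s_next].min() - q
    Q[agent, s, a] = q - beta * consensus + alpha * innovation
    return Q[agent, s, a]
```

In the paper's analysis the two weight sequences decay at different rates, which is what produces the mixed time-scale dynamics mentioned above; here they are fixed scalars for simplicity.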
A Comprehensive Survey of Multiagent Reinforcement Learning
Cited by 1 (0 self)
Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described, along with some of the problem domains where MARL techniques have been applied. Finally, an outlook for the field is provided. Index Terms: Distributed control, game theory, multiagent systems, reinforcement learning.
Computing Convex Coverage Sets for Faster Multi-Objective Coordination
Cited by 1 (1 self)
In this article, we propose new algorithms for multi-objective coordination graphs (MO-CoGs). Key to the efficiency of these algorithms is that they compute a convex coverage set (CCS) instead of a Pareto coverage set (PCS). Not only is a CCS a sufficient solution set for a large class of problems, it also has important characteristics that facilitate more efficient solutions. We propose two main algorithms for computing a CCS in MO-CoGs. Convex multi-objective variable elimination (CMOVE) computes a CCS by performing a series of agent eliminations, which can be seen as solving a series of local multi-objective subproblems. Variable elimination linear support (VELS) iteratively identifies the single weight vector w that can lead to the maximal possible improvement on a partial CCS, and calls variable elimination to solve a scalarized instance of the problem for w. VELS is faster than CMOVE for small and medium numbers of objectives and can compute an ε-approximate CCS in a fraction of the runtime. In addition, we propose variants of these methods that employ AND/OR tree search instead of variable elimination to achieve memory efficiency. We analyze the runtime and space complexities of these methods, prove their correctness, and compare them empirically against a naive baseline and an existing PCS method, both in terms of memory usage and runtime. Our results show that, by focusing on the CCS, these methods achieve much better scalability in the number of agents than the current state of the art.
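The linear-support idea behind VELS, repeatedly scalarizing at the corner weight where two known value vectors tie and adding the maximizer when it improves there, can be illustrated for two objectives. In this minimal sketch a plain `max()` over candidate value vectors stands in for the call to variable elimination on the scalarized problem; all names are illustrative:

```python
def ccs_linear_support(vectors, eps=1e-9):
    """2-objective convex coverage set (CCS) via linear support.

    vectors: iterable of (v1, v2) value tuples. Returns the vectors that
    are optimal for some scalarization weight w in [0, 1], i.e. maximize
    w*v1 + (1-w)*v2 for some w.
    """
    vectors = list(vectors)

    def best(w):   # surrogate for solving the scalarized problem at w
        return max(vectors, key=lambda v: w * v[0] + (1 - w) * v[1])

    ccs = {best(1.0), best(0.0)}   # extrema of each objective

    def expand(v_lo, v_hi):
        if v_lo == v_hi:
            return
        denom = (v_lo[0] - v_hi[0]) + (v_hi[1] - v_lo[1])
        if abs(denom) < eps:
            return
        w = (v_hi[1] - v_lo[1]) / denom   # corner weight: v_lo, v_hi tie
        v = best(w)
        # keep v only if it strictly improves on the tie value at w
        if w * v[0] + (1 - w) * v[1] > w * v_lo[0] + (1 - w) * v_lo[1] + eps:
            ccs.add(v)
            expand(v_lo, v)
            expand(v, v_hi)

    expand(best(0.0), best(1.0))
    return sorted(ccs)
```

The recursion terminates once no corner weight admits an improving vector, which is exactly the stopping condition that makes the CCS a sufficient solution set for linear scalarizations.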
Agent-Based Decentralised Coordination for Sensor Networks using the Max-Sum Algorithm
Cited by 1 (0 self)
In this paper, we consider the generic problem of how a network of physically distributed, computationally constrained devices can make coordinated decisions to maximise the effectiveness of the whole sensor network. In particular, we propose a new agent-based representation of the problem, based on the factor graph, and use state-of-the-art DCOP heuristics (i.e., DSA and the max-sum algorithm) to generate suboptimal solutions. In more detail, we formally model a specific real-world problem where energy-harvesting sensors are deployed within an urban environment to detect vehicle movements. The sensors coordinate their sense/sleep schedules, maintaining energy-neutral operation while maximising vehicle detection probability. We theoretically analyse the performance of the sensor network for various coordination strategies and show that by appropriately coordinating their schedules the sensors can achieve significantly improved system-wide performance, detecting up to 50% of the events that a randomly coordinated network fails to detect. Finally, we deploy our coordination approach in a realistic simulation of our wide-area surveillance problem, comparing its performance to a number of benchmark coordination strategies. In this setting, our approach achieves up to a 57% reduction in the number of missed vehicles (compared to an uncoordinated network). This performance is close to that achieved by a benchmark centralised algorithm (simulated annealing) and to a continuously powered network (which is an unreachable upper bound for any coordination approach).
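The DSA heuristic named in this abstract can be sketched generically: each agent repeatedly computes a best response to its neighbors' current choices and adopts it with some activation probability. Below is a minimal version with an anti-coordination utility standing in for the paper's sense/sleep scheduling model; the function and variable names are illustrative, not the paper's:

```python
import random

def dsa(utility, n_agents, n_options, p=0.7, n_rounds=50, init=None, seed=0):
    """Distributed Stochastic Algorithm (DSA) sketch.

    utility(i, option, assignment) -> local payoff of agent i choosing
    `option` given everyone's current choices. Each round, every agent
    computes a best response and switches to it with probability p;
    p < 1 desynchronizes the agents and helps avoid oscillation.
    """
    rng = random.Random(seed)
    a = (list(init) if init is not None
         else [rng.randrange(n_options) for _ in range(n_agents)])
    for _ in range(n_rounds):
        for i in range(n_agents):
            best = max(range(n_options), key=lambda o: utility(i, o, a))
            if best != a[i] and rng.random() < p:
                a[i] = best
    return a
```

For a chain of sensors that each gain by sensing in a different slot than their neighbors, the dynamics settle into an alternating (staggered) schedule, the qualitative behavior the paper exploits for sense/sleep coordination.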
A Survey on Coordination Methodologies for Simulated Robotic Soccer Teams
Cited by 1 (0 self)
Multiagent systems (MAS) are a research topic of ever-increasing importance. This is due to their inherently distributed organization, which copes more naturally with real-life problems whose solution requires people to coordinate efforts. One of the field's most prominent challenges is the creation of efficient coordination methodologies that enable the harmonious operation of teams of agents in adversarial environments. This challenge has been promoted every year since 1995 by the Robot World Cup (RoboCup) international initiative. RoboCup provides a pragmatic testbed based on standardized platforms for the systematic evaluation of MAS coordination techniques. The initiative encompasses a simulated robotic soccer league in which teams of 11 simulated robots play a realistic soccer game, a setting particularly suited to researching coordination methodologies. This paper presents a comprehensive overview of the most relevant coordination techniques proposed to date in the simulated robotic soccer domain. Index Terms: Coordination methodologies, MAS, simulated robotic soccer, RoboCup.
A Tutorial on Optimization for Multi-Agent Systems, 2012
Research on optimization in multiagent systems (MASs) has contributed a wealth of techniques for solving many of the challenges arising in a wide range of multiagent application domains. Multiagent optimization focuses on casting MAS problems into optimization problems, whose solving may involve the active participation of the agents in a MAS. Research on multiagent optimization has rapidly become a very technical, specialized field, and the contributions to the field in the literature are largely scattered. These two factors dramatically hinder access to a basic, general view of the foundations of the field. This tutorial is intended to ease such access by providing a gentle introduction to fundamental concepts and techniques in multiagent optimization.
Design and Organization of Autonomous Systems Person Identification SYStem, 2006
Ambient intelligence applications rely heavily on successfully identifying people in their home environment. Current applications use complex, expensive, or intrusive sensors and methods such as RFID tags, retina scans, or human face identification. In this paper we present the Person Identification SYStem (PISYS). In contrast with previous work, PISYS uses simple, non-intrusive sensors to reason about the presence of persons in particular rooms and to calculate probability values with respect to their identities. As part of the project, a home environment simulator was implemented and used to generate training and test scripts. Using this data, we show how PISYS can successfully identify people in a range of situations, including multiple people in the same room.