Results 1 - 10 of 65
Exploiting locality of interaction in factored Dec-POMDPs
- In Proc. Int. Joint Conf. Autonomous Agents and Multi Agent Systems, 2008
"... Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents ..."
Abstract
-
Cited by 46 (20 self)
- Add to MetaCart
(Show Context)
Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored Dec-POMDP representations have been proposed before, but only for Dec-POMDPs whose transition and observation models are fully independent. Such strong assumptions simplify the planning problem, but result in models with limited applicability. By contrast, we consider general factored Dec-POMDPs for which we analyze the model dependencies over space (locality of interaction) and time (horizon of the problem). We also present a formulation of decomposable value functions. Together, our results allow us to exploit the problem structure as well as heuristics in a single framework that is based on collaborative graphical Bayesian games (CGBGs). A preliminary experiment shows a speedup of two orders of magnitude.
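
The decomposable value functions mentioned above lend themselves to a small illustration: a joint value expressed as a sum of local payoff components, each touching only a neighborhood of agents. The following is a minimal sketch of that locality under invented payoff tables and agent scopes, not the paper's CGBG-based planner:

from itertools import product

# Q(a) = sum_e Q_e(a_e): each local component touches only a small
# "scope" of agents -- this is the locality of interaction.
ACTIONS = [0, 1]  # per-agent actions, assumed binary for brevity
components = [
    # (scope, payoff table keyed by the scoped agents' actions)
    ((0, 1), {(a0, a1): float(a0 == a1) for a0 in ACTIONS for a1 in ACTIONS}),
    ((1, 2), {(a1, a2): float(a1 != a2) for a1 in ACTIONS for a2 in ACTIONS}),
]

def joint_value(joint_action):
    """Evaluate the factored value as a sum of local components."""
    return sum(table[tuple(joint_action[i] for i in scope)]
               for scope, table in components)

# Brute force over joint actions is exponential in the number of agents;
# the factored form is what lets structured methods (variable elimination,
# message passing on the interaction graph) avoid this enumeration.
best = max(product(ACTIONS, repeat=3), key=joint_value)
print(best, joint_value(best))  # e.g. (0, 0, 1) with value 2.0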
Bounded approximate decentralised coordination using the max-sum algorithm
- In Distributed Constraint Reasoning Workshop, 2009
"... In this paper we propose a novel algorithm that provides bounded approximate solutions for decentralised coordination problems. Our approach removes cycles in any general constraint network by eliminating dependencies between functions and variables which have the least impact on the solution qualit ..."
Abstract
-
Cited by 29 (9 self)
- Add to MetaCart
In this paper we propose a novel algorithm that provides bounded approximate solutions for decentralised coordination problems. Our approach removes cycles in any general constraint network by eliminating the dependencies between functions and variables that have the least impact on solution quality. It uses the max-sum algorithm to optimally solve the resulting tree-structured constraint network, providing a bounded approximation specific to the particular problem instance. We formally prove that our algorithm provides a bounded approximation of the original problem and we present an empirical evaluation in a synthetic scenario. This shows that the approximate solutions our algorithm provides are typically within 95% of the optimum, and that its approximation ratio is typically 1.23.
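
The pruning step at the heart of this approach is easy to sketch: weight each function-variable dependency by the most it can change the function's value, cut the cheapest dependencies until the network is cycle-free, and the sum of the cut weights bounds the loss. The snippet below sketches only that weight computation, with hypothetical names and a toy binary factor; a full implementation would then run max-sum exactly on the remaining tree:

import itertools

DOMAIN = [0, 1]

def edge_impact(factor, arity, var_idx):
    """Upper bound on how much ignoring variable var_idx in factor can
    change its value: max over the other variables of
    (max over var_idx) - (min over var_idx)."""
    others = [i for i in range(arity) if i != var_idx]
    worst = 0.0
    for assign in itertools.product(DOMAIN, repeat=len(others)):
        vals = []
        for v in DOMAIN:
            full = list(assign)
            full.insert(var_idx, v)
            vals.append(factor(tuple(full)))
        worst = max(worst, max(vals) - min(vals))
    return worst

# Example binary factor on variables (x0, x1):
f = lambda a: 1.0 if a[0] == a[1] else 0.2
w = edge_impact(f, 2, 1)  # impact of cutting x1's dependency on f
# Bounded max-sum: cut the lowest-impact edges until the graph is a tree;
# the exact tree solution is then within sum-of-cut-weights of the optimum.
print(w)  # 0.8 for this factor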
A Survey on Sensor Networks from a Multi-Agent perspective
"... Sensor networks arise as one of the most promising technologies for the next decades. The recent emergence of small and inexpensive sensors based upon microelectromechanical system (MEMS) ease the development and proliferation of this kind of networks in a wide range of real-world applications. Mult ..."
Abstract
-
Cited by 26 (0 self)
- Add to MetaCart
(Show Context)
Sensor networks have emerged as one of the most promising technologies for the coming decades. The recent availability of small and inexpensive sensors based on microelectromechanical systems (MEMS) eases the development and proliferation of such networks in a wide range of real-world applications. Multi-agent systems (MAS) have been identified as one of the most suitable technologies to contribute to this domain, owing to their appropriateness for modeling autonomous, self-aware sensors in a flexible way. Firstly, this survey summarizes the current challenges and research areas concerning sensor networks while identifying the most relevant MAS contributions. Secondly, we propose a taxonomy for sensor networks that classifies them by their features (and the research problems they pose). Finally, we identify some open research directions and opportunities for future MAS research.
Safe and distributed kinodynamic replanning for vehicular networks
- Mobile Networks and Applications (MONET), 2009
"... Abstract—This work deals with the problem of planning collision-free motions for multiple communicating vehicles that operate in the same, partially-observable environment in real-time. A challenging aspect of this problem is how to utilize communication so that vehicles do not reach states from whi ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
(Show Context)
This work deals with the problem of planning collision-free motions in real time for multiple communicating vehicles that operate in the same, partially observable environment. A challenging aspect of this problem is how to utilize communication so that vehicles do not reach states from which collisions cannot be avoided due to second-order motion constraints. This paper initially shows how it is possible to provide theoretical safety guarantees with a priority-based coordination scheme. Safety means avoiding collisions with obstacles and between vehicles. This notion is also extended to include the retention of a communication network when the vehicles operate as a networked team. The paper then extends this safety framework into a fully distributed communication protocol for real-time planning. The proposed algorithm integrates sampling-based motion planners with message-passing protocols for distributed constraint optimization. Each vehicle uses the motion planner to generate candidate feasible trajectories and the message-passing protocol to select a safe and compatible trajectory. The existence of such trajectories is guaranteed by the overall approach. The theoretical results have also been experimentally confirmed with a distributed simulator built on a cluster of processors, using applications such as coordinated exploration. Furthermore, the experiments show that the distributed protocol has better scalability properties than the priority-based scheme.
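
The planner/protocol integration described here can be caricatured in a few lines: each vehicle samples candidate trajectories (keeping a guaranteed-safe contingency such as staying put), and a coordination step selects one candidate per vehicle so that all pairs stay separated. The sketch below uses a 1-D toy world and a centralized stand-in for the distributed message-passing step; all names, dynamics, and thresholds are illustrative assumptions, not the paper's planner:

import itertools, random

HORIZON, MIN_SEP = 5, 1.0

def candidates(x0, n=4):
    """Sample candidate trajectories; index 0 is a 'stay put' contingency,
    standing in for the always-safe maneuver the safety argument relies on."""
    trajs = [[x0] * (HORIZON + 1)]
    for _ in range(n - 1):
        x, traj = x0, [x0]
        for _ in range(HORIZON):
            x += random.uniform(-1, 1)
            traj.append(x)
        trajs.append(traj)
    return trajs

def compatible(t1, t2):
    return all(abs(a - b) >= MIN_SEP for a, b in zip(t1, t2))

starts = [0.0, 3.0, 6.0]
cands = [candidates(x) for x in starts]
# Centralized stand-in for the distributed constraint-optimization step:
# pick one candidate per vehicle so every pair stays separated. The
# contingency trajectories guarantee at least one selection exists.
for choice in itertools.product(*[range(len(c)) for c in cands]):
    picked = [cands[i][j] for i, j in enumerate(choice)]
    if all(compatible(a, b) for a, b in itertools.combinations(picked, 2)):
        print("compatible selection:", choice)
        break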
Multiagent Reinforcement Learning for Urban Traffic Control using Coordination Graphs
- In Proceedings of the 19th European Conference on Machine Learning, 2008
"... Abstract. Since traffic jams are ubiquitous in the modern world, optimizing the behavior of traffic lights for efficient traffic flow is a critically important goal. Though most current traffic lights use simple heuristic protocols, more efficient controllers can be discovered automatically via mult ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
(Show Context)
Since traffic jams are ubiquitous in the modern world, optimizing the behavior of traffic lights for efficient traffic flow is a critically important goal. Though most current traffic lights use simple heuristic protocols, more efficient controllers can be discovered automatically via multiagent reinforcement learning, where each agent controls a single traffic light. However, in previous work on this approach, agents select only locally optimal actions without coordinating their behavior. This paper extends this approach to include explicit coordination between neighboring traffic lights. Coordination is achieved using the max-plus algorithm, which estimates the optimal joint action by sending locally optimized messages among connected agents. This paper presents the first application of max-plus to a large-scale problem and thus verifies its efficacy in realistic settings. It also provides empirical evidence that max-plus performs well on cyclic graphs, though it has been proven to converge only for tree-structured graphs. Furthermore, it provides a new understanding of the properties a traffic network must have for such coordination to be beneficial and shows that max-plus outperforms previous methods on networks that possess those properties. Keywords: multiagent systems, reinforcement learning, coordination graphs, max-plus, traffic control.
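
Max-plus itself is compact enough to sketch. On a coordination graph with pairwise payoffs f_ij(a_i, a_j), each agent i sends neighbor j a message m_ij(a_j) = max over a_i of [f_ij(a_i, a_j) + sum of i's other incoming messages], then acts on the sum of its own incoming messages. The toy three-agent chain below (invented payoffs, not the paper's traffic network) converges to the optimal joint action:

# Minimal max-plus sketch on a small coordination graph.
ACTIONS = range(2)
EDGES = [(0, 1), (1, 2)]  # a 3-agent chain (a tree, so convergence is exact)
f = {(0, 1): [[3, 0], [0, 1]], (1, 2): [[1, 0], [0, 2]]}

def payoff(i, j, ai, aj):
    return f[(i, j)][ai][aj] if (i, j) in f else f[(j, i)][aj][ai]

neighbors = {i: [j for e in EDGES for i2, j in (e, e[::-1]) if i2 == i]
             for i in range(3)}
m = {(i, j): [0.0, 0.0] for e in EDGES for i, j in (e, e[::-1])}

for _ in range(10):  # a few sweeps suffice on a tree
    for (i, j) in list(m):
        for aj in ACTIONS:
            m[(i, j)][aj] = max(
                payoff(i, j, ai, aj)
                + sum(m[(k, i)][ai] for k in neighbors[i] if k != j)
                for ai in ACTIONS)

# Each agent picks the action maximizing its incoming messages.
joint = [max(ACTIONS, key=lambda a: sum(m[(k, i)][a] for k in neighbors[i]))
         for i in range(3)]
print(joint)  # [0, 0, 0], the optimal joint action for these payoffs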
Automated design of adaptive controllers for modular robots using reinforcement learning
- In International Journal of Robotics Research, Special Issue on Self-Reconfigurable Modular Robots, 2007
"... Designing distributed controllers for self-reconfiguring modular robots has been consistently challenging. We have developed a reinforcement learning approach which can be used both to automate controller design and to adapt robot behavior online. In this paper, we report on our study of reinforceme ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
(Show Context)
Designing distributed controllers for self-reconfiguring modular robots has been consistently challenging. We have developed a reinforcement learning approach that can be used both to automate controller design and to adapt robot behavior online. In this paper, we report on our study of reinforcement learning in the domain of self-reconfigurable modular robots: the underlying assumptions, the applicable algorithms, and the issues of partial observability, large search spaces, and local optima. We propose, and validate experimentally in simulation, a number of techniques designed to address these and other scalability issues that arise in applying machine learning to distributed systems such as modular robots. We discuss ways to make learning faster, more robust, and amenable to online application by giving the learning agents scaffolding in the form of policy representation, structured experience, and additional information. With enough structure, modular robots can run learning algorithms both to automate the generation of distributed controllers and to adapt to changing environments, delivering on the promise of self-organization with less interference from human designers, programmers, and operators.
Decentralized Bayesian reinforcement learning for online agent collaboration
- In AAMAS, 2012
"... Solving complex but structured problems in a decentralized manner via multiagent collaboration has received much attention in recent years. This is natural, as on one hand, multiagent systems usually possess a structure that determines the allowable interactions among the agents; and on the other ha ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
(Show Context)
Solving complex but structured problems in a decentralized manner via multiagent collaboration has received much attention in recent years. This is natural: on one hand, multiagent systems usually possess a structure that determines the allowable interactions among the agents; on the other hand, the single most pressing need in a cooperative multiagent system is to coordinate the local policies of autonomous agents with restricted capabilities to serve a system-wide goal. The presence of uncertainty makes this even more challenging, as the agents face the additional need to learn the unknown environment parameters while forming (and following) local policies in an online fashion. In this paper, we provide the first Bayesian reinforcement learning (BRL) approach for distributed coordination and learning in a cooperative multiagent system by devising two solutions to this type of problem. More specifically, we show how the Value of Perfect Information (VPI) can be used to perform efficient decentralised exploration in both model-based and model-free BRL and, in the latter case, provide a closed-form solution for VPI, correcting a decade-old result by Dearden, Friedman and Russell. To evaluate these solutions, we present experimental results comparing their relative merits, and demonstrate empirically that both solutions outperform an existing multiagent learning method representative of the state of the art.
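
The VPI quantity at the center of this paper admits a simple sampling caricature. Under a belief over each action's Q-value, the gain from learning action a's true value q is m2 - q when a is the apparently best action and q falls below the second-best mean m2, or q - m1 when another action's value rises above the best mean m1. The sketch below estimates this expectation by Monte Carlo under Gaussian beliefs; the Gaussian assumption and the estimator are illustrative only, since the paper's contribution is a closed form, not this sampler:

import random

def vpi(means, stds, a, n=100_000):
    """Expected gain from learning the exact Q-value of action a."""
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    a1, a2 = order[0], order[1]          # best and second-best by mean
    total = 0.0
    for _ in range(n):
        q = random.gauss(means[a], stds[a])
        if a == a1:
            total += max(means[a2] - q, 0.0)  # best action revealed worse
        else:
            total += max(q - means[a1], 0.0)  # other action revealed better
    return total / n

means, stds = [1.0, 0.8], [0.5, 0.5]
for a in range(2):
    print(a, vpi(means, stds, a))
# Exploration rule of thumb: explore action a when VPI(a) outweighs the
# expected loss means[a1] - means[a] of deviating from the greedy choice.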
Exploiting structure in cooperative Bayesian games
- In UAI, 2012
"... Cooperative Bayesian games (BGs) can model decision-making problems for teams of agents under imperfect information, but require space and computation time that is exponential in the number of agents. While agent independence has been used to mitigate these problems in perfect information settings, ..."
Abstract
-
Cited by 7 (7 self)
- Add to MetaCart
Cooperative Bayesian games (BGs) can model decision-making problems for teams of agents under imperfect information, but require space and computation time that is exponential in the number of agents. While agent independence has been used to mitigate these problems in perfect-information settings, we propose a novel approach for BGs based on the observation that BGs additionally possess a different type of structure, which we call type independence. We propose a factor graph representation that captures both forms of independence and present a theoretical analysis showing that non-serial dynamic programming cannot effectively exploit type independence, while Max-Sum can. Experimental results demonstrate that our approach can tackle cooperative Bayesian games of unprecedented size.
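
Type independence is straightforward to render concrete: treat each (agent, type) pair as its own decision variable, so the payoff factor for a given joint type touches only the matching (agent, type) variables rather than coupling entire policies. The toy construction below (invented types, payoffs, and names) evaluates policies against such a factorization, with exhaustive search standing in for the Max-Sum pass that makes the approach scale:

from itertools import product

AGENTS, TYPES, ACTIONS = range(2), range(2), range(2)
P = {theta: 0.25 for theta in product(TYPES, repeat=2)}   # joint type dist.
u = lambda theta, acts: float(acts[0] == acts[1]) + 0.5 * theta[0]

# Variables of the factor graph: one per (agent, type) pair.
variables = [(i, t) for i in AGENTS for t in TYPES]
# One factor per joint type, scoped to the matching (agent, type) variables.
factors = [(theta, [(i, theta[i]) for i in AGENTS]) for theta in P]

def expected_utility(policy):
    """policy maps (agent, type) -> action; value sums factor contributions."""
    return sum(P[theta] * u(theta, tuple(policy[(i, theta[i])] for i in AGENTS))
               for theta, _scope in factors)

# Exhaustive search over per-(agent, type) actions -- feasible only at toy
# scale; Max-Sum on this factor graph is what scales, per the abstract.
best = max((dict(zip(variables, acts)) for acts in
            product(ACTIONS, repeat=len(variables))), key=expected_utility)
print(best, expected_utility(best))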
A distributed protocol for safe real-time planning of communicating vehicles with second-order dynamics
- In ROBOCOMM
"... Abstract—This work deals with the problem of planning in real-time, collision-free motions for multiple communicating vehicles that operate in the same, partially-observable environment. A challenging aspect of this problem is how to utilize communication so that vehicles do not reach states from wh ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
(Show Context)
This work deals with the problem of planning collision-free motions, in real time, for multiple communicating vehicles that operate in the same, partially observable environment. A challenging aspect of this problem is how to utilize communication so that vehicles do not reach states from which collisions cannot be avoided due to second-order motion constraints. This paper provides a distributed communication protocol for real-time planning that guarantees collision avoidance with obstacles and between vehicles. It can also allow the retention of a communication network when the vehicles operate as a networked team. The algorithm is a novel integration of sampling-based motion planners with message-passing protocols for distributed constraint optimization. Each vehicle uses the motion planner to generate candidate feasible trajectories and the message-passing protocol to select a safe and compatible trajectory. The existence of such trajectories is guaranteed by the overall approach. Experiments on a distributed simulator built on a cluster of processors confirm the safety properties of the approach in applications such as coordinated exploration. Furthermore, the distributed protocol has better scalability properties than typical priority-based schemes.
Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs (theoretical proofs), 2014
"... Researchers have introduced the Dynamic Distributed Con-straint Optimization Problem (Dynamic DCOP) formula-tion to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Exist ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
(Show Context)
Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) we introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) we introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) we empirically evaluate them against an existing multi-armed bandit DCOP algorithm on dynamic DCOPs.
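
For context, the tabular R-learning rule that Distributed R-learning builds on maintains average-adjusted values Q(s, a) and a running average-reward estimate rho: Q(s, a) is nudged toward r - rho + max over a' of Q(s', a'), and rho toward r + max Q(s', .) - max Q(s, .) on greedy steps. The single-agent sketch below illustrates that update; the paper's distributed, multi-agent coordination is omitted, and the state names are hypothetical:

from collections import defaultdict

ALPHA, BETA = 0.1, 0.01
Q = defaultdict(float)   # Q[(state, action)]: average-adjusted value
rho = 0.0                # running estimate of the average reward

def r_learning_step(s, a, r, s_next, actions):
    """One transition (s, a, r, s_next); actions is the action set."""
    global rho
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    best_here = max(Q[(s, a2)] for a2 in actions)
    was_greedy = Q[(s, a)] == best_here
    # Average-adjusted temporal-difference update:
    Q[(s, a)] += ALPHA * (r - rho + best_next - Q[(s, a)])
    # Update the average-reward estimate only on greedy steps:
    if was_greedy:
        rho += BETA * (r + best_next - best_here - rho)

# Example transition in a toy MD-DCOP-like loop (hypothetical states):
r_learning_step(s="assign_A", a=0, r=1.0, s_next="assign_B", actions=[0, 1])
print(dict(Q), rho)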