Results 11 to 20 of 93
Point-based dynamic programming for DEC-POMDPs
 In Proc. of the National Conference on Artificial Intelligence, 2006
Abstract

Cited by 35 (2 self)
We introduce point-based dynamic programming (DP) for decentralized partially observable Markov decision processes (DEC-POMDPs), a new discrete DP algorithm for planning strategies for cooperative multiagent systems. Our approach makes a connection between optimal DP algorithms for partially observable stochastic games, and point-based approximations for single-agent POMDPs. We show for the first time how relevant multiagent belief states can be computed. Building on this insight, we then show how the linear programming part in current multiagent DP algorithms can be avoided, and how multiagent DP can thus be applied to solve larger problems. We derive both an optimal and an approximate version of our algorithm, and we show its efficiency on test examples from the literature.
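The point-based approximations referenced above build on the standard single-agent POMDP belief update. A minimal sketch (the 2-state model numbers and the beliefs below are invented for illustration, not taken from the paper):

```python
import numpy as np

# Toy 2-state model (all numbers invented).
T = np.array([[0.7, 0.3],    # T[s, s'] = P(s' | s, a) for a fixed action a
              [0.4, 0.6]])
Z = np.array([0.9, 0.2])     # Z[s'] = P(o | s') for a fixed observation o

def belief_update(b):
    """Bayes filter step: b'(s') is proportional to P(o|s') * sum_s P(s'|s,a) b(s)."""
    unnormalized = Z * (b @ T)
    return unnormalized / unnormalized.sum()

b_next = belief_update(np.array([0.5, 0.5]))
print(b_next)  # posterior belief; concentrates on the state favored by o
```

Point-based methods evaluate the value function only at a finite set of such reachable beliefs rather than over the whole belief simplex.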
Interaction-driven Markov games for decentralized multiagent planning under uncertainty
 In Proc. AAMAS, 2008
Abstract

Cited by 34 (10 self)
In this paper we propose interaction-driven Markov games (IDMGs), a new model for multiagent decision making under uncertainty. IDMGs aim at describing multiagent decision problems in which interaction among agents is a local phenomenon. To this end, we explicitly distinguish between situations in which agents should interact and situations in which they can afford to act independently. The agents are coupled through the joint rewards and joint transitions in the states in which they interact. The model combines several fundamental properties from transition-independent Dec-MDPs and weakly coupled MDPs while allowing us to address, in several respects, more general problems. We introduce a fast approximate solution method for planning in IDMGs, exploiting their particular structure, and we illustrate its successful application on several large multiagent tasks.
Policy iteration for decentralized control of Markov decision processes
 JAIR
Abstract

Cited by 32 (19 self)
Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents’ actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.
Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs
Abstract

Cited by 30 (15 self)
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements of current algorithms is based on representing agent policies as finite-state controllers. In this paper, we propose a new approach that uses this representation and formulates the problem as a nonlinear program (NLP). The NLP defines an optimal policy of a desired size for each agent. This new representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.
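As a rough illustration of the NLP idea (not the paper's actual formulation), one can hand an off-the-shelf solver the action probabilities of a fixed-size controller as decision variables. The sketch below optimizes a memoryless stochastic policy for an invented 2-state, 2-action toy MDP; all model numbers are made up:

```python
import numpy as np
from scipy.optimize import minimize

# Toy model: p = (p(a0), p(a1)) is a one-node stochastic controller whose
# action probabilities are the NLP's decision variables.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[a, s, s']
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],                 # R[a, s]
              [0.0, 2.0]])
gamma = 0.9
b0 = np.array([1.0, 0.0])                 # initial state distribution

def value(p):
    Tp = p[0] * T[0] + p[1] * T[1]        # policy-averaged dynamics
    Rp = p[0] * R[0] + p[1] * R[1]        # policy-averaged reward
    v = np.linalg.solve(np.eye(2) - gamma * Tp, Rp)  # Bellman fixed point
    return b0 @ v

res = minimize(lambda p: -value(p), x0=[0.5, 0.5],
               bounds=[(0.0, 1.0)] * 2,
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}])
print(res.x, -res.fun)  # optimized action probabilities and their value
```

The same pattern scales to the partially observable case by adding node-transition probabilities as further variables, at the cost of a nonconvex objective.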
DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks
Abstract

Cited by 28 (6 self)
Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.
Analyzing Myopic Approaches for Multi-Agent Communication
 In Proc., 2005
Abstract

Cited by 28 (5 self)
Choosing when to communicate is a fundamental problem in multiagent systems. This problem becomes particularly challenging when communication is constrained and each agent has different partial information about the overall situation. We take a decision-theoretic approach to this problem that balances the benefits of communication against the costs. Although computing the exact value of communication is intractable, it can be estimated using a standard myopic assumption: that communication is only possible at the present time. We examine specific situations in which this assumption leads to poor performance and demonstrate an alternative approach that relaxes the assumption and improves performance. The results provide an effective method for value-driven communication policies in multiagent systems. Key words: multiagent systems, decentralized MDPs, communication, decision-theoretic planning.
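The myopic test described above reduces to a simple comparison once the two expected values have been estimated. A toy sketch (all values invented, not taken from the paper):

```python
# Myopic value-of-communication test: communicate iff the expected gain
# from synchronizing beliefs now exceeds the message cost, under the
# myopic assumption that no later chance to communicate exists.
V_after_sync = 10.0   # expected team value if the agent communicates now
V_silent = 8.5        # expected team value if it stays silent
message_cost = 1.0

value_of_communication = V_after_sync - V_silent
communicate = value_of_communication > message_cost
print(communicate)  # True: the 1.5 expected gain outweighs the 1.0 cost
```

The failure mode the paper examines arises when deferring communication would be even better, which the myopic assumption rules out by construction.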
Lossless clustering of histories in decentralized POMDPs
 In Proc. of the Eighth Int. Joint Conf. on Autonomous Agents and Multiagent Systems, 2009
Abstract

Cited by 26 (7 self)
Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute a generic and expressive framework for multiagent planning under uncertainty. However, planning optimally is difficult because solutions map local observation histories to actions, and the number of such histories grows exponentially in the planning horizon. In this work, we identify a criterion that allows for lossless clustering of observation histories: i.e., we prove that when two histories satisfy the criterion, they have the same optimal value and thus can be treated as one. We show how this result can be exploited in optimal policy search and demonstrate empirically that it can provide a speedup of multiple orders of magnitude, allowing the optimal solution of significantly larger problems. We also perform an empirical analysis of the generality of our clustering method, which suggests that it may also be useful in other (approximate) Dec-POMDP solution methods.
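The clustering idea can be approximated in code by bucketing histories on the distribution they induce. A hypothetical sketch (the histories and beliefs are invented, and the paper's full criterion also compares distributions over the other agents' histories, which this simplification omits):

```python
from collections import defaultdict
import numpy as np

# Invented example: each observation history maps to the belief it induces.
histories = {
    ("o1",):       np.array([0.8, 0.2]),
    ("o2",):       np.array([0.3, 0.7]),
    ("o1", "o1"):  np.array([0.8, 0.2]),  # induces the same belief as ("o1",)
}

def cluster_histories(histories, decimals=10):
    """Group histories whose belief signatures match, so they can be merged."""
    clusters = defaultdict(list)
    for history, belief in histories.items():
        key = tuple(np.round(belief, decimals))   # hashable signature
        clusters[key].append(history)
    return list(clusters.values())

clusters = cluster_histories(histories)
print(clusters)  # ("o1",) and ("o1", "o1") end up in the same cluster
```

Merging such histories shrinks the policy search space without losing value, which is the source of the reported speedups.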
Not all agents are equal: Scaling up distributed POMDPs for agent networks
 In: Proceedings of the seventh international, 2008
Abstract

Cited by 26 (4 self)
Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far, only limited scale-ups of up to five agents have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS, increasing the number of agents in distributed POMDPs for the first time into double digits. FANS is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions: (i) not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics to automatically vary the FSM size in different agents for scale-up;
Decentralized planning under uncertainty for teams of communicating agents
 In Proc. AAMAS, 2006
Abstract

Cited by 23 (4 self)
Decentralized partially observable Markov decision processes (DEC-POMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and partially observable environment. Unfortunately, computing optimal plans in a DEC-POMDP has been shown to be intractable (NEXP-complete), and approximate algorithms for specific subclasses have been proposed. Many of these algorithms rely on an (approximate) solution of the centralized planning problem (i.e., treating the whole team as a single agent). We take a more decentralized approach, in which each agent only reasons over its own local state and some uncontrollable state features, which are shared by all team members. In contrast to other approaches, we model communication as an integral part of the agent’s reasoning, in which the meaning of a message is directly encoded in the policy of the communicating agent. We explore iterative methods for approximately solving such models, and we conclude with some encouraging preliminary experimental results.
Point-based backup for decentralized POMDPs: Complexity and new algorithms
 In AAMAS, 2010
Abstract

Cited by 21 (1 self)
Decentralized POMDPs provide an expressive framework for sequential multiagent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multiagent setting is NP-complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The poly-time approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.
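For reference, the single-agent ancestor of the backup operation discussed above can be sketched as a PBVI-style backup at one belief point (all model numbers are invented; this is not the paper's multi-agent algorithm, whose joint-policy choice is what makes the step NP-complete):

```python
import numpy as np

# Toy model: 2 states, 2 actions, 2 observations (numbers invented).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # T[a, s, s']
              [[0.5, 0.5], [0.6, 0.4]]])
Z = np.array([[0.9, 0.2],                 # Z[o, s'] = P(o | s')
              [0.1, 0.8]])
R = np.array([[1.0, 0.0],                 # R[a, s]
              [0.0, 2.0]])
gamma = 0.95
alphas = [np.array([1.0, 0.0]), np.array([0.0, 1.5])]  # current value function

def point_based_backup(b, alphas):
    """Return the alpha-vector generated by backing up at belief b."""
    best = None
    for a in range(2):
        vec = R[a].copy()
        for o in range(2):
            # g_i(s) = sum_s' P(o|s') T(s'|s,a) alpha_i(s') for each alpha_i
            gs = [T[a] @ (Z[o] * alpha) for alpha in alphas]
            vec = vec + gamma * max(gs, key=lambda g: g @ b)
        if best is None or vec @ b > best @ b:
            best = vec
    return best

b = np.array([0.6, 0.4])
print(point_based_backup(b, alphas))
```

In the decentralized setting the inner maximization ranges over joint sub-policies rather than independent per-observation choices, which is where the hardness result above bites.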