Results 1–10 of 95
Optimal and approximate Q-value functions for decentralized POMDPs
 J. Artificial Intelligence Research
Abstract

Cited by 62 (26 self)
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
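As background for the single-agent case this abstract builds on, here is a minimal sketch of the recursive Q-value computation it describes: an optimal Q-value function is computed by backward dynamic programming on a finite-horizon MDP, and a greedy policy is extracted from it. The MDP below is a toy example of my own; none of its numbers come from the paper.

```python
# Toy finite-horizon MDP (illustrative, not from the paper):
# two states, two actions; P[(s, a)] lists (next_state, prob) pairs.
P = {
    (0, 0): [(0, 1.0)], (0, 1): [(1, 1.0)],
    (1, 0): [(0, 0.5), (1, 0.5)], (1, 1): [(1, 1.0)],
}
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}
H = 3  # planning horizon

# Backward induction: Q[(s, a)] at stage t is the immediate reward plus
# the expected optimal value-to-go; the greedy policy is read off Q.
V_next = {s: 0.0 for s in (0, 1)}
policy = []
for t in reversed(range(H)):
    Q = {(s, a): R[(s, a)] + sum(p * V_next[s2] for s2, p in P[(s, a)])
         for s in (0, 1) for a in (0, 1)}
    V_next = {s: max(Q[(s, 0)], Q[(s, 1)]) for s in (0, 1)}
    policy.insert(0, {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1)})
```

The paper's point is precisely that this clean state-based recursion has no direct analogue in the decentralized setting, where agents act on private observation histories.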
Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping
Abstract

Cited by 43 (16 self)
Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single-agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.
Incremental Policy Generation for Finite-Horizon DEC-POMDPs
Abstract

Cited by 33 (21 self)
Solving multiagent planning problems modeled as DEC-POMDPs is an important challenge. These models are often solved by using dynamic programming, but the high resource usage of current approaches results in limited scalability. To improve the efficiency of dynamic programming algorithms, we propose a new backup algorithm that is based on a reachability analysis of the state space. This method, which we call incremental policy generation, can be used to produce an optimal solution for any possible initial state, or further scalability can be achieved by making use of a known start state. When incorporated into the optimal dynamic programming algorithm, our experiments show that the planning horizon can be increased due to a marked reduction in resource consumption. This approach also fits nicely with approximate dynamic programming algorithms. To demonstrate this, we incorporate it into the state-of-the-art PBIP algorithm and show significant performance gains. The results suggest that the performance of other dynamic programming algorithms for DEC-POMDPs could be similarly improved by integrating the incremental policy generation approach.
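The reachability idea underlying this abstract can be sketched in a few lines: from a known start state, only states reachable under some action sequence ever need to be considered during backups. The transition model and function names below are my own illustrative assumptions, not the paper's code.

```python
# Illustrative reachability analysis: a breadth-first traversal of the
# state space from a known start state, marking every state that can be
# reached under some sequence of actions.
from collections import deque

def reachable_states(start, transitions):
    """transitions[s] -> set of possible successor states (under any action)."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        for s2 in transitions.get(s, ()):
            if s2 not in seen:
                seen.add(s2)
                frontier.append(s2)
    return seen

# Toy model: states 3 and 4 can never be reached from state 0,
# so no policy backup needs to account for them.
T = {0: {1}, 1: {1, 2}, 2: {0}, 3: {4}}
result = reachable_states(0, T)
```

Restricting backups to `result` rather than the full state set is, in spirit, where the resource savings the abstract reports come from.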
Policy iteration for decentralized control of Markov decision processes
 JAIR
Abstract

Cited by 32 (19 self)
Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.
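To make the finite-state-controller representation concrete, here is a small sketch (toy model and numbers of my own, and deterministic rather than stochastic for brevity) of evaluating such a controller: each controller node selects an action, observations drive node transitions, and the value of every (node, state) pair satisfies a Bellman-style equation solved by fixed-point iteration.

```python
# Toy POMDP: two states, two actions, observations reveal the next state.
# Controller: node q picks action q; after observing o, move to node o.
S, Q_NODES = (0, 1), (0, 1)
gamma = 0.9

def T(s, a):        # transitions: action 1 flips the state, action 0 stays
    return s if a == 0 else 1 - s

def R(s, a):        # reward for occupying state 1
    return 1.0 if s == 1 else 0.0

def psi(q):         # action selection at node q (deterministic here)
    return q

def eta(q, a, o):   # node transition after taking a and observing o
    return o

# Fixed-point iteration on V(q, s), the value of running the controller
# from node q when the hidden state is s.
V = {(q, s): 0.0 for q in Q_NODES for s in S}
for _ in range(300):
    V = {(q, s): R(s, psi(q))
                 + gamma * V[(eta(q, psi(q), T(s, psi(q))), T(s, psi(q)))]
         for q in Q_NODES for s in S}
```

The policy iteration in the paper alternates steps like this evaluation with controller growth and the value-preserving transformations it introduces; the fully stochastic version simply replaces `psi` and `eta` with probability distributions.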
Optimizing Fixed-Size Stochastic Controllers for POMDPs and Decentralized POMDPs
Abstract

Cited by 30 (15 self)
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their computational complexity, however, presents an important research challenge. One approach that effectively addresses the intractable memory requirements of current algorithms is based on representing agent policies as finite-state controllers. In this paper, we propose a new approach that uses this representation and formulates the problem as a nonlinear program (NLP). The NLP defines an optimal policy of a desired size for each agent. This new representation allows a wide range of powerful nonlinear programming algorithms to be used to solve POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain using an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and it opens up promising research directions for solving POMDPs and DEC-POMDPs using nonlinear programming methods.
Lossless clustering of histories in decentralized POMDPs
 In Proc. of the Eighth Int. Joint Conf. on Autonomous Agents and Multiagent Systems
, 2009
Abstract

Cited by 26 (7 self)
Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute a generic and expressive framework for multiagent planning under uncertainty. However, planning optimally is difficult because solutions map local observation histories to actions, and the number of such histories grows exponentially in the planning horizon. In this work, we identify a criterion that allows for lossless clustering of observation histories: i.e., we prove that when two histories satisfy the criterion, they have the same optimal value and thus can be treated as one. We show how this result can be exploited in optimal policy search and demonstrate empirically that it can provide a speedup of multiple orders of magnitude, allowing the optimal solution of significantly larger problems. We also perform an empirical analysis of the generality of our clustering method, which suggests that it may also be useful in other (approximate) Dec-POMDP solution methods.
Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs
 In Proc. of the Eighth Int. Joint Conf. on Autonomous Agents and Multiagent Systems
, 2009
Abstract

Cited by 22 (6 self)
Recent scaling up of decentralized partially observable Markov decision process (DEC-POMDP) solvers towards realistic applications is mainly due to approximate methods. Of this family, MEMORY BOUNDED DYNAMIC PROGRAMMING (MBDP), which suitably combines top-down heuristics and bottom-up value function updates, can solve DEC-POMDPs with large horizons. The performance of MBDP can, however, be drastically improved by avoiding the systematic generation and evaluation of all possible policies that result from the exhaustive backup. To achieve this, we suggest a heuristic search method, POINT BASED INCREMENTAL PRUNING (PBIP), which distinguishes policies by their heuristic estimates. Taking this insight into account, PBIP searches only among the most promising policies, keeps the useful ones, and prunes dominated ones. Doing so markedly reduces the amount of computation required by the exhaustive backup. Computational experiments show that PBIP solves DEC-POMDP benchmarks up to 800 times faster than the current best approximate algorithms, while providing solutions with higher values.
Point-based backup for decentralized POMDPs: Complexity and new algorithms
 In AAMAS
, 2010
Abstract

Cited by 21 (1 self)
Decentralized POMDPs provide an expressive framework for sequential multiagent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multiagent setting is NP-complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The poly-time approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.
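For readers unfamiliar with the operation whose multiagent complexity this abstract analyzes, here is the standard single-agent point-based backup: given a set of alpha-vectors and a belief point, one backup produces the best new alpha-vector at that belief. The POMDP below is a toy of my own; the paper's contribution concerns the much harder decentralized version of this step.

```python
# Toy two-state, two-action, two-observation POMDP (illustrative numbers).
gamma = 0.95
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.5, 0.5], [0.5, 0.5]]]          # P[a][s][s']: transition model
Z = [[[0.8, 0.2], [0.2, 0.8]],
     [[0.5, 0.5], [0.5, 0.5]]]          # Z[a][s'][o]: observation model
R = [[1.0, 0.0], [0.0, 1.0]]            # R[a][s]: reward model

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def backup(b, alphas):
    """One point-based backup at belief b over the current alpha-vectors."""
    best = None
    for a in range(len(P)):
        vec = list(R[a])
        for o in range(len(Z[a][0])):
            # project each alpha-vector through action a / observation o
            g = [[sum(P[a][s][sp] * Z[a][sp][o] * alpha[sp] for sp in (0, 1))
                  for s in (0, 1)] for alpha in alphas]
            go = max(g, key=lambda v: dot(v, b))  # best projection at b
            vec = [vec[s] + gamma * go[s] for s in (0, 1)]
        if best is None or dot(vec, b) > dot(best, b):
            best = vec
    return best

b = [0.6, 0.4]
alpha_new = backup(b, [[1.0, 0.0], [0.0, 1.0]])
```

In the decentralized setting the inner maximization ranges over joint sub-policies rather than single alpha-vector projections, which is where the NP-completeness the paper proves comes from.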
Distributed model shaping for scaling to decentralized POMDPs with hundreds of agents
 In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS
, 2011
Abstract

Cited by 21 (10 self)
The use of distributed POMDPs for cooperative teams has been severely limited by the incredibly large joint policy space that results from combining the policy spaces of the individual agents. However, much of the computational cost of exploring the entire joint policy space can be avoided by observing that in many domains important interactions between agents occur in a relatively small set of scenarios, previously defined as coordination locales (CLs) [11]. Moreover, even when numerous interactions might occur, given a set of individual policies there are relatively few actual interactions. Exploiting this observation and building on an existing model shaping algorithm, this paper presents DTREMOR, an algorithm in which cooperative agents iteratively generate individual policies, identify and communicate possible