Results 1 -
7 of
7
Incremental Policy Generation for Finite-Horizon DEC-POMDPs
"... Solving multiagent planning problems modeled as DEC-POMDPs is an important challenge. These models are often solved by using dynamic programming, but the high resource usage of current approaches results in limited scalability. To improve the efficiency of dynamic programming algorithms, we propose ..."
Abstract
-
Cited by 8 (6 self)
- Add to MetaCart
Solving multiagent planning problems modeled as DEC-POMDPs is an important challenge. These models are often solved by using dynamic programming, but the high resource usage of current approaches results in limited scalability. To improve the efficiency of dynamic programming algorithms, we propose a new backup algorithm that is based on a reachability analysis of the state space. This method, which we call incremental policy generation, can be used to produce an optimal solution for any possible initial state or further scalability can be achieved by making use of a known start state. When incorporated into the optimal dynamic programming algorithm, our experiments show that planning horizon can be increased due to a marked reduction in resource consumption. This approach also fits nicely with approximate dynamic programming algorithms. To demonstrate this, we incorporate it into the state-of-the-art PBIP algorithm and show significant performance gains. The results suggest that the performance of other dynamic programming algorithms for DEC-POMDPs could be similarly improved by integrating the incremental policy generation approach.
Pointbased backup for decentralized POMDPs: Complexity and new algorithms
- In AAMAS
, 2010
"... Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of pointbased methods. Performing point-based backup is a fundamental operatio ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of pointbased methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.
Anytime planning for decentralized POMDPs using expectation maximization
- In UAI
, 2010
"... Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representi ..."
Abstract
-
Cited by 4 (4 self)
- Add to MetaCart
Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DEC-POMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. 1
Toward Error-bounded Algorithms for Infinite-Horizon DEC-POMDPs
"... Over the past few years, attempts to scale up infinite-horizon DEC-POMDPs with discounted rewards are mainly due to approximate algorithms, but without the theoretical guarantees of their exact counterparts. In contrast, ε-optimal methods have only theoretical significance but are not efficient in p ..."
Abstract
- Add to MetaCart
Over the past few years, attempts to scale up infinite-horizon DEC-POMDPs with discounted rewards are mainly due to approximate algorithms, but without the theoretical guarantees of their exact counterparts. In contrast, ε-optimal methods have only theoretical significance but are not efficient in practice. In this paper, we introduce an algorithmic framework (β-PI) that exploits the scalability of the former while preserving the theoretical properties of the latter. We build upon β-PI a family of approximate algorithms that can find (provably) error-bounded solutions in reasonable time. Among this family, H-PI uses a branch-and-bound search method that computes a near-optimal solution over distributions over histories experienced by the agents. These distributions often lie near structured, low-dimensional subspace embedded in the high-dimensional sufficient statistic. By planning only on this subspace, H-PI successfully solves all tested benchmarks, outperforming standard algorithms, both in solution time and policy quality. 1.
Trial-Based Dynamic Programming for Multi-Agent Planning
"... Trial-based approaches offer an efficient way to solve singleagent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programmi ..."
Abstract
- Add to MetaCart
Trial-based approaches offer an efficient way to solve singleagent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms.
Distribution over Beliefs for Memory Bounded Dec-POMDP Planning
"... We propose a new point-based method for approximate planning in Dec-POMDP which outperforms the state-of-the-art approaches in terms of solution quality. It uses a heuristic estimation of the prior probability of beliefs to choose a bounded number of policy trees: this choice is formulated as a comb ..."
Abstract
- Add to MetaCart
We propose a new point-based method for approximate planning in Dec-POMDP which outperforms the state-of-the-art approaches in terms of solution quality. It uses a heuristic estimation of the prior probability of beliefs to choose a bounded number of policy trees: this choice is formulated as a combinatorial optimisation problem minimising the error induced by pruning. 1

