Results 1–10 of 20
Scaling Up Optimal Heuristic Search in Dec-POMDPs via Incremental Expansion
Cited by 21 (14 self)
Abstract: Planning under uncertainty for multiagent systems can be formalized as a decentralized partially observable Markov decision process. We advance the state of the art for optimal solution of this model, building on the Multiagent A* heuristic search method. A key insight is that we can avoid the full expansion of a search node that generates a number of children that is doubly exponential in the node's depth. Instead, we incrementally expand the children only when a next child might have the highest heuristic value. We target a subsequent bottleneck by introducing a more memory-efficient representation for our heuristic functions. Proof is given that the resulting algorithm is correct and experiments demonstrate a significant speedup over the state of the art, allowing for optimal solutions over longer horizons for many benchmark problems.
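The incremental-expansion idea in this abstract can be illustrated with a lazy best-first search: each open-list entry keeps an iterator over a node's children, and the next child is generated only when its parent tops the open list, i.e. when that child might have the highest heuristic value. This is a minimal sketch under assumed interfaces (`children`, `value`, `is_goal` are illustrative names, not from the paper), and it uses the parent's own heuristic as the upper bound for its remaining children:

```python
import heapq
import itertools

def lazy_best_first(root, children, value, is_goal):
    """Best-first search with incremental child expansion.

    children(node) -> iterable of child nodes (possibly very large).
    value(node)    -> admissible upper bound on achievable value (maximized).
    A node is re-inserted after yielding one child, so the rest of its
    children are generated only if the node surfaces again.
    Returns (goal_node_or_None, number_of_children_generated).
    """
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(-value(root), next(counter), root, iter(children(root)))]
    generated = 0
    while heap:
        neg_v, _, node, kids = heapq.heappop(heap)
        if is_goal(node):
            return node, generated
        child = next(kids, None)
        if child is None:
            continue  # node fully expanded; drop it
        generated += 1
        # Parent goes back with its remaining (still lazy) children.
        heapq.heappush(heap, (neg_v, next(counter), node, kids))
        heapq.heappush(heap, (-value(child), next(counter), child,
                              iter(children(child))))
    return None, generated
```

On a toy search over strings of length 3 with an optimistic heuristic, the goal is found after generating only a fraction of the 14 possible children, which is the point of incremental expansion.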
Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs
2013
Cited by 19 (12 self)
Abstract: This article presents the state of the art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node's depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
Decentralized Control of Partially Observable Markov Decision Processes
"... Abstract — Markov decision processes (MDPs) are often used to model sequential decision problems involving uncertainty under the assumption of centralized control. However, many large, distributed systems do not permit centralized control due to communication limitations (such as cost, latency or co ..."
Abstract

Cited by 13 (8 self)
 Add to MetaCart
(Show Context)
Abstract — Markov decision processes (MDPs) are often used to model sequential decision problems involving uncertainty under the assumption of centralized control. However, many large, distributed systems do not permit centralized control due to communication limitations (such as cost, latency or corruption). This paper surveys recent work on decentralized control of MDPs in which control of each agent depends on a partial view of the world. We focus on a general framework where there may be uncertainty about the state of the environment, represented as a decentralized partially observable MDP (DecPOMDP), but consider a number of subclasses with different assumptions about uncertainty and agent independence. In these models, a shared objective function is used, but plans of action must be based on a partial view of the environment. We describe the frameworks, along with the complexity of optimal control and important properties. We also provide an overview of exact and approximate solution methods as well as relevant applications. This survey provides an introduction to what has become an active area of research on these models and their solutions. I.
Scaling up decentralized MDPs through heuristic search
In UAI, 2012
Cited by 12 (3 self)
Abstract: Decentralized partially observable Markov decision processes (Dec-POMDPs) are rich models for cooperative decision-making under uncertainty, but are often intractable to solve optimally (NEXP-complete). The transition and observation independent Dec-MDP is a general subclass that has been shown to have complexity in NP, but optimal algorithms for this subclass are still inefficient in practice. In this paper, we first provide an updated proof that an optimal policy does not depend on the histories of the agents, but only the local observations. We then present a new algorithm based on heuristic search that is able to expand search nodes by using constraint optimization. We show experimental results comparing our approach with the state-of-the-art Dec-MDP and Dec-POMDP solvers. These results show a reduction in computation time and an increase in scalability by multiple orders of magnitude in a number of benchmarks.
Sufficient Plan-Time Statistics for Decentralized POMDPs
In IJCAI, 2013
Cited by 9 (1 self)
Abstract: Optimal decentralized decision making in a team of cooperative agents as formalized by decentralized POMDPs is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that they need to base their actions on their histories of observations. A consequence is that even during offline planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to have a dependence on the decisions made up to that point. This paper makes a contribution to the theory of decentralized POMDPs by showing how this dependence on the 'past joint policy' can be replaced by a sufficient statistic. These results are extended to the case of k-step delayed communication. The paper investigates the practical implications, as well as the effectiveness of a new pruning technique for MAA* methods, in a number of benchmark problems and discusses future avenues of research opened by these contributions.
Periodic finite state controllers for efficient POMDP and DEC-POMDP planning
In Proc. of the 25th Annual Conf. on Neural Information Processing Systems, 2011
Cited by 8 (1 self)
Abstract: Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it to an initial periodic infinite-horizon policy. This policy is optimized by a new infinite-horizon algorithm to yield deterministic periodic policies, and by a new expectation maximization algorithm to yield stochastic periodic policies. Our method yields better results than earlier planning methods and can compute larger solutions than with regular FSCs.
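The layered structure described in this abstract is easy to picture as a data structure: controller nodes are grouped into layers, and an observation transition always points into the next layer, wrapping from the last layer back to the first. A minimal sketch of that structure (class and field names are illustrative assumptions, not the paper's API):

```python
class PeriodicFSC:
    """Periodic finite state controller.

    layers: list of layers; each layer is a list of nodes, and each node
    is a pair (action, {observation: next_node_index}), where the index
    refers to a node in the *next* layer. After the last layer, execution
    wraps around to layer 0, giving a periodic infinite-horizon policy.
    """

    def __init__(self, layers):
        self.layers = layers

    def step(self, layer_idx, node_idx, observation):
        """Return (action at the current node, next layer, next node)."""
        action, transitions = self.layers[layer_idx][node_idx]
        next_layer = (layer_idx + 1) % len(self.layers)  # periodic wrap
        return action, next_layer, transitions[observation]
```

For example, a two-layer controller that alternates between a "listen" node and an "open" node cycles back to layer 0 after the second step, regardless of the observations received.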
Planning with macro-actions in decentralized POMDPs
In Proceedings of the Thirteenth International Conference on Autonomous Agents and Multiagent Systems, 2014
Cited by 5 (3 self)
Abstract: Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions which may require different amounts of time to execute. We model macro-actions as options in a factored Dec-POMDP model, focusing on options which depend only on information available to an individual agent while executing. This enables us to model systems where coordination decisions only occur at the level of deciding which macro-actions to execute, and the macro-actions themselves can then be executed to completion. The core technical difficulty when using options in a Dec-POMDP is that the options chosen by the agents no longer terminate at the same time. We present extensions of two leading Dec-POMDP algorithms for generating a policy with options and discuss the resulting form of optimality. Our results show that these algorithms retain agent coordination while allowing near-optimal solutions to be generated for significantly longer horizons and larger state spaces than previous Dec-POMDP methods.
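The asynchronous-termination difficulty mentioned above can be demonstrated with a toy simulator: each agent executes its current macro-action to completion, and because macro-actions have different durations, agents reach their next high-level choice at different time steps. This is a hypothetical sketch (the option names and fixed-duration encoding are illustrative, not the paper's model):

```python
from collections import deque

# Each macro-action expands to a fixed sequence of primitive actions;
# the differing lengths are what make termination asynchronous.
OPTIONS = {
    "go_left":  ["move_l", "move_l"],            # 2 primitive steps
    "go_right": ["move_r", "move_r", "move_r"],  # 3 primitive steps
    "wait":     ["noop"],                        # 1 primitive step
}

def run(agent_plans, steps):
    """Execute each agent's queue of macro-actions for `steps` ticks.

    agent_plans[i] is agent i's ordered list of macro-action names.
    When an agent's current macro-action finishes, only that agent pulls
    its next macro-action; the others keep executing theirs. Returns the
    per-step joint primitive actions.
    """
    queues = [deque(plan) for plan in agent_plans]
    buffers = [deque() for _ in agent_plans]  # remaining primitive actions
    history = []
    for _ in range(steps):
        joint = []
        for i, buf in enumerate(buffers):
            if not buf:  # macro-action terminated: this agent re-plans alone
                buf.extend(OPTIONS[queues[i].popleft()])
            joint.append(buf.popleft())
        history.append(tuple(joint))
    return history
```

Running two agents whose macro-actions take two and three steps respectively shows agent 0 switching macro-actions at step 2 while agent 1 is still mid-execution.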
Toward Error-bounded Algorithms for Infinite-Horizon DEC-POMDPs
Cited by 2 (2 self)
Abstract: Over the past few years, attempts to scale up infinite-horizon DEC-POMDPs with discounted rewards are mainly due to approximate algorithms, but without the theoretical guarantees of their exact counterparts. In contrast, ε-optimal methods have only theoretical significance but are not efficient in practice. In this paper, we introduce an algorithmic framework (β-PI) that exploits the scalability of the former while preserving the theoretical properties of the latter. We build upon β-PI a family of approximate algorithms that can find (provably) error-bounded solutions in reasonable time. Among this family, H-PI uses a branch-and-bound search method that computes a near-optimal solution over distributions over histories experienced by the agents. These distributions often lie near a structured, low-dimensional subspace embedded in the high-dimensional sufficient statistic. By planning only on this subspace, H-PI successfully solves all tested benchmarks, outperforming standard algorithms, both in solution time and policy quality.
Tree-based solution methods for multiagent POMDPs with delayed communication
In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012
Cited by 2 (1 self)
Abstract: Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcasted information is delayed by at most one time step. This model allows agents to act on their most recent (private) observation. Such an assumption is a strict generalization over having agents wait until the global information is available and is more appropriate for applications in which response time is critical. In this setting, however, value function backups are significantly more costly, and naive application of incremental pruning, the core of many state-of-the-art optimal POMDP techniques, is intractable. In this paper, we overcome this problem by demonstrating that computation of the MPOMDP-DC backup can be structured as a tree and by introducing two novel tree-based pruning techniques that exploit this structure in an effective way. We experimentally show that these methods have the potential to outperform naive incremental pruning by orders of magnitude, allowing for the solution of larger problems.
Error-bounded approximations for infinite-horizon discounted decentralized POMDPs
In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, 2014
Cited by 1 (0 self)
Abstract: We address decentralized stochastic control problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). This formalism provides a general model for decision-making under uncertainty in cooperative, decentralized settings, but the worst-case complexity makes it difficult to solve optimally (NEXP-complete). Recent advances suggest recasting Dec-POMDPs into continuous-state and deterministic MDPs. In this form, however, states and actions are embedded into high-dimensional spaces, making accurate estimation of states and greedy selection of actions intractable for all but trivial-sized problems. The primary contribution of this paper is the first framework for error-monitoring during approximate estimation of states and selection of actions. Such a framework permits us to convert state-of-the-art exact methods into error-bounded algorithms, which results in a scalability increase as demonstrated by experiments over problems of unprecedented sizes.