Results 1–10 of 26
Incremental Policy Generation for Finite-Horizon DEC-POMDPs
Cited by 33 (21 self)
Abstract:
Solving multiagent planning problems modeled as DEC-POMDPs is an important challenge. These models are often solved by using dynamic programming, but the high resource usage of current approaches results in limited scalability. To improve the efficiency of dynamic programming algorithms, we propose a new backup algorithm that is based on a reachability analysis of the state space. This method, which we call incremental policy generation, can be used to produce an optimal solution for any possible initial state, or further scalability can be achieved by making use of a known start state. When incorporated into the optimal dynamic programming algorithm, our experiments show that the planning horizon can be increased due to a marked reduction in resource consumption. This approach also fits nicely with approximate dynamic programming algorithms. To demonstrate this, we incorporate it into the state-of-the-art PBIP algorithm and show significant performance gains. The results suggest that the performance of other dynamic programming algorithms for DEC-POMDPs could be similarly improved by integrating the incremental policy generation approach.
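The reachability analysis behind this idea can be sketched in a few lines (a minimal illustration, not the paper's algorithm; the state and transition names below are invented): only states reachable from a known start state within the horizon need to be considered during the backup, so policies for all other states can be skipped.

```python
from collections import deque

def reachable_states(start, transitions, horizon):
    """Breadth-first reachability analysis: collect every state that can
    occur within `horizon` steps of `start`.  Here `transitions[s]` maps
    a state to the set of possible successors under any joint
    action/observation; all names are illustrative."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if depth == horizon:
            continue
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

# Toy chain: with horizon 2, state "s3" is never reachable,
# so a backup restricted to reachable states can ignore it.
T = {"s0": {"s1"}, "s1": {"s2"}, "s2": {"s3"}}
```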
Scaling Up Optimal Heuristic Search in Dec-POMDPs via Incremental Expansion
Cited by 21 (14 self)
Abstract:
Planning under uncertainty for multiagent systems can be formalized as a decentralized partially observable Markov decision process. We advance the state of the art for optimal solution of this model, building on the Multiagent A* heuristic search method. A key insight is that we can avoid the full expansion of a search node that generates a number of children that is doubly exponential in the node's depth. Instead, we incrementally expand the children only when a next child might have the highest heuristic value. We target a subsequent bottleneck by introducing a more memory-efficient representation for our heuristic functions. Proof is given that the resulting algorithm is correct, and experiments demonstrate a significant speedup over the state of the art, allowing for optimal solutions over longer horizons for many benchmark problems.
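The incremental-expansion idea can be illustrated with a small sketch (a toy stand-in, not the paper's MAA* implementation): siblings of a search node are admitted to the open list one at a time, each only when it might be the next-best node, instead of materializing the whole child set at once. The sorted list below stands in for an on-demand child generator.

```python
import heapq

def incremental_best_first(children, heuristic):
    """Expand children in best-first order while keeping at most one
    pending sibling on the heap at a time.  `children` and `heuristic`
    are illustrative; a real search would generate children lazily
    rather than sorting them up front."""
    if not children:
        return []
    ordered = sorted(children, key=heuristic, reverse=True)
    heap = [(-heuristic(ordered[0]), 0)]  # (negated value, child index)
    expanded = []
    while heap:
        _, i = heapq.heappop(heap)
        expanded.append(ordered[i])
        if i + 1 < len(ordered):
            # Lazily admit the next sibling instead of all of them.
            heapq.heappush(heap, (-heuristic(ordered[i + 1]), i + 1))
    return expanded
```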
Point-based policy generation for decentralized POMDPs
In AAMAS, 2010
Cited by 12 (0 self)
Abstract:
Memory-bounded techniques have shown great promise in solving complex multiagent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids altogether searching the entire joint policy space. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of first producing a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves performance significantly in terms of both runtime and solution quality.
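The key observation — constructing the best choice for each belief point directly instead of enumerating a candidate pool — can be sketched with a one-step stand-in (a minimal illustration; `reward[s][a]` is an assumed joint-reward table, and the full algorithm builds policy trees, not single actions):

```python
def best_joint_action(belief, joint_actions, reward):
    """For one sampled belief point (dict state -> probability), pick
    the joint action maximizing expected immediate reward directly.
    All names are illustrative stand-ins for the point-based backup."""
    return max(joint_actions,
               key=lambda a: sum(p * reward[s][a] for s, p in belief.items()))

# Toy example: two states, two joint actions.
belief = {"s0": 0.5, "s1": 0.5}
reward = {"s0": {"a": 1.0, "b": 0.0}, "s1": {"a": 0.0, "b": 2.0}}
```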
Optimally solving Dec-POMDPs as continuous-state MDPs
In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013
Cited by 11 (4 self)
Abstract:
Optimally solving decentralized partially observable Markov decision processes (Dec-POMDPs) is a hard combinatorial problem. Current algorithms search through the space of full histories for each agent. Because of the doubly exponential growth in the number of policies in this space as the planning horizon increases, these methods quickly become intractable. However, in real-world problems, computing policies over the full history space is often unnecessary. True histories experienced by the agents often lie near a structured, low-dimensional manifold embedded in the history space. We show that by transforming a Dec-POMDP into a continuous-state MDP, we are able to find and exploit these low-dimensional representations. Using this novel transformation, we can then apply powerful techniques for solving POMDPs and continuous-state MDPs. By combining a general search algorithm and dimension reduction based on feature selection, we introduce a novel approach to optimally solve problems with significantly longer planning horizons than previous methods.
An investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs
In The Journal of Artificial Intelligence Research
Cited by 10 (0 self)
Abstract:
Decentralized planning in uncertain environments is a complex task generally dealt with by using a decision-theoretic approach, mainly through the framework of Decentralized Partially Observable Markov Decision Processes (DEC-POMDPs). Although DEC-POMDPs are a general and powerful modeling tool, solving them is a task with an overwhelming complexity that can be doubly exponential. In this paper, we study an alternate formulation of DEC-POMDPs relying on a sequence-form representation of policies. From this formulation, we show how to derive Mixed Integer Linear Programming (MILP) problems that, once solved, give exact optimal solutions to the DEC-POMDPs. We show that these MILPs can be derived either by using some combinatorial characteristics of the optimal solutions of the DEC-POMDPs or by using concepts borrowed from game theory. Through an experimental validation on classical test problems from the DEC-POMDP literature, we compare our approach to existing algorithms. Results show that mathematical programming outperforms dynamic programming but is less efficient than forward search, except for some particular problems. The main contributions of this work are the use of mathematical programming for DEC-POMDPs and a better understanding of DEC-POMDPs and of their solutions. In addition, we argue that our alternate representation of DEC-POMDPs could be helpful for designing novel algorithms looking for approximate solutions to DEC-POMDPs.
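The appeal of the sequence form can be made concrete with a back-of-the-envelope count (an illustrative sketch, not the paper's exact variable accounting): a single agent's deterministic policy trees grow doubly exponentially with the horizon, whereas the number of action-observation sequences grows only singly exponentially.

```python
def num_tree_policies(A, O, T):
    """Deterministic horizon-T policy trees for one agent with A actions
    and O observations: one action per node of a depth-T O-ary tree,
    i.e. A ** ((O**T - 1) // (O - 1)).  Assumes O > 1; illustrative
    count only."""
    return A ** ((O**T - 1) // (O - 1))

def num_sequences(A, O, T):
    """Rough count of sequence-form quantities: action-observation
    sequences of length up to T, sum_{t=1..T} A**t * O**(t-1)."""
    return sum(A**t * O**(t - 1) for t in range(1, T + 1))
```

For two actions, two observations, and horizon 3 this already gives 128 policy trees against 42 sequences, and the gap widens rapidly with the horizon.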
Sufficient Plan-Time Statistics for Decentralized POMDPs
In IJCAI, 2013
Cited by 9 (1 self)
Abstract:
Optimal decentralized decision making in a team of cooperative agents, as formalized by decentralized POMDPs, is a notoriously hard problem. A major obstacle is that the agents do not have access to a sufficient statistic during execution, which means that they need to base their actions on their histories of observations. A consequence is that even during offline planning the choice of decision rules for different stages is tightly interwoven: decisions of earlier stages affect how to act optimally at later stages, and the optimal value function for a stage is known to depend on the decisions made up to that point. This paper makes a contribution to the theory of decentralized POMDPs by showing how this dependence on the 'past joint policy' can be replaced by a sufficient statistic. These results are extended to the case of k-step delayed communication. The paper investigates the practical implications, as well as the effectiveness of a new pruning technique for MAA* methods, in a number of benchmark problems and discusses future avenues of research opened by these contributions.
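One common form of plan-time statistic is a distribution over (state, joint observation history) pairs induced by the past joint policy; a one-step update of such a statistic can be sketched as follows (all data-structure names are illustrative assumptions, not the paper's notation):

```python
from collections import defaultdict

def update_statistic(sigma, policy, trans, obs):
    """One step of a plan-time statistic update: sigma is a distribution
    over (state, joint observation history) pairs.  `policy` maps a
    history to a joint action; `trans[s][a]` and `obs[a][s2]` are
    next-state and joint-observation distributions.  Illustrative only."""
    nxt = defaultdict(float)
    for (s, hist), p in sigma.items():
        a = policy[hist]
        for s2, pt in trans[s][a].items():
            for o, po in obs[a][s2].items():
                nxt[(s2, hist + (o,))] += p * pt * po
    return dict(nxt)

# Toy example: one state splits into two, both emitting observation "o".
sigma0 = {("s0", ()): 1.0}
policy = {(): "a"}
trans = {"s0": {"a": {"s0": 0.5, "s1": 0.5}}}
obs = {"a": {"s0": {"o": 1.0}, "s1": {"o": 1.0}}}
sigma1 = update_statistic(sigma0, policy, trans, obs)
```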
Exploiting Model Equivalences for Solving Interactive Dynamic Influence Diagrams
Cited by 6 (5 self)
Abstract:
We focus on the problem of sequential decision making in partially observable environments shared with other agents of uncertain types having similar or conflicting objectives. This problem has been previously formalized by multiple frameworks, one of which is the interactive dynamic influence diagram (I-DID), which generalizes the well-known influence diagram to the multiagent setting. I-DIDs are graphical models and may be used to compute the policy of an agent given its belief over the physical state and others' models, which changes as the agent acts and observes in the multiagent setting. As we may expect, solving I-DIDs is computationally hard. This is predominantly due to the large space of candidate models ascribed to the other agents and its exponential growth over time. We present two methods for reducing the size of the model space and stemming its exponential growth. Both of these methods involve aggregating individual models into equivalence classes. Our first method groups together behaviorally equivalent models and selects for updating only those models that will result in predictive behaviors distinct from others in the updated model space. The second method further compacts the model space by focusing on portions of the behavioral predictions. Specifically, we cluster actionally equivalent models that prescribe identical actions at a single time step. Exactly identifying the equivalences would require us to solve all models in the initial set. We avoid this by selectively solving some of the models, thereby introducing an approximation. We discuss the error introduced by the approximation, and empirically demonstrate the improved efficiency in solving I-DIDs due to the equivalences.
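The grouping into behavioral equivalence classes can be sketched directly (a minimal illustration; `solve` is an assumed stand-in for computing a model's policy, which in practice is the expensive step the paper approximates):

```python
def behavioral_equivalence_classes(models, solve):
    """Group candidate models of another agent by the policy they
    prescribe: models inducing the same behavior are interchangeable for
    prediction, so one representative per class suffices.  `solve` maps
    a model to a hashable policy; both names are illustrative."""
    classes = {}
    for m in models:
        classes.setdefault(solve(m), []).append(m)
    representatives = [ms[0] for ms in classes.values()]
    return representatives, classes
```

A keep-one-representative reduction like this is what bounds the otherwise exponential growth of the model space over time.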
Point-Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
Cited by 5 (0 self)
Abstract:
We present four major results towards solving decentralized partially observable Markov decision problems (Dec-POMDPs), culminating in an algorithm that outperforms all existing algorithms on all but one standard infinite-horizon benchmark problem. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a Dec-POMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any Dec-POMDP into a Dec-POMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door to new classes of Dec-POMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based value iteration, and modify it to produce the first tractable value iteration method for Dec-POMDPs that outperforms existing algorithms.