Optimal and approximate Q-value functions for decentralized POMDPs
 J. Artificial Intelligence Research
Cited by 62 (26 self)
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
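The single-agent case mentioned in this abstract (compute an optimal Q-value function recursively by dynamic programming, then extract a policy from it) can be illustrated with a minimal sketch for a finite-horizon MDP. This is not the paper's Dec-POMDP method. The function names, the array layout, and the toy numbers are assumptions made only for this example.

```python
import numpy as np

def q_value_iteration(T, R, horizon):
    """Finite-horizon dynamic programming for a small single-agent MDP.

    T[s, a, s2] is the probability of moving from s to s2 under action a.
    R[s, a] is the immediate reward. Both layouts are assumptions.
    Returns Q with shape (horizon, n_states, n_actions).
    """
    n_states, n_actions, _ = T.shape
    Q = np.zeros((horizon, n_states, n_actions))
    next_V = np.zeros(n_states)  # no reward is collected after the last stage
    for t in reversed(range(horizon)):
        # Q_t(s, a) = R(s, a) + sum_s2 T(s, a, s2) * max_a2 Q_{t+1}(s2, a2)
        Q[t] = R + T @ next_V
        next_V = Q[t].max(axis=1)
    return Q

def greedy_policy(Q):
    """Extract a policy by picking the maximizing action per stage and state."""
    return Q.argmax(axis=2)

# Hypothetical two-state, two-action MDP with made-up numbers.
T = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.5, 0.5]],
])
R = np.array([[0.0, 1.0], [2.0, 0.0]])

Q = q_value_iteration(T, R, horizon=3)
policy = greedy_policy(Q)  # policy[t, s] is the action index at stage t in state s
```

In the decentralized setting the abstract addresses, no single agent observes the state or the joint history, which is why this backward recursion does not carry over directly.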
Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs
, 2013
Cited by 19 (12 self)
This article presents the state-of-the-art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node’s depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
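The doubly exponential growth mentioned in this abstract can be made concrete with a short counting sketch. It assumes that each child of a search node corresponds to one joint decision rule for the next stage, and that a decision rule maps every individual observation history to an action. The function names and the two-agent example sizes are hypothetical.

```python
def num_observation_histories(n_obs, stage):
    """Number of length-`stage` observation histories for one agent."""
    return n_obs ** stage

def num_joint_decision_rules(n_actions, n_obs, stage):
    """Count joint decision rules at a given stage.

    n_actions[i] and n_obs[i] are the action and observation counts of
    agent i. Each agent picks one action per observation history, so the
    count per agent is n_actions[i] ** (n_obs[i] ** stage).
    """
    count = 1
    for a, o in zip(n_actions, n_obs):
        count *= a ** num_observation_histories(o, stage)
    return count

# Two agents, each with 2 actions and 2 observations (made-up sizes).
for stage in range(4):
    print(stage, num_joint_decision_rules([2, 2], [2, 2], stage))
```

Under these assumptions the count is 4 at stage 0 and 65536 at stage 3, which shows why expanding every child of a deep node quickly becomes impractical.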
Multi-Agent Online Planning with Communication
Cited by 16 (1 self)
We propose an online algorithm for planning under uncertainty in multiagent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenge is to produce coordinated behavior using little or no communication. When communication is allowed but constrained, the challenge is to produce high value with minimal communication. The algorithm addresses these challenges by communicating only when history inconsistency is detected, allowing communication to be postponed if necessary. Moreover, it bounds the memory usage at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms and it outperforms the best online method, producing higher value with much less communication in most cases.
Multiagent Planning under Uncertainty with Stochastic Communication Delays
 In Proc. of the 18th Int. Conf. on Automated Planning and Scheduling
, 2008
Cited by 16 (8 self)
We consider the problem of cooperative multiagent planning under uncertainty, formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Unfortunately, in these models optimal planning is provably intractable. By communicating their local observations before they take actions, agents synchronize their knowledge of the environment, and the planning problem reduces to a centralized POMDP. As such, relying on communication significantly reduces the complexity of planning. In the real world however, such communication might fail temporarily. We present a step towards more realistic communication models for Dec-POMDPs by proposing a model that: (1) allows that communication might be delayed by one or more time steps, and (2) explicitly considers future probabilities of successful communication. For our model, we discuss how to efficiently compute an (approximate) value function and corresponding policies, and we demonstrate our theoretical results with encouraging experiments.
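When all agents share their observations before acting, they can maintain one common belief over states, which is the quantity a centralized POMDP plans over. The following is a minimal sketch of that standard Bayesian belief update under instantaneous, reliable communication. It does not model the delayed or stochastic communication the paper proposes. The array layout, the function name, and the toy numbers are assumptions.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Update a shared belief after a joint action and a joint observation.

    T[a, s, s2] = P(s2 | s, a) and O[a, s2, o] = P(o | a, s2), where a and o
    index joint actions and joint observations (layout is an assumption).
    """
    predicted = belief @ T[action]  # prediction step: P(s2 | belief, a)
    unnormalized = predicted * O[action][:, observation]  # weight by likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total

# Hypothetical problem with two states, one joint action, two joint observations.
T = np.array([[[0.7, 0.3], [0.4, 0.6]]])
O = np.array([[[0.9, 0.1], [0.2, 0.8]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
```

Without synchronization each agent only sees its own observation and cannot compute this joint update, which is the gap that communication closes.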
Offline planning for communication by exploiting structured interactions in decentralized MDPs
 In Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT ’09. IEEE/WIC/ACM International Joint Conferences on
, 2009
Cited by 14 (0 self)
Variants of the decentralized MDP model focus on problems exhibiting some special structure that makes them easier to solve in practice. Our work is concerned with two main issues. First, we propose a new model, Event-Driven Interaction with Complex Rewards, that addresses problems having structured transition and reward dependence. Our model captures a wider range of problems than existing structured models. In spite of its generality, the model still offers structure that can be leveraged by heuristics and solution algorithms. This is facilitated by explicitly representing interactions as first-class entities. We formulate and solve instances of our model as bilinear programs. Second, we look at making offline planning for communication tractable. To this end, we propose heuristics that limit problem size by making communication available only at a few strategically chosen points based on an analysis that exploits problem structure in the proposed model. Experimental results demonstrate a reduction in problem size and solution time using restricted communication, with little or no decrease in solution quality. Our heuristics therefore allow us to solve problems that would otherwise be intractable.
Decentralized Control of Partially Observable Markov Decision Processes
Cited by 13 (8 self)
Markov decision processes (MDPs) are often used to model sequential decision problems involving uncertainty under the assumption of centralized control. However, many large, distributed systems do not permit centralized control due to communication limitations (such as cost, latency or corruption). This paper surveys recent work on decentralized control of MDPs in which control of each agent depends on a partial view of the world. We focus on a general framework where there may be uncertainty about the state of the environment, represented as a decentralized partially observable MDP (Dec-POMDP), but consider a number of subclasses with different assumptions about uncertainty and agent independence. In these models, a shared objective function is used, but plans of action must be based on a partial view of the environment. We describe the frameworks, along with the complexity of optimal control and important properties. We also provide an overview of exact and approximate solution methods as well as relevant applications. This survey provides an introduction to what has become an active area of research on these models and their solutions.
Decentralized Monitoring of Distributed Anytime Algorithms
Cited by 3 (1 self)
Anytime algorithms allow a system to trade solution quality for computation time. In previous work, monitoring techniques have been developed to allow agents to stop the computation at the “right” time so as to optimize a given time-dependent utility function. However, these results apply only to the single-agent case. In this paper we analyze the problems that arise when several agents solve components of a larger problem, each using an anytime algorithm. Monitoring in this case is more challenging as each agent is uncertain about the progress made so far by the others. We develop a formal framework for decentralized monitoring, establish the complexity of several interesting variants of the problem, and propose solution techniques for each one. Finally, we show that the framework can be applied to decentralized flow and planning problems.
Modeling Information Exchange Opportunities For Effective Human-Computer Teamwork
Cited by 2 (1 self)
This paper studies information exchange in collaborative group activities involving mixed networks of people and computer agents. It introduces the concept of “nearly decomposable” decision-making problems to address the complexity of information exchange decisions in such multiagent settings. This class of decision-making problems arises in settings which have an action structure that requires agents to reason about only a subset of their partners’ actions but otherwise allows them to act independently. The paper presents a formal model of nearly decomposable decision-making problems, NED-MDPs, and defines an approximation algorithm, NED-DECOP, that computes efficient information exchange strategies. The paper shows that NED-DECOP is more efficient than prior collaborative planning algorithms for this class of problem. It presents an empirical study of the information exchange decisions made by the algorithm that investigates the extent to which people accept interruption requests from a computer agent. The context for the study is a game in which the agent can ask people for information that may benefit its individual performance and thus the group’s collaboration. This study revealed the key factors affecting people’s perception of the benefit of interruptions in this setting. The paper also describes the use of machine learning to predict the situations in which people deviate from the strategies generated by the algorithm, using a combination of domain features and features informed by the algorithm. The methodology followed in this work could form the basis for designing agents that effectively exchange information in collaborations with people.