Results 1–10 of 53
Optimal and approximate Q-value functions for decentralized POMDPs
 J. Artificial Intelligence Research
Abstract

Cited by 62 (26 self)
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
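For readers new to the single-agent recursion this abstract starts from, here is a minimal sketch. The 2-state, 2-action MDP below (matrices R and P, discount gamma) is entirely invented for illustration; the code computes Q* by dynamic programming (value iteration) and extracts a greedy policy, which is exactly the step the paper asks how to generalize to Dec-POMDPs.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; all numbers are invented for illustration.
R = np.array([[1.0, 0.0],          # R[s, a]: immediate reward
              [0.0, 2.0]])
P = np.zeros((2, 2, 2))            # P[s, a, s']: transition probabilities
P[0, 0] = [0.9, 0.1]
P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.7, 0.3]
P[1, 1] = [0.1, 0.9]
gamma = 0.9

# Dynamic programming on Q: Q(s,a) <- R(s,a) + gamma * E[max_a' Q(s',a')]
Q = np.zeros((2, 2))
for _ in range(500):
    Q = R + gamma * (P @ Q.max(axis=1))

policy = Q.argmax(axis=1)          # greedy policy extracted from Q*
```

In this toy instance both states end up preferring action 1, since state 1's self-loop under action 1 keeps collecting the larger reward.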
Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping
Abstract

Cited by 43 (16 self)
Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single-agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.
Influence-based policy abstraction for weakly-coupled Dec-POMDPs
 In International Conference on Automated Planning and Scheduling (ICAPS 2010)
, 2010
Abstract

Cited by 38 (12 self)
Decentralized POMDPs are powerful theoretical models for coordinating agents' decisions in uncertain environments, but the generally intractable complexity of optimal joint policy construction presents a significant obstacle in applying Dec-POMDPs to problems where many agents face many policy choices. Here, we argue that when most agent choices are independent of other agents' choices, much of this complexity can be avoided: instead of coordinating full policies, agents need only coordinate policy abstractions that explicitly convey the essential interaction influences. To this end, we develop a novel framework for influence-based policy abstraction for weakly-coupled transition-dependent Dec-POMDP problems that subsumes several existing approaches. In addition to formally characterizing the space of transition-dependent influences, we provide a method for computing optimal and approximately optimal joint policies. We present an initial empirical analysis, over problems with commonly studied flavors of transition-dependent influences, that demonstrates the potential computational benefits of influence-based abstraction over state-of-the-art optimal policy search methods.
Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs
, 2013
Abstract

Cited by 19 (12 self)
This article presents the state-of-the-art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node’s depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
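The heuristic-search skeleton that GMAA*-style algorithms share can be illustrated on a toy search tree; the nodes, leaf values, and upper bounds below are all hypothetical. The invariant the abstract's guarantees rest on is that interior nodes carry admissible (optimistic) upper bounds, so the first complete solution popped from the best-first queue is optimal:

```python
import heapq

# Hypothetical search tree: partial solutions map to their extensions.
children = {
    "": ["a", "b"],
    "a": ["aa", "ab"],
    "b": ["ba", "bb"],
}
value = {"aa": 5.0, "ab": 3.0, "ba": 6.0, "bb": 2.0}   # complete solutions
upper = {"": 7.0, "a": 5.5, "b": 6.5}                  # admissible upper bounds

def search():
    # Best-first expansion; Python's heapq is a min-heap, so negate priorities.
    frontier = [(-upper[""], "")]
    while frontier:
        neg_bound, node = heapq.heappop(frontier)
        if node in value:
            # Complete solution with the highest bound on the queue:
            # admissibility guarantees no unexpanded node can beat it.
            return node, value[node]
        for child in children[node]:
            bound = upper.get(child, value.get(child))
            heapq.heappush(frontier, (-bound, child))

best, best_value = search()
```

Note how the subtree under "a" is never expanded: its upper bound (5.5) is below the value of the solution found under "b", which is the pruning effect that incremental expansion then pushes further.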
A Bilinear Programming Approach for Multiagent Planning
Abstract

Cited by 17 (2 self)
Multiagent planning and coordination problems are common and known to be computationally hard. We show that a wide range of two-agent problems can be formulated as bilinear programs. We present a successive approximation algorithm that significantly outperforms the coverage set algorithm, which is the state-of-the-art method for this class of multiagent problems. Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement. The new algorithm can be terminated at any time and, unlike the coverage set algorithm, it facilitates the derivation of a useful online performance bound. It is also much more efficient, on average reducing the computation time of the optimal solution by about four orders of magnitude. Finally, we introduce an automatic dimensionality reduction method that improves the effectiveness of the algorithm, extending its applicability to new domains and providing a new way to analyze a subclass of bilinear programs.
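As background on the structure being exploited (a sketch only, not the paper's successive approximation algorithm): a bilinear objective x^T A y is linear in x once y is fixed and vice versa, so each half-step reduces to a linear program, which over a probability simplex is solved at a vertex. A minimal alternating-maximization illustration with an invented 2x2 payoff matrix:

```python
import numpy as np

# Invented payoff matrix: entry A[i, j] is the value of joint choice (i, j).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Alternate exact best responses: with x fixed, x^T A y is linear in y,
# so its maximum over the simplex is attained at a vertex (a pure choice).
x = np.ones(2) / 2
for _ in range(20):
    y = np.eye(2)[np.argmax(x @ A)]   # best vertex y given x
    x = np.eye(2)[np.argmax(A @ y)]   # best vertex x given y

value = float(x @ A @ y)
```

Alternating maximization like this only guarantees a local optimum of the bilinear program in general; the paper's contribution is precisely a successive approximation scheme with a usable global performance bound.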
Offline planning for communication by exploiting structured interactions in decentralized MDPs
 In IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies (WI-IAT '09), 2009
, 2009
Abstract

Cited by 14 (0 self)
Variants of the decentralized MDP model focus on problems exhibiting some special structure that makes them easier to solve in practice. Our work is concerned with two main issues. First, we propose a new model, Event-Driven Interaction with Complex Rewards, that addresses problems having structured transition and reward dependence. Our model captures a wider range of problems than existing structured models. In spite of its generality, the model still offers structure that can be leveraged by heuristics and solution algorithms. This is facilitated by explicitly representing interactions as first-class entities. We formulate and solve instances of our model as bilinear programs. Second, we look at making offline planning for communication tractable. To this end, we propose heuristics that limit problem size by making communication available only at a few strategically chosen points, based on an analysis that exploits problem structure in the proposed model. Experimental results demonstrate a reduction in problem size and solution time using restricted communication, with little or no decrease in solution quality. Our heuristics therefore allow us to solve problems that would otherwise be intractable.
An iterative algorithm for solving constrained decentralized Markov decision processes
 in: Proceedings of the 21st National Conference on Artificial Intelligence
Abstract

Cited by 14 (1 self)
Despite significant progress in extending Markov Decision Processes (MDPs) to cooperative multiagent systems, developing approaches that can deal with realistic problems remains a serious challenge. Existing approaches that solve Decentralized Markov Decision Processes (DEC-MDPs) suffer from the fact that they can only solve relatively small problems without complex constraints on task execution. OC-DEC-MDP has been introduced to deal with large DEC-MDPs under resource and temporal constraints. However, the proposed algorithm to solve this class of DEC-MDPs has some limits: it suffers from overestimation of opportunity cost and restricts policy improvement to one sweep (or iteration). In this paper, we propose to overcome these limits by first introducing the notion of Expected Opportunity Cost to better assess the influence of a local decision of an agent on the others. We then describe an iterative version of the algorithm that incrementally improves the policies of agents, leading to higher-quality solutions in some settings. Experimental results are shown to support our claims.
Complexity of Decentralized Control: Special Cases
Abstract

Cited by 12 (0 self)
The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case.
Influence-based abstraction for multiagent systems
 In AAAI
, 2012
Abstract

Cited by 8 (4 self)
This paper presents a theoretical advance by which factored POSGs can be decomposed into local models. We formalize the interface between such local models as the influence agents can exert on one another; and we prove that this interface is sufficient for decoupling them. The resulting influence-based abstraction substantially generalizes previous work on exploiting weakly-coupled agent interaction structures. Therein lie several important contributions. First, our general formulation sheds new light on the theoretical relationships among previous approaches, and promotes future empirical comparisons that could come by extending them beyond the more specific problem contexts for which they were developed. More importantly, the influence-based approaches that we generalize have shown promising improvements in the scalability of planning for more restrictive models. Thus, our theoretical result here serves as the foundation for practical algorithms that we anticipate will bring similar improvements to more general planning contexts, and also into other domains such as approximate planning, decision-making in adversarial domains, and online learning.